Driving performance and neurocognitive skills of long‐term users of benzodiazepine anxiolytics and hypnotics

Abstract
Objective: The aim of this study is to compare actual driving performance and skills related to driving of patients using benzodiazepine anxiolytics or hypnotics for at least 6 months to that of healthy controls.
Methods: Participants were 44 long‐term users of benzodiazepine and benzodiazepine‐related anxiolytics (n = 12) and hypnotics (n = 32) and 65 matched healthy controls. Performance was assessed using an on‐the‐road driving test measuring standard deviation of lateral position (SDLP, in cm) and a battery of neurocognitive tasks. Performance differences between groups were compared with the effects of a blood alcohol concentration of 0.5 mg/ml to determine clinical relevance.
Results: Compared with controls, SDLP was significantly increased in hypnotic users (+1.70 cm) but not in anxiolytic users (+1.48 cm). Anxiolytic and hypnotic users showed significant and clinically relevant impairment on neurocognitive tasks measuring executive functioning, vigilance, and reaction time. For patients using hypnotics for at least 3 years, no significant driving impairment was observed.
Conclusion: Impairing effects of benzodiazepine hypnotics on driving performance may mitigate over time following longer term use (i.e., 3 years or more), although neurocognitive impairments may remain.

Data from experimental studies on drug effects on driving and neurocognitive function have also been used to classify fitness to drive. Such classification systems (de Gier, Alvarez, Mercier-Guyon, & Verstraete, 2009; Ravera et al., 2012) express drug-induced impairment in BAC equivalents. Classifications that are commonly used to define drug effects on driving, in relationship to alcohol, are no/minor influence (Category 0/I, BAC < 0.5 mg/ml), moderate influence (Category II, 0.5 mg/ml ≤ BAC ≤ 0.8 mg/ml), and severe influence (Category III, BAC > 0.8 mg/ml). A limitation of existing drug categorization systems is their lack of information about the effect of long-term drug usage on driving performance. Current classifications are mainly based on acute effects of single doses or short-term treatment in healthy volunteers. Consequently, most benzodiazepines are put in Category III because their acute effects on performance are usually severe. Yet it is known that tolerance to benzodiazepine impairment might develop after repeated administration in healthy volunteers (Ghoneim, Mewaldt, Berie, & Hinrichs, 1981; Pomara et al., 1998) and patients (O'Hanlon, Vermeeren, Uiterwijk, Van Veggel, & Swijgman, 1995; van Laar, Volkerts, & van Willigenburg, 1992).
Nevertheless, driving performance may not completely normalize, as suggested by impairment found in a range of neuropsychological functions of long-term benzodiazepine users (Barker, Greenwood, Jackson, & Crowe, 2004; Crowe & Stranks, 2017). The severity and relevance of such impairment with respect to patients' driving performance is not clear, however. The classification of benzodiazepines in Category III, irrespective of duration of use, may be overly conservative for drivers who have been receiving long-term treatment, limiting their mobility. As a partial solution, taking duration of use into account, current Dutch laws state that benzodiazepine users are unfit to drive when treated for less than 3 years but can request an individual driver fitness evaluation after more than 3 years of stable usage (Ministry of Infrastructure and Water Management, 2000). The criterion of 3 years seems rather arbitrary, because there is no clear scientific support for this particular cut-off point. As far as we know, there are no published studies comparing driving performance of long-term benzodiazepine users before and after 3 years of use.
The primary objective of the present study was to evaluate driving performance of long-term users of benzodiazepine anxiolytics and long-term users of hypnotics separately, as compared with that of a normative control group consisting of healthy volunteers. Only users of benzodiazepines classified as Category III were included. Long-term usage was defined as longer than 6 months. The secondary objective was to evaluate driving performance separately for patients who had been using treatment for less than 3 years and those whose use exceeded 3 years. Driving performance was assessed by a standardized highway driving test in actual traffic and various neurocognitive tests related to driving.

| Design
The study was designed as a multicentre trial (Universities of Maastricht, Utrecht and Groningen, the Netherlands) comparing groups of long-term users of benzodiazepines with healthy controls. Patients treated with benzodiazepine anxiolytics and hypnotics were analysed separately, because of the difference between these groups in time of drug intake relative to time of driving. It is known that the impairing effects of benzodiazepines on driving decrease with increased time after intake. Hypnotics are taken at bedtime, and driving occurs the next day, 8 hr or more after administration. In contrast, anxiolytics are administered during the day, and driving is likely to occur within 8 hr of administration. A combination of self-reported indication and usual time of drug administration was used to classify a patient as a user of hypnotics or anxiolytics.
To explore the potential difference in impairment before and after 3 years of use, hypnotic users were subdivided into two groups based on duration of treatment, that is, long-term use between 6 months-3 years (LT3−) and long-term use >3 years (LT3+). Anxiolytic users could not be divided based on treatment duration due to the low sample size of this group.

| Participants
Patients were recruited via patient organizations, hospitals, and practitioners affiliated with UPPER (Koster, Blom, Philbert, Rump, & Bouvy, 2014) and regional advertisement. Controls were recruited via flyers and advertisement in local newspapers.
Study participants were informed about the study's goal, procedures, and potential hazards. The Medical Ethics Committee of Maastricht University and the Maastricht Academic Hospital approved the study. Furthermore, the study was conducted in agreement with the code of ethics on human experimentation established by the Declaration of Helsinki (1964), amended in Edinburgh (2000), Seoul (2008), and Fortaleza (2013). Written informed consent was obtained from each volunteer before enrolment. Volunteers received financial compensation for their participation in the study.

| Patients
A group of 44 long-term users of benzodiazepines or benzodiazepine-like drugs (i.e., Z drugs) was recruited (12 users of anxiolytics and 32 users of hypnotics). All patients used Category III drugs that are expected to severely affect fitness to drive. These included alprazolam, bromazepam, brotizolam, diazepam, lorazepam, lormetazepam, midazolam, nitrazepam, oxazepam, temazepam, zolpidem, or zopiclone. Initial screening was based on a medical history questionnaire that was evaluated by a clinician.
The following inclusion criteria had to be met: use of a Category III benzodiazepine or benzodiazepine-like drug over a period of at least 6 months with a frequency of at least two times a week (≈90 days/year), possession of a valid driver's license for at least 3 years, driving an average of at least 500 km/year, normal or corrected to normal vision, and body mass index between 17 and 35 kg/m². Although Dutch law deems benzodiazepine users who have been treated for less than 3 years unfit to drive, many of them drive a motor vehicle, simply because they are unaware of this legal provision and because it is not actively enforced by the Dutch government. Patients were excluded if they used concomitant medication classified as International Council on Alcohol, Drugs and Traffic Safety (ICADTS) Category III. Concomitant medication classified as ICADTS Category 0/I was allowed, whereas ICADTS Category II medication was evaluated by a clinician on an individual basis. Additional exclusion criteria were alcohol use >21 glasses per week, smoking >20 cigarettes a day, and use of illegal drugs.
Before test days, patients took their anxiolytic or hypnotic medication as usual, that is, in the evening or morning before testing.
Patients' usual dosing regimens were established at screening and monitored by self-report on the practice and test day.

| Controls
A group of 65 healthy volunteers was recruited with age, gender distribution, and driving experience comparable to those of the patients. Inclusion criteria were a valid driver's license for at least 3 years, driving an average of 3,000 km/year, normal or corrected to normal vision, and a body mass index between 19 and 29 kg/m². Exclusion criteria were a diagnosed neurological or sleeping disorder, alcohol use >21 glasses per week, smoking >10 cigarettes a day, and use of illegal drugs or psychoactive medication (e.g., antidepressants, benzodiazepines, antiepileptics, anticonvulsants, antihistamines, and opioids).

| Driving test
In the standardized on-the-road highway driving test (Figure 1; O'Hanlon, 1984; Ramaekers, 2017; Verster & Roth, 2011), volunteers drive a specially instrumented car over a 100 km (61 miles) primary highway circuit accompanied by a licensed driving instructor having access to dual controls. The volunteers' task is to maintain a constant speed of 95 km/hr (58 miles/hr) and a steady lateral position between the delineated boundaries of the slower right-hand traffic lane. The vehicle's speed and lateral position relative to the left lane delineation are continuously recorded. These signals are digitally sampled at 4 Hz and edited offline to remove data recorded during overtaking manoeuvres or disturbances caused by roadway or traffic situations. The remaining data yield the standard deviation of lateral position (SDLP) and speed for each successive 5-km segment and, as the square root of pooled variance over all segments, for the test as a whole. The primary outcome variable is the SDLP (in cm), which is a measure of road tracking error or "weaving."
FIGURE 1 Standard highway driving test. Left: Volunteers drive a specially instrumented vehicle for about 1 hr over a 100-km primary highway circuit, accompanied by a licensed driving instructor having access to dual controls. The volunteer's task is to drive with a steady lateral position between the delineated boundaries of the slower (right) traffic lane, while maintaining a constant speed of 95 km/hr. The lateral position of the car relative to the middle line, between the left and right traffic lane, is continuously measured by means of a camera that is mounted on the roof of the car. Right: schematic drawing of the highway driving test. The standard deviation of lateral position (SDLP) is an index of road tracking error or "weaving." Drugs that induce sleepiness or sedation cause loss of vehicle control, leading to increased road tracking error.
Drug-induced impairments in the standardized highway driving test have been compared with those induced by a well-known benchmark drug (i.e., alcohol) that is known to jeopardize traffic safety and shows a clear exponential dose-dependent relationship with crash risk (Blomberg et al., 2009; Borkenstein et al., 1974). The clinical relevance of performance changes in the highway driving test has previously been determined by establishing the relationship between BAC and SDLP (Louwerens, Gloerich, DeVries, Brookhuis, & O'Hanlon, 1987). A recent meta-analysis of nine alcohol calibration studies revealed a mean increment in SDLP of 2.5 cm while operating the vehicle at a BAC of 0.5 mg/ml, which has been defined as the minimal cut-off value to represent clinically relevant impairment. The highway driving test has been used in more than 100 studies and has proven sensitive to alcohol, benzodiazepines, and many other sedating drugs (Ramaekers, 2017; Roth et al., 2014; Vermeeren, 2004).
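The pooling step described for the driving test (an SDLP per 5-km segment, combined as the square root of the pooled variance over all segments) can be illustrated with a minimal sketch. The function names and the weighting of each segment's variance by its degrees of freedom are assumptions made for illustration; the text does not state how segments are weighted, and this is not the authors' analysis software.

```python
import math
import statistics


def segment_sdlp(lateral_positions_cm):
    """Sample standard deviation of lateral position (cm) within one segment."""
    return statistics.stdev(lateral_positions_cm)


def pooled_sdlp(segments):
    """Square root of the pooled variance over all segments.

    Assumption: each segment's sample variance is weighted by its degrees
    of freedom (n - 1). Segments with fewer than two samples are skipped
    because their variance is undefined.
    """
    usable = [s for s in segments if len(s) >= 2]
    if not usable:
        raise ValueError("need at least one segment with two or more samples")
    weighted_sum = sum((len(s) - 1) * statistics.variance(s) for s in usable)
    degrees_of_freedom = sum(len(s) - 1 for s in usable)
    return math.sqrt(weighted_sum / degrees_of_freedom)


# Hypothetical lateral positions (cm) for two short segments.
example_segments = [[10.0, 12.0, 14.0], [20.0, 20.0, 20.0, 20.0]]
overall = pooled_sdlp(example_segments)
```

Pooling within-segment variances, rather than taking one standard deviation over the whole drive, keeps slow drifts in mean lane position between segments from inflating the overall value.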

| Trail Making Test
The Trail Making Test (TMT) is a paper-and-pencil test measuring selective and divided attention, as well as executive functions (Reitan, 1958). The test comprises two parts. In Part A, the task of the volunteer is to connect, as fast as possible, 25 circles that contain the numbers 1 to 25 in ascending order. In Part B, the 25 circles contain letters (A to L) and numbers (1 to 13).
Volunteers are required to connect, as fast as possible, the 25 circles in an alternately ascending fashion (i.e., 1-A-2-B-3-C, and so on). The maximum time allowed is 5 min for Part A and 6 min for Part B.
The outcome measure for Parts A and B is the time (in seconds) needed to complete the task.

| Digit Symbol Substitution Test
The Digit Symbol Substitution Test (DSST) is a paper-and-pencil test measuring executive attention and processing speed (Wechsler, 1958). Volunteers are presented with rows of digits (1 to 9) and have to respond by writing the corresponding symbol in a blank space, according to a key presented at the top of the paper. The primary outcome measure is the number of correctly substituted digits in 90 s.

| Adaptive Tachistoscopic Traffic Perception Test
The Adaptive Tachistoscopic Traffic Perception Test (ATTPT) assesses visual orientation ability, visual observational ability, speed of perception, and skills in obtaining a traffic overview (Schuhfried, 2009). Volunteers are presented with pictures of traffic situations for a very short duration. After each picture, volunteers are required to indicate what was in the picture, by choosing from five answer options (i.e., cars, cyclists, pedestrians, traffic signs, and/or traffic lights). Pictures are presented adaptively, meaning that the difficulty of the pictures is adapted to the abilities of the volunteer (i.e., volunteers who perform poorly receive pictures containing less complex traffic situations, and vice versa for volunteers who perform well). The primary outcome is the number of correct answers. Time to complete the task is 10 min.

| Reaction Test
The Reaction Test (RT) assesses reaction time and motor time in response to simple and complex visual or acoustic signals (Prieler, 2008). Before the test, volunteers are instructed to lay their index finger on a pressure-sensitive key (i.e., rest key). During the test, volunteers are required to press a target key, with their index finger, whenever a target stimulus is presented. After pressing the target key, they must return their index finger immediately to the rest key. Using a rest key and a target key makes it possible to distinguish between reaction time (the time between the presentation of the target stimulus and the moment the index finger is removed from the rest key) and motor time (the time between releasing the rest key and pressing the target key). The current experiment uses three versions of the reaction test: S1, in which volunteers have to respond whenever a yellow circle is shown on screen; S2, in which volunteers have to respond whenever they hear a tone; and S3, in which volunteers have to respond whenever they see a yellow circle on screen and hear a tone in combination; all other stimulus combinations are to be ignored. Time to complete all three versions of this task is 10 min.
Outcome measures for these tests are reaction time and motor time.
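The distinction between reaction time and motor time follows directly from three timestamps per trial. The sketch below is illustrative only: the `Trial` record and its field names are hypothetical and do not describe the test system's actual data format.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Hypothetical timestamps (ms) for one Reaction Test trial."""
    stimulus_onset_ms: float
    rest_key_release_ms: float
    target_key_press_ms: float


def split_response(trial):
    """Return (reaction_time_ms, motor_time_ms) for one trial.

    Reaction time: stimulus onset until the finger leaves the rest key.
    Motor time: rest-key release until the target key is pressed.
    """
    reaction_time = trial.rest_key_release_ms - trial.stimulus_onset_ms
    motor_time = trial.target_key_press_ms - trial.rest_key_release_ms
    if reaction_time < 0 or motor_time < 0:
        raise ValueError("timestamps are out of order")
    return reaction_time, motor_time


# Hypothetical trial: stimulus at 1000 ms, release at 1320 ms, press at 1470 ms.
reaction_ms, motor_ms = split_response(Trial(1000.0, 1320.0, 1470.0))
```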

| Determination Test
The Determination Test (DT) measures the ability to sustain attention over a period of approximately 10 min. Volunteers are presented, in serial order, with visual stimuli of varying colour and sounds of different pitch. For each stimulus, a predefined button has to be pressed. The presentation of stimuli is adaptive to the reaction speed of the volunteer, meaning that the interstimulus interval is shortened when volunteers make correct and fast responses and is lengthened when volunteers make mistakes or respond slowly. During the task, volunteers are presented with the following stimuli and have to press the corresponding buttons: (a) visual coloured circles (white, yellow, red, green, and blue), each of which has a matching coloured key on the keyboard; (b) auditory signals (low pitch and high pitch), each of which has its own response key on the keyboard; and (c) motor signals (displayed as a white rectangle on the left or right side of the bottom of the screen), each of which requires the volunteer to press a response pad with the left or right foot, depending on the position of the white rectangle on screen. The outcome measure is the average reaction time of all responses made.

| Risk-Taking Test Traffic
The Risk-Taking Test Traffic (RTTT) measures risk-taking behaviour in potentially dangerous driving situations (Hergovich, Bognar, Arendasy, & Sommer, 2005). Volunteers are presented with 24 items (i.e., video clips) that show diverse driving situations, which are described in words before they are shown on screen. Each driving situation is shown twice. During the first viewing, volunteers observe the entire driving situation. During the second viewing, volunteers are required to press a key on the keyboard, indicating the distance from the potential hazard at which the driving manoeuvre that has just been described becomes critical or dangerous (i.e., the point at which the volunteer would no longer perform the manoeuvre). The first of the 24 items serves as a practice item. Time to complete the task is approximately 15 min. The variable "willingness to take risk in driving situations" is measured by obtaining the distance between the moment of a potential hazard, measured in hundredths of a second, and the moment the volunteer presses the key indicating that the potential hazard becomes critical or potentially dangerous. This distance is a measure of subjectively accepted level of risk. Higher scores indicate higher levels of subjectively accepted risk.

| Psychomotor Vigilance Test
The Psychomotor Vigilance Test (PVT) is based on a simple visual reaction time test (Dinges & Powell, 1985). The test measures the ability to sustain attention over a period of approximately 10 min. Volunteers are required to respond to a visual stimulus presented at a variable interval (2-10 s) by pressing a button with the dominant hand. The visual stimulus is the presentation of a counter that starts running from 0 to 60 s at 1-ms intervals. Volunteers are required to respond to this visual counter as soon as they perceive it on screen by pressing the corresponding button. If a response is made, the counter stops, stays on screen for 500 ms as visual feedback for the volunteer, and disappears. During this period, a variable interval is presented, and afterwards, the next counter appears on screen. This cycle repeats until 100 stimuli have been presented on screen. If a response has not been made within 60 s, the clock resets and the counter restarts. Primary outcome measures are mean response speed and number of lapses (defined as responses with RT ≥ 500 ms; Basner & Dinges, 2011). Performance on the PVT has been calibrated for dose effects of alcohol and one night of sleep deprivation (Jongen, Perrier, Vuurman, Ramaekers, & Vermeeren, 2015;Jongen, Vuurman, Ramaekers, & Vermeeren, 2014).
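The two primary PVT outcomes can be computed from a list of reaction times, as in the sketch below. The lapse threshold (RT ≥ 500 ms) is taken from the text; treating response speed as the mean reciprocal reaction time (1/RT, in responses per second) is an assumption based on common PVT scoring practice, and the function name is hypothetical.

```python
def pvt_outcomes(reaction_times_ms, lapse_threshold_ms=500.0):
    """Return (mean_response_speed, number_of_lapses) for one PVT session.

    Assumption: response speed is the mean of 1/RT with RT in seconds.
    Lapses are responses with RT at or above the threshold, as in the text.
    """
    if not reaction_times_ms:
        raise ValueError("no responses recorded")
    if any(rt <= 0 for rt in reaction_times_ms):
        raise ValueError("reaction times must be positive")
    speeds = [1000.0 / rt for rt in reaction_times_ms]  # responses per second
    mean_speed = sum(speeds) / len(speeds)
    lapses = sum(1 for rt in reaction_times_ms if rt >= lapse_threshold_ms)
    return mean_speed, lapses


# Hypothetical session with three responses (ms).
mean_speed, lapses = pvt_outcomes([250.0, 500.0, 1000.0])
```

Averaging reciprocal reaction times reduces the influence of a few very long responses compared with averaging raw reaction times, which is one reason both speed and lapse counts are reported.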

| Beck Depression Inventory
The Beck Depression Inventory (BDI; Beck, Steer, & Carbin, 1988) is a 21-item self-report questionnaire measuring depression-related symptomology. Answer options for each question range from 0 to 3. The obtained total score for the BDI serves as an indicator for the presence of depression-related symptoms, ranging from 0 to 63.
Higher total scores indicate the presence of more symptoms of depression.

| State-Trait Anxiety Inventory-Trait
The State-Trait Anxiety Inventory-Trait (STAI-T; Spielberger, Gorsuch, & Lushene, 1970) is the trait dimension of the 40-item self-report STAI questionnaire. The STAI-T contains 20 questions that measure trait anxiety (i.e., how individuals feel in general). Answer options for each question range from 1 to 4, with total scores ranging from 20 to 80. Higher total scores indicate more anxiety-related symptoms.

| Pittsburgh Sleep Quality Index
The Pittsburgh Sleep Quality Index (PSQI; Buysse, Reynolds, Monk, Berman, & Kupfer, 1989) is a self-report questionnaire that assesses the quality and patterns of sleep over the last month, by rating seven sleep-related domains: subjective sleep quality, sleep latency, sleep duration, habitual sleep efficiency, sleep disturbance, use of medication, and daytime disturbance. A summary score ranging from 0 to 21 can be derived, with higher scores indicating poorer sleep quality.
A summary score ≥5 indicates a poor sleeper.

| Groningen Sleep Quality Scale
The total duration of a test day was approximately 4 hr (Figure 2).

| Statistical analysis
Statistical power to detect a clinically relevant mean difference in SDLP of 2.5 cm between patients and controls was as follows: anxiolytic users versus controls, β = .58; hypnotic users versus controls, [...]
Next, noninferiority analyses were used to determine whether the 95% confidence interval (CI) of performance differences between patients and controls exceeded the criterion level of clinical relevance, that is, a performance change equivalent to that seen at a BAC of 0.5 mg/ml. When evaluating the 95% CI of differences between groups, three interpretations are possible (Figure 3). Patients' performance was considered not impaired (i.e., noninferior) when the upper limit of the 95% CI of the difference from controls was below the alcohol criterion for impairment. Patients' performance was considered impaired (i.e., inferior) when the lower limit of the 95% CI of the difference from controls was above zero and the upper limit exceeded the alcohol criterion for impairment. When the 95% CI of the difference from controls included both zero and the alcohol criterion for impairment, the results were considered inconclusive. The noninferiority limit for the on-the-road driving test (Figure 4) was the 2.5-cm increase in SDLP seen at a BAC of 0.5 mg/ml.
Clinical relevance of impairment of neurocognitive performance was also based on direct comparison with the impairing effects of alcohol at a BAC of 0.5 mg/ml. In a separate study (Verster et al., 2016), an alcohol calibration was performed to determine which neurocognitive parameters were able to detect impairment at a BAC of 0.5 mg/ml. Results of the calibration study showed that the only parameters sensitive to the impairing effects of alcohol were TMT-A, DSST, RT-S1, RT-S2, RT-S3, DT, and PVT. Consequently, these are the only parameters that provided noninferiority limits for the present study. The clinical relevance of neurocognitive tests used in the present study will only be discussed for these parameters.
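The three-way interpretation of a 95% CI against the alcohol criterion can be written as a small decision rule. This is a minimal sketch under stated assumptions: the function name is hypothetical, differences are oriented so that positive values mean worse performance in patients, and an upper limit exactly equal to the criterion is treated as reaching it (the text does not define this boundary case). The example intervals are invented for illustration and are not study results.

```python
def classify_difference(ci_lower, ci_upper, alcohol_criterion):
    """Classify a patient-minus-control 95% CI against the alcohol criterion.

    Returns "noninferior" when the whole CI lies below the criterion,
    "inferior" when the CI lies above zero and reaches the criterion,
    and "inconclusive" when the CI spans both zero and the criterion.
    """
    if alcohol_criterion <= 0:
        raise ValueError("criterion must be a positive impairment value")
    if ci_lower > ci_upper:
        raise ValueError("lower limit exceeds upper limit")
    if ci_upper < alcohol_criterion:
        return "noninferior"
    if ci_lower > 0:
        return "inferior"
    return "inconclusive"


SDLP_CRITERION_CM = 2.5  # SDLP increase at a BAC of 0.5 mg/ml, per the text

# Hypothetical confidence intervals (cm), one for each possible outcome.
examples = {
    "A": classify_difference(0.4, 3.0, SDLP_CRITERION_CM),
    "B": classify_difference(-0.5, 1.9, SDLP_CRITERION_CM),
    "C": classify_difference(-0.6, 3.5, SDLP_CRITERION_CM),
}
```

Under this rule a difference can be statistically significant (lower limit above zero) and still be classified as noninferior, provided the whole interval stays below the alcohol criterion.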
All statistical analyses were conducted by using the IBM Statistical Package for the Social Sciences for Windows (Version 24.0.01., IBM Corp., Armonk, NY, USA). Power calculations were performed using G*Power Version 3.1 (Faul, Erdfelder, Lang, & Buchner, 2007).
Table 1 summarizes the characteristics of the patient groups and control group. Age, gender, and driving experience did not differ significantly between groups. As expected, patients had on average more complaints of anxiety, depression, and sleep problems compared with controls. Differences in scores on BDI, STAI-T, PSQI, and GSQS were significant for anxiolytic and hypnotic users, as well as for the hypnotic LT3− and LT3+ subgroups (all ps < .01). Table 2 gives an overview of psychoactive medication used per patient group. All participants took their medication at least 4 days/week. Users of anxiolytics indicated they used their medication daily, and users of hypnotics reported medication use at least four nights per week.
FIGURE 2 Schedule of a testing day. Time (in hours) is displayed relative to the start. ATTPT, Adaptive Tachistoscopic Traffic Perception Test; DSST, Digit Symbol Substitution Test; DT, Determination Test; PVT, Psychomotor Vigilance Test; RT, Reaction Test; RTTT, Risk-Taking Test Traffic; TMT, Trail Making Test.
FIGURE 3 Hypothetical example of the qualification of clinical relevance of performance differences between patients and controls. The dotted line indicates the change in performance after alcohol intake (relative to placebo). A (drug induced) change in performance will be classified as inferior when the 95% confidence interval (CI) includes the alcohol criterion but not zero (A-inferiority). Noninferiority is concluded when the 95% CI does not include the alcohol criterion (B-noninferiority). If the 95% CI includes the alcohol criterion as well as zero, the qualification of clinical relevance is undecided (C-inconclusive). BAC, blood alcohol concentration.
In total, 30 patients used psychoactive comedication (Table 2), mostly second-generation antidepressants (n = 22) and second-generation antipsychotics (n = 6). Most antidepressants were selective serotonin reuptake inhibitors and serotonin-norepinephrine reuptake inhibitors, which have minor effects on driving (Category I). Second-generation antipsychotics can have moderate effects on driving (Category II). The proportion of patients using Category II comedication was higher in the LT3− group (44%, four out of nine) than in the LT3+ group (26%, six out of 23) and the anxiolytic group (25%, three out of 12).

| Missing data
Data from the highway driving test were missing for one person in the control group, and for one patient in the hypnotic LT3− subgroup, due to problems with the recording system.

| Matching to controls
Analyses showed no significant effect of age, gender, or driving experience in the analysis of covariance model on SDLP, ATTPT, RTTT, and PVT mean reaction time. For these parameters, the entire control group sample was used as a reference for comparison with patient groups. For the remaining parameters, matched healthy controls were used for each patient (sub)group.
Analysis of variance (ANOVA) also showed a significant difference between LT3− hypnotic users and controls, F(1, 71) = 9.38. Table 3 shows the mean (±standard error) for all performance parameters for each patient (sub)group and healthy controls and the results from ANOVA analyses. Table 4 shows an overview of the 95% CI of mean changes between patients and (matched) controls on alcohol-sensitive parameters only, including inferiority limits and analyses.
FIGURE 4 Left: mean (±standard error) standard deviation of lateral position (SDLP) for controls and patient groups. Right: mean (95% confidence interval) differences in SDLP between patient groups and controls. The dotted line indicates the change in performance after alcohol intake (relative to placebo). Symbols above bars indicate significant difference from controls, p < .05. BAC, blood alcohol concentration.

| Neurocognitive performance
Comparisons between patients using anxiolytics and controls showed significant impairment of patients' performance on the DT and on PVT mean reaction time. The 95% CI of mean changes in reaction time in the DT and PVT was above zero and exceeded the criterion for a BAC of 0.5 mg/ml, indicating clinically relevant impairment.
ANOVA showed significant performance differences between patients using hypnotics and controls in the TMT-B, DSST, RT-S1 (motor and reaction time), RT-S2 (motor and reaction time), RT-S3 (motor and reaction time), DT, RTTT, and mean reaction time in PVT.
Noninferiority analysis of alcohol-sensitive parameters showed that the 95% CIs of differences in DSST, RT-S1, RT-S2, RT-S3, and PVT exceeded zero and the alcohol criterion, indicating that impairment on these parameters can be considered clinically relevant.

| DISCUSSION
This study aimed to compare driving performance of long-term users of benzodiazepine or benzodiazepine-related anxiolytics and hypnotics to that of a normative control group consisting of healthy volunteers, in order to evaluate whether classification of these drugs in Category III may be too conservative for patients who receive long-term treatment. Overall, mean SDLP was significantly higher in patients treated with hypnotics as compared with controls, indicating their driving performance is worse than normal. This seemed mainly due to patients who had been using hypnotics for less than 3 years, as the difference in SDLP from controls was significant in this group but not in those who had received hypnotic treatment for more than 3 years.
Mean SDLP did not differ significantly between patients treated with anxiolytics and controls, which may be explained by a lack of power due to the small sample size and large individual variation. Both patient groups (users of hypnotics and anxiolytics) displayed increased reaction times in a number of neurocognitive tasks. In line with findings for SDLP in hypnotic users, these impairments were most prominent in patients who used these drugs for less than 3 years. Clinical relevance seemed less present in patients using hypnotics for more than 3 years.
Alcohol calibration studies have shown a mean increase in SDLP of 2.5 cm in drivers operating with a BAC level of 0.5 mg/ml. In the present study, the mean increase in SDLP in patients using anxiolytic or hypnotic drugs was 1.48 and 1.70 cm, respectively, relative to healthy controls. The 95% CI of these mean differences included the alcohol criterion in both groups, as well as zero in the case of the anxiolytic users. As mentioned above, driving performance of anxiolytic users was inconsistent. Some individuals showed marked increments in SDLP, whereas others did not. Overall, no conclusion can be drawn for this group from these data.
Driving impairment observed in hypnotic users is of clinical relevance, because it exceeds the level of impairment associated with the legal limit of alcohol in traffic. This is in line with results from epidemiological studies showing increased risk of traffic accidents associated with the use of hypnotics (Gustavsen et al., 2008; Hansen et al., 2015; Orriols et al., 2011). Interestingly, however, severe impairment was present only in patients who had used hypnotics for less than 3 years, whereas no relevant impairment, as measured by SDLP, was found in those who had been using hypnotics for longer than 3 years. The mean difference in SDLP between controls and hypnotic LT3− users was +4.56 cm, which is equivalent to a BAC > 0.8 mg/ml (Louwerens et al., 1987). For LT3+ users, the mean difference was only 0.70 cm, and the 95% CI remained below the alcohol criterion, indicating no relevant impairment of driving. The latter finding is in line with that from a previous driving study in insomnia patients who frequently used hypnotics (Leufkens, Ramaekers, de Weerd, Riedel, & Vermeeren, 2014).
Duration of use in these patients was on average 7.7 years, and their SDLP did not differ from those of a group of normal sleepers. The difference in impairment between the LT3− and LT3+ groups in our study corresponds with gradually decreasing accident risk found in epidemiological studies following benzodiazepine treatment (Neutel, 1995;Verster et al., 2005). Although development of physiological tolerance may explain the mitigation in impairment, other factors may also play a role, such as improvements in underlying conditions, reduction in comedication, and behavioural tolerance (i.e., learning to minimize unwanted drug effects on performance by cognitive or behavioural adaptations).
Users of hypnotics and anxiolytics also demonstrated increments of reaction times as compared with controls on a number of neurocognitive tasks that exceeded the alcohol criterion of clinical relevance. In line with the results of the driving test, results of cognitive tests were mostly inconclusive for users of anxiolytics and showed clinically relevant impairment in users of hypnotics. Similar to driving impairment, psychomotor impairment in hypnotic users was most severe in patients who had been using these drugs for less than 3 years.
Contrary to the absence of driving impairment, patients who used hypnotics for longer than 3 years showed relevant impairment in some tests (i.e., the PVT, DSST, and DT). This finding seems in line with recent reviews (Crowe & Stranks, 2017; van der Sluiszen, Vermeeren, Jongen, Vinckenbosch, & Ramaekers, 2017) concluding that long-term treatment with benzodiazepines can be associated with deleterious neuropsychological effects. Impairment was even found to persist following benzodiazepine withdrawal (Crowe & Stranks, 2017). [...] hypnotic use is comparable with, or larger than, the effects found for alcohol at a BAC of 0.5 mg/ml. It can be concluded that impairment may be moderate, but the evidence that patients who have been using hypnotics for more than 3 years are severely impaired is weak. The current classification, and the associated general prohibition on driving when using these drugs, may therefore be too strict for this group of patients.
For shorter use of hypnotics, our data support the current classification as Category III. For anxiolytics, there is no evidence to support a change in the current classification.
This study has several limitations. First, there may be a selection bias, in so far as probably only patients who considered themselves fit to drive volunteered for the study.
However, these patients may be representative of the target population, that is, long-term users who are active car drivers. Patients who do not feel fit to drive are less likely to drive in real life. Second, it should be noted that the statistical power to detect clinically relevant impairment was much lower in anxiolytic users (n = 12) and hypnotic LT3− users (n = 9), due to the low sample size in both groups. Yet differences from controls were relatively large in the LT3− group and therefore still achieved statistical significance.
Third, the anxiolytic and hypnotic users in the present study formed a heterogeneous sample due to the diversity in benzodiazepines, daily dosages, time since last dosage, and comedication. Such factors may generate variability in performance between patients and hinder the interpretation of underlying mechanisms. However, this heterogeneity does reflect the diversity in the population of long-term users of benzodiazepines who drive a car. Fourth, the subdivision of benzodiazepine use into below and above 3 years follows a legislative measure adopted in the Netherlands. Strict conclusions based on this subdivision should therefore be avoided. Nonetheless, future studies could explore the time needed to build up tolerance to the impairing effects of daily benzodiazepine usage and handle treatment duration as a continuous variable over time.
Overall, the results show that impairing effects of benzodiazepine hypnotics on driving performance may mitigate over time following long-term use of 3 years or more, although forms of neurocognitive impairment may remain. This supports the idea of taking duration of treatment into account when evaluating the impact of hypnotics on individual drivers. The implication would be that classification systems that grade effects of drugs on driving should allow for differential classification of hypnotics in relation to treatment duration. The results do not support differential grading for benzodiazepine anxiolytics.