Evaluation of a new method of assessing depth of sedation using two-choice visual reaction time testing on a mobile phone*


* This study was presented at the 2005 Annual Scientific Meeting of the UK Society for Intravenous Anaesthesia in Edinburgh.

Dr A. J. Thomson
E-mail: a.j.thomson@ed.ac.uk


The utility of two-choice visual reaction time testing using a specially programmed mobile telephone as a measure of sedation level was investigated in 20 healthy patients sedated with target controlled infusions of propofol. At gradually increasing target concentrations, visual reaction time was compared with patient-assessed visual analogue scale sedation scores and an observer-rated scale. Propofol sedation caused dose-dependent increases in visual reaction time and visual analogue scale scores that were statistically significant when the calculated effect-site concentration reached 0.9 μg.ml−1 (p < 0.05) and 0.5 μg.ml−1 (p < 0.01), respectively. While visual analogue scale scores were more sensitive at lower levels of sedation than visual reaction time, the latter showed a marked increase at higher levels of sedation. Visual reaction time may be useful for identifying impending over-sedation.

Objective assessment of the effects of sedative drugs is important both in clinical practice, to reduce the likelihood of over-sedation, and in research studies. Unfortunately, there is no ideal monitor or measure of degree of sedation. Different sedation scales [1–3] are all subjective, relying on assessments made by either patient or observer. A variety of objective electrophysiological monitors, including bispectral index [4], spectral entropy [5], Narcotrend [6] and auditory evoked potentials [7, 8], are of some value but all share the disadvantage of not assessing patient responsiveness. Recently, both auditory and visual reaction times have been correlated with the level of sedation produced by propofol infusion [9]. These two measures are attractive as they are objective and require the patient to respond to a sensory stimulus. Unfortunately, the equipment used involved subjects wearing goggles and headphones connected to a computer, potentially hampering communication and impairing bedside access [9]. It is now possible to perform similar tests of visual reaction time using handheld computers [10], or specially programmed mobile telephones [11].

The aim of this study, therefore, was to investigate whether visual reaction times, determined using a specially programmed mobile telephone, were correlated with the level of sedation produced by infusion of propofol. We aimed to compare visual reaction time measurements at different levels of propofol sedation with a patient-assessed method (visual analogue scales) and a validated observer-assessed scale (Observer’s Assessment of Alertness/Sedation (OAA/S)) [1].


This study was approved by the local research ethics committee. We recruited 20 patients (American Society of Anesthesiologists grade I or II) who were about to undergo elective surgery. Written informed consent was obtained from all patients. We excluded any patients with significant visual impairment that was not corrected by spectacles; with a history of central nervous system disorder; and any patients who were taking benzodiazepines or any other sedative drugs. No sedative premedication was given before the study.

Visual reaction time (VRT) was assessed using a Java-enabled mobile telephone (Nokia 6610, Keilalahdentie, Finland) with arrowrt software (http://www.penscreen.com) that had been developed by one of the authors (BT). This employs a two-choice technique. The patient was asked to hold the mobile telephone in both hands and observe the screen. A series of large arrows, pointing either left or right, then flashed up, one at a time, at random intervals in a random order. The patient was asked to press an appropriate key on either the left or right side of the key-pad, corresponding to the direction of the arrow, as quickly as possible after each arrow appeared. At the end of each 1-min test period, the mean visual reaction time for correct responses was displayed on the screen. In addition, the number of responses made during the test period was displayed along with the proportion of incorrect responses. A reaction time that was adjusted to penalise incorrect responses was also calculated and displayed [12] (Appendix).

Patients were also asked to assess their own level of sedation during the study using visual analogue scales (VAS). We used four different 100-mm scales with the descriptive anchor terms Alert/Drowsy, Clear-headed/Muzzy, Energetic/Lethargic, and Sober/Drunk [13]. The results obtained using each individual scale were subsequently used to calculate a mean VAS score for each time point. In addition, sedation was assessed by the anaesthetist conducting the study using an adapted version of the OAA/S scale [1, 14] (Table 1). Both the VAS and the VRT were demonstrated to each patient before consent was obtained. All patients then carried out three 1-min reaction time tests to allow familiarisation with the technique.

Table 1.   Adapted Observer’s Assessment of Alertness/Sedation Scale (OAA/S). Responsiveness and speech (by asking the patient to repeat a standard sentence) were assessed independently and scored 1–5. The lowest score from either category was used as the composite score.
Assessment category: responsiveness | Assessment category: speech | Composite score level
Responds readily to name spoken in normal tone | Normal | 5 (Alert)
Lethargic response to name spoken in normal tone | Mild slowing or thickening | 4
Responds only after name is called loudly and/or repeatedly | Slurring or prominent slowing | 3
Responds only after mild prodding or shaking | Few recognisable words | 2
Does not respond to mild prodding or shaking |  | 1 (Deep Sleep)
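
The composite scoring rule in the caption of Table 1 (take the lower of the two category scores) can be illustrated with a minimal sketch. The function name and the range check are illustrative assumptions, not part of the study's materials.

```python
def composite_oaas(responsiveness: int, speech: int) -> int:
    """Composite adapted OAA/S score: the lower of the two category scores.

    Both categories are scored 1-5 as stated in the Table 1 caption.
    The function name and validation are hypothetical, for illustration only.
    """
    for name, score in (("responsiveness", responsiveness), ("speech", speech)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be between 1 and 5, got {score}")
    return min(responsiveness, speech)


# A patient who responds readily (5) but has slurred speech (3) scores 3.
print(composite_oaas(5, 3))
```

Taking the minimum means that the more impaired category always determines the composite level.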

All studies were carried out immediately before planned anaesthesia and surgery. ECG, blood pressure and pulse oximeter saturation were monitored (Datex S/5, GE Healthcare, Helsinki, Finland) and supplemental oxygen (4 l.min−1) was administered via nasal cannulae throughout each study.

Before the start of sedation, two baseline tests of reaction time were performed. Propofol 1% was then delivered using a customised device commissioned by the hospital specifically for this research project. A Graseby 3500 infusion pump with Diprifusor® Target Controlled Infusion (TCI) software was modified to allow effect-site TCI. This system uses the Marsh pharmacokinetic model [15] and allows input of a value for ke0 (blood–effect-site equilibration rate constant). Laboratory studies confirmed that propofol delivery with this system with ke0 values ranging from 0.2 to 1.2 min−1, and with effect-site target concentrations (CeT) up to 3.0 μg.ml−1, was in close agreement with that predicted by an independent pharmacokinetic simulation program (PK-SIM®, Specialized Data Systems, Jenkintown, PA, USA). The delivery of propofol at the target settings envisaged for the study also fell within the range of infusion rates advocated in Diprivan® (propofol) prescribing information. A ke0 of 0.8 min−1 was chosen for the study. This has previously been suggested for use with the Marsh pharmacokinetic model [16] and lies within the range of published values (0.2–1.21) [17, 18]. Initially, a CeT of 0.3 μg.ml−1 was set. Once the calculated effect-site concentration (CeCALC) had reached the CeT, this was increased in 0.2 μg.ml−1 increments until the patient was unable to carry out the visual reaction time test successfully (defined as failure to respond to visual stimuli within 10 s). At each level, once the CeCALC had reached the CeT, the patient’s visual reaction time was measured twice using the 1-min test on the mobile telephone. The average of these two measures was used for subsequent analysis. In addition, the patient was asked to rate their level of sedation using the visual analogue scales and the OAA/S score was documented.
Once the patient had become too drowsy to carry out the reaction time test, the study was complete and induction of anaesthesia and surgery proceeded.
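
To make the titration scheme and the role of ke0 concrete, the following is a minimal sketch only. It lists the stepwise targets described above (0.3 μg.ml−1, then 0.2 μg.ml−1 increments) and uses the standard first-order effect-compartment relation, under the simplifying assumption that the plasma concentration is held constant. That assumption does not describe how an effect-site TCI pump actually behaves, and none of the function names come from the Diprifusor software.

```python
import math

KE0_PER_MIN = 0.8  # ke0 value chosen for the study (min^-1)


def target_steps(n_levels: int, start: float = 0.3, step: float = 0.2) -> list[float]:
    """First n_levels effect-site targets (ug/ml) in the stepwise titration."""
    return [round(start + i * step, 1) for i in range(n_levels)]


def equilibration_half_time(ke0: float = KE0_PER_MIN) -> float:
    """Half-time (min) of blood/effect-site equilibration: ln(2) / ke0."""
    return math.log(2) / ke0


def effect_site_fraction(minutes: float, ke0: float = KE0_PER_MIN) -> float:
    """Fraction of a CONSTANT plasma concentration reached at the effect site.

    Simplifying assumption for illustration: plasma concentration is fixed,
    so Ce(t) / Cp = 1 - exp(-ke0 * t).
    """
    return 1.0 - math.exp(-ke0 * minutes)


print(target_steps(5))
print(round(equilibration_half_time(), 2))
```

With ke0 = 0.8 min−1 the equilibration half-time is a little under one minute, which gives a sense of how quickly each new target could be approached.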

Data were examined using Excel 2002 (Microsoft®). StatsDirect software (StatsDirect Ltd., Altrincham, UK) was then used to calculate Spearman’s rank correlation coefficients. Comparisons between different treatment conditions were made using one-way analysis of variance for repeated measures and the Mann–Whitney U-test as appropriate. Repeatability of visual reaction time measurements was assessed using the method described by Bland and Altman [19]. All results are presented as mean (SD) unless otherwise stated.


Table 2 shows the subject characteristics. All completed the study successfully, although one study was stopped early, because of time constraints in the operating theatre, before the patient became too drowsy to complete the reaction time test.

Table 2.   Patient characteristics. Values are mean (SD).
Sex; M:F | 13 : 7
Age; years | 46 (10); range 28–64
Weight; kg | 87 (20)
BMI; kg.m−2 | 28.2 (4.6)

Visual reaction time (prior to adjustment for any incorrect responses) increased with rising propofol effect-site concentration in all patients (Figs 1 and 2). There appeared to be no marked changes in visual reaction time at calculated effect-site concentrations up to 0.7 μg.ml−1, but significant increases at and above 0.9 μg.ml−1 (Figs 1a, 2a and 2b).

Figure 1.

 (a) The effect of increasing propofol CeCALC on mean unadjusted visual reaction time (VRT) in individual patients; (b) the effect of increasing propofol metadose on mean unadjusted visual reaction time (this plot is scaled so that a metadose of 1.0 represents the propofol concentration required to produce a 50% increase in VRT relative to baseline for each patient and is designed to account for inter-patient differences in sensitivity to propofol); (c) the effect of increasing propofol CeCALC on mean visual analogue scale scores (VAS) in individual patients.

Figure 2.

 The effects of increasing propofol CeCALC on (a) mean unadjusted visual reaction time (VRT). Significant increases above baseline VRT were produced with propofol CeCALC of 0.9 μg.ml−1 and above. Single factor ANOVA; *p < 0.05; †p < 0.01; ‡p < 0.001. (b) Percentage change in mean unadjusted VRT from baseline; (c) mean visual analogue scale (VAS) score. Significant increases in VAS score were noted with propofol CeCALC of 0.5 μg.ml−1 and above. Single factor ANOVA; †p < 0.01; ‡p < 0.001 (comparisons made between scores at each level and initial recordings at CeCALC 0.3 μg.ml−1). (d) Total number of patients in study at each propofol CeCALC. Data are mean (SD).

Marked inter-individual variation in sensitivity to the effects of propofol was observed. One patient demonstrated a considerable increase in VRT at a CeCALC of 0.7 μg.ml−1 whilst for others little change in VRT occurred below a CeCALC of 1.9 μg.ml−1 (Fig. 1a). To aid the comparison of this dose-response relationship between patients, a metadose plot was constructed [20] (Fig. 1b). This was scaled so that a propofol metadose of 1.0 represents the propofol concentration required to produce a 50% increase in VRT relative to baseline for each patient.
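
The metadose scaling can be sketched as follows. The exact procedure follows reference [20] and is not given in the text, so this is an illustration under stated assumptions: the concentration producing a 50% increase in VRT is estimated here by linear interpolation between adjacent measurements, and all function names and example values are hypothetical.

```python
def concentration_for_increase(concs, vrts, baseline, fraction=0.5):
    """Estimate the concentration at which VRT first reaches baseline * (1 + fraction).

    Assumption: linear interpolation between adjacent measurements.
    Returns None if the threshold is never reached.
    """
    threshold = baseline * (1.0 + fraction)
    points = list(zip(concs, vrts))
    if points and points[0][1] >= threshold:
        return points[0][0]
    for (c0, v0), (c1, v1) in zip(points, points[1:]):
        if v0 < threshold <= v1:
            return c0 + (threshold - v0) * (c1 - c0) / (v1 - v0)
    return None


def to_metadose(concs, reference):
    """Rescale concentrations so that the reference concentration maps to 1.0."""
    return [c / reference for c in concs]


# Made-up example values for one hypothetical patient.
concs = [0.3, 0.5, 0.7, 0.9]
vrts = [400, 420, 500, 700]
reference = concentration_for_increase(concs, vrts, baseline=400)
print(reference, to_metadose(concs, reference))
```

Rescaling each patient's concentrations by their own reference value places patients with different sensitivities on a common axis.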

Patient-rated visual analogue scores (from the combination of four scales) rose with increasing levels of propofol (Figs 1c and 2c). However, we found that there was marked inter-subject variation in VAS scores at each propofol concentration with a mean coefficient of variation of 56 (17)%. This was considerably greater than the variability encountered with VRT [mean coefficient of variation of 29 (15)%].
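
The coefficient of variation quoted above is the standard deviation expressed as a percentage of the mean. A minimal sketch, assuming the sample standard deviation was used (the text does not specify) and with hypothetical function names:

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Coefficient of variation (%): 100 * SD / mean.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return 100.0 * stdev(values) / mean(values)


def mean_cv_across_levels(values_by_level):
    """Average the per-concentration CVs, given one list of patient values per level."""
    return mean(coefficient_of_variation(v) for v in values_by_level)


# Made-up scores for three hypothetical patients at one concentration.
print(coefficient_of_variation([10, 20, 30]))
```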

The number of responses made during the 1-min reaction time test fell at higher levels of sedation (data not shown). In addition, the proportion of incorrect responses increased with rising propofol concentrations (data not shown). When a penalty factor was included in the calculation to take account of these errors, the ‘adjusted’ reaction time also increased with rising levels of propofol sedation. Significant correlations were noted between propofol CeCALC and visual reaction time (r = 0.78), ‘adjusted’ visual reaction time (r = 0.80), and the number of responses made during the test (r = −0.76).
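
For readers unfamiliar with Spearman’s rank correlation, the following is a minimal pure-Python sketch: rank both variables (averaging tied ranks), then take the Pearson correlation of the ranks. The study used StatsDirect for this calculation; the function names and example data here are illustrative only.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up concentrations and reaction times, not study data.
print(spearman([0.3, 0.5, 0.7, 0.9], [400, 410, 405, 600]))
```

Because it works on ranks, the coefficient captures any monotonic relationship and is not distorted by the steep rise in VRT at higher concentrations.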

Observer-rated sedation (OAA/S) scores fell with increasing levels of propofol (Fig. 3). In all 19 patients who completed the study to the point where they were unable to carry out the visual reaction time test, the OAA/S score had fallen to 3. This corresponded to the patient responding only when their name was called out loudly, and/or having speech that had become slurred or markedly slowed. VRT appeared to discriminate somewhat better between OAA/S values of 3 and 4 than did VAS (Figs 3b and c).

Figure 3.

 (a) Relationship between OAA/S score and propofol CeCALC. Box denotes inter-quartile range and median, whiskers denote range. Mann–Whitney U-test; *p < 0.001, comparison with propofol CeCALC at OAA/S 5; ‡p < 0.001, comparison with propofol CeCALC at OAA/S 4. (b) Relationship between OAA/S score and mean unadjusted visual reaction time (VRT). Box denotes inter-quartile range and median, whiskers denote range. Mann–Whitney U-test; *p < 0.001, comparison with VRT at OAA/S 5; ‡p < 0.001, comparison with VRT at OAA/S 4. (c) Relationship between OAA/S score and mean visual analogue scale (VAS) scores. Box denotes inter-quartile range and median, whiskers denote range. Mann–Whitney U-test; *p < 0.001, comparison with VAS at OAA/S 5; ‡p < 0.01, comparison with VAS at OAA/S 4.

The 20 paired assessments of visual reaction time that were made before the start of sedation were used to evaluate the repeatability of the technique. The difference between measurements was plotted against the mean value for each pair [19] (Fig. 4). 95% of the differences between paired measurements were within two standard deviations of their mean.

Figure 4.

 Repeatability of unadjusted visual reaction time (VRT). Bland-Altman plot of difference between the two measurements of VRT plotted against the mean of the measurements for each pair of recordings made before the start of sedation: 95% of the differences were within two standard deviations of their mean value.
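
The repeatability analysis can be sketched as below, assuming limits of two standard deviations either side of the mean difference, as described in the text. The function name, the returned keys and the example values are illustrative assumptions.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Bland-Altman summary for paired repeat measurements.

    Returns the per-pair means and differences, the mean difference (bias),
    limits at bias +/- 2 SD, and the fraction of differences inside the limits.
    """
    diffs = [a - b for a, b in zip(first, second)]
    pair_means = [(a + b) / 2 for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)
    lower, upper = bias - 2 * sd, bias + 2 * sd
    within = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return {
        "pair_means": pair_means,
        "differences": diffs,
        "bias": bias,
        "limits": (lower, upper),
        "fraction_within": within,
    }


# Made-up baseline pairs for four hypothetical patients.
result = bland_altman([400, 420, 380, 410], [390, 430, 385, 400])
print(result["bias"], result["limits"], result["fraction_within"])
```

Plotting `differences` against `pair_means` gives the kind of plot shown in Fig. 4.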


In this study, we have demonstrated that two-choice visual reaction time measurement using an appropriately programmed mobile telephone is a simple, repeatable method for objectively monitoring the effects of propofol sedation. Visual reaction times are considerably less variable than patient-assessed sedation scores and may be used to identify impending over-sedation.

A quantitative measure of degree of sedation produced by propofol is desirable not only in clinical practice (to help avoid risks of excessive sedation), but also in research studies where objectivity is required. This measure should ideally be valid and simple to use, give easily interpretable results, have well-defined, discrete categories for levels of sedation, and have good inter-rater reliability [21]. The current availability of so many different monitors and sedation scales indicates that none is ideal.

Sedation scales were first developed more than 30 years ago for use in intensive care units (ICU) [2], but these may not necessarily be applicable to intra-operative sedation. Because the OAA/S scale [1] has been validated for propofol sedation in the clinical setting [5, 9, 14], we chose it as our benchmark in this study. However, like all observer-administered sedation scales, it is subjective. In contrast, VAS scales (used widely in psychological medicine) were used to provide a patient-assessed measure of sedation [22]. Simple and quick to use, they are limited at deeper levels of sedation when the patient can no longer complete the scale. Although bispectral index (BIS) [4] and spectral entropy [5] are useful in predicting the likelihood of continued response to verbal command during propofol infusion, they are less reliable for monitoring the effects of lighter levels of sedation [23, 24].

The arrowrt mobile telephone two-choice test of VRT was simple to administer and patients performed the test successfully. The numerical results obtained were not subject to observer bias. Once VRT had begun to increase significantly from baseline, only small additional increases in the effect-site concentration of propofol produced further large increases in VRT until the patient could no longer perform the VRT test because of excessive drowsiness. It was interesting that, even at this point, the OAA/S score had not fallen below 3 in 19 patients, suggesting VRT is a more sensitive measure of sedation. This finding is in agreement with a study by Doufas et al. [25] who showed similar results with an automated responsiveness test (ART).

In an attempt to define a VRT ‘threshold’ above which over-sedation is likely to occur, we estimated that the mean increase in VRT once an OAA/S score of 3 had been reached was 128 (90)% above baseline VRT. However, the range of increase in reaction time at this point was large, the smallest value being 27.5%. It may therefore be reasonable to use an increase of 28% above baseline VRT as a very cautious early warning of impending over-sedation. In our study, this increase in VRT was associated with an OAA/S score of 4 in 17 patients and 3 in the remaining three patients. Alternatively, it may be possible to utilise a combination of the OAA/S scale and VRT to assess small changes in moderately deep levels of sedation: the former used to help identify large changes in depth of sedation, the latter used, once moderately deep sedation is achieved, to make assessments of further small changes in sedation level.
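
The proposed early-warning rule amounts to comparing the current VRT with the patient's own baseline. A minimal sketch, assuming reaction times are recorded in milliseconds (the text does not state the unit) and using hypothetical function names:

```python
WARNING_INCREASE_PERCENT = 28.0  # cautious threshold suggested in the text


def percent_increase(baseline_ms: float, current_ms: float) -> float:
    """Percentage change in VRT relative to the patient's own baseline."""
    return 100.0 * (current_ms - baseline_ms) / baseline_ms


def exceeds_warning(baseline_ms: float, current_ms: float,
                    threshold: float = WARNING_INCREASE_PERCENT) -> bool:
    """True when VRT has risen by at least the threshold percentage above baseline."""
    return percent_increase(baseline_ms, current_ms) >= threshold


# Made-up values: a baseline of 400 rising to 512 is a 28% increase.
print(percent_increase(400, 512), exceeds_warning(400, 512))
```

Because the comparison is relative to each patient's baseline, the rule does not depend on absolute reaction times, which differ between individuals.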

As previously demonstrated with BIS and spectral entropy [5, 23], we encountered considerable overlap in the range of VRT at different levels of sedation (OAA/S categories 5, 4, and 3). However, unlike with BIS and entropy, there was minimal overlap in VRT at OAA/S category 5 (minimal sedation) vs 3 (moderately deep sedation) (Fig. 3b). VRT measurement may therefore be a better discriminator of these two levels of sedation than are BIS or spectral entropy. Unfortunately, like the processed EEG monitors, VRT is little changed by the lightest levels of sedation. Interestingly, we found that patient-rated VAS was a more sensitive measure of these lighter levels of sedation (Figs 1c and 3c).

While we observed that the number of incorrect VRT responses increased with depth of sedation and that the adjusted VRT (made to penalise such incorrect responses) also increased, we did not find an advantage in using either of these measures over the unadjusted VRT measurement. By performing two consecutive measures of VRT, we confirmed the repeatability of the technique [19] (Fig. 4).

Marked inter-subject variability in the response to propofol was demonstrated in the range of OAA/S, VRT and VAS scores for the same propofol effect-site concentrations. This pharmacodynamic variability is a well recognised feature of both propofol and other sedative agents [4]. This implies that calculated effect-site concentrations cannot be used in isolation to gauge the level of sedation produced by propofol. It also complicates the comparison of different measures of sedation as any new measure cannot simply be related to predicted brain or blood concentrations, and inevitably must be compared with an existing tool. It is possible that alternative measures or scales may be assessing different aspects of sedation, which in turn may have differing sensitivities to any given sedative drug. This is implicit in the OAA/S scale [1] where different scores can be applied to each of the assessment categories at any particular level of sedation. Equally, it may be the case that VRT and VAS are assessing different aspects of sedation and have differing sensitivity to the effects of propofol sedation. The combination of these two factors means that any direct comparison of alternative methods of assessing sedation is challenging. However, in demonstrating that there is a good correlation between VRT and OAA/S, we have shown that the former has some validity as a measure of depth of sedation.


The customised Diprifusor® TCI system used in the study was provided by AstraZeneca. The software modification to incorporate effect-site TCI with a range of ke0 values was created by Martyn Gray of Anaesthesia Technology Ltd.


Visual reaction times were calculated by the arrowrt programme using the following formulae:


Incorrect responses were accounted for in the adjusted reaction time formula by doubling the sum of all incorrect reaction times. This was designed to penalise guessing.
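
The description above can be illustrated with a short sketch. The text states only that the mean reaction time was taken over correct responses and that the sum of incorrect reaction times was doubled in the adjusted value. The denominator of the adjusted formula (total number of responses) is an assumption made here for illustration, and the function and key names are hypothetical rather than taken from the arrowrt programme.

```python
def summarise_trials(trials):
    """Summarise one test period.

    trials: list of (reaction_time, is_correct) tuples.
    ASSUMPTION: adjusted RT = (sum of correct RTs + 2 * sum of incorrect RTs)
    divided by the total number of responses. Only the doubling of incorrect
    reaction times is stated in the source.
    """
    correct = [rt for rt, ok in trials if ok]
    incorrect = [rt for rt, ok in trials if not ok]
    n = len(trials)
    return {
        "n_responses": n,
        "proportion_incorrect": len(incorrect) / n if n else None,
        "mean_correct_rt": sum(correct) / len(correct) if correct else None,
        "adjusted_rt": (sum(correct) + 2 * sum(incorrect)) / n if n else None,
    }


# Made-up trial data: two correct responses and one incorrect response.
print(summarise_trials([(400, True), (500, True), (600, False)]))
```

Under this assumed form, a fast but wrong response raises the adjusted value, so guessing quickly is not rewarded.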