1. Top of page
  2. Abstract
  3. Patients and Methods
  4. Results
  5. Discussion
  6. Note Added in Proofs:
  7. References
  8. Supporting Information

Minimal hepatic encephalopathy (MHE) detection is difficult because of the unavailability of short screening tools. Therefore, MHE patients can remain undiagnosed and untreated. The aim of this study was to use a Stroop smartphone application (app) (EncephalApp_Stroop) to screen for MHE. The app and standard psychometric tests (SPTs; 2 of 4 abnormal is MHE, gold standard), psychometric hepatic encephalopathy score (PHES), and inhibitory control tests (ICTs) were administered to patients with cirrhosis (with or without previous overt hepatic encephalopathy; OHE) and age-matched controls from two centers; a subset underwent retesting. A separate validation cohort was also recruited. Stroop has an “off” state with neutral stimuli and an “on” state with incongruent stimuli. Outcomes included time to complete five correct runs as well as number of trials needed in on (Ontime) and off (Offtime) states. Stroop results were compared between controls and patients with cirrhosis with or without OHE and those with or without MHE (using SPTs, ICTs, and PHES). Receiver operating characteristic analysis was performed to diagnose MHE in patients with cirrhosis with or without previous OHE. One hundred and twenty-five patients with cirrhosis (43 previous OHE) and 134 controls were included in the original cohort. App times were correlated with Model for End-Stage Liver Disease (Offtime: r = 0.57; Ontime: r = 0.61; P < 0.0001) and were worst in previous OHE patients, compared to the rest and controls. Stroop performance was also significantly impaired in those with MHE, compared to those without MHE, according to SPTs, ICTs, and PHES (all P < 0.0001). A cutoff of >274.9 seconds (Ontime plus Offtime) had an area under the curve of 0.89 in all patients and 0.84 in patients without previous OHE for MHE diagnosis using SPT as the gold standard. The validation cohort showed 78% sensitivity and 90% specificity with the >274.9-seconds Ontime plus Offtime cutoff. 
App result patterns were similar between the centers. Test-retest reliability in controls and those without previous OHE was good; a learning effect on Ontime in patients with cirrhosis without previous OHE was noted. Conclusion: The Stroop smartphone app is a short, valid, and reliable tool for screening of MHE. (Hepatology 2013;58:1122-1132)




Abbreviations: AUC, area under the curve; BDT, block design test; DST, digit symbol test; ICT, inhibitory control test; LTT, line tracing test; MELD, Model for End-Stage Liver Disease; MHE, minimal hepatic encephalopathy; NCT-A/B, number connection test A/B; OHE, overt hepatic encephalopathy; PHES, psychometric hepatic encephalopathy score; QoL, quality of life; ROC, receiver operating characteristic; SDs, standard deviations; SDT, serial dotting test; SONIC, spectrum of neurocognitive impairment in cirrhosis; SPTs, standard psychometric tests; VAMC, VA Medical Center; VCU, Virginia Commonwealth University; WLs, weighted lures.

The spectrum of neurocognitive impairment in cirrhosis (SONIC), which ranges from unimpaired through minimal hepatic encephalopathy (MHE) to overt hepatic encephalopathy (OHE), can adversely affect the daily lives of patients and caregivers.[1, 2] MHE is associated with impaired quality of life (QoL), employment, and driving capability and with a higher risk of progression to OHE.[3] MHE treatment can improve QoL and driving capability and reduce progression to OHE.[7] However, treatment is not the standard of care, partly because routine MHE testing is often not feasible in the United States.[11] To increase testing and, subsequently, treatment rates in MHE, a high-sensitivity, easily applicable screening test that can be administered in the clinic within a few minutes is required.[12] More-definitive cognitive and electrophysiological testing could then be reserved for patients who perform poorly on this task.[12] The Stroop task is a test of psychomotor speed and cognitive flexibility that evaluates the functioning of the anterior attention system and has been found to be sensitive for the detection of cognitive impairment in MHE.[13] There is also a smartphone application (app) for the Stroop task (EncephalApp_Stroop), which is an attractive option for a point-of-care testing strategy for the diagnosis of MHE and cognitive dysfunction in cirrhosis.

We aimed to validate the use of this Stroop smartphone app for the screening of cognitive dysfunction in cirrhosis.

Patients and Methods


Healthy controls and patients with cirrhosis were recruited prospectively from two independent hepatology centers (Virginia Commonwealth University [VCU] Medical Center and McGuire VA Medical Center [VAMC], Richmond, VA) after obtaining written informed consent. Cirrhosis was diagnosed by compatible laboratory features of thrombocytopenia and aspartate/alanine aminotransferase reversal with radiological findings of cirrhosis or endoscopic evidence of varices in the setting of chronic liver disease. We excluded patients who were not able to consent, had uncontrolled neuropsychiatric diagnoses, were on psychoactive medications apart from stable antidepressants, had abused alcohol or illicit drugs within the past 3 months, had red-green color blindness, or had uncontrolled OHE (defined as Mini-Mental Status Examination score <25) at the time of the examination.

Subjects then underwent a battery of recommended cognitive tests,[1, 12, 16] including psychometric hepatic encephalopathy score (PHES), block design test (BDT; subjects are required to replicate standardized designs with given blocks in a timed manner; the score is based on the designs correctly copied), and the inhibitory control test (ICT; this is a 15-minute computerized test; subjects are instructed to respond to alternating presentations of X and Y on the screen [targets] while inhibiting response when X and Y are not alternating [lures] or responding to letters other than X or Y [random]).[17] The PHES consists of five tests: the number connection test-A/B (NCT-A/B; subjects are asked to “join the dots” between numbers or numbers and letters in a timed fashion and the number of seconds required is the outcome); the digit symbol test (DST; subjects are required to pair numbers with special symbols; an individual's score reflects the number of correct pairs achieved within a 120-second time frame); line tracing test (LTT; in LTTtime, subjects are required to trace a line between two parallel lines and the time required is noted and, in LTTerrors, the number of times the subject strays outside the lines is counted); and the serial dotting test (SDT; subjects are asked to dot the center of a group of blank circles and the time required is the outcome). The PHES is a validated battery for cognitive dysfunction in cirrhosis and tests for psychomotor speed, visuomotor coordination, attention, and set-shifting. The BDT requires an individual to use colored blocks to reproduce a two-dimensional visual design, which tests visual-motor coordination and nonverbal problem solving. The ICT is a validated computerized test of attention, psychomotor speed, response inhibition, and working memory. Weighted lures (WLs) are defined as lures divided by the square of target accuracy and serve as a composite ICT outcome in impaired patients.[18] A high score on BDT, DST, and ICT targets and a low score on the rest of the tests indicate good cognitive performance.
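The weighted-lures definition can be illustrated with a short sketch. It assumes that target accuracy is expressed as a fraction of 1 (the text reports it as a percentage and does not state the scaling used in the formula); the function name and the example subject are hypothetical.

```python
def weighted_lures(lures: int, target_accuracy_pct: float) -> float:
    """Lures divided by the square of target accuracy.

    Assumption: accuracy is converted from a percentage to a fraction
    before squaring, so perfect accuracy leaves the lure count unchanged.
    """
    if not 0 < target_accuracy_pct <= 100:
        raise ValueError("target accuracy must be in (0, 100]")
    accuracy = target_accuracy_pct / 100.0
    return lures / accuracy ** 2


# Hypothetical subject: 10 lure responses with 80% target accuracy.
example_wl = weighted_lures(10, 80.0)
```

Under this scaling, lower target accuracy inflates the score, so a subject who responds to many lures while also missing targets is penalized twice.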

Age-balanced healthy controls without chronic medical illness or alcohol and illicit drug abuse were recruited from the community through word-of-mouth referrals and advertisements. A portion of these controls had been recruited for ongoing cognitive dysfunction studies within the last 2 years and only had performance on NCT-A, NCT-B, DST, BDT, and ICT, whereas the remainder had all tests performed (PHES, ICT, and Stroop).

We used three complementary modalities to diagnose MHE in our population. As recommended by Ferenci et al., we used impaired performance, defined as 2 SDs (standard deviations) beyond healthy controls, on any two of NCT-A/B, DST, or BDT as the gold standard.[1] We also used two other modalities (the ICT and PHES) for comparison with the app. We used the healthy control performances to score the PHES from +1 to −3 SDs, and any score impaired beyond −4 SDs was considered MHE by PHES, whereas we used WLs to define poor ICT performance.[19]

Stroop (EncephalApp_Stroop) App

The application was downloaded from the Apple app store (EncephalApp Stroop) and used on Apple iPod platforms. The iPod screens were used to administer the task to all subjects. The task has two components (the “off” and “on” state), depending on the discordance or concordance of the stimuli. Both components were administered after two training runs were given for each state. In the easier off state, the subject views a neutral stimulus, pound signs (###) presented in red, green, or blue, one at a time, and has to respond as quickly as possible by touching the matching color of the stimulus among the colors displayed at the bottom of the screen. The colors at the bottom of the screen are also randomized and not fixed to their respective positions. This continues for a total of 10 presentations, which constitutes one run; the total time taken for the run, as well as the individual responses, is recorded. If the subject makes a mistake (i.e., presses a wrong color), the run stops and must be restarted. Therefore, the number of runs required to make five correct runs also indicates the number of mistakes. We continued the off state until the subject had achieved five correct runs.

The on state is more challenging, from a cognitive standpoint, in that incongruent stimuli are presented in 9 of the 10 stimuli. In this portion, the subject has to accurately touch the color of the word presented, which is actually the name of the color in discordant coloring (i.e., the word “RED” is displayed in blue color and the correct response is blue, not red). Similar to the off state, we gave two training runs and then continued the task until five correct runs were achieved.

The specific outcomes at the end of the Stroop app were as follows: (1) total time for five correct runs in the off state (Offtime); (2) number of runs needed to complete the five correct off runs; (3) total time for five correct runs in the on state (Ontime); and (4) number of runs needed to complete the five correct on runs. Cognitive processing, controlling for psychomotor speed, was assessed by subtracting the Offtime from the Ontime; this was performed for all groups. All administrations and psychometric tests were supervised by the psychologist (J.B.W.). We compared the app results between the two independent study sites (VCU and VAMC). One administrator (A.U.) started working on this trial recently and her findings were compared to the previously obtained data.
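The four app outcomes and the derived difference and sum can be sketched as follows. This is an illustrative reconstruction rather than the app's actual code: the `Run` record, the function names, and the example timings are hypothetical, and it assumes that only the time of correct runs counts toward Offtime and Ontime.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Run:
    seconds: float  # time taken for this run of 10 stimuli
    correct: bool   # False if a wrong color was pressed and the run was aborted


def summarize_state(runs: List[Run], required_correct: int = 5) -> Tuple[float, int]:
    """Return (total time of the correct runs, number of runs attempted)."""
    total_time = 0.0
    n_correct = 0
    n_attempted = 0
    for run in runs:
        n_attempted += 1
        if run.correct:
            total_time += run.seconds
            n_correct += 1
            if n_correct == required_correct:
                return total_time, n_attempted
    raise ValueError("fewer than the required number of correct runs")


def stroop_outcomes(off_runs: List[Run], on_runs: List[Run]) -> Dict[str, float]:
    offtime, off_trials = summarize_state(off_runs)
    ontime, on_trials = summarize_state(on_runs)
    return {
        "offtime": offtime,
        "off_trials": off_trials,
        "ontime": ontime,
        "on_trials": on_trials,
        "ontime_minus_offtime": ontime - offtime,  # processing cost of incongruent stimuli
        "ontime_plus_offtime": ontime + offtime,   # combined time used for screening
    }


# Hypothetical subject: one aborted off run, no aborted on runs.
off_runs = [Run(20.0, True), Run(8.0, False), Run(19.0, True),
            Run(21.0, True), Run(20.0, True), Run(20.0, True)]
on_runs = [Run(24.0, True), Run(25.0, True), Run(23.0, True),
           Run(26.0, True), Run(22.0, True)]
outcomes = stroop_outcomes(off_runs, on_runs)
```

Because an aborted run adds one attempt without adding a correct run, the trial count minus five equals the number of mistakes in that state.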

Prospective Validation Cohort

A further group of patients with cirrhosis were prospectively recruited to validate the findings on the receiver operating characteristic (ROC) generated with the first cohort.

Longitudinal Study

A subgroup of patients with cirrhosis (with or without previous OHE) and controls, whose clinical status remained unchanged, underwent repeat Stroop testing within 6 months of the initial testing.

Statistical Analysis

We compared demographics, Model for End-Stage Liver Disease (MELD) score, venous ammonia, and serum sodium across study groups. Patients with cirrhosis were studied as a whole, and the subgroup without previous OHE was also studied separately. All cognitive tests, including Stroop outcomes, were compared between controls and the groups with cirrhosis. App results were correlated with age, education status, ammonia, sodium, MELD score, and other cognitive tasks and were compared in patients with and without an alcoholic and hepatitis C etiology of cirrhosis. An ROC analysis was performed with all Stroop results, comparing them against MHE diagnosed using standard psychometric tests (SPTs) as well as by PHES and weighted lures, in all patients with cirrhosis and in patients without previous OHE. The inflection point of the variable with the highest sensitivity or specificity was chosen as the cutoff for determination of cognitive dysfunction using the Stroop app. This inflection point was then used to evaluate the results of the prospective cohort. Baseline variables, age, education, and MELD score along with app results were entered into a logistic regression for MHE diagnosis for all three modalities. Kappa statistics were used to study agreement between the three modes of MHE diagnosis and with the app results. We compared the app and cognitive results between the two study centers and used the sites in a regression analysis for prediction of MHE and cognitive dysfunction. Longitudinal results were studied using paired t tests of Offtime and Ontime between the first and second testing, separately in controls, patients with cirrhosis with previous OHE, and those without previous OHE.
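As a rough illustration of the ROC quantities used in this analysis, the sketch below computes the area under the curve as the probability that a patient with MHE has a longer app time than a patient without MHE, and evaluates sensitivity and specificity at a fixed cutoff. The function names and the timing data are hypothetical, and the sketch does not reproduce the statistical software used in the study.

```python
from typing import Sequence, Tuple


def auc(scores_mhe: Sequence[float], scores_no_mhe: Sequence[float]) -> float:
    """AUC as the fraction of (MHE, no-MHE) pairs ranked correctly; ties count half."""
    wins = 0.0
    for pos in scores_mhe:
        for neg in scores_no_mhe:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_mhe) * len(scores_no_mhe))


def sens_spec(scores_mhe: Sequence[float], scores_no_mhe: Sequence[float],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when times above the cutoff are called impaired."""
    true_pos = sum(1 for s in scores_mhe if s > cutoff)
    true_neg = sum(1 for s in scores_no_mhe if s <= cutoff)
    return true_pos / len(scores_mhe), true_neg / len(scores_no_mhe)


# Hypothetical Ontime plus Offtime values in seconds (not study data).
mhe_times = [300.0, 320.0, 280.0, 260.0]
no_mhe_times = [240.0, 250.0, 270.0, 230.0]

example_auc = auc(mhe_times, no_mhe_times)
example_sens, example_spec = sens_spec(mhe_times, no_mhe_times, cutoff=274.9)
```

Sweeping the cutoff over all observed times and recording each sensitivity and specificity pair traces out the full ROC curve.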

The institutional review boards at VCU and McGuire VAMC approved the protocols.


Results

We recruited 126 patients with cirrhosis (43 with previous OHE) and 51 age-balanced healthy controls for the initial cohort and 43 additional patients with cirrhosis (12 with previous OHE) for the prospective validation cohort. One male patient with cirrhosis without previous OHE could not perform Stroop because of color-blindness and therefore was not considered further. Of the 43 patients with previous OHE, the median time since the last episode was 2.5 months and the median number of previous OHE episodes was two. The majority (n = 29) were controlled on lactulose alone, whereas the remainder were on additional rifaximin for OHE therapy. Demographics and cognitive performance of the controls, compared to patients with cirrhosis with and without previous OHE, are shown in Table 1.

Table 1. Cross-Sectional Comparison of Healthy Controls,*† Compared to Patients With Cirrhosis

Variable | Healthy Controls (n = 51) | Cirrhosis (n = 125): Without Previous OHE (n = 82) | Cirrhosis (n = 125): With Previous OHE (n = 43)
Age, years | 55 ± 5 | 56 ± 6 | 57 ± 7
Gender, male/female | 30/21 | 51/31 | 28/15
Race, Caucasian/African American/Hispanic/other | 34/11/5/1 | 54/21/7/0 | 31/11/2/0
Education, years | 14 ± 2 (a) | 13 ± 2 | 13 ± 2
Etiology of cirrhosis, HCV/Alc/HCV+Alc/NASH/other | - | 47/8/4/16/8 | 21/3/6/6/7
Venous ammonia, mg/dL | - | 43 ± 23 | 48 ± 19
Serum sodium, mmol/L | - | 137 ± 16 | 137 ± 4
MELD score | - | 9 ± 3 | 16 ± 7 (b)
NCT-A, seconds | 27 ± 7 | 37 ± 19 | 52 ± 25 (b)
NCT-B, seconds | 69 ± 29 | 105 ± 77 | 168 ± 110 (b)
DST, raw score | 73 ± 11 | 67 ± 29 | 94 ± 40 (b)
SDT, seconds | 51 ± 12 | 57 ± 16 | 45 ± 17 (b)
LTT, seconds | 80 ± 25 | 100 ± 37 | 133 ± 71 (b)
LTTerrors, no. | 34 ± 38 | 37 ± 32 | 40 ± 33
BDT, raw score | 36 ± 13 | 31 ± 15 | 24 ± 14 (b)
ICT lures, no. | 7 ± 5 | 10 ± 8 | 14 ± 8 (b)
ICT targets, % right | 97 ± 6 | 96 ± 7 | 89 ± 18 (b)
ICT random, no. | 6 ± 2 | 8 ± 4 | 14 ± 15 (a)
WLs, no. | 9 ± 7 | 12 ± 12 | 23 ± 18 (b)
MHE using SPTs (%) | - | 24 (29) | 31 (72) (b)
MHE using WLs >22 (%) | - | 15 (18) | 18 (42) (b)
MHE using PHES >4 SDs (%) | - | 44 (54) | 34 (79) (b)
Stroop app results | | |
Total Offtime, seconds | 98 ± 13 | 121 ± 27 | 153 ± 40 (b)
Median trials for five off correct runs (range) | 5 (5-9) | 5 (5-19) | 6 (5-17) (a)
Total Ontime, seconds | 119 ± 17 | 148 ± 38 | 198 ± 63 (b)
Median trials for five on correct runs (range) | 5 (5-13) | 6 (5-16) | 6 (5-16) (a)
Total Ontime minus Offtime, seconds | 22 ± 13 | 27 ± 22 | 47 ± 37 (b)
Total Ontime plus Offtime, seconds | 217 ± 27 | 271 ± 60 | 365 ± 98 (b)

*† Healthy control results here are those who underwent the Stroop and PHES. Overall results of an additional 83 controls who underwent only the four SPTs and ICT are described in text. SPTs are the following: NCT-A/B, DST, and BDT, two of which need to be abnormal to be considered MHE.
(a) P = 0.05-0.01.
(b) P < 0.0001.
Abbreviations: Alc, alcohol; HCV, hepatitis C virus; NASH, nonalcoholic steatohepatitis.

Control Group

Although 51 age-matched controls were recruited specifically for this study, the gold-standard definition of MHE using NCT-A/B, DST, and BDT was obtained by pooling these 51 controls with 83 additional controls who had previously been tested with these four tests and the ICT, but had not received the remaining PHES tests or Stroop. Mean age of the pooled 134 controls was 52 ± 4 years, and education was 13 ± 5 years (74 high school, 15 some college education, and 45 college degree or higher). The mean ± SD for the 134 controls was NCT-A 25 ± 5, NCT-B 90 ± 15, DST 83 ± 16, BDT 44 ± 17, and WLs 8 ± 7. Therefore, the cutoff for standard MHE diagnosis was abnormal performance on two of the following: NCT-A >35 seconds, NCT-B >120 seconds, DST <51, or BDT <10, whereas that using WLs was 22 or higher.
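The cutoffs above follow from the pooled control means and SDs, taking 2 SDs in the direction of worse performance. The sketch below reproduces that arithmetic and applies the two-of-four rule; the dictionary layout, function names, and the example patient are hypothetical.

```python
from typing import Dict

# Pooled control mean and SD from the text; the flag marks tests where a
# higher value means worse performance (timed tests) versus lower (raw scores).
CONTROL_STATS = {
    "NCT-A": (25, 5, True),
    "NCT-B": (90, 15, True),
    "DST": (83, 16, False),
    "BDT": (44, 17, False),
}


def cutoff(mean: float, sd: float, higher_is_worse: bool, n_sd: float = 2) -> float:
    """Threshold lying n_sd standard deviations on the impaired side of the mean."""
    return mean + n_sd * sd if higher_is_worse else mean - n_sd * sd


CUTOFFS = {test: cutoff(m, sd, worse) for test, (m, sd, worse) in CONTROL_STATS.items()}
WL_CUTOFF = cutoff(8, 7, True)  # weighted lures: 8 + 2 * 7


def mhe_by_spts(scores: Dict[str, float]) -> bool:
    """MHE if at least two of the four tests fall beyond their cutoff."""
    abnormal = 0
    for test, (_, _, higher_is_worse) in CONTROL_STATS.items():
        value = scores[test]
        if higher_is_worse and value > CUTOFFS[test]:
            abnormal += 1
        elif not higher_is_worse and value < CUTOFFS[test]:
            abnormal += 1
    return abnormal >= 2


# Hypothetical patient: slow on NCT-A and low on DST, normal otherwise.
patient = {"NCT-A": 40, "NCT-B": 100, "DST": 45, "BDT": 30}
patient_has_mhe = mhe_by_spts(patient)
```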

Based on these cutoffs, we found that in the whole cohort (n = 125), prevalence of cognitive dysfunction by all three modalities was significantly higher in those with previous OHE (Table 1). Prevalence of MHE in patients with cirrhosis without previous OHE was 33% with SPTs, 22% using WLs, and 52% using PHES (Tables 2, 3, and 4). The kappa of MHE diagnosis between the three modalities was highest between SPTs and PHES (0.63), then between SPTs and ICT (0.41), and lowest between PHES and ICT (0.2). Despite these different prevalence rates and the lack of agreement across the methods, app results were consistently worse in those with poor performance on any of the three modalities.
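For readers unfamiliar with the kappa statistic, the following sketch shows how agreement between two binary MHE classifications can be computed with Cohen's kappa. It assumes the unweighted two-rater form (the text does not specify the variant used), and the classification vectors are hypothetical.

```python
from typing import Sequence


def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two binary (0/1) classifications."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    p_a = sum(rater_a) / n  # proportion called MHE by the first method
    p_b = sum(rater_b) / n  # proportion called MHE by the second method
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of eight patients (1 = MHE, 0 = no MHE).
by_spts = [1, 1, 0, 0, 1, 0, 1, 0]
by_phes = [1, 1, 0, 0, 0, 0, 1, 1]
example_kappa = cohens_kappa(by_spts, by_phes)
```

Kappa corrects raw agreement for chance, which matters here because the three modalities classify very different proportions of patients as having MHE.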

Table 2. Cognitive Test Performance Between Healthy Controls and Subgroups of Patients With Cirrhosis Without Previous OHE Based on SPTs

MHE defined as two of the four of NCT-A, NCT-B, DST, or BDT abnormal; cirrhosis without previous OHE (n = 82).

Variable | No MHE (n = 55) | MHE (n = 27)
NCT-A, seconds | 29 ± 8 | 57 ± 23 (a)
NCT-B, seconds | 73 ± 20 | 182 ± 105 (a)
DST, raw score | 64 ± 13 | 40 ± 10 (a)
BDT, raw score | 37 ± 15 | 17 ± 8 (a)
SDT, seconds | 51 ± 12 | 81 ± 29 (a)
LTT, seconds | 91 ± 31 | 120 ± 42 (a)
LTTerrors, no. | 35 ± 32 | 41 ± 33
ICT lures, no. | 8 ± 7 | 16 ± 10 (a)
ICT targets, % right | 98 ± 3 | 89 ± 11 (a)
ICT random | 7 ± 3 | 10 ± 6 (a)
WLs, no. | 9 ± 7 | 22 ± 17 (a)
Stroop app results | |
Total Offtime, seconds | 112 ± 18 | 145 ± 28 (a)
Median trials for five correct off runs (range) | 5 (5-12) | 6 (5-19)
Total Ontime, seconds | 138 ± 28 | 174 ± 43 (a)
Median trials for five correct on runs (range) | 6 (5-13) | 7 (5-16) (b)
Total Ontime minus Offtime, seconds | 22 ± 15 | 32 ± 22 (a)
Total Ontime plus Offtime, seconds | 249 ± 43 | 319 ± 68 (a)

MHE diagnosed on the basis of 134 controls' performance.
(a) P < 0.0001.
(b) P = 0.05-0.01.

Table 3. Cognitive Test Performance Between Healthy Controls and Subgroups of Patients With Cirrhosis Without Previous OHE Based on PHES

MHE based on PHES impaired >4 SDs; cirrhosis without previous OHE (n = 82).

Variable | No MHE (n = 39) | MHE (n = 43)
NCT-A, seconds | 26 ± 7 | 47 ± 20 (a)
NCT-B, seconds | 65 ± 16 | 138 ± 91 (a)
DST, raw score | 68 ± 14 | 49 ± 13 (a)
SDT, seconds | 51 ± 12 | 81 ± 29 (a)
LTT, seconds | 81 ± 23 | 117 ± 39 (a)
BDT, raw score | 40 ± 15 | 25 ± 12 (a)
ICT lures, no. | 7 ± 6 | 13 ± 9 (a)
ICT targets, % right | 97 ± 5 | 93 ± 9 (a)
ICT random, no. | 7 ± 3 | 8 ± 4
WLs, no. | 8 ± 7 | 16 ± 14 (a)
Stroop app results | |
Total Offtime, seconds | 108 ± 18 | 133 ± 27 (a)
Median trials for five correct off runs (range) | 5 (5-8) | 6 (5-19) (b)
Total Ontime, seconds | 130 ± 22 | 164 ± 40 (a)
Median trials for five correct on runs (range) | 6 (5-12) | 6 (5-16)
Total Ontime minus Offtime, seconds | 22 ± 13 | 32 ± 25 (b)
Total Ontime plus Offtime, seconds | 237 ± 37 | 297 ± 61 (a)

PHES abnormalities are based on data of 51 controls.
(a) P < 0.0001.
(b) P = 0.05-0.01.

Table 4. Cognitive Test Performance Between Healthy Controls and Subgroups of Patients With Cirrhosis Without Previous OHE Based on WLs

MHE based on WL >22; cirrhosis without previous OHE (n = 82).

Variable | No MHE (n = 64) | MHE (n = 18)
NCT-A, seconds | 33 ± 12 | 57 ± 31 (a)
NCT-B, seconds | 85 ± 34 | 196 ± 137 (a)
DST, raw score | 61 ± 15 | 41 ± 15 (a)
SDT, seconds | 61 ± 24 | 95 ± 29 (a)
LTT, seconds | 93 ± 34 | 129 ± 38 (a)
BDT, raw score | 34 ± 15 | 19 ± 13 (a)
ICT lures, no. | 7 ± 5 | 25 ± 5 (a)
ICT targets, % right | 97 ± 5 | 89 ± 11 (b)
ICT random, no. | 7 ± 3 | 12 ± 5 (a)
WLs, no. | 7 ± 6 | 34 ± 12 (a)
Stroop app results | |
Total Offtime, seconds | 117 ± 22 | 142 ± 36 (b)
Median trials for five correct off runs (range) | 5 (5-12) | 5.5 (5-19)
Total Ontime, seconds | 140 ± 27 | 187 ± 52 (a)
Median trials for five correct on runs (range) | 6 (5-14) | 7 (5-16)
Total Ontime minus Offtime, seconds | 22 ± 15 | 44 ± 23 (b)
Total Ontime plus Offtime, seconds | 257 ± 45 | 329 ± 86 (a)

MHE diagnosed on the basis of 134 controls' performance.
(a) P < 0.0001.
(b) P = 0.05-0.01.

Of the subjects enrolled, 24% of the patients and 27% of the controls had operated, and were familiar with, an iPod or smartphone. Stroop Ontime was significantly higher than Offtime in both healthy controls (72 ± 10 versus 59 ± 8 seconds; P < 0.0001) and patients with cirrhosis (100 ± 32 versus 80 ± 21 seconds; P < 0.0001), whereas the number of runs needed to complete five correct runs was similar between off and on states for controls and patients with cirrhosis. Patients with previous OHE had a significantly higher Stroop Ontime and Offtime and number of trials needed to complete five correct runs in the off and on state (Table 1; Fig. 1A), and similar findings were noted in the MHE group, compared to the no-MHE and control groups (Table 2). In patients with alcoholic etiology of cirrhosis (including those with concomitant hepatitis C), there was a significantly higher Offtime (159 ± 43 versus 128 ± 32 seconds; P = 0.004), Ontime (196 ± 65 versus 161 ± 50 seconds; P = 0.028), and number of trials for five off runs (median, 8 versus 5; P = 0.002), but a similar number of trials for five on runs and Ontime minus Offtime (37 ± 38 versus 33 ± 27 seconds; P = 0.66), compared with the remainder of patients with cirrhosis. Patients with alcoholic etiology had a similar MELD score (13 ± 6 versus 12 ± 6; P = 0.33) to nonalcoholic patients. When patients with only hepatitis C (not with concomitant alcohol) were compared with the rest of the patients with cirrhosis, no significant difference in Offtime (133 ± 33 versus 133 ± 37 seconds; P = 0.99), Ontime (166 ± 55 versus 163 ± 52 seconds; P = 0.74), Ontime minus Offtime (35 ± 30 versus 32 ± 27 seconds; P = 0.49), or number of trials to achieve five off (median, 6 versus 5; P = 0.65) and five on runs (median, 6 versus 6; P = 0.85) was observed; their MELD scores were similar (12 ± 6 versus 12 ± 5; P = 0.67).


Figure 1. Median and interquartile range of time required to complete the app in the off and on states. (A) Distribution of values in the entire group in which control values are compared to patients with cirrhosis without previous OHE and those with previous OHE. (B) Comparison between controls and patients with cirrhosis with MHE and no MHE based on SPTs. No_OHE, no previous OHE; OHE, previous OHE; No_MHE, no MHE and MHE, minimal hepatic encephalopathy. All comparisons were highly statistically significant.



Figure 2. ROC curve of Ontime plus Offtime for all patients with cirrhosis (A) and in those without previous OHE (B) showing a high AUC for diagnosis when SPTs are used as the gold standard. The cutoff was 274.9 seconds for both populations.


There was no significant difference between those who had previous exposure to smartphones and the rest in Offtime (141 ± 37 versus 131 ± 60 seconds; P = 0.75), Ontime (173 ± 52 versus 161 ± 61 seconds; P = 0.67), Ontime minus Offtime (33 ± 31 versus 33 ± 28 seconds; P = 0.54), or number of trials to achieve five off (median, 5 versus 5; P = 0.78) and five on runs (median, 6 versus 6; P = 0.76).

In the group of patients with cirrhosis as a whole, a significant correlation was noted between MELD score and Offtime (0.57; P < 0.001) and Ontime (0.61; P < 0.0001). There was a modest correlation between age and Offtime (0.38; P < 0.001) and Ontime (0.31; P = 0.001), but not between these variables and education, ammonia, and serum sodium. No consistent correlations were found between the number of on/off trials and the above variables. In patients, Stroop Offtime was significantly correlated with Ontime (0.87; P < 0.0001), NCT-A (0.62; P < 0.0001), NCT-B (0.65; P < 0.0001), DST (−0.79; P < 0.0001), BDT (−0.54; P < 0.0001), SDT (0.57; P < 0.0001), LTT (0.48; P < 0.0001), ICT lures (0.4; P < 0.0001), and targets (−0.62; P < 0.0001). Similarly, Ontime was significantly correlated with NCT-A (0.57; P < 0.0001), NCT-B (0.67; P < 0.0001), DST (−0.71; P < 0.0001), BDT (−0.51; P < 0.0001), SDT (0.52; P < 0.0001), LTT (0.4; P < 0.0001), ICT lures (0.4; P < 0.0001), and targets (−0.65; P < 0.0001). An overall weaker correlation between Ontime minus Offtime and cognitive tests was noted: NCT-A (0.30; P = 0.001), NCT-B (0.43; P < 0.0001), DST (−0.36; P < 0.0001), BDT (−0.29; P = 0.001), SDT (0.26; P = 0.004), LTT (0.07; P = 0.43), ICT lures (0.21; P = 0.02), and targets (−0.50; P < 0.0001). When kappa was calculated between Stroop app Ontime plus Offtime and the three modalities, the agreement between the SPTs and Stroop (0.7) and between PHES and Stroop (0.7) was similar, whereas it was comparatively lower for ICT WLs (0.53).

Comparison Between Centers

The two independent centers (VCU and VAMC) were compared with respect to their results using a combination of the original and validation cohort (n = 168; 62 VCU and 106 VAMC). We found that the spread of cognitive and Stroop app results was aligned with the total findings; that is, the worse scores were at the VAMC because the patients there were more advanced in their liver disease and were older, compared to the VCU patients (Supporting Table 1). Stroop results were significantly correlated with MELD and age in both centers independently. MELD score was positively correlated with Offtime (VCU: r = 0.4, P = 0.005; VAMC: r = 0.5, P < 0.0001) and Ontime (VCU: r = 0.5, P < 0.0001; VAMC: r = 0.5, P < 0.0001). Age had similar correlations between centers with Offtime (VCU: r = 0.4, P = 0.006; VAMC: r = 0.3, P = 0.02) and Ontime (VCU: r = 0.3, P = 0.03; VAMC: r = 0.25, P = 0.04). On regression analysis, site was not an independent predictor when added to age and MELD score for Ontime, Offtime, and Ontime plus Offtime.

ROC Analysis

Using the four standard tests as the reference, the ROC results for the app variables (Offtime, Ontime, number of trials for five on runs, number of trials for five off runs, product of Offtime and number of runs, product of Ontime and number of runs, and sum of Offtime and Ontime) are shown in Tables 5 and 6. The highest area under the curve (AUC) and sensitivity were equivalent between Offtime and the sum of Offtime and Ontime in the total cirrhosis group and in patients with cirrhosis without previous OHE (Fig. 2). The cutoff for Stroop Ontime plus Offtime was 274.9 seconds. A majority (56%; n = 70) of the 125 patients with cirrhosis took >274.9 seconds to complete the app, whereas when only those without previous OHE were considered, it was 43% (n = 35). The ROC for Stroop to diagnose MHE in patients with and without previous OHE was similarly highest with Ontime plus Offtime, with AUC of 0.77 and 0.79, respectively, using WLs as the gold standard and 0.87 and 0.82, respectively, for PHES (Supporting Tables 2 and 3). We used logistic regression to assess the effect of age, MELD, and Ontime plus Offtime (education, ammonia, and sodium were not significant on univariate analysis and were not considered further) on the division of the group into MHE or not using the three modalities. Using SPTs, age and MELD were significant predictors (P = 0.032 and P = 0.0007, respectively) on their own, but when Ontime plus Offtime was added, this prediction became nonsignificant (P = 0.89 and P = 0.32), whereas Ontime plus Offtime had significance of P < 0.0001. This pattern was repeated when PHES (age P = 0.02 and MELD P = 0.01 alone, which became P = 0.94 and P = 0.36 after Ontime plus Offtime was introduced, which had P < 0.0001) or WL (age P = 0.62 and MELD P = 0.003 before, which became P = 0.41 and P = 0.69 after Ontime plus Offtime was introduced, which had P < 0.0001) was used to diagnose MHE.

Table 5. ROC Results for Stroop for Diagnosis of Cognitive Dysfunction in All Patients With Cirrhosis Using SPTs as the Gold Standard

Variable | AUC | Cut Point | Sensitivity | Specificity
Total time off (Offtime) | 0.91 | 125.84 | 0.94 | 0.79
Total time on (Ontime) | 0.87 | 148.7 | 0.88 | 0.74
Offtime × no. of runs off | 0.85 | 644.7 | 0.94 | 0.66
Ontime × no. of runs on | 0.79 | 982.8 | 0.71 | 0.76
Ontime minus Offtime | 0.64 | 41.8 | 0.44 | 0.86
Sum of Offtime and Ontime | 0.89 | 274.9 | 0.92 | 0.75

Table 6. ROC Results for Stroop for Diagnosis of Cognitive Dysfunction in Patients With Cirrhosis Without Previous OHE Using SPTs as the Gold Standard

Variable | AUC | Cut Point | Sensitivity | Specificity
Total time off (Offtime) | 0.87 | 125.84 | 0.92 | 0.79
Total time on (Ontime) | 0.80 | 148.7 | 0.79 | 0.76
Offtime × no. of runs off | 0.80 | 631.5 | 0.96 | 0.63
Ontime × no. of runs on | 0.72 | 1070.3 | 0.54 | 0.83
Ontime minus Offtime | 0.58 | 29.8 | 0.54 | 0.71
Sum of Offtime and Ontime | 0.84 | 274.9 | 0.88 | 0.78

Prospective Validation Cohort

We recruited 43 additional patients with cirrhosis (age, 56 ± 7 years; education, 12 ± 2 years; MELD score, 11 ± 5) for validation, of whom 12 had controlled OHE. The mean venous ammonia was 42 ± 13 mg/dL, and sodium was 137 ± 4.6 meq/L. Using the cutoff of two abnormal tests, 22 patients (50%) had MHE diagnosed by standard criteria, whereas using the ROC value of Ontime plus Offtime >274.9 seconds, 19 (44%) were impaired. Sensitivity for this ROC value was 78% (17 of the 22 impaired by SPTs were impaired on the app), whereas specificity was 90% (19 of the 21 unimpaired on SPTs were unimpaired on the app; Figs. 2 and 3C; Table 7).
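The validation figures can be recomputed from the counts given in the text. The sketch below builds the 2 × 2 table from those counts; the variable and function names are hypothetical. Note that 17 of 22 evaluates to roughly 0.77, slightly below the 78% reported, which may reflect rounding in the source.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of SPT-impaired patients who were also impaired on the app."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of SPT-unimpaired patients who were also unimpaired on the app."""
    return true_neg / (true_neg + false_pos)


# Counts taken from the text: 22 impaired by SPTs, of whom 17 exceeded the
# 274.9-second cutoff; 21 unimpaired by SPTs, of whom 19 stayed below it.
true_pos, false_neg = 17, 22 - 17
true_neg, false_pos = 19, 21 - 19

validation_sensitivity = sensitivity(true_pos, false_neg)
validation_specificity = specificity(true_neg, false_pos)
impaired_on_app = true_pos + false_pos  # 19 patients, matching the text
```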


Figure 3. Individual plots of Ontime plus Offtime in the original and prospective validation cohort. Open circles are individual patients, whereas bars represent 95% confidence interval for the mean. (A) Cutoff generated by ROC analysis dividing the original 125 patients with cirrhosis. (B) Similar values in the 82 patients with cirrhosis of the original cohort without previous OHE. (C) Performance of this cutoff in prospective patients with cirrhosis validation cohort.


Table 7. Validation Cohort

Validation Cohort (n = 43) | Offtime + Ontime <274.9 seconds | Offtime + Ontime >274.9 seconds
Age (years) | 54.8 ± 6.7 | 57.4 ± 6.8
Education (years) | 12.9 ± 1.7 | 11.8 ± 2.9
MELD score | 10.6 ± 3.9 | 11.9 ± 5.9
OHE (%) | 14% | 50% (a)
Number connection-A (sec) | 35.7 ± 9.8 | 58.5 ± 31.0 (b)
Number connection-B (sec) | 97.5 ± 67.1 | 190.0 ± 112.0 (b)
Digit symbol (raw score) | 59.1 ± 13.7 | 32.7 ± 13.6 (b)
Serial dotting (sec) | 59.3 ± 17.1 | 110.5 ± 69.1 (b)
Line tracing (seconds) | 107.1 ± 27.7 | 119.0 ± 52.1
Line tracing errors (number) | 23.6 ± 29.0 | 73.2 ± 45.4 (a)
Block design (raw score) | 34.0 ± 13.7 | 32.7 ± 13.6 (b)
ICT lures (number) | 8.6 ± 7.4 | 18.2 ± 10.2 (b)
ICT targets (% right) | 96.4 ± 5.5 | 84.5 ± 15.4 (b)
ICT random (number) | 7.2 ± 4.2 | 16.7 ± 10.6 (b)
Weighted lures (number) | 10.1 ± 10.1 | 28.6 ± 18.6 (a)

Using the cutoff developed by the original cohort, the validation cohort was significantly impaired on the cognitive tests.
(a) P < 0.0001.
(b) P = 0.05-0.01.

Comparison of the New Test Administrator to the Remainder

Of the 43 patients in the validation cohort, 25 were administered the app by the new investigator. When these 25 patients were compared with the remaining patients with cirrhosis in the study (n = 143), there were no significant differences in age (57 ± 7 versus 59 ± 5 years; P = 0.1), MELD score (12 ± 6 versus 12 ± 7; P = 0.9), or in the app results for Offtime (133 ± 35 versus 127 ± 34 seconds; P = 0.6), Ontime (166 ± 53 versus 154 ± 55 seconds; P = 0.5), and trials in the off (median, 6 versus 5.5; P = 0.1) and on state (median, 6 versus 5.5; P = 0.5).

Longitudinal Analysis

The Stroop tests were readministered 40 ± 14 days apart. The results of the Stroop Offtime and Ontime remained statistically similar between the two testing periods for controls and patients with cirrhosis with previous OHE. Although the Offtime remained similar, there was a significant reduction in Ontime in patients with cirrhosis without OHE after retesting (Table 8).

Table 8. Retesting of Stroop Between Baseline and Second Testing

Group and Measure | Baseline (Seconds) | Second Test (Seconds)
Controls (n = 10) | |
Total Offtime | 59 ± 7 | 59 ± 7
Total Ontime | 73 ± 11 | 68 ± 12
Patients With Cirrhosis Without OHE (n = 21) | |
Total Offtime | 70 ± 12 | 70 ± 9
Total Ontime | 89 ± 13 | 84 ± 12 (a)
Patients With Cirrhosis With Previous OHE (n = 9) | |
Total Offtime | 84 ± 20 | 81 ± 20
Total Ontime | 108 ± 46 | 91 ± 29

(a) P < 0.05 on paired t test.
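The baseline-versus-retest comparison in Table 8 relies on a paired t test. The sketch below shows how the paired t statistic could be computed from per-subject times. The function name `paired_t_statistic` and the example times are hypothetical and are not data from this study, which reports only group means and standard deviations.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(baseline, retest):
    """Return (t, degrees_of_freedom) for paired measurements.

    Each position in `baseline` and `retest` must refer to the same subject.
    """
    if len(baseline) != len(retest):
        raise ValueError("baseline and retest must have the same length")
    if len(baseline) < 2:
        raise ValueError("at least two paired observations are required")

    # Within-subject differences: positive values mean a faster second test.
    diffs = [b - r for b, r in zip(baseline, retest)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")

    standard_error = sd / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Hypothetical Ontime values (seconds) for four subjects, for illustration only.
baseline_ontime = [90, 85, 95, 88]
retest_ontime = [84, 82, 90, 86]
t_value, dof = paired_t_statistic(baseline_ontime, retest_ontime)
print(f"t = {t_value:.2f}, df = {dof}")
```

The P value would then be read from a t distribution with the returned degrees of freedom. A statistics package would normally be used for that step.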


Discussion

We found that the Stroop smartphone app is able to detect cognitive dysfunction and has good discriminative validity and test-retest reliability in cirrhosis. The EncephalApp_Stroop (available on iTunes for free download) was easy to administer, quick to teach to subjects, and simple to score and interpret. Use of this convenient app may improve the screening process and subsequent treatment rates in potential patients with MHE, especially in the United States where the testing is not routinely performed.

Abnormalities in attention and psychomotor speed are the hallmark of cognitive impairment in MHE and negatively affect patient QoL.[2] The anterior attention system is hypothesized to modulate response inhibition, behavioral selection, and executive control, which are cognitive skills necessary to perform the Stroop task, as well as other tests in our cognitive battery (e.g., the ICT and DST).[20] Studies have shown that the Stroop paradigm is sensitive to cognitive change in MHE. Specifically, MHE patients are much more likely to show impairment on tasks assessing the integrity of the anterior, compared to the posterior, attention system.[13, 14] We indeed found that the Stroop task's psychomotor speed component (Offtime and Ontime) was highly correlated with a wide range of other cognitive domains, specifically response inhibition (ICT lures), visuomotor coordination (BDT), and set-shifting (NCT-B), and not just with tests of psychomotor speed and accuracy (serial dotting, line drawing, and digit symbol).

The gold standard for MHE/cognitive dysfunction diagnosis varies across populations; therefore, we used three methods (SPTs, ICT, and PHES) to test the ability of the app to differentiate between groups.[21] We found that the app performance was worse in impaired patients with or without OHE defined by any of the three techniques. This was further confirmed by applying the app cutoff in a prospective validation cohort with similar discriminative capability. This could potentially increase the applicability of this app, regardless of the standard used for MHE/cognitive dysfunction.
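As a minimal sketch of how the reported cutoff could be applied and checked against a reference standard, the following code flags a subject as screen-positive when Ontime plus Offtime exceeds 274.9 seconds and then tallies sensitivity and specificity. The class `StroopResult`, its field `mhe_by_reference`, and the function names are assumptions made for illustration and are not part of the app or its export format.

```python
from dataclasses import dataclass

CUTOFF_SECONDS = 274.9  # Ontime plus Offtime cutoff reported in this study


@dataclass
class StroopResult:
    offtime: float           # seconds to complete the off-state runs
    ontime: float            # seconds to complete the on-state runs
    mhe_by_reference: bool   # hypothetical field: MHE per the chosen standard


def screens_positive(result, cutoff=CUTOFF_SECONDS):
    """True when Ontime plus Offtime is above the cutoff."""
    return result.offtime + result.ontime > cutoff


def sensitivity_specificity(results, cutoff=CUTOFF_SECONDS):
    """Return (sensitivity, specificity) of the cutoff against the reference."""
    tp = fp = tn = fn = 0
    for r in results:
        positive = screens_positive(r, cutoff)
        if r.mhe_by_reference and positive:
            tp += 1
        elif r.mhe_by_reference:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    # Guard against groups with no members to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

The same function can be rerun with the reference label taken from SPTs, ICT, or PHES in turn, which mirrors the three-standard comparison described above.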

The Stroop task, regardless of whether it is the on or off condition, from a reaction time perspective, requires an individual to allocate attentional resources and focus on the visually presented stimulus, determine its color, and provide a motor response by pressing the appropriate choice on the screen.[13] In our study, the slowness noted in cirrhosis may be the result of problems in cognitive processing and/or problems in motor speed.[22] The cognitive processing demand is higher during the On state, and correspondingly, we found that Ontimes were always greater than Offtimes in all groups. Although the accuracy of responses was lower and Ontime minus Offtime was greater in more-affected patients, psychomotor speed variables (Offtime and Ontime themselves) were the best differentiators. This could be the result of the design of the task in which the app run stops as soon as a subject makes a mistake, providing fewer opportunities to get accuracy data. These mistakes would demonstrate to the subject their error and could potentially reinforce the test rules, preventing them from making future errors. Overall, both Ontime and Offtime correlated significantly with all cognitive tests, which span several domains.

The common denominator linking achievement in all the cognitive tests with Stroop Ontime and Offtime is psychomotor speed.[22] Psychomotor speed in cirrhosis can be affected by central and peripheral causes, including hepatic encephalopathy with the accompanying cognitive deficits, neuromuscular weakness, and incoordination.[5] Although not specifically studied here, Stroop abnormalities have been associated with poor connectivity between anterior cingulate cortex, dorsolateral prefrontal cortex, and posterior parietal lobes in MHE and previous OHE.[23] However, it is possible that worsening psychomotor speed in patients with advanced cirrhosis could have a peripheral, neuromuscular component, as is also reflected in the poor cognitive performance in tests other than Stroop. Therefore, the Ontime minus Offtime variable was created to control for psychomotor speed and provide a measure of cognitive processing, and this was also significantly worse in affected patients. However, this measure was specific (AUC, 0.71-0.86), but not sensitive (AUC, 0.44-0.54), in differentiating groups on ROC analysis. This could indicate that the psychomotor impairment, rather than the “Stroop” component, of the app may be responsible for its discriminating ability. However, regardless of the specific underlying cognitive deficit, the app was able to replicate the end result of the other tests in our population.
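The two derived measures discussed here can be written out explicitly. In this sketch, the function name `stroop_composites` and the dictionary keys are hypothetical. The example inputs are the group mean Offtime and Ontime reported for the patients tested by the new administrator, used purely as illustrative numbers and not as an individual subject's result.

```python
def stroop_composites(offtime, ontime):
    """Return the two composite measures derived from the app times (seconds).

    on_plus_off:  overall speed measure, the quantity compared with the cutoff.
    on_minus_off: extra time in the incongruent (on) state, intended to
                  control for psychomotor speed.
    """
    return {
        "on_plus_off": ontime + offtime,
        "on_minus_off": ontime - offtime,
    }


# Illustrative values only: group means of 133 s (Offtime) and 166 s (Ontime).
composites = stroop_composites(offtime=133.0, ontime=166.0)
print(composites)  # {'on_plus_off': 299.0, 'on_minus_off': 33.0}
```

Because subtraction removes the time shared by both states, the difference isolates the added cost of the incongruent stimuli, whereas the sum retains the overall psychomotor slowing.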

As expected, we found that advanced cirrhosis and previous OHE were associated with worse cognitive and app performance in both speed and accuracy.[6] This replicates earlier studies and shows that this app has discriminative ability for patients from the early to advanced stage of cirrhosis, provided they can understand the task.[6, 12, 24, 25] It is also interesting that patients with previous OHE remained cognitively challenged and did not improve on repeated testing.[25] This was in contrast to the practice-effect learning demonstrated in patients without OHE in the Ontime condition (which is a measure of response inhibition and motor speed), but not in the Offtime condition (which primarily assesses psychomotor ability). This phenomenon has been noted in other studies of patients with cirrhosis without previous OHE.[24, 25] Therefore, with this app, like other tests for MHE, the learning effect has to be considered for therapeutic trials.[7, 22, 26]

We also found that patients with alcoholic liver disease, despite similar MELD score, were more likely to perform worse on the app. The effect of alcohol on cognitive performance, even after cirrhosis has set in, has been described before and is additional evidence of the discriminative validity of the app; interestingly, this difference was confined to psychomotor speed because cognitive processing (Ontime minus Offtime) was similar between groups.[27] We did not find any differences in app performance in patients with hepatitis C, compared to others. This could be because of the predominant precirrhotic effect of hepatitis C on cognition, which is overwhelmed by the impairment as a result of cirrhosis itself.[28]

We found that all subjects, despite previous unfamiliarity with smartphones in the majority, were able to adapt to the iPod screen and follow research staff instructions to complete the app. The administering staff was also able to explain the task and export the data easily after completion. We also found that the app results achieved by a new administrator of the app were in line with earlier overall results, and the app results between centers were individually and similarly related to age and cirrhosis severity and independent on regression, which could increase its generalizability. The app is limited in that it requires a compatible device to operate; however, considering the high financial burden of untreated MHE, it could be a worthwhile upfront cost.[29, 30] Our study is limited by the relatively high educational status of patients with cirrhosis, and these findings may change when applied to persons of lower educational status. In addition, the demonstration of cognitive impairment with any of the tests, including the app, should be interpreted in light of the subjects' clinical status because all these modalities are sensitive, but not specific, for MHE.[21]

We conclude that the Stroop app (EncephalApp_Stroop) is a valid, reliable method for screening for MHE. Further studies evaluating its validity in other populations are needed, especially since cutoffs may change with the new EncephalApp_Stroop app.

Note Added in Proofs:


On the basis of the concept of the old app tested in the above study, a newer, user-friendly app using the same order and presentations was created specifically for MHE diagnosis and is the one available for download in iTunes as “EncephalApp_Stroop” with detailed administration and interpretation instructions on



Supporting Information


Additional Supporting Information may be found in the online version of this article.

hep26309-sup-0001-suppinfo.doc (132K): Supporting Information

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.