Neuropsychological assessment of hepatic encephalopathy: ISHEN practice guidelines
Dr Christopher Randolph, 1 East Erie, Suite 355, Chicago, IL 60611, USA
Tel: +1 708 216 3539
Fax: +1 708 216 4629
Low-grade or minimal hepatic encephalopathy (MHE) is characterised by relatively mild neurocognitive impairments, and occurs in a substantial percentage of patients with liver disease. The presence of MHE is associated with a significant compromise of quality of life, is predictive of the onset of overt hepatic encephalopathy and is associated with a poorer prognosis for outcome. Early identification and treatment of MHE can improve quality of life and may prevent the onset of overt encephalopathy, but to date, there has been little agreement regarding the optimum method for detecting MHE. The International Society on Hepatic Encephalopathy and Nitrogen Metabolism convened a group of experts for the purpose of reviewing available data and making recommendations for a standardised approach for neuropsychological assessment of patients with liver disease who are at risk of MHE. Specific recommendations are presented, along with a proposed methodology for further refining these assessment procedures through prospective research.
Hepatic encephalopathy (HE) is a condition that is relatively common in patients with liver disease (1, 2), results in significant compromise of quality of life (3, 4), requires a high burden of care (2), and is associated with poor prognostic outcomes, including an elevated risk of death (5). Overt HE involves clinically obvious compromise of consciousness/arousal, behavior, and motor functions. This is typically classified along a gradient of severity ranging from mild confusion to coma (6, 7).
Low-grade or minimal hepatic encephalopathy (MHE) is characterised predominantly by a subtle impairment of neurocognitive status, and is not readily detectable via standard mental status testing or neurological examination. This has also been termed ‘subclinical’ HE in the past, because of the lack of overt clinical symptomatology, but MHE is now the preferred nosology (8). There is currently a lack of consensus regarding how best to detect MHE. Early detection of MHE is perceived to be clinically important, however, as MHE can impair quality of life (3, 4, 9–11), is predictive of the onset of overt HE (12–15), and may have some prognostic value in the outcome for patients with end-stage liver disease (16). Treatment of MHE can improve quality of life (17), and may theoretically avert more severe HE.
A variety of approaches have been used in efforts to detect MHE, including neuropsychological testing, EEG/evoked potentials (18–22) and critical flicker frequency (23–25), which is a psychophysiological measure. It is not yet clear which approach is superior in terms of sensitivity and/or clinical validity. Neuropsychological tests have more face validity in this context, as they measure cognitive functions (e.g. memory, attention and visuospatial skills) that are directly relevant to activities of daily living. Biological or psychophysiological markers, on the other hand, may be less affected by variables such as age, education and language, which are known to impact performance on neuropsychological tests, and may therefore complicate interpretation/classification.
Although the neuropathophysiology of MHE is not definitively established, elevated levels of ammonia have been implicated, and a variety of structural and functional imaging studies have suggested that the primary manifestations of MHE may be mediated by subcortical systems, including the basal ganglia (26–31). This hypothesis is consistent with neuropsychological investigations that have attempted to ‘profile’ the patterns of impairment in MHE. These typically report a pattern of impairment characterised by prominent deficits in the domains of attention, visuospatial abilities and fine motor skills (32, 33).
Although these domains (attention, visuospatial abilities and fine motor skill) are most commonly implicated in MHE, impairments of memory have also been reported (34–38). These appear to be primarily characterised by diminished immediate memory performance as a consequence of slowed or inefficient cognitive processing (33, 39, 40). This is distinct from a primary impairment of anterograde memory produced by damage to limbic memory systems, as in Alzheimer's disease. Language is typically reported to be intact, with the exception of slowed verbal fluency. The consistent finding of impaired verbal fluency performance is also likely secondary to overall slowing of cognitive processing speed, rather than to an intrinsic disruption of language.
The attentional impairments in MHE are observed on a variety of measures. These include measures of cognitive processing speed involving psychomotor responding, such as the Number Connection Test or Trail Making Test and the Digit Symbol subtest from the Wechsler Intelligence Scales or the Symbol Digit Modalities Test (1, 17, 37, 41–46). Impairments on measures of cognitive processing speed and response inhibition that do not require a motor response have also been reported (e.g. with verbal fluency tasks and measures such as the Stroop test) (33, 47–49). Impairments have also been reported on various measures of attention span/working memory, in both verbal and visual domains (33, 49, 50). In addition, continuous response-type measures of sustained attention and freedom from distractibility (e.g. the inhibitory control test) have been reported to be sensitive to impairments in patients with liver disease (51).
Visuospatial impairments have been primarily reported on block design tasks (17, 37, 42, 43, 52) (which also include a motor/practic component), but also on more pure measures of visuospatial perception, such as line orientation or the Hooper test (48, 53). Fine motor skill impairments have been noted on measures such as the grooved pegboard task (33, 47, 54), and on line tracing tasks (55, 56) (the latter also involve visuospatial abilities).
The International Society for Hepatic Encephalopathy and Nitrogen Metabolism (ISHEN) formed a commission to review the available data on the role of neuropsychological testing in this context, and to make recommendations regarding the routine neuropsychological assessment of patients with liver disease. The commission chairs (C. R. and K. W.) recruited a panel of experts in this field, and drafted a survey designed to reach some consensus on the nature of a suitable battery for this purpose. The goal of this process was to identify the characteristics of a ‘gold standard’ battery via which researchers could pool results, compare findings across studies, make clinical decisions regarding their own patients and ultimately compare with other methodologies for diagnosis and treatment planning in MHE. This process is similar to that used to develop consensus batteries for evaluating patients with other diseases where neurocognitive outcome has emerged as a potential target of treatment, such as neuropsychiatric systemic lupus erythematosus (57) and schizophrenia (58).
Commission members were informed of the overall purpose of the survey, and each commission member then independently responded to a series of questions about the features of a putative ‘gold standard’ battery for the assessment of MHE. Responses were recorded on a seven-point Likert-type scale, with a score range from 1 reflecting ‘not important’ to 7 reflecting ‘very important’. The final question asked was whether or not each member would recommend an existing battery for this purpose.
There were several questions regarding the general nature of a ‘gold standard’ battery. Four of these resulted in near-universal agreement, with the modal response for each being a 7, and the mean response also being 7 (means were rounded to whole numbers). The general characteristics of such a battery that were seen as strongly desirable were as follows:
- A specific battery should be identified for this purpose, and it should serve as a benchmark against which to compare newer or experimental approaches.
- The battery should measure multiple cognitive domains.
- The battery should be easily translatable and applicable cross-culturally.
- The battery should have age-based norms.
General features of the battery that received moderately strong support included applicability for patients who are illiterate (modal response 7, mean=6), and that it was not important that the battery be computerised (modal response=1, mean=3). As far as the length of the battery was concerned, the modal response was that it needed to be <60 min. The mean suggested that the maximum time for completion was 40 min. Generally, commission members felt that the shorter the battery, the better, but they also recognised that obtaining a reliable measurement of neurocognitive status was likely to require a minimum of 20–40 min of testing. Most felt that a computerised battery would be more cumbersome (i.e. less portable), require greater expense and might not be as useful as pencil-and-paper testing in this context. Several members spontaneously pointed out the need for alternate forms of the battery, to eliminate or reduce practice effects. The need for appropriate training in order to correctly administer and score the battery was also pointed out, as was the desirability of a global score, to improve reliability, increase power and ease interpretation.
Commission members were also queried regarding the desirability of specific test paradigms as components of a gold standard battery. The paradigms are listed in Table 1, in the order of perceived desirability.
Table 1. ISHEN committee ratings
|Verbal memory (anterograde)||7||5|
|Visual memory (anterograde)||6||5|
In their comments, several members pointed out that language per se was felt to be unaffected, but that verbal fluency measures were useful as a processing speed or ‘executive’ component. Most did not feel that measures of reaction time were feasible without the use of a computer, which was discouraged. It was also noted that, while MHE has not been reported to produce a true impairment of anterograde memory (i.e. rapid forgetting), slowed processing impacts upon memory performance and it was felt that this was a clinically useful measure that might have ecological significance (i.e. in terms of affecting daily functioning).
There was some discussion regarding the inclusion of executive or self-regulatory measures, but it was also noted that these are typically not amenable to the creation of equivalent multiple forms, and that there is limited agreement on what types of executive tests might be useful in this context (apart from measures of verbal fluency). The use of motor measures, despite the demonstrated sensitivity of some of these to MHE, was discouraged by several members who felt that performance on these measures could be confounded by other variables not directly related to MHE. It was noted that motoric dysfunction could potentially impact upon any neuropsychological measure that involved a motor response (e.g. drawing, coding and block design), but that this was to some extent an unavoidable confound.
Existing candidate batteries
Commission members were also asked to nominate any existing batteries that could potentially serve as a gold standard for assessment of MHE. Most of the members did not choose a specific battery for this purpose. The only two specific batteries that were recommended were the PSE-Syndrom-Test (59) and the Repeatable Battery for the Assessment of Neuropsychological Status™ (RBANS) (60).
The PSE-Syndrom-Test is a battery consisting of five paper-and-pencil tasks, including Number Connection Tests A and B, a coding test (Digit Symbol Test) similar to the Digit Symbol subtest of the Wechsler scales, the Serial Dotting Test and the Line Drawing Test. The Serial Dotting Test consists of 10 rows of 10 circles, and the subject is timed on how quickly he or she can place a dot in the center of each circle. The Line Drawing Test requires the subject to draw a continuous line between two parallel (winding) lines, and scores include completion time and errors. There are four alternate forms of the PSE-Syndrom-Test (only the Serial Dotting Test is unchanged across forms), and the battery requires 15–20 min to complete.
Normative data were initially collected in Germany (32, 61). The analysis of the single test results showed that they were normally distributed only after logarithmic transformation. After such a transformation, all data showed a linear dependence on age with normally distributed residuals of homogeneous variance as determined by linear regression analysis and Kolmogorov–Smirnov test. The effects of education and occupation were negligible compared with the age effect. Thus, the regression lines together with parallel lines of ±1, ±2 and ±3 standard deviations were calculated, yielding the known normal quantiles, including the 95% range around the midpoint for each single test. The regression lines and the standard deviation lines were finally transformed into the original scales (32). For the purpose of scaling an individual's test performance, scores on each subtest are assigned a value ranging from +1 to −3, based on age-related norms (+1 for scores better than 1 SD above the normal mean to −3 for scores more than 3 SDs below the normal mean). Because the Line Drawing Test generates two scores, there are a total of six measures that contribute to the total score, which can therefore range from +6 to −18.
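The scaling procedure described above can be illustrated with a minimal sketch. It assumes a log-linear age regression of the kind described for the German norms. The regression coefficients, the function name `subtest_score` and the example values are hypothetical placeholders rather than published norms, and the intermediate bands (0, −1 and −2) are an assumption, since the text states only the two end points of the scale.

```python
import math

def subtest_score(raw, age, intercept, slope, residual_sd, higher_is_better=False):
    """Map one raw subtest result onto the +1 to -3 scale.

    The raw value is log-transformed, compared with the age-expected
    log value from a linear regression, and expressed in residual SD
    units. All coefficients passed in are hypothetical placeholders.
    """
    expected_log = intercept + slope * age
    z = (math.log(raw) - expected_log) / residual_sd
    if not higher_is_better:
        z = -z  # times and error counts: lower raw values are better
    # From here on, positive z always means better-than-expected performance.
    if z > 1:
        return 1    # better than 1 SD above the normal mean
    if z >= -1:
        return 0    # within 1 SD of the normal mean (assumed band)
    if z >= -2:
        return -1   # between 1 and 2 SDs below (assumed band)
    if z >= -3:
        return -2   # between 2 and 3 SDs below (assumed band)
    return -3       # more than 3 SDs below the normal mean

# Hypothetical norms for a timed test: expected log(seconds) = 3.0 + 0.01 * age
example_score = subtest_score(raw=math.exp(4.375), age=50,
                              intercept=3.0, slope=0.01, residual_sd=0.25)

# The total score is the sum over the six contributing measures.
example_total = sum([0, -1, example_score, 0, -2, 1])
```

The `higher_is_better` flag is included because a coding task is scored by items completed, whereas the timed tasks and error counts improve as the raw value falls.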
The PSE-Syndrom-Test was specifically developed to measure the effects of MHE, and has been shown to be sensitive to impairment in patients with cirrhosis (24, 31, 32, 40, 62). It has also been shown to correlate with functional neuroimaging results in these patients (63–65). A consensus paper by the World Congress of Gastroenterology in 1998 recommended the PSE-Syndrom-Test for the evaluation of patients at risk of MHE (7). Additional normative data from Italy have been published (66), and normative data have also been collected in Spain and Great Britain. While the British data are not yet available, the Spanish data can be accessed by the interested clinician via the internet (http://www.redeh.org). Unfortunately, the calculation of the normative data has been performed differently in different countries: the Italian and the Spanish groups did not include line-tracing errors in their scoring (so that the total score ranges from +5 to −15 rather than from +6 to −18). There would appear to be a need to perform a direct comparison of the raw data from the four European countries, standardise the scoring and determine the extent to which local norms are required for these individual countries. In principle, the battery is presumably relatively culture-free, given the components, and the instructions are easily translatable. At present, the battery is commercially available only in German.
Repeatable Battery for the Assessment of Neuropsychological Status™
The RBANS was designed with two basic goals: to serve as a ‘core’ battery for the assessment of dementia, and to efficiently screen/track neurocognitive impairment in other disorders. It contains measures of verbal and visual anterograde memory, working memory, cognitive processing speed, language (including semantic fluency) and visuospatial function (line orientation and figure copy). There are four alternate forms (A, B, C and D), and there are no practice effects with repeated testing using alternate forms. The RBANS underwent a US population-based standardisation for ages 20–89, and generates age-scaled index scores with a normal mean of 100 and SD=15 for five domains (immediate memory, visuospatial/constructional, language, attention and delayed memory), as well as a total scale score. It is a portable pencil-and-paper test that requires a folding stimulus booklet and paper record form to administer. Administration time is approximately 20–25 min.
The RBANS has undergone extensive clinical and psychometric validation for a variety of disorders, including various forms of dementia, traumatic brain injury, schizophrenia, stroke, multiple sclerosis and bipolar disorder, including studies completed in North America, Europe and Asia. It is currently being used in multiple clinical trials for Alzheimer's disease and schizophrenia, and as of August 2008, there were approximately 20 different official translations available for clinical and research purposes.
The use of the RBANS in evaluating patients with liver disease has been largely restricted to the USA to date. In a sample of 300 consecutive outpatients presenting for liver transplantation, RBANS scores were strongly correlated with liver disease as measured by the model for end-stage liver disease staging (38). Scores on the RBANS also predicted disability independently of liver disease severity in this study. A similar finding was reported in a separate study of 148 liver transplant candidates (36), and in a recent report on 66 patients with end-stage liver disease (67).
Candidate battery characteristics compared with commission recommendations
Table 2 lists the parameters of the ideal ‘gold standard’ battery for the detection of MHE as recommended by the commission, together with an indication of the degree to which the PSE-Syndrom-Test and the RBANS meet these requirements.
Table 2. Existing tests compared with commission recommendations
|Parameter||PSE-Syndrom-Test||RBANS|
|Measurement of multiple cognitive domains||Cognitive processing speed (psychomotor) and visuospatial demands (psychomotor)||Verbal memory, visual memory, working memory, visuospatial perception, visuospatial construction, language (including fluency), and cognitive processing speed (psychomotor).|
|Ease of translation/cross-cultural application||Demonstrated||Demonstrated|
|Use with illiterate patients||Yes||Yes|
|Availability of age-based norms||German; Italian and British (yet unpublished)||US population-based|
|Alternate forms||Four forms||Four forms|
|Global score generated||Yes – sum of six categorical scores based on normal SDs – range +6 to −18||Yes, index score (mean of 100, SD=15) – normally distributed|
|Time for administration (min)||15–20||25|
|Retest reliability, minimal practice effects||Retest reliability for total score ∼0.81 in normals, no practice effects||Retest reliability for total score ∼0.86 in normals, no practice effects|
|Number of cognitive domains measured with a modal ranking of 5 or higher in importance by the commission||2/7||6/7|
Discussion and recommendations
The commission members were remarkably consistent in their recommendations for a gold standard neuropsychological battery for the detection of MHE. A clear need was seen for a consistent approach across centers for the purposes of both diagnosis and measuring the effects of treatment. The commission members were in agreement that a suitable battery should be a portable, pencil-and-paper battery taking <40 min to complete, that it should be easily translated, should have alternate forms to minimise or eliminate practice effects, have age-based norms and that it should measure multiple cognitive domains but generate a single global score with adequate retest reliability for detecting change. The suggested component neurocognitive domains to be tested by the battery were also fairly consistently agreed upon.
The only existing batteries that were recommended for this purpose (each by a minority of the commission members) were the PSE-Syndrom-Test and the RBANS. Each of these has at least a few peer-reviewed publications, suggesting sensitivity to the effects of MHE. In addition, the PSE-Syndrom-Test has been shown to correlate with functional brain imaging results in cirrhotic patients, and the RBANS has been demonstrated to be predictive of disability in these patients. These initial findings are encouraging, but there has been no direct comparison of the two tests to date.
The RBANS has the advantage of having a large body of clinical validity data in other disorders, demonstrating both the sensitivity of the test to various forms of cerebral dysfunction and the predictive value of the total scale score with respect to various measures of functional independence in disorders such as stroke (68), schizophrenia (61, 69) and Alzheimer's disease (70). It is somewhat closer in nature to the battery recommended by the commission, as it includes measures of verbal and visual memory, visuospatial perception and construction, and verbal fluency that are not contained in the PSE-Syndrom-Test. The RBANS also underwent a rigorous population-based standardisation and norming in the USA, and the broad clinical utility of this test (in conjunction with its use in ongoing multinational clinical trials in other diseases) is likely to stimulate local norming projects, many of which are already underway. Therefore, the application of the RBANS in the context of liver disease may benefit from the more widespread use of this scale.
On the other hand, the PSE-Syndrom-Test was developed specifically for the purpose of detecting MHE, utilising measures extracted from a much larger battery for their sensitivity to this syndrome. It requires somewhat less time to administer than the RBANS, and all of the tasks are essentially non-verbal, requiring translation of only the instructions for administration. It has been utilised in the measurement of MHE over a longer period of time than the RBANS, and cross-cultural sensitivity in that context has been demonstrated. Substantial normative data have been collected in Germany, Spain, Italy and Great Britain, and the basic psychometrics of the test appear to be satisfactory.
The commission recommends that, until one of these batteries is demonstrated to have superior clinical validity in the detection and monitoring of MHE, researchers choose one or the other for the routine assessment of patients at risk for MHE. This choice should be based on the availability of local test translations and normative reference data. If a local translation of the RBANS is not available, the use of the PSE-Syndrom-Test is recommended, as only the instructions for this test require translation, and this is a much simpler process than translation of the RBANS. In the absence of local norms for the test being used, the commission recommends relying upon existing US or German norms while pursuing the collection of local norms. Local norms can be calibrated to existing test norms on the basis of relatively small sample sizes, if the samples are appropriately stratified. Diagnostic cut-offs can then be set with known false-positive rates for the population of interest.
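As an illustration of how a cut-off with a known false-positive rate follows from a set of norms, the sketch below assumes that the global score is normally distributed in the healthy reference population, as is reported for the RBANS index score (mean of 100, SD=15). The function names and the 5% rate are illustrative choices, not commission recommendations.

```python
from statistics import NormalDist

def cutoff_for_false_positive_rate(norm_mean, norm_sd, false_positive_rate):
    """Score at or below which the stated fraction of healthy people fall."""
    return NormalDist(mu=norm_mean, sigma=norm_sd).inv_cdf(false_positive_rate)

def false_positive_rate_at(norm_mean, norm_sd, cutoff):
    """Fraction of healthy people expected to score at or below a cut-off."""
    return NormalDist(mu=norm_mean, sigma=norm_sd).cdf(cutoff)

# Index-score metric (mean 100, SD 15); the 5% rate is an illustrative choice.
cutoff_5_percent = cutoff_for_false_positive_rate(100, 15, 0.05)

# Conversely, the expected false-positive rate of a cut-off 2 SDs below the mean.
rate_at_70 = false_positive_rate_at(100, 15, 70)
```

If local norms show a different mean or SD from the published norms, the same calculation with the local values gives the cut-off for the population of interest.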
The commission also recommends the following general approach for the future: comparison of the RBANS and PSE-Syndrom-Test, as well as the comparison of any candidate replacement batteries. We recognise that there may be a more efficient or sensitive approach to the detection of MHE in patients with liver disease than either the RBANS or the PSE-Syndrom-Test. Any candidate battery or other (e.g. biological) measure should, however, have an established scoring methodology that provides a single global score for classification purposes, in order to directly compare sensitivity/specificity and reliability to the RBANS or the PSE-Syndrom-Test. Candidate measures should be compared with index measures using appropriate sample sizes as follows:
1. A common normative reference group, matching a local cirrhotic sample on demographic variables (age, gender and education), should be administered both batteries (in counterbalanced sequence) on two occasions at least 1 week apart (the retest intervals should be the same for both batteries). The first administration of each test will be used for analyses of sensitivity/clinical validity. The second will be used to establish test–retest reliability. The latter measure is important to establish the capacity of the test to identify group and individual changes in response to treatment. Normal subject retest reliability is preferred to facilitate comparison with various existing standardised tests, as patient groups may vary in terms of the stability of their neurocognitive status as a function of disease severity.
2. Both tests should also be given (in counterbalanced sequence) on a single occasion to a sample of cirrhotic patients, along with the collection of as much additional clinical data (disease variables, measurements of quality of life, independent activities of daily living, performance on driving simulators, measurements of outcome, etc.) as possible.
3. Using the common normative reference sample, the global scores from each scale can be used to determine optimal cut-off points via ROC analyses, and sensitivity/specificity of the scales can be directly compared. The relative predictive value of each scale with respect to other clinical data of interest can also be directly compared.
4. The statistical properties of each scale can be weighed against the practical parameters of the scale (time to administer, cost, effort, available norms, etc.) to determine which scale will be of greater utility in this context.
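The ROC step in the approach above can be sketched as follows. This is a minimal illustration under stated assumptions: the scores are invented, lower global scores are taken to indicate worse performance, the cirrhotic sample is treated as the ‘positive’ group, and Youden's J (sensitivity + specificity − 1) is used as one common way of choosing an ‘optimal’ cut-off. The text does not prescribe any particular criterion.

```python
def roc_points(control_scores, patient_scores):
    """Return (cutoff, sensitivity, specificity) for every observed score.

    A subject is classified as impaired when score <= cutoff.
    """
    points = []
    for cutoff in sorted(set(control_scores) | set(patient_scores)):
        sensitivity = sum(s <= cutoff for s in patient_scores) / len(patient_scores)
        specificity = sum(s > cutoff for s in control_scores) / len(control_scores)
        points.append((cutoff, sensitivity, specificity))
    return points

def best_cutoff(points):
    """Cut-off maximising Youden's J (ties resolve to the lowest cut-off)."""
    return max(points, key=lambda p: p[1] + p[2] - 1)

def area_under_curve(control_scores, patient_scores):
    """AUC as the probability that a patient scores below a control.

    Ties count as one half (the Mann-Whitney formulation).
    """
    wins = sum((p < c) + 0.5 * (p == c)
               for p in patient_scores for c in control_scores)
    return wins / (len(patient_scores) * len(control_scores))

# Invented global scores, for illustration only.
controls = [2, 1, 0, 0, 1, 1, 0, 2]
patients = [-6, -4, -1, -3, 0, -8, -5, -2]

chosen = best_cutoff(roc_points(controls, patients))
auc = area_under_curve(controls, patients)
```

Running the same functions on the global scores of two batteries given to the same subjects allows their cut-offs, sensitivity/specificity and AUC values to be compared side by side.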
The commission believes that sufficient data exist to recommend the use of the RBANS or the PSE-Syndrom-Test as standard assessments for patients at risk of MHE at this point. This will allow a more systematic approach to the clinical management of these patients, and facilitate communication and comparison of results across studies. The commission further recommends ongoing research to establish local norms for these tests, preferably using a stratified sampling approach to calibrate local norms to published norms in a consistent fashion. Finally, the commission recommends a systematic approach for future studies to compare approaches to diagnosing and monitoring MHE.
- Neuropsychological testing is an established methodology for quantifying cognitive impairment due to various forms of encephalopathy, including low-grade or minimal hepatic encephalopathy. (Evidence grade A)
- Neuropsychological test batteries that measure multiple domains of cognitive function are generally more reliable than single tests, and tend to be more strongly correlated with functional status. (Evidence grade A)
- Both the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS) and PSE-Syndrom-Test have met psychometric and clinical validity criteria for use in assessment of patients at risk for minimal hepatic encephalopathy. (Evidence grade B)
- Use of either the RBANS or the PSE-Syndrom-Test is recommended for diagnosing and monitoring minimal hepatic encephalopathy. The choice of which battery to use should be based upon the availability of local translations and normative data (Table 3). (Recommendation grade 2)
Table 3. Grading of evidence and recommendations*
|Grading of Evidence|
| High-quality evidence||Further research is very unlikely to change our confidence in the estimate of effect||A|
| Moderate-quality evidence||Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate||B|
| Low- or very low-quality evidence||Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Any estimate of effect is uncertain||C|
|Grading of Recommendation|
| Strong recommendation warranted||Factors influencing the strength of the recommendation included the quality of the evidence, presumed patient-important outcomes, and cost||1|
| Weaker recommendation||Variability in preferences and values, or more uncertainty: more likely a weak recommendation is warranted. Recommendation is made with less certainty; higher cost or resource consumption||2|
These guidelines have been prepared by the Commission on Neuropsychological Assessment of Hepatic Encephalopathy appointed by the ISHEN. The content was discussed and approved at the 13th ISHEN Symposium, Padova, Italy, 28 April to 1 May 2008. The members of the commission gratefully acknowledge the direction and assistance of Professor Piero Amodio in the completion of this assignment.
Disclosures: Christopher Randolph is the author of the RBANS, and receives royalties on the sales of that instrument. None of the other commission members report any potential conflicts of interest.