Depression rating scales in Parkinson's disease: Critique and recommendations


Abstract

Depression is a common comorbid condition in Parkinson's disease (PD) and a major contributor to poor quality of life and disability. However, depression can be difficult to assess in patients with PD due to overlapping symptoms and difficulties in the assessment of depression in cognitively impaired patients. As several rating scales have been used to assess depression in PD (dPD), the Movement Disorder Society commissioned a task force to assess their clinimetric properties and make clinical recommendations regarding their use. A systematic literature review was conducted to explore the use of depression scales in PD and determine which scales should be selected for this review. The scales reviewed were the Beck Depression Inventory (BDI), Hamilton Depression Scale (Ham-D), Hospital Anxiety and Depression Scale (HADS), Zung Self-Rating Depression Scale (SDS), Geriatric Depression Scale (GDS), Montgomery-Asberg Depression Rating Scale (MADRS), Unified Parkinson's Disease Rating Scale (UPDRS) Part I, Cornell Scale for the Assessment of Depression in Dementia (CSDD), and the Center for Epidemiologic Studies Depression Scale (CES-D). Seven clinical researchers with clinical and research experience in the assessment of dPD were assigned to review the scales using a structured format. The most appropriate scale is dependent on the clinical or research goal. However, observer-rated scales are preferred if the study or clinical situation permits. For screening purposes, the HAM-D, BDI, HADS, MADRS, and GDS are valid in dPD. The CES-D and CSDD are alternative instruments that need validation in dPD. For measurement of severity of depressive symptoms, the Ham-D, MADRS, BDI, and SDS scales are recommended. Further studies are needed to validate the CSDD, which could be particularly useful for the assessment of severity of dPD in patients with comorbid dementia. 
To account for overlapping motor and nonmotor symptoms of depression, adjusted instrument cutoff scores may be needed for dPD, and scales to assess severity of motor symptoms (e.g., UPDRS) should also be included to help adjust for confounding factors. The HADS and the GDS include limited motor symptom assessment and may, therefore, be most useful in rating depression severity across a range of PD severity; however, these scales appear insensitive in severe depression. The complex and time-consuming task of developing a new scale to measure depression specifically for patients with PD is currently not warranted. © 2007 Movement Disorder Society

Depressive symptoms commonly occur in Parkinson's disease (PD), affecting approximately 40% of patients in cross-sectional studies.1–3 Depressive symptoms have also been recognized to be a major determinant of health-related quality of life in PD, and can affect functional ability, cognitive function, and caregiver quality of life.4–6 It is, therefore, important to recognize and assess depressive symptoms in patients with PD adequately. The current gold standard for the diagnosis of depressive disorder is the set of criteria in the Diagnostic and Statistical Manual, Fourth Edition (DSM-IV), of the American Psychiatric Association. However, in clinical practice and research studies, particularly in epidemiological studies, surveys, and treatment trials measuring severity of depressive symptoms, use of DSM-IV criteria is often not feasible or useful. Several rating scales for screening and/or assessment of severity of depression are available and have been used widely to assess depression in patients with and without PD. However, there are several methodological difficulties in assessing depressive symptoms in PD, and it is unclear which scales are suitable for the assessment of depression in this patient group. The Movement Disorder Society (MDS) Task Force on Rating Scales for Parkinson's Disease therefore commissioned a critique of existing scales as applied to Parkinson's disease that would place them in a clinical and clinimetric context, similar to MDS reviews of the Unified Parkinson's Disease Rating Scale (UPDRS)7 and Hoehn & Yahr staging system.8 The purpose of this effort was to evaluate all commonly used or appropriate rating scales for depression in PD (dPD) and to make recommendations on the utilization of specific scales and their need for modification or replacement in this population.

The DSM-IV criteria for depressive disorder are the current gold standard against which such scales are compared. However, the use of these criteria (or other criteria such as those of the International Classification of Diseases [ICD-10]) in PD has shortfalls, and recommendations have been made to revise the DSM-IV criteria for depressive disorder when applied to PD to overcome these methodological difficulties.9 Although a discussion of the validity of these criteria for depression in patients with PD is not the subject of this manuscript, these problems and their impact on the use of scales to assess the presence and severity of depression in patients with PD are recognized and discussed.

Abbreviations

PD, Parkinson's disease; dPD, depression in Parkinson's disease; MDS, Movement Disorder Society; BDI, Beck Depression Inventory; Ham-D, Hamilton Depression Scale; HADS, Hospital Anxiety and Depression Scale; SDS, Zung Self-Rating Depression Scale; GDS, Geriatric Depression Scale; MADRS, Montgomery-Asberg Depression Rating Scale; UPDRS, Unified Parkinson's Disease Rating Scale; CSDD, Cornell Scale for the Assessment of Depression in Dementia; CES-D, Center for Epidemiologic Studies Depression Scale; DSM-IV, Diagnostic and Statistical Manual, Fourth Edition; ICD-10, International Classification of Diseases; POMS, Profile of Mood States; NPI, Neuropsychiatric Inventory; NPV, negative predictive value; PPV, positive predictive value.

PATIENTS AND METHODS

Administrative Organization and Critique Process

The MDS Task Force on Rating Scales for Parkinson's Disease Steering Committee under its director (C.G.) invited the chairperson (A.S.) to form a committee to critique existing depression rating scales in Parkinson's disease and to place them in a clinical and clinimetric context. A committee of seven members from Europe, North America, and Australia was formed, including neurologists, psychiatrists, and psychologists who had worked extensively in the area of dPD. Initial discussions among these task force members identified the unresolved problems in the overall assessment of dPD. Eligible scales were then identified (see criteria below) and discussed for inclusion in the review. A survey of MDS members explored the members' clinical experience with depression scales in dPD. However, as the response rate was only 4% (79 of 2000 questionnaires, possibly indicating that few neurologists routinely use depression scales), the sample was considered too small to be representative. Nevertheless, no additional scales were reported to have been used by more than one respondent. A proforma was drawn up to allow structured assessment of the scales regarding their descriptive properties, availability, content, use, acceptability, clinimetric properties (in patients without and with PD) and overall impression (see Web site at the following address:). All statements were referenced, and quantitative as well as qualitative results were tabulated and summarized. Each scale was reviewed by two task force members, one acting as the lead. The completed reviews were assessed by all other members of the task force and modified according to their suggestions. The results of the reviews, identified problems, and conclusions were summarized by the chairperson, and the draft report was modified in several iterations following discussions with all task force members. The report was reviewed and altered according to suggestions by the members of the Steering Committee and was submitted to and approved by the Scientific Issues Committee of the MDS before submission to Movement Disorders.

Selection of Scales

We included all scales that either have been used previously to assess depression in PD in more than one study or, based on literature review and expert evaluation, have potential utility in PD based on their content, their widespread use, and clinimetric evidence from studies in depressed patients without PD. We limited our assessment to depression-specific scales, as assessment of all multidimensional scales that include assessment of depression was beyond the scope of this project. Scales specifically assessing features that can occur as aspects of depression, such as anxiety, anhedonia, and apathy, were also excluded, but the ability of depression scales to capture these aspects was assessed. We excluded scales that assess short-lived mood states only.

Literature Search Strategy

Medline was searched via PubMed for relevant papers among all publications listed up to June 2005. The terms used were “Parkinson's disease” or “parkinsonism” or “Parkinson disease,” “depression,” “psychiatric status rating scale,” “scale,” or “measure.” For each scale, a search was conducted for the terms “Parkinson's disease” (or “parkinsonism” or “Parkinson disease”) and the name of the scale. Only published or in press peer-reviewed papers or abstracts known to the task force members were included in this review.
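
As an illustration only, the search strings described above can be assembled programmatically. This is a minimal sketch: the way the terms are grouped with AND/OR, the quoting style, and the helper names (`any_of`, `general_query`, `scale_query`) are assumptions and are not taken from the task force's protocol.

```python
# Search terms quoted in the text; how they were combined is an assumption.
DISEASE_TERMS = ["Parkinson's disease", "parkinsonism", "Parkinson disease"]
INSTRUMENT_TERMS = ["psychiatric status rating scale", "scale", "measure"]


def any_of(terms):
    """Join quoted terms with OR inside one parenthesized group."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def general_query():
    """Disease term AND depression AND any instrument term (assumed grouping)."""
    return f'{any_of(DISEASE_TERMS)} AND "depression" AND {any_of(INSTRUMENT_TERMS)}'


def scale_query(scale_name):
    """Disease term AND the name of one scale, as was done for each scale."""
    return f'{any_of(DISEASE_TERMS)} AND "{scale_name}"'


if __name__ == "__main__":
    print(general_query())
    print(scale_query("Beck Depression Inventory"))
```

Keeping the disease terms in one list means the same group is reused for the general search and for every scale-specific search.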

RESULTS

Identified Problems When Using Rating Scales for dPD

Overlap Between Symptoms of Depression and PD

DSM-IV defines an episode of major depressive disorder as the presence of depressed mood or loss of interest or pleasure for at least 2 weeks together with at least five other symptoms if they represent significant change from previous functioning. These other features include change in appetite or weight, insomnia or hypersomnia, psychomotor agitation or retardation (i.e., generalized slowing of thought, speech, and body movements), fatigue or loss of energy, feelings of worthlessness or excessive or inappropriate guilt, diminished ability to think or concentrate, indecisiveness, recurrent thoughts of death, or recurrent suicidal ideation. Importantly, symptoms that are clearly due to a general medical condition are excluded. Rating some of the core symptoms of depression is, therefore, difficult due to the considerable overlap of symptoms of depression and core symptoms of PD (e.g., cognitive impairment, apathy, psychomotor changes [both retardation and agitation], attentional or concentration changes, loss of appetite, weight change, sleep disturbances, and fatigue; see Table 1). It is unclear whether an inclusive approach (i.e., rating all symptoms that are present on a depression scale without judgment of their specific relation to depression) should be adopted for rating scales as has been suggested when applying diagnostic criteria for depression.9 The decision about how to rate these symptoms is not trivial and markedly affects study results.10 The use of scales (and diagnostic criteria), which automatically include all somatic symptoms may lead to inflated depression scores.11, 12 In such cases, patients risk scoring as depressed without the core symptoms of depression. On the other hand, scales that exclude these overlapping symptoms, which may cluster with depression rather than with motor features,12, 13 may have poor criterion validity, particularly at the severe end of the depression spectrum. 
Current evidence suggests that some somatic symptoms in PD are important and sensitive aspects of dPD and should not be neglected in the assessment of depression in PD.14
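
The effect of this rating decision can be illustrated with a small sketch. The item names, the ratings, and the choice of which items count as overlapping with PD are hypothetical and do not correspond to any of the reviewed scales; the point is only that the same patient obtains very different totals under an inclusive and an exclusive approach.

```python
# Hypothetical ratings for one patient (0 = absent, higher = more severe).
ratings = {
    "depressed_mood": 0,
    "loss_of_interest": 0,
    "guilt": 0,
    "insomnia": 2,
    "fatigue": 2,
    "psychomotor_slowing": 2,
    "weight_loss": 1,
}

# Items assumed, for illustration, to overlap with core PD symptoms.
OVERLAPPING_ITEMS = {"insomnia", "fatigue", "psychomotor_slowing", "weight_loss"}


def inclusive_score(item_ratings):
    """Count every rated symptom, whatever its presumed cause."""
    return sum(item_ratings.values())


def exclusive_score(item_ratings):
    """Drop items that overlap with PD before summing."""
    return sum(
        value
        for item, value in item_ratings.items()
        if item not in OVERLAPPING_ITEMS
    )


if __name__ == "__main__":
    print(inclusive_score(ratings), exclusive_score(ratings))
```

In this invented case the patient has no core mood symptoms, yet the inclusive total is driven entirely by somatic items, which is the inflation risk described above.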

Table 1. Symptom overlap with depression in Parkinson's disease
Parkinsonism
 Diminished facial expression
 Psychomotor changes (slowness, motor restlessness)
 Fatigue or loss of energy
 Insomnia
 Loss of appetite
 Weight loss
Sequelae of Parkinsonism
 Social withdrawal
 Fearfulness
 Foreshortened future
 Hopelessness
 Helplessness
Cognitive impairment
 Slowness of thoughts
 Poor memory
 Diminished attention and concentration
 Impaired executive function
Other psychiatric symptoms
 Anxiety
 Apathy (loss of interest, initiative and purposeful activity)

Overlap Between Symptoms of Depression and Apathy

Apathy is one of the core symptoms of depression. However, it may also occur independently of depression as part of a syndrome of apathy. A significant proportion of patients who are not depressed have apathy with loss of interest, motivation, and effortful behavior but without other cognitive, affective, or somatic symptoms of depression.15–18

Assessment of Cognitively Impaired Patients

Cognitive impairment is common in PD, and approximately 30 to 40% of patients with PD meet criteria for dementia.19 The frequent occurrence of dementia adds an additional complexity to accurate diagnosis and monitoring of depression in cognitively impaired PD patients.

Differences Between Depression Without PD and dPD

Depression in PD differs in some aspects from major depression, for example, with a relative rarity of feelings of guilt, self-blame, or worthlessness in PD.20–22 Furthermore, the majority of patients with PD have depressive symptoms not fulfilling the criteria for major depressive episode.23

Use for Different Study Purposes

Depression scales serve different purposes. One purpose is to assess the severity of depression and monitor the response to antidepressant treatment. For this clinical or research task, the validity, reliability, and responsiveness of a scale to mood changes are relevant. Another reason to use rating scales is to screen for the presence of depressive symptoms in patients with PD. For screening purposes, ease of use is important in large epidemiological studies and in clinical settings that use untrained raters or self-rating scales. Scales with good sensitivity and specificity at appropriate cutoffs may serve as screening tools. Rating scales alone, however, should not be used for the diagnosis of depression, which is reserved for the appropriate “gold standard” diagnostic instrument (i.e., structured DSM interviews).
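
For readers less familiar with these screening statistics, the following minimal sketch computes sensitivity, specificity, PPV, and NPV of a scale at a given cutoff against a gold-standard diagnosis. The data are invented, the function name is hypothetical, and a cutoff written as a/b is assumed here to mean that total scores of b or higher screen positive.

```python
def screening_metrics(scores, depressed, threshold):
    """Compare 'score >= threshold' with a gold-standard diagnosis.

    scores    -- total scale scores, one per patient
    depressed -- gold-standard diagnosis (True/False), in the same order
    threshold -- lowest score counted as screen positive (the 'b' in 'a/b')
    """
    tp = fp = tn = fn = 0
    for score, has_diagnosis in zip(scores, depressed):
        positive = score >= threshold
        if positive and has_diagnosis:
            tp += 1
        elif positive:
            fp += 1
        elif has_diagnosis:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined (None) when the relevant group is empty.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Invented example: eight patients screened at a cutoff of 9/10.
scores = [4, 8, 10, 12, 15, 7, 11, 20]
depressed = [False, False, False, True, True, False, True, True]
metrics = screening_metrics(scores, depressed, threshold=10)
```

Lowering the threshold generally raises sensitivity and NPV at the expense of specificity and PPV, which is why different cutoffs are quoted for screening and for diagnostic purposes.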

Timing of Assessment

Rating scales for depression do not usually specify the timing of assessment, which is of importance in PD patients with motor and nonmotor fluctuations.

Use of Collateral Information

Most rating scales are patient reported or clinician rated, yet the input of collateral information when assessing PD patients may be important. However, whether or how to incorporate such collateral information from informed others needs to be operationalized.

Identified Scales and Their Utilization in Clinical Practice and Research

Nine scales were identified in multiple publications to assess dPD, including the Hamilton Depression Scale (Ham-D),24 the Beck Depression Inventory (BDI),25 the Geriatric Depression Scale (GDS),26, 27 the Zung Self-Rating Depression Scale (SDS),28 the Hospital Anxiety and Depression Scale (HADS),29 and the Montgomery-Asberg Depression Rating Scale (MADRS).30 In addition, the Cornell Scale for the Assessment of Depression in Dementia (CSDD)31 was included, as it is the only (widely used) scale designed for use in patients with cognitive impairment, a common condition in dPD. The Center for Epidemiologic Studies Depression Scale (CES-D)32 was also reviewed as it is used worldwide in epidemiological studies and might, therefore, be considered for epidemiological studies of depression including patients with PD. These two scales were selected for review following literature review and based on expert opinion due to their wide use and potential utility based on their content and clinimetric evidence from studies in depressed patients without PD. The Unified Parkinson's Disease Rating Scale (UPDRS)33 Part I was also included, as it is the most widely used rating scale to assess PD symptoms and includes questions on psychiatric symptoms. Scales that were considered but not included were scales that assess short-lived mood states only, such as the Profile of Mood States (POMS),34 and multidimensional scales that include a dimension of depression within a wider assessment of psychiatric symptoms, such as the Neuropsychiatric Inventory (NPI),35 as we limited our assessment to depression-specific scales. We also did not include scales that were used only in individual studies (e.g., the Andersen scale36).

Critique of Depression Scales

A summary review of each scale is given here. The complete reviews are available online at the Web site at the following address: http://www.interscience.wiley.com/jpages/0885-3185/suppmat. Whilst we recognize the limitations of the diagnostic criteria in DSM-IV and the recent recommendations to improve these criteria,9 these criteria and structured/semistructured interviews for DSM-IV or ICD-10 diagnoses were used as the available “gold standard” and as a measure of criterion validity in the available literature.

All scales were found to be valid in both genders, although the factor structure may vary and scores may differ.37–39 No data were available to give recommendations on who should administer the observer-rated scales. However, information on the need for training for each scale is provided. A summary of the properties of the depression scales reviewed is provided in Table 2.

Table 2. Properties of depression scales in Parkinson's disease
Scale | Sensitivity | Specificity | Cutoff score for screening in patients without PD | Cutoff score for screening in patients with PD | Sensitivity to change | Somatic items, psychological items
HAM-D | ++ | ++ | 13/14 | 9/10 | + | *****
MADRS | ++ | ++ | 6/7 | 14/15 | + | ***
BDI | + | + | 9/10 | 13/14 | + | *****
HADS | + | +/− | 7/8 | 10/11 | na | ****
SDS | na | na | 50/51 | na | + | ******
GDS 30 | ++ | ++ | 9/10 | 9/10 | na | ****
GDS 15 | ++ | ++ | 2/3 | 4/5 | na | ****
CSDD | na | na | 6/7 | na | na | ****
CES-D | na | na | 15/16 | na | na | ****
UPDRS part I | na | na | na | na | na | **
+/− sensitivity/specificity limited; + some sensitivity/specificity; ++ good sensitivity/specificity; na = not sufficiently assessed in patients with Parkinson's disease; *<25% of items; **25–50% of items; ***>50% of items
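
The PD-specific screening cutoffs in Table 2 can be captured in a small lookup. This is a sketch rather than a validated tool: the dictionary and function names are hypothetical, a cutoff a/b is assumed to mean that scores of b or higher screen positive, and scales marked na in the table are deliberately left without a cutoff. A positive screen is not a diagnosis.

```python
# Cutoffs for screening in patients with PD, copied from Table 2 as (a, b).
PD_SCREENING_CUTOFFS = {
    "HAM-D": (9, 10),
    "MADRS": (14, 15),
    "BDI": (13, 14),
    "HADS": (10, 11),
    "GDS 30": (9, 10),
    "GDS 15": (4, 5),
}


def screens_positive(scale, total_score):
    """Return True/False, or None when Table 2 lists no PD cutoff (na)."""
    cutoff = PD_SCREENING_CUTOFFS.get(scale)
    if cutoff is None:
        return None
    _, lowest_positive = cutoff
    return total_score >= lowest_positive


if __name__ == "__main__":
    print(screens_positive("BDI", 14))  # at the cutoff
    print(screens_positive("SDS", 55))  # no PD cutoff established
```

Returning None for scales without an established PD cutoff keeps the absence of validation visible instead of silently falling back to a cutoff derived in patients without PD.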

Hamilton Depression Rating Scale (Ham-D)

Depression in non-PD Patients

The interviewer-rated Ham-D is the most widely used and accepted measure for evaluating the severity of depression.40 Appropriate training in the administration and scoring of the scale is important to obtain reliable scores.41 It has been shown to have good sensitivity to change in depressed patients.42–45 Although it covers DSM-IV criteria incompletely, it has acceptable discriminant validity, high sensitivity, and high specificity. Furthermore, it has high negative predictive value (NPV), and acceptable positive predictive values (PPV) for a DSM-IV diagnosis of depressive disorder.46 Its sensitivity and specificity have been shown to be superior to that of the BDI and SDS,46 and similar to the MADRS.47 It has good test–retest and interrater reliability, although item reliability is poor.46 Semistructured versions have been developed, including scoring guidelines that improved item reliability.48 It has also been shown to be valid in patients with significant cognitive impairment.49, 50 It correlates with biological markers of depression.51–55 However, somatic symptoms are heavily represented,56 and its use as a screening measure, particularly in patients with physical illness, has been criticized.56, 57 Thus, almost 60% of the total items could be experienced by a typical patient with PD. From a clinimetric point of view, its disadvantages include a lack of consistency at the item level; some items assess multiple symptoms, and some symptoms can be rated on multiple items. In addition, the 17-item version has also been shown to measure more than one dimension.58 The Ham-D is available in the public domain (see, for example, Table 3 “located in Supplementary Material”) and has been translated into most European and Asian languages. There are self-rated and over the telephone-administered formats that yield comparable results to the interviewer-administered version.59 Multiple versions of the scale exist, of which the 17-item version is the most frequently used. 
However, use of different versions limits comparability of study results.

Depression in PD

The Ham-D has been shown to have good sensitivity and specificity in PD.60 Cutoff scores of 9/10 (ref. 60) and 11/12 (ref. 47) to screen for dPD, and of 15/16 (ref. 60) and 13/14 (ref. 47) to diagnose major depressive disorder (although diagnosis using a scale alone is not recommended), have been suggested. Using these cutoffs, sensitivity, specificity, and positive and negative predictive values for a DSM-IV diagnosis of major depressive disorder in PD have been found to be acceptable. It has been demonstrated to be sensitive to change in PD patients61–68 and to correlate with biological markers of dPD.69–72

Suitability for Studies

The Ham-D is most suitable for assessing depression severity in treatment trials of dPD, correlation studies with biological markers or other parkinsonism scales, and the study of the phenomenology of depression.14, 73, 74 It is an adequate screening measure for dPD, but, like all scales and diagnostic criteria that include somatic items, it overlaps with core PD symptoms. As an observer-rated scale, it requires training, and self-report questionnaires may be more appropriate as screening instruments for dPD in routine neurological clinics or in large-scale epidemiological studies.60

Montgomery-Asberg Depression Rating Scale (MADRS)

Depression in non-PD Patients

The MADRS is an observer-rated scale and requires some (although not extensive) clinical experience with depression. It covers all the DSM-IV criteria of a major depressive episode, with the exception of psychomotor retardation/agitation and reverse neurovegetative symptoms (e.g., hypersomnia and increased appetite). When compared to other observer-rated scales, such as the Ham-D, the MADRS has relatively few somatic items. It was designed to measure change in severity of depressive symptoms during antidepressant clinical trials and is at least as sensitive to change as the Ham-D.30, 75 It is not usually used for screening purposes. It has good face validity, criterion validity, and concurrent validity.30, 76 Interrater agreement and internal consistency are high.30, 76 Although it has been shown to be valid in older patients with mild cognitive impairment,77 sparse data on patients with severe cognitive impairment exist.78 The MADRS (see, for example, Table 3 “located in Supplementary Materials”) is in the public domain (although because of its publication in the British Journal of Psychiatry, the scale is formally copyrighted by this journal). It has been translated into several European and Asian languages.

Depression in PD

Although the MADRS is not usually used for screening purposes, one group79, 80 reported on its use in screening for dPD. Cutoff scores in PD of 14/15 for screening (high sensitivity and high NPV) and 17/18 (high specificity and high PPV) for diagnostic purposes have been validated in PD against a diagnosis of major depressive disorder.47 It has been used in medication trials in dPD and been shown to be sensitive to change in the level of severity of depression.73, 81 It has also been used to assess the phenomenology of dPD.82
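
The two MADRS cutoffs quoted above imply three score bands. The sketch below only labels the band; the function name and the labels are hypothetical, a cutoff a/b is assumed to mean that b is the lowest score in the higher band, and, as noted elsewhere in this review, a scale score alone does not establish a diagnosis.

```python
MADRS_SCREENING = 15   # from the 14/15 cutoff (high sensitivity, high NPV)
MADRS_DIAGNOSTIC = 18  # from the 17/18 cutoff (high specificity, high PPV)


def madrs_band(total_score):
    """Label a MADRS total score relative to the two PD cutoffs."""
    if total_score >= MADRS_DIAGNOSTIC:
        return "at or above diagnostic cutoff"
    if total_score >= MADRS_SCREENING:
        return "screen positive"
    return "screen negative"


if __name__ == "__main__":
    for score in (14, 15, 17, 18):
        print(score, madrs_band(score))
```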

Suitability for Studies

The MADRS is appropriate for medication trials in PD and for correlation studies with biological markers. It is also suitable for screening purposes, when the appropriate cutoff scores are used, and for studying the phenomenology of dPD.

Beck Depression Inventory (BDI)

Depression in non-PD Patients

The BDI is one of the most used self-rated instruments for major depression in clinical practice.83 It has been used both to measure severity of depression and as a screening instrument in more than 2,000 studies.84, 85 Several modified versions exist for adaptation to DSM-IV criteria. In the revised BDI-II, agitation, concentration difficulties and loss of energy were added to the original version. However, most validation studies have been performed for the BDI-1A, which is also the most commonly used version.86 In a study on ease of comprehension of self-reported depression scales, the BDI had high overall cognitive complexity.87 Nevertheless, it has high test–retest reliability88, 89 and internal consistency25, 88, 89 in a variety of patient groups. Its concurrent and discriminant validity is good.85, 90–92 Although it contains several somatic symptoms,12, 11 it is weighted toward psychological symptoms of depression. It has been shown to correlate with biological markers of depression93 and to be sensitive to change.94, 95 It also appears to be valid in patients with significant cognitive impairment.96, 97 The scale is generally considered to be in the public domain (see, for example, Table 3 “located in Supplementary Materials”), but purchase of the scale from the publisher is required for use in large-scale research projects.98 It has been validated and used worldwide. Cross-cultural evaluations suggest some cultural differences in depression measurements particularly for the psychological aspects,39, 99–101 but the scale has been shown to be valid across cultures.

Depression in PD

The BDI has been used widely in PD. It has been used to screen for dPD,23, 102–104 measure severity,20, 105 and assess the response to pharmacological or surgical treatment.106–108 Although the most commonly used version of the BDI assesses the state of mood during the past week (or past 2 weeks for the BDI-II), it has been used to quantify on or off state-dependent mood.109 However, the use of the BDI as a short-time scale has not been validated. It has good internal consistency and test–retest reliability in dPD.12, 110 Concurrent64, 111, 112 and discriminant validity110, 113 in PD are acceptable. Different cutoff scores have been used, with recommendations ranging from 8/9 for screening to 16/17 for diagnosis of dPD.113 Recently, an optimal cutoff of 13/14 has been suggested with acceptable sensitivity and specificity.110 Despite concerns about the number of somatic symptoms, it has been shown to have good reliability and validity compared to a DSM-IV diagnosis of major depression in dPD, superior to that of the HADS.113, 114

Suitability for Studies

The BDI is suitable for screening purposes if an appropriate cutoff is used. It is also suitable for assessing the severity of depressive symptoms and for monitoring change during treatment. It can also be used in phenomenological studies of dPD.

Hospital Anxiety and Depression Scale (HADS)

Depression in non-PD Patients

The HADS is a short self-rated scale yielding subscores for depression and anxiety. Face validity is moderate as some of the core diagnostic criteria for depression are not included in the scale. Anxiety symptoms are rated separately from the depression symptoms but, due to high comorbidity between anxiety and depression, some researchers have used the total HADS as a measure of global mood disorder. The depression subscale is weighted toward the emotional aspects of depression (emphasizing anhedonia rather than sadness)29 and does not include physical and cognitive symptoms, or suicidal ideation. Its face validity has been criticized as it excludes items at the severe end of the severity spectrum of depression, including suicidal ideation, psychotic features, and vegetative symptoms. Nevertheless, sensitivity and specificity for DSM criteria for major depressive disorder and other depression scales were reported as good.115 The HADS has been reported to have medium overall cognitive complexity or respondent comprehensibility, in between the BDI (high) and the SDS (low).87 The internal consistency and test–retest reliability of the scale are good.115, 116 Sensitivity to change has been shown to be good, both for studies evaluating pharmacotherapy and psychotherapy for depression.117–121 It has not been validated in patients with significant cognitive impairment and has only rarely been used in this population.122, 123 The scale is in the public domain (see, for example, Table 3 “located in Supplementary Material”), but for use in large scale research projects, purchase of the scale from the publisher is required.124 It has been validated and used in many countries in all parts of the world.118 However, in a multinational study, it was found that nationality significantly influenced HADS scores.125 The use of this scale in different countries could be influenced by the different perception and expression of emotions by the patients from different cultural backgrounds.118

Depression in PD

There is little overlap with nondepression symptoms of PD, as only the item querying about feeling “slowed down” overlaps with core PD symptoms. Whereas this is advantageous in patients with PD with mild depression, this reduces its validity in severe depression. It has also been criticized for its use in PD due to its reverse coding of some items, which has been reported to result in frequent cross-outs and inconsistent ratings, perhaps related to problems with concept-shifting.126 Cutoff scores for the total HADS of 10/11 for screening purposes and of 23/24 for diagnostic purposes have been suggested,114 although specificity was low at a cutoff of 10/11.114 The internal consistency and test–retest reliability of the scale is good in patients with PD.114, 127 The HADS has not been used in PD for treatment trials. However, it has been used to measure severity of dPD.128, 129
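
Reverse coding, which underlies the rating problems described above, can be sketched as follows. The item identifiers and the choice of which items are reverse coded are hypothetical and do not reproduce the actual HADS key; the sketch assumes each item is rated from 0 to 3.

```python
def score_items(raw_ratings, reversed_items, max_rating=3):
    """Sum item ratings, flipping the reverse-coded items.

    raw_ratings    -- mapping of item identifier to the rating as marked
    reversed_items -- identifiers whose rating runs in the opposite direction
    max_rating     -- highest possible rating per item (assumed to be 3)
    """
    total = 0
    for item, value in raw_ratings.items():
        if not 0 <= value <= max_rating:
            raise ValueError(f"rating out of range for {item}: {value}")
        # A reverse-coded item marked 0 contributes max_rating, and vice versa.
        total += (max_rating - value) if item in reversed_items else value
    return total


# Hypothetical three-item example in which "item2" is reverse coded.
example = {"item1": 3, "item2": 0, "item3": 1}
example_total = score_items(example, reversed_items={"item2"})
```

The flip is trivial for a scorer with the key in hand; the difficulty reported in PD lies with the respondent, who must notice that the direction of the answer options changes from one item to the next.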

Suitability for Studies

The HADS is moderately suitable for screening purposes for dPD.114 Its use as a severity measure in PD is controversial, and, as it excludes most somatic symptoms, it may be more suitable for mild to moderate than for more severe depression. Due to its low content validity, it is not suitable for phenomenological studies of depression.

Zung Self-Rating Depression Scale (SDS)

Depression in non-PD Patients

The SDS is a short self-rated scale that assesses psychological and somatic symptoms of depression. It has been widely used to screen for130–133 and measure severity of depression.134–136 Several shortened versions are available, but the original version is the most commonly used. It is more easily comprehended than the HADS, CES-D, and BDI.87 It has good internal consistency38, 137–139 and test–retest reliability.139 Content and criterion validity are good, as it includes most of the DSM-IV criteria for major depression,140 and concurrent validity is acceptable.38, 141, 142 There are a large number of somatic items. To adjust for an expected higher baseline score in elderly patients seen in medical settings, it has been recommended that the cutoff score be raised from 50 in the general population to 60 or greater in this population.143 No data on its use in patients with significant cognitive impairment are available. The scale is in the public domain (see, for example, Table 3 “located in Supplementary Material”). It has been used in numerous languages and has been validated in English, Japanese, Chinese, Finnish, and Italian.

Depression in PD

The SDS has been used in several studies to screen for and measure severity of depressive symptoms in patients with PD. However, few validation studies in PD are available, and there are a large number of somatic items that overlap with PD symptoms. Cutoff scores for patients with PD have not been established. It appears to have adequate discriminant validity in patients with PD,144 and has been reported to be sensitive to change,81, 145, 146 despite the limitation of yes/no answer options. Similar to the HADS, the use of reverse coding introduces complexity, particularly for patients with PD who may have difficulty in set-shifting.

Suitability for Studies

The SDS may be suitable to measure change of severity of dPD, although further studies are needed to confirm its validity in patients with PD. The large number of somatic items is likely to inflate depression rates, which limits its use as a screening instrument and needs to be taken into account when evaluating change of depression severity scores.

Geriatric Depression Scale (GDS)

Depression in non-PD Patients

The GDS is a short, self-report, yes/no screening instrument for depression in the elderly. It focuses on the psychological aspects and social consequences of depression, avoiding symptom overlap with medical disorders or aging in general. However, there is limited concordance between GDS items and DSM-IV symptoms of depressive disorder, as it excludes somatic symptoms and suicidal ideation, thus raising questions about the content validity of the instrument. It has nevertheless been reported to have discriminant validity similar to the Ham-D and better than the SDS,27 and correlates highly with other depression scales.27, 147 It also has been shown to have good internal consistency and test–retest reliability.148 Two commonly used versions exist (30-item GDS [GDS-30]27 and 15-item GDS [GDS-15]26), and they perform equally well.149 A telephone version has demonstrated good agreement with the self-report questionnaire.150 As each question on the GDS can only be scored “0” or “1,” the instrument is not able to capture degrees of severity at the level of individual items. However, there is preliminary evidence that the overall scale may be sensitive to changes in depression severity.151 The GDS has been validated in subjects aged 55 and older, but not in younger patients. Although it performs well in patients with mild to moderate cognitive impairment,152–154 the data on its validity in moderate to severely cognitively impaired patients are conflicting.153, 155–158 The GDS is in the public domain (see, for example, Table 3) and has been translated into many European and Asian languages.

Depression in PD

Although there is limited published research on its use in PD, the GDS-30 has been reported to have adequate discriminant validity for a DSM-IV diagnosis of major depressive disorder in PD at a cutoff of 9/10.159 The GDS-15 appears to have adequate discriminant validity for a diagnosis of major and minor depressive disorder at a cutoff of 4/5, performing comparably to the HAM-D.160 The GDS avoids many, but not all, symptoms overlapping between depression and PD. It has been reported to perform similarly to the CSDD in PD patients with dementia,161 suggesting that it may be a sensitive indicator of depression severity in cognitively impaired patients with PD. The scale has not been adequately evaluated in patients younger than 55 years of age.
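
The cutoffs quoted above can be applied mechanically once a GDS has been scored. The following is a minimal sketch in Python, not part of the original review. It assumes that a cutoff written as "4/5" means totals of 4 or less screen negative and totals of 5 or more screen positive, and that each item has already been keyed so that 1 denotes the depressive answer. The names GDS_CUTOFFS, GDS_ITEMS, gds_total, and screens_positive are hypothetical.

```python
# Screening thresholds taken from the cutoffs reported in the text for PD.
# Assumption: "4/5" means <=4 screens negative, >=5 screens positive
# (and likewise 9/10 for the 30-item version).
GDS_CUTOFFS = {"GDS-15": 5, "GDS-30": 10}
GDS_ITEMS = {"GDS-15": 15, "GDS-30": 30}


def gds_total(item_scores):
    """Sum yes/no item scores; each item must already be keyed 0 or 1,
    with 1 denoting the depressive answer."""
    if any(score not in (0, 1) for score in item_scores):
        raise ValueError("each GDS item must be scored 0 or 1")
    return sum(item_scores)


def screens_positive(item_scores, version="GDS-15"):
    """Return True if the total reaches the screening cutoff for the version."""
    if len(item_scores) != GDS_ITEMS[version]:
        raise ValueError(f"{version} expects {GDS_ITEMS[version]} items")
    return gds_total(item_scores) >= GDS_CUTOFFS[version]
```

A positive screen in this sketch only flags a patient for further evaluation; it does not establish a diagnosis and says nothing about severity.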

Suitability for Studies

The GDS is short and easily understood, making it appropriate for use in both clinical research and routine clinical care as a screening instrument for depression in elderly PD patients. However, there is insufficient evidence to recommend its use to assess depression severity (i.e., as an outcome measure in dPD treatment trials or in correlation studies with biological markers or other scales).

Cornell Scale for Depression in Dementia (CSDD)

Depression in non-PD Patients

This interview-based scale was developed specifically for the assessment of depression in patients with dementia and uses a caregiver to provide collateral information. The CSDD is, therefore, appropriate in assessing dPD in patients with comorbid cognitive impairment. The CSDD was developed to measure depression severity, but has also been used to screen for depression in patients with dementia. The CSDD is based on observation and interviews with both an informed other and the patient. This technique increases the amount of information obtained, but the instructions do not specify how the information from the informed other is to be weighed in making the final assessment. There is also no formal definition of an informed other, and informants may vary in their relationship to the patient. The CSDD de-emphasizes questions related to motor symptoms of PD, but retains some overlap with PD symptoms (e.g., retardation, physical complaints, sleep, energy). The observer scoring the scale also relies on the informed other to make difficult clinical distinctions about whether symptoms are secondary to PD or dPD. Instructions specify that if items are secondary to physical illness they should be excluded, but it is unclear whether most informed others can make this distinction. Administration of the scale requires some sophistication in assessing psychiatric symptoms and training in understanding the motor symptoms of PD. The scale may, therefore, be difficult for neurologists and general physicians to complete accurately without some psychiatric expertise or training.

In research studies, internal consistency, interrater reliability, and concurrent validity were shown to be acceptable,31, 78, 162 and the CSDD was shown to have acceptable psychometric properties in severe dementia.78 Sensitivity to change was demonstrated in a few trials.163–166 Informed other ratings on the CSDD in patients with dementia were shown to have higher sensitivity for correctly diagnosing depression than other depression scales, including the BDI, the Ham-D, and the GDS (all modified for caregiver rating).97 Although there is limited data on the reliability, validity, and sensitivity of the CSDD in nondemented elderly patients,162, 167 it has also been recommended as a useful scale for screening older adults for a diagnosis of depression.168 The scale is in the public domain (see, for example, Table 3 “located in Supplementary Material”) and has been translated into several European and Asian languages.

Depression in PD

The scale was developed for patients with dementia and could be appropriate for use in PD patients with comorbid dementia, but no studies have yet been performed in this population.

Suitability for Studies

The CSDD offers an opportunity to assess severity and screen for depression in patients with dPD and comorbid dementia. The CSDD has shown reasonable psychometric properties in depressed demented patients without comorbid PD. However, its administration requires experience and training, and attribution of symptoms to depression or PD is a particular problem in the informed other-rated component of the scale. Whilst all observer-rated scales require some training, particularly for clinically inexperienced researchers, this is especially important for the use of this scale. No validation studies have been conducted in PD, and few data from patients with PD are available. Although it should be easily adaptable to patients with PD, clarification of the issues of overlapping symptoms and validation studies are needed before it should be used widely in PD.

Unified Parkinson's Disease Rating Scale (UPDRS) Part I

Depression in non-PD Patients

Not used, as it was designed for patients with PD.

Depression in PD

The first part of the current UPDRS33 comprises four screening questions on “Mood, Mentation, and Behavior”, of which only one assesses mood (the other three assess intellectual impairment, thought disorder, and motivation/initiative). The question on mood lumps several symptoms of depression in one question. It serves as a screening tool but has not been used as an outcome measure in clinical trials. The UPDRS Part I is clinician-rated, requiring training, but a self-rated version has been validated.169 It is short and specifically designed for use in patients with PD, but it only includes one aspect of depression (with some additional information in the other three questions). Therefore, it has limited face and content validity, and construct validity cannot be assessed in a single question. The test–retest and interrater reliability and the concordance rates between patient and observer rating of the depression item are fair or moderate in nondepressed PD patients.169–171 Part I in its entirety has been reported to be sensitive to change in some studies of antiparkinsonian drugs with purported antidepressant properties.172

The UPDRS is currently being revised and a new version, which will be partly observer- and partly self-rated, will be published in the future. For assessment of depression, the new version will focus only on mood to avoid ambiguities and overlap with symptoms of PD. In this way, depression can only be screened for with the new version of the UPDRS, but an accompanying appendix will provide “Recommended” and “Suggested” depression scales for further evaluations. Some of the somatic features of depression, as well as apathy, cognitive impairment, anxiety, and sleep disturbances, will be assessed in separate questions of the new version of the UPDRS, and can be used to document problems but the scale is designed to “rate what you see” and not to attribute causation to depression, PD per se, or another comorbid condition. The original UPDRS is in the public domain (see, for example, Table 3 “located in Supplementary Materials”) and has been used in many languages.

Suitability for Studies

The original UPDRS Part I and the revised version (unpublished, contact cgoetz@rush.edu for working draft) should only be used as a crude screening tool. They are not recommended to diagnose depression or measure severity of depression. In clinical practice, many clinicians complete UPDRS Part I as they are completing other parts of the scale for the complete examination of patients with PD, using the results to crudely screen for a variety of psychiatric symptoms. The psychometric properties of the revised version of the UPDRS Part I should be assessed in clinical studies before a recommendation can be made.

Center for Epidemiologic Studies Depression Scale (CES-D)

Depression in non-PD Patients

The CES-D was derived from other depression scales as a screening instrument for depression in older adults with physical illness. It has been used extensively in epidemiological studies. It does not require training or experience and has been validated in several formats, including face-to-face interviews and self-report. Several versions, including a short version for use in older adults, are available and have similar psychometric properties.173 It has medium cognitive complexity, similar to the HADS.87 It has some face validity, but lacks several symptoms of depression included in DSM-IV or ICD-10. It is strongly weighted to the assessment of depressed mood and depressive thinking, and somatic symptoms are underrepresented, and no question assesses loss of interest. It is mainly used as a screening tool, but is skewed toward the less severe end of depressive illness. It has rarely been reported as an outcome measure in clinical trials. The CES-D has arguably been subject to wider evaluation than any other depression scale in different populations, age groups, and cultures, particularly in epidemiological studies. It has good internal consistency and acceptable test–retest reliability.32, 174 It has acceptable construct validity32, 174, 175 and discriminant validity (no depression vs. major depressive disorder),176, 177 but it may lack utility in distinguishing between gradations of severity within the clinical range of depression (minor vs. major depression).178 The scale is in the public domain (see, for example, Table 3 “located in Supplementary Materials”) and has been translated and used in multiple European, Middle Eastern, and Asian languages.

Depression in PD

The CES-D appears acceptable for use in PD in terms of the language and format. As it contains few somatic items and no item on loss of interest, it is unlikely to be significantly contaminated by nondepressive symptoms of PD and may be useful across the range of PD disease severity. Despite its widespread use in other settings, the CES-D has been used relatively infrequently in PD179 and has not been formally evaluated for its psychometric properties. However, in one study,180 it has been reported to be sensitive to change. Due to its low number of somatic items, it may not be sensitive at the severe end of the depression severity spectrum and may be suitable for patients with mild to moderate depression.

Suitability for Studies

The CES-D, or one of its shortened versions, is a suitable screening instrument for depression in older adults with physical illness in community studies or primary care settings. It has limited validity at the more severe end of the spectrum of depression, but it may be particularly useful for the detection of subsyndromal depression. However, further validation studies in PD are needed before it can be recommended for wider use as a primary study tool. Unless further evidence becomes available, it is not recommended to assess change of depression severity.

CONCLUSIONS AND RECOMMENDATIONS

  • All reviewed scales have some utility in the assessment of dPD. Apart from the UPDRS Part I (which is merely a screening instrument in the context of overall assessment of PD symptoms), they are useful in assessing depressive symptoms in PD. The BDI and Ham-D have been validated and widely used in patients with PD, whereas there are few data available on other scales, particularly the CES-D and CSDD. Further validation studies are required before their use can be recommended in PD. Overall, observer-rated scales (e.g., HAM-D and MADRS) have better psychometric properties than self-rated scales, and observer-rated scales should, therefore, be preferred if the study or clinical situation permits.

  • Available depression scales serve diverse purposes (e.g., screening instruments vs. instruments used to measure severity and to follow symptoms over time). Different uses require that different scale properties be taken into account and that adaptations of cutoff scores are made as needed (depending on whether sensitivity, specificity, positive predictive value [PPV], or negative predictive value [NPV] is most important to the aims of the study). Recommendations have been made above for the appropriate use of each scale. For screening purposes, the HAM-D, BDI, HADS, MADRS, and GDS appear to be useful instruments. The CES-D and CSDD are promising alternatives from a theoretical point of view and should be studied further. The new UPDRS Part I is likely to provide a crude screening instrument for presence of depression, anxiety, apathy, and other target behaviors within the assessment of the spectrum of symptoms of PD. For measurement of severity of depressive symptoms, observer-rated scales such as the Ham-D and MADRS as well as the BDI and SDS self-rating scales are more useful and valid, and the CSDD should be studied further.

  • The diagnosis of depression should not be solely made on the basis of a score on a rating scale. A cutoff score on these instruments cannot comprehensively capture the range of depressive disorders in PD; high scores can occur when somatic symptoms are endorsed even without the two core symptoms of depression (i.e., sad mood and loss of interest or pleasure); low scores can occur despite serious depressive symptoms when somatic or vegetative problems are absent. For this reason, the gold standard for establishing the diagnosis of depression remains a (semi)structured interview using DSM-IV criteria or its equivalent future diagnostic adaptation.

  • Insufficient evidence is available to recommend the best depression rating scales for PD patients with dementia. Current evidence suggests that the MADRS, GDS, and CSDD may be useful, but further studies are required.

  • Patients may perceive their own condition differently during an off period than during an on period.181 Off periods may be associated with severe psychiatric symptoms, including depression, anxiety, and delusions,181, 182 which usually improve together with motor symptoms and are, therefore, typically short-lived. As the reviewed scales are designed to assess the preceding 1 or 2 weeks and as these off periods are also not considered the same as untreated PD and may represent rebound worsening after the beneficial effect of levodopa has worn off,183 we recommend, in line with common practice, that patients with motor fluctuations be assessed for depression during on periods. The scales are therefore also not suitable to specifically assess fluctuating depressive symptoms during off periods versus on periods in the same way that motor symptoms or dyskinesia can be monitored.

  • All depression scales include items that assess symptoms with overlap between depression, parkinsonism, cognitive impairment, and apathy (see Table 1), particularly the Ham-D and SDS, and to a lesser degree MADRS and BDI. The scale with one of the highest numbers of items assessing overlapping symptoms, the HAM-D, has the best psychometric properties compared to DSM-IV criteria at recommended adjusted cutoffs. For most studies, instruments that have been demonstrated to have good psychometric properties are recommended above those with poorer validity or reliability or those not validated in dPD. Appropriately adjusted cutoff scores for patients with PD should be chosen and overlapping symptom areas should be assessed in parallel with a primary PD scale like the UPDRS motor scale. This twofold assessment could allow for adjustment of confounding factors in the assessment of depression. The HADS and GDS lack many overlapping items and may, therefore, be useful in the comparison of patients with different disease stages and could also be used to monitor change in depression even in the context of changes in underlying parkinsonism. They have limited content validity and appear insensitive at the severe end of the depression severity spectrum. As such, they may be useful candidates for studies of mild or mild–moderate depressive symptoms, the most commonly encountered problem in cross-sectional cases of dPD. They would, however, be less useful to assess moderate to severe depression.

  • In line with the NINDS recommendations,9 use of “loss of pleasure” (reflecting anhedonia) may be more specific to depression than loss of interest, which, as a symptom of apathy, may occur in the absence of depression, but this needs to be researched further.
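
The recommendation above that cutoff scores be adapted according to whether sensitivity, specificity, PPV, or NPV matters most can be made concrete by tabulating scale results against a reference diagnosis. The sketch below is illustrative only and not part of the original review. It assumes that a score at or above the cutoff counts as screen-positive and that the reference standard is a binary interview-based diagnosis; the function name accuracy_at_cutoff is hypothetical.

```python
def accuracy_at_cutoff(scores, diagnosed, cutoff):
    """Compare scale scores with a reference diagnosis at a given cutoff.

    scores:    total scale scores, one per patient
    diagnosed: booleans, True if the reference standard (e.g., a structured
               interview) found depression
    cutoff:    assumption: score >= cutoff counts as screen-positive
    """
    tp = fp = tn = fn = 0
    for score, has_diagnosis in zip(scores, diagnosed):
        screen_positive = score >= cutoff
        if screen_positive and has_diagnosis:
            tp += 1
        elif screen_positive and not has_diagnosis:
            fp += 1
        elif not screen_positive and has_diagnosis:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # None when a metric is undefined (no cases in the denominator).
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Running the function over a range of candidate cutoffs on the same sample shows the trade-off directly: lowering the cutoff tends to raise sensitivity at the expense of specificity, which suits screening, whereas a higher cutoff favors specificity. PPV and NPV additionally depend on the prevalence of depression in the sample studied.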

The following unresolved issues require further research:

  • More studies are needed on the sensitivity, specificity, and positive and negative predictive values of each scale for major depressive disorder in PD, in particular the CSDD scale and the CES-D.

  • The assessment of concurrent validity of depression scales is typically made in comparison to DSM-IV criteria of major depression. The criteria for assessment of depressive disorder of dPD are undergoing changes,9 and the validity of depression rating scales using these assessment criteria will need to be established.

  • The inclusion of somatic symptoms in depression scales theoretically leads to falsely inflated depression scores in patients with PD and may influence the results of treatment trials of depression in PD (e.g., with antiparkinsonian medication). This needs to be investigated in clinical trials.

  • In general, the observer should score answers on the scales using an inclusive approach, and patients should be instructed not to attribute their symptoms to either PD or depression when scoring self-rated scales. An exclusive approach may lead to an underestimation of depression severity. However, some scales require judgment, such as the CSDD and to a lesser degree other observer-rated scales such as the MADRS and Ham-D, and may be conducive to using a more etiological approach. Whilst this should be investigated further, in the absence of evidence for advantages of an exclusive or etiological approach, the task force advises following an inclusive approach.

  • The evaluated instruments were not designed, and are not used, to identify minor or subsyndromal depression, and do not reflect the diversity of mood disorders seen in PD, including recurrent brief depressive disorder or dysthymia. Thus, further characterization of other types of dPD is required, and cutoffs must be adapted to the purpose of the study and a time frame specified to include more diverse depressive disorders in PD rather than merely using a cutoff for major depression.

  • Insufficient evidence is currently available on score improvements that represent remission of dPD.

  • The ability of scales to measure anxiety, anhedonia, or apathy when they occur outside the context of depression needs to be assessed separately.

  • Further studies are needed on impact of age, cognitive impairment, apathy, and cultural differences on the validity of the depression scales.

  • The minimal clinically important change and the minimal clinically important difference have been evaluated for only a few of the evaluated scales.110

  • In this review, we did not assess multidimensional scales, which assess depression as part of a wider assessment. However, these scales, such as the POMS or the NPI, may be useful in some circumstances and require validation before their use can be recommended.

  • The role of the caregiver in reporting symptoms of depression should be operationalized, particularly on scales such as the CSDD, which assess dPD with comorbid dementia.

  • The use of scales to measure present state of mood (e.g., for the measurement of short-term mood fluctuations), which requires a change of time scales, needs to be validated before it can be recommended.

  • Whilst the assessment of dPD with the reviewed scales has many shortcomings, the task force committee agreed that many of the same problems will be encountered when developing a new depression scale for PD. Therefore, at present, the task force does not recommend the development of new scales, but rather advises to better study existing scales. Development of a special depression scale in PD is only useful and feasible if some basic conceptual issues have been agreed upon. Moreover, the possibility of comparing depressive symptoms in PD with those in other (neuro-)psychiatric disorders may have additional advantages over the development of a range of disease-specific depression scales for a large number of disorders, as the issues raised here are not specific to PD.

Drs. Schrag, Barone, Brown, Leentjens, McDonald, Starkstein, and Weintraub are members of the Depression Scale Task Force. Drs. Poewe, Rascol, Sampaio, and Stebbins are members of the Rating Scales Task Force Steering Committee. Dr. Schrag is the chairperson of the Depression Task Force. Dr. Goetz is the chairperson of the Rating Scales Task Force. Critiques of scales are presented in more detail in the Appendix that is available on the Movement Disorders Journal website at http://www.interscience.wiley.com/jpages/0885-3185/suppmat.
