The opinions of this publication are those of the grantee and do not necessarily reflect those of the Department of Education or the Department of Veterans' Affairs.
The Center for Epidemiologic Studies Depression Scale (CES-D) is an instrument commonly used to assess depressive symptoms. Although the psychometric properties of the instrument are well established, the instrument's ability to identify confirmed cases of major depression has been unclear. The purpose of this study was to evaluate the ability of cutoff scores from both a full scale and a modified CES-D to detect major depression in people with rheumatoid arthritis (RA).
Data were analyzed from 457 persons with RA, including 91 who met criteria for major depression.
Results indicated that, in general, a full scale cutoff score of 19 was the most efficient in identifying cases of major depression; the cutoff score of 19 outperformed a variety of other cutoff scores from the modified scale. Even the most efficient cutoff scores, however, demonstrated problems in accurately identifying people with depression.
The CES-D, while potentially useful as a screening tool, should not be used to identify cases of major depression.
Depression has been identified as a problem for persons living with rheumatoid arthritis (RA). Although estimates of prevalence rates vary, studies indicate that approximately one-quarter of persons with RA experience major depressive disorder (MDD) (1, 2). Estimating the prevalence of MDD in persons with RA can be complicated because many of the symptoms of MDD (e.g., fatigue, insomnia) also are associated with RA (3). However, research clearly indicates that depression in RA is related to a host of negative consequences, including greater pain, more fatigue, and reduced quality of life (4–6).
A self report measure often used clinically to assess for the presence of depression is the Center for Epidemiologic Studies Depression Scale (CES-D) (7). Research supports the reliability and validity of the CES-D on samples of persons with RA (8, 9). However, the accuracy of the CES-D for assessing cases of MDD is unclear. The most commonly used CES-D cutoff score for identification of potential depression is 16 (7, 10, 11). Researchers, though, have proposed cutoff scores ranging from 19 in the chronic pain population (12) to 27 in a combined primary care and psychiatric population (11). Yet, the accuracy of these cutoff scores in identifying confirmed cases of MDD is unknown.
Research studies indicate that using CES-D cutoff scores generally results in adequate sensitivity (the degree to which actual depressed persons are identified based on the score), specificity (the degree to which actual nondepressed persons are identified based on the score), and negative predictive value (NPV; the proportion of individuals who are not depressed when the score identifies them as not depressed) (11, 13–15). In contrast, research indicates that CES-D cutoff scores often result in poor positive predictive value (PPV), which is defined as the proportion of individuals who are depressed when the score identifies them as depressed (11, 13–15). Thus, the commonly used CES-D cutoff scores often result in a high percentage of false positives, with numerous individuals being categorized as depressed who, in fact, do not meet the criteria for MDD. A high number of false positives may be especially problematic in populations experiencing RA because symptoms that are attributable to the condition may overlap the symptoms of MDD (3, 16).
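The dependence of predictive values on the proportion of depressed persons in a sample can be made explicit with the standard identities below. This is a worked illustration rather than a formula taken from the studies cited: Se denotes sensitivity, Sp denotes specificity, p denotes the prevalence of MDD in the sample, and the numeric values are hypothetical.

```latex
\[
\mathrm{PPV} = \frac{\mathrm{Se}\, p}{\mathrm{Se}\, p + (1-\mathrm{Sp})(1-p)},
\qquad
\mathrm{NPV} = \frac{\mathrm{Sp}\,(1-p)}{\mathrm{Sp}\,(1-p) + (1-\mathrm{Se})\, p}
\]
% Hypothetical values: Se = 0.90, Sp = 0.70, p = 0.25
\[
\mathrm{PPV} = \frac{0.90 \times 0.25}{0.90 \times 0.25 + 0.30 \times 0.75}
             = \frac{0.225}{0.450} = 0.50,
\qquad
\mathrm{NPV} = \frac{0.70 \times 0.75}{0.70 \times 0.75 + 0.10 \times 0.25}
             = \frac{0.525}{0.550} \approx 0.95
\]
```

With these hypothetical values, a cutoff that detects 90% of depressed persons would still be wrong for half of the persons it flags, which is the pattern of adequate sensitivity and poor PPV described above.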
In an effort to overcome some of the aforementioned problems in use of the CES-D to assess for MDD, Santor and Coyne (15) developed a shortened, 9-item version of the CES-D by comparing responses on each CES-D item between a group of primary care patients diagnosed with MDD and a group without MDD. Using this revised scale, the researchers tested a variety of modified cutoff scores and determined that cutoff scores from the shortened version provided an overall improvement in efficiency (combined sensitivity, specificity, PPV, and NPV) over full scale cutoff scores of 16 and 27. To our knowledge, the Santor and Coyne scale has not been tested with a sample of persons with RA.
The purpose of this study was to compare the efficiency (assessed via sensitivity, specificity, PPV, and NPV) of diagnosing confirmed cases of MDD in RA using several different CES-D cutoff scores (i.e., full scale cutoff scores of 16 and 19 and modified cutoff scores ranging from 3 to 8 on the shortened CES-D developed by Santor and Coyne [15]). The hypothesis was that a cutoff score from the modified CES-D would provide greater overall efficiency than the full scale CES-D cutoff scores of 16 and 19 in an RA population.
SUBJECTS AND METHODS
This project consisted of secondary analyses of data obtained from 2 previous studies of persons with RA who were evaluated for MDD after reporting depressive symptoms (CES-D ≥ 11) during a phone screening for depression. The first previous research study (17) involved a longitudinal research design with baseline (preintervention) data being included in the analyses reported herein; the second previous research study (18) involved a cross-sectional research design. Subjects from the first study (17) included 75 individuals out of an initial pool of 638, with individuals being excluded from the original study for either displaying low depressive symptomatology or not consenting to a diagnostic interview. Subjects from the second study (18) included 137 individuals out of an initial pool of 515, with individuals being excluded for the same reasons. A reliability check indicated that differences existed between the 2 subject pools in terms of efficiency results, especially for specificity, PPV, and NPV. However, these differences may be explained by the fact that the former subject pool (17) had a very low number of individuals who did not meet criteria for MDD (only 28%). Prior studies assessing efficiency and depression have typically utilized samples where a majority of subjects did not meet diagnostic criteria for depression (13, 15). Thus for this series of studies, the subject pools were combined. Persons with confirmed MDD were aggregated and randomly included in either the exploratory (study 1) or the confirmatory (study 2) phase of the present study. The purpose of study 1 was to test cutoff scores from both the full scale and modified CES-D (15); the purpose of study 2 was to replicate within a sample of persons with RA the results of study 1. A majority of subjects (75%) were assigned to study 1 because it was decided that it would be beneficial to have more subjects involved in testing the cutoff scores prior to replication. 
Persons without MDD were aggregated and divided in the same manner. Following these initial analyses, supplemental analyses that included an additional nondepressed group were conducted. The methodologic strategy used in this study is presented in Figure 1.
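The assignment procedure described above (random division of depressed and nondepressed subjects separately, with roughly 75% going to the exploratory phase) could be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the function name `stratified_split`, the seed, the rounding rule, and the subject identifiers are all assumptions, so the resulting group sizes need not match the 160 and 52 reported here.

```python
import random

def stratified_split(subjects, exploratory_fraction=0.75, seed=0):
    """Split subjects into exploratory and confirmatory groups.

    `subjects` is a list of (subject_id, has_mdd) pairs. Depressed and
    nondepressed subjects are shuffled and divided separately, so both
    groups keep a similar proportion of MDD cases.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    exploratory, confirmatory = [], []
    for status in (True, False):
        stratum = [s for s in subjects if s[1] == status]
        rng.shuffle(stratum)
        # Assumption: the stratum size is rounded to the nearest whole subject.
        n_exploratory = round(len(stratum) * exploratory_fraction)
        exploratory.extend(stratum[:n_exploratory])
        confirmatory.extend(stratum[n_exploratory:])
    return exploratory, confirmatory

# Hypothetical pool: 212 subjects, the first 91 flagged as meeting MDD criteria.
pool = [(subject_id, subject_id < 91) for subject_id in range(212)]
study1, study2 = stratified_split(pool)
print(len(study1), len(study2))
```

Splitting within each diagnostic group, rather than across the whole pool, keeps the proportion of MDD cases similar in the two phases, which matters because PPV and NPV depend on that proportion.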
SUBJECTS AND METHODS, STUDY 1
Subjects, study 1.
Subjects were 160 persons with RA confirmed by a board certified rheumatologist (SEW) using the 1987 American College of Rheumatology (formerly American Rheumatism Association) criteria (19). Sixty-four percent of the subjects were female (n = 102). The mean ± SD age was 52.8 ± 21.5 years, the mean educational level was 12.1 ± 2.2 years, and the mean disease duration was 13.8 ± 10.9 years. The breakdown of the sample by functional class was as follows: class I: 5% (n = 7); class II: 34% (n = 55); class III: 60% (n = 96), and class IV: 1% (n = 2).
Measures, study 1.
The CES-D (7) is a 20-item self report measure that assesses depressive symptoms. Each item is assessed on a 4-point scale and addresses the frequency that each symptom is experienced (0 = none of the time, 3 = all of the time). Satisfactory reliability and validity have been established for the scale (20). A cutoff score of 16 is commonly used on the CES-D to indicate a need for further assessment of depression (7, 10, 11), although others (12) have recommended using 19 as the cutoff score for depression when evaluating patients experiencing pain.
Cutoff scores of 16 and 19 were evaluated, as well as cutoff scores ranging from 3 to 8 from the modified CES-D as recommended by Santor and Coyne (see Table 1 for the shortened version of the instrument) (15). Items on the modified Santor and Coyne CES-D are scored dichotomously as follows: A score of 0 or 1 (experiencing the symptom 2 days or less in the past week) is assigned a 0 on the modified CES-D; a score of 2 or 3 (experiencing the symptom 3 or more days in the past week) is given a score of 1. Santor and Coyne (15) reported that a cutoff score of 4 yielded greater efficiency than the full scale cutoff scores of 16 or 27.
Table 1. Shortened version of the CES-D*
I was bothered by things that usually don't bother me.
I felt that I could not shake off the blues even with the help from my family or friends.
I had trouble keeping my mind on what I was doing.
I felt depressed.
I felt that everything I did was an effort.
My sleep was restless.
I was happy. (reverse scored)
I enjoyed life. (reverse scored)
I felt sad.
* Shortened scale was developed by Santor and Coyne (15). CES-D = Center for Epidemiologic Studies Depression Scale.
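The dichotomous scoring rule for the shortened scale can be sketched in a few lines. This is an illustration under stated assumptions, not code from Santor and Coyne (15): the function names are hypothetical, reverse scoring (3 minus the raw response) is assumed to be applied before dichotomizing, and a total at or above the cutoff is assumed to count as screen-positive.

```python
# 0-based positions of the reverse-scored items in the 9-item list above:
# "I was happy" and "I enjoyed life".
REVERSE_SCORED = {6, 7}

def modified_cesd_score(raw_items):
    """Return the modified CES-D total (0-9) from nine raw responses (each 0-3)."""
    if len(raw_items) != 9:
        raise ValueError("expected 9 item responses")
    total = 0
    for position, raw in enumerate(raw_items):
        if raw not in (0, 1, 2, 3):
            raise ValueError("item responses must be 0, 1, 2, or 3")
        # Assumption: reverse-scored items are flipped before dichotomizing.
        value = 3 - raw if position in REVERSE_SCORED else raw
        # A response of 0 or 1 counts as 0; a response of 2 or 3 counts as 1.
        total += 1 if value >= 2 else 0
    return total

def screens_positive(raw_items, cutoff=4):
    """Assumption: a total at or above the cutoff is treated as screen-positive."""
    return modified_cesd_score(raw_items) >= cutoff

# Hypothetical responses, in the order of the item list above.
responses = [2, 1, 3, 2, 0, 3, 1, 3, 0]
print(modified_cesd_score(responses))  # 5
```

For these hypothetical responses the total is 5, so the subject would screen positive at a modified cutoff of 4 but not at a cutoff of 6.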
To assess Diagnostic and Statistical Manual of Mental Disorders, Third Edition, Revised (DSM-III-R) criteria for MDD, a portion of the subjects (n = 57) were administered the Structured Clinical Interview for DSM-III-R (SCID) (21). The SCID is a structured tool designed to assess a number of psychiatric disorders, including MDD. For the present study, the focus was only on the presence or absence of current MDD.
A second group of subjects (n = 103) was assessed for MDD via the Primary Care Evaluation for Mental Disorders (PRIME-MD) (22). The PRIME-MD is a brief psychiatric diagnostic scale designed for use by primary care physicians to diagnose mental disorders, including MDD. The PRIME-MD has been validated via standard diagnostic interviews (23).
Procedures, study 1.
The pool of 160 subjects was gathered via screening procedures with the CES-D for depressive symptoms. Individuals with CES-D scores ≥11 were invited to participate in an evaluation for MDD and were interviewed via the SCID or the PRIME-MD. The recruitment details of these subjects have been described in detail elsewhere and will not be repeated here (17, 18). Subjects who met the criteria for MDD were categorized into the depressed group, and those not meeting the criteria were categorized into the nondepressed group. Subjects meeting the criteria for MDD were offered appropriate treatment.
Sensitivity, specificity, PPV, and NPV were calculated for full scale CES-D cutoff scores of 16 and 19, and modified cutoff scores ranging from 3 to 8. Sensitivity was calculated by dividing the number of subjects diagnosed with MDD who had CES-D scores above the cutoff by the total number of subjects diagnosed with MDD. Specificity was calculated by dividing the number of subjects without an MDD diagnosis who had CES-D scores below the cutoff by the total number of subjects without an MDD diagnosis. PPV was calculated by dividing the number of subjects with CES-D scores above the cutoff who had MDD by the total number of subjects with CES-D scores above the cutoff. NPV was calculated by dividing the number of subjects with CES-D scores below the cutoff who had no MDD diagnosis by the total number of subjects with CES-D scores below the cutoff. See Figure 2 for the computational formulas of these efficiency values for this particular study.
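The four calculations described above can be expressed compactly as follows. This is a minimal sketch with hypothetical data, not the analysis code used in the study. One labeled assumption: scores at or above the cutoff are treated as screen-positive, which is the usual convention for CES-D cutoffs, whereas the text above says "above the cutoff."

```python
def efficiency(scores, has_mdd, cutoff):
    """Return sensitivity, specificity, PPV, and NPV for one cutoff score."""
    tp = fp = tn = fn = 0
    for score, depressed in zip(scores, has_mdd):
        positive = score >= cutoff  # assumption: at or above the cutoff
        if positive and depressed:
            tp += 1   # MDD, flagged by the score
        elif positive:
            fp += 1   # no MDD, flagged by the score
        elif depressed:
            fn += 1   # MDD, missed by the score
        else:
            tn += 1   # no MDD, not flagged

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Hypothetical CES-D scores (all >= 11) and diagnostic interview results.
scores = [25, 18, 30, 12, 17, 22, 14, 11]
has_mdd = [True, True, True, False, False, False, False, True]

at_16 = efficiency(scores, has_mdd, cutoff=16)
at_19 = efficiency(scores, has_mdd, cutoff=19)
print(at_16)
print(at_19)
```

In this hypothetical sample, raising the cutoff from 16 to 19 lowers sensitivity (0.75 to 0.50) and raises specificity (0.50 to 0.75), which is the kind of tradeoff the comparisons across cutoff scores are meant to expose.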
These values were compared across cutoff scores, and a decision was made regarding the most efficient cutoff score for both the full scale and the modified CES-D. There are no generally agreed upon rules for what represents “good” sensitivity, specificity, PPV, or NPV, so any evaluation of these values is somewhat subjective. Furthermore, there are no generally agreed upon guidelines regarding appropriate sample size for efficiency analyses. However, the sample sizes for the exploratory analyses in this article (n = 160 in study 1 and n = 344 in the supplemental analysis), to which a majority of the subjects were assigned, were comparable to those of other studies that have examined the efficiency of a scale (n = 213–425) (11, 13–15).
RESULTS, STUDY 1
Results of the efficiency analyses can be seen in Table 2. There are several noteworthy aspects of these results. First, possibly because all of these subjects had CES-D scores ≥11, specificity was lower for many of the cutoff scores than would be expected with a full range of CES-D scores. Specificity values ranged from 0.40 (modified CES-D, cutoff of 3) to 0.97 (modified CES-D, cutoff of 8). Second, for the full scale cutoff scores of 16 and 19, a score of 19 seems to be more accurate in identifying depression cases. Specificity was 23 percentage points higher and PPV was 9 percentage points higher for a score of 19 compared with 16, whereas sensitivity was only 10 percentage points lower and NPV 5 percentage points lower. Third, none of the modified cutoff scores was as efficient as the full scale cutoff score of 19 in identifying MDD cases. The most efficient modified cutoff score identified by Santor and Coyne (15) was clearly not as efficient as the full scale cutoff score of 19, as all 4 measures of efficiency were lower for the modified cutoff score of 4 versus the full scale cutoff score of 19. Of the modified cutoff scores, a value of 6 seems to be the most efficient in identifying MDD cases. Compared with the full scale cutoff score of 19, its sensitivity and NPV are clearly lower (29 and 12 percentage points), but its specificity and PPV are clearly higher (22 and 12 percentage points). Taken together, it was concluded that a full scale cutoff score of 19 and a modified cutoff score of 6 provided the most efficient results in terms of classifying cases of MDD in this sample.
There are several conclusions to be drawn from this study. First, results indicated that a full scale cutoff of 19 was more efficient in identifying cases of MDD than a score of 16. Second, even though a score of 19 was more efficient than a score of 16, using a score of 19 was problematic, especially in terms of specificity and PPV. Third, sensitivity was 10 percentage points lower for a cutoff score of 19 than 16. Thus, using a cutoff score of 19 caused more patients diagnosed with MDD to be identified as nondepressed, which is undesirable from a clinical perspective. Fourth, the modified CES-D was less efficient than the full scale CES-D in identifying cases of MDD. Santor and Coyne (15) identified a modified cutoff score of 4 as most efficient for this purpose. Yet, in the present study, overall efficiency analyses for the modified scale cutoff scores were less effective than a full scale cutoff score of 19. The most efficient cutoff score that emerged from the modified CES-D for this study was 6, although this score did not provide overall benefits over a full scale cutoff score of 19. Thus, it was concluded from this study that a full scale cutoff score of 19 was most efficient in identifying cases of MDD in an RA population, followed by a modified cutoff score of 6.
SUBJECTS AND METHODS, STUDY 2
Subjects, study 2.
Subjects for this study consisted of the 25% of subjects who were not assigned to study 1 and included 52 persons with RA. Fifty-eight percent of the subjects were female (n = 30). The mean age ± SD was 51.2 ± 19.8 years, the mean educational level was 12.5 ± 3.3 years, and the mean disease duration was 13.6 ± 10.1 years. The breakdown of the sample by functional class was as follows: class I: 6% (n = 3); class II: 46% (n = 24); class III: 48% (n = 25); and class IV: 0% (n = 0). Results indicated no significant differences on any demographic variables between the subjects in study 1 and the subjects in study 2 (P > 0.05).
Measures, study 2.
The measures for study 2 were the same as for study 1.
Procedures, study 2.
Subjects were recruited from the same pool as study 1, so recruitment methods will not be repeated. Once again, sensitivity, specificity, PPV, and NPV were calculated for this sample, but only the most efficient cutoff scores from study 1 were utilized. Thus, the “best” full scale and “best” modified scale cutoff scores were selected from study 1, which were a full scale cutoff of 19 and a modified cutoff of 6. This study, then, served to confirm the efficiency of these optimal cutoff scores with a separate group.
RESULTS, STUDY 2
The efficiency analyses can be found in Table 3. Results indicate that the full scale cutoff score of 19 was superior to the modified cutoff score of 6. The full scale score of 19 yielded superior sensitivity, PPV, and NPV, and its specificity was only 4 percentage points lower than the modified cutoff score of 6. Also noteworthy is that although the results for the modified cutoff score of 6 are similar between the 2 studies, in study 2 the full scale cutoff score of 19 yielded more efficient results than in study 1. Sensitivity, specificity, PPV, and NPV were all higher for the full scale cutoff score of 19 for study 2 than study 1.
The purpose of study 2 was to provide a confirmatory analysis of the results of study 1. Results of the analyses generally support the conclusions generated by study 1. A full scale cutoff score of 19 was clearly more efficient than a modified cutoff score of 6, indicating that the modified scale proposed by Santor and Coyne (15) was not effective within the RA samples. Furthermore, even though results were stronger for the full scale cutoff of 19 in study 2 as compared with study 1, individuals making a judgment regarding MDD using the cutoff score would still be wrong a considerable amount of the time. For example, a sensitivity value of 0.86 indicates that 14% of those individuals with MDD were misclassified. Thus, while the results of these analyses provide further support for using a full scale cutoff score of 19 versus a modified CES-D cutoff score to determine cases of MDD, the overall effectiveness of even the most efficient cutoff score appears marginal.
SUBJECTS AND METHODS, SUPPLEMENTAL ANALYSES
A potential limitation of the aforementioned 2 studies is that a full range of CES-D scores (0–60) was not tested. Specifically, all subjects had somewhat elevated CES-D scores because scoring greater than 10 on the CES-D was a criterion for further evaluation via the SCID or the PRIME-MD. Thus, to test the various cutoff scores with a wider range of CES-D scores, a group of subjects who had been assessed via the CES-D at several time periods but never reported CES-D scores greater than 10 was added. Although these subjects were never assessed for MDD via DSM-III-R criteria, the assumption was made that they did not meet such criteria due to their consistently low CES-D scores (<11). These subjects were included from the initial pool of 638 individuals from the study described earlier (17). The mean ± SD CES-D score for subjects diagnosed with MDD in studies 1 and 2 was 30.1 ± 11.0, and the data were normally distributed. Thus, one would expect that only 2% of depressed individuals would have CES-D scores <8. Given that the mean CES-D score for this additional group was 3.4 ± 3.0 and was positively skewed (toward 0), one can be reasonably sure that most, if not all, of these additional subjects would not meet criteria for MDD. For the purposes of this study, the latest in the series of CES-D scores for each subject was utilized.
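The 2% figure follows from the normal distribution with the reported mean and SD. The short sketch below reproduces the arithmetic; it treats CES-D scores as continuous and exactly normal, which is an approximation, and the helper name `normal_cdf` is introduced only for illustration.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# Reported CES-D mean and SD for subjects diagnosed with MDD in studies 1 and 2.
mean_mdd, sd_mdd = 30.1, 11.0

z = (8 - mean_mdd) / sd_mdd            # about -2.01 SD below the mean
p_below_8 = normal_cdf(8, mean_mdd, sd_mdd)
print(round(z, 2), round(p_below_8, 3))
```

A score of 8 lies about 2 SD below the mean of the depressed group, so roughly 2% of a normal distribution with these parameters falls below it.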
Subjects, supplemental analyses.
Subjects for the supplemental analyses included all of the subjects from the first 2 studies, but with the addition of a group of subjects who consistently reported low CES-D scores (<11) over time (n = 245). Demographic characteristics of these additional subjects were as follows: 52% of the subjects were female (n = 128). The mean age ± SD was 53.4 ± 24.4 years, the mean educational level was 12.4 ± 3.0 years, and the mean disease duration was 12.8 ± 10.6 years. The breakdown of the sample by functional class was as follows: class I: 25% (n = 60); class II: 41% (n = 101); class III: 33% (n = 82); and class IV: 1% (n = 2).
Procedures, supplemental analyses.
The procedures for study 1 (exploratory phase) and study 2 (confirmatory phase) were replicated with the addition of the larger pool of nondepressed subjects. These subjects were randomly assigned to either the nondepressed pool of study 1 (n = 184) or study 2 (n = 61). A range of full scale and modified cutoff scores was again assessed in the exploratory phase, and only the “best” scores from this phase were tested in the confirmatory phase.
RESULTS, SUPPLEMENTAL ANALYSES
Efficiency analyses for the exploratory group (study 1) with the additional subjects can be seen in Table 4. There were several important findings in this set of analyses. First, for almost all cutoff scores, efficiency values generally improved, which is to be expected with a wider range of CES-D values within the sample. Second, the overall efficiency of the full scale CES-D cutoff scores of 16 and 19 was almost the same. Although a score of 16 yielded stronger sensitivity and NPV, a score of 19 yielded stronger specificity and PPV. Finally, once again, scores from the shortened CES-D were overall less efficient in identifying cases of MDD than scores from the entire scale. A modified scale cutoff score of 3 was within 10 percentage points of either of the full scale CES-D cutoff scores in terms of sensitivity, specificity, and NPV, but was considerably lower than the full scale cutoff score of 19 in terms of PPV. A modified cutoff score of 6 yielded higher specificity and PPV than either full scale cutoff score, but sensitivity and NPV were lower, with sensitivity considerably lower. Thus, once again it was concluded that cutoff scores from the full scale CES-D were more efficient than cutoff scores from the modified CES-D in identifying cases of MDD.
Table 4. Efficiency analyses for the study 1 group with additional nondepressed subjects*
The purpose of the final set of analyses was to test the “best” cutoff scores with a sample that included a wider range of CES-D scores. Even though the overall efficiency of the full scale cutoff scores of 16 and 19 was similar, in the interest of maintaining consistency from the earlier analyses, a full scale cutoff of 19 was tested. For the modified scale, however, cutoff scores of 3 and 6 were chosen. Results of the confirmatory analysis again indicated that the full scale CES-D cutoff score of 19 provided greater overall efficiency than either modified cutoff score (see Table 5).
Table 5. Efficiency analyses for study 2 group with additional nondepressed subjects*
A limitation from the first 2 studies was that the subjects did not represent a full range of CES-D scores because scoring 11 or higher on the instrument was a requirement for entry into the original studies from which the subjects were drawn. Thus, the efficiency of both the full scale and the modified CES-D with an expanded sample of subjects was assessed, including individuals with consistently low CES-D scores in the nondepressed groups. Overall, efficiency results were stronger compared with the first 2 studies, as would be expected when including a wider range of CES-D scores. Results from these analyses are consistent with study 1 and study 2, with the full scale CES-D cutoff scores performing better in terms of identifying cases of MDD than the modified cutoff scores. Thus, results from this set of analyses do not support the use of the modified CES-D proposed by Santor and Coyne (15) to identify cases of MDD in persons with RA.
The main purpose of this series of studies was to evaluate the efficiency of various CES-D cutoff scores for detecting cases of MDD in persons with RA. The main hypothesis was that a cutoff score from a modified version of the CES-D (15) would provide greater overall efficiency than commonly used full scale cutoff scores of 16 and 19. However, in every set of analyses conducted, the full scale cutoff score of 19 provided greater overall efficiency than any modified scale cutoff score. Thus, it was concluded that the modified CES-D yielded less efficient results in terms of classifying cases of MDD than the full scale CES-D.
There are at least 2 potential explanations as to why the results of this study did not replicate the results of Santor and Coyne (15). First, the modified CES-D was developed from a general primary care population. Given the unique features of RA, items included on this shortened scale may not discriminate well between individuals with and without MDD who also have RA. Prior research indicates that 2 of the CES-D items included on the shortened version of the scale (“I felt that everything I did was an effort” and “My sleep was restless”) may be influenced by aspects of RA and thus not necessarily due to depression (3, 16). These items were included in the present study because the purpose of the project was to assess the efficiency of the overall shortened scale developed by Santor and Coyne. Future researchers studying the shortened scale within the RA population may consider removing these 2 items. A second explanation as to why the modified scale did not perform as well has to do with the way it is scored. According to procedures outlined in Santor and Coyne (15), the modified scale is to be scored dichotomously. Scoring the scale in such a manner, which reduces each CES-D item from a 4-point to a 2-point scale, could cause the instrument to lose accuracy.
Although results from this study indicate that a full scale cutoff score of 19 on the CES-D is a more accurate means of identifying cases of MDD than scores from the modified CES-D, researchers and clinicians are urged to be cautious in making clinical decisions on the basis of this instrument alone. Results from this study indicate that with a sample of persons with RA who had CES-D scores ≥11, even the “best” cutoff score would be wrong a considerable amount of the time. Furthermore, even though the cutoff score of 19 generally yielded the most efficient overall results, if a clinician were most concerned about avoiding misclassifying patients with MDD, a cutoff score of 19 might not be appropriate. Indeed, for both analyses where a cutoff score of 16 was compared directly with a cutoff score of 19, the cutoff of 16 yielded sensitivity values approximately 10 percentage points higher than the cutoff of 19.
Prior research has indicated that the full scale CES-D is sensitive in identifying psychological distress, but may not be sensitive in identifying depression versus other types of psychological problems (e.g., bereavement, anxiety) (13, 14). Results from this study indirectly support the notion that the CES-D may be more effective in identifying general psychological distress rather than depression specifically. In the present study, both the full scale and modified CES-D had problems in terms of accurately classifying MDD cases. Even in the study where the shortened CES-D showed an improvement over the full scale CES-D, there were still inaccuracies in identifying MDD specifically (15). Taken together, these results suggest that both the full scale and shortened CES-D have flaws in accurately classifying cases of MDD within both the primary care and RA populations. Both the full scale and the shortened CES-D may accurately classify general psychological distress within persons with RA, but do not seem to be efficient enough to distinguish MDD itself.
There are limitations to this study. First, because this article involved secondary analyses of data, subjects were assessed for the presence or absence of MDD via 2 different measures. Assessing all subjects with the same instrument would be more desirable. Second, in the supplemental analyses, the additional nondepressed subjects were categorized on the basis of consistently low CES-D scores. Thus, not all subjects were assessed specifically for a DSM-III-R diagnosis, and subjects were categorized based on low scores from the instrument that was being tested. Defining a certain group of subjects in this manner was acceptable because standards were being compared within the same instrument, although this procedure may have introduced some bias to the study. Third, the sample was primarily white and from a rural Midwestern population, which raises questions regarding the generalizability of the results.
In conclusion, results from this series of studies indicate that full scale CES-D cutoff scores are more efficient than scores derived from the modified scale developed by Santor and Coyne (15) in identifying cases of MDD in persons with RA. Despite the results supporting the superiority of full scale CES-D cutoff scores, clinicians should be cautious in using the CES-D to assess for actual cases of MDD. Clinicians and researchers are encouraged to think of the CES-D as a potentially useful screening tool, but not as a means of accurately identifying cases of MDD.