Aims: The Center for Epidemiologic Studies Depression Scale (CES-D) has been validated to avoid misdiagnoses of major depression in routine psychiatric outpatient settings, but it was reported to be only marginally feasible in these settings. A briefer and simpler version, known as the 10-item CES-D, meant to attain adequate feasibility, has been validated in geriatric outpatient settings, but it has not yet been examined in psychiatric outpatient settings. The purpose of the present study was therefore to compare the feasibility, reliability, and validity of the two types of CES-D in psychiatric outpatients.
Methods: A cross-sectional analysis was conducted of 86 consecutive outpatients in a psychiatric department in a general hospital.
Results: The 10-item CES-D has a higher feasibility than the 20-item CES-D, and its internal consistency, reliability, and validity are almost identical to those of the 20-item CES-D.
Conclusions: The 10-item CES-D is the better instrument to use in psychiatric outpatient settings because its feasibility is higher than that of the 20-item CES-D. The different answer format used in each questionnaire (a yes or no format in the former vs a multiple-choice format in the latter), rather than the number of items, may influence the feasibility.
ACCUMULATING EVIDENCE SUGGESTS that major depression, in particular major depression comorbid with dementia,1 is underrecognized in routine psychiatric practice.2–4 To avoid such underrecognition and resulting under-treatment, many screening instruments have been developed to detect the presence of major depression. Few of these instruments, however, have been specifically validated for use in routine psychiatric outpatient settings.2,5,6
Among these screening instruments, Schulberg et al. and Furukawa et al. examined the test characteristics of the Center for Epidemiologic Studies Depression Scale7 (CES-D or the 20-item CES-D) in psychiatric outpatients using semistructured interviews for criterion-standard diagnoses.2,5 Despite the demonstrated utility of the CES-D in these studies, the high CES-D incompletion rate of approximately 20–25% suggests that this tool presents problems for psychiatric patients; specifically, the CES-D utilizes a forced four-choice scale format that patients may find difficult to complete. To reduce such respondent burden and to attain an adequate response rate, a briefer and simpler version of the CES-D, known as the 10-item CES-D, has been proposed.8 The 10-item CES-D has been reported to retain acceptable reliability and validity in geriatric outpatients,8–10 but its reliability and validity have not been investigated in psychiatric outpatient settings.
Furthermore, practice-based screening that administers a questionnaire to all patients regardless of risk status faces significant obstacles to routine use in psychiatric outpatients. The psychiatric population tends to have a broad spectrum of cognitive impairment derived from mental disorders,11 which may affect questionnaire feasibility. In particular, cognitive disorders have been strongly associated with the infeasibility of the CES-D.12 Because the cognitively impaired segment of the population in psychiatric settings grows with the general aging of the population, the routine use of a screening instrument will become increasingly difficult as feasibility declines with cognitive impairment. To our knowledge, previous work has not fully investigated the feasibility of any depression screening instrument in psychiatric outpatient settings. Here, ‘infeasibility’ is defined as the failure to complete more than a predefined threshold number of items in a screening instrument.
The first aim of the present study was therefore to compare the feasibility of the 10-item and 20-item CES-D in a psychiatric outpatient setting. The second aim was to compare the reliability and validity of the two types of CES-D in this setting.
When the time required to complete the CES-D was surveyed, we found that the 20-item CES-D took considerably longer to administer than the 10-item CES-D (mean ± SD: 3.4 ± 2.4 min for the long form and 1.1 ± 1.0 min for the short form). On examination of the internal consistency reliability, alpha coefficients for the 20-item and 10-item CES-D were 0.92 and 0.80, respectively. Receiver operating characteristic (ROC) analysis illustrates the excellent ability of the CES-D to discriminate between depressive and non-depressive subjects. The area under the curve (AUC) was 0.89 (95% confidence interval [CI]: 0.82–0.96) and 0.92 (95% CI: 0.86–0.98) for the 20-item and 10-item CES-D, respectively (Fig. 1). These two AUCs, which were obtained from the 76 subjects who completed both types of CES-D, were not significantly different (P = 0.52). Table 3 lists the sensitivity, specificity, and predictive values for the various cut-offs of the two types of CES-D. In addition, Table 4 lists the stratum-specific likelihood ratios (SSLR) together with the aforementioned operating characteristics.
Figure 1. Receiver operating characteristic curves for the 10-item and 20-item Center for Epidemiologic Studies Depression Scale to screen for depressive episodes.
Table 3. Validity characteristics of the 10-item and 20-item CES-D at different cut-offs
Table 4. Validity characteristics for the 10-item and 20-item CES-D to screen for depressive episodes
| Characteristic | 20-item score range | 20-item CES-D | 10-item score range | 10-item CES-D |
|---|---|---|---|---|
| AUC (95% CI) | | 0.89 (0.82–0.96) | | 0.92 (0.86–0.98) |
| SSLR (95% CI) | 0–20 | 0.11 (0.03–0.37) | 0–3 | 0.09 (0.03–0.30) |
| | 21–36 | 0.73 (0.39–1.35) | 4–6 | 0.79 (0.35–1.75) |
| | 37–60 | 13.59 (3.98–46.35) | 7–10 | 10.29 (3.70–26.63) |
| LR+ (95% CI) | 24–60 | 3.83 (2.24–6.54) | 6–10 | 4.63 (2.51–8.55) |
| LR− (95% CI) | 0–23 | 0.12 (0.04–0.32) | 0–5 | 0.15 (0.06–0.35) |
| Sensitivity | | 0.91 | | 0.88 |
| Specificity | | 0.76 | | 0.81 |
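As a rough check on how the positive and negative likelihood ratios in Table 4 relate to the reported sensitivity and specificity, the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity can be sketched as follows. The function name is hypothetical, and the inputs are the rounded table values, so small discrepancies from the published ratios (for example, 3.79 vs 3.83 for the 20-item LR+) are to be expected.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test result.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Rounded sensitivity and specificity taken from Table 4.
lr_pos_10, lr_neg_10 = likelihood_ratios(0.88, 0.81)  # 10-item CES-D
lr_pos_20, lr_neg_20 = likelihood_ratios(0.91, 0.76)  # 20-item CES-D

print(f"10-item: LR+ = {lr_pos_10:.2f}, LR- = {lr_neg_10:.2f}")
print(f"20-item: LR+ = {lr_pos_20:.2f}, LR- = {lr_neg_20:.2f}")
```

With these rounded inputs the 10-item values reproduce the tabulated 4.63 and 0.15, and the 20-item LR− reproduces 0.12.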
Finally, on the 10-item CES-D, three patients left all 10 items uncompleted and three patients left one item uncompleted, whereas on the 20-item CES-D, nine patients left all 20 items uncompleted, one patient left six items uncompleted, and one patient left one item uncompleted (Fig. 2). The number of subjects who failed to complete more than the predefined threshold number of items was significantly lower for the 10-item CES-D than for the 20-item CES-D (3/86 vs 10/86; McNemar's χ2 = 5.14, d.f. = 1, P = 0.02). The diagnoses assigned to the subjects who failed to complete the 20-item CES-D consisted of six cases of dementia, two of mental retardation, and two of major depression; for the 10-item CES-D, the diagnoses were three cases of dementia (Table 1).
Figure 2. Number of non-completed items on the (a) 10-item and (b) 20-item Center for Epidemiologic Studies Depression Scale.
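The McNemar statistic reported above can be illustrated with a minimal sketch of the continuity-corrected test for paired proportions. The text gives only the marginal counts (3/86 and 10/86); the discordant-pair counts used below (seven patients failing only the 20-item form, none failing only the 10-item form) are an assumption, chosen because they are consistent with those margins if the three patients who failed the 10-item form also failed the 20-item form. The function name is hypothetical.

```python
import math


def mcnemar_corrected(b, c):
    """Continuity-corrected McNemar test for paired binary outcomes.

    b, c: counts of the two kinds of discordant pairs.
    Returns (chi-square statistic with 1 d.f., two-sided P value).
    """
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 d.f.:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Assumed discordant pairs (not stated in the text):
# 7 failed only the 20-item form, 0 failed only the 10-item form.
chi2, p_value = mcnemar_corrected(7, 0)
print(f"chi2 = {chi2:.2f}, P = {p_value:.2f}")
```

Under this assumption the sketch gives χ2 = 5.14 and P = 0.02, matching the reported values.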
The major findings of the present study are the following: (i) the feasibility of the 10-item CES-D is significantly and substantially higher than that of the 20-item CES-D; (ii) the internal consistency reliability and validity of the 10-item CES-D were almost identical to those of the 20-item CES-D, which supports its use as a screening instrument for major depression in psychiatric outpatient settings; and (iii) the 10-item CES-D can be administered in approximately 30% of the time necessary for the 20-item CES-D. To the best of our knowledge, this study is the first to evaluate the feasibility of any depression-screening instrument in psychiatric outpatient settings. The second finding is in agreement with results reported in older primary care patients, who tend to have as broad a spectrum of cognitive impairments as the psychiatric population.9,10 With regard to administration time, the third finding is also in agreement with previous reports.8
Unfortunately, the psychiatric population tends to have a broad spectrum of cognitive impairments derived from their mental disorders.11 Furthermore, major depression is a common (30–50%) complication of dementia.23 Significant limitations thus hinder the routine administration of a questionnaire to all patients regardless of risk status in practice-based screening. These limitations arise primarily from patient cognitive impairment, which has been reported to reduce questionnaire acceptability and feasibility.24 To cope with this problem, it is desirable to use questionnaires that are as feasible and acceptable as possible. The present results show that almost all of the subjects who left items uncompleted on the 20-item CES-D were unable to answer any of the items, whereas half of the subjects who left items uncompleted on the 10-item CES-D missed only one item (Fig. 2). This suggests that the feasibility of each questionnaire may be influenced less by the number of items than by the answer format: the 20-item CES-D uses a multiple-choice format, whereas the 10-item CES-D uses a yes or no format. Therefore, in terms of feasibility, a questionnaire with a yes or no format (e.g. the 10-item CES-D) may be more suitable for psychiatric outpatient settings than one with a multiple-choice format (e.g. the 20-item CES-D).
From a clinical perspective, the purpose of screening is to improve diagnostic recognition. This requires high sensitivity and a correspondingly small false-negative rate so that the clinician can be confident that a negative test result indicates little need to inquire about the target disorder's symptoms. In contrast, false positives are less of a problem for a screening instrument because their major cost is the time a clinician takes to determine that the disorder is not in fact present. Presumably, this is time the clinician would have spent for the same purpose anyway.6 This perspective is based on the situation in which sensitivity and specificity are used to gauge test performance. If the SSLR is instead used to gauge test performance, it is not necessary to tolerate the cost of a high false-positive rate.
Using the data in strata rather than a series of cut-offs for positive versus negative findings is a more efficient use of the information included in a test. First, a patient's pre-test probability of disease is estimated from experience, local data, or published literature. Next, the pre-test probability can be converted to the post-test probability using the formula:
post-test odds = pre-test odds × likelihood ratio
Note that these are odds, not probabilities. The conversions are simple but not intuitively obvious: odds = probability / (1 − probability) and probability = odds / (1 + odds).19 For example, consider a patient with a pre-test probability of 30% for a major depressive episode. Those patients with a 10-item CES-D score >7 have a post-test probability of 83% for such an episode, whereas those with a score <3 have a post-test probability of 4.9%. Thus, we can make our recognition sensitive and specific at the same time by using an SSLR based on a given test score.
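The probability-to-odds conversion described above can be sketched in a few lines. The function name is hypothetical, and the likelihood ratios are the rounded 10-item SSLR point estimates from Table 4 for the highest (7–10) and lowest (0–3) strata, which is an assumption about which inputs the worked example relies on.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability.

    Steps: probability -> odds, multiply by the likelihood ratio,
    then odds -> probability.
    """
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Pre-test probability of 30%, rounded 10-item SSLRs from Table 4.
high_stratum = post_test_probability(0.30, 10.29)  # score stratum 7-10
low_stratum = post_test_probability(0.30, 0.09)    # score stratum 0-3

print(f"highest stratum: {high_stratum:.1%}")
print(f"lowest stratum:  {low_stratum:.1%}")
```

With these rounded inputs the sketch gives roughly 82% and 4%, close to but not identical to the percentages quoted in the text, which may rest on unrounded or slightly different likelihood ratios.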
The first limitation of the present study is the reliability of the cognitive disorder diagnoses, which, unlike the diagnoses of other disorders, were not based on the MINI. It must be noted that there was no confirmation of their reliability. The second limiting issue is the relatively small size of the study sample, which did not permit the examination of variables potentially causing CES-D infeasibility. Third, because we did not check whether or not each subject had difficulty in completing the CES-D, the extent to which external help in the completion of the instrument can affect its feasibility is not clear. Each type of CES-D, however, was administered in a consistent manner, and thus the comparison of the two types of CES-D should be valid at least in the present study. The fourth issue is the histogram comparison of the uncompleted items between the two types of CES-D (Fig. 2), based on which we suggested that the feasibility of each questionnaire could be influenced by its answer format rather than by the number of items. There is a possibility, however, that the feasibility of the questionnaires is influenced by the number of items. For example, most subjects who failed to complete the 20-item CES-D may have found its items too hard to answer because of their symptoms (such as lack of self-confidence, lack of concentration, or tiredness). Another explanation is the different factor structures that underlie the two types of CES-D. Fifty percent of the items in the 10-item CES-D belong to the somatic factor, compared with only 35% of the items in the 20-item CES-D.25 Such differences, rather than the answer format used, may account for the difference in feasibility. To eliminate this uncertainty, a better strategy is to compare the same type of CES-D with different answer formats. One such example is the comparison between the 10-item CES-D with a yes or no format and the 10-item CES-D with a multiple-choice format.
To make this comparison, we retrospectively derived a 10-item CES-D from the 20-item CES-D, which is referred to here as the post-hoc 10-item CES-D. On the post-hoc 10-item CES-D, nine patients left all 10 items uncompleted and one patient left four items uncompleted; thus, significantly more patients failed to complete the post-hoc 10-item CES-D than the original 10-item CES-D (10/86 vs 3/86; P = 0.02). Therefore, the feasibility of the instruments seemed to be influenced by their answer format rather than by the number of items, although there still remains the possibility that the number of items influences the feasibility.
Despite these limitations, the present study has a higher success rate in making a diagnosis than previous studies5,9; this confers greater generalizability to the results.
In summary, the present data suggest that the 10-item CES-D (a questionnaire with a yes or no format) is a better instrument to use for detecting major depressive episodes in psychiatric outpatient settings because of (i) a substantial reduction of respondent burden; (ii) the resulting greater feasibility over the 20-item CES-D (a multiple-choice format test); and yet (iii) reliability and validity comparable to the 20-item CES-D. The different answer format used in each questionnaire may influence its feasibility, rather than the number of items.