Reliability and validity of the center for epidemiologic studies depression scale in patients with systemic sclerosis


  • Brett D. Thombs,

    Corresponding author
    1. Sir Mortimer B. Davis Jewish General Hospital McGill University, Montreal, Quebec, Canada
    • Sir Mortimer B. Davis Jewish General Hospital, Institute of Community and Family Psychiatry, 4333 Cote Saint Catherine Road, Montreal, Quebec H3T 1E4, Canada
  • Marie Hudson,

    1. Sir Mortimer B. Davis Jewish General Hospital McGill University, Montreal, Quebec, Canada
  • Orit Schieir,

    1. McGill University, Montreal, Quebec, Canada
  • Suzanne S. Taillefer,

    1. Sir Mortimer B. Davis Jewish General Hospital McGill University, Montreal, Quebec, Canada
    • Dr. Taillefer is the research coordinator and Dr. Baron is the director of the Canadian Scleroderma Research Group

  • Murray Baron,

    1. Sir Mortimer B. Davis Jewish General Hospital McGill University, Montreal, Quebec, Canada

  • The Canadian Scleroderma Research Group

    • The Canadian Scleroderma Research Group Investigators are as follows: J. Markland: Saskatoon, Saskatchewan, Canada; J. Pope: London, Ontario, Canada; D. Robinson: Winnipeg, Manitoba, Canada; N. Jones: Edmonton, Alberta, Canada; P. Docherty: Moncton, New Brunswick, Canada; M. Abu-Hakima, S. Le Clercq: Calgary, Alberta, Canada; N. A. Khalidi, E. Kaminska: Hamilton, Ontario, Canada; E. Sutton: Halifax, Nova Scotia, Canada; C. D. Smith: Ottawa, Ontario, Canada; J.P. Mathieu, S. Ligier: Montreal, Quebec, Canada; P. Rahman: St. John's, Newfoundland, Canada.



Reported rates of depressive symptoms in patients with systemic sclerosis (SSc) are high. No depression assessment tools, however, have been validated for patients with SSc. Our objective was to assess the internal consistency reliability, convergent validity, and structural/construct validity of the Center for Epidemiologic Studies Depression Scale (CES-D) in patients with SSc.


We conducted a cross-sectional, multicenter study of 470 SSc patients. Internal consistency reliability was assessed with Cronbach's alpha and structural/construct validity with confirmatory factor analysis.


Internal consistency reliability was good for the overall CES-D scale (α = 0.88) and for its 4 factors (α = 0.67–0.88). Correlations of the CES-D total score were −0.73 with mental health, −0.36 with physical health, 0.41 with disability, and 0.44 with pain. The 4-factor model originally found in the general population and validated for patients with rheumatoid arthritis (depressed affect, somatic/vegetative, [lack of] positive affect, and interpersonal factors) fit the data well, as did a second-order version of the same model with an overarching depression factor that loaded onto each of the 4 first-order factors. The 4-factor model fit the SSc data better than alternative models.


Internal consistency reliability and convergent validity were good, the 4-factor structure reported in the general population was replicated, and a second-order model with an overarching depression factor fit well. These findings indicate that the CES-D is a valid and reliable measure of depressive symptoms for patients with SSc.


Systemic sclerosis (SSc), or scleroderma, is a chronic, multisystem disorder of connective tissue characterized by thickening and fibrosis of the skin, and by involvement of internal organs. Patients with SSc report high levels of pain, fatigue, and disability (1). A recent systematic review found that between 36% and 65% of patients with SSc have clinically significant symptoms of depression, a high rate even compared with patients with other acute and chronic conditions (e.g., postmyocardial infarction, congestive heart failure, diabetes, chronic obstructive pulmonary disease, rheumatoid arthritis [RA]) when the same assessment tools and scoring cutoffs are used (2). Studies included in that review reported multivariable associations between depressive symptoms and education, overall disease severity, gastrointestinal symptoms, pain, disability, and body image distress, although methodologic issues limited the ability to draw strong conclusions about predictors (2). A recent study of 403 patients with SSc (of the 470 included in the present study) found that patients with less education; patients who were not married; patients with higher physician-rated overall disease severity; and patients with more tender joints, more gastrointestinal symptoms, and more difficulty breathing had significantly higher symptoms of depression as measured by the Center for Epidemiologic Studies Depression Scale (CES-D). Patient sex, duration since onset of non-Raynaud's symptoms or since diagnosis, and total skin score or diffuse/limited classification were not significantly associated with depressive symptoms in multivariate analysis (3).

The CES-D is a widely used 20-item self-report measure that was originally designed for assessing depressive symptomatology in the general population (4) (Table 1). The CES-D is also commonly used as a general depression screening tool (5). Research supports the CES-D as a valid measure of depressive symptoms among patients with RA (6–8), and Rhee et al (6) found that the originally specified CES-D 4-factor model (4) fit RA data well. The 4 factors were depressive affect symptoms (7 items), somatic/vegetative symptoms (7 items), interpersonal symptoms (2 items), and (lack of) positive affect symptoms (4 items) (Figure 1). The findings from Rhee et al are consistent with the results of a recent systematic review (9) that also found strong evidence for the 4-factor model across many different patient groups. This is important because consistency of factor structures across groups provides evidence for construct validity.

Table 1. Factor loadings of Center for Epidemiologic Studies Depression Scale (CES-D) items in models tested*

| CES-D item | Model 1: 1 factor | Model 2: 2 factors | Model 3A: 3 factors (DA + S/V) | Model 3B: 3 factors (DA + PA) | Models 4A/4B: 4 factors |
|---|---|---|---|---|---|
| 1. I was bothered by things that usually don't bother me. | Depression = 0.70 | DA + S/V + IP = 0.72 | DA + S/V = 0.72 | S/V = 0.77 | S/V = 0.76 |
| 2. I did not feel like eating: my appetite was poor. | Depression = 0.55 | DA + S/V + IP = 0.58 | DA + S/V = 0.58 | S/V = 0.60 | S/V = 0.61 |
| 5. I had trouble keeping my mind on what I was doing. | Depression = 0.67 | DA + S/V + IP = 0.69 | DA + S/V = 0.69 | S/V = 0.73 | S/V = 0.73 |
| 7. I felt that everything I did was an effort. | Depression = 0.65 | DA + S/V + IP = 0.68 | DA + S/V = 0.68 | S/V = 0.71 | S/V = 0.71 |
| 11. My sleep was restless. | Depression = 0.45 | DA + S/V + IP = 0.49 | DA + S/V = 0.49 | S/V = 0.50 | S/V = 0.51 |
| 13. I talked less than usual. | Depression = 0.68 | DA + S/V + IP = 0.70 | DA + S/V = 0.70 | S/V = 0.74 | S/V = 0.74 |
| 20. I could not get going. | Depression = 0.69 | DA + S/V + IP = 0.72 | DA + S/V = 0.72 | S/V = 0.76 | S/V = 0.76 |
| 3. I felt that I could not shake off the blues. | Depression = 0.83 | DA + S/V + IP = 0.84 | DA + S/V = 0.84 | DA + PA = 0.84 | DA = 0.85 |
| 6. I felt depressed. | Depression = 0.88 | DA + S/V + IP = 0.89 | DA + S/V = 0.89 | DA + PA = 0.89 | DA = 0.90 |
| 9. I thought my life had been a failure. | Depression = 0.78 | DA + S/V + IP = 0.79 | DA + S/V = 0.79 | DA + PA = 0.80 | DA = 0.80 |
| 10. I felt fearful. | Depression = 0.66 | DA + S/V + IP = 0.68 | DA + S/V = 0.68 | DA + PA = 0.67 | DA = 0.68 |
| 14. I felt lonely. | Depression = 0.77 | DA + S/V + IP = 0.78 | DA + S/V = 0.78 | DA + PA = 0.78 | DA = 0.79 |
| 17. I had crying spells. | Depression = 0.80 | DA + S/V + IP = 0.81 | DA + S/V = 0.81 | DA + PA = 0.81 | DA = 0.81 |
| 18. I felt sad. | Depression = 0.88 | DA + S/V + IP = 0.89 | DA + S/V = 0.90 | DA + PA = 0.90 | DA = 0.90 |
| 4. I felt that I was just as good as other people. | Depression = 0.37 | PA = 0.63 | PA = 0.63 | DA + PA = 0.39 | PA = 0.64 |
| 8. I felt hopeful about the future. | Depression = 0.56 | PA = 0.79 | PA = 0.79 | DA + PA = 0.58 | PA = 0.79 |
| 12. I was happy. | Depression = 0.68 | PA = 0.91 | PA = 0.91 | DA + PA = 0.70 | PA = 0.91 |
| 16. I enjoyed life. | Depression = 0.66 | PA = 0.87 | PA = 0.87 | DA + PA = 0.68 | PA = 0.86 |
| 15. People were unfriendly. | Depression = 0.63 | DA + S/V + IP = 0.65 | IP = 0.79 | IP = 0.79 | IP = 0.79 |
| 19. I felt that people disliked me. | Depression = 0.73 | DA + S/V + IP = 0.74 | IP = 0.93 | IP = 0.93 | IP = 0.93 |

* Item allocation notations are based on Radloff's (4) 4-factor model (depressed affect [DA], somatic/vegetative [S/V], positive affect [PA], interpersonal [IP]). When ≥2 of Radloff's original factors are combined into a single factor, this is noted with an addition sign, as done by Rhee et al (6). For example, a single factor based on Radloff's DA and PA factors is shown as DA + PA. Factor loadings shown for the 4-factor models are from model 4A, which were not substantively different than those for model 4B.

Figure 1.

Correlated 4-factor model with the Radloff (4) item allocation. Item error variances are not shown. DA = depressed affect; S/V = somatic/vegetative; PA = positive affect; IP = interpersonal.

No measures of depressive symptoms have been validated for patients with SSc. The objective of this study was to use a large sample of patients with SSc from a Pan-Canadian registry to investigate the internal consistency reliability, convergent validity, and structural/construct validity of the CES-D in patients with SSc.

Patients and Methods

Patient sample.

The study sample comprised patients enrolled in the Canadian Scleroderma Research Group (CSRG) registry from September 2004 through August 2006 who completed the CES-D. Patients in the registry were recruited from 15 centers across Canada. To be eligible for the registry, patients must have a diagnosis of SSc made by the referring rheumatologist, be ≥18 years of age, and be fluent in English or French. Registry patients undergo extensive clinical history, physical evaluation, and laboratory investigations and complete a series of self-report questionnaires. Although eventually all patients with SSc receiving care from participating centers will be enrolled, for reasons related to resources, this will occur over time. Thus, this is a convenience sample, rather than a consecutive sample. Of the patients approached to participate in the CSRG registry, ∼90% have enrolled. Patients from all sites provided informed consent, and the research ethics board of each study site approved the data collection protocol.


The CES-D (4) is a 20-item measure designed to assess the presence and severity of depressive symptomatology. The frequency of occurrence of each symptom during the past week is rated on a 0–3 Likert-type scale (“rarely or none of the time” to “most or all of the time”), and total scores range from 0 to 60. Standard cutoffs are ≥16 for possible depression and ≥23 for probable depression (4). In addition to depressive symptoms, demographic and medical data were collected. Demographic information was based on self-report and included age, sex, education, marital status, and race/ethnicity. Patients' medical history and disease characteristics were obtained via clinical history and examinations by study physicians. Skin involvement was assessed using the modified Rodnan skin score ranging from 0 to 51 (10). Limited skin disease was defined as skin involvement distal to the elbows and knees with or without face involvement. Disease severity was assessed with a scale developed by Medsger et al (11), and a severity score of 0 (normal) to 4 (end stage) was generated for each of the 9 systems.
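
The scoring rule described above can be sketched in Python. This is a minimal illustration, not the authors' code: the names `score_cesd` and `classify` and the example responses are hypothetical, and the reverse-scoring of the 4 positively worded items (4, 8, 12, and 16) is an assumption based on standard CES-D scoring practice, which the text does not spell out.

```python
# Hypothetical helpers illustrating CES-D scoring; not the authors' code.
# Assumption: the 4 positively worded items are reverse-scored, as in
# standard CES-D scoring practice (not stated explicitly in the text).
POSITIVE_ITEMS = {4, 8, 12, 16}


def score_cesd(responses):
    """Return the CES-D total (0-60) from 20 item ratings on a 0-3 scale.

    `responses` maps item number (1-20) to a rating of 0, 1, 2, or 3.
    """
    if set(responses) != set(range(1, 21)):
        raise ValueError("expected ratings for items 1 through 20")
    total = 0
    for item, rating in responses.items():
        if rating not in (0, 1, 2, 3):
            raise ValueError(f"item {item}: rating must be 0, 1, 2, or 3")
        # Reverse-score positive items so that higher always means more symptoms.
        total += 3 - rating if item in POSITIVE_ITEMS else rating
    return total


def classify(total):
    """Apply the standard cutoffs cited in the text (>=16 possible, >=23 probable)."""
    if total >= 23:
        return "probable depression"
    if total >= 16:
        return "possible depression"
    return "below cutoff"


# Made-up example: negatively worded items rated 1, positive items rated 2.
example = {i: (2 if i in POSITIVE_ITEMS else 1) for i in range(1, 21)}
example_total = score_cesd(example)
print(example_total, classify(example_total))
```

The cutoffs here are the general-population values cited above; whether they are appropriate for patients with SSc is a separate question that scoring code cannot answer.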

Self-report measures of mental health function (Short Form 36 [SF-36] Mental Component Summary [MCS]), physical function (SF-36 Physical Component Summary [PCS]), disability (Health Assessment Questionnaire [HAQ] disability index [DI]), and pain (McGill Pain Questionnaire Short Form [MPQ]) were used to establish convergent validity. Higher scores on the HAQ DI indicate greater disability, and higher scores on the MPQ indicate greater pain. Both would be expected to be positively associated with higher scores on the CES-D. Higher scores on the MCS and PCS indicate better function, and would be expected to be negatively associated with CES-D scores. The association between the CES-D and the MCS would be expected to be the most robust because the MCS measures mental health and has a strong depression component.

Statistical analyses.

Internal consistency reliability was evaluated using Cronbach's alpha. Convergent validity of the CES-D with other self-report measures was assessed using Spearman's correlation coefficient. Confirmatory factor analysis (CFA) models were used to evaluate the factor structure of the CES-D and were conducted with Mplus, version 3.11 (Muthén & Muthén, Los Angeles, CA), explicitly modeling the CES-D items as ordinal data. To do this, Mplus initially estimates item thresholds for ordinal outcome variables using maximum likelihood methods. These estimates are then used to estimate a polychoric correlation matrix. Model parameters are subsequently estimated with weighted least squares using the inverse of the asymptotic covariance matrix as the weight matrix.
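
For readers unfamiliar with Cronbach's alpha, the following is a minimal sketch of the usual formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the toy ratings are hypothetical; the text does not say which software was used to compute alpha in the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up ratings for 5 respondents on a 3-item subscale (not study data).
toy = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 2, 2],
    [3, 2, 3],
    [1, 0, 1],
]
print(round(cronbach_alpha(toy), 2))
```

Because the factor k/(k − 1) and the ratio of variances both depend on the number of items, short subscales tend to yield lower alphas than long ones with similarly intercorrelated items.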

Following the methodology that Rhee et al used with RA patient data (6), 6 alternative models were compared: a single depression factor model (model 1), a 2-factor model of general depression and positive affect (model 2), a 3-factor model combining the depressive affect and somatic/vegetative factors (model 3A), a second 3-factor model combining the depressive affect and positive affect factors (model 3B), Radloff's 4-factor model (model 4A) (Figure 1) (4), and Radloff's 4-factor model with a second-order depression factor as done by Rhee et al (model 4B) (6). Second-order factors are global factors composed of all of the first-order factors (e.g., depressed affect, somatic/vegetative, (lack of) positive affect, and interpersonal) that provide a mechanism to test the plausibility that a single overarching construct is being measured. All item-factor allocations for each model are shown in Table 1. Although Sheehan et al (8) reported a slightly different item allocation than Radloff, these models were not tested because they did not fit as well as the Radloff allocations when directly compared with CFA by Rhee et al (6). Furthermore, the Sheehan et al item allocations appear to have come from initial exploratory analyses (6) and are not easily justified theoretically.

To assess the fit of the models to the data, practical fit indices were emphasized because chi-square tests of fit are highly sensitive to sample size and can lead to the rejection of well-fitting models (12). Four practical fit indices were used to evaluate model fit: the Tucker-Lewis index (TLI), the comparative fit index (CFI), the root mean square error of approximation (RMSEA), and the standardized root-mean-square residual (SRMR). Guidelines proposed by Hu and Bentler (13) suggest that models with TLI and CFI close to 0.95 or higher, RMSEA close to 0.06 or lower, and SRMR close to 0.08 or lower are representative of good-fitting models.
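
As an illustration of how these guidelines can be applied, the sketch below checks each model's fit indices, as reported in Table 2, against the cutoffs. The guidelines say "close to", so treating the cutoffs as hard thresholds is a simplifying assumption, and the function name `meets_guidelines` is hypothetical.

```python
# Cutoffs from the Hu and Bentler guidelines quoted in the text. The guidelines
# say "close to", so the hard comparisons below are a simplifying assumption.
CUTOFFS = {"CFI": 0.95, "TLI": 0.95, "RMSEA": 0.06, "SRMR": 0.08}
HIGHER_IS_BETTER = {"CFI", "TLI"}


def meets_guidelines(fit):
    """Return {index name: bool} showing which indices satisfy the cutoffs."""
    result = {}
    for name, cutoff in CUTOFFS.items():
        value = fit[name]
        if name in HIGHER_IS_BETTER:
            result[name] = value >= cutoff
        else:
            result[name] = value <= cutoff
    return result


# Fit indices as reported in Table 2.
table2 = {
    "Model 1": {"CFI": 0.70, "TLI": 0.86, "RMSEA": 0.19, "SRMR": 0.14},
    "Model 2": {"CFI": 0.95, "TLI": 0.98, "RMSEA": 0.08, "SRMR": 0.06},
    "Model 3A": {"CFI": 0.96, "TLI": 0.98, "RMSEA": 0.07, "SRMR": 0.06},
    "Model 3B": {"CFI": 0.73, "TLI": 0.87, "RMSEA": 0.19, "SRMR": 0.13},
    "Model 4A": {"CFI": 0.97, "TLI": 0.99, "RMSEA": 0.06, "SRMR": 0.05},
    "Model 4B": {"CFI": 0.97, "TLI": 0.99, "RMSEA": 0.06, "SRMR": 0.06},
}

for model, fit in table2.items():
    checks = meets_guidelines(fit)
    status = "all cutoffs met" if all(checks.values()) else "not all cutoffs met"
    print(model, status, checks)
```

Under this strict reading, only models 4A and 4B satisfy all 4 cutoffs; models 2 and 3A miss only on RMSEA.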


Results

Sample characteristics.

A total of 470 patients were included in the study. Of these, 397 (86%) were female and 408 (87%) were white, which is consistent with North American samples from previous reports (14). The mean ± SD age of the sample was 55.4 ± 12.6 years, 209 (46%) of 457 patients with data completed some postsecondary education, and 332 (72%) were married or living as married.

The mean ± SD duration since onset of non-Raynaud's symptoms was 10.6 ± 8.7 years (median 8.3 years), and the mean ± SD duration since diagnosis of SSc was 8.4 ± 7.7 years (median 6.2 years). A total of 279 (60%) of 464 patients with data had diffuse SSc, and the mean ± SD total skin score was 11.2 ± 10.2 (median 8.0). Mean ± SD disease severity scores for each of the 9 systems were 0.9 ± 1.2 for general, 1.6 ± 1.2 for peripheral vascular, 1.3 ± 0.7 for skin, 0.9 ± 1.3 for joint/tendon, 0.3 ± 0.8 for muscle, 0.5 ± 1.0 for heart, 0.2 ± 0.7 for kidney, and 1.4 ± 0.7 for lung.

The mean ± SD CES-D score was 14.3 ± 10.4 (median 13.0). A total of 178 patients (37.8%) scored at least 16 on the CES-D, a standard cutoff for possible depression, and 95 patients (20.2%) scored ≥23 for probable depression.

Reliability of the CES-D.

Overall scale reliability was good (α = 0.88) and similar to the values reported in the original validation study (α = 0.88–0.90) (4). Corrected item-total correlations for individual items ranged from 0.24 (item 4, “good”) to 0.73 (item 6, “depressed”). Coefficient alphas were also very good for each of the 4 CES-D factors (4): 0.88 for depressive affect, 0.80 for somatic/vegetative, 0.67 for interpersonal symptoms, and 0.82 for (lack of) positive affect.

Confirmatory factor analysis of the CES-D.

Factor loadings for all models are shown in Table 1. As shown in Tables 2 and 3, model fit was best for the original 4-factor model specified by Radloff and for a second-order version of the 4-factor model, which specified a second-order depression factor rather than intercorrelations between the 4 factors. Intercorrelations between the 4 factors in model 4A ranged from 0.28 to 0.89 and were lowest for correlations that included the positive affect factor. In the second-order 4-factor model (model 4B), the overarching second-order depression factor accounted for 97% of the variance in the depressed affect factor (standardized regression coefficient = 0.99), 73% of the variance in the somatic/vegetative factor (standardized regression coefficient = 0.86), and 57% of the variance in the interpersonal factor (standardized regression coefficient = 0.76), but only 14% of the variance in the positive affect factor (standardized regression coefficient = 0.37). More detailed results for all models are available upon request from the corresponding author.
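
The variance percentages above can be related to the standardized coefficients with a short worked check. In a second-order model, the share of a first-order factor's variance explained by the second-order factor is the square of its standardized loading. The symbol \lambda_j for the standardized second-order loading of factor j is notation introduced here for illustration.

```latex
R_j^2 = \lambda_j^2, \qquad
0.99^2 \approx 0.98, \quad
0.86^2 \approx 0.74, \quad
0.76^2 \approx 0.58, \quad
0.37^2 \approx 0.14
```

These values agree with the reported 97%, 73%, 57%, and 14% to within about 1 percentage point. The small differences presumably reflect rounding of the loadings to 2 decimal places.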

Table 2. Fit indices for confirmatory factor analysis models*

| Model | χ² | df | CFI | TLI | RMSEA | SRMR |
|---|---|---|---|---|---|---|
| Model 1: 1 factor | 1,079.4 | 58 | 0.70 | 0.86 | 0.19 | 0.14 |
| Model 2: 2 factor (DA + S/V + IP, PA) | 243.8 | 67 | 0.95 | 0.98 | 0.08 | 0.06 |
| Model 3A: 3 factor (DA + S/V, PA, IP) | 222.9 | 67 | 0.96 | 0.98 | 0.07 | 0.06 |
| Model 3B: 3 factor (DA + PA, S/V, IP) | 973.3 | 57 | 0.73 | 0.87 | 0.19 | 0.13 |
| Model 4A: 4 factor (DA, S/V, IP, PA) | 180.2 | 67 | 0.97 | 0.99 | 0.06 | 0.05 |
| Model 4B: 4 factor, second-order (DA, S/V, IP, PA) | 180.5 | 67 | 0.97 | 0.99 | 0.06 | 0.06 |

* CFI = comparative fit index; TLI = Tucker-Lewis index; RMSEA = root mean square error of approximation; SRMR = standardized root-mean-square residual; see Table 1 for additional definitions.

Table 3. Factor correlations (model 4A) and second-order factor loadings (model 4B) for confirmatory factor analysis models*

| Factor correlations and second-order factor loadings | DA | S/V | PA | IP |
|---|---|---|---|---|
| Model 4A: correlated 4 factor | | | | |
| Model 4B: second-order 4 factor | | | | |
| Second-order factor loadings | 0.99 | 0.86 | 0.37 | 0.76 |

* See Table 1 for definitions.

Convergent validity.

Spearman's correlations between the CES-D total score and related self-report measures were −0.73 for the MCS, −0.36 for the PCS, 0.41 for the HAQ DI, and 0.44 for the MPQ. All correlations were in the expected direction and, as expected, the correlation with the MCS was strong whereas the other correlations were in the moderate range.
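
Spearman's correlation is the Pearson correlation computed on ranks, which makes it suitable for skewed or ordinal questionnaire scores. The sketch below shows the computation using only the Python standard library. The function names and the toy scores are hypothetical and are not study data.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up scores for 5 patients (not study data): higher CES-D, lower MCS.
cesd_scores = [5, 12, 20, 31, 8]
mcs_scores = [55, 40, 48, 25, 50]
print(round(spearman(cesd_scores, mcs_scores), 2))
```

A negative coefficient, as in this toy example, corresponds to the pattern reported for the MCS and PCS, where higher scores indicate better function.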


Discussion

This study evaluated the reliability and construct validity of the CES-D in a Pan-Canadian sample of 470 patients with SSc. The main findings of this study were that the CES-D had good internal consistency reliability and convergent validity among patients with SSc and that both first- and second-order versions of the standard 4-factor model of the CES-D fit the data well and better than alternative models. Cronbach's alpha for the overall CES-D was 0.88. A widely used standard suggests that self-report measures should have an internal consistency reliability of ≥0.70, and of ≥0.80 when used as a screening tool (15). Given that the coefficient alpha is influenced by the number of items in a scale, the internal consistency reliabilities of the CES-D factor subscales were also very strong (from 0.67 for the 2-item interpersonal factor to 0.88 for the 7-item depressed affect factor). The CFA showed that, consistent with other studies, the CES-D items clustered into 4 interpretable factors: depressed affect, somatic/vegetative, interpersonal, and (lack of) positive affect. The good fit of the second-order model supports the use of a total score of the CES-D as a global indicator of levels of depressive symptoms in patients with SSc. Thus, although the positive affect scale, for instance, appears to be only weakly related to the overall depressive construct, the total score is a valid measure. Nonetheless, it is possible that a shorter version of the CES-D could provide equally good measurement in patients with SSc, and this should be tested in future work. A strength of this study was that the factor structure was tested with rigorous methods that explicitly modeled the CES-D items as ordinal data.

One limitation of this study is that it did not address criterion-related validity by comparing cutoff scores on the CES-D with a gold standard, such as a structured interview for major depression. Thus, this report establishes that the CES-D is a valid continuous measure of depressive symptoms, but standard cutoff scores for detecting depression need to be verified for patients with SSc. One study reported that a cutoff of ≥19 is best in patients with RA (7), rather than a standard cutoff of 16, although this finding has not been replicated. In addition, more research is needed to assess the degree to which somatic symptom overlap, if any, may bias symptom severity estimates made with the CES-D in patients with SSc. Two studies of patients with RA have reported that several somatic items of the CES-D reflect both depressive symptoms and RA disease factors, although both of these studies concluded that the effect on the CES-D total score was minimal (16, 17). In addition, although recruitment rates were high, the present sample was technically a convenience sample, and characteristics of patients not yet enrolled in the CSRG registry are not available.

In a recent Delphi exercise (18), the CES-D was proposed by members of the Scleroderma Clinical Trials Consortium as a possible outcome measure in SSc. However, it was found to lack proper validation in this patient population. The present study demonstrates that the CES-D is a reliable and valid instrument for measuring depressive symptoms in patients with SSc, although criterion-related validity and specific cutoff scores need to be established.


Dr. Thombs had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study design. Thombs, Baron.

Acquisition of data. Hudson, Taillefer, Baron.

Analysis and interpretation of data. Thombs, Hudson.

Manuscript preparation. Thombs, Schieir, Taillefer, Baron.

Statistical analysis. Thombs.


The authors of this article had access to all study data, are responsible for all contents of the article, and had authority over manuscript preparation and the decision to submit the manuscript for publication. The sponsors of the Canadian Scleroderma Research Group did not have any role in the development of this manuscript or the decision to submit it for publication.