A comparison of self-reports of distress and affective disorder diagnoses in rheumatoid arthritis: A receiver operator characteristic analysis


Abstract

Objective

To compare 3 commonly used psychiatric symptom checklists (the Center for Epidemiological Studies Depression Scale [CES-D], the Positive and Negative Affect Schedule [PANAS], and the Endler Multidimensional Anxiety Scales [EMAS]) to determine their sensitivity, specificity, and ability to discriminate between having a disorder (major depression [MD] or generalized anxiety disorder [GAD]) and having no disorder; to compare the checklists for their ability to discriminate between types of disorder (MD and GAD); and to evaluate the discriminant ability of the subscales (particularly positive affect), whether the somatic items in the CES-D artificially inflate affective scores, and the optimal cut-off score for the CES-D.

Methods

We compared the 3 scales against diagnostic criteria for MD, GAD, and comorbid disorder using receiver operator characteristic (ROC) and logistic regression analyses. The sample consisted of a national panel of 415 individuals with rheumatoid arthritis (RA).

Results

Each of the scales had high combined sensitivity and specificity (areas under the curve: CES-D = 0.92, negative affect = 0.88, positive affect and EMAS = 0.82). The CES-D, however, demonstrated significantly better combined sensitivity and specificity than the positive affect scale and the EMAS, but not than the negative affect scale.

Conclusion

All 3 self-reports have high combined sensitivity and specificity as measures of affective disorders among RA patients.

INTRODUCTION

Elevated levels of emotional distress, indicated by depressive symptoms (1–4), diagnoses of depression (5–8), and anxiety (3), are reported consistently across studies of rheumatoid arthritis (RA) patients. Individuals with RA may be at greater risk for depression than the general population for 2 reasons: high levels of pain and/or functional impairment, or a common neurobiologic mechanism underlying both depression and RA (9–11). An important rival hypothesis is that the elevated levels are a measurement artifact.

An accurate assessment of the scope of emotional distress is important, and researchers exploring a connection between RA and affective disorders need reliable and valid brief scales for screening individuals. Unfortunately, scores on the most frequently used measures of psychological distress (screening scales) may be inflated for a variety of reasons, calling into question the meaning currently ascribed to the high distress levels seen in RA samples (1, 2, 8, 12). The problems stem from 3 sources: 1) findings have not been based on large, representative samples of RA patients; 2) many depression and anxiety scales have adequate convergent validity but low discriminant validity; and 3) emotional distress items may overlap with typical RA symptoms, i.e., criterion contamination (13). There is particular concern that distress detected by the widely used Center for Epidemiological Studies Depression Scale (CES-D) might reflect not anxiety, depression, or their co-occurrence, but an artifact of overlap between CES-D somatic items and RA disease severity (1, 2, 11).

To address these limitations and to test the measurement artifact hypothesis, we employed a large, nationally representative sample; used a structured diagnostic interview; used a multidimensional approach, suggested by Clark and Watson (14), to enhance the discriminant validity of screening scales; and compared the combined sensitivity and specificity of the CES-D with and without the somatic items using receiver operator characteristic (ROC) curves. This study also provides information for determining appropriate cut-off scores for major depression (MD) and generalized anxiety disorder (GAD) on the continuous scales commonly used in the RA population (15–18).

The intensity of depressive symptoms among individuals with RA has been widely studied using various indicators, including the CES-D (1, 2, 11), the Beck Depression Inventory (BDI) (19, 20), and the Arthritis Impact Measurement Scales (3, 19, 21–23). In samples of individuals with RA, using the conventional cut-off for general populations, investigators have found rates of depression ranging from 23% using the BDI to 46% using the CES-D (1). Although anxiety symptoms among RA patients have been studied less frequently than symptoms of depression, studies that have assessed anxiety report rates comparable to those reported for depression (3, 19).

When individuals with RA report high levels of psychological distress, it is not clear that researchers should report their findings as evidence of depression. Brief self-report scales are economical and often have good convergent validity (indicated by high correlations with other measures of the same construct), but they often lack good discriminant validity (indicated by low correlations with measures of related but distinct constructs). It is possible, of course, that the underlying constructs are conceptually distinct yet empirically collinear. As a result, anxiety measures correlate or load highly with depression measures, and it is unclear whether the distress being measured is due to anxiety, depression, or both (3, 12, 19, 24–26).

The somatic items included in most measures of depression and anxiety leave open the possibility of criterion contamination (13). This contamination occurs when items that were designed to assess dimensions of depression or anxiety actually reflect aspects of RA. For example, questions about fatigue or difficulty “getting going” can reflect depression, the effects of RA, or both. Callahan et al (27) and Blalock et al (1) found evidence consistent with the idea that the somatic items of the CES-D are elevated among RA patients due to RA symptoms, not greater depression. Rhee et al (11) found some inflation of total CES-D scores in an RA sample attributable to somatic items. Blalock and colleagues (1) recommended that investigators remove 4 CES-D items to reduce the inflation in scores deriving from RA disease severity.

These problems with common self-report screening questionnaires have compromised our understanding of connections between RA and affective disorders. Fortunately, methods are now available to improve the discriminant validity of questionnaire measures to differentiate between depression and anxiety. Two such methods are used in this study. The first is a measure of state anxiety that was designed specifically to distinguish anxiety from depression (25). The second approach is based on the assessment of positive affect (PA) and negative affect (NA), as suggested by Clark and Watson (14). NA refers to feeling “upset or unpleasantly engaged rather than peaceful,” and PA refers to feeling a “zest for life and pleasurable engagement” (14). These 2 affects can be measured as distinct and orthogonal factors (14). They discriminate between anxiety and depression because NA is associated with both depression and anxiety, but low PA is unique to depression. These findings have been reported in nonclinical (28, 29) and clinical samples (30–32). In this study, we assess whether the discriminant validity of the CES-D can be increased by using 2 subscales of the CES-D, identified in our previous work (33), that closely resemble PA and NA.

In summary, we pose the following questions: 1) Which of the commonly used distress scales demonstrates the highest sensitivity and specificity? 2) Do subscale scores improve our ability to distinguish between anxiety disorders and depressive disorders, and does the measurement of PA enhance this distinction? 3) Do somatic items in depression scales, such as the CES-D, artificially inflate reports of the prevalence of affective disorder among individuals with RA, and does removing those items improve scale sensitivity and specificity? 4) What is the optimal cut off score on the widely used CES-D to detect affective disorder in a nationally representative RA sample?

PATIENTS AND METHODS

Patients.

A subset of patients from the National Rheumatoid Arthritis Study (NRAS) was recruited for this project. NRAS was a prospective panel study that completed its tenth and final year in 1997–1998. The panel of 988 patients with classic or definite RA (34) was recruited from a national, random sample of board-certified rheumatologists (details of the recruitment are published elsewhere [35]). At the close of their eighth-year interview, the 508 patients remaining in the panel were asked whether they would be interested in participating in an additional interview about their emotional and physical well-being. A total of 462 (91%) agreed to the followup interview, and 415 (90% of those who agreed) completed it. Consistent with the 3:1 prevalence of RA among middle-aged women to men, 83% of the sample was female. Participants were largely upper middle-aged (mean ± SD age 58 ± 9.7 years), married (68%), and out of the labor force (65%). The relatively low employment rate is similar to that in other samples of RA patients (36).

Methods.

Diagnostic and scale responses were obtained by telephone interviews that lasted approximately 30 minutes. Current and lifetime psychiatric diagnoses of MD, GAD, and comorbid disorder (CD) were obtained using the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA) (37–39) based on the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) (40). The SSAGA is suitable for either telephone or face-to-face administration by lay interviewers (38, 39).

The SSAGA was selected because it provides a complete and detailed lifetime psychiatric history for adults. The SSAGA interview schedule covers the major Axis I psychiatric disorders as defined by the Diagnostic and Statistical Manual of Mental Disorders, Third Edition, Revised (DSM-III-R), the DSM-IV, and the International Classification of Diseases, 10th Revision (ICD-10). In a combined sample of subjects drawn from the general population and from outpatient psychiatric patients, the SSAGA has been shown to have good within-site and between-site reliability (37). Using a combined sample of outpatient psychiatric patients and subjects drawn from the community with unknown psychiatric histories, the SSAGA has been shown to be valid compared with other standardized psychiatric diagnostic instruments, such as the Schedule for Clinical Assessment in Neuropsychiatry (SCAN) (38). The SSAGA is currently being used in more than 55 studies in the US and 10 studies in other countries.

The SSAGA is useful in the arthritis population because it links episodes of MD or GAD to comorbidities and flares. Interviewers, supervised by a clinical psychologist, completed approximately 20 hours of SSAGA training. All of the interviews were edited for accuracy by a research staff member with a master's degree in psychology and several years of experience with development of the interview.

Measures.

The Center for Epidemiological Studies Depression Scale.

The CES-D scale consists of 20 questions chosen to reflect various aspects of depression, including depressed mood, feelings of guilt and worthlessness, feelings of helplessness and hopelessness, psychomotor retardation, loss of appetite, and sleep disturbance (41). Respondents are asked to think of the last week and report the frequency of occurrence for each item on the following 4-point scale: 0) rarely, that is, less than 1 day; 1) some of the time, 1 to 2 days; 2) a moderate amount of the time, 3 to 4 days; or 3) most or all of the time, 5 to 7 days. Scores can range from 0 to 60. Aneshensel et al (42) found that telephone and in-person administrations produce comparable scores. Past studies show that the CES-D has adequate test-retest stability, as well as concurrent and construct validity (41).
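The scoring rule above can be sketched as a small function. This is a minimal illustration, not the authors' scoring code: it assumes the standard CES-D item order, in which the 4 positively worded items (items 4, 8, 12, and 16) are reverse-scored so that a higher total always means more symptoms. The name `score_cesd` is hypothetical.

```python
# Assumption: standard CES-D item numbering, with positively worded items
# 4, 8, 12, and 16 reverse-scored (3 - response).
POSITIVE_ITEMS = frozenset({4, 8, 12, 16})

def score_cesd(responses: dict[int, int]) -> int:
    """Return the CES-D total (0-60) from a mapping item number -> response (0-3)."""
    if set(responses) != set(range(1, 21)):
        raise ValueError("expected responses for items 1-20")
    total = 0
    for item, value in responses.items():
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item {item}: response must be 0-3")
        # Reverse-score positive items so higher always means more distress.
        total += 3 - value if item in POSITIVE_ITEMS else value
    return total

print(score_cesd({i: 0 for i in range(1, 21)}))  # 12 (only the reversed items score)
```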

Although the CES-D appears to suffer from the same discriminant validity problems as other popular depression questionnaires used in arthritis research (1), it was chosen for this study for 3 reasons. First, along with the BDI, it has been rated among the best self-report measures of depression and anxiety based on content validity (43). Second, it has long been widely used by arthritis researchers, so knowledge about its validity, or about how to improve it, would be very useful in interpreting data already in the literature. Finally, previous analyses have identified the factor structure of the CES-D, which may enhance its discriminant validity (33).

Blalock et al (1) identified 4 somatic CES-D items that could be confounded with RA symptoms ("I felt that everything I did was an effort," "My sleep was restless," "I felt hopeful about the future," and "I could not get going"). We created a version of the CES-D that removed these somatic items and prorated the remaining items to retain the range of the original scale.
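Proration multiplies the sum of the 16 retained items by 20/16, so the shortened scale keeps the 0–60 range of the full CES-D. A minimal sketch, assuming item scores are already reverse-scored where needed and that the 4 Blalock items are items 7, 8, 11, and 20 in the standard CES-D order (an assumption about numbering that the text does not state). `prorate` and `BLALOCK_ITEMS` are hypothetical names.

```python
# Assumed standard CES-D numbering of the 4 Blalock items:
# 7 "everything an effort", 8 "hopeful about the future",
# 11 "sleep was restless", 20 "could not get going".
BLALOCK_ITEMS = frozenset({7, 8, 11, 20})

def prorate(item_scores: dict[int, int], dropped=BLALOCK_ITEMS, n_items: int = 20) -> float:
    """Sum the retained items and rescale to the full-scale range (0-60 for the CES-D)."""
    kept = [score for item, score in item_scores.items() if item not in dropped]
    n_kept = n_items - len(dropped)
    if len(kept) != n_kept:
        raise ValueError("expected a score for every retained item")
    # Multiply the partial sum by (total items / retained items).
    return sum(kept) * n_items / n_kept

print(prorate({i: 3 for i in range(1, 21)}))  # 60.0
print(prorate({i: 1 for i in range(1, 21)}))  # 20.0
```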

The state scale of the Endler Multidimensional Anxiety Scales (EMAS).

This scale was used to measure state anxiety (44). The EMAS-State is a 20-item measure, with each item rated on a 5-point intensity scale. It includes a 10-item cognitive-worry subscale and a 10-item autonomic-emotional subscale. Previous studies report good reliability for both subscales (0.84 or better). Fifield (45) has reported that administration by telephone and by questionnaire produce comparable results.

The Positive and Negative Affect Schedule (PANAS).

The PANAS (32) was used as the measure of positive and negative affect. The measure includes two 10-item mood scales for NA and PA, which form independent factors. Each item is rated on a 5-point scale from 1, indicating "felt very slightly or not at all," to 5, indicating "felt very much." Watson et al (32) have reported extensively on the reliability and validity of the scales, finding strong convergent and discriminant validity with other, lengthier mood measures. We used the "over the past week" time frame to make the PANAS comparable to the CES-D and the EMAS. Fifield (45) has reported that administration by telephone and by questionnaire produce comparable results.

Diagnosis of affective disorder (MD, GAD, CD).

DSM-IV (40) criteria for a diagnosis of MD include 1) depressed mood or loss of interest and pleasure in things that the individual usually cared about or enjoyed, every day or nearly every day for 2 weeks or more at some time in the past, with impaired role functioning; and 2) at least 4 of 8 additional symptoms: problems with appetite, sleep, fatigue, energy, interest, self-worth, cognition, or suicidal ideas. The episode could not be due to injury, illness, medication or alcohol, childbirth, or the loss of a loved one within certain time parameters. A diagnosis of current MD required that symptoms had occurred within 3 weeks of the interview, whereas a diagnosis of lifetime MD required that the symptoms had occurred at any time in the past but not in the 3 weeks before the interview.

DSM-IV criteria for GAD include 1) excessive anxiety and worry for at least 6 months; 2) difficulty controlling the worry; and 3) at least 3 of the following 6 symptoms experienced nearly every day during the episode: restlessness, being easily fatigued, difficulty concentrating or mind going blank, irritability, muscle tension, and sleep disturbance. The symptoms cannot be due to the direct effects of a substance or medical condition, must cause role impairment, and cannot occur exclusively during a mood disorder. Current GAD required that all of these criteria be met within the 6 months preceding the interview, whereas lifetime GAD required that the same criteria be met for a 6-month period before that time.

DSM-IV does not include criteria for CD. Previous studies show that people meeting criteria for both disorders have more severe and persistent emotional symptoms and more impaired social functioning (46, 47). Therefore, for the preliminary analysis, we classified people who met full criteria for both MD and GAD as having CD. For the ROC analysis, people with CD were combined with those diagnosed with MD or GAD in a general affective disorder category (n = 37); in the analyses attempting to discriminate between MD and GAD, they were excluded because they cannot be placed in either group, leaving 27 participants with MD or GAD.
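The grouping rules above can be written out explicitly. This sketch assumes each participant is represented by 2 hypothetical booleans (met full criteria for MD; met full criteria for GAD). It builds the any-disorder indicator used in the disorder versus no-disorder analyses and the MD-versus-GAD subset that drops CD cases.

```python
def classify(has_md: bool, has_gad: bool) -> str:
    """Assign the diagnostic category used in the analyses."""
    if has_md and has_gad:
        return "CD"  # full criteria for both disorders
    if has_md:
        return "MD"
    if has_gad:
        return "GAD"
    return "none"

def analysis_sets(participants):
    """participants: iterable of (has_md, has_gad) pairs."""
    labels = [classify(md, gad) for md, gad in participants]
    # Any disorder (MD, GAD, or CD) versus no disorder.
    any_disorder = [label != "none" for label in labels]
    # MD versus GAD only; CD cases fit neither group and are dropped.
    md_vs_gad = [label for label in labels if label in ("MD", "GAD")]
    return any_disorder, md_vs_gad

# Group sizes reported for this sample: 378 none, 16 MD, 11 GAD, 10 CD.
sample = ([(False, False)] * 378 + [(True, False)] * 16
          + [(False, True)] * 11 + [(True, True)] * 10)
any_disorder, md_vs_gad = analysis_sets(sample)
print(sum(any_disorder), len(md_vs_gad))  # 37 27
```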

Statistical analysis.

We used a variety of methods to assess the ability of the 3 scales to discriminate between those with anxiety, depression, both, or neither. Preliminary analyses of convergent and discriminant validity included bivariate correlations among the full scales and subscales and analysis of variance (ANOVA) tests of the differences in mean scale scores by affective disorder category (no disorder, MD, GAD, CD). We used Scheffé and Bonferroni tests for multiple post hoc comparisons between the groups. ROC curves simultaneously estimate the sensitivity and specificity of the screening scales and provide significance tests for the differences between scales. ROC analysis has been used in other studies to assess the sensitivity and specificity of screening tests for depression and/or anxiety (15–17, 48). However, ROC analysis cannot evaluate the combined effect of subscales in discriminating between groups. Interaction terms created by multiplying subscale scores provide this information but cannot be used in ROC analysis. Therefore, we tested whether the interaction between the PA and depressed affect (DA) subscales of the CES-D, and between the PA and NA subscales of the PANAS, provides more information than the full scale or than the subscales in an additive logistic regression model.
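The omnibus ANOVA can be recomputed from group summary statistics such as those reported later in Table 3 (n, mean, SD per group). This is a minimal sketch, not the authors' analysis code: it returns only the F statistic and its degrees of freedom (no p-value, and no Scheffé test), plus the Bonferroni per-comparison alpha for all pairwise comparisons among the groups. Function names are hypothetical.

```python
def anova_from_summary(groups):
    """One-way ANOVA F from (n, mean, sd) per group; returns (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group sum of squares recovered from each group's SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def bonferroni_alpha(alpha, k_groups):
    """Per-comparison alpha for all pairwise comparisons among k groups."""
    n_pairs = k_groups * (k_groups - 1) // 2
    return alpha / n_pairs

# Positive affect by category (n, mean, SD): no disorder, MD, GAD, CD (Table 3).
pa_groups = [(378, 31.30, 7.62), (16, 22.81, 7.53), (11, 22.18, 6.66), (10, 21.40, 6.69)]
f_stat, df_b, df_w = anova_from_summary(pa_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}; Bonferroni alpha = {bonferroni_alpha(0.05, 4):.4f}")
```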

The specificity and the sensitivity rates for each scale provide information about the ability of the instruments to discriminate between those with a disorder and those without, and between disorders (MD and GAD). Determining specificity and sensitivity usually requires choosing cut off scores for the continuous scales, therefore making the results dependent on the selected cut off. Somoza and Mossman (49) and Metz (50) suggest ROC analysis as a way to overcome the problem of having several cut off scores or selecting only one. ROC analysis provides an overall description of a scale's combined sensitivity and specificity (ability to discriminate) throughout its entire range of possible cut off scores, summarized in an area score (Az).
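Evaluating every possible cut-off at once can be illustrated with the empirical (nonparametric) ROC curve: each distinct score is tried as a cut-off, and the resulting (false positive rate, true positive rate) pairs trace the curve, whose area is obtained here by the trapezoidal rule. This is a sketch on hypothetical toy data; it is not the smoothed binormal curve that CLABROC fits.

```python
def roc_points(scores, labels):
    """(FPR, TPR) for every distinct cut-off, screening positive if score >= cut."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Descending cut-offs move the point from (0, 0) toward (1, 1).
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= cut)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cut)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_auc(points):
    """Area under a piecewise-linear curve through the given points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

scores = [3, 8, 12, 15, 18, 22, 25, 31]  # hypothetical scale scores
labels = [0, 0, 0, 1, 0, 1, 1, 1]        # 1 = affective disorder per interview
print(trapezoid_auc(roc_points(scores, labels)))  # 0.9375
```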

ROC analysis creates an estimated, smoothed curve with a confidence interval for generalizing to the population. Generalizing requires assuming that the scale's scores (or some monotonic transformation of them) are normally distributed within each of the groups (disorder/no disorder or MD/GAD), with group-specific means and standard deviations (binormal assumption). Swets (51) indicates that the binormal assumption is valid for a wide variety of diagnostic tests, and Hanley (52) describes ROC analysis as robust to deviations from normality. We used the CLABROC program, part of the ROCKIT software (53), to conduct the ROC analysis. CLABROC is a maximum likelihood program that estimates curves for correlated, paired, continuous distributions. Its output provides the information needed to plot ROC curves, a confidence interval around the area under the curve (Az), and comparisons between 2 scales administered to the same group of people. The Az is an overall index of a scale's accuracy: an Az of 0.5 indicates discrimination no better than chance, an Az of 1.0 indicates perfect discrimination, and higher Az values indicate greater combined sensitivity and specificity. Differences in the accuracy of the scales were tested using the area under the curve test (95% confidence level, 2-tailed test). According to Mossman and Somoza (54), the Az represents the probability that a randomly selected patient who has an affective disorder (according to the diagnostic interview) scores higher on a particular scale than a randomly selected patient who does not have an affective disorder.
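Both the probability interpretation of Az and the binormal model can be written in a few lines. `auc_pairwise` counts, over all case/non-case pairs, how often the case scores higher (ties count one half), which equals the area under the empirical ROC curve. `binormal_az` uses the standard binormal expression Az = Φ(a/√(1 + b²)), with a = (μ1 − μ0)/σ1 and b = σ0/σ1. Plugging in raw group means and SDs, as done here, is a simplification assumed for illustration; CLABROC instead estimates a and b by maximum likelihood. Function names and data are hypothetical.

```python
from math import erf, sqrt

def auc_pairwise(pos_scores, neg_scores):
    """P(random case scores higher than random non-case); ties count one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def binormal_az(mean_pos, sd_pos, mean_neg, sd_neg):
    """Binormal Az = Phi(a / sqrt(1 + b^2)), a = (mu1 - mu0)/sd1, b = sd0/sd1."""
    a = (mean_pos - mean_neg) / sd_pos
    b = sd_neg / sd_pos
    return normal_cdf(a / sqrt(1.0 + b * b))

print(auc_pairwise([15, 22, 25, 31], [3, 8, 12, 18]))  # 0.9375
print(binormal_az(0.0, 1.0, 0.0, 1.0))                 # 0.5 (chance)
```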

To use ROC analysis to select an optimal cut-off score for a scale, prevalence rates need to be specified, because the best balance between true positives and false positives depends on how common the disorder is. Because the optimal cut-off score depends on the prevalence in the population and on the research or clinical situation, it is impossible to provide a single best score. However, we calculated the true positive rate and the false positive rate for various cut-off scores for all of the scales (available from the first author) (16).
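Why prevalence matters can be seen by choosing the cut-off that minimizes the expected misclassification cost per person screened: prevalence × (1 − TPR) × cost of a missed case + (1 − prevalence) × FPR × cost of a false positive. This is one common decision rule, assumed here for illustration; the paper does not state which rule it would recommend. `best_cutoff` and the toy data are hypothetical.

```python
def best_cutoff(scores, labels, prevalence, cost_fn=1.0, cost_fp=1.0):
    """Cut-off (screen positive if score >= cut) with the lowest expected cost per person.

    TPR and FPR are estimated from the sample; `prevalence` is the assumed
    prevalence in the population where the scale will be used.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cut in sorted(set(scores)):
        tpr = sum(1 for s, y in zip(scores, labels) if y and s >= cut) / n_pos
        fpr = sum(1 for s, y in zip(scores, labels) if not y and s >= cut) / n_neg
        cost = prevalence * (1 - tpr) * cost_fn + (1 - prevalence) * fpr * cost_fp
        if best is None or cost < best[1]:
            best = (cut, cost)
    return best

scores = [1, 2, 3, 4]
labels = [0, 1, 0, 1]
print(best_cutoff(scores, labels, prevalence=0.1)[0])  # 4
print(best_cutoff(scores, labels, prevalence=0.9)[0])  # 2
```

With the same scores and diagnoses, a rare disorder favors a strict cut-off (fewer false positives), while a common one favors a lenient cut-off (fewer missed cases).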

Logistic regression analysis provides a way to assess whether combinations of subscales (interaction terms) discriminate between those with and those without an affective disorder, or between disorders. If Clark and Watson's (14) tripartite model of affective disorder is correct, interaction terms that combine information from the PA and NA subscales of the PANAS and from the PA and DA subscales of the CES-D should have larger coefficients than either subscale alone. However, odds ratios are difficult to interpret when the independent variables are continuous. Because the coefficients describe changes per unit change in the independent variable, the coefficients are comparable only if the independent variables are standardized, by subtracting the scale mean from each value and dividing by the scale's standard deviation.
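The standardization step, and one way to form an interaction term from the standardized subscales, can be sketched as follows. The text does not say whether the interaction was formed from standardized or raw scores; multiplying z-scores is an assumption made here for illustration, and `predictor_rows` is a hypothetical helper that only builds the predictors (it does not fit the logistic model).

```python
from statistics import mean, stdev

def standardize(values):
    """Convert raw scores to z-scores (subtract mean, divide by sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def predictor_rows(pa_scores, da_scores):
    """Hypothetical design rows (z_PA, z_DA, z_PA * z_DA) for an interaction model."""
    z_pa, z_da = standardize(pa_scores), standardize(da_scores)
    return [(p, d, p * d) for p, d in zip(z_pa, z_da)]

rows = predictor_rows([1, 2, 3], [3, 2, 1])
print(rows)  # [(-1.0, 1.0, -1.0), (0.0, 0.0, 0.0), (1.0, -1.0, -1.0)]
```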

RESULTS

Sample description and prevalence of affective disorders.

As in the RA population, most of the study participants were women (83%) and married (68%); only about one-third were currently employed (35%). At entry into the study, participants had been living with a diagnosis of RA for an average of 10 years; the mean age at diagnosis was 39.5 years. Median family income was $35,000, and median education was 1 year beyond high school. Average pain and fatigue levels in the sample during the eighth wave of the study were at the middle of the range (45 for pain, 51 for fatigue). Of the 415 participants in this study, 9% met the criteria for current major depressive episode (MD = 4%), current generalized anxiety disorder (GAD = 3%), or both simultaneously (CD = 2%).

Means and alpha reliabilities of the scales.

All of the screening scales had adequate alpha reliabilities (Table 1); the lowest was 0.71 for the interpersonal subscale of the CES-D, and the highest was 0.93 for the EMAS summary scale. We also constructed a CES-D scale with the somatic items removed (CES-Dnoso), prorating the remaining items so that the scale had the same possible range of scores as the original CES-D. The mean for the scale without the somatic items (10.17) was lower than for the scale with them (12.23; paired t-test P < 0.05); however, the 2 scales were almost perfectly correlated (r = 0.99). Squaring this coefficient (R² = 0.97) indicates that the 2 versions share about 97% of their variance, so less than 3% of the variance in the original CES-D scores is not shared with the version without the somatic items. Although the scores were significantly lower when the somatic items were removed, suggesting some RA contamination, the magnitude of the difference was small. We further explored how much the somatic items inflate CES-D scores among people with RA using ROC analyses.

Table 1. Means and reliabilities for the CES-D, EMAS, and PANAS with subscales (n = 415)*

| Scale | Mean | SD | Alpha reliability | Min–max |
|---|---|---|---|---|
| CES-D | 12.23 | 11.00 | 0.75 | 0–50 |
| CES-Dnoso | 10.17 | 10.73 | 0.72 | 0–47 |
| CES-D subscales | | | | |
| Somatic | 5.79 | 4.44 | 0.83 | 0–21 |
| Interpersonal | 0.96 | 1.68 | 0.71 | 0–10 |
| Depressive | 2.66 | 3.37 | 0.88 | 0–15 |
| Positive | 9.19 | 3.02 | 0.83 | 0–12 |
| EMAS | 29.83 | 12.05 | 0.93 | 20–84 |
| EMAS autonomic worry | 14.50 | 5.76 | 0.83 | 10–40 |
| EMAS cognitive worry | 15.21 | 6.95 | 0.91 | 10–40 |
| PANAS positive affect scale | 30.49 | 7.98 | 0.89 | 10–50 |
| PANAS negative affect scale | 17.69 | 7.48 | 0.91 | 10–50 |

* CES-D = Center for Epidemiological Studies Depression Scale; EMAS = Endler Multidimensional Anxiety Scales; PANAS = Positive and Negative Affect Schedule; SD = standard deviation; Min = minimum; max = maximum; CES-Dnoso = CES-D with somatic items removed.

Correlations among the scales and subscales.

Table 2 provides the correlations among the scales and subscales for all of the cases (n = 415, below the diagonal) and for just those cases with an affective disorder (n = 37, above the diagonal).

Table 2. Correlations among the scales and subscales*

| Scale | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 CES-D interpersonal | | 0.46 | 0.56 | 0.51 | 0.76 | 0.48 | 0.35 | 0.45 | 0.25 | 0.46 |
| 2 CES-D somatic vegetative | 0.63 | | 0.70 | 0.32 | 0.85 | 0.53 | 0.34 | 0.47 | 0.22 | 0.41 |
| 3 CES-D depressed affect | 0.73 | 0.79 | | 0.34 | 0.86 | 0.47 | 0.27 | 0.40 | 0.33 | 0.51 |
| 4 CES-D positive affect (reversed) | 0.60 | 0.61 | 0.68 | | 0.64 | 0.38 | 0.08 | 0.25 | 0.41 | 0.45 |
| 5 CES-D summary score | 0.80 | 0.91 | 0.93 | 0.82 | | 0.59 | 0.34 | 0.50 | 0.37 | 0.57 |
| 6 EMAS cognitive worry | 0.65 | 0.73 | 0.70 | 0.59 | 0.77 | | 0.72 | 0.93 | 0.09 | 0.53 |
| 7 EMAS autonomic worry | 0.52 | 0.65 | 0.57 | 0.43 | 0.63 | 0.75 | | 0.92 | −0.20 | 0.30 |
| 8 EMAS summary score | 0.63 | 0.74 | 0.69 | 0.56 | 0.76 | 0.95 | 0.92 | | −0.05 | 0.45 |
| 9 Positive affect (reversed) | 0.41 | 0.49 | 0.47 | 0.63 | 0.58 | 0.41 | 0.30 | 0.38 | | 0.10 |
| 10 Negative affect | 0.67 | 0.72 | 0.75 | 0.60 | 0.79 | 0.76 | 0.64 | 0.75 | 0.36 | |

* The bottom half (below the diagonal) contains all study participants (n = 415); the top half (above the diagonal, italics) contains just those with an affective disorder (n = 37). All of the correlations in the bottom left half are significant at the 0.01 level (2-tailed) (n = 415). All of the correlations in the top right half are significant at the 0.05 level (2-tailed) except for those in bold (P values greater than 0.05) (n = 37). See Table 1 for definitions.

We focus first on the correlations in the entire sample. The first 5 columns contain the CES-D subscales identified by Sheehan et al (33) and the CES-D summary score. The correlations among the 4 subscales are all 0.60 or higher, and each subscale correlates strongly with the full scale (0.80–0.93), providing evidence of good convergent validity for the CES-D subscales.

Next, we assessed the correlations between the CES-D scales and the EMAS and PANAS scales. Some of the correlations between the depression measures and the anxiety measures are quite high (5 are 0.70 or higher; the lowest is 0.43). The strong positive correlations of both the CES-D depression scales and the EMAS anxiety scales with NA indicate that both instruments tap negative affect. NA and PA had, as expected, a modest correlation (r = 0.36). Among study participants with an affective disorder (n = 37), the correlations range from −0.20 to 0.93, and the overall pattern is similar to that among all study participants: convergent validity is better than discriminant validity.

Distinguishing between MD, GAD, and CD.

We compared mean CES-D, NA, PA, and EMAS scores for those with no affective disorder (n = 378), those with MD (n = 16), GAD (n = 11), and CD (n = 10) using ANOVA (Table 3).

Table 3. Means, SDs, and 95% CIs for the scales by affective disorder category*

| Scale | Affective disorder category | N | Mean | SD | 95% CI |
|---|---|---|---|---|---|
| Positive affect | No disorder | 378 | 31.30 | 7.62 | 30.53–32.07 |
| | MD | 16 | 22.81 | 7.53 | 18.80–26.82 |
| | GAD | 11 | 22.18 | 6.66 | 17.71–26.66 |
| | CD | 10 | 21.40 | 6.69 | 16.62–26.18 |
| Negative affect | No disorder | 378 | 16.66 | 6.62 | 15.99–17.33 |
| | MD | 16 | 27.25 | 7.61 | 23.19–31.31 |
| | GAD | 11 | 28.73 | 9.26 | 22.50–34.95 |
| | CD | 10 | 29.10 | 6.74 | 24.28–33.92 |
| EMAS | No disorder | 378 | 28.40 | 10.64 | 27.33–29.48 |
| | MD | 16 | 47.94 | 16.84 | 38.96–56.91 |
| | GAD | 11 | 41.09 | 16.08 | 30.29–51.89 |
| | CD | 10 | 42.30 | 13.74 | 32.47–52.13 |
| CES-D | No disorder | 378 | 10.48 | 9.38 | 9.53–11.42 |
| | MD | 16 | 30.94 | 10.00 | 25.61–36.26 |
| | GAD | 11 | 27.27 | 10.07 | 20.51–34.04 |
| | CD | 10 | 32.20 | 11.37 | 24.07–40.33 |
| CES-Dnoso | No disorder | 378 | 8.46 | 9.09 | 7.54–9.38 |
| | MD | 16 | 27.97 | 10.06 | 22.61–33.33 |
| | GAD | 10 | 24.50 | 10.66 | 16.88–32.12 |
| | CD | 10 | 29.63 | 12.15 | 20.93–38.32 |

* SDs = standard deviations; 95% CIs = 95% confidence intervals; MD = major depression; GAD = generalized anxiety disorder; CD = comorbid disorder. See Table 1 for other definitions.

Compared with the groups with a disorder, the group without an affective disorder had lower average scores on all of the scales except PA, on which it scored higher. For every scale, the mean in each affective disorder category differed from the no-disorder mean by at least 1 no-disorder standard deviation (lower for PA, higher for the other scales). These results indicate that each of the screening questionnaires was able to distinguish between individuals with an affective disorder and those without one. The confidence intervals for the types of disorder (MD, GAD, or CD) overlapped substantially, suggesting that none of the scales could differentiate between types of disorder. Furthermore, the subscales showed no apparent advantage over the full CES-D or EMAS, and PA scores were comparable for those with GAD, MD, or both.
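The "at least 1 standard deviation" comparison can be checked directly from the Table 3 values by expressing each disorder-group mean as a distance from the no-disorder mean in no-disorder SD units. The sketch also includes a simple interval-overlap check; note that overlapping 95% CIs are an informal indication, not a formal significance test. Names are hypothetical.

```python
# (no-disorder mean, no-disorder SD) and disorder-group means, from Table 3.
TABLE3 = {
    "Positive affect": ((31.30, 7.62), {"MD": 22.81, "GAD": 22.18, "CD": 21.40}),
    "Negative affect": ((16.66, 6.62), {"MD": 27.25, "GAD": 28.73, "CD": 29.10}),
    "EMAS": ((28.40, 10.64), {"MD": 47.94, "GAD": 41.09, "CD": 42.30}),
    "CES-D": ((10.48, 9.38), {"MD": 30.94, "GAD": 27.27, "CD": 32.20}),
    "CES-Dnoso": ((8.46, 9.09), {"MD": 27.97, "GAD": 24.50, "CD": 29.63}),
}

def sd_units(ref_mean, ref_sd, group_mean):
    """Distance of a group mean from the reference mean, in reference-SD units."""
    return abs(group_mean - ref_mean) / ref_sd

def intervals_overlap(ci_a, ci_b):
    """True if 2 (low, high) confidence intervals share any values."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

for scale, ((ref_mean, ref_sd), groups) in TABLE3.items():
    distances = {g: round(sd_units(ref_mean, ref_sd, m), 2) for g, m in groups.items()}
    print(scale, distances)

# CES-D 95% CIs for MD and GAD (Table 3).
print(intervals_overlap((25.61, 36.26), (20.51, 34.04)))  # True
```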

ROC analysis.

We first used ROC analysis to test the ability of the scales and subscales to discriminate between those with GAD and those with MD (n = 27). Because the CD participants fit neither group, they were omitted from this analysis and were included only in the analysis of those with versus those without a disorder. All of the scales showed poor discriminant validity (Az scores ranged from 0.45 to 0.66; none was significantly better than chance). Only 2 of the subscales had Az scores significantly different from those of their full scales: the interpersonal and DA subscales of the CES-D were each significantly worse than the full CES-D at discriminating between depression and anxiety.

Figure 1 shows the ROC curves for the full scales using affective disorder (AD), that is, MD or GAD or both, as the criterion. The areas under the curve for the CES-D, NA, EMAS, and PA range from a high of 0.92 for the CES-D to 0.82 for the PA. The CES-D had a significantly higher area under the curve than the EMAS (Az = 0.92 versus 0.82; P = 0.003) and the PA (Az = 0.92 versus 0.82; P = 0.003), but not than the NA. These results indicate that the CES-D is better than the EMAS and the PA at differentiating between those with and those without an affective disorder, and comparable to the NA. Although not shown in the figure, the curve for the CES-D without the somatic items did not differ significantly from the curve for the CES-D with the somatic items.

Figure 1.

Receiver operator characteristic curves for the full scales (Center for Epidemiological Studies Depression Scale [CES-D], negative affect [NA], Endler Multidimensional Anxiety Scales [EMAS], and positive affect [PA]).

All of the Azs were high, suggesting that all of the scales had good combined sensitivity and specificity (all are higher than 0.80) to predict membership in the affective disorder group compared with the nonaffective disorder group. We also tested the subscales of the CES-D and the EMAS compared with the full scales to ascertain if they added information hidden in the combined scales (Figure 2). None of the subscales had significantly higher Az scores than the full scales. The interpersonal, DA, and PA subscales had significantly lower Az scores than the full CES-D scale.

Figure 2.

Receiver operator characteristic curves for the full scales and subscales of the Center for Epidemiological Studies Depression Scale (CESD) and Endler Multidimensional Anxiety Scales (EMAS) (n = 415). CESD-DA = CESD depressed affect; CESD-SV = CESD somatic vegetative; CESD-PA = CESD positive affect; CESD-IN = CESD interpersonal; EMAS-AW = EMAS autonomic worry; EMAS-CW = EMAS cognitive worry.

Logistic regression analysis.

We tested whether the interaction between the PA and DA subscales of the CES-D, and between the PA and NA subscales of the PANAS, provides more information than either full scale or the subscales in an additive logistic regression model. The EMAS subscales have not previously been examined for interactive effects; however, in parallel with the CES-D and PANAS, we tested whether the EMAS subscales might be more effective than the full scale. We analyzed only the whole sample (n = 415) because the subsample with a disorder had too few cases for the analysis (the Hosmer-Lemeshow test showed that several groups had fewer than the required minimum of 5 expected cases).
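The Hosmer-Lemeshow check mentioned above sorts participants by predicted probability, splits them into risk groups, and compares observed with expected events in each group; groups with small expected counts make the chi-square approximation unreliable. A minimal sketch, assuming the conventional grouping into roughly equal-sized groups (10 by default) and comparison with chi-square on groups − 2 degrees of freedom; the p-value is not computed, and the function names are hypothetical.

```python
def hosmer_lemeshow(probs, outcomes, n_groups=10):
    """Return (statistic, [(expected events, expected non-events), ...] per risk group)."""
    # Order by predicted probability only (stable sort keeps ties in input order).
    pairs = sorted(zip(probs, outcomes), key=lambda pair: pair[0])
    size = len(pairs) / n_groups
    statistic, expected = 0.0, []
    for g in range(n_groups):
        chunk = pairs[round(g * size):round((g + 1) * size)]
        n_g = len(chunk)
        e_g = sum(p for p, _ in chunk)  # expected events
        o_g = sum(y for _, y in chunk)  # observed events
        expected.append((e_g, n_g - e_g))
        if 0 < e_g < n_g:  # skip empty or degenerate groups
            statistic += (o_g - e_g) ** 2 / (e_g * (1 - e_g / n_g))
    return statistic, expected

def sparse_groups(expected, minimum=5.0):
    """Indices of groups with fewer than `minimum` expected events or non-events."""
    return [i for i, (events, non_events) in enumerate(expected)
            if min(events, non_events) < minimum]

stat, exp_counts = hosmer_lemeshow([0.5] * 8, [1, 0, 1, 0, 1, 0, 1, 0], n_groups=2)
print(stat, exp_counts, sparse_groups(exp_counts))  # 0.0 [(2.0, 2.0), (2.0, 2.0)] [0, 1]
```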

The logistic regression analysis revealed that the full CES-D and the PA and DA subscales of the CES-D each showed significant positive associations with the likelihood of having an affective disorder diagnosis. An interaction term combining the DA and PA subscales did not improve on the simple additive model. The pattern was the same for the PANAS but slightly different for the EMAS: when both EMAS subscales (autonomic worry and cognitive worry) were in the model, the autonomic worry subscale ceased to be a significant predictor, suggesting considerable overlap between these subscales.
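One common way to test whether an interaction term adds to an additive logistic model is a likelihood-ratio test on the 2 models' log-likelihoods. The paper does not state which test it used, so this is an assumed illustration, and the log-likelihood values below are hypothetical. For 1 added term, the chi-square(1) upper tail equals erfc(√(LR/2)).

```python
from math import erfc, sqrt

def lr_test_one_df(loglik_reduced, loglik_full):
    """Likelihood-ratio test for one added term (e.g., a PA x DA interaction).

    Returns (LR statistic, p-value from chi-square with 1 df).
    """
    lr = 2.0 * (loglik_full - loglik_reduced)
    # Upper tail of chi-square(1): P(Z^2 > lr) = erfc(sqrt(lr / 2)).
    p_value = erfc(sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value

# Hypothetical log-likelihoods: additive model vs model with the interaction.
lr, p = lr_test_one_df(-100.0, -98.0)
print(f"LR = {lr:.2f}, p = {p:.4f}")
```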

Cut point for the CES-D.

ROC analysis also allowed us to evaluate the best cut point for determining membership in the affective disorder category. The generally accepted cut point for the CES-D is 16. Some researchers have questioned whether this value applies to the RA population, because the somatic items may reflect RA symptoms rather than affective disorder (criterion contamination). We saw little evidence of contamination. Both the false-positive and the true-positive counts changed substantially between cut points of 17.5 and 20.5 (lowering the cut point from 20.5 to 17.5 increased both), whereas there was very little difference between cut points of 16.5 and 17.5.
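Reported cut points such as 16.5 and 17.5 fall halfway between integer scores, which is consistent with a program that places candidate thresholds between adjacent observed scores. The sketch below assumes that behavior and that scores above the cut screen positive; the function names and toy scores are hypothetical, not the ROC program's actual interface.

```python
def midpoint_cut_points(scores):
    """Candidate cut points halfway between adjacent distinct observed scores."""
    distinct = sorted(set(scores))
    return [(low + high) / 2 for low, high in zip(distinct, distinct[1:])]


def counts_at_cut(disorder_scores, no_disorder_scores, cut):
    """(true positives, false positives) when scores above `cut` screen positive."""
    true_pos = sum(1 for s in disorder_scores if s > cut)
    false_pos = sum(1 for s in no_disorder_scores if s > cut)
    return true_pos, false_pos


# Hypothetical toy scores for illustration only (not study data).
disorder = [30, 25, 18, 22]
no_disorder = [5, 12, 18, 9, 20, 3]
for cut in midpoint_cut_points(disorder + no_disorder):
    print(cut, counts_at_cut(disorder, no_disorder, cut))
```

Because a higher cut point can only remove people from the screen-positive group, both counts are non-increasing as the cut point rises.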

The ROC program automatically selects cut point scores from the raw data. Because cut points of 16 and 19 have been suggested in past research, we examined these 2 cut point scores for the full CES-D and the CES-Dnoso (somatic items removed) using simple cross-tabulated data (Table 4). In this sample, using 19 rather than 16 as the cut score missed only 1 additional "true" case, whereas using 16 produced 22 more false positives. Like other studies using the CES-D, we found that at the standard cut off score of 16, the sensitivity of the CES-D (0.89) is much better than its specificity (0.76, i.e., a false-positive rate of 0.24).

Table 4. Comparing the CES-D and the CES-Dnoso at 2 conventional cut points (n = 415)*

Scale       Cut point   n (%)       True positives missed   False positives   Sensitivity   1 − Specificity
CES-D       16          122 (29)    4                       89                0.89          0.24
CES-Dnoso   16          114 (27)    4                       81                0.89          0.21
CES-D       19          99 (24)     5                       67                0.86          0.18
CES-Dnoso   19          86 (21)     8                       57                0.78          0.15

* CES-D = Center for Epidemiological Studies Depression Scale; CES-Dnoso = CES-D with somatic items removed; n (%) = number (percentage) screening positive; 1 − Specificity = false-positive rate.
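The rates in Table 4 follow directly from its counts. The sketch below assumes that each row's n is the number screening positive and that the sample total is 415. Under those assumptions, true positives equal n minus false positives, and true positives plus missed cases give 37 respondents with an affective disorder diagnosis in every row. The computed sensitivity reproduces the table's sensitivity column, and the computed false-positive rate reproduces its final column, which therefore corresponds to 1 − specificity. The ScreenRow type and screening_rates function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ScreenRow:
    """One row of a screening cross-tabulation (field names are illustrative)."""
    n_positive: int       # respondents screening positive at the cut point
    missed: int           # true cases not flagged by the screen (false negatives)
    false_positives: int  # screen-positive respondents without a diagnosis


def screening_rates(row, n_total):
    """Return (sensitivity, specificity, false-positive rate) for one row."""
    true_positives = row.n_positive - row.false_positives
    cases = true_positives + row.missed
    noncases = n_total - cases
    sensitivity = true_positives / cases
    false_positive_rate = row.false_positives / noncases
    return sensitivity, 1 - false_positive_rate, false_positive_rate


# Counts for the full CES-D at a cut point of 16, taken from Table 4.
sens, spec, fpr = screening_rates(ScreenRow(122, 4, 89), n_total=415)
print(round(sens, 2), round(spec, 2), round(fpr, 2))  # prints 0.89 0.76 0.24
```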

DISCUSSION

This study provided a unique opportunity to review the usefulness of 3 popular scales of depression and anxiety for use as screeners in RA research. Its uniqueness stems from the nationally representative sample and the availability of both screener and diagnostic interview scores. Using these data, we found that it is possible to detect affective disorders among people with RA using short symptom scales. Furthermore, we found much lower rates of depression and anxiety in this sample than reported in other RA studies; the percentages are closer to the national average than those previously reported in RA samples (55). We do not attribute this lower rate to our decision to exclude dysthymia from our analyses, because only an additional 2% of the sample (8 cases) met criteria for dysthymia only. Rather, we conclude that higher rates of depression and depressive symptoms found in other studies are more likely due to the characteristics of the samples than weaknesses in the scales, because the scales, particularly the CES-D, demonstrated high combined sensitivity and specificity.

Similar to Breslau (12), we found that the CES-D does not differentiate between MD and GAD, but does detect high levels of depression and anxiety equally well. None of the self-report measures of distress discriminated between DSM-IV MD and GAD. Despite the promise of Clark and Watson's (14) tripartite model for differentiating between anxiety and depression using positive and negative affect subscales, PA did not differentiate between MD and GAD in this sample. We do not have a satisfactory explanation for this result. Possible explanations include sample-specific findings, a small number of individuals with MD or GAD, something particular about the RA population, or that MD and GAD share too much of a common core to be discernible with general scales. The evidence of comorbidity in this and other studies suggests that differentiating between MD and GAD may be less important than identifying the existence of an affective disorder.

Because the consequences of comorbidity are severe, physicians should attempt to detect and treat comorbid cases as they emerge. In addition to RA–affective disorder comorbidity, there is evidence of comorbidity among anxiety disorders. Fifty to sixty percent of individuals who meet criteria for 1 anxiety disorder diagnosis also meet criteria for additional comorbid anxiety diagnoses (56). Individuals with anxiety disorders are also likely to have major depression and substance use disorders (57). When they occur together, anxiety disorders tend to precede the development of mood disorders, such as depression, and of substance use disorders (58). This temporal ordering suggests that anxiety may contribute to the development of other psychiatric problems, making early detection a high priority.

What do these results suggest for clinical practice and future research on RA/affective disorder connections? In clinical practices where the costs of false positives are low, using a cut point of 16 for the CES-D with follow-up screening seems prudent. In prevalence research where questions about the relationship between RA and affective disorders are the focus, a cut point of 19 for the CES-D is more conservative and therefore preferable. If interview time is limited, the DA subscale of the CES-D discriminates almost as well as the full scale; however, cut points for the subscale need to be established.

Researchers should be cautious about assuming a higher prevalence of MD or GAD in RA populations. This study, using a representative sample and diagnoses derived from a diagnostic interview schedule, found rates similar to those of a general population national survey. Even with similar rates, because the consequences of comorbidity are greater than those of either disorder alone, it is essential to identify those meeting criteria for both affective disorders and to continue exploring connections between chronic illnesses such as RA and affective disorders.
