This publication's contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.
To determine the degree of discordance between patient and physician assessment of disease severity in a multiethnic cohort of adults with rheumatoid arthritis (RA), to explore predictors of discordance, and to examine the impact of discordance on the Disease Activity Score in 28 joints (DAS28).
Adults with RA (n = 223) and their rheumatologists completed a visual analog scale (VAS) for global disease severity independently. Patient demographics, the 9-item Patient Health Questionnaire (PHQ-9) depression scale score, the Health Assessment Questionnaire score, and the DAS28 were also collected. Logistic regression analyses were used to identify predictors of positive discordance, defined as a patient rating minus physician rating of >25 mm on a 100-mm VAS (considered clinically relevant). DAS28 scores stratified by level of discordance were compared using a paired t-test.
Positive discordance was found in 30% of cases, with a mean ± SD difference of 46 ± 15 mm. The strongest independent predictor of discordance was a 5-point increase in PHQ-9 score (adjusted odds ratio 1.61, 95% confidence interval 1.02–2.55). Higher swollen joint count and Cantonese/Mandarin language were associated with lower odds of discordance. DAS28 scores calculated with and without the patient global assessment were most divergent among subjects with positive discordance.
Nearly one-third of RA patients differed from their physicians to a meaningful degree in assessment of global disease severity. Higher depressive symptoms were associated with discordance. Further investigation of the relationships between mood, disease activity, and discordance may guide interventions to improve care for adults with RA.
Accurate assessments of disease activity in rheumatoid arthritis (RA) are central to establishing disease severity and monitoring response to treatment. With the advent of increasingly effective yet potentially toxic therapies, the need for patient-provider agreement, or concordance, around assessments of disease activity is critical to the safe and effective management of RA. These assessments, which rely on both subjective measures (patient self-report) and objective measures (physician-assessed joint counts, acute-phase reactants), pose a significant challenge to the field of rheumatology.
Whereas diseases such as diabetes mellitus or hypertension have objective, numerical measures to assess severity and treatment response (glycosylated hemoglobin or blood pressure), RA disease activity lacks a single gold standard. Composite scores such as the Disease Activity Score in 28 joints (DAS28) (1) are routinely used in clinical trials but less commonly used in practice. One key component of the DAS28 is the patient global assessment of disease severity as measured on a visual analog scale (VAS). Given that new American College of Rheumatology (ACR) recommendations (2) include a disease activity score to determine eligibility for nonbiologic and biologic therapies in RA, the patient global assessment will need to be collected more systematically in practice.
Support for the clinical value of concordance can be found in nonrheumatic chronic conditions, for which studies have shown that when doctors and patients agree, adherence and outcomes improve (3). Despite the importance of concordance in assessments of disease activity, little is known about the prevalence and correlates of discordance, or disagreement, around commonly used measures in RA. The ACR core set of disease activity measures includes both patient and physician global assessment of disease severity (4). Discordance between physicians and patients on these measures, as well as on other measures of health status, has been reported previously in RA (5, 6). Although these results, among others, show that discordance exists (6–10), there is a paucity of research to help us understand why such a gap exists. The studies that examined discordance in RA identified patient age, sex, and education level as being associated with discrepancies in assessments but did not evaluate the possible association of patients' language or mood with discordance, both of which pose barriers to communication and have been associated with variation in symptom reporting (11, 12). Language barriers have also been associated with lower patient satisfaction, poorer health outcomes, and increased mortality in a number of chronic conditions (13–16), but have not been studied in RA. Comorbid depression in chronic disease states has been linked to underestimation of symptoms by physicians and suboptimal communication, but a similar association has not been examined in RA (17, 18).
Although discordance in RA has been documented, no study to our knowledge has sought to better understand this phenomenon in an ethnically diverse population that includes non-English–speaking patients, nor has any study evaluated the impact of depressed mood on agreement. In addition, no study has yet examined the effect of discordance on the DAS28. Therefore, our study had 3 objectives. The first was to determine the degree and directionality of discordance between patients' and physicians' assessments of global disease severity in an ethnically diverse cohort of adults with RA. The second was to explore prespecified, patient-level predictors of any measured discordance of disease severity. The final objective was to examine the impact of discordance on the scoring and categorization of disease activity by the DAS28.
PATIENTS AND METHODS
Subjects were participants in the University of California, San Francisco (UCSF) Rheumatoid Arthritis Cohort, a multisite observational cohort. Enrollment began in October 2006. Subjects were consecutively enrolled from 2 outpatient clinics staffed by UCSF faculty and fellows, the Rheumatoid Arthritis Clinic at San Francisco General Hospital, and the university-based UCSF Arthritis Center. Subjects included in this study must have been seen by a rheumatologist at one of these sites at least twice over a previous 12-month period, been ≥18 years of age, and met the 1987 ACR (formerly the American Rheumatism Association) criteria for RA (19). Physician participants were board-certified or board-eligible rheumatologists based in the 2 clinics, including fellows in training. The research protocol was approved by the UCSF Committee on Human Research. All participants gave their informed consent to be part of the study.
Bilingual research associates gathered data on patient demographics, disease characteristics, functional status, and depressive symptom scores in the clinics. Patient demographics included age, sex, ethnicity, language, and country of origin, and were obtained at the time of enrollment in the cohort. Disease characteristics included rheumatoid factor (RF) status and disease activity as captured by 28-joint tender and swollen joint counts recorded by the physician, an erythrocyte sedimentation rate, and the full DAS28 calculated after each visit. Functional status was measured using the Health Assessment Questionnaire (HAQ) (20). Because this measure has been shown to be stable over a 1-year period, it was included in this study if it was obtained within 1 year of the patient and physician global scores (20). The HAQ is scored from 0 to 3, where 0 = no disability and 3 = severe disability.
Depressive symptoms were measured using the 9-item Patient Health Questionnaire (PHQ-9) (21). The PHQ-9, the recommended measure to screen for depression in primary care settings, is a validated and reliable screening measure available in English, Spanish, and Chinese (22, 23). The PHQ-9 has a range of scores from 0 to 27. Scores of 0–4, 5–9, 10–14, 15–19, and ≥20 correspond to none, mild, moderate, moderately severe, and severe depressive symptoms, respectively. We treated the PHQ-9 as a continuous variable scaled in 5-point increments, which match these severity categories.
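The PHQ-9 severity bands and the 5-point rescaling described above can be illustrated with a minimal sketch. The function names `phq9_category` and `phq9_per_5_points` are hypothetical and are not taken from the study's analysis code (which was run in Stata); only the cut points come from the text.

```python
def phq9_category(score: int) -> str:
    """Map a PHQ-9 total (0-27) to the severity bands given in the text."""
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    if score <= 4:
        return "none"
    if score <= 9:
        return "mild"
    if score <= 14:
        return "moderate"
    if score <= 19:
        return "moderately severe"
    return "severe"


def phq9_per_5_points(score: int) -> float:
    """Rescale the total so that one unit equals a 5-point increase.

    Illustrative assumption: entering this rescaled value in a regression
    makes the reported odds ratio refer to a 5-point increase.
    """
    return score / 5.0
```

Under this sketch, a total of 10 falls in the "moderate" band and corresponds to 2.0 units on the rescaled variable.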
Measure of patient-physician discordance.
The primary outcome of this study was the mean of the difference between patient and physician scores on the VAS for global disease severity. The patient and physician global VAS scores were recorded at each clinical visit. Prior to the visit, each patient was asked the following question in English, Spanish, or Chinese: “Considering all the ways that your arthritis affects you, rate how you are doing on the following scale by placing a vertical mark (|) on the line.” The line is a 0–100-mm horizontal line, where 0 = very well and 100 = very poor. After the visit, the physician (blinded to patient results) marked a separate line using the same 100-mm scale. For the purposes of this study, we used the patient and physician VAS scores from the first recorded cohort visit for which data were complete for all measures listed above. Patient and physician ratings were compared by measuring the difference between the two VAS scores (3, 5, 6, 17, 24, 25). In addition to measuring the degree of discordance, we also assessed the direction. If the patient rated her disease severity as worse than the physician did, we termed this positive discordance, because subtracting the lower physician score from the patient score yields a positive value. If the physician rated disease severity as worse than the patient did, we termed this negative discordance.
A 1-sample t-test was used to assess the mean of the difference between patient and physician scores on the VAS for global disease severity. Although no standardized cutoff for a level of clinically significant discordance exists in the literature, prior research suggests that a difference of 25 mm on a 0–100-mm scale is clinically meaningful (3). In addition, one report suggests that the minimum clinically important difference is, on average, approximately one-half of an SD (26). For the purposes of this study, we used >25 mm of difference as the cutoff for discordance. Given the lack of uniformity regarding a clinically significant degree of discordance, we also performed sensitivity analyses using cutoffs of 10 mm and 40 mm.
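As a minimal sketch of the discordance definition, the classification of a single patient-physician dyad might be written as follows. The function name `classify_discordance` and the label strings are hypothetical; the 25-mm default and the 10-mm and 40-mm alternatives are the cutoffs stated in the text.

```python
def classify_discordance(patient_mm: float, physician_mm: float,
                         cutoff_mm: float = 25.0) -> str:
    """Classify one dyad from two 0-100-mm VAS global scores.

    The difference is patient minus physician, so a positive value means
    the patient rated disease severity as worse than the physician did.
    """
    for value in (patient_mm, physician_mm):
        if not 0 <= value <= 100:
            raise ValueError("VAS scores must lie between 0 and 100 mm")
    difference = patient_mm - physician_mm
    if difference > cutoff_mm:
        return "positive"
    if difference < -cutoff_mm:
        return "negative"
    return "none"  # within the cutoff in either direction (concordant)
```

A difference of exactly 25 mm counts as concordant under this definition, because the cutoff is strict (>25 mm). Passing `cutoff_mm=10` or `cutoff_mm=40` reproduces the thresholds used in the sensitivity analyses.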
We used descriptive statistics to characterize differences in patient demographics, disease characteristics, and depressive symptoms between the concordant and discordant groups. Specifically, bivariable relationships between discordance and the patient's age (continuous), race/ethnicity, language (Spanish, English, Cantonese/Mandarin, or other), country of origin (US versus non-US born), and sex were assessed. Bivariable relationships between discordance and other disease characteristics (including RF status, physician-recorded tender and swollen joint counts, DAS28, and HAQ score) were also assessed. The relationship between discordance and depressive symptoms as measured by the PHQ-9 (continuous) was also assessed. We used chi-square or Fisher's exact tests for categorical variables and analysis of variance or Kruskal-Wallis tests for continuous variables.
We used a multivariate logistic regression analysis to measure the independent effects of patient demographics, disease characteristics, and depressive symptoms on discordance. As has been done in a prior study (17), subjects with negative discordance defined as lower than a −25-mm difference (n = 12) were included in the nondiscordant group because of the small number, and models were run with and without this group. In the multivariate model, we included those covariates that were significant at P < 0.20 in the bivariate analyses. Clinic site, patient age, sex, and language were also included in the multivariable analysis because they may affect doctor-patient communication, and therefore discordance, despite or because of the presence of in-person or video monitor interpreter services. Finally, we used generalized estimating equations to account for clustering by physician.
Additional analyses: discordance and the DAS28.
In order to explore how patient-physician discordance in assessment of disease activity may affect the DAS28 and categorization of severity (low, moderate, high), we compared mean DAS28 scores between concordant and positive discordant pairs both with and without the patient global (the DAS28 4-variable and DAS28 3-variable, respectively) in a subgroup (n = 202) with complete data to calculate a DAS28. The formulas are DAS28 = 0.56 × √(tender28) + 0.28 × √(swollen28) + 0.70 × ln(ESR) + 0.014 × GH, and DAS28(3) = (0.56 × √(tender28) + 0.28 × √(swollen28) + 0.70 × ln(ESR)) × 1.08 + 0.16, where ESR = erythrocyte sedimentation rate, GH = general health (Disease Activity Score in Rheumatoid Arthritis, Nijmegen, The Netherlands; online at www.reuma-nijmegen.nl/www.das-score.nl/index.html). To determine whether discordance was associated with differences between an individual's DAS28 4-variable score and a modified DAS28 3-variable (calculated without the patient global assessment), subjects were first separated into 2 groups: no discordance and positive discordance. Paired t-tests were then used to compare DAS28 4-variable and DAS28 3-variable scores. The DAS28 4-variable and the DAS28 3-variable were then categorized according to standard cutoffs for disease severity (≤3.2 = low, >3.2 and ≤5.1 = moderate, and >5.1 = high) and stratified by concordant versus positive discordant groupings. All analyses were performed using Stata, version 9.2 (Stata Corporation).
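The two DAS28 formulas and the severity cutoffs given above can be sketched in a few lines. This is an illustration only, not the study's Stata code. The function names are hypothetical, and the example inputs in the usage note are invented to show how a high patient global score can shift the category.

```python
import math


def _das28_core(tender28: int, swollen28: int, esr: float) -> float:
    """Shared joint-count and ESR terms; ESR must be > 0 for the log."""
    if esr <= 0:
        raise ValueError("ESR must be positive")
    return (0.56 * math.sqrt(tender28)
            + 0.28 * math.sqrt(swollen28)
            + 0.70 * math.log(esr))


def das28_4v(tender28: int, swollen28: int, esr: float, gh: float) -> float:
    """DAS28 4-variable: includes the patient global (GH, 0-100-mm VAS)."""
    return _das28_core(tender28, swollen28, esr) + 0.014 * gh


def das28_3v(tender28: int, swollen28: int, esr: float) -> float:
    """DAS28 3-variable: omits the patient global assessment."""
    return _das28_core(tender28, swollen28, esr) * 1.08 + 0.16


def das28_category(score: float) -> str:
    """Standard cutoffs: <=3.2 low, >3.2 and <=5.1 moderate, >5.1 high."""
    if score <= 3.2:
        return "low"
    if score <= 5.1:
        return "moderate"
    return "high"
```

For a hypothetical patient with 0 tender joints, 1 swollen joint, an ESR of 20, and a patient global of 70 mm, `das28_4v` gives about 3.36 (moderate) while `das28_3v` gives about 2.73 (low), which is the kind of category shift examined in this analysis.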
RESULTS
Demographics and clinical characteristics.
Data from 223 consecutively enrolled subjects with complete data were included in this analysis. The mean ± SD age was 53 ± 14 years. Of the subjects, 88% were women and 45% were Latino, 27% were Asian/Pacific Islander, 16% were white, 10% were African American, and 2% were American Indian or other. Nearly three-quarters of the subjects were born outside of the US (Table 1). With regard to clinical characteristics, 83% were RF positive, with a median swollen joint count of 3 (interquartile range [IQR] 1–8) and a median tender joint count of 1 (IQR 0–6). The mean ± SD HAQ score was 1.27 ± 0.82. The mean ± SD PHQ-9 score was 7.08 ± 5.80. Of the subjects, 66 (30%) met the definition of moderate to severe depression on the PHQ-9 (score ≥10). The mean ± SD patient VAS score for global disease severity was 46 ± 26 mm and the mean ± SD physician VAS score was 31 ± 21 mm. The mean ± SD of the difference in VAS scores was 16 ± 26 mm.
Table 1. Characteristics of the UCSF RA cohort (n = 223), stratified by level of discordance*
Characteristic | Total (n = 223) | No discordance (n = 143) | Positive discordance (n = 68) | Negative discordance (n = 12)
Age, mean ± SD years | 53 ± 14 | 53 ± 14 | 53 ± 13 | 47 ± 14
American Indian or other | | | |
Country of origin | | | |
Swollen joint count, median (IQR) | | | |
Tender joint count, median (IQR) | | | |
Patient global, mean ± SD | 46.19 ± 26.29 | 37.34 ± 23.26 | 69.21 ± 17.47 | 21.25 ± 11.62
Physician global, mean ± SD | 30.50 ± 21.06 | 31.03 ± 21.35 | 23.43 ± 14.55 | 64.29 ± 14.79
Patient global − physician global, mean ± SD | 15.69 ± 26.00 | 6.31 ± 11.23 | 45.78 ± 14.53 | −43.04 ± 5.17
HAQ score, mean ± SD | 1.27 ± 0.82 | 1.14 ± 0.82 | 1.55 ± 0.76 | 1.30 ± 0.81
PHQ-9 score, mean ± SD | 7.08 ± 5.80 | 5.83 ± 4.77 | 9.50 ± 6.73 | 8.25 ± 7.13
Depression (PHQ-9 ≥10) | | | |
* No discordance = ≤25 mm difference, positive discordance = >25 mm difference, negative discordance = less than −25 mm difference. Values are the number (percentage) unless otherwise indicated. UCSF = University of California, San Francisco; RA = rheumatoid arthritis; RF = rheumatoid factor; IQR = interquartile range; HAQ = Health Assessment Questionnaire; PHQ-9 = 9-item Patient Health Questionnaire.
Patient-physician discordance of global disease severity.
A patient-physician difference of >25 mm on the VAS for global disease severity (patient's score more severe than the physician's, or positive discordance) was found in 68 (30%) of the patient-physician dyads with a mean ± SD difference of 46 ± 15 mm. Of the dyads, 12 (5%) had less than −25 mm of difference (patient's score less severe than the physician's, or negative discordance), with a mean ± SD difference of −43 ± 15 mm. In 143 dyads (64%), there was ≤25 mm of difference in either direction on the VAS scores (corresponding to no discordance, or concordance) with a mean ± SD difference of 6 ± 11 mm.
Predictors of discordance.
The results of the bivariable analysis (Table 1) revealed significant differences among the groups by discordance status with regard to swollen joint count (P < 0.001), depressive symptoms (PHQ-9 score; P = 0.001), and functional status (HAQ score; P = 0.003). Poorer function, greater depressive symptoms, and fewer swollen joints were more common among subjects with positive discordance. Language category was not statistically significantly different between the groups (P = 0.663).
On multivariable analyses (Table 2), depressive symptoms as recorded by a 5-point increase in the PHQ-9 score were an independent predictor of positive discordance (adjusted odds ratio [OR] 1.61, 95% confidence interval [95% CI] 1.02–2.55). The swollen joint count was associated with decreased odds of discordance (adjusted OR 0.87, 95% CI 0.83–0.91). Cantonese/Mandarin language was also associated with lower odds of discordance (adjusted OR 0.44, 95% CI 0.28–0.69) as compared with English, the referent group. In multivariable models, the association between poorer functional status (HAQ score) and discordance persisted as measured by the point estimate, but was no longer statistically significant (adjusted OR 1.71, 95% CI 0.82–3.55).
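Because the PHQ-9 odds ratio is expressed per 5-point increase, it can help to re-express it for other increments. The sketch below assumes, as a logistic model with a continuous predictor does, that the log-odds are linear in the PHQ-9 score. The function name `rescale_odds_ratio` is hypothetical, and the rescaled values are arithmetic illustrations rather than results reported by the study.

```python
def rescale_odds_ratio(odds_ratio: float, from_units: float,
                       to_units: float = 1.0) -> float:
    """Convert an odds ratio per `from_units` into one per `to_units`.

    Under a log-linear model, OR_per_k = exp(k * beta), so
    OR_per_m = OR_per_k ** (m / k).
    """
    if odds_ratio <= 0 or from_units <= 0:
        raise ValueError("odds ratio and from_units must be positive")
    return odds_ratio ** (to_units / from_units)


# Adjusted OR of 1.61 per 5-point PHQ-9 increase, re-expressed per point
per_point = rescale_odds_ratio(1.61, from_units=5, to_units=1)
# The same estimate re-expressed per 10-point increase
per_ten = rescale_odds_ratio(1.61, from_units=5, to_units=10)
```

Under this assumption, the adjusted OR of 1.61 per 5 points corresponds to roughly 1.10 per single PHQ-9 point and roughly 2.59 per 10 points. The same transformation would apply to the confidence limits.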
Table 2. Bivariate and multivariate logistic regression of predictors of discordance between patient and physician assessments of global disease severity*
In sensitivity analyses using cutoffs of 10 mm and 40 mm, 55% and 18% of the patient-physician dyads, respectively, resulted in the patient scoring higher than the physician (positive discordance). Only one-third (33%) of the pairs were concordant using the 10-mm cutoff as opposed to the majority of pairs (79%) using the 40-mm cutoff.
Further sensitivity analyses yielded similar results to the original analyses in that greater depressive symptoms (PHQ-9 score), worse functional status (HAQ score), and a lower swollen joint count were all significant predictors of discordance in unadjusted analyses for the 10-mm and 40-mm cutoffs. However, in the multivariate logistic regression using the 10-mm cutoff, worse functional status was associated with greater odds of discordance and English language was associated with lower odds of discordance. The 40-mm cutoff did not yield statistically significant predictors in multivariate analysis but showed similar patterns to both the 10-mm and 25-mm cutoffs.
Exploratory analysis of multivariable results.
To help interpret our findings, we explored which of the two components of the outcome (patient or physician global VAS score) drove the observed associations of two significant predictors with discordance. Side by side box plots of patient and physician global VAS scores by tertile of swollen joint counts are illustrated in Figure 1A. The median physician global VAS score increases steadily with each tertile of swollen joint counts. In addition, the largest discrepancy in median global VAS scores is seen at the lowest tertile of swollen joint counts and appears to narrow as the counts increase, suggesting there may be a threshold of swollen joints at which patients and physicians begin to agree. In Figure 1B, the median patient global VAS score increases steadily with each increase in category of depressive symptoms while the median physician global scores appear to remain relatively stable.
Discordance and the DAS28.
Complete data to calculate the DAS28 were available for 202 subjects. The mean ± SD DAS28 for the concordant pairings (n = 132) was 4.01 ± 1.53; for the positive discordant pairings (n = 59) it was 4.31 ± 1.53, and for the negative discordant pairings (n = 11) it was 4.66 ± 1.23. There was a statistically significant difference between the mean DAS28 4-variable and the DAS28 3-variable (which does not include the patient global) scores for all groups (Table 3). The largest difference between the two scores was seen among patients with positive discordance (the DAS28 3-variable was 0.54 lower on average for the positive discordant subjects versus 0.08 for the concordant subjects). The differences between the DAS28 4-variable and DAS28 3-variable also revealed variation in how subjects were categorized into low, moderate, or high levels of disease activity. For instance, whereas 15 subjects in the positive discordance group (n = 59) were classified as having low disease activity using the DAS28 4-variable, using the DAS28 3-variable led to 25 being classified in the low disease activity group. Subjects in general moved from a higher disease activity level to a lower one when using the DAS28 3-variable (data not shown). These shifts were most pronounced in the positive discordance group.
Table 3. DAS28 4-v compared with DAS28 3-v by degree of discordance*
Measure | No discordance (n = 132) | Positive discordance (n = 59) | Negative discordance (n = 11)
DAS28 4-v, mean ± SD | 4.01 ± 1.53 | 4.31 ± 1.53 | 4.66 ± 1.23
DAS28 3-v, mean ± SD | 3.93 ± 1.40 | 3.77 ± 1.50 | 4.86 ± 1.32
Difference between DAS28 4-v and DAS28 3-v | | |
P, paired t-test | | |
* No discordance = ≤25 mm difference, positive discordance = >25 mm difference, negative discordance = less than −25 mm difference. See Patients and Methods for Disease Activity Score in 28 joints (DAS28) 4-variable (4-v) and 3-variable (3-v) formulas.
DISCUSSION
In this study, we found evidence for clinically meaningful differences between patient and physician assessments of RA disease severity in 36% of cases. The physicians rated disease severity lower than the patients did in the overwhelming majority (85%) of discordant pairs. The presence of greater depressive symptoms was an independent predictor of discordance, whereas a higher swollen joint count was associated with lower odds of discordance. These findings were robust across different cut points for discordance. As the threshold of discordance was lowered, however, we found that worse functional status and non-English language were independently associated with discordance. An exploration of our findings revealed that median patient global assessments increased with higher depressive symptoms while median physician global scores remained similar. In contrast, the median patient and physician global assessments were the least discordant at the highest tertile of swollen joint counts. Among subjects with positive discordance (patient's score worse than the physician's), mean DAS28 scores calculated with and without the patient global were the most divergent. This important finding suggests that among patients who are discordant with their physicians, the DAS28 score may not accurately reflect disease activity.
Discrepancies between patient and professional assessments of pain, function, and overall health have been reported in RA (5–7, 9, 10, 27). Nicolau et al evaluated differences in ratings of disease activity using a 3-cm cutoff on a 10-cm VAS and found discordance in 37% of cases, a proportion nearly identical to that in our study (5). Suarez-Almazor et al explored discordance in ratings of health status and reported, as in our study, that physicians on average rated their patients' health as better than the patients did. The impact of language or psychological well-being on discordance was not reported (6). Our finding that Chinese language was associated with decreased odds of discordance may be a statistical artifact, or may be related to the quality of the Chinese-language interpreter services in our clinics.
The impact of depression on symptom reporting in RA has been well documented (28). Zautra et al found an association between recurrent bouts of major depression and increased risk for pain (29). Although depression has been shown to be associated with more pain and worse function (30), there is no literature in RA that explores the role of depression and its association with discordance. Depression has been associated with symptom underestimation by physicians in nonrheumatic diseases (17). In one study of adults with diabetes mellitus screened for depression, Swenson et al found that patients with severe depressive symptoms were more likely to report suboptimal clinician-patient communication (18). The authors hypothesized that this could be due to competing demands, unmet expectations, or poor concentration related to depression. It is possible that the association between depressive symptoms and discordance observed in our study is a result of poor communication for any of the aforementioned reasons. Given the prevalence of comorbid depression and RA (31), the mechanisms by which depressive symptoms are associated with discordance warrant further investigation.
Finally, and perhaps most notably, no study has evaluated the effect of discordance on the DAS28. In our study, the largest mean difference between the DAS28 4-variable and the DAS28 3-variable (calculated without the patient global) was seen in patients with positive discordance. One explanation may be related to an association of depressive symptoms and discordance. Higher disease activity as measured by the DAS28 may reflect both a patient's mood and disease activity. Ward found that self-report of pain and global disease severity may be confounded by depression (32). Our analysis indicates that depressive symptoms are associated with positive discordance, which, in turn, may impact the DAS28. If this is true, rheumatologists (as guided by the ACR recommendations to aim for low disease activity) may escalate therapy for patients whose apparent moderate or high disease activity as reflected by the DAS28 is influenced more by depressed mood than by systemic inflammation, joint pain, or swelling. In such cases, appropriate recognition and treatment of depressive symptoms may be warranted. Alternatively, depressive symptoms may be an emotional manifestation of the systemic, inflammatory process common in RA and depressed mood may, in part, be driven by heightened levels of proinflammatory cytokines as postulated in the literature (33).
A third explanation may be that depression somehow interferes with the efficacy of therapy in RA and blunts the response as measured through the components of the DAS28. A recent study by Hider and colleagues to investigate the prevalence of depression among patients with RA initiating anti–tumor necrosis factor (anti-TNF) therapy reported a high prevalence of depression (47.5%) as well as a higher mean DAS28 among depressed patients at baseline, prior to treatment, and at 3 and 12 months while receiving therapy (34). Depressed patients had a poorer response to anti-TNF therapy with smaller reductions in all components of the DAS28 when compared with nondepressed patients at 3 months.
The current study had several limitations. First, our study population was largely nonwhite (84%), non-US born (74%), and from an urban area. Although this may limit the generalizability of our findings, it could also be viewed as a strength insofar as vulnerable populations have been shown to be at greater risk of miscommunication with physicians, to experience lower quality of care, and to participate less commonly in research studies (35–40). Second, we measured depressive symptoms using the PHQ-9 rather than the gold standard of a clinical diagnostic interview. The PHQ-9, however, has been shown to be a reliable and valid screening measure of depression severity in the outpatient setting (21). Third, this was a cross-sectional study and, as such, a causal relationship between depressive symptoms and discordance cannot firmly be established, nor can we assess whether discordance lessens over time. Fourth, there are no established cut points for what constitutes “significant” discordance in the literature, but it should be noted that our 25-mm cutoff exceeds one-half of the SD of discordance (SD 26 mm, so one-half is 13 mm), which approximates a minimum clinically important difference (26). In addition, we performed sensitivity analyses, which supported our findings. As we accrue longitudinal data and perform additional analyses, we will examine in greater depth the relationship between depressive symptoms and discordance. Finally, there were no direct observations of patient-physician communication during clinic visits or a measure of quality of the doctor-patient relationship, which could have provided additional insights as to contributors to discordance. Potential next steps in better understanding why discordance exists could include a qualitative study to explore how beliefs and/or culture may influence the reporting of disease severity by both patients and physicians, and an evaluation of the role of health literacy as a potential predictor of discordance.
In conclusion, we found that 36% of patients with RA differed from their physicians to a clinically meaningful degree, with physicians generally rating disease severity lower than patients did in their self-assessments. Depressive symptoms were common, with 30% of subjects meeting the PHQ-9 cut point (score ≥10) for moderate to severe depression. Independent predictors of discordance included greater depressive symptoms and a lower swollen joint count. In sensitivity analyses, we also found that non-English language and functional status were associated with discordance.
Future studies should prospectively evaluate the impact of discordance in disease activity assessment in RA and on the DAS28 in particular, and assess the contribution of depressive symptoms to the quality of clinician-patient communication. In addition, reducing discordance may be an important goal in and of itself, as it has been shown that when doctors and patients agree, adherence and outcomes improve (3). Further investigation of the relationships between mood, disease activity, and discordance may help guide interventions to improve care for adults with RA.
All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. Barton had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study conception and design. Barton, Imboden, Yelin, Schillinger.
Acquisition of data. Barton, Imboden, Graf, Yelin, Schillinger.
Analysis and interpretation of data. Barton, Imboden, Glidden, Yelin, Schillinger.