2012 Provisional classification criteria for polymyalgia rheumatica: A European League Against Rheumatism/American College of Rheumatology collaborative initiative†
This article is published simultaneously in the April 2012 issue of Annals of the Rheumatic Diseases.
The objective of this study was to develop European League Against Rheumatism/American College of Rheumatology classification criteria for polymyalgia rheumatica (PMR). Candidate criteria were evaluated in a 6-month prospective cohort study of 125 patients with new-onset PMR and 169 non-PMR comparison subjects with conditions mimicking PMR. A scoring algorithm was developed based on morning stiffness >45 minutes (2 points), hip pain/limited range of motion (1 point), absence of rheumatoid factor and/or anti–citrullinated protein antibody (2 points), and absence of peripheral joint pain (1 point). A score ≥4 had 68% sensitivity and 78% specificity for discriminating all comparison subjects from PMR. The specificity was higher (88%) for discriminating shoulder conditions from PMR and lower (65%) for discriminating rheumatoid arthritis (RA) from PMR. With the optional ultrasound criteria added, a score ≥5 had 66% sensitivity and 81% specificity. According to these provisional classification criteria, patients ≥50 years old presenting with bilateral shoulder pain, not better explained by an alternative pathology, can be classified as having PMR in the presence of morning stiffness >45 minutes, elevated C-reactive protein and/or erythrocyte sedimentation rate, and new hip pain. These criteria are not meant for diagnostic purposes.
This criteria set has been approved by the European League Against Rheumatism (EULAR) Executive Committee and the American College of Rheumatology (ACR) Board of Directors as Provisional. This signifies that the criteria set has been quantitatively validated using patient data, but it has not undergone validation based on an external data set. All EULAR/ACR-approved criteria sets are expected to undergo intermittent updates.
The American College of Rheumatology is an independent, professional, medical and scientific society which does not guarantee, warrant, or endorse any commercial product or service.
Polymyalgia rheumatica (PMR) is a common inflammatory rheumatic disease of older individuals and a common indication for long-term corticosteroid therapy (1–3). PMR is also subject to wide variations of clinical practice, due to the considerable uncertainty related to diagnosis, course, and management in primary and secondary care (4–7). There is no diagnostic laboratory test, inflammatory markers are not specific, and clinicians often turn to the corticosteroid response as a “test of treatment” to establish the diagnosis (1, 2, 8).
Difficulties in diagnosing and classifying patients with PMR are inherent in its definitions (9, 10). The proximal pain and stiffness syndrome can occur at presentation in many other inflammatory rheumatologic illnesses in older people (1, 2, 7, 9). Approximately half of patients diagnosed with PMR may have distal manifestations such as peripheral arthritis, hand swelling with pitting edema, and carpal tunnel syndrome (1, 2, 7, 11). Polymyalgic presentation is common in late-onset rheumatoid arthritis (RA) and spondylarthritis and is also associated with giant cell arteritis in 10–30% of cases (1, 2). Heterogeneity in the disease course, uncertainty regarding disease assessment parameters, and the emergence of alternative diagnoses during followup complicate the management of PMR (9, 12–15). For these reasons, PMR calls for a cautious and specific approach that prefers relative underdiagnosis to overdiagnosis (9, 11).
Uniform responsiveness to low doses of corticosteroids has been assumed to be a cardinal feature of PMR. However, there is little hard evidence to substantiate this assertion. A previous report showed that 3 weeks after starting prednisolone 15 mg a day, only 55% of patients showed a complete response to therapy (10). This also highlights the need for clinical trials of novel effective agents in PMR.
The lack of standardized classification criteria has been a major factor hampering the development of rational therapeutic approaches (12, 16, 17) and causing difficulties in evaluating patients in clinical studies. In response to a European League Against Rheumatism (EULAR)/American College of Rheumatology (ACR) initiative, a criteria development work group convened in 2005.
A systematic literature review, a 3-phase hybrid consensus process, and a wider survey were undertaken to identify candidate criteria items (18). In the first round, 27 international experts met and anonymously rated 68 potential criteria identified through the literature review. In the second round, the experts were given the results of the first round and re-rated the criteria items. In the third round, the wider acceptance of the chosen criteria (>50% support) was evaluated using a survey of 111 rheumatologists and 53 nonrheumatologists in North America and western Europe.
In round 3, over 70% of respondents agreed on the importance of 7 core criteria (all achieving 100% support in round 2). These were age 50 years or older, symptom duration 2 weeks or longer, bilateral shoulder and/or pelvic girdle aching, duration of morning stiffness more than 45 minutes, elevated erythrocyte sedimentation rate (ESR), elevated C-reactive protein (CRP), and rapid corticosteroid response. More than 70% agreed on assessing pain and limitation of shoulder (84%) and/or hip (76%) motion, but agreement was low for peripheral signs (e.g., carpal tunnel syndrome, tenosynovitis, peripheral arthritis).
The group reached consensus on the need for a prospective cohort study to evaluate the disease course from presentation in patients included on the basis of proximal pain and stiffness, with evaluations over a 6-month period while receiving a standardized corticosteroid treatment regimen (7, 18–25). The group also agreed to assess musculoskeletal ultrasound as part of the PMR classification criteria. In this paper, we present the results from this prospective study and propose new classification criteria for PMR.
STUDY DESIGN AND METHODS
Consensus decisions about study design.
A priori, the work group decided that a specific approach should be adopted for classifying newly presenting patients with bilateral shoulder pain as PMR (18). The group agreed on the following:
1. Patients presenting with polymyalgic syndrome should have stepwise evaluation on the basis of inclusion and exclusion criteria, response to a standardized corticosteroid challenge, and followup confirmation. The criteria items would be agreed-upon clinical features and laboratory investigations.
2. The need to standardize the response to corticosteroid therapy in PMR. Because the goal of classification criteria is to identify patients for enrollment into clinical trials before any treatment, the response to corticosteroid therapy should be used to verify the classification of a patient as having PMR, but not as a classification criterion. Because there is little scientific evidence on what constitutes such a response, it was agreed (>75% agreement) that response would be defined as >75% global response in clinical and laboratory parameters within 7 days of a corticosteroid challenge with 15 mg oral prednisone or prednisolone, followed by resolution of inflammatory indices (18).
3. That a prospective study would be needed to evaluate the disease course from presentation in patients included on the basis of the mandatory “core” criteria of proximal pain and stiffness. New-onset bilateral shoulder pain was selected as the main eligibility criterion because very few patients with PMR (<5%) present with hip pain without shoulder pain, and because hip girdle pain arises from a wide range of conditions, so that using it for eligibility would have required enrolling an impracticably large number of comparator patient groups (18). The study would evaluate symptoms, examination findings, investigations, and their evolution under standardized corticosteroid treatment at prespecified intervals over a 6-month period, in a prospective cohort of patients with new-onset bilateral shoulder pain (comparing the PMR case cohort with a comparator cohort of mimicking conditions).
4. Musculoskeletal ultrasound should be evaluated in a substudy as a feasible mode of investigation of possible PMR. A secondary objective of this substudy would be the evaluation of clinical and patient-based outcomes in PMR over a 6-month period (26, 27).
The study was a prospective cohort study that included a cohort of patients with new-onset PMR and a comparison cohort of non-PMR patients with various conditions mimicking PMR. Study subjects were recruited from 21 community-based and academic rheumatology clinics in 10 European countries and the USA. Inclusion criteria for PMR patients were age 50 years or older, new-onset bilateral shoulder pain, and no corticosteroid treatment (for any condition) within 12 weeks before study entry. Patients also had to fulfill all the inclusion and exclusion criteria defined in our previous report and be judged by the participating investigator to have PMR (18). Every effort was made to choose patients across the spectrum of disease severity. Corticosteroid treatment for PMR patients was initiated according to a predefined treatment protocol: oral prednisone 15 mg a day for weeks 1 and 2, 12.5 mg a day for weeks 3 and 4, 10 mg a day for weeks 5–11, 10 mg and 7.5 mg on alternate days for weeks 12–15, 7.5 mg a day for weeks 16–25, and tapering according to treatment response from week 26 onward. The gold standard for the pre-steroid diagnosis of PMR was the investigator's diagnosis at presentation, as above, confirmed by maintenance of that diagnosis without an alternative diagnosis at week 26 of followup.
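The predefined taper can be written as a lookup from study week to daily prednisone dose. The sketch below is illustrative only: the function names are hypothetical, it assumes the 10 mg step starts at week 5 (directly after the 12.5 mg step), and it averages the two alternate-day doses of weeks 12–15 when estimating a cumulative dose.

```python
from typing import Optional, Tuple


def protocol_dose_mg(week: int) -> Optional[Tuple[float, ...]]:
    """Daily oral prednisone dose(s) in mg for a study week (week 1 = first week).

    Returns a tuple of doses that alternate day by day: (15.0,) means 15 mg every
    day, (10.0, 7.5) means 10 mg and 7.5 mg on alternate days. Returns None from
    week 26, when the taper was individualized according to treatment response.
    """
    if week < 1:
        raise ValueError("study weeks are numbered from 1")
    if week <= 2:
        return (15.0,)
    if week <= 4:
        return (12.5,)
    if week <= 11:  # assumption: the 10 mg step covers weeks 5-11
        return (10.0,)
    if week <= 15:
        return (10.0, 7.5)  # alternate-day regimen
    if week <= 25:
        return (7.5,)
    return None


def cumulative_dose_mg(last_week: int) -> float:
    """Approximate cumulative dose (mg) from week 1 through last_week (<= 25),
    averaging alternate-day doses over each 7-day week."""
    if not 1 <= last_week <= 25:
        raise ValueError("the fixed protocol covers weeks 1-25 only")
    total = 0.0
    for week in range(1, last_week + 1):
        doses = protocol_dose_mg(week)
        total += 7 * sum(doses) / len(doses)
    return total


print(cumulative_dose_mg(25))  # 1645.0
```

Under these assumptions the fixed part of the protocol delivers roughly 1.6 g of prednisone over the first 25 weeks; actual exposure in the cohort may have differed.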
The non-PMR comparison cohort included conditions representative of the types that need to be distinguished from PMR, in both primary and secondary care. Inclusion criteria for the non-PMR comparison cohort were age 50 years or older, new-onset bilateral shoulder pain, and a diagnosis of either inflammatory or noninflammatory conditions, including new-onset RA, connective tissue diseases, various shoulder conditions (e.g., bilateral rotator cuff syndrome and/or adhesive capsulitis, rotator cuff tear, glenohumeral osteoarthritis), fibromyalgia, generalized osteoarthritis, and others. Patients known to have the condition for >12 weeks before the baseline evaluation (except fibromyalgia and chronic pain) were not eligible for inclusion. PMR patients with clinical suspicion of giant cell arteritis were included as part of the comparison cohort because these patients required different corticosteroid doses. Patients in the comparison cohort were included on the basis of clinician diagnosis and not on formal criteria. No guidelines were provided for treatment of the conditions in the comparison cohort.
Ethics board approval was obtained at all participating institutions before initiation of the study, and all participants gave written informed consent before enrollment.
Followup and data collection.
PMR patients were evaluated at baseline, and at 1, 4, 12, and 26 weeks. At each followup visit, clinical evaluation included response to corticosteroid therapy and opinion on the emergence of alternative diagnoses. Patients who at any visit were no longer considered to have PMR were evaluated and treated according to accepted clinical practice. They were excluded from the PMR cohort and included in the non-PMR comparison cohort. Patients in the comparison cohort were evaluated at baseline and at 26 weeks.
Data were collected using standardized data collection forms and questionnaires translated into national languages. Data collection included the candidate inclusion/exclusion criteria items for classification of PMR, physical examination, and assessment of corticosteroid response. Criteria items were age 50 years or older, symptom duration 2 weeks or more, bilateral shoulder and/or pelvic girdle aching, recent weight loss >2 kg, duration of morning stiffness >45 minutes, elevated ESR, elevated CRP, and rapid response of symptoms to corticosteroids (>75% global response within 1 week to prednisolone/prednisone 15–20 mg a day). Pain was assessed using a horizontal 100-mm visual analog scale (VAS) in 4 separate locations (shoulder, pelvic, neck, and overall) with 0 indicating no pain and 100 indicating worst pain. Morning stiffness (in the past 24 hours) was assessed in minutes; functional status and quality of life were assessed by the modified Health Assessment Questionnaire (M-HAQ) and Short Form 36. A 100-mm VAS was also used for recording global well-being measures (patient and physician global) and fatigue. Physical examination included the presence or absence of tenderness, pain on movement, and limitation of the shoulders and hips. Aspects of corticosteroid therapy including dose, therapeutic response, and change in dose and therapy discontinuation were documented.
Data regarding laboratory measures (including ESR, CRP, rheumatoid factor [RF], and anti–citrullinated protein antibody [ACPA]) were obtained from clinically ordered tests performed at each study center. As the laboratory assays used at each center varied, test results were classified as normal/abnormal using the reference ranges from each center (see Supplementary Table 1, available on the Arthritis & Rheumatism web site at http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1529-0131). Both PMR and non-PMR subjects underwent ultrasound evaluation of shoulders and hips at baseline and at 26 weeks. Evaluations were made according to EULAR guidelines (28) to assess for features previously reported to be associated with PMR, including bicipital tenosynovitis, subacromial and subdeltoid bursitis, trochanteric bursitis, and glenohumeral and hip effusion. A rheumatologist or radiologist experienced in musculoskeletal ultrasound performed the ultrasound examination using linear probes with the frequency range 6–10 MHz for shoulders and linear or curved array probes with the frequency range 5–8 MHz for hips.
Descriptive statistics (means, percentages, etc.) were used to summarize the candidate criteria data. Demographics (age and gender) and candidate criteria were compared between PMR and comparison subjects using chi-square and rank sum tests. Several statistical approaches were considered in order to develop a scoring algorithm for PMR and to assess the proposed classification criteria in patients judged by expert clinician investigators to have PMR 26 weeks after enrollment (29).
First, logistic regression models to distinguish PMR patients from all comparison subjects and each subset of comparison subjects were examined. The C statistic, a measure of concordance analogous to the area under the receiver operating characteristic curve, was used to assess the ability of each individual criterion to distinguish between PMR and comparison subjects. The C statistic ranges from 0.5 to 1, with 0.5 indicating a criterion that provides no information and 1 indicating perfect discrimination. The sensitivity, specificity, and the positive and negative likelihood ratios were also examined.
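For a single yes/no criterion, the C statistic reduces to the mean of sensitivity and specificity, which makes the Table 2 values easy to check. The sketch below computes C by direct pairwise comparison (tied pairs count one half) and applies it to the hip pain criterion using the Table 1 counts (71 of 125 PMR patients and 59 of 169 comparison subjects). The function names are illustrative, and this is not the software the authors used.

```python
from itertools import product


def c_statistic(case_scores, control_scores):
    """Share of (case, control) pairs in which the case scores higher;
    tied pairs count 1/2. Equivalent to the area under the ROC curve."""
    concordant = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


def binary_criterion_summary(pos_cases, n_cases, pos_controls, n_controls):
    """Sensitivity, specificity, likelihood ratios and C for one binary criterion."""
    sens = pos_cases / n_cases
    spec = (n_controls - pos_controls) / n_controls
    cases = [1] * pos_cases + [0] * (n_cases - pos_cases)
    controls = [1] * pos_controls + [0] * (n_controls - pos_controls)
    return {
        "sensitivity": sens,
        "specificity": spec,
        # likelihood ratios are unbounded when specificity is 1 or 0
        "lr_positive": sens / (1 - spec) if spec < 1 else float("inf"),
        "lr_negative": (1 - sens) / spec if spec > 0 else float("inf"),
        "c": c_statistic(cases, controls),
    }


# Hip pain or limited range of motion (counts from Table 1)
hip = binary_criterion_summary(71, 125, 59, 169)
print(round(hip["c"], 2))  # 0.61, matching Table 2
```

The same identity explains why binary items with modest sensitivity or specificity cluster near 0.5 to 0.7 in Table 2.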
Second, exploratory factor analysis was used to examine the interdependencies between the candidate criteria (30). Maximum-likelihood factor analysis with varimax rotation was used. Maximum-likelihood tests were used to examine goodness of fit (e.g., to determine the number of factors). This method is thought to be superior to the eigenvalue >1 or Cattell's scree plot method for selecting the number of factors (30). For each factor, variables with factor loadings >0.5 were examined and found to represent a similar domain. This technique allowed a reduction of the number of variables, as one variable from each factor was examined in multivariable logistic regression models. In addition, to avoid discarding a relevant domain as identified by expert consensus, a few of the variables with factor loading between 0.4 and 0.5 were also considered in the multivariable models.
Classification trees including the variables determined by the factor analysis were also considered, but were not found to be optimal for distinguishing PMR from comparison subjects (31, 32). An integer scoring algorithm was defined based on the odds ratios in the final multivariable logistic regression model. Performance characteristics (sensitivity, specificity, etc.) of this scoring algorithm were assessed. In addition, the utility of ultrasound assessments for classifying PMR patients was examined using factor analysis and adding potential ultrasound criteria to the scoring algorithm. Odds ratios for clinical criteria varied somewhat in the scoring algorithm that included ultrasound criteria. Scoring weights based on both models were considered and found to perform similarly, so a common set of scoring weights was used for the clinical items in both algorithms in order to ease comparison and application of the criteria.
Finally, gradient boosting regression tree models, which are a machine learning technique, were examined to determine whether a better prediction could be achieved using a more complex algorithm (33).
RESULTS
At baseline, 128 patients were recruited into the PMR cohort and 184 patients were recruited into the non-PMR comparison cohort. During followup, 10 PMR patients were reclassified as not having PMR and moved into the non-PMR cohort. Similarly, 8 non-PMR comparison cohort patients were reclassified as having PMR and moved into the PMR cohort. In addition, 7 patients (1 PMR and 6 non-PMR) were excluded due to missing information, and 2 non-PMR subjects with age <50 years and 9 non-PMR subjects with no shoulder pain were also excluded. Therefore, the final analysis was based on 125 PMR and 169 non-PMR subjects.
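The flow from recruitment to the analyzed sample can be checked with simple arithmetic, using only the counts reported above (the variable names are illustrative):

```python
recruited_pmr, recruited_non_pmr = 128, 184

analysed_pmr = (recruited_pmr
                - 10   # reclassified as not PMR during followup
                + 8    # reclassified from the comparison cohort as PMR
                - 1)   # excluded for missing information

analysed_non_pmr = (recruited_non_pmr
                    + 10  # PMR patients reclassified as non-PMR
                    - 8   # reclassified as PMR
                    - 6   # excluded for missing information
                    - 2   # age < 50 years
                    - 9)  # no shoulder pain

excluded = (recruited_pmr + recruited_non_pmr) - (analysed_pmr + analysed_non_pmr)
print(analysed_pmr, analysed_non_pmr, excluded)  # 125 169 18
```

The reclassifications cancel out in the total, so the 18 exclusions account for the difference between 312 recruited and 294 analyzed subjects.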
The diagnoses of the 169 non-PMR subjects (see Supplementary Table 2, available on the Arthritis & Rheumatism web site at http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1529-0131) were new-onset RA (49 [29%]), new-onset other seronegative arthritis (20 [12%]), new-onset connective tissue diseases or vasculitis (9 [5%]), shoulder conditions (52 [31%]), chronic pain (26 [15%]), infection (5 [3%]), previously undiagnosed malignancy (4 [2%]), and 2 each of endocrinopathy and neurologic disorders.
The distribution of the candidate classification criteria for the 125 PMR patients and 169 non-PMR comparison subjects (all, RA only, and shoulder condition only) is displayed in Table 1. Criteria items present in >80% of the PMR subjects were ≥2 weeks duration of symptoms, bilateral shoulder pain, and elevated CRP and/or ESR. Relevant clinical features that best discriminated RA from PMR were peripheral synovitis, the presence of RF and/or ACPA, and hip pain/limited range of motion. Features best discriminating shoulder conditions from PMR were hip pain/limited range of motion, morning stiffness, and elevated CRP and/or ESR.
Table 1. Distribution of candidate criteria for 125 PMR patients and 169 comparison subjects*
| Criterion | PMR (n = 125) | All comparison subjects (n = 169) | P | RA (n = 49) | P | Shoulder conditions (n = 52) | P |
|---|---|---|---|---|---|---|---|
| Age, mean ± SD years | 71.5 ± 8.4 | 67.0 ± 9.6 | <0.001 | 68.6 ± 10.0 | 0.06 | 67.0 ± 10.6 | 0.017 |
| Female sex | 72 (58) | 112 (66) | 0.13 | 28 (57) | 0.96 | 35 (67) | 0.23 |
| Duration of symptoms ≥2 weeks | 121 (97) | 163 (96) | 0.87 | 47 (96) | 0.77 | 51 (98) | 0.64 |
| Duration of symptoms, mean ± SD weeks | 10.0 ± 7.2 | 14.4 ± 19.1 | 0.17 | 13.8 ± 16.1 | 0.62 | 12.9 ± 11.8 | 0.21 |
| Bilateral shoulder aching | 124 (99) | 162 (96) | 0.08 | 46 (94) | 0.035 | 50 (96) | 0.15 |
| Bilateral pelvic girdle (hip) aching | 91 (73) | 90 (53) | 0.001 | 28 (57) | 0.046 | 21 (40) | <0.001 |
| Neck aching | 71 (57) | 92 (54) | 0.69 | 29 (59) | 0.78 | 27 (52) | 0.55 |
| Morning stiffness duration >45 minutes‡ | 92 (77) | 68 (43) | <0.001 | 33 (69) | 0.25 | 10 (20) | <0.001 |
| Morning stiffness duration, median (IQR) minutes | 120 (60, 240) | 30 (5, 120) | <0.001 | 60 (30, 180) | 0.11 | 10 (0, 30) | <0.001 |
| Weight loss of >2 kg | 45 (36) | 40 (24) | 0.021 | 16 (33) | 0.68 | 4 (8) | <0.001 |
| Shoulder pain or limited range of motion | 121 (97) | 158 (93) | 0.20 | 47 (96) | 0.77 | 49 (94) | 0.42 |
| Hip pain or limited range of motion | 71 (57) | 59 (35) | <0.001 | 15 (31) | 0.002 | 12 (23) | <0.001 |
| Shoulder tenderness | 96 (77) | 126 (75) | 0.66 | 40 (82) | 0.49 | 34 (65) | 0.12 |
| Hip tenderness | 59 (47) | 47 (28) | 0.001 | 12 (24) | 0.006 | 9 (17) | <0.001 |
| Carpal tunnel syndrome | 19 (15) | 27 (16) | 0.90 | 11 (22) | 0.28 | 8 (15) | 0.99 |
| Peripheral synovitis (distal swelling, tenosynovitis, or arthritis)‡ | 48 (39) | 78 (46) | 0.20 | 41 (84) | <0.001 | 11 (21) | 0.024 |
| Other joint pain‡ | 63 (51) | 109 (66) | 0.011 | 40 (85) | <0.001 | 29 (57) | 0.50 |
| Abnormal CRP and/or ESR‡ | 116 (96) | 99 (63) | <0.001 | 41 (85) | 0.017 | 18 (41) | <0.001 |
| Presence of RF and/or ACPA‡ | 11 (10) | 37 (25) | 0.004 | 19 (41) | <0.001 | 5 (12) | 0.79 |
| Abnormal serum protein electrophoresis‡ | 43 (52) | 32 (35) | 0.027 | 9 (36) | 0.17 | 9 (35) | 0.13 |
| M-HAQ, mean ± SD | 1.2 ± 0.6 | 0.8 ± 0.6 | <0.001 | 1.1 ± 0.7 | 0.32 | 0.5 ± 0.6 | <0.001 |
Development of a scoring algorithm.
Table 2 shows the results of univariate logistic regression models to distinguish PMR from all comparison subjects, RA only, and shoulder conditions only. Based on the C statistic, criteria items related to hip involvement (pain, tenderness, limited movement) had significant ability to discriminate PMR from all comparison subjects, from RA, and from shoulder conditions. Morning stiffness, M-HAQ, weight loss, and raised laboratory markers of inflammation distinguished PMR from comparison subjects, particularly from those with shoulder conditions. The presence of ACPA and/or RF, peripheral synovitis, and other joint pain had significant ability (with relatively high C statistics) to distinguish PMR from RA. In addition, the odds ratios for abnormal CRP and/or ESR were particularly high. This was because only 5 PMR patients did not have abnormal CRP and/or ESR, perhaps reflecting that the diagnosis of PMR is less certain when CRP and ESR are normal. Therefore, abnormal CRP and/or ESR was included as a required criterion in the scoring algorithm for PMR. Similarly, because all subjects in the study were required to have shoulder pain, this was also included as a required criterion.
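Why a handful of patients can inflate an odds ratio is easiest to see from the 2×2 table. For a single binary predictor, the univariate logistic regression odds ratio equals the cross-product ratio, and its Wald 95% CI uses the standard error √(1/a + 1/b + 1/c + 1/d). The sketch below takes 116 versus 5 PMR patients from the text and Table 1, and assumes 99 versus 59 comparison subjects with abnormal versus normal values; the 59 is an inference (158 subjects with results, since 99 is reported as 63%) and is not stated in the text. Under that assumption the sketch reproduces the tabulated 13.8 (5.3–36).

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI for a 2x2 table:
    a = cases with the feature, b = cases without,
    c = controls with the feature, d = controls without."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Abnormal CRP and/or ESR; the comparison-group count of 59 is an inferred assumption
or_, lower, upper = odds_ratio_wald(116, 5, 99, 59)

# Share of the log-OR variance contributed by the cell of 5 PMR patients with normal values
share_from_small_cell = (1 / 5) / (1 / 116 + 1 / 5 + 1 / 99 + 1 / 59)
print(round(or_, 1), round(lower, 1), round(upper))  # 13.8 5.3 36
```

The cell of 5 patients contributes about 85% of the variance, which is why both the estimate and its upper limit are so large.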
Table 2. Univariate logistic regression models to distinguish subjects with PMR from comparison subjects*
| Criterion | All comparison subjects: OR (95% CI) | C statistic | RA: OR (95% CI) | C statistic | Shoulder conditions: OR (95% CI) | C statistic |
|---|---|---|---|---|---|---|
| Duration of symptoms ≥2 weeks | 1.1 (0.3–4.0) | 0.50 | 1.3 (0.2–7.3) | 0.50 | 0.6 (0.1–5.4) | 0.51 |
| Shoulder pain or limited range of motion | 2.1 (0.7–6.8) | 0.52 | 1.3 (0.2–7.3) | 0.50 | 1.9 (0.4–8.6) | 0.51 |
| Shoulder tenderness | 1.1 (0.7–1.9) | 0.51 | 0.7 (0.3–1.7) | 0.52 | 1.8 (0.9–3.6) | 0.56 |
| Hip pain or limited range of motion | 2.5 (1.5–3.9) | 0.61 | 3.0 (1.5–6.0) | 0.63 | 4.4 (2.1–9.1) | 0.67 |
| Hip tenderness | 2.3 (1.4–3.8) | 0.60 | 2.8 (1.3–5.8) | 0.61 | 4.3 (1.9–9.5) | 0.65 |
| Neck aching | 1.1 (0.7–1.8) | 0.51 | 0.9 (0.5–1.8) | 0.51 | 1.2 (0.6–2.3) | 0.52 |
| Morning stiffness duration >45 minutes | 4.5 (2.6–7.7) | 0.67 | 1.5 (0.7–3.3) | 0.54 | 13.6 (6.0–31) | 0.79 |
| Weight loss >2 kg | 1.8 (1.1–3.0) | 0.56 | 1.2 (0.6–2.4) | 0.52 | 6.8 (2.3–19.9) | 0.64 |
| Carpal tunnel syndrome | 1.0 (0.5–1.8) | 0.50 | 0.6 (0.3–1.5) | 0.54 | – | – |
| Peripheral synovitis (distal swelling, tenosynovitis, or arthritis) | 0.7 (0.5–1.2) | 0.54 | 0.1 (0.08–0.3) | 0.72 | 2.4 (1.1–5.0) | 0.59 |
| Other joint pain | 0.5 (0.3–0.9) | 0.57 | 0.2 (0.1–0.4) | 0.67 | 0.8 (0.4–1.5) | 0.53 |
| Abnormal CRP and/or ESR | 13.8 (5.3–36) | 0.67 | 4.0 (1.2–13) | 0.55 | 33.5 (11–98) | 0.78 |
| Presence of RF and/or ACPA | 0.4 (0.2–0.7) | 0.57 | 0.2 (0.07–0.4) | 0.66 | 0.9 (0.3–2.6) | 0.51 |
| Abnormal serum protein electrophoresis | 2.0 (1.1–3.6) | 0.58 | 1.9 (0.8–4.8) | 0.58 | 2.0 (0.8–5.1) | 0.59 |
| M-HAQ (per 1-unit increase) | 2.3 (1.6–3.4) | 0.66 | 1.3 (0.7–2.2) | 0.55 | 6.7 (3.2–14) | 0.78 |
Factor analysis revealed that 4 factors were sufficient to represent all the criteria (see Supplementary Table 3, available on the Arthritis & Rheumatism web site at http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1529-0131). These 4 factors were hip pain/tenderness, peripheral synovitis or other joint pain, morning stiffness, and shoulder tenderness. Note that duration of symptoms, neck aching, carpal tunnel syndrome, weight loss, presence of RF/ACPA, and M-HAQ did not play a prominent role in any of the factors.
Next, multivariable logistic regression models were used to determine the importance of each criterion when assessed simultaneously (Table 3). Three models were considered: 1) including the 4 factors identified in factor analysis; 2) removing shoulder tenderness and adding presence of RF/ACPA; and 3) additionally adding M-HAQ. Although each addition was statistically significant (likelihood ratio test P < 0.001), it also weakened the hip pain criterion, whose odds ratio fell from 2.7 in model 1 to 2.1 in model 2 and 1.6 (P = 0.16) in model 3. The value of including M-HAQ was questionable, because hip pain is easier to assess than M-HAQ in a clinical setting. Therefore, the second model, which included hip pain, other joint pain, morning stiffness, and presence of RF/ACPA, was deemed the best multivariable model.
Table 3. Multivariable logistic regression models*
| Variable | Model 1: OR (95% CI) | P | Model 2: OR (95% CI) | P | Model 3: OR (95% CI) | P |
|---|---|---|---|---|---|---|
| Hip pain or limited range of motion | 2.7 (1.5–4.8) | 0.001 | 2.1 (1.1–4.0) | 0.019 | 1.6 (0.8–3.2) | 0.16 |
| Other joint pain | 0.4 (0.2–0.6) | <0.001 | 0.4 (0.2–0.7) | 0.002 | 0.3 (0.1–0.6) | <0.001 |
| Morning stiffness duration >45 minutes | 5.2 (2.9–9.4) | <0.001 | 6.2 (3.2–11.8) | <0.001 | 4.8 (2.4–9.6) | <0.001 |
| Shoulder tenderness | 0.9 (0.5–1.8) | 0.80 |  |  |  |  |
| Presence of RF or ACPA |  |  | 0.3 (0.1–0.8) | 0.009 | 0.3 (0.1–0.8) | 0.013 |
| M-HAQ, per 1-unit increase |  |  |  |  | 2.4 (1.4–4.2) | 0.002 |
| Likelihood ratio test for additional terms |  |  | P < 0.001 |  | P < 0.001 |  |
| C statistic | 79% |  | 81% |  | 81% |  |
Additional analyses were performed using classification trees and assessing combinations of criteria. Classification trees were examined using 3 sets of potential variables: 1) the 4 identified factors; 2) adding presence of RF/ACPA; and 3) additionally adding M-HAQ. The resulting trees were deemed inadequate. For example, the second tree, which was fit using the same 5 variables that were included in our scoring algorithm, had a sensitivity of 66% and specificity of 66%. This specificity was lower than that of the scoring algorithm developed from the logistic models. In addition, this tree included only morning stiffness and absence of RF and/or ACPA, so it excluded other domains deemed necessary for content validity, which requires that the set of criteria identified is comprehensive. All 3 trees had this shortcoming. Therefore, classification trees were deemed to have poorer performance and content validity than the logistic regression models.
A scoring algorithm was developed (Table 4) based on the multivariable logistic regression model presented in Table 3 and included morning stiffness for >45 minutes (2 points), hip pain/limited range of motion (1 point), absence of RF and/or ACPA (2 points), and the absence of peripheral joint pain (1 point). The score was evaluated using all PMR subjects (including the 5 with normal CRP/ESR) and all comparison subjects. This was done to account properly for the influence of CRP/ESR on the performance of the scoring algorithm. A score of 4 or greater had 68% sensitivity and 78% specificity for discriminating all comparison subjects from PMR. The specificity was higher (88%) for discriminating shoulder conditions from PMR and lower (65%) for discriminating RA from PMR. The C statistic for the scoring algorithm was 81%. A total of 40 PMR patients (32%) and 38 comparison subjects (22%) were incorrectly classified. The positive predictive value was 69% and the negative predictive value was 77%. Supplementary Table 4 (http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1529-0131) shows the sensitivity and specificity for all possible cut points of the scoring algorithm.
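The reported performance figures are internally consistent with a single 2×2 table: 40 of 125 PMR patients were misclassified (leaving 85 true positives) and 38 of 169 comparison subjects were misclassified (leaving 131 true negatives). The sketch below recomputes the metrics from those counts; the function name is illustrative. The predictive values depend on the proportion of PMR in this study sample (125 of 294, about 43%) and would differ where PMR is more or less common among patients with bilateral shoulder pain.

```python
def two_by_two_metrics(tp, fn, fp, tn):
    """Standard accuracy measures for a binary classification rule."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sens / (1 - spec),
        "lr_negative": (1 - sens) / spec,
    }


# Clinical score >= 4: 125 PMR patients (40 misclassified), 169 comparison subjects (38 misclassified)
m = two_by_two_metrics(tp=125 - 40, fn=40, fp=38, tn=169 - 38)
print({k: round(100 * v) for k, v in m.items() if k in ("sensitivity", "specificity", "ppv", "npv")})
# {'sensitivity': 68, 'specificity': 78, 'ppv': 69, 'npv': 77}
```

The same counts imply likelihood ratios of roughly 3.0 (positive) and 0.41 (negative); these are derived here and are not reported in the text.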
Table 4. Scoring algorithm with and without optional ultrasound criteria—required criteria: age ≥50 years, bilateral shoulder aching, and abnormal CRP and/or ESR*
| Criterion | Without ultrasound: OR (95% CI) | Points (0–6) | With ultrasound: OR (95% CI) | Points (0–8) |
|---|---|---|---|---|
| Morning stiffness duration >45 minutes | 6.2 (3.2–11.8) | 2 | 5.0 (2.8–9.1) | 2 |
| Hip pain or limited range of motion | 2.1 (1.1–4.0) | 1 | 1.4 (0.8–2.6) | 1 |
| Absence of RF or ACPA | 3.0 (1.3–6.8) | 2 | 5.2 (2.1–12.6) | 2 |
| Absence of other joint pain | 2.7 (1.4–5.0) | 1 | 2.2 (1.3–4.0) | 1 |
| Ultrasound criteria |  |  |  |  |
| At least 1 shoulder with subdeltoid bursitis and/or biceps tenosynovitis and/or glenohumeral synovitis (either posterior or axillary) and at least 1 hip with synovitis and/or trochanteric bursitis |  |  | 2.6 (1.3–5.3) | 1§ |
| Both shoulders with subdeltoid bursitis, biceps tenosynovitis, or glenohumeral synovitis |  |  | 2.1 (1.2–3.7) | 1¶ |
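The scoring rules in Table 4 can be expressed as a short function. This is a hedged sketch, not an official implementation: the `Patient` fields are hypothetical names, the requirement that shoulder pain is not better explained by an alternative diagnosis (a clinical judgement) is not modelled, and the ultrasound items are assumed to be scored only when both ultrasound findings were recorded. Cut points follow the text: ≥4 of 6 points without ultrasound and ≥5 of 8 points with ultrasound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Required criteria
    age_years: int
    bilateral_shoulder_aching: bool
    abnormal_crp_or_esr: bool
    # Scored clinical criteria
    morning_stiffness_min: float
    hip_pain_or_limited_rom: bool
    rf_or_acpa_positive: bool
    other_joint_pain: bool
    # Optional ultrasound criteria; None means not performed
    us_shoulder_and_hip: Optional[bool] = None
    us_both_shoulders: Optional[bool] = None

    @property
    def has_ultrasound(self) -> bool:
        # Assumption: ultrasound counts only if both findings were recorded
        return self.us_shoulder_and_hip is not None and self.us_both_shoulders is not None


def pmr_score(p: Patient) -> Optional[int]:
    """Points per Table 4, or None if the required criteria are not met."""
    if not (p.age_years >= 50 and p.bilateral_shoulder_aching and p.abnormal_crp_or_esr):
        return None
    score = 0
    score += 2 if p.morning_stiffness_min > 45 else 0
    score += 1 if p.hip_pain_or_limited_rom else 0
    score += 2 if not p.rf_or_acpa_positive else 0
    score += 1 if not p.other_joint_pain else 0
    if p.has_ultrasound:
        score += 1 if p.us_shoulder_and_hip else 0
        score += 1 if p.us_both_shoulders else 0
    return score


def classify_as_pmr(p: Patient) -> bool:
    score = pmr_score(p)
    if score is None:
        return False
    cut_point = 5 if p.has_ultrasound else 4
    return score >= cut_point


example = Patient(age_years=72, bilateral_shoulder_aching=True, abnormal_crp_or_esr=True,
                  morning_stiffness_min=120, hip_pain_or_limited_rom=False,
                  rf_or_acpa_positive=False, other_joint_pain=True)
print(pmr_score(example), classify_as_pmr(example))  # 4 True
```

With the same clinical findings and two negative ultrasound items, the score stays at 4 and falls below the higher cut point, which illustrates why the two algorithms use different thresholds.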
Finally, gradient boosting regression tree models, which are a model averaging technique, were examined to determine whether a more complex algorithm could achieve better prediction. The C statistic from the gradient boosting model was 80%, comparable to that of our scoring algorithm. This suggested that a more complex modeling approach was unlikely to yield better predictions from these data.
Ultrasound was performed in 120 PMR subjects, 154 of the comparison subjects (including 46 with RA and 47 with shoulder conditions), and 21 additional controls (not included in our study cohorts) who did not have shoulder conditions. Patients with PMR were more likely to have abnormal ultrasound findings in the shoulder (particularly subdeltoid bursitis and biceps tenosynovitis), and somewhat more likely to have abnormal findings in the hips than comparison subjects as a group (Table 5). PMR could not be distinguished from RA on the basis of ultrasound, but could be distinguished from non-RA shoulder conditions and subjects without shoulder conditions.
Table 5. Ultrasound findings in 120 patients with PMR, 154 comparison subjects (including 46 with RA and 47 with shoulder conditions), and 21 subjects without shoulder conditions*
| Finding (% of subjects) | PMR (n = 120) | All comparison subjects (n = 154) | RA (n = 46) | Shoulder conditions (n = 47) | No shoulder condition (n = 21) |
|---|---|---|---|---|---|
| At least 1 shoulder with subdeltoid bursitis, biceps tenosynovitis, or glenohumeral synovitis | 83 | 70† | 78 | 62† | 19† |
| Both shoulders with subdeltoid bursitis, biceps tenosynovitis, or glenohumeral synovitis | 59 | 43† | 65 | 26† | 0† |
| At least 1 shoulder with subdeltoid bursitis or biceps tenosynovitis | 82 | 63† | 72 | 53† | 19† |
| Both shoulders with subdeltoid bursitis or biceps tenosynovitis | 57 | 35† | 52 | 21† | 0† |
| At least 1 hip with synovitis or trochanteric bursitis | 38 | 23‡ | 30 | 18‡ | 0† |
| Both hips with synovitis or trochanteric bursitis | 19 | 8† | 9 | 4‡ | 0‡ |
| At least 1 shoulder and 1 hip with findings as above | 33 | 16† | 17‡ | 11† | 0† |
| Both shoulders and both hips with findings as above | 12 | 7 | 6 | 2‡ | 0 |
Assessing the utility of ultrasound in classifying PMR.
Factor analysis (see Supplementary Table 5, http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1529-0131) revealed several strong factors potentially useful for classifying patients with PMR from the ultrasound data. Including ultrasound findings in the scoring algorithm improved the C statistic to 82%. A score of 5 or greater had 66% sensitivity and 81% specificity for discriminating all comparison subjects from PMR. The specificity was higher (89%) for discriminating shoulder conditions from PMR and lower (70%) for discriminating RA from PMR. A total of 41 PMR patients (34%) and 30 comparison subjects (19%) were incorrectly classified. The positive predictive value was 72% and the negative predictive value was 75%. Therefore, ultrasound findings were useful in discriminating PMR from shoulder conditions, but less so in discriminating PMR from RA. Table 4 also shows the scoring algorithm including the ultrasound criteria. Supplementary Table 4 (http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1529-0131) shows the sensitivity and specificity for all possible cut points of the scoring algorithm.
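The ultrasound-augmented figures can be checked the same way by rebuilding the 2×2 table from the misclassification counts. This sketch assumes that the ≥5 cut point was evaluated in the subjects who had ultrasound (120 PMR and 154 comparison subjects, as reported in the ultrasound results); the percentages reported above are consistent with those denominators, but the text does not state them explicitly.

```python
n_pmr, n_comparison = 120, 154            # subjects with ultrasound data (assumed denominators)
fn, fp = 41, 30                           # misclassified PMR and comparison subjects at score >= 5
tp, tn = n_pmr - fn, n_comparison - fp    # 79 true positives, 124 true negatives

metrics = {
    "sensitivity": tp / n_pmr,
    "specificity": tn / n_comparison,
    "ppv": tp / (tp + fp),
    "npv": tn / (tn + fn),
    "misclassified_pmr": fn / n_pmr,
    "misclassified_comparison": fp / n_comparison,
}
print({k: round(100 * v) for k, v in metrics.items()})
# {'sensitivity': 66, 'specificity': 81, 'ppv': 72, 'npv': 75, 'misclassified_pmr': 34, 'misclassified_comparison': 19}
```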
Response to corticosteroids in PMR.
Complete corticosteroid response at 4 weeks was seen in 71% of patients, and the response was sustained in 78% of responders at 26 weeks. As expected from the tapering protocol, the median prednisone dose decreased from 15 mg at baseline to 5 mg at 26 weeks. Response to treatment (percentage improvement in global pain VAS at weeks 4 and 26) was highly correlated with percentage improvement in other VAS measures (correlation >0.5 and P < 0.001 at weeks 4 and 26), but was not correlated with percentage change in corticosteroid dose (P = 0.20 at week 4 and P = 0.47 at week 26). There was no association between the points obtained on either scoring algorithm and the response to corticosteroids at 4 weeks (Spearman correlation coefficient 0.09, P = 0.38 for the scoring algorithm including ultrasound) and 26 weeks (data not shown), indicating that corticosteroid response cannot be used as part of PMR classification.
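Spearman's rank correlation suits this comparison because the score takes only a few integer values, so many subjects share a score. Tied values are conventionally given the average of the ranks they span, and the coefficient is then the Pearson correlation of those ranks. A minimal pure-Python sketch follows; the function names and the toy data are illustrative (no study data are included), and it computes the coefficient only, not the P value.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the tie group while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (not study data): score points vs. percentage improvement in pain VAS
scores = [4, 5, 5, 6, 3, 4, 6]
improvement = [80, 60, 90, 70, 85, 100, 75]
rho = spearman(scores, improvement)
print(average_ranks([0, 2, 2, 5]))  # [1.0, 2.5, 2.5, 4.0]
```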
We also reevaluated the risk score model underlying the scoring algorithm using only the PMR subjects who responded to corticosteroids (together with all of the comparison subjects). When the final risk score model was recomputed on this subset, the odds ratios for each of the criteria remained essentially unchanged, the specificity was identical, and the sensitivity increased by a negligible amount (0.5%).
Blinded reevaluation of selected PMR patients and non-PMR controls.
The reevaluation exercise showed that most candidate criteria items performed well in discriminating PMR patients from controls. However, a third of the sample of PMR patients and comparison subjects was difficult to classify. The high C statistics associated with the corticosteroid response and with posttreatment CRP and ESR suggested that this uncertainty originated from the pivotal role of the corticosteroid response in the investigators' assessment of whether a patient does or does not have PMR. This raises questions such as whether PMR always responds adequately to corticosteroids and whether polymyalgic RF-positive disease without peripheral synovitis can occur (see Supplementary Appendix, http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1529-0131, for details).
This is the first international multicenter prospective study to examine consensus-based candidate classification criteria for PMR proposed by an international work group. The findings indicate that patients 50 years of age or older presenting with new bilateral shoulder pain (not better explained by an alternative diagnosis) and elevated CRP and/or ESR can be classified as having PMR in the presence of morning stiffness of >45 minutes and new hip involvement (pain, tenderness, limited movement). The absence of peripheral synovitis or of positive RA serology increases the likelihood of PMR. Although RF in particular may be present in some patients with PMR, its absence is useful for classification purposes in distinguishing PMR from RA in older patients (18, 34). Ultrasound findings of bilateral shoulder abnormalities (subacromial bursitis/bicipital tenosynovitis/glenohumeral effusion), or of abnormalities in 1 shoulder and 1 hip (hip effusion, trochanteric bursitis), may significantly improve the specificity of the clinical criteria. These criteria are not meant for diagnostic purposes and have not been tested as diagnostic criteria.
This and other recent studies point to newer concepts of PMR: heterogeneity at presentation and in disease course, lack of uniform responsiveness to low-dose corticosteroids, and overlap with inflammatory arthritis. Nevertheless, we believe that these classification criteria currently provide a basic framework for developing clinical trials of novel therapies in PMR.
How should the PMR classification criteria be applied?
The target population will be patients aged 50 years or older presenting with new-onset (<12 weeks) bilateral shoulder pain and abnormal acute-phase response. The criteria may only be applied to those patients in whom the symptoms are not better explained by an alternative diagnosis. Mimicking conditions include the inflammatory and noninflammatory conditions studied as comparators in this report.
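The entry conditions above act as a gate that must be passed before any points are counted. A minimal Python sketch, assuming each condition has already been judged clinically and recorded as a simple value; the class and field names are hypothetical, and "not better explained by an alternative diagnosis" remains a clinician's judgment that the code merely records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presentation:
    """Hypothetical record of the entry conditions for applying the criteria."""
    age_years: int
    bilateral_shoulder_pain: bool
    symptom_duration_weeks: float
    abnormal_acute_phase_response: bool    # e.g., elevated CRP and/or ESR
    better_explained_by_alternative: bool  # clinician's judgment about mimicking conditions


def is_eligible(p: Presentation) -> bool:
    """True only if every entry condition for applying the criteria is met."""
    return (
        p.age_years >= 50
        and p.bilateral_shoulder_pain
        and p.symptom_duration_weeks < 12  # new onset
        and p.abnormal_acute_phase_response
        and not p.better_explained_by_alternative
    )


print(is_eligible(Presentation(67, True, 4, True, False)))  # all conditions met
print(is_eligible(Presentation(49, True, 4, True, False)))  # below the age threshold
```

Keeping eligibility separate from scoring mirrors the text: the score is meaningful only for patients who have already passed this gate.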
Four clinical and laboratory criteria, along with optional ultrasound criteria (Tables 4 and 6), can be applied to eligible patients to identify patients with PMR suitable for low-dose corticosteroid therapy. The scoring scale is 0–6 without ultrasound and 0–8 with ultrasound. In the absence of competing diagnoses, a score of 4 or greater (without ultrasound) or 5 or greater (with ultrasound) is indicative of PMR. Patients with a score of less than 4 based on the clinical plus laboratory criteria cannot be considered to have PMR. Ultrasound improves specificity and performs particularly well in differentiating PMR from noninflammatory conditions; it is therefore a recommended investigation for PMR.
Table 6. PMR classification criteria scoring algorithm (required criteria: age ≥50 years, bilateral shoulder aching, and abnormal CRP and/or ESR)*

| Criterion | Points without ultrasound (0–6) | Points with ultrasound (0–8) |
|---|---|---|
| Morning stiffness duration >45 minutes | 2 | 2 |
| Hip pain or limited range of motion | 1 | 1 |
| Absence of RF or ACPA | 2 | 2 |
| Absence of other joint involvement | 1 | 1 |
| At least 1 shoulder with subdeltoid bursitis and/or biceps tenosynovitis and/or glenohumeral synovitis (either posterior or axillary) and at least 1 hip with synovitis and/or trochanteric bursitis | NA | 1 |
| Both shoulders with subdeltoid bursitis, biceps tenosynovitis, or glenohumeral synovitis | NA | 1 |
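The point values in Table 6 can be expressed as a small function. This Python sketch assumes the required criteria have already been met (for example, by a separate eligibility check), applies the Table 6 points, and uses the cut points stated in the text: 4 or more without ultrasound and 5 or more with ultrasound. All names are hypothetical, the serology item is taken as already judged according to Table 6's wording, and the two ultrasound items are optional but must be supplied together. It illustrates the arithmetic only and is not a validated implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Findings:
    """Hypothetical record of the Table 6 items for a patient meeting the required criteria."""
    morning_stiffness_over_45_min: bool        # 2 points
    hip_pain_or_limited_rom: bool              # 1 point
    rf_acpa_criterion_met: bool                # 2 points ("Absence of RF or ACPA" in Table 6)
    no_other_joint_involvement: bool           # 1 point
    # Optional ultrasound items (1 point each); None means ultrasound was not used.
    us_shoulder_and_hip: Optional[bool] = None
    us_both_shoulders: Optional[bool] = None


def pmr_score(f: Findings) -> tuple[int, int]:
    """Return (score, cut point) for the algorithm that applies to these findings."""
    clinical = (
        2 * int(f.morning_stiffness_over_45_min)
        + 1 * int(f.hip_pain_or_limited_rom)
        + 2 * int(f.rf_acpa_criterion_met)
        + 1 * int(f.no_other_joint_involvement)
    )
    us_items = (f.us_shoulder_and_hip, f.us_both_shoulders)
    if all(item is None for item in us_items):
        return clinical, 4  # algorithm without ultrasound (0-6)
    if any(item is None for item in us_items):
        raise ValueError("supply both ultrasound items or neither")
    return clinical + sum(int(item) for item in us_items), 5  # with ultrasound (0-8)


def classify_pmr(f: Findings) -> bool:
    score, cut_point = pmr_score(f)
    return score >= cut_point


# Stiffness plus the serology item only: 4 points.
print(classify_pmr(Findings(True, False, True, False)))                # clinical algorithm
print(classify_pmr(Findings(True, False, True, False, False, False)))  # negative ultrasound
```

Because the 1-point items total at most 2 (clinical) or 4 (with ultrasound), neither cut point can be reached without at least one 2-point item. The second call shows how the same clinical findings fall below the higher cut point when both ultrasound items are negative.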
Classification criteria for PMR should be useful for identifying patients appropriate for enrollment into clinical trials of novel medications for the treatment of PMR, and for studying long-term outcomes in more homogeneous patient cohorts. Our analyses indicate that even patients with typical PMR at presentation may vary in their response to low-dose corticosteroid therapy, suggesting that corticosteroid response is not a reliable classification feature for PMR. This is similar to other rheumatic diseases such as RA, in which phenotypically similar patients may respond differently to disease-modifying antirheumatic drugs.
Strengths and limitations.
A key strength of the study is that it harnessed an international effort to address a disease area with wide variation in practice, and to develop agreement on what may or may not be treated as PMR and what needs further evaluation.
Our study methodology satisfies the ACR guidelines for the development of classification criteria for rheumatic diseases (35). The consensus-based candidate criteria were generated by a multispecialty international group whose views were supported by the results of a wide trans-Atlantic survey. The work group proposed a prospective study designed to separate PMR patients from comparison subjects among patients enrolled on the basis of a single eligibility criterion (new bilateral shoulder pain in subjects ≥50 years).
This ensured an inception cohort longitudinal observational design, in which the PMR cohort could be compared with the comparison cohort at similar chronological time points of disease. All PMR patients were evaluated before corticosteroid treatment, were treated with a standard corticosteroid schedule, and were assessed at predetermined time points. Our study is in keeping with the EULAR/ACR goal of developing rheumatic disease classification criteria as opposed to diagnostic criteria. We focused on subjects with new-onset/incident disease, and the 6-month longitudinal followup allowed an accurate evaluation of the disease course and diagnoses. Previous criteria for PMR were developed using cross-sectional comparisons, and only 2 of them were developed through an evaluative process (20, 36). Neither defined a “gold standard” diagnosis other than the physician's judgment that the patient had “unequivocal” PMR.
Another strength of our proposed classification criteria is the imaging component. Musculoskeletal ultrasound shows promise owing to its wide availability, feasibility, and good supporting research evidence (18). The PMR work group standardized the ultrasound examination of shoulders and hips for the purposes of the current study (37). Our findings indicate that ultrasound evaluation of hips and shoulders adds significantly to the evaluation of the polymyalgic syndrome. The lack of a “gold standard” and the challenge of circularity were addressed through a blinded multirater reevaluation exercise in selected cases and comparison subjects (see Supplementary Appendix, http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1529-0131). Most of the candidate criteria items performed well. The misclassification of several subjects reflects the difficulty of discriminating PMR from other inflammatory conditions such as RA. This uncertainty originates from the pivotal role of the corticosteroid response (and the circular reasoning involved in using it) in deciding whether a patient does or does not have PMR. Although the proposed scoring algorithm has high specificity for identifying PMR patients, it cannot predict the subsequent corticosteroid response, suggesting heterogeneity of disease course and treatment response, as has also been reported previously (10, 38).
While we were able to scrutinize the basis for PMR diagnosis, no formal criteria were required for diagnosis of the comparison conditions. This is a limitation of the study. However, the study was led in all centers by experienced rheumatologists with major clinical and research interest in PMR and related conditions. We did not include hip pain without shoulder pain as an eligibility criterion for reasons discussed in the Methods section. Funding constraints limited the followup duration to only 6 months. We were also limited by lack of funding for the central measurement of laboratory data. Values of ESR, CRP, RF, and ACPA were based on local laboratory assays. Our study approach reflects a pragmatic view, which perhaps lends wider applicability to the results of the study.
Our classification algorithm had a C statistic of 81%, which exceeds the 80% threshold conventionally considered useful in clinical decision-making. However, we suggest that the criteria be regarded as provisional at this point, pending validation in a separate cohort.
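For a binary outcome, the C statistic is the area under the ROC curve. Equivalently, it is the probability that a randomly chosen PMR patient receives a higher score than a randomly chosen comparison subject, with ties counted as one half. The Python sketch below computes this standard concordance definition by comparing all case/comparison pairs. The function name and example scores are hypothetical, and the sketch does not reproduce how the study's risk score model generated its predictions.

```python
from itertools import product


def c_statistic(case_scores: list[float], comparison_scores: list[float]) -> float:
    """Concordance probability: P(case score > comparison score), ties counted as 0.5."""
    if not case_scores or not comparison_scores:
        raise ValueError("need at least one case and one comparison subject")
    concordant = 0.0
    for case, comp in product(case_scores, comparison_scores):
        if case > comp:
            concordant += 1.0
        elif case == comp:
            concordant += 0.5
    return concordant / (len(case_scores) * len(comparison_scores))


# Hypothetical scores for illustration only (not study data).
pmr_scores = [6, 5, 4, 3]
comparison_scores = [4, 2, 1]
print(f"C statistic = {c_statistic(pmr_scores, comparison_scores):.3f}")
```

A value of 0.5 means the score separates the groups no better than chance, and 1.0 means every PMR patient outscores every comparison subject. The pairwise loop is quadratic in sample size, which is acceptable for cohorts of a few hundred subjects.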
In conclusion, patients aged 50 years or older presenting with bilateral shoulder pain and elevated CRP and/or ESR can be classified as having PMR in the presence of morning stiffness for more than 45 minutes, and new hip pain in the absence of peripheral synovitis or positive RA serology. Using ultrasound, a score of 5 or greater had 66% sensitivity and 81% specificity for discriminating all comparison subjects from PMR. In our view, this approach can now be used to test eligibility for trials with newer therapies in PMR. A number of future research questions are highlighted, including: 1) Should PMR be considered as a part of the spectrum of late-onset inflammatory arthritis? 2) Can polymyalgic disease without peripheral synovitis occur in RF-positive disease? 3) Can we diagnose PMR in patients with normal acute-phase response? 4) What is the role of the early introduction of disease-modifying antirheumatic drugs in PMR?
We have collected and stored blood samples from both the case and the comparator groups in the study. We hope to develop research proposals using these biospecimens to test several candidate biomarkers in PMR. These proposals will also examine the acute-phase response: whether stratification of the response and differential levels of acute-phase reactants and cytokines may function as additional classification criteria items.
All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Drs. Dasgupta and Matteson had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study conception and design. Dasgupta, Cimmino, Maradit Kremers, W. Schmidt, Schirmer, Salvarani, Bachta, Duftner, Balint, Nannini, Cid, Martínez-Taboada, Nordborg, Direskeneli, Ahmed, Hazleman, Pease, Luqmani, Michet, Marcus, Gonter, Carter, Crowson, Matteson.
Acquisition of data. Dasgupta, Cimmino, Maradit Kremers, W. Schmidt, Schirmer, Salvarani, Bachta, Duftner, Jensen, Duhaut, Poór, Kaposi, Mandl, Balint, Z. Schmidt, Iagnocco, Nannini, Cantini, Macchioni, Pipitone, Del Amo, Espígol-Frigolé, Cid, Martínez-Taboada, Nordborg, Direskeneli, Aydin, Ahmed, Silverman, Pease, Wakefield, Abril, Marcus, Gonter, Maz, Matteson.
Analysis and interpretation of data. Dasgupta, Cimmino, Maradit Kremers, W. Schmidt, Bachta, Dejaco, Duftner, Balint, Z. Schmidt, Del Amo, Cid, Nordborg, Luqmani, Abril, Marcus, Gonter, Carter, Crowson, Matteson.
The investigators would like to acknowledge EULAR, ACR, the Mayo Foundation, and the Biobanque de Picardie in Amiens, France, for material and intellectual support of this project. The investigators also wish to recognize the individual and uncompensated efforts of the staff at each participating institution, without which this study could not have been completed.