Developing a provisional definition of flare in patients with established gout

Abstract

Objective

Various nonvalidated criteria for disease flare have been used in studies of gout. Our objective was to develop empirical definitions for a gout flare from patient-reported features.

Methods

Possible elements for flare criteria were previously reported. Data were collected from 210 gout patients at 8 international sites to evaluate potential gout flare criteria against the gold standard of an expert rheumatologist definition. Flare definitions based on the presence of the number of criteria independently associated with the flare and classification and regression tree approaches were developed.

Results

The mean ± SD age of the study participants was 56.2 ± 15 years, 207 of them (98%) were men, and 54 of them (26%) had flares of gout. The presence of any patient-reported warm joint, any patient-reported swollen joint, patient-reported pain at rest score of >3 (0–10 scale), and patient-reported flare were independently associated with the study gold standard. The greatest discriminating power was noted for the presence of 3 or more of the above 4 criteria (sensitivity 91% and specificity 82%). Requiring all 4 criteria provided the highest specificity (96%) and positive predictive value (85%). A classification tree identified pain at rest with a score of >3, followed by patient self-reported flare, as the rule associated with the gold standard (sensitivity 83% and specificity 90%).

Conclusion

We propose definitions for a disease flare based on self-reported items in patients previously diagnosed as having gout. Patient-reported flare, joint pain at rest, warm joints, and swollen joints were most strongly associated with presence of a gout flare. These provisional definitions will next be validated in clinical trials.

Efforts to ameliorate gout often focus on trying to reduce the frequency of disease flares, since these are the key manifestations applicable to most patients. Evaluation of gout flares in the past has been limited by the lack of a validated definition. Clinical trials and observational studies have used various definitions for a flare of gout, including acute events requiring physician consultation, medical interventions such as use of nonsteroidal antiinflammatory drugs or colchicine prescribed by a physician, or a visit to the emergency room (1–5). None of these definitions have been validated. Preliminary criteria developed by the American College of Rheumatology (ACR) for the classification of acute gouty arthritis were intended to identify patients with acute gout and not to define a flare in those already known to have gout (6). There are no validated response criteria for monitoring responses to therapy or for comparing therapies in clinical trials in patients with known gout. Such criteria require a standardized definition of gout flare.

Since most gout clinical research uses flares as an important outcome, we developed a definition for a gout flare. This is an initial step toward the development and validation of a composite gout response criterion. Our approach used patient-reported outcomes, since these have been successfully incorporated into response criteria for other rheumatic diseases, including rheumatoid arthritis (7), juvenile inflammatory arthritis (8), and seronegative spondylarthropathies (9–11). In addition, use of patient-reported outcomes conforms to the Patient-Reported Outcomes Measurement Information System (PROMIS) initiative by the National Institutes of Health (NIH) (12). Features reported by patients for gout flare determination without formal physician examinations are useful in clinical trial settings. In clinical trials, study subjects most commonly experience gout flares at home, and these flares are only reported and evaluated at subsequent study visits. Under these circumstances, physician examination and/or laboratory assessment is not feasible at the time of the flare.

A previous step described elsewhere (13, 14) consisted of qualitative Delphi exercises performed among different groups of rheumatologists to identify key elements for a definition of a gout flare, followed by a cognitive mapping technique among 9 gout experts, with hierarchical cluster analysis providing the final set of items. We asked patients about symptoms of gout flares, and we solicited their comments on the relevance of proposed features for the development of a gout flare definition (13). Based on the opinions of both gout experts and patients, we selected items for potential inclusion in the gout flare definition (13, 14). These items consisted of a combination of patient-reported, physician-assessed, and laboratory-determined parameters. Specifically, items identified included patient's self-report of pain, global assessment of disease activity, swollen joints, tender joints, warm joints, time to maximum pain, time to pain resolution, functional status, assessed through the Modified Health Assessment Questionnaire (M-HAQ) (7), and laboratory measurement of markers of inflammation (13, 14). We used physician-assessed and laboratory parameters in the sensitivity analyses to assess their additional utility in improving the accuracy of gout flare definition.

This report describes the development and initial evaluation of a definition for flare in patients with established gout, using data collected prospectively in an international group of patients and based on clinical variables previously identified. The objective was to construct a definition that was entirely patient-reported and that concurred most closely with the judgment of expert rheumatologists who had examined each patient.

PATIENTS AND METHODS

Gout patient population.

Two hundred twelve patients with an established diagnosis of crystal-proven gout were enrolled in consecutive sequence during a routine or urgent clinic visit at 8 clinic sites located in Asia (n = 45), Europe (n = 42), Latin America (n = 73), New Zealand (n = 30), and Philadelphia, PA (n = 22). Patients were enrolled regardless of gout flare status (intercritical period or currently experiencing a flare). Because of missing data on the main study outcome, 1 patient from Philadelphia and 1 from Taipei were excluded from the analysis, leaving a total of 210 patients (Table 1).

Table 1. Demographic and clinical characteristics of the international group of patients evaluated for a gout flare decision rule
| Variable | Total study population (n = 210) | Investigator-defined flare (n = 54) | Self-defined flare (n = 86) | Discrepant flare (n = 41)* |
|---|---|---|---|---|
| Age, mean ± SD years | 56.2 ± 15.0 | 55.0 ± 14.3 | 53.4 ± 13.8 | 51.7 ± 13.7 |
| No. (%) men | 207 (98.6) | 54 (100) | 84 (97.7) | 41 (100) |
| Clinic site, no. (%) | | | | |
| Auckland, New Zealand | 30 (14.3) | 8 (14.8) | 17 (19.8) | 9 (22.0) |
| Alicante, Spain | 19 (9.1) | 7 (13.0) | 18 (20.9) | 13 (31.7) |
| Mexico City, Mexico | 40 (19.1) | 8 (14.8) | 16 (18.6) | 7 (17.1) |
| Taipei, Taiwan | 35 (16.7) | 9 (16.7) | 10 (11.6) | 5 (12.2) |
| Beijing, China | 9 (4.3) | 0 (0) | 1 (1.2) | 1 (2.4) |
| Vizcaya, Spain | 23 (11.0) | 3 (5.6) | 4 (4.7) | 1 (2.4) |
| Sao Paulo, Brazil | 33 (15.7) | 9 (16.7) | 9 (10.5) | 2 (4.9) |
| Philadelphia, PA | 21 (10.0) | 10 (18.5) | 11 (12.8) | 3 (7.3) |
| Ethnicity, no. (%) | | | | |
| Black | 24 (11.4) | 10 (18.5) | 11 (12.8) | 3 (7.3) |
| Asian | 47 (22.8) | 13 (24.1) | 15 (17.4) | 5 (12.2) |
| Caucasian | 76 (36.2) | 15 (27.8) | 16 (18.6) | 3 (7.3) |
| Hispanic | 42 (20.0) | 11 (20.4) | 38 (44.2) | 29 (70.7) |
| Pacific Islander | 18 (8.6) | 4 (7.4) | 5 (5.8) | 1 (2.4) |
| Other | 3 (1.4) | 1 (1.9) | 1 (1.2) | 0 |
| Disease duration, mean ± SD years | 13.2 ± 10.7 | 10.4 ± 8.7 | 10.8 ± 8.6 | 12.1 ± 8.8 |
| % with tophi | 47.6 | 50.9 | 61.2 | 75.6 |
| % taking urate-lowering drugs | 64.3 | 55.6 | 64.0 | 63.4 |

* Discrepant flares were cases for which there was a discrepancy of opinion between the patient and the investigator about the presence of a gout flare. In 36 cases (87.8%), the discrepancy originated with the patients, who considered that they had a flare, whereas the expert investigator disagreed. In only 5 cases (12.2%) did the discrepancy originate in the opposite scenario.

Patient evaluation and data collection.

At each clinic site, a research associate, a blinded observer (described below), and an expert rheumatologist (ND, FS, JV-M, C-TC, XZ, FP-R, SCK, CG-S, and Lan Chen) independently evaluated all patients. Two investigators (HRS and JD) trained the clinic site study staff during conference calls lasting at least 90 minutes. These investigators discussed all aspects of patient assessment with the study site teams, placing emphasis on ways to achieve standardized patient assessment across the sites. Each clinic site consecutively included all patients with crystal-proven gout who were seen routinely and/or urgently. The research associates (e.g., trained fellows, allied health professionals, and study coordinators at each local site) collected clinical and demographic data from the patients completely independently from the assessments rendered by the blinded observers and the expert rheumatologists. The blinded observers were health care professionals with experience in gout patient care who, for the purpose of internal validation within sites, performed limited assessments independently of the gout expert rheumatologist. To provide an independent internal validation, the blinded observers were unaware of the patient's symptoms or the expert rheumatologist's clinical examination findings and final assessment concerning the presence or absence of a gout flare. Gout expert rheumatologist investigators were chosen based on their considerable experience in gout management. Their global assessment of a gout flare served as the gold standard. To avoid any circular reasoning, the expert rheumatologists were specifically not instructed on how to recognize a gout flare.

The following data were collected as appropriate from patients, blinded observers, and expert rheumatologists using questionnaires translated into the language spoken in each country: demographic information (patient) and counts of red, tender, warm, and swollen joints (patient, blinded observer, and investigator). Patients, blinded observers, and investigators assessed global scores with variations of the question, “In all ways considering gout, how are you feeling now?” Pain scores were determined by asking patients the question, “What was the maximum pain from your gout in the past week while you were resting?” The latter 2 items were determined with a 0–10-point numerical rating scale (NRS). In addition, data from the HAQ (disability index, original Stanford version, and M-HAQ) were obtained from the patients (7, 15, 16), and the patient's and investigator's assessments of the presence of a gout flare were recorded. Laboratory parameters included serum urate, erythrocyte sedimentation rate (ESR), and C-reactive protein (CRP) levels. Among the candidate variables preselected in the initial Delphi and cognitive mapping exercises, assessments of acute-phase reactants were initially not included in the criteria, given the focus on patient's self-reported items.

Definition of a gout flare.

Similar to other widely accepted diagnostic and classification criteria in the rheumatic diseases (17–23), the physician's determination was used as the gold standard for the outcome. In this analysis, the expert rheumatologist's determination of the presence or absence of a gout flare was the gold standard and the dependent variable.

To allow for maximum simplification of the variables in the decision rules, counts of swollen, tender, and warm joints were transformed into dichotomous variables (“any swollen joint,” “any tender joint,” or “any warm joint”). The continuous variables (patient's self report of pain, patient's global assessment of disease activity, and patient's assessment of functional status) were analyzed by univariate logistic regression against the dichotomous gout flare gold standard. From this initial analysis, the best-discriminating point (inflexion point in a receiver operating characteristic [ROC] curve) was obtained in order to convert these continuous variables into dichotomous predictor variables. Multivariable logistic regression was then used to narrow the initial 7 items into a smaller number of criteria for subsequent analysis and to exclude collinear variables. These dichotomous variables were tested for collinearity using the phi test, Cramér's V test, and eigenvalues.
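As an illustration of the dichotomization step, the following minimal Python sketch picks the cut point that maximizes sensitivity − (1 − specificity). The function name `best_cut_point` and the pain scores are hypothetical and were invented for illustration; they are not study data, and the study's own software for this step is not described.

```python
from typing import List, Tuple


def best_cut_point(scores: List[float], flare: List[bool]) -> Tuple[float, float]:
    """Return (cut, J) such that the rule 'score > cut' maximizes
    J = sensitivity - (1 - specificity) against the gold standard."""
    n_pos = sum(flare)
    n_neg = len(flare) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        # Count gold-standard flares and non-flares classified positive at this cut.
        tp = sum(1 for s, f in zip(scores, flare) if f and s > cut)
        fp = sum(1 for s, f in zip(scores, flare) if not f and s > cut)
        sensitivity = tp / n_pos
        specificity = 1 - fp / n_neg
        j = sensitivity - (1 - specificity)
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical pain-at-rest scores (0-10 NRS) and gold-standard flare status.
pain = [0, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9]
gold = [False, False, False, False, False, False,
        True, False, True, True, True, True]
cut, j = best_cut_point(pain, gold)
print(f"best rule: score > {cut} (J = {j:.3f})")
```

With these invented values the best rule happens to be a score of >3, but that is a property of the toy data, not a reproduction of the study result.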

Our first approach was based on the “number of criteria,” and our second approach was based on a classification and regression tree (CART) method. These approaches have been used extensively to define decision rules in the rheumatic diseases (24). In the number of criteria approach, all of the elements are weighted equally and categories are generated according to how many criteria are fulfilled (one or more, two or more, etc.). These categories were then compared with the investigator's definition of a flare (our gold standard) for their diagnostic characteristics of sensitivity (frequency of positive definition in those with a gout flare), specificity (frequency of negative definition in those without a gout flare), positive predictive value (frequency of gout flare in those with positive definitions), negative predictive value (frequency of the absence of gout flare in those with negative definitions), and accuracy (proportion of true definitions within all episodes). Finally, categories based on the consecutive fulfillment of these criteria were plotted in a ROC curve to find the inflexion point of maximum discriminating ability (i.e., the maximum value of [sensitivity − (1 − specificity)]) between those with and those without a flare of gout, as defined using the investigator's definition.
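The five diagnostic characteristics defined above can be computed from a 2 × 2 table of rule result against gold standard. The sketch below is illustrative only; the function name is hypothetical, and the example counts are those reported for the "3 or more" rule in Table 3 (49 of 54 patients with a flare and 28 of 156 patients without a flare met the rule).

```python
def diagnostic_characteristics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Diagnostic characteristics of a rule against a gold standard.

    tp/fn: patients with a gold-standard flare who did/did not meet the rule.
    fp/tn: patients without a flare who did/did not meet the rule.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# "3 or more" row of Table 3.
stats = diagnostic_characteristics(tp=49, fp=28, fn=54 - 49, tn=156 - 28)
for name, value in stats.items():
    print(f"{name}: {value:.1%}")
```

Rounded to whole percentages, these values agree with the corresponding row of Table 3.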

CARTs are algorithms aimed at achieving the best possible accuracy. For this CART approach, we used Quest software. This method yields binary splits based on estimating the lowest proportion of misclassified cases. We used the 4 variables that were found to be significant in the multivariable logistic regression analysis with a 50:50 cost-function for misclassification between having and not having a gout flare. Other cost-function relative weights were also tested, but this did not change the results of the final tree (data not shown).

The number of criteria and CART approaches were further analyzed in sensitivity analyses by adding joint examination variables (any joint swollen and any joint warm), as reported by the blinded observers and the investigators. We compared the areas under the curve (AUCs) from these sensitivity analyses to those obtained using joint examination findings reported by the patients in the primary analysis. In addition, a weighted approach was explored in order to complement the number of criteria and the classification tree analysis. The weighted approach was based on the odds ratios obtained from the multivariable model and is presented in Appendices A and B. Bootstrapping was used to obtain 95% confidence intervals (95% CI) for the AUCs in the ROC analyses (25).
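The bootstrap step can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual procedure, whose resampling details are given only by reference (25). It rebuilds each patient's number of positive criteria from the cumulative counts reported in Table 3, computes the AUC, and takes percentile limits over resamples. The function names, the number of resamples, and the seed are assumptions.

```python
import random
from collections import Counter
from typing import List, Tuple


def exact_counts(at_least: List[int]) -> List[int]:
    """Convert 'k or more criteria' counts into 'exactly k criteria' counts."""
    return [a - b for a, b in zip(at_least, at_least[1:] + [0])]


def auc(pairs: List[Tuple[int, bool]]) -> float:
    """Probability that a random flare patient has a higher criteria count
    than a random non-flare patient (ties count one half)."""
    pos = Counter(s for s, flare in pairs if flare)
    neg = Counter(s for s, flare in pairs if not flare)
    wins = 0.0
    for p, cp in pos.items():
        for n, cn in neg.items():
            if p > n:
                wins += cp * cn
            elif p == n:
                wins += 0.5 * cp * cn
    return wins / (sum(pos.values()) * sum(neg.values()))


def bootstrap_ci(pairs, n_resamples: int = 2000, seed: int = 0):
    """Percentile 95% CI for the AUC from patient-level resampling."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_resamples:
        sample = rng.choices(pairs, k=len(pairs))
        # Skip the (extremely unlikely) resample lacking one of the classes.
        if len({flare for _, flare in sample}) == 2:
            estimates.append(auc(sample))
    estimates.sort()
    return (estimates[int(0.025 * n_resamples)],
            estimates[int(0.975 * n_resamples) - 1])


# Table 3: patients meeting >=0, >=1, >=2, >=3, and all 4 criteria.
flare_exact = exact_counts([54, 54, 53, 49, 39])
no_flare_exact = exact_counts([156, 115, 62, 28, 7])
pairs = ([(k, True) for k, c in enumerate(flare_exact) for _ in range(c)]
         + [(k, False) for k, c in enumerate(no_flare_exact) for _ in range(c)])

point = auc(pairs)
lo, hi = bootstrap_ci(pairs)
print(f"AUC {point:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

The point estimate recovered this way rounds to the AUC of 0.931 reported for the number of criteria approach. The interval depends on the seed and resampling details, so it should not be expected to match the published limits exactly.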

RESULTS

Patient characteristics.

The demographic characteristics of the 210 patients analyzed to develop these criteria are presented in Table 1. Consistent with the epidemiology of gout, the patients were mostly men. Most clinic sites contributed between 9% and 19% of the patients, although Beijing contributed only 4.3%. Several ethnic groups were represented, including North American and European Caucasians, Blacks, Asians, Hispanics, and Polynesians.

Fifty-four patients (25.7%) were having a gout flare at the time of the assessment, based on the opinion of the expert rheumatologist investigator (gold-standard criterion for the study). A somewhat higher proportion of patients believed they were having a flare at the time of the assessment (41%), as compared to the expert rheumatologist's determination. The rate of disagreements between investigators and patients was uniform among sites, except for one. Removing the patients from that site (data not shown) did not significantly change the diagnostic properties of the criteria presented below. Those with discrepancies between the patient's determination and the expert rheumatologist's determination of gout flare were more likely to be Hispanic, to be receiving treatment with urate-lowering drugs, and to have tophi as compared to patients whose flares were concordant with the expert rheumatologist's assessment. A large majority of discrepancies (87.8%) originated because patients believed they were having a flare, yet the expert rheumatologists disagreed (Table 1).

Details about the clinical features of the episodes of gout flare are presented in Appendix C. The median time to maximum pain was 9 hours. Seventy-one percent of patients reported that the current episode was similar to a previous gout flare experience, but only 55% of patients reported that the flare was similar to previous episodes when there was a discrepancy between the investigator and the patient about the presence of a flare. The knees were the most commonly involved areas in patients with investigator-defined flares, followed by the feet and the hands. In cases of patient-defined flares or when there were discrepancies, the knees were again the most commonly affected area, but there was a higher frequency of involvement in the hands and elbows.

The sensitivity, specificity, positive predictive value, and negative predictive value of candidate variables for defining gout flare criteria are shown in Table 2. Overall, high diagnostic accuracy was noted for a patient-reported flare, a patient report of at least 1 warm joint, and of having a score of >3 (0–10-point numerical rating scale) for pain when at rest.

Table 2. Diagnostic performance of variables considered for the definition of gout flare in bivariable and multivariable models*
| Patient-reported features of gout | Bivariable analysis: Sensitivity, % (95% CI) | Bivariable analysis: Specificity, % (95% CI) | Bivariable analysis: PPV, % (95% CI) | Bivariable analysis: NPV, % (95% CI) | Multivariable analysis: OR (95% CI) | Multivariable analysis: P |
|---|---|---|---|---|---|---|
| Gout flare | 91 (80–97) | 76 (69–83) | 58 (46–68) | 96 (91–99) | 25.6 (5.8–113.0) | <0.001 |
| Any swollen joint | 99 (94–99) | 36 (28–44) | 34 (27–42) | 98 (91–99) | 22.2 (1.1–434.0) | 0.04 |
| Any painful joint | 91 (80–97) | 45 (37–53) | 36 (28–45) | 93 (85–98) | 0.2 (0.02–1.2) | 0.07 |
| Any warm joint | 83 (71–92) | 76 (69–83) | 55 (43–66) | 93 (87–97) | 3.3 (1.1–10.4) | 0.04 |
| Pain at rest score >3 (0–10 scale)§ | 91 (80–97) | 80 (73–86) | 61 (50–72) | 96 (91–99) | 11.7 (2.50–55.6) | 0.002 |
| Patient's global assessment score >2 (0–10 scale)§ | 91 (80–97) | 65 (57–73) | 48 (38–58) | 95 (89–98) | 0.8 (0.1–4.8) | 0.81 |
| HAQ score >0.3 (range 0–3) | 85 (73–93) | 59 (51–67) | 42 (32–52) | 92 (85–96) | 2.8 (0.8–10.1) | 0.10 |

* PPV = positive predictive value; NPV = negative predictive value.

Odds ratios (ORs), 95% confidence intervals (95% CIs), and P values were determined by multivariable logistic regression analysis. The area under the curve for the model was 0.944 (95% CI 0.910–0.976).

Cut points for pain at rest, patient's global assessment of disease activity score (by numerical rating scale), and Health Assessment Questionnaire (HAQ) scores were determined from a bivariable logistic regression analysis against the dichotomous dependent variables. The best-discriminating point (inflexion point in a receiver operating characteristic curve) was obtained from the maximum value of the following formula for the different data points: sensitivity − (1 − specificity).

§ For pain, the question asked was: “What was the maximum pain from your gout in the past week while you were resting?” For global assessment, the question asked was: “In all ways considering your gout, how are you feeling now?”

Logistic regression and collinearity analyses.

The multivariable adjusted logistic regression model describing the associations of the selected variables with the investigator's definition of a gout flare is shown in Table 2. A patient-reported flare, report of at least 1 swollen joint, at least 1 warm joint, and pain at rest with a score of >3 were significantly associated with a flare and were used in the subsequent approaches for the criteria definition. The AUC for this model was 0.944 (95% CI 0.910–0.976). We found a small or moderate correlation based on collinearity analyses performed between patient-reported flare, at least 1 swollen joint, at least 1 warm joint, and pain at rest with a score of >3, which allowed us to proceed with the model development based on these variables.
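For two dichotomous variables, the phi coefficient summarizes their association from a 2 × 2 table, and for a 2 × 2 table Cramér's V equals the absolute value of phi. The sketch below is illustrative only: the function name is hypothetical and the counts are invented, since the pairwise cross-tabulations of the study variables are not reported.

```python
import math


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi coefficient for a 2x2 table of two dichotomous variables.

    a = both present, b = first only, c = second only, d = neither.
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Hypothetical cross-tabulation of two patient-reported items in 210 patients.
phi = phi_coefficient(a=50, b=35, c=30, d=95)
print(f"phi = {phi:.2f}")
```

Values near 0 indicate little association and values near ±1 indicate near-redundant items, which is the situation a collinearity check is meant to flag before both items enter one model.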

Number of criteria approach.

Figure 1 and Table 3 show the results of the number of criteria approach to defining a gout flare. The AUC for this approach was 0.931 (95% CI 0.890–0.962). Having ≥3 positive criteria of the 4 proposed criteria yielded a high sensitivity (91%) and specificity (82%) and a positive predictive value of 64%. Requiring the presence of all 4 criteria increased the specificity of the gout flare definition to 96% and the positive predictive value to 85% but decreased the sensitivity to 72%. When we replaced the patient's report of swollen and warm joints in the 3-criteria definition of flare with the blinded observer's finding of swollen and warm joints, the AUC was not substantively different (0.941 [95% CI 0.910–0.969]).
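Applied to a single patient, the number of criteria definition amounts to counting how many of the 4 patient-reported items are positive and comparing that count with the required number. The following sketch is a hypothetical illustration; the class, field, and function names are assumptions and not part of the study instruments.

```python
from dataclasses import dataclass


@dataclass
class PatientReport:
    reports_flare: bool       # patient states that a flare is present
    any_swollen_joint: bool   # at least 1 patient-reported swollen joint
    any_warm_joint: bool      # at least 1 patient-reported warm joint
    pain_at_rest: int         # maximum pain at rest, 0-10 numerical rating scale


def criteria_met(p: PatientReport) -> int:
    """Number of the 4 equally weighted criteria that are positive."""
    return sum([p.reports_flare, p.any_swollen_joint,
                p.any_warm_joint, p.pain_at_rest > 3])


def meets_flare_definition(p: PatientReport, required: int = 3) -> bool:
    """required=3 is the most discriminating rule; required=4 the most specific."""
    return criteria_met(p) >= required


example = PatientReport(reports_flare=True, any_swollen_joint=True,
                        any_warm_joint=False, pain_at_rest=6)
print(criteria_met(example), meets_flare_definition(example),
      meets_flare_definition(example, required=4))
```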

Figure 1.

Receiver operating characteristic (ROC) curve plotted by summing the number of criteria that were positive in predicting a gout flare. The 4 criteria used were as follows: patient's report of flare, any swollen joint, any warm joint, and pain at rest score >3. The area under the curve is 0.931 (95% confidence interval 0.890–0.962). Yellow boxes indicate the positions of the different numbers of criteria along the ROC curve. The maximum [sensitivity − (1 − specificity)] is located at the point where 3 or more criteria are found (∗).

Table 3. Number of criteria approach to defining a flare of gout*
| Rule: Number of criteria required | No. of patients with gout flare | No. of patients without gout flare | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) | Accuracy, % (95% CI) |
|---|---|---|---|---|---|---|---|
| Any | 54 | 156 | 100 (100) | 0 | 26 (20–32) | NA | 26 (20–32) |
| 1 or more | 54 | 115 | 100 (100) | 26 (20–34) | 32 (25–40) | 100 (100) | 45 (38–52) |
| 2 or more | 53 | 62 | 98 (90–100) | 60 (52–68) | 46 (37–56) | 99 (94–100) | 70 (63–76) |
| 3 or more | 49 | 28 | 91 (80–97) | 82 (75–88) | 64 (52–74) | 96 (91–99) | 84 (79–89) |
| All 4 | 39 | 7 | 72 (58–84) | 96 (91–98) | 85 (71–94) | 91 (85–95) | 90 (85–93) |

* The criteria consisted of the following patient-reported features: flare, any swollen joint, any warm joint, and pain at rest score >3 (0–10 scale). 95% CI = 95% confidence interval; PPV = positive predictive value; NPV = negative predictive value.
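The choice of "3 or more" as the most discriminating rule can be checked from the counts in Table 3 by computing [sensitivity − (1 − specificity)], commonly known as Youden's J statistic, for each rule. This is a minimal sketch; the variable and function names are illustrative.

```python
N_FLARE, N_NO_FLARE = 54, 156

# Rule -> (patients with flare meeting the rule, patients without flare meeting it),
# taken from Table 3.
TABLE_3 = {
    "1 or more": (54, 115),
    "2 or more": (53, 62),
    "3 or more": (49, 28),
    "all 4": (39, 7),
}


def youden_j(tp: int, fp: int) -> float:
    """sensitivity - (1 - specificity) for one rule."""
    sensitivity = tp / N_FLARE
    specificity = 1 - fp / N_NO_FLARE
    return sensitivity - (1 - specificity)


j_by_rule = {rule: youden_j(tp, fp) for rule, (tp, fp) in TABLE_3.items()}
best_rule = max(j_by_rule, key=j_by_rule.get)
for rule, j in j_by_rule.items():
    print(f"{rule}: J = {j:.2f}")
print("most discriminating rule:", best_rule)
```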

Data on the ESR were available in 190 patients and on the CRP in 195 patients. Adding these variables (inflexion points were at ESR >35 mm/hour or CRP >1.75 mg/dl) to the number of criteria approach only minimally improved the predictive value of the flare definition, from an AUC of 0.931 (95% CI 0.890–0.962) to 0.946 (95% CI 0.901–0.973) for the model using the dichotomous ESR variable and to 0.951 (95% CI 0.914–0.975) for the model using the dichotomous CRP variable.

Classification and regression tree approach.

The 2 criteria selected with the CART approach were a pain at rest score >3 and the presence of gout flare according to the patient's report (Figure 2). Diagnostic properties of the CART definition were unaltered despite variation in the cost function for misclassification. The CART approach had a sensitivity of 83.3%, specificity of 90.4%, and AUC of 0.869 (95% CI 0.809–0.919).

Figure 2.

Classification and regression tree (CART) definition for an acute flare in patients with established gout. For this model, the sensitivity is 83.3%, specificity 90.4%, positive predictive value 84.8%, negative predictive value 90.9%. The area under the curve for the model is 0.869 (95% confidence interval 0.809–0.919). CT = classification tree.
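Expressed as code, the CART definition reduces to two sequential checks. The sketch below assumes that the tree classifies a patient as having a flare only when both splits are positive (pain at rest score >3, then patient-reported flare); the branch structure is inferred from the text, since the tree itself is shown only in Figure 2, and the function name is hypothetical.

```python
def cart_flare(pain_at_rest: int, reports_flare: bool) -> bool:
    """Two-step rule: first split on pain at rest (>3 on a 0-10 scale),
    then on the patient's own report of a flare.

    Assumption: a patient is classified as having a flare only if both hold.
    """
    if pain_at_rest <= 3:
        return False       # first split negative: no flare
    return reports_flare   # second split decides among those with pain >3


print(cart_flare(6, True), cart_flare(6, False), cart_flare(2, True))
```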

Weighted-item approach.

We also tested a weighted approach using logistic regression, but compared to the CART method and to the number of criteria method, it improved the diagnostic properties of the flare definition only minimally (Appendices A and B). Given its added complexity and lack of considerable added value, we did not include a weighted approach in our final prediction rules.
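One way such a weighted score could be formed is to weight each positive criterion by the natural logarithm of its odds ratio from Table 2, which is the corresponding logistic regression coefficient. This is only a sketch under that assumption: the actual weighting scheme is given in Appendices A and B, which are not reproduced here, and the names below are hypothetical.

```python
import math

# Multivariable odds ratios for the 4 retained criteria (Table 2).
ODDS_RATIOS = {
    "reports_flare": 25.6,
    "any_swollen_joint": 22.2,
    "any_warm_joint": 3.3,
    "pain_at_rest_gt3": 11.7,
}

# Assumed weights: log odds ratios.
WEIGHTS = {name: math.log(odds_ratio) for name, odds_ratio in ODDS_RATIOS.items()}


def weighted_score(features: dict) -> float:
    """Sum of the weights of the criteria that are present."""
    return sum(WEIGHTS[name] for name, present in features.items() if present)


all_positive = {name: True for name in ODDS_RATIOS}
score = weighted_score(all_positive)
print(f"maximum possible score: {score:.2f}")
```

Unlike the equally weighted count, a score of this kind lets a patient-reported flare contribute more than a warm joint, at the cost of a rule that is harder to apply by hand.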

DISCUSSION

We describe patient-reported criteria for defining a flare of gout using two common approaches for criteria development validated against an expert rheumatologist's judgment. These criteria were developed for use as part of response criteria in clinical trials for patients with gout. A gout flare definition using the number of criteria approach that required the presence of all 4 features had the highest positive predictive value and lowest number of false-positive results, whereas the definition requiring presence of at least 3 features had the best discriminating ability according to our prespecified methods and a high sensitivity.

These definitions for a gout flare were created as a dimension of a composite gout response criterion, and their intended application is for capturing gout flare events as outcomes of interest in studies of chronic gout and urate-lowering therapy. However, for trialists and investigators desiring the lowest false-positive results, the use of a definition requiring all 4 elements in the number of criteria approach is an option that increases the specificity to 96% and the positive predictive value to 85% and reduces the false-positive rate, as compared to the 3 of 4 criteria approach. The use of this definition, which has the highest specificity and positive predictive value of all our definitions, may be more desirable when considering flare as the entry criterion for a gout clinical trial of a therapy with more severe adverse events.

Our second approach to a gout flare decision rule used a CART method. We found that having pain at rest with a score of >3 followed by a patient-reported gout flare had the best diagnostic characteristics. These 2 approaches are complementary, with the number of criteria at 3 items being more sensitive and the CART and the 4-item criteria approaches being more specific. Depending on the consequences of misclassification, along with the feasibility of measurement, an investigator or clinician should choose the gout flare definition with the diagnostic properties that best fit their needs. Following programs endorsed by the NIH PROMIS initiative (12), we developed gout flare definitions using patient-reported outcomes. Adding laboratory markers of inflammation to the number of criteria models did not lead to a meaningful increase in the diagnostic performance of the rule.

The risk of misclassification is a challenge in developing response criteria. The false-positive rate was 36% for the 3 or more criteria definition and 25% for the CART approach. We provided alternate definitions of gout flare to reduce the false-positive rate, such as the definition using all 4 criteria in the number of criteria approach. The use of this definition, with the highest specificity and positive predictive value of all definitions, may be most desirable when a therapy with more severe adverse events is being tested in a gout clinical trial. The proposed gout flare definitions were primarily crafted for use as an outcome measure for clinical trials in patients with gout. These criteria are not meant to substitute for decision-making in ordinary clinical practice.

The comparison of diagnostic models based on joint examination findings by the blinded observers and expert clinicians showed only modest improvement over the accuracy based on patient-reported joint examination findings (warm joint, swollen joint) and marked overlap in the confidence intervals of the AUC estimates, confirming the value of patient-reported findings. Similarly, adding markers of inflammation (ESR, CRP) to the models led to only minor improvements in the accuracy of the gout flare definition that do not seem to justify the additional logistic and economic challenges of performing these tests during an acute gout flare in clinical trial settings.

The 2 approaches used to generate these criteria are similar to those used to generate the ACR 1990 classification criteria for different types of vasculitis (26–32). More recently, Taylor and colleagues have successfully used classification trees to develop classification criteria for psoriatic arthritis (9). Expert investigator–defined outcomes have also served as the standard in several established and developing diagnostic and classification criteria previously endorsed by the ACR and the European League Against Rheumatism (EULAR) (17–23).

Our methodologic approach has several specific strengths. As noted, the methodology we used has been utilized by ACR and EULAR groups for developing classification criteria for several musculoskeletal conditions. Both the number of criteria approach and the CART approach can be influenced by small sample cell sizes. However, we did not encounter this problem in our final set of flare definitions, since we started with a sample size of 210 patients. In our CART definition, the initial variable split was provided by an item (pain at rest score >3) that was significantly associated with the outcome and has high face validity. Therefore, we did not encounter the limitation of poor performance of the CART approach when the initial variable split is not strongly associated with the outcome (33). Our methodology assured independent data collection and case ascertainment, which was useful in avoiding circular reasoning that may have limited the final conclusions. The use of blinded observers helped validate the joint data reported by the patients. Inclusion of an international group of patients improved the generalizability.

The methodology we used had some limitations as well as specific challenges associated with the peculiar aspect of accurately defining gout flares in clinical trials. First, due to the cross-sectional nature of our data collection, we were unable to incorporate time-dependent elements, such as time to maximum pain, in our definitions of flare. Additionally, collection of such data may be challenging in a gout clinical trial, where patients commonly experience flares outside their scheduled study visits, and their recollection and recording of time-related components is fraught with measurement error. It is possible that some patients with severe chronic gouty arthropathy who are not having a current gout flare may fulfill our proposed criteria for a gout flare. Until further validation is completed, our gout flare definition should be used with caution in patients with chronic gouty arthropathy. This issue of misclassification of severe chronic disease is not unique to a gout flare definition, but it is a potential source of misclassification for exacerbations of many chronic rheumatic conditions.

Second, our protocol design, in which experts from international sites assessed patients and determined gout flares that, by nature, are episodic, made cross-validation between sites impractical and unfeasible. Third, we did not map the cognitive processes that led to the diagnosis of a gout flare by the expert investigators whose opinion defined the gold standard in this study. Our group previously examined these cognitive processes among a different group of investigators (14). Fourth, we were not able to collect extensive demographic data on our population, which may preclude detailed comparisons of our cohort with other gout cohorts. As expected, we had a high proportion of men in our sample (as is typical for gout), which limits the generalizability of our findings to women. Finally, patients who self-treat early attacks of gout pose a challenge to any diagnostic definition in diseases characterized by exacerbation of symptoms, and this also applies to a definition of gout flare. Episodes involving self treatment of exacerbations should always be considered separately from untreated, suspected flares in clinical trials, since it may be difficult to determine if they constitute bona fide flares. It may be possible in some future protocols to require patients to contact the study staff before initiating treatment. In any case, the flare characteristics present at the time of self treatment will help investigators with the adjudication of gout flares according to the definition rules.

These criteria should be validated prospectively and longitudinally using data collected in clinical trials. Further validation in a prospective study will allow these criteria to pass the Outcome Measures in Rheumatology (OMERACT) filter of truth, discriminative ability, feasibility, and responsiveness to change (34). Selected clinical trials published to date have not included sufficient data elements to permit validation of these criteria. Incorporating the data elements of this definition into future trials will allow us to validate the proposed definition in clinical trial settings. After this validation, gout response criteria that combine the flare definition with other elements, such as improvement/change in flare frequency, quality of life/function, serum urate levels, and tophi, will also be developed.

In summary, we present a definition for a disease flare in patients with gout, as part of an effort to develop a gout response criterion. This definition was developed for application in the setting of clinical gout studies, where flares are important outcomes. Our gout flare definition should prove useful to researchers and practitioners who monitor flare frequency to improve the care of patients with gout.

AUTHOR CONTRIBUTIONS

All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Dr. Singh had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Schumacher, Saag, Taylor, Lang Chen, Chou, Perez-Ruiz, Singh.

Acquisition of data. Dinnella, Outman, Dalbeth, Sivera, Vázquez-Mellado, Chou, Zeng, Perez-Ruiz, Kowalski, Goldenstein-Schainberg, Lan Chen.

Analysis and interpretation of data. Gaffo, Schumacher, Saag, Taylor, Outman, Lang Chen, Chou, Bardin, Singh.

Acknowledgements

The authors acknowledge the support of the ACR and EULAR for providing the funding necessary for this project. In addition, the authors would like to recognize the help of the local site coordinators as well as the study patients.

APPENDIX A

WEIGHTED ITEM APPROACH

After the 4 preselected criteria (Table 2) were entered into a logistic regression model, we defined the relative weights based on their odds ratios (ORs) as follows: 1 unit for patient's report of any joint warmth (OR 3.03), 2 units for patient's report of any joint swelling (OR 5.29), 4 units for patient's report of pain at rest score >3 (OR 11.80), and 4 units for patient's report of disease flare (OR 11.76). Thus, warm and swollen joints are the “minor” criteria, and pain at rest score >3 and self-reported disease flare are the “major” criteria. Results from this analysis are presented in Appendix B.

The best diagnostic properties, accuracies, and the point of maximum discrimination in the ROC curve were found at 8 units (equivalent to having 2 major criteria) and 7 units (equivalent to having 1 major and 2 minor criteria). Having 8 units by this weighted approach was operationally similar to, and had the same diagnostic properties as, the classification tree definition. Having 7 units by this weighted approach was operationally similar to, and had the same diagnostic properties as, the “number of criteria” definition at 3 of 4 criteria. The AUC for this weighted approach was 0.939 (95% CI 0.908–0.970), compared with 0.931 for the number of criteria approach.
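
The weighted scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and variable names are hypothetical, and reading a cutoff of k units as "score of at least k" is our interpretation rather than something the text states explicitly.

```python
from itertools import product

# Units per criterion, taken from the weights given in the text.
WEIGHTS = {
    "warm_joint": 1,           # patient-reported warm joint (minor)
    "swollen_joint": 2,        # patient-reported swollen joint (minor)
    "pain_at_rest_gt3": 4,     # pain at rest score >3 on a 0-10 scale (major)
    "self_reported_flare": 4,  # patient-reported flare (major)
}


def weighted_score(warm_joint, swollen_joint, pain_at_rest, self_reported_flare):
    """Sum the units of every criterion that is present (hypothetical helper)."""
    present = {
        "warm_joint": warm_joint,
        "swollen_joint": swollen_joint,
        "pain_at_rest_gt3": pain_at_rest > 3,
        "self_reported_flare": self_reported_flare,
    }
    return sum(WEIGHTS[name] for name, is_present in present.items() if is_present)


def is_flare(score, cutoff=8):
    # Assumption: a cutoff of k units is read as "score >= k".
    return score >= cutoff


# One major criterion (pain at rest 5) plus both minor criteria: 4 + 2 + 1.
print(weighted_score(True, True, 5, False))  # 7

# Under the >= reading, 8 units are reached only when both major criteria hold.
for warm, swollen, pain_high, flare in product([False, True], repeat=4):
    score = weighted_score(warm, swollen, 5 if pain_high else 0, flare)
    assert is_flare(score, cutoff=8) == (pain_high and flare)
```

Because the two minor criteria together contribute at most 3 units, a score of 8 or more cannot be reached with a single major criterion, which is why the 8-unit cutoff coincides with requiring both major criteria.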

We decided not to add a weighted approach to the preferred options for a definition of gout flare given that it did not outperform the other two options we presented (number of criteria or classification tree) and that it added complexity.

APPENDIX B

Table. WEIGHTED ITEM APPROACH TO DEFINING A FLARE OF GOUT*

| Criteria required | Weight | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | Accuracy (95% CI) |
| --- | --- | --- | --- | --- | --- | --- |
| 2 major | 8 | 83 (71–92) | 90 (85–95) | 75 (62–85) | 94 (89–97) | 89 (83–93) |
| 1 major and 2 minor | 7 | 93 (82–98) | 81 (74–87) | 63 (51–73) | 97 (92–99) | 84 (78–89) |
| 1 major and 1 minor | 6 | 98 (90–100) | 71 (63–78) | 54 (44–64) | 99 (95–100) | 78 (72–83) |
| 1 major and 1 minor | 5 | 98 (90–100) | 71 (63–78) | 54 (43–64) | 99 (95–100) | 78 (71–83) |
| 1 major | 4 | 98 (90–100) | 67 (59–74) | 50 (41–60) | 99 (95–100) | 75 (68–80) |
| 2 minor | 3 | 98 (90–100) | 56 (48–64) | 44 (35–53) | 99 (94–100) | 67 (60–73) |
| 1 minor | 2 | 100 (100) | 30 (23–38) | 33 (26–41) | 100 (100) | 48 (41–55) |
| 1 minor | 1 | 100 (100) | 26 (20–34) | 32 (25–40) | 100 (100) | 45 (38–52) |

* Major criteria are patient's report of disease flare (4 units) and patient's report of pain at rest score >3 (4 units). Minor criteria are patient's report of any swollen joint (2 units) and patient's report of any warm joint (1 unit). The area under the curve for this model is 0.939 (95% CI 0.908–0.970).
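
For readers less familiar with the columns of the table above, the following Python sketch shows how each diagnostic property is derived from a 2×2 table of definition result versus gold standard. The counts are hypothetical and are not reported in the source: they were chosen only because, with 54 investigator-defined flares among 210 patients, they round to the point estimates in the "2 major" row. The function name is likewise an assumption made for illustration.

```python
def diagnostic_properties(tp, fn, fp, tn):
    """Return the standard 2x2 diagnostic properties as fractions."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),  # flares correctly identified
        "specificity": tn / (tn + fp),  # non-flares correctly identified
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / total,  # overall agreement with gold standard
    }


# Hypothetical counts (NOT from the source): 54 flares and 156 non-flares,
# picked so the rounded results match the "2 major" row of the table.
props = diagnostic_properties(tp=45, fn=9, fp=15, tn=141)
for name, value in props.items():
    print(f"{name}: {value:.1%}")
```

With these illustrative counts, the values round to 83, 90, 75, 94, and 89, respectively. The confidence intervals in the table are not reproduced here, since the interval method used is not stated in this section.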

APPENDIX C

Table. CLINICAL FEATURES OF GOUT FLARE EPISODES ACCORDING TO DIFFERENT DEFINITIONS

| Patient-reported features of gout flare | Investigator-defined flare (n = 54) | Patient-defined flare (n = 86) | Discrepant flare (n = 41) |
| --- | --- | --- | --- |
| Time to maximum pain, median (range) hours* | 9 (1–500) | 8 (1–500) | 8 (4–48) |
| No. of days with current flare at time of clinic visit, median (range) | 7 (1–40) | 7 (1–40) | 5 (2–30) |
| Similar to previous gout flares, % | 71.1 | 70.4 | 55.6 |
| Involved joint is swollen, % | 98.2 | 90.7 | 82.9 |
| Involved joint is painful, % | 90.7 | 86.1 | 73.2 |
| Involved joint is warm, % | 81.5 | 66.3 | 43.9 |
| Involved joint is red, % | 64.8 | 55.8 | 43.9 |
| Main area involved, %§ | | | |
| Knees | 21 (39.6) | 28 (34.6) | 11 (28.9) |
| Toes and feet | 11 (20.8) | 14 (17.3) | 5 (13.2) |
| Ankles | 6 (11.3) | 10 (12.3) | 5 (13.2) |
| Hands and wrists | 9 (17.0) | 17 (21.0) | 9 (23.7) |
| Elbows | 4 (7.5) | 10 (12.3) | 6 (15.8) |
| Other | 2 (3.8) | 2 (2.5) | 2 (5.3) |

* Data available from 40, 46, and 8 responses in investigator-defined, patient-defined, and discrepant flare groups, respectively.

Data available from 52, 48, and 4 responses in investigator-defined, patient-defined, and discrepant flare groups, respectively.

Data available from 46, 54, and 9 responses in investigator-defined, patient-defined, and discrepant flare groups, respectively.

§ Data available from 53, 81, and 38 responses in investigator-defined, patient-defined, and discrepant flare groups, respectively.
