Gout measures: Gout Assessment Questionnaire (GAQ, GAQ2.0), and physical measurement of tophi

Authors

  • William J. Taylor

    Corresponding author
    1. University of Otago, Wellington, and Hutt Valley District Health Board, Lower Hutt, New Zealand
    • Rehabilitation Teaching and Research Unit, University of Otago Wellington, PO Box 7343, Wellington, New Zealand

INTRODUCTION

Outcomes assessment of gout was a relatively neglected area of rheumatology measurement science until recent years, when the paucity of properly tested instruments was highlighted by clinical trials of therapy for acute gout attacks and, more recently, of preventive treatment of chronic gout. This paucity may have been due to the traditional reliance on simple pain responses for acute gout and on serum urate changes for chronic gout as the typical outcomes of interest. This has changed significantly since gout became a topic for the Outcome Measures in Rheumatology Clinical Trials (OMERACT) in 2004 (1) and since patient-reported outcomes became generally recognized as vital for a proper understanding of the effect of treatment.

This review is based on work conducted through the OMERACT process, a search of the Ovid Medline database (to August 2010) using the keywords “gout” AND [“outcome measure.mp” OR “Questionnaire”], personal archives of the author, and other work conducted by the author in collaboration with colleagues from the OMERACT Gout Working Group.

GOUT ASSESSMENT QUESTIONNAIRE (GAQ, GAQ2.0)

Description

Purpose.

The original GAQ, reported in 2006 (2), was developed to fill a large gap, there being no other gout-specific patient-reported outcome instrument. It was conceived as measuring the impact of gout and its treatment from the patient's perspective, but was developed largely within the context of a single clinical trial. The GAQ2.0, reported in 2008, was developed with more patient involvement and was tested in a community-based sample of gout patients (3).

Content.

The GAQ is a 21-item questionnaire that collects information about gout impact, assessing pain, well-being, productivity, and treatment satisfaction. The GAQ2.0 contains a Gout Impact Scale (GIS) and 4 other sections that collect clinical, background, and economic data that are not scored. The GIS scores the domains of overall concern, medication side effects, perception of unmet needs, and impact of acute episodes.

Number of items.

There are 21 items in the GAQ. There are a total of 31 questions in the GAQ2.0, but most of these are categorical or designed to be reported as individual items, and are not summated. There are 24 items (in 3 questions) in the GIS portion of the GAQ2.0, which are summated to form 5 scales. The other questions describe the respondents' gout, recent attacks, treatment, medical history, and demographics. These additional questions are not formally scored.

Response options/scale.

Most items of the GAQ are scored on a Likert scale and some are scored by the number of days or hours of activity restriction. There are 5 sections of the GAQ2.0 with 31 items. Each item has different response options. Each item of the GIS portion of the GAQ2.0 is rated “strongly agree” to “strongly disagree,” “all of the time” to “none of the time,” or “not a bit” to “extremely” on a 5-point Likert scale.

Recall period for items.

There is no stated recall period.

Endorsements.

The instrument has not been endorsed by any group.

Examples of use.

The instrument has not been reported in any published study, except for the original 2 development studies. The instrument developers have published other articles, but these are presentations of data from the same group of patients studied in the instrument development process. One article focused on the Short Form 36, version 2 (SF-36) scores and categories of gout characteristics from the “Gout Background Questionnaire” (presumably a component of the GAQ2.0, although this was not explicitly stated) (4). Another article focused on health care utilization (5) and another focused on discrepancies between patient and physician rating of gout severity (6).

Practical Application

How to obtain.

The instruments were developed through a pharmaceutical development program by TAP Pharmaceutical Products (now Takeda Pharmaceuticals), which retains copyright. However, GAQ2.0 is freely available from Dr. Omar Dabbous, Senior Director of Global Health Economics and Outcomes, Takeda Pharmaceuticals International Deerfield, IL (E-mail: omar.dabbous@tpna.com).

Method of administration.

The instrument is self-reported.

Scoring.

The GAQ is scored in 7 subscales: gout concern, well-being, productivity, gout pain and severity, treatment convenience, treatment satisfaction, and treatment bother. Each subscale contains 1–6 items. Subscales are reported in a 0–100 range but the detailed scoring procedure is not reported.

The GIS portion of the GAQ2.0 is scored in 5 subscales (total of 24 items): gout concern overall (4 items), gout medication side effects (2 items), unmet gout treatment need (3 items), well-being during attack (11 items), and gout concern during attack (4 items). Each item is scored on a 5-point Likert scale. Subscales are reported in a 0–100 range but the detailed scoring procedure is not reported.
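Because the detailed scoring procedure is not reported, the following is only an illustrative sketch of one common convention for expressing a Likert subscale on a 0–100 range: the mean item response is rescaled linearly. This is an assumption and not the developers' published algorithm. The function name `rescale_subscale`, the 1–5 item coding, and the example responses are all hypothetical.

```python
from statistics import mean

def rescale_subscale(responses, low=1, high=5):
    """Rescale the mean of Likert item responses to a 0-100 range.

    ASSUMPTION: items are coded low..high, with higher values meaning more
    problems. This is a generic convention, not the published GAQ2.0 scoring.
    """
    if not responses:
        raise ValueError("at least one item response is required")
    if any(r < low or r > high for r in responses):
        raise ValueError("response outside the assumed coding range")
    return (mean(responses) - low) / (high - low) * 100

# Hypothetical responses to the 4 items of the "gout concern overall" subscale
print(rescale_subscale([5, 4, 4, 3]))  # 75.0
```

Under this convention a respondent who chooses the lowest option for every item scores 0 and one who chooses the highest option for every item scores 100, which is consistent with higher scores indicating more problems.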

Score interpretation.

Higher scores indicate more problems.

Respondent burden.

The time needed to complete the GAQ or GAQ2.0 has not been reported. The GAQ2.0 consists of 7 pages and the GIS portion is 1.5 pages.

Administrative burden.

Not reported.

Translations/adaptations.

There are no language or cultural translations. The instrument was developed in the US in 3 centers, mainly with male subjects.

Psychometric Information

Method of development.

Items for the original GAQ were identified mainly through literature review. Items were potentially modified through telephone interview with 5 gout patients after the draft questionnaire was completed by postal survey. The instrument was tested in a phase 2 clinical trial of febuxostat compared to placebo (126 patients). The subscales were formed through factor analysis but the details of this analysis have not been published.

During development of the GAQ2.0, 2 focus groups were conducted but the method of qualitative analysis was unclear. Some new items were added as a result of the focus group interviews. The GAQ2.0 was tested in a community cohort of patients with gout (297 people) and analysed using Rasch modelling and confirmatory factor analysis with structural equation modelling.

Acceptability.

Not reported.

Reliability.

The GAQ instrument was not evaluated for test–retest reliability. Approximately one-fifth of the validation sample completed the GAQ2.0 on 2 occasions over 2 weeks. The intraclass correlation coefficients (ICC) ranged from 0.77–0.89 for the 5 subscales of the GIS, but it was not clear which subscale belonged to which ICC.

Validity.

The internal consistency of the GAQ subscales was assessed using Cronbach's alpha, and ranged from 0.83–0.97. This statistic was not suitable for the single item scales (treatment convenience, treatment satisfaction). Construct validity was assessed by correlation with the SF-36 subscales. Correlation was generally low (<0.45) for each GAQ subscale (7). The highest correlations for each subscale were well-being (0.30 SF-36 bodily pain), productivity (0.30 SF-36 role-physical), gout concern (0.41 SF-36 bodily pain), treatment satisfaction (0.35 SF-36 bodily pain), gout pain and severity (0.45 SF-36 bodily pain), treatment bother (0.20 SF-36 vitality), and treatment convenience (0.14 SF-36 vitality).

Internal consistency of the GAQ2.0 GIS scales, assessed using Cronbach's alpha, ranged from 0.60–0.94. The 2-item Gout Medication Side Effects scale and the 3-item Unmet Gout Treatment Need scale had poor internal consistency (0.60 and 0.65, respectively). Although item-fit statistics were presented for the Rasch analysis, overall model fit and formal tests for unidimensionality, local dependence, and item bias were not reported.
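For readers unfamiliar with the statistic, Cronbach's alpha for a k-item scale is k/(k − 1) × (1 − sum of item variances / variance of the total score). The sketch below implements this standard formula; it is not the developers' analysis code, and the function name `cronbach_alpha` and the example data are hypothetical. The k − 1 denominator also shows why the statistic cannot be computed for single-item scales.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha requires at least 2 items")
    if len(item_scores) < 2:
        raise ValueError("at least 2 respondents are required")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the summed (total) score across respondents
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total score has zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of 5 people to a 3-item scale (1-5 Likert coding)
example = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
]
print(round(cronbach_alpha(example), 2))
```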

The construct validity of the GAQ2.0 GIS scales is difficult to discern since the scales are rather idiosyncratic and are poorly represented by any other reported scale or concept. However, all reported correlations are low in magnitude and some of these are not supportive of construct validity. In particular, the correlation between gout concern overall and patient-rated severity was only 0.45; unmet gout treatment need and attack frequency in the past year was 0.43; gout concern during an attack and typical attack pain during the past 3 months was 0.21; the physical functioning scale and general health scale of the SF-36 version 2 failed to correlate beyond 0.3 with any of the GIS scales.

Ability to detect change.

There are no data to show that GIS scales change over time.

Critical Appraisal of Overall Value to the Rheumatology Community

Strengths.

The GAQ2.0 is the only published gout-specific instrument that attempts to measure the impact of gout from the perspective of the patient, and to comprehensively describe the experience of having gout.

Caveats and cautions.

Two subscales of the GIS portion of the GAQ2.0 were considered by OMERACT 10 and were not endorsed as having sufficiently met the OMERACT filter for use in clinical trials of chronic gout (8). These were the gout concern overall scale and the unmet gout treatment need scale. The construct validity of all 5 scales of the GIS portion of the GAQ2.0 is unclear. The overall concept of “impact of disease” is ambiguous and not well-defined.

Clinical usability.

The instrument is not recommended for routine clinical use at this time.

Research usability.

The instrument is not recommended for use in research settings at this time, except where the purpose of the research is further refinement of the instrument.

PHYSICAL MEASUREMENT OF TOPHI (TAPE MEASUREMENT, VERNIER CALIPERS, ENUMERATION, DIGITAL PHOTOGRAPHY)

Description

Purpose.

Tophi are pathognomonic of chronic gout and may be responsible for joint damage, as well as being unsightly and intrinsically undesirable. Tophi are a legitimate target for treatment (9) and therefore require a satisfactory method of measurement. A number of physical methods have been used to achieve this purpose and will be discussed here. The purpose of these techniques is primarily to determine response to therapy that might reduce tophus burden.

Content.

Enumeration of tophi by simply counting the total number of palpable tophi is a rapid and inexpensive method. The tape measurement technique has been described to determine the area of a sentinel tophus: a standard tape measure is used to measure the distance between 2 pen marks drawn on each of a predefined length axis and width axis that are orthogonal to each other. The area is calculated as the product of these 2 distances. Vernier calipers (150 mm digital) have also been used to determine the longest diameter of a sentinel tophus. Digital photography using a standardized image acquisition protocol has also been used to determine change in tophus burden. The reported approach (Computer-Assisted Photographic Evaluation in Rheumatology, CAPER) specifies up to 5 measurable tophi (10). Using electronic calipers, the longest axis is measured together with the orthogonal axis to produce a measurement of area. Measurable tophi are defined as those that are ≥5 mm in their longest dimension and have distinguishable borders. In addition, up to 2 nonmeasurable tophi can be assessed qualitatively if they are ≥10 mm in their largest dimension (Table 1). The reported scoring system uses categories at the patient level (complete response, partial response, stable disease, and progressive disease) based on the definitions in Table 1 (10).

Table 1. Definitions for change in tophus burden using digital photography (10)

Measurement | Tophus response | Patient response*
For ≤5 measurable tophi | |
  100% decrease in tophus area | Complete response | Complete response
  ≥75% decrease in tophus area | Marked response | Partial response
  ≥50% decrease in tophus area | Partial response | Partial response
  Neither a 50% decrease nor a 25% increase in tophus area | Stable disease | Stable disease
  ≥25% increase in tophus area | Progressive disease | Progressive disease
For ≤2 nonmeasurable tophi | |
  Disappearance of the tophi | Complete response | Complete response
  Approximately ≥50% reduction in size | Improved | Partial response
  Neither improvement nor progression can be determined | Stable disease | Stable disease
  Approximately ≥50% increase in the area of the tophus | Progressive disease | Progressive disease

* Defined as the best tophus response in the absence of a new tophus or progressive disease in any tophus (in which case the response is progressive disease).
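
The area calculation and the Table 1 categories for measurable tophi can be sketched as follows. This is an illustration of the published definitions rather than the CAPER software itself. The function names are hypothetical, and the sketch assumes that stable disease covers any change between the partial response threshold (≥50% decrease) and the progressive disease threshold (≥25% increase). Nonmeasurable tophi, which are judged qualitatively, are not covered.

```python
def tophus_area(longest_mm, orthogonal_mm):
    """Area (mm^2) as the product of the longest axis and its orthogonal axis."""
    return longest_mm * orthogonal_mm

def tophus_response(baseline_area, followup_area):
    """Classify one measurable tophus by percentage change in area (Table 1)."""
    if baseline_area <= 0:
        raise ValueError("baseline area must be positive")
    change = (followup_area - baseline_area) / baseline_area * 100
    if followup_area == 0:
        return "complete response"      # 100% decrease
    if change <= -75:
        return "marked response"        # >=75% decrease
    if change <= -50:
        return "partial response"       # >=50% decrease
    if change >= 25:
        return "progressive disease"    # >=25% increase
    return "stable disease"             # ASSUMPTION: everything in between

# Tophus-level categories ordered from best to worst (excluding progression)
RANK = ["complete response", "marked response", "partial response", "stable disease"]
# Table 1 maps a "marked" tophus response to a "partial" patient response
PATIENT_LEVEL = {
    "complete response": "complete response",
    "marked response": "partial response",
    "partial response": "partial response",
    "stable disease": "stable disease",
}

def patient_response(tophus_responses, new_tophus=False):
    """Best tophus response, unless there is a new tophus or any progression."""
    if new_tophus or "progressive disease" in tophus_responses:
        return "progressive disease"
    best = min(tophus_responses, key=RANK.index)
    return PATIENT_LEVEL[best]

# Hypothetical patient with 2 measurable tophi
baseline = tophus_area(10, 8)   # 80 mm^2
followup = tophus_area(5, 3)    # 15 mm^2
responses = [tophus_response(baseline, followup), "stable disease"]
print(responses[0], "->", patient_response(responses))
```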

Number of items.

Not applicable.

Response options/scale.

Not applicable.

Recall period for items.

Not applicable.

Endorsements.

These techniques have not been unequivocally endorsed by any group. However, during OMERACT 10, 56 of the 68 participants who expressed an opinion (82%) agreed that the Vernier calipers method met the OMERACT filter for truth, discrimination, and feasibility. A further 37 participants voted “Don't know” (11).
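The voting figures can be read in two ways, depending on whether the “Don't know” votes are counted in the denominator. The short calculation below uses only the counts reported above; the variable names are illustrative.

```python
agree = 56        # voted that the method met the OMERACT filter
decided = 68      # participants who voted either for or against
dont_know = 37    # participants who voted "Don't know"

all_voters = decided + dont_know               # 105 participants in total
pct_of_decided = round(agree / decided * 100)  # share among decided voters
pct_of_all = round(agree / all_voters * 100)   # share among all voters

print(pct_of_decided, pct_of_all)  # 82 53
```

When the “Don't know” votes are included, the level of agreement is just over half of all participants, which helps explain why the endorsement is described as not unequivocal.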

Examples of use.

Enumeration of tophi has been used in randomized clinical trials of febuxostat and allopurinol (12, 13), which showed that the number of tophi decreased after 40 months of effective urate-lowering therapy. In these trials, the tape measure method applied to a sentinel tophus has also shown change after prolonged normalization of serum urate levels. The Vernier calipers method has been used in a study that compared it with tophus size obtained from computed tomography. There was strong correlation between the 2 measurement techniques (14). In a longitudinal observational study, the Vernier calipers method was used to show that the velocity of tophus regression correlated strongly with the degree of urate lowering (15).

The digital photography method has been used in 2 replicate trials of pegloticase where it was shown that tophi regressed significantly after 12 weeks of therapy (10).

Practical Application

How to obtain.

The methods of Vernier calipers, enumeration, and tape measurement are available for public use and are described clearly in a recent review (16). The digital photography method was developed by Savient Pharmaceuticals and RadPharm. Details of how to use this approach are available from Steve Hamburger (E-mail: shamburger@savientpharma.com).

Method of administration.

These techniques are observer administered by direct examination.

Scoring.

This is explained in the Content section and Table 1.

Score interpretation.

Higher scores indicate greater tophus burden.

Respondent burden.

Not applicable.

Administrative burden.

The enumeration method, Vernier calipers, and tape measurement method are rapid (5 minutes) and require minimal or no equipment. The digital photography method requires a high up-front payment for the initial equipment (approximately $500) then low repeat costs. A training manual and video are available. Image acquisition takes 5 to 7 minutes and image analysis takes up to 35 minutes depending on the number of tophi.

Translations/adaptations.

Not applicable.

Psychometric Information

Method of development.

These methods were developed as part of an effort to demonstrate changes in tophi in response to treatment.

Acceptability.

Not reported.

Reliability.

The reliability of the enumeration method and the digital photography method has not been reported. The intraobserver reliability for the tape measure method was ICC 0.92 (95% confidence interval [95% CI] 0.88–0.94), mean ± SD −0.2 ± 835 mm². The interobserver reliability was ICC 0.92 (95% CI 0.86–0.96), mean ± SD −150 ± 982 mm² at site 1 and ICC 0.85 (95% CI 0.75–0.91), mean ± SD 7 ± 925 mm² at site 2 (17). The intraobserver reliability for the Vernier calipers method was ICC 1.0 (95% CI 0.99–1.0), mean ± SD −0.72 ± 2.42 mm. The interobserver reliability was ICC 0.99 (95% CI 0.97–0.99), mean ± SD 0.45 ± 2.3 mm (14).
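The mean ± SD figures quoted alongside each ICC are not defined here. One plausible reading, which is an assumption and is not stated in the source, is that they summarize the differences between paired measurements of the same tophus. The sketch below shows how such a summary would be obtained; the function name and the measurements are hypothetical.

```python
from statistics import mean, stdev

def paired_difference_summary(first, second):
    """Mean and sample SD of the differences between paired measurements."""
    if len(first) != len(second):
        raise ValueError("both series must contain the same tophi")
    if len(first) < 2:
        raise ValueError("at least 2 pairs are needed to compute an SD")
    diffs = [b - a for a, b in zip(first, second)]
    return mean(diffs), stdev(diffs)

# Hypothetical longest diameters (mm) of 4 tophi measured on 2 occasions
occasion_1 = [12.0, 20.5, 8.0, 15.0]
occasion_2 = [11.5, 21.0, 8.0, 14.0]

bias, spread = paired_difference_summary(occasion_1, occasion_2)
print(f"mean difference {bias:.2f} mm, SD {spread:.2f} mm")
```

A mean difference close to zero suggests little systematic bias between occasions or observers, while the SD indicates how far individual paired readings typically disagree.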

Validity.

All methods have good face validity. Only the Vernier calipers method has been compared with other methods of tophus measurement to establish construct validity (14). The study showed that there was a very high correlation (r = 0.91, P < 0.0001) between measures obtained by computed tomography and the Vernier calipers method. There was no difference in the coefficient of variation in measures obtained by either method. Only subcutaneous tophi were assessed by these methods and microscopic confirmation that the measured nodules were in fact tophi has not been obtained.

Ability to detect change.

Each method has been shown to change in response to effective urate-lowering therapy. For the enumeration method, the mean percentage reduction in the total number of tophi was 58.5% after up to 3 years of treatment with an effect size of 0.47. Furthermore, a small but significant difference in the mean percent decrease in the number of tophi was observed with febuxostat 120 mg (−1.2) compared to placebo (−0.3) at 28 weeks (P < 0.05) (18). Using the tape measure method, tophus size was reduced by 59% and the effect size was 0.48 (13). A between-group difference in the change in tophus size has not been demonstrated with this method.

In a longitudinal observational study over 5 years, the Vernier calipers method showed that the velocity of tophus regression ranged from 0.57 to 1.53 mm/month and an effect size of 1.83 was observed (15). In this same study the velocity of tophus regression was greater in patients treated with benzbromarone. In the clinical trials that employed the digital photography method, 40% of patients experienced complete resolution of tophi and higher rates of complete resolution were observed in patients treated with pegloticase compared to placebo (7%; P = 0.002) (11).

Critical Appraisal of Overall Value to the Rheumatology Community

Strengths.

Physical measurements of tophi are generally quick and easy to perform and are able to demonstrate change over time in people treated with effective urate-lowering therapy.

Caveats and cautions.

Only palpable, subcutaneous tophi are observable by physical measurement. However, intrasynovial or periarticular tophi are more likely to be responsible for joint damage in gout and it is not currently clear that change in subcutaneous tophi will mirror change in unobservable tophi, although this seems likely. Another important unresolved issue is whether there are particular sites at which sentinel tophus assessment is more or less reliable. In addition, the minimally important change in tophus size or number has not been determined.

Clinical usability.

The Vernier calipers and tape measurement methods are easily accommodated in the clinical setting. The observer reliability of these measures is sufficiently high to justify their use in the clinical setting. The enumeration method and digital photography cannot be recommended for routine clinical use at present, mainly because their observer and retest reliability have not been published.

Research usability.

The method with the most complete information regarding psychometric properties is the Vernier calipers method. It received sufficient endorsement at OMERACT 10 to be recommended for clinical research, although it has not yet been employed in the context of a randomized clinical trial.

AUTHOR CONTRIBUTIONS

Dr. Taylor drafted the article, revised it critically for important intellectual content, and approved the final version to be published.
