In a recent article in Arthritis Care & Research, Taylor and McPherson (1) compared the Health Assessment Questionnaire disability index (HAQ DI) and the Short Form 36 (SF-36) physical functioning subscale (PF) using Rasch analysis in a small cross-sectional study, suggesting that the analysis favors the SF-36 PF over the HAQ DI in psoriatic arthritis (PsA). Studies such as this bring item response theory approaches to analyses of patient-reported outcomes. Although this effort is by itself meritorious, it carries the hazard that relatively unfamiliar terminology may obscure rather than illuminate. Under some circumstances, Rasch analysis has posed unacceptable threats to content through trimming of items to a more unidimensional construct, which then lacks face and content validity (2). Some of us would argue that sensitivity to change, face and content validity, and reliability, not studied by Taylor and McPherson, are among the most essential attributes of an outcome assessment instrument, and that item separation, ceiling and floor effects, and differential item functioning, although not unimportant, are less essential.
Furthermore, the authors' analyses and interpretations misunderstand the construction of the HAQ DI, which was designed to balance content across categories into a single score, not to be disaggregated into subdimensions (profiles). HAQ DI categories were not designed to be ranked or reported separately, but to ensure attention to all major content areas of disability. The PsA patients had, on average, much better physical functioning than the rheumatoid arthritis patients with whom they were compared (HAQ DI score 0.5 versus 1.23), raising the issue of differential instrument performance across populations. A cross-sectional study of this kind cannot address the most critical outcome assessment issues, nor can it support definitive conclusions.
That being said, there are useful insights here. First, an unresolved clinical issue with PsA is whether we should assess only the arthritis or some sum of the skin and the joint disease. If it is the latter, a health-related quality of life instrument might perform strongly. Second, where disability is near the population norm, an instrument designed for more normal populations (SF-36 PF) might perform well. Third, we agree that floor and ceiling effects have received less attention than warranted.
Figure 1 shows measurement precision, where a standard error of 2.3 corresponds to a Cronbach's alpha reliability of 0.95, graphed against theta values normalized so that 50 represents the average functioning of a normal population and each 10 units represent 1 standard deviation (3). The best instrument would have the lowest and broadest curve: the lowest point marks the level of physical functioning at which information content is maximal, and greater breadth reduces floor and ceiling effects. These data confirm the authors' belief that the HAQ DI has its greatest item information content in sicker populations and the SF-36 PF in more normal ones, and that floor effects are more common with the HAQ DI than with the SF-36 PF. Most importantly, computer-adaptive testing, in which items are dynamically selected for the individual based upon prior responses, can readily outperform static instruments using a similar number of items.
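The correspondence between standard error and reliability stated above follows directly from the T-score metric (mean 50, SD 10), where reliability equals 1 minus the ratio of error variance to score variance. A minimal sketch of this arithmetic (illustrative only, not part of the original analysis):

```python
def reliability_from_se(se, sd=10.0):
    """Reliability implied by a standard error of measurement on a
    scale with the given standard deviation: 1 - SE^2 / SD^2."""
    return 1.0 - (se / sd) ** 2

# A standard error of 2.3 on the T-score metric (SD = 10) implies
# reliability of about 0.95, as stated in the text.
print(round(reliability_from_se(2.3), 2))  # -> 0.95
```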
The National Institutes of Health Patient-Reported Outcomes Measurement Information System (PROMIS) is approaching these issues with qualitative as well as quantitative item review and calibration, combining the best items of the HAQ DI and the SF-36 PF with other items in dynamic (computer-adaptive) rather than static instruments. These instruments will clearly supersede our present standards. PROMIS item banks are in the public domain and may be accessed at www.nihPROMIS.org and, for the PROMIS HAQ, at ARAMIS.Stanford.edu. We have entered an era of higher performance standards for patient-reported outcomes and of better outcome measures for studies.
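The adaptive selection underlying such dynamic instruments can be sketched under the Rasch model: at each step, the item administered is the one with maximal Fisher information at the current ability estimate. The sketch below is illustrative only; the item difficulties are hypothetical and do not come from any PROMIS item bank.

```python
import math

def rasch_prob(theta, b):
    """Probability of endorsing an item of difficulty b at ability theta
    under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item: p * (1 - p), maximal when
    item difficulty matches the respondent's ability."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)

def next_item(theta, remaining):
    """Core of computer-adaptive testing: pick the unadministered item
    with the greatest information at the current ability estimate."""
    return max(remaining, key=lambda b: item_information(theta, b))

# Hypothetical item difficulties on the theta (logit) scale.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]
# At an estimated theta of 0.3, the item nearest that estimate is
# the most informative choice.
print(next_item(0.3, bank))  # -> 0.0
```

Because each item is chosen where its information is highest for that individual, a short adaptive sequence can match or exceed the precision of a longer fixed form.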