We are grateful for the interest shown by Fries et al in our work comparing the SF-36 PF with the HAQ DI in PsA using the Rasch model. We agree with many of their comments, especially on the usefulness of the information function plot that is displayed. This plot illustrates an insight quantified by the Rasch measurement model: the estimate of the attribute becomes more precise when the scale includes items whose difficulty matches the ability of the sample. In other words, items that are targeted to the sample of interest lead to more precise measurement. We agree that computer-adaptive testing is an excellent approach to address this. We also agree that sensitivity to change and other psychometric properties are crucially important in measurement; however, each of these has already been comprehensively evaluated in the measures we addressed.
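The targeting point can be made concrete with the dichotomous Rasch model, under which the Fisher information an item contributes is I(θ) = P(1 − P), peaking when item difficulty equals person ability. A minimal sketch in Python (function names are illustrative, not from any particular package):

```python
import math

def rasch_prob(theta, b):
    """Probability of an affirmative response under the dichotomous
    Rasch model, for person ability theta and item difficulty b
    (both on the same logit scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information contributed by one Rasch item:
    I(theta) = P * (1 - P), maximised when theta == b,
    i.e. when item difficulty matches person ability."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)

# Information is highest for a well-targeted item (b == theta)
# and falls off as the item becomes too easy or too hard.
theta = 0.0  # person ability in logits
for b in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"difficulty {b:+.1f}: information {item_information(theta, b):.3f}")
```

Summing such item information curves across a scale yields the test information plot referred to above: off-target items add little precision where the sample actually sits.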
It is still fundamental that items within a scale fit the Rasch model if scores are to be computed rationally and the effects of interventions and the impact of conditions evaluated with appropriate statistics. Summation demands unidimensionality, among other characteristics, and it is not true, as Fries et al seem to assert, that unidimensionality is somehow less important than content validity. Content validity is essentially a question of targeting, as described above, but unless the items fit the Rasch model, any apparent targeting and validity in the content of items as part of a single scale is called into question. We find the emphasis on the HAQ DI being an aggregated score reflecting functional difficulties across major disability areas to be beside the point. Aggregated scores are the product of all multiple-item instruments, but if any item behaves very differently from the others, then such aggregation yields scores with little useful meaning. Improving such items, moving them to a separate scale if they seem fundamentally important and meaningful, or at times removing them altogether, improves the usefulness of the scale. This is not to suggest unthinking or arbitrary discarding of items. It also seems to us that there is some inconsistency between arguing in favor of computer-adaptive testing and lamenting the trimming of items to improve the measurement properties of an instrument. The ultimate aim of computer-adaptive testing is to achieve precise measurement with as few items as possible, clearly an approach that relies on trimming items. In the absence of such technology in the clinic, it seems reasonable to try to achieve appropriately targeted and psychometrically sound static instruments by reexamining item behavior and revising the instrument if necessary.
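Item misfit of the kind described here is routinely screened with mean-square fit statistics. A minimal sketch of the unweighted (outfit) mean square for a single item, assuming the dichotomous Rasch model (the function names are illustrative):

```python
import math

def rasch_prob(theta, b):
    """P(response = 1) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def outfit_msq(responses, thetas, b):
    """Unweighted (outfit) mean-square fit for one item: the mean of
    squared standardized residuals across persons. Values near 1 are
    consistent with the Rasch model; values well above 1 flag an item
    behaving differently from the rest of the scale."""
    total = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)

# Five persons of increasing ability responding to one item of difficulty 0.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
expected = [0, 0, 1, 1, 1]   # pattern in line with the model
aberrant = [1, 1, 1, 0, 0]   # pattern contradicting the model
print(outfit_msq(expected, thetas, 0.0))  # small mean square
print(outfit_msq(aberrant, thetas, 0.0))  # inflated mean square
```

An item showing inflated mean squares across the sample is exactly the kind of candidate for revision, relocation to a separate scale, or removal discussed above.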
We agree that the language of Rasch analysis remains, for many, somewhat mysterious and obscure, but the science is robust. Rasch analysis may be a relatively new approach to measurement in clinical practice, but it can be very illuminating, highlighting scales that are not actually scales and measures that fall short on fundamental measurement properties. Given that it is not long since responsiveness, the minimal clinically important difference, and the merits of agreement as opposed to correlation entered the modern measurement lexicon, we argue that we should persist with Rasch analysis, use it wisely, and question assumptions accordingly.