Each measure of patient-reported change provides useful information and is susceptible to bias: The need to combine methods to assess their relative validity


Patient-reported outcomes, such as quality of life, are now widely used to evaluate the effectiveness of treatments for medical conditions in randomized controlled trials. Measuring change in patient-reported outcomes, however, presents a challenge because different methods for measuring change have yielded discordant results. In the study by van Koulil et al, published in this issue of Arthritis Care & Research, changes in patient-reported outcomes were assessed using 2 different methods in a randomized controlled trial of tailored multidisciplinary treatment for fibromyalgia (1). Van Koulil and colleagues' results add to the existing evidence that changes in patient-reported outcomes inferred from different methods may not concur (2–8). These findings raise the questions of how such discordant results can be explained and what the relative validity of each method is. These questions have occupied the field of patient-reported outcomes for a long time and are the subject of ongoing research.

The first method used by van Koulil and colleagues is the assessment of patients' retrospective perception of the direction and magnitude of change. This method accords most with actual clinical practice, in which physicians usually rely on asking patients whether they are better, the same, or worse since their last visit or some relevant event. Although common in practice, retrospective judgments of change require cognitive effort and are far from simple. Patients must not only be able to recall their previous health status, but they must also be able to quantify both their previous and current health status and subsequently perform a mental subtraction. There is ample evidence that patients' memory of their past health status is flawed (9, 10). Moreover, there is increasing evidence that patients' perception of change is highly correlated with their current status and not with their previous status, which indicates that patients' perception of change is primarily based on their current health status (5, 9, 10). For example, a patient may currently feel quite well and may thereby infer that his or her health status must have improved since baseline. One finding is worthy of note: patients' perception of change has been found to correlate more strongly with treatment satisfaction than other measures of change do (2, 8). Perhaps treatment satisfaction and perception of change are both correlated with current health. Patients who evaluate their current health as good would also be expected to be more satisfied with their treatment and to perceive their health as having improved. Similarly, a poor health status is likely to concur with dissatisfaction and a perception of physical deterioration. Retrospective measures of change are also subject to other biases, such as social desirability responding, effort justification, and cognitive dissonance responding. These biases depend on patients' interests and expectations and may lead to exaggeration or underestimation of benefit.

The second and most common method for measuring change, also used by van Koulil and colleagues, is to assess patient-reported outcomes prospectively and to calculate the mean difference between 2 time points, i.e., pre-post changes. Since this method is neither susceptible to recall bias nor requires cognitively demanding operations such as mental subtraction, it has been assumed to be more accurate than patients' retrospective perception of change. However, measuring pre-post changes introduces other biases. The method assumes that patients evaluate themselves against the same reference value at each time point. To be able to compare scores over time, response options should be interpreted similarly, i.e., the meaning of a 4 on a 7-point scale should not shift toward a 3 or a 5. The literature indicates that this is not always the case (11). As a result of adaptation to changing circumstances, such as the onset of illness or the start of treatment, patients may adopt different reference values over time. This phenomenon is referred to as a “recalibration response shift.” For example, breast cancer patients may perceive themselves as being exhausted just after breast surgery, but during chemotherapy, when their fatigue has seriously worsened, they realize that exhaustion is what they feel now and that in the past they were just tired (6). Such recalibration response shifts could explain paradoxical findings in which somatically ill patients report a quality of life comparable to or better than that of healthy individuals.
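In computational terms, the pre-post method reduces to averaging within-patient differences between two assessments. A minimal, purely illustrative sketch (the scale, scores, and function name are our own inventions, not data or code from any of the studies discussed):

```python
# Illustrative sketch of the pre-post method: mean within-patient
# difference on a hypothetical 7-point scale (higher = better).
from statistics import mean

def pre_post_change(pretest, posttest):
    """Mean within-patient change between two paired assessments."""
    if len(pretest) != len(posttest):
        raise ValueError("paired scores required")
    return mean(post - pre for pre, post in zip(pretest, posttest))

# Invented scores for five patients (not real data).
pre = [3, 4, 2, 5, 3]
post = [4, 5, 4, 5, 4]
print(pre_post_change(pre, post))  # mean improvement of 1 point
```

Note that this computation is only meaningful under the assumption the method makes: that each patient interprets the scale identically at both time points.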

A third method for measuring change has been suggested to account for the recalibration response shift. After completion of the conventional posttest, patients are administered the same questionnaire again, but now as a retrospective pretest, or so-called “then-test,” in which they are asked to provide a renewed judgment of their health status at the time of the pretest. For example, if the pretreatment assessment contains the question “How has the quality of your life been during the last 4 weeks?” the corresponding then-test question would be “How has the quality of your life been during the 4 weeks before you started treatment?” Since the then-test is completed immediately after the conventional posttest, it is assumed that the same reference value is used for both measurements.

Consequently, the comparison of a then-test with a posttest is a measure of change assumed to be unaffected by a recalibration response shift. Studies conducted among patients with a variety of medical conditions have shown that then-test/posttest change scores generally yield larger estimates of change than conventional pretest/posttest change scores (6, 7, 11). However, most studies are inconclusive about which measure of change is more valid. It should further be noted that the then-test is susceptible to the same biases mentioned for the first method, perception of change, including recall bias, social desirability responding, effort justification, and cognitive dissonance responding (10).
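The contrast between the two prospective estimates can be sketched with invented numbers (all values below are hypothetical; a recalibration shift is simulated by making the then-test ratings of baseline lower than the original pretest ratings, as would happen if patients reinterpret the scale after treatment):

```python
# Simulated recalibration response shift on a hypothetical 7-point
# scale (higher = better): at follow-up, patients re-rate their
# baseline ("then-test") lower than they originally rated it.
from statistics import mean

pretest = [4, 5, 4, 4, 5]    # original baseline ratings
then_test = [3, 3, 2, 3, 4]  # baseline as re-rated at follow-up
posttest = [5, 5, 4, 5, 5]   # follow-up ratings

pre_post = mean(b - a for a, b in zip(pretest, posttest))
then_post = mean(b - a for a, b in zip(then_test, posttest))

# With these invented numbers the then-post estimate exceeds the
# conventional pre-post estimate, mirroring the pattern reported
# in the literature cited above.
print(pre_post, then_post)
```

The point of the sketch is only structural: both estimates use the same posttest, so any difference between them is driven entirely by the gap between the pretest and the then-test.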

All measures have their strengths and weaknesses, and all are susceptible to partly overlapping and partly different biases. Van Koulil and colleagues have provided an extensive list of such biases, partly reiterated here. We would like to add that each measure of change includes measurement error to varying degrees; e.g., internal consistency reliability or test-retest stability is unlikely to be perfect. The notorious unreliability of difference scores also merits attention. Therefore, it may not come as a surprise that these different methods lead to different results regarding the magnitude or even the direction of change. Of interest, van Koulil and colleagues found moderate to relatively high correlations between patients' perception of change and pre-post changes on the physical outcomes, in contrast to small or nonsignificant correlations for psychological outcomes. Questions about physical outcomes may refer to more concrete and specific behavior, whereas questions about psychological outcomes allow more room for subjective interpretation. Psychological outcomes may therefore be particularly susceptible to the above-mentioned biases.
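The “notorious unreliability of difference scores” has a precise basis in classical test theory. Under the simplifying assumption of equal score variances at both time points (an assumption of this sketch, not a claim about the studies discussed), the reliability of a difference score D = X − Y is:

```latex
\rho_{DD'} = \frac{\rho_{XX'} + \rho_{YY'} - 2\rho_{XY}}{2\,(1 - \rho_{XY})}
```

where ρXX' and ρYY' are the reliabilities of the two assessments and ρXY is their correlation. For example, with reliabilities of 0.8 at both time points and a pretest-posttest correlation of 0.7, the difference score has reliability (1.6 − 1.4)/0.6 ≈ 0.33, far below the reliability of either score alone; the more stable the construct over time, the less reliable the change score.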

The question arises of which method constitutes the most valid assessment of change. Unfortunately, few studies have conducted a head-to-head comparison among the 3 measures of change. Moreover, what would be the criterion by which one would assess their relative validity? For most patient-reported outcomes (quality of life, fatigue, or pain), no external gold standard exists against which self-reported changes could be evaluated. Van Koulil and colleagues suggest that investigating the correspondence of quantitative self-report measures of change with qualitative methods, such as in-depth interviews, might be useful. Recently, the relative validity of perception of change questions and pre-post changes was investigated using patients' qualitative narratives as the gold standard of change among participants in a chronic disease self-management program (12). In this study, pre-post changes were found to be discordant with patients' qualitative narratives more often than the perception of change questions were. The inconsistency was largely due to a change in patients' perspectives following the pretest. The authors concluded that perception of change questions are more valid than pre-post questions for the assessment of health education program outcomes. However, a serious limitation of this study is that the gold standard consists of a retrospective self-report method, which can be expected to be susceptible to the same biases as the perception of change questions.

Perhaps a more stringent approach to validating self-report measures of change is the use of non–self-report clinical measures of change that are known to be related to quality of life, an approach also suggested by van Koulil et al. To our knowledge, only one study has conducted a head-to-head comparison among the 3 measures of change using clinical criteria in the context of health-related quality of life (7). We previously investigated the association of patients' perception of change in quality of life, pre-post change, and then-post change with 4 different non–self-report clinical measures of change among 268 patients with human immunodeficiency virus type 1 (HIV-1) who were starting highly active antiretroviral therapy (HAART). Patients completed questions about their overall quality of life, energy/fatigue, mental health, and social functioning as a pretest at the start of HAART, and after 36 weeks as a posttest and a then-test. These patients also completed questions about perceived change in these domains since baseline. We calculated correlation coefficients between these 3 types of self-reported change and changes in CD4 cell count, plasma HIV-RNA concentration, body mass index, and hemoglobin concentration. To our surprise, we found that then-post changes were most strongly related to the clinical measures of change (P = 0.02 to P < 0.01), suggesting this to be a more valid measure of change than the other 2 methods. These results held when we restricted our analysis to the majority of patients who were unaware of their laboratory results, i.e., CD4 cell count, plasma HIV-RNA, and hemoglobin, at the time of assessment of patient-reported outcomes. Interestingly, distinguishing between changes inferred from the pre-post method and the then-post method was meaningful, because the pre-post method did not yield changes that would be considered clinically significant, whereas the then-post method did.
We were puzzled by the fact that the perception of change questions yielded associations with clinical measures of change similar to those obtained using the pre-post method. Because perception of change questions and the then-test are both retrospective measures, they are both subject to recall bias and we expected them to behave more similarly. We can only speculate why the then-post method yielded stronger associations with clinical measures of change than the perception of change questions. Possibly, it is cognitively easier to provide a renewed judgment of a previous health state than to consider change in a health state.
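The validation strategy used in that study, correlating self-reported change scores with non–self-report clinical changes, can be sketched as follows. All patient numbers below are invented, and `pearson` is our own small helper, not code or data from the study:

```python
# Sketch of validating change measures against a clinical criterion:
# correlate each self-reported change score with a non-self-report
# clinical change (e.g., change in hemoglobin). All numbers invented.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented change scores for six patients.
then_post_change = [2, 1, 3, 0, 2, 1]
pre_post_change = [1, 0, 1, 1, 0, 0]
hemoglobin_change = [1.1, 0.4, 1.6, 0.1, 0.9, 0.5]

r_then = pearson(then_post_change, hemoglobin_change)
r_pre = pearson(pre_post_change, hemoglobin_change)
# These toy numbers are constructed so that the then-post scores
# track the clinical change more closely, mimicking the pattern
# reported in the study discussed above.
print(round(r_then, 2), round(r_pre, 2))
```

In practice one would of course use an established statistical package and report significance tests alongside the coefficients; the sketch only makes the logic of the comparison concrete.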

The above-mentioned studies provide only indirect evidence about each method's susceptibility to biases and their relative validity. We also need direct evidence, obtained by gaining insight into how patients actually make change evaluations. To date, we have only a limited understanding of the cognitive processes underlying responses to questions about perception of change, pre-post changes, and then-post changes. Studies using cognitive interview techniques would provide such critical information. During cognitive interviewing, patients think aloud while completing questions about quality of life and are then probed to elicit more information about the cognitive processes involved. The resulting information can be used to examine the assumptions inherent to the different methods. For example, for perception of change questions, one could examine whether patients correctly recall their previous health status and accurately compare their previous and current functioning (13).

To date, there is no conclusive evidence about the relative validity of the different measures of change. Discrepancies among perception of change, pre-post changes, and then-post changes suggest that different information is being obtained with these methods. Each of these methods has been shown to provide valuable information about the impact of disease and treatment from the patient perspective. However, some guidelines can be provided. Given the need for retrospection, we recommend that perception of change questions and a then-test be used only when the baseline refers to a salient event, such as the start of treatment or the onset of illness, and the period since baseline is short enough to enable accurate recall. Moreover, researchers should be aware that these 2 retrospective measures of change are particularly susceptible to potential biases if patients undergo treatment or participate in interventions that require time and effort and induce expectations about the direction of change. Additional measures need to be taken to account for such biases. When applying pre-post questions, researchers should be alert to the occurrence of a recalibration response shift in the interim. Such a response shift is likely to occur if patients undergo major and enduring changes in health status that require adaptation. In addition to administering the then-test, other design and statistical approaches can be used to measure and account for such response shift effects (11). We further suggest that researchers keep combining different methods for measuring change in patient-reported outcomes, as van Koulil and colleagues did. Such studies will ultimately increase our insight into the relative validity and usefulness of the different methods for measuring change.