Investigators can take the following steps to design a PRO measurement strategy: 1) identify the relevant domains to measure; 2) develop a conceptual framework; 3) identify alternative approaches for measuring the domains; and 4) synthesize the information to design the measurement strategy. These steps represent the process that a company could go through to develop its PRO measurement strategy. The FDA spoke strongly in favor of sponsors providing complete and detailed documentation describing the development of any PRO instrument used to support a claim. Nevertheless, an argument can be made that the FDA should receive complete information only about the claim and the instrument(s) used to support that claim, limited to the domains to be studied. Requiring a comprehensive listing of every domain affected by the disease and its treatment, or a comprehensive description of alternatives for measuring every domain, adds significantly to the cost and documentation burden of the claims submission.
Step 1: Identifying the relevant domains to measure. In determining the relevant domains to measure, investigators should develop a comprehensive list of domains that are affected by the disease itself and also its treatments based on their therapeutic effects and side effects. Next, using all available information, they should create a list of the domains that are expected to be affected by the experimental therapy. In each case, consideration must be given to both positive and negative effects. Then they should narrow the list based on relevance criteria, in particular whether the domain is relevant to the proposed labeling claim.
Regarding the negative impact of therapy, both experimental and control, side effects are often assessed through the provider-completed adverse event reporting system (e.g., using a Common Toxicity Criteria scoring system). This is not PRO measurement as considered in this article. In some cases, investigators may wish to assess these side effects as part of the PRO measurement strategy. They may want to obtain information on the functional impact of the side effect, such as conducting an assessment of the bother of (rather than the frequency or severity of) certain side effects.
Because adverse events and PROs are not synonymous, both have a place in assessing the impact of therapy. Because standard adverse event reporting may fail to detect some differences, PROs may provide a more sensitive measure in some instances. When a sponsor wishes to demonstrate a superior safety profile (i.e., fewer or less severe side effects), getting the information directly from the patient is well advised, although not mandatory. Thus, although a sponsor may choose at times to assess the negative effects through its PRO measurement strategy, PRO assessment of negative aspects of treatment should not be confused with standard assessment of adverse drug events.
Step 2: Development of a conceptual framework. The next step is to develop a conceptual framework based on the identified domains. The conceptual framework should outline the relationship between the domains and the hypothesized impacts, both positive and negative, of the experimental and control therapies. This conceptual framework may be useful in refining the goals for PRO measurement. Specifically, investigators can use the conceptual framework to identify their final domains of interest and set priorities for the truly important ones, i.e., those for which the company aims to obtain a labeling claim and thus those that will be most important in designing the PRO measurement strategy. For companies that also seek to conduct a more comprehensive PRO measurement as part of the study, the conceptual framework can help identify all of the domains of interest, not just those that will be used to support the labeling claim. For more information on developing a conceptual framework, see the article by Rothman et al.
Step 3: Identifying candidate approaches for measuring the domains. After investigators identify the relevant domains and develop a conceptual framework, their next step is to identify, from among the alternative approaches, the most suitable one for measuring the domains of interest. It is not uncommon for no single instrument to cover all domains targeted for the labeling claim. In such instances, investigators may need to use multiple instruments, to modify or adapt an existing instrument, or to develop a completely new instrument. Thus, determining how to measure a PRO in a regulatory context involves not simply selecting an instrument, but rather designing a measurement strategy that will address the targeted domains.
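The coverage question raised above can be sketched as a simple set comparison. The following is a minimal illustration only: the domain names, the instrument labels ("Instrument A", "Instrument B"), and the function `uncovered_domains` are hypothetical and do not describe any real instrument.

```python
# Hypothetical targeted domains and candidate instruments (illustrative only).
targeted_domains = {"physical function", "fatigue", "sleep"}

candidate_instruments = {
    "Instrument A": {"physical function", "fatigue"},
    "Instrument B": {"sleep", "social function"},
}


def uncovered_domains(targeted, instruments, selected):
    """Return the targeted domains not covered by any selected instrument."""
    covered = set()
    for name in selected:
        covered |= instruments[name]
    return targeted - covered


# A single instrument leaves a gap; adding a second one closes it.
gaps_single = uncovered_domains(
    targeted_domains, candidate_instruments, ["Instrument A"]
)
gaps_combined = uncovered_domains(
    targeted_domains, candidate_instruments, ["Instrument A", "Instrument B"]
)

print(sorted(gaps_single))
print(sorted(gaps_combined))
```

A non-empty result for a single instrument is the situation in which investigators would consider multiple instruments, adaptation, or new development. The sketch says nothing about psychometric quality, which has to be judged separately.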
Researchers should consider the relative strengths and weaknesses of the alternative instruments in terms of their comprehensiveness and their psychometric performance. The Medical Outcomes Trust (MOT) developed a list of review criteria that can be used to evaluate the performance of candidate measures. The eight review criteria the MOT proposes are the conceptual and measurement model, reliability, validity, responsiveness, interpretability, burden of administration, alternative forms/modes of administration, and cultural and language adaptations. For more information on evaluating the psychometric performance of instruments, see the article by Frost et al.
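One way to keep such a review organized is a per-instrument checklist keyed by the eight criteria. The sketch below is an assumption about how a team might record its review, not part of the MOT criteria themselves; the function `missing_evidence` and the ratings for `instrument_x` are hypothetical.

```python
# The eight review criteria as listed in the text above.
MOT_CRITERIA = [
    "conceptual and measurement model",
    "reliability",
    "validity",
    "responsiveness",
    "interpretability",
    "burden of administration",
    "alternative forms/modes of administration",
    "cultural and language adaptations",
]


def missing_evidence(evidence):
    """List the criteria for which no supporting evidence is documented.

    `evidence` maps a criterion name to True/False; absent keys count as False.
    """
    return [c for c in MOT_CRITERIA if not evidence.get(c, False)]


# Hypothetical review notes for one candidate instrument.
instrument_x = {criterion: True for criterion in MOT_CRITERIA}
instrument_x["responsiveness"] = False
instrument_x["cultural and language adaptations"] = False

print(missing_evidence(instrument_x))
```

A true/false flag is a deliberate simplification. In practice each criterion would carry a narrative summary of the supporting studies rather than a single yes or no.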
When considering alternative measurement approaches, one should first determine whether an existing single instrument is an option. The literature should be searched to identify which instruments have been used previously in similar studies and to determine how well they performed. The literature may also contain potentially useful instruments that have not been used previously in similar studies. An example of the use of a single instrument occurred in the evaluation of Advair Diskus (GlaxoSmithKline, Research Triangle Park, NC, USA) for patients with asthma. The Asthma Quality of Life Questionnaire (AQLQ) was used to assess the patient's perception of asthma and its treatment. Based on the AQLQ results, the label for Advair notes that patients in the Advair Diskus group experienced improvements in their overall asthma-specific quality of life that were clinically meaningful in comparison with the group on placebo.
An existing single instrument for the targeted domains may not be available or sufficient, in which case alternative approaches to PRO measurement should be considered. Although the FDA draft guidance states that if an adequate PRO instrument does not exist a new PRO instrument can be developed, investigators have several alternative options when an existing instrument is not adequate. They may be able to modify or adapt existing instruments. Also, if an instrument covers most of the domains of interest, it can be used and supplemented with scales or items from other existing instruments or even with scales that are developed for that particular study. Such adaptations and modifications require varying degrees of revalidation work as discussed in the second half of this article.
The evaluation of etanercept for rheumatoid arthritis (RA) illustrates a measurement strategy using multiple instruments to cover the relevant domains. Several PRO measures were used, including the Health Assessment Questionnaire (HAQ), the SF-36, items assessing energy and mental health from the Medical Outcomes Study, and a single-item rating scale assessing current health. The resulting package insert notes that all subdomains of the HAQ improved in patients receiving etanercept in two studies. It also notes that, in the study that included the SF-36, the patients receiving 25 mg etanercept showed significantly more improvement in the SF-36 physical component summary than the patients receiving 10 mg etanercept.
Another option when no single existing instrument addresses the relevant domains is to modify or adapt an instrument previously used in other studies and tailor it to the objectives of the proposed study. For example, eflornithine cream was developed to treat unwanted facial hair (hirsutism), but no existing PRO instrument assessed the impact of hirsutism. In this case, the researchers developed the ESTEEM scale (Exchanges of affection, Social interactions, Time spent removing facial hair, Encountering new people, Engaging in work or school, Minimizing overall bother with facial hair) by adapting the Bother Assessment in Skin Conditions scale (BASC), which had been developed and validated in the assessment of hyperpigmentation. The BASC was modified to create the ESTEEM scale by adapting characteristics of bother and discomfort to the setting of hirsutism. The resulting label claim for eflornithine notes that it significantly reduced how bothered patients felt by their facial hair and by the time spent removing, treating, or concealing facial hair.
A final alternative is to develop a new instrument specifically tailored to the study. This approach, too, requires documentation that the new instrument is valid and reliable in this setting. For example, the International Index of Erectile Function (IIEF) was developed to detect changes resulting from treatment for erectile dysfunction. The IIEF was used as the primary measure for the clinical efficacy of sildenafil, specifically focusing on two questions related to ability to achieve erections sufficient for sexual intercourse and maintenance of erections after penetration. Patients also used daily diaries on their sexual function and responded to a global question. Results from the IIEF were used to support the labeling claim for sildenafil, which notes that maintenance of erections after penetration was better in the sildenafil-treated patients than in placebo patients. It also notes that sildenafil improved the frequency, firmness, and maintenance of erections; frequency of orgasm; frequency and level of desire; frequency, satisfaction, and enjoyment of intercourse; and overall relationship satisfaction. For more information on instrument development, see the article by Turner et al.
One final consideration when evaluating alternative measurement approaches is whether it is feasible and advisable to include a general health status measure. Adding a general health status measure can identify unanticipated consequences, both positive and negative, of the experimental therapy or comparators. Using such instruments can also promote comparisons across diseases and populations. Including an additional instrument does increase administrative and respondent burden and costs. Although investigators should always consider this alternative, it will not always be appropriate.
Step 4: Synthesizing the information to design the measurement strategy. After identifying the relevant domains and the alternative approaches for measuring them, the research team needs to consider the trade-offs of the various strengths and weaknesses and determine the best measurement strategy based on the study's priorities. First, they need to include domains targeted for a labeling claim. Second, the instruments used to measure the targeted domains should be psychometrically sound, and the best available, for the given application.
Patient-reported outcome researchers face trade-offs when designing a measurement strategy. On the one hand, using a previously developed and well-validated instrument lends credibility to the measurement strategy and allows for greater comparability across studies. However, if the instrument does not adequately target the relevant domains, it may not be as sensitive and responsive in its measurement properties as desired. On the other hand, using newly developed questions that are specifically tailored to measure the relevant outcomes for a given study may be more sensitive to differences and responsive to changes, but this approach requires significantly more work to demonstrate that the measure is valid and reliable.
Using such study-specific instruments does not promote cross-study comparisons. When investigators must deal with such trade-offs, they may find it helpful to identify the domains of greatest importance and ensure that those are measured with instruments of the greatest validity and reliability and allow secondary domains to be measured with instruments that may not have been as well tested. For example, if a company is targeting depression and sexual function for a labeling claim but also wants to measure social function and emotional well-being, depression and sexual function should take priority in designing the measurement strategy, with social function and emotional well-being of secondary concern.
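The prioritization described above can be sketched as a small check over a draft plan. This is an illustration under stated assumptions: the domains come from the example in the text, while the instrument labels, the `well_validated` judgments, and the helper `needs_attention` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DomainPlan:
    domain: str
    priority: str         # "primary" (targeted for a labeling claim) or "secondary"
    instrument: str       # hypothetical instrument label
    well_validated: bool  # hypothetical judgment from the team's own review


# Draft plan: domains from the example in the text, instruments invented.
plan = [
    DomainPlan("depression", "primary", "Instrument D", True),
    DomainPlan("sexual function", "primary", "New study-specific items", False),
    DomainPlan("social function", "secondary", "Instrument S", False),
    DomainPlan("emotional well-being", "secondary", "Instrument E", True),
]


def needs_attention(plan):
    """Return primary domains whose chosen instrument is not well validated."""
    return [p.domain for p in plan if p.priority == "primary" and not p.well_validated]


print(needs_attention(plan))
```

Secondary domains with less tested instruments are intentionally not flagged, which mirrors the trade-off the text accepts. Any primary domain that is flagged is a candidate for the strengthening work discussed next.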
Because trade-offs and compromises will likely be required, the next step is to consider how to strengthen the weaker areas of the selected approach. If an instrument is being supplemented with newly developed items, pilot testing and validating these items before their use may be helpful. Similarly, if a previously developed and validated instrument is being used in new ways or new populations, testing the instrument in the target application or population before the main study is the ideal practice. Validation concurrent with the pivotal Phase III trial is a reasonable strategy, especially considering that the risk in this endeavor lies with the sponsor.
In some circumstances, using pivotal Phase III data to confirm the psychometric properties of an instrument is acceptable. For example, when a sponsor includes an instrument in Phase II studies, analyzes its psychometric properties, and on the basis of those analyses revises the instrument, it is reasonable and standard practice to administer the revised version of the instrument in Phase III and use those data to confirm the reliability and validity of the final version. The FDA is seemingly still evaluating whether validation should be done before Phase III trials. A requirement for prior validation may be incongruous with the agency's stance that, in certain circumstances, molecular biomarkers can be validated during the course of pivotal Phase III trials. Ideally, in a concurrent validation strategy, the data should be collected in parallel in time with a separate study.
Studies by Damiano et al. illustrate the four-step process for designing a measurement strategy in preparation for clinical trials in Parkinson's disease. First, the researchers identified the areas affected by Parkinson's disease and its treatment. They conducted a literature review and consulted with clinicians and patients to identify the relevant domains. They also identified the two Parkinson's disease-specific questionnaires available at the time. They reviewed how well the two Parkinson's disease measures covered the relevant domains and the evidence available regarding their psychometric performance. Based on this review, they developed and tested a measurement strategy in the target population. The measurement strategy included one of the instruments evaluated in the review (the Parkinson's Disease Questionnaire-39), but because this instrument did not address sexual function, they also included the MOS Sexual Function Scale. Finally, the SF-36 was used to identify the impact on general health status and to identify any unanticipated consequences. They also used this validation study to evaluate two modes of administration (at the study site and over the telephone). Thus, the validation study could be used to support the application of this measurement strategy in a clinical trial for a regulatory submission.