Patient-Reported Outcomes: Conceptual Issues

Authors


Jeff A. Sloan, Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, 200 First Street SW, Rochester, MN 55905, USA. E-mail: jsloan@mayo.edu

ABSTRACT

There is broad agreement that patient-reported outcome (PRO) assessment in health care should proceed from a strong conceptual basis, with rationales clearly articulated in advance concerning what is to be measured and how this is to be accomplished. The representation of the patient's perspective has been part of clinical trials for some time, but the formalization of, and broader emphasis on, PROs has become increasingly important with the release of the draft guidance for industry on patient-reported outcomes. In response, we address the challenges in constructing the conceptual foundations for PRO assessment to support drug product labeling claims submitted to regulatory agencies worldwide.

After discussing what constitutes a PRO concept and an adequate basis for framing a PRO assessment, we examine the consequences of choosing PRO instruments without reference to a well-established conceptual framework for measurement. Then we illustrate through a hypothetical example the important interplay between the sponsor's proposed product claim, the corresponding conceptual model that depicts hypotheses involving the PRO concept(s) in the claim, and the resulting conceptual framework(s) for measurement to guide instrument selection and psychometric analyses. We discuss how these conceptual issues may vary or evolve over time depending on the phase of product development.

As the science of PRO measurement continues to develop and experience accumulates, a consensus may emerge on how best to articulate the conceptual basis of PRO measurement for purposes of product labeling and regulation. In the meantime, one point is eminently clear: in regulatory decisions expected to affect not only the quantity but the quality of life, it is imperative to incorporate the patient's own perspective on the illness experience and the effects of therapy.

Introduction

Acknowledgment of the need to capture the patient's perspective of the impact of illness and health-care interventions has grown rapidly over the last decade. Attention has been paid to selection of measures, analysis (e.g., the impact of missing data), and interpretation (e.g., definition of a clinically important difference), but less so to the conceptual models and frameworks for hypothesis generation, analysis, and interpretation of patient-reported outcome (PRO) data.

This article underlines some of the issues that need to be considered to maximize our understanding of the meaning of the PRO data. The next section provides an overview of the use of PRO concepts in health-care research in drug development. The subsequent two sections address explicitly defining the conceptual model for the analysis of a PRO claim and the conceptual framework to guide PRO measurement in the validation of that claim. The last section addresses the need to update our conceptual models to reflect our understanding of the patient's perspective in interpreting the impact of disease and treatment.

Some recommendations to apply issues and techniques raised in this article may be preliminary. More explicit use of conceptual models and frameworks as tools to guide research will improve interpretation and communication of research findings.

What Is a PRO Concept?

“Patient-reported outcome” is a broad term that includes direct subjective assessment by the patient of elements of their health including: symptoms, function, well-being, health-related quality of life (HRQOL), perceptions about treatment, satisfaction with care received, and satisfaction with professional communication. The patient is asked to summarize his or her evaluation of the disease, treatment, or health-care system interactions through various modes, providing perceptions related to the condition, its impact, and its functional implications.

The patient's perception of the illness experience is influenced by internal standards, intrinsic values, and expectations. The importance to the individual is reflected in the evaluative comments and ratings. The PRO provides information unavailable from other sources. These data reflect how the patient interprets the experience and the conditions not observable by others and are distinct from proxy measures.

Patient reports can provide insights into health status with or without a comparator; current functional capacity compared with past performance; the intensity of symptoms or side effects of treatment; impressions of how symptoms affect function; ability to comply with treatment recommendations or rationale for nonadherence; and vivid descriptions of difficulties imposed on personal and family life (e.g., inability to work) [1–3]. To capture these insights from patients in a way that allows for meaningful communication, rules have been devised for the measurement of subjective phenomena [4,5]. These rules, as they apply to PRO assessment in the regulatory environment and for making claims about new products, are described in the Food and Drug Administration (FDA) draft PRO guidance [6]. The PRO concept, as defined in the Guidance, is “the specific goal of measurement (i.e., the thing that is to be measured by a PRO instrument)” [6]. PRO concepts may range from the simple, e.g., pain intensity, to the complex, e.g., HRQOL, which itself encompasses several multidimensional concepts. The complexity of the PRO instrument will be driven by the complexity of the concept being measured. For example, a simple concept such as pain intensity may be measured by a single item specific to that concept, while a more complex concept such as physical function that might incorporate aspects of activities of daily living, mobility, etc., would require multiple items or multiple domains. Whether the concept being measured is simple or complex, it is important that there be sufficient evidence that the PRO concept is adequately measured to ensure appropriate interpretation of scores and clarity of communication of findings. Issues related to adequacy of evidence are discussed in more detail in later articles in this series.

What Is an Adequate Basis for Framing a PRO Assessment?

Given these considerations, what then is an adequate basis for framing a PRO assessment? A conceptual model that clearly defines the decision-relevant outcomes of interest and their posited interrelationships and possible determinants should guide decision-making. Specifically, a conceptual model should provide the rationale for, and specification of, the PRO outcomes of interest (e.g., mobility, physical function, HRQOL) in some population of interest (e.g., patients undergoing initial treatment for breast cancer) for a particular decision to be made (e.g., choice of appropriate chemotherapy); see recommendations of the Scientific Advisory Committee of the Medical Outcomes Trust [7]. In the context of FDA regulatory decision-making, a conceptual model identifies and describes the PRO concepts and hypotheses that underlie a PRO-based product labeling claim.

Guided by an appropriate conceptual model, one then specifies a corresponding framework for measurement in which all the variables and relationships in the conceptual model are given operational meaning in a way that then guides the selection or development of specific PRO measurement instruments and psychometric approaches to analysis. In line with FDA recommendations [6], a conceptual framework focuses attention on the interrelationships among the PRO domains being measured, the content validity of each PRO instrument, and the construct validity, reliability, and responsiveness of each PRO instrument when applied within a patient population pertinent to the product claim.

It is challenging to execute the paradigm suggested above. Most PRO measurement activities reported in the literature have not sprung from a priori formulated conceptual models, as described, for example, by Ferrans [8]. This disconnect has not gone unnoticed. For example, both Gill and Feinstein [9] and Leplege and Hunt [10] reported that the gap is wide between the rhetoric emphasizing that quality-of-life evaluation must embody the patient's perspective and the reality of HRQOL instrumentation and scoring algorithms. Based on literature reviews, both articles concluded that measurement models have frequently reflected a professional-judgment orientation about how to encapsulate what is important to the patient.

This state of affairs may reflect, in part, the absence of consensus about the appropriate conceptual model for PRO assessment, as Ferrans [8], Erickson [11], Darby [12], and Gustafson [13] have emphasized. If so, a compelling question is how to progress toward such a consensus––one that serves to improve the motivation, conduct, and interpretation of PRO measurement.

The simple idea of decision relevance may supply the key. That is, the selection of a PRO conceptual model should be guided by the specific nature of the decisions that motivate the PRO measurement activity in the first place. For example, the conceptual model for framing hypotheses relevant to a product claim for a promising anticancer therapy that also generates substantial toxicity might emphasize a set of PRO domains and domain relationships that will differ from a conceptual model relevant to interventions for pain management near the end of life. We note that this definition of a conceptual model appears closely akin to the “end point model” that the FDA has presented subsequent to the publication of the draft guidance document [14]. Specifically, the latter is “a model of the relationships among all measures that may be defined as end points––primary or supportive––in a clinical trial or validation study . . . specific to a specific treatment setting . . . [so as to] inform the instrument development and validation process” [14].

The interconnections among the product claim, conceptual model for claim analysis, and conceptual framework(s) for the resulting PRO(s) in the measurement model are discussed in more detail two sections hence. First we examine the importance of having a sound conceptual framework for measurement.

What Are the Consequences of Proceeding with Instrument Selection without a Well-Established Conceptual Framework?

An inadequate conceptual framework for PRO measurement can create challenges for: (1) the grouping and scoring of items into domains; (2) the analysis; and (3) the interpretation of PRO scores, all of which are interrelated and may affect the evaluation of a PRO for a label claim. First, because items should be grouped together to represent a clearly defined concept, an unsuitable conceptual framework can obscure the grouping and scoring of items into domains. A misguided conceptual framework for measurement, for example, can lead one to combine items on psychological distress (effect indicators) with items that measure pain, nausea and vomiting, or treatment-related symptoms (causal indicators) [15,16]. A patient with high psychological distress is likely to manifest a high level of anxiety. This patient, however, need not necessarily have high levels of all treatment-related symptoms and side effects; psychological distress does not necessarily imply that a patient is experiencing, say, the symptom of nausea. On the other hand, if a patient does have severe nausea, then nausea is likely to result in, or cause, increased psychological distress.

By showing what items belong to specific domains, a conceptual framework can enhance clarity in the evaluation of the intended claim. Doing so would help to avoid the possibility of putting forward an unwarranted claim based on the performance of an individual item that, in fact, is merely part of an overall PRO scale. We therefore concur with the draft guidance that, “Individual items that contribute to overall score (e.g., dyspnea) generally would not support a dyspnea claim unless the items were developed to measure the claimed concept (e.g., the items validly and reliability capture the impact of treatment on dyspnea)”[6].

The targeted claim should be specific about which PRO concepts are being supported or substantiated. Particular items should be aggregated into specific domains when appropriate. Consider the claim that “Drug X reduces anxiety and depression more than Drug Y does in adult men with both conditions.” A plausible conceptual framework for measurement may encompass items such as feeling tense, feeling panic, worrying thoughts, and feeling restlessness to measure the concept of anxiety; and items such as enjoying things, feeling cheerful, and laughing at things to measure the concept of depression, with the two concepts posited as being interrelated.
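One way to make such a framework explicit before any data are collected is to write the item-to-domain assignment down as a small data structure and score each domain separately. The following is only a minimal sketch under stated assumptions: the item identifiers, the 0–3 response scale, mean scoring, the reverse-scoring of the positively worded depression items, and the function names `check_framework` and `score_domains` are all illustrative choices, not features of any published instrument.

```python
from statistics import mean

# Hypothetical item-to-domain map for the anxiety/depression claim above.
FRAMEWORK = {
    "anxiety": ["feeling_tense", "feeling_panic",
                "worrying_thoughts", "feeling_restless"],
    "depression": ["enjoying_things", "feeling_cheerful", "laughing_at_things"],
}

# Assumption: positively worded items are reverse-scored so that a higher
# score always means a worse state.
REVERSED = {"enjoying_things", "feeling_cheerful", "laughing_at_things"}
MAX_SCORE = 3  # assumed 0-3 response scale


def check_framework(framework):
    """Raise ValueError if any item is assigned to more than one domain."""
    seen = {}
    for domain, items in framework.items():
        for item in items:
            if item in seen:
                raise ValueError(
                    f"{item} is assigned to both {seen[item]} and {domain}"
                )
            seen[item] = domain


def score_domains(responses, framework=FRAMEWORK):
    """Return the mean item score per domain for one respondent."""
    check_framework(framework)
    scores = {}
    for domain, items in framework.items():
        values = [
            MAX_SCORE - responses[item] if item in REVERSED else responses[item]
            for item in items
        ]
        scores[domain] = mean(values)
    return scores
```

Keeping the map separate from the scoring code means that a later revision of the framework (for example, moving or dropping an item after psychometric analysis) changes one structure rather than the scoring logic.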

Second, an inadequate conceptual framework can impair the quality of an analysis. For example, a framework that mixes treatment-related symptoms and functional status could mask the impact of symptoms and functional status in accounting for observed treatment differences. Because treatment-related symptoms and functional status should be grouped and scored separately, a conceptual framework that captures this would lead to an analysis of treatment differences that are transparent and can be clearly communicated.

Without a well-defined conceptual framework, the risk of using the wrong psychometric measurement model is heightened. For instance, analysts might mistakenly use exploratory factor analysis instead of confirmatory factor analysis for a conceptual framework with items that are consequences of disease or treatment (such as pain, nausea and vomiting, or treatment-related symptoms) for empiric validation [15,16]. This distinction is important because such consequences of disease or treatments, which will be misrepresented as factors in an exploratory factor analysis, contain little information about the relationship between the items and the underlying PRO concepts of interest.

The hypothesized and expected relationships among concepts can form the basis for a conceptual framework for measurement before creating the instrument, assessing its measurement properties, and modifying the instrument as needed. Thus, we believe that empiric evidence from psychometric analyses (e.g., exploratory factor analysis) should be used to modify a conceptual framework as part of a fluid process of refinement. Modifying a conceptual framework at least once after a round of creating, assessing, and modifying an instrument should become part of an evolving measurement process conducted before a sponsor files a new drug application. Such modification is intended to strengthen and hone the hypothesized relationships with empiric evidence. These enhancements and refinements can lead to an efficient pathway toward a suitable instrument for evaluating the targeted claim, possibly leading eventually to an instrument with fewer total items than the one originally considered. Our position on this point therefore appears consistent with that in the FDA draft guidance.

Third, an inadequate conceptual framework can reduce the interpretability of the measurement model, because it is not clear what scores obtained from the instrument represent. Consider the FDA draft guidance: “For example, if improvements in a score for a general concept (e.g., physical function) are driven by a single responsive domain (e.g., symptom improvement) while other important domains (e.g., physical abilities and activities of daily living) did not show a response, a general claim about improvements in physical function would not be supported”[6]. For situations in which the conceptual framework shows physical functioning to be validly and reliably measured, and where all items on the selected physical functioning scale (or all domains on physical functioning) move in unison, we believe that a claim about improvement in physical function would be supported.

In summary, an inappropriate conceptual framework for measurement can hinder the scoring, the analysis, and the interpretation of a PRO label claim. Without a well-conceived conceptual framework, the validation process becomes less structured and therefore less likely to empirically support or confirm a target claim with clarity and precision. A well-defined conceptual framework, on the other hand, can lead to a well-defined measurement process in which the rationale for each PRO instrument (specifically, for the instrument's item content) is clearly articulated and well-defended in relation to the specific product claim being advanced [17].

Having underscored the importance of a sound conceptual framework for measurement, we turn to its role within an overall model of the application of PROs to regulatory decision-making.

Application to Regulatory Decision-Making: Hypothetical Examples

The FDA's perspective on the interconnectedness between the regulatory decision at issue, the concepts to be measured, and the selection of instrumentation was illustrated in a recent conference presentation by agency staff closely involved in development of the emerging guidance document [18]. In the simple example employed, the decision is whether and how the FDA would rule in favor of a particular drug's desired claim (“Velpaz relieves pain without upsetting your stomach”). The posited concepts are “pain relief” and “stomach upset.” The corresponding measurement model calls for PRO data derived, respectively, from a “pain diary” and a “GI-symptom diary.” Some difficult questions lie just beneath the surface of this compact depiction of PRO assessment––for example, how to translate diary-derived data into either single- or multidimensional scales to measure pain relief and stomach upset (because the diary itself is just a means to actual end points) [19]. These are precisely the type of questions that fully articulated conceptual frameworks for measurement via a pain diary and a GI-symptom diary would need to address. The implied linkages among decision, concept, and measurement are underscored.

This example also suggests that the appropriate conceptual model for PRO assessment to address a specific decision, such as product labeling, will likely not be identical to one of the comprehensive “conceptual models” for HRQOL that depict a host of hypothesized interconnections among outcome dimensions, clinical variables, and other covariates [20]. For purposes of FDA decision-making, we infer that a conceptual model need only be sufficiently detailed to clarify, illuminate, and lend support to the analysis of the PRO outcomes put forward in the sponsor's product claim.

In what follows, we illustrate in somewhat more, albeit fictitious, detail the potential connections among the product claim, conceptual model, and conceptual frameworks for guiding the measurement of PROs germane to the claim.

To demonstrate how the nature and complexity of the tasks may vary with the scope or breadth of the product claim, we examine three alternative cases focusing on the same hypothetical drug, Moodlift, which is a candidate for FDA approval. The product is expected to have a positive impact on symptoms of depression as well as the multiple dimensions of HRQOL. Nevertheless, early research suggests the drug is associated with a relatively higher incidence of mild nausea than the active comparator. The sponsor has elected to prospectively assess this aspect of tolerability to allow a more informed and comprehensive assessment of the net benefit of the product. The following claims range from greater to less complexity:

  1. The product claim is for decreased symptoms of depression and improvement in HRQOL in adult men with major depressive disorder (MDD).
  2. The product claim is for decreased symptoms of depression and improvement in psychological and social functioning (a subset of claim no. 1) in adult men with MDD.
  3. The product claim is for improvement in symptoms of depression in adult men with MDD.

In each case, we describe the Product Claim, Conceptual Model, Conceptual Framework(s) for Measurement, and an illustrative (though fictitious) selection of PRO measures. After developing Case 1 in detail, we provide a more concise discussion of Cases 2 and 3, largely emphasizing the distinctions among the three cases. Each case essentially reflects a different strategy for seeking labeling approval for a given drug.

Case 1: HRQOL and Symptom Status Claim

Product Claim

Moodlift 20 mg, taken once daily, will lead to improvement of symptoms of depression and improvement in HRQOL among adult men with MDD.

Conceptual Model for Product Claim Analysis (see Fig. 1a)

Figure 1.

(a) Conceptual model for product claim analysis: Case 1. (b) Conceptual framework for health-related quality of life (HRQOL) measurement in Case 1 product claim analysis.

Concepts and domains.  Symptoms of depression will include malaise, feelings of despair and hopelessness, and impaired decision-making ability. HRQOL will be defined as a multidimensional construct with the following domains of functional status: physical, social, and psychological (which includes emotional functioning) [21]. A side effect of Moodlift is increased incidence of mild nausea in relation to the comparator.

Hypothesized relationships:

  1. There will be a greater improvement in symptoms of depression in the treatment group compared to the active comparator.
  2. There will be a greater improvement in social functioning in the treatment group compared to the active comparator.
  3. There will be a greater improvement in physical functioning in the treatment group compared to the active comparator.
  4. There will be a greater improvement in a composite HRQOL index, psychometrically derived from social, emotional, and physical functioning scores, compared to the active comparator.
  5. There will be greater incidence of mild nausea in the Moodlift group compared to the active comparator.
  6. Based on the underlying biological mechanisms of drug efficacy and side effects, improvements in depression-related symptoms will be positively associated with measurable inhibition in serotonin reuptake, while the occurrence of nausea is positively associated with drug-induced modulation of the serotonin receptors lining the patient's digestive tract.

Conceptual frameworks for measurement of HRQOL, symptoms of depression and nausea (Figs. 1b and 4).  The product claim is for greater improvement in symptoms of depression and HRQOL than the active comparator in adult men suffering from MDD. According to the FDA draft guidance, for an HRQOL claim to be sustained, improvement must be demonstrated in each posited HRQOL domain: physical functioning, social functioning, and psychological functioning. Thus, the conceptual framework for measurement will focus on the selection, or development, of measurement scales for physical, social, and psychological functioning which demonstrate adequate validity, reliability, and responsiveness when applied to adult men suffering from MDD. As in Figure 1b, attention will be paid to the content validity of the items measuring each HRQOL domain and the interrelationships among domains.

Figure 4.

Conceptual framework for symptom and side-effect measurement in product claim analysis for Cases 1–3.

Although HRQOL is posited here to be multidimensional, a common approach is to estimate each scale separately, using Classical Test Theory or, more recently, Item Response Theory approaches to select the optimal item content for each scale. Alternatively, multidimensional estimation approaches are available that allow a given item to contribute information to the estimation of multiple scales [22].
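As an illustration of what a scale-by-scale Classical Test Theory check can look like, the sketch below computes Cronbach's alpha, one widely used internal-consistency statistic, for a single scale. The choice of this particular statistic, the function name `cronbach_alpha`, and the input layout (one row per respondent, one column per item) are assumptions made for the example; the text above does not prescribe any specific statistic.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one scale.

    item_scores: list of rows, one row per respondent, each row holding
    that respondent's score on every item of the scale.
    """
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")

    # Sample variance of each item (column) across respondents.
    columns = list(zip(*item_scores))
    sum_item_variances = sum(variance(col) for col in columns)

    # Sample variance of the total (summed) scale score.
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total score has zero variance; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum_item_variances / total_variance)
```

Under the approach described above, such a function would be applied separately to the physical, social, and psychological functioning scales rather than to a pooled set of items.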

Assuming that improvement on all three scales must be demonstrated to support an HRQOL claim, the conceptual model for measurement does not have to address a challenging issue: how to judge whether HRQOL has improved in aggregate for an individual when some scale scores improve and others worsen in response to the intervention. This approach assumes there is strong prior evidence that improvement in each of the three domains identified does represent improvement in HRQOL.

The sponsor has taken the unusual approach of prospectively measuring an aspect of tolerability that is not expected to be favorable to the product. While an increased incidence of nausea is expected to be associated with the product, it is also expected to be mild and tolerable (only mildly bothersome to subjects). Prospective assessment of this side effect will allow a more informed discussion of the benefits and risks (in this case nausea) of the product.

Finally, an appropriately encompassing conceptual framework for measurement will impose the same standards of rigor (validity, reliability, and responsiveness in the treatment population relevant to the claim) on symptom and tolerability outcome measures as on HRQOL or functional status measures. Such a conceptual framework is represented in Figure 4 (which, as will be seen, is applicable to all three cases here).

PRO instruments.  Changes in depressive symptoms (malaise, despair, and decision-making ability) and nausea will be measured by domains included in the MOOD instrument. Changes in physical, social, and psychological functional status and in overall HRQOL will be measured by the HEAL measurement system, which features a multidimensional construct with distinct domains representing physical, social, and psychological functioning. In recently published studies focusing on the treatment of MDD in adult men, both the MOOD and the HEAL were shown to be valid, reliable, and responsive measures of symptoms of depression, nausea related to selective serotonin reuptake inhibitor (SSRI) use, and HRQOL. In particular, selected domains from the MOOD were good measures of the depression-related symptoms noted in the Moodlift claim. Moreover, in these published studies there was a strong positive correlation between symptom changes, as measured by the MOOD, and changes in physical, social, and psychological functioning, as measured by the scales of the HEAL.

Changes in patient serotonin levels in response to Moodlift will be measured by the SEROT metric. SEROT is a well-validated clinical test of serotonin levels in persons with MDD. It will measure the resulting impact on serotonin reuptake and on modulation of serotonin receptors in the digestive tract.

For patients undergoing treatment with Moodlift, the hypothesized changes in serotonin-related effects, symptom status, functional status, and HRQOL will be estimated using correlation analysis, such as found in structural equation modeling [22].

Case 2: Functional Status and Symptom Status Claim

Product Claim

Moodlift 20 mg, taken once daily, will lead to improvement in symptoms of depression, and psychological and social functioning among adult men with MDD.

Conceptual Model for Product Claim Analysis (see Fig. 2a)

Figure 2.

(a) Conceptual model for product claim analysis: Case 2. (b) Conceptual framework for functional status measurement in Case 2 product claim analysis.

Concepts and domains.  Symptoms of depression and nausea are the same as described in Case 1. Psychological functioning and social functioning are distinct domains (constructs) hypothesized to reflect the impact of MDD.

Hypothesized relationships:

  1. There will be a greater improvement in symptoms of depression in the treatment group compared to the active comparator.
  2. There will be a greater improvement in psychological functioning in the treatment group compared to the active comparator.
  3. There will be a greater improvement in social functioning in the treatment group compared to the active comparator.
  4. There will be greater incidence of mild nausea in the Moodlift group compared to the active comparator.
  5. Based on the underlying biological mechanisms of drug efficacy and side effects, improvements in depression-related symptoms will be positively associated with measurable inhibition in serotonin reuptake, while the occurrence of nausea is positively associated with drug-induced modulation of the serotonin receptors lining the patient's digestive tract.

Conceptual frameworks for measurement (Figs. 2b and 4).  The product claim is for improvement of symptoms of depression and in social and psychological functioning in adult men with MDD. Thus, the conceptual framework relevant to Case 2 will focus on the selection, or development, of measurement scales for social and psychological functioning that demonstrate adequate validity (content, construct), reliability, and responsiveness when applied to adult men suffering from MDD. As suggested in Figure 2b, there will be particular attention to the content validity of the items measuring each domain and the potential interrelationships among domains.

Measures and instruments.  These will be the same as in Case 1, with psychological functioning and social functioning measured by the corresponding scales from the multidomain HEAL instrument employed in Case 1. This is justified by recent published studies focusing on the treatment of chronic depression in adult men, in which the social and psychological functioning scales of the HEAL were shown separately to be valid, reliable, and responsive measures. Moreover, in these published studies there was a strong positive correlation between symptom changes, as measured by the MOOD, and changes in social and psychological functioning as measured by these scales of the HEAL.

Case 3: Symptom Status Claim

Product Claim

Moodlift 20 mg, taken once daily, will lead to improvement in symptoms of depression in adult men with MDD.

Conceptual Model for Product Claim Analysis (see Fig. 3)

Figure 3.

Conceptual model for product claim analysis: Case 3.

Concepts and domains.  Symptoms of depression will include malaise, feelings of despair and hopelessness, and impaired decision-making ability.

Hypothesized relationships:

  1. There will be a greater improvement in symptoms of depression in the treatment group compared to the active comparator.
  2. There will be greater incidence of mild nausea in the Moodlift group compared to the active comparator.
  3. Based on the underlying biological mechanisms of drug efficacy and side effects, improvements in depression-related symptoms will be positively associated with measurable inhibition in serotonin reuptake, while the occurrence of nausea is positively associated with drug-induced modulation of the serotonin receptors lining the patient's digestive tract.

Conceptual framework for measurement (Fig. 4).  The same as for the symptom and tolerability measurement components in Cases 1 and 2.

Measures and instruments.  The same as for Cases 1 and 2.

In sum, these examples are intended to illustrate schematically the execution of the following steps:

  1. Identify the product claim, including applicable population.
  2. Define the conceptual model for product claim analysis, with all relevant study hypotheses stated.
  3. Articulate a conceptual framework for measurement for each PRO concept used in the analysis.
  4. Specify the elements of the corresponding measurement model, which operationalizes the conceptual model and is guided in its construction and testing by the conceptual framework(s).
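The steps above can be sketched as a small set of linked records, which makes it easy to check that every PRO concept named in the conceptual model has a corresponding framework for measurement. The example below encodes the fictitious Case 3; the class names, field names, and the helper `unmeasured_concepts` are hypothetical and serve only to illustrate the bookkeeping, not to prescribe a submission format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ProductClaim:
    text: str
    population: str


@dataclass
class ConceptualModel:
    concepts: List[str]
    hypotheses: List[str]


@dataclass
class MeasurementFramework:
    concept: str
    instrument: str
    # Measurement properties to be demonstrated in the claim population.
    required_properties: Tuple[str, ...] = (
        "validity", "reliability", "responsiveness",
    )


def unmeasured_concepts(model, frameworks):
    """Return the model concepts that lack a framework for measurement."""
    covered = {fw.concept for fw in frameworks}
    return [c for c in model.concepts if c not in covered]


# Fictitious Case 3, as described in the text.
CASE3_CLAIM = ProductClaim(
    text="Improvement in symptoms of depression",
    population="adult men with MDD",
)
CASE3_MODEL = ConceptualModel(
    concepts=["symptoms of depression", "nausea"],
    hypotheses=[
        "Greater improvement in symptoms of depression vs. active comparator",
        "Greater incidence of mild nausea vs. active comparator",
        "Symptom improvement associated with serotonin reuptake inhibition",
    ],
)
CASE3_FRAMEWORKS = [
    MeasurementFramework("symptoms of depression", "MOOD"),
    MeasurementFramework("nausea", "MOOD"),
]
```

Calling `unmeasured_concepts(CASE3_MODEL, CASE3_FRAMEWORKS)` returns an empty list when each concept is covered; a non-empty result would flag a concept in the claim analysis for which no framework has yet been articulated.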

The principal distinctions between Cases 1, 2, and 3 relate to the breadth and scope of the product claim––the general approach to laying out the analysis does not change.

One important working assumption distinguishes Cases 1 and 2: the proposed principle stated at multiple points in the FDA draft guidance that to claim an improvement in HRQOL, there must be concurrent improvements in all important domains comprising HRQOL [6]. Thus, under Case 1, if the product sponsor hypothesized and subsequently found significant improvements in psychological and social functioning (but, contrary to expectations, no significant change in physical functioning), and if there was a significant overall improvement in HRQOL (as assessed through the posited HEAL measurement system), the HRQOL part of the claim would nonetheless not be allowed. This conclusion apparently holds even if there are substantial improvements in psychological and social functioning so that the HRQOL score improves significantly, by either distribution-based or anchor-based criteria for clinical significance. As long as a product's claim to improve HRQOL requires a “dominant solution” (improvement along every HRQOL dimension), there will be a need for the Case 1–Case 2 distinction as drawn here.
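The “dominant solution” requirement can be stated as a simple decision rule. The sketch below is a deliberate simplification: it assumes that the significance of each domain-level and composite result has already been determined elsewhere and is passed in as a boolean, and the function name `evaluate_hrqol_claim` is hypothetical. It is meant only to show the logic of the rule, not how a regulator would actually review a submission.

```python
def evaluate_hrqol_claim(domain_improved, composite_improved):
    """Apply the all-domains-must-improve rule to an HRQOL claim.

    domain_improved: dict mapping each HRQOL domain to True if a
        significant improvement was shown, else False.
    composite_improved: True if the composite HRQOL score improved
        significantly.

    Returns (supported, failing_domains).
    """
    failing = sorted(d for d, ok in domain_improved.items() if not ok)
    supported = bool(composite_improved) and not failing
    return supported, failing


# The scenario discussed above: the composite improves, but physical
# functioning does not.
scenario = {"physical": False, "social": True, "psychological": True}
supported, failing = evaluate_hrqol_claim(scenario, composite_improved=True)
```

In this scenario the rule returns an unsupported claim and names physical functioning as the failing domain, even though the composite score improved.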

Do Specific Conceptual Issues Vary or Evolve by Phase of Product Development (i.e., Phase I, II, III, or IV Trials)?

In studying new therapeutic areas, investigators have limited information about how the condition and treatment affect individuals. Thus, efforts to model this phenomenon will be limited. As more evidence becomes available, conceptual models may change by becoming more elaborate and precise or less complex. During the process of drug development, considerable information is obtained that may help inform understanding and lead to refinement of models of disease and treatment. For example, as toxicity data become available, modifications to the dosing schedule often occur. Similarly, we would expect that conceptual models may evolve over time as new information becomes available.

The focus of a conceptual model at different stages of product development may also be different. Developing an elaborate conceptual model describing the impact of disease and treatment on an individual may be a desirable goal in understanding the total illness experience; doing so, however, may not be useful or desirable in addressing specific questions that decision-makers may pose at different stages of the development process. For example, in early product development, researchers need to identify those aspects of patient functioning and well-being most important to patients and amenable to therapeutic intervention and to determine the best measures of those concepts. This may be straightforward in the case of therapeutic areas that have been studied for many years, but it may be a more exploratory process for therapeutic areas that are less well understood.

A conceptual model may help guide selection of options for investigation. In later stages of development, health authorities need to focus on whether the appropriate end points have been identified and whether findings based on these end points are correctly interpreted. Following product approval, determining whether a product should be included in a formulary may require other information. For example, the degree of satisfaction with aspects of a product in the context of customary care may become important and should be incorporated into the model.

Greater understanding of a therapeutic area may also influence the conceptual framework for measuring concepts that are identified over time as relevant to patients. Addition or deletion of domains or items to assess domains may be appropriate with more information from a larger pool of persons with the condition under study or application to new conditions. More data may allow development of abbreviated and alternative forms or scoring algorithms.

Conclusions

This article addressed four conceptual issues in the development and use of PRO measures: 1) the definition of a PRO concept; 2) the description of an adequate conceptual framework; 3) the consequences of proceeding with instrument selection or development without a well-established conceptual framework; and 4) the variability of conceptual issues over the lifespan of product development. The potential implications of these issues for regulatory decision-making were illustrated through hypothetical examples that emphasize the interconnections between the product labeling claim, the conceptual model guiding the specification of hypotheses, and the conceptual framework guiding instrument selection or development and data analysis.

We provided no definitive answers to these questions. The interpretation and resolution of issues surrounding the measurement of a PRO are evolving as experience accumulates, but obtaining the patient's perspective on the illness experience is imperative. A conceptual model is important in guiding our understanding of what to measure and how to measure it, and in providing context for interpreting findings. Such models should be assessed using psychometric techniques when feasible, although a fairly advanced level of knowledge in a therapeutic area may be required to evaluate a more complex model. Finally, conceptual and measurement models are not static. As more information becomes available, our thinking about what to measure and how to measure it must evolve.

Source of financial support: Funding for the meeting was provided by the Mayo Foundation in the form of unrestricted educational grants; North Central Cancer Treatment Group (NCCTG) (CA25224-27) and Mayo Comprehensive Cancer Center grants (CA15083-32).

Glossary

  1. For further definitions, see the FDA guidance document which includes an extensive glossary (http://www.fda.gov/cder/guidance/5460dft.htm).

Average-value carried forward: An imputation method for missing data which inserts the average response for any missing value.
Cognitive debriefing: Asking questions after a survey is completed to determine if there were difficulties with the item content or questions.
Cognitive interview: Asking questions to gain understanding as the respondent completes a survey to determine what and how he/she is thinking.
Conceptual framework: Hypothesis of relationships among the domains being researched.
Conceptual model: Provides the rationale for and specification of the patient-reported outcomes of interest in the population of interest that will result in a specific treatment decision.
Differential item functioning: Testing to determine whether one group responds differently to an item than another group despite controlling for differences.
End point model: Describes how the end points in a study are expected to interact and justifies the need for their assessment.
Item bank: A collection of assessment tools that measure a single domain, have undergone extensive review, and have been calibrated to a set of properties matching the study population.
Item response theory: An approach to assessment construction that involves analysis of individual item responses.
Last-value carried forward: An imputation method for missing data which inserts the last value observed for any missing value.
Measurement strategy: Using items or instruments designed to assess the domains of interest.
Minimal important difference: The smallest change in a patient-reported outcome measure that is perceived by the patient as beneficial or resulting in a change in treatment.
Zero-value carried forward: An imputation method that replaces any missing data with the value zero.
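
The three imputation methods defined in this glossary can be illustrated with a minimal Python sketch. The function names and the example scores are hypothetical. Two details are assumptions, because the glossary does not specify them: the "average response" is taken to be the mean of the same respondent's observed values, and a missing value with no earlier observation is left missing under last-value carried forward.

```python
from statistics import mean
from typing import List, Optional

# One respondent's scores over successive assessments; None marks a missing value.
Series = List[Optional[float]]


def last_value_carried_forward(scores: Series) -> Series:
    """Replace each missing value with the most recent observed value."""
    filled: Series = []
    last: Optional[float] = None
    for score in scores:
        if score is not None:
            last = score
        # Assumption: stays None when nothing has been observed yet.
        filled.append(last)
    return filled


def average_value_carried_forward(scores: Series) -> Series:
    """Replace each missing value with the mean of the observed values.

    Assumption: the average is taken within this respondent's own series.
    """
    observed = [s for s in scores if s is not None]
    if not observed:
        return list(scores)  # nothing observed, so nothing to average
    average = mean(observed)
    return [average if s is None else s for s in scores]


def zero_value_carried_forward(scores: Series) -> Series:
    """Replace each missing value with zero."""
    return [0.0 if s is None else s for s in scores]


# Hypothetical scores with the second and fourth assessments missing.
example: Series = [60.0, None, 70.0, None]
lvcf = last_value_carried_forward(example)      # [60.0, 60.0, 70.0, 70.0]
avcf = average_value_carried_forward(example)   # [60.0, 65.0, 70.0, 65.0]
zvcf = zero_value_carried_forward(example)      # [60.0, 0.0, 70.0, 0.0]
```

The same missing assessments receive quite different values under the three methods, which is why the choice of imputation method should be stated in advance.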
