Keywords:

  • guidance;
  • outcomes research;
  • quality-of-life research;
  • regulatory

Abstract

Introduction: Health-related quality-of-life (HRQL) can be defined as the impact of disease and treatment across the physical, psychological, social and somatic domains of functioning and well-being. Health-related quality-of-life measures are included in clinical trials of drug treatment to assess the impact of therapy on the patient's functioning. HRQL guidance could allow for use of this data in drug labeling and promotion.

Objectives: The aim of our study was to provide recommendations with respect to regulatory issues important to the development of guidelines for HRQL research.

Methods: The HRQL workshop was planned jointly by members of the Pharmaceutical Research and Manufacturers of America Health Outcomes Committee and the Division of Drug Marketing, Advertising, and Communications of the Food and Drug Administration. The workshop was limited to six regulatory issues related to HRQL research in clinical trials of pharmaceutical therapies. These six issues were: instrument selection and validation, study design, data analysis, HRQL and safety, clinical meaning, and promotional use. Before the meeting, a consensus was reached that HRQL does not measure, nor should it be used to measure, safety. Therefore, five work groups discussed HRQL issues and made recommendations.

Results: Overall, the workshop recommended that HRQL measures be treated as any other clinical end point. The workshop recognized that research in HRQL methods is ongoing and that any guidance should be flexible to allow for changes in this developing research area.

Conclusions: HRQL provides a patient perspective on the impact of disease and therapy on patients' daily life and functioning. Including HRQL information in promotion could be beneficial to decision making on the use of therapies. HRQL is a measure of effectiveness, not safety, and should be treated as any other clinical end point.


Introduction

This paper is a summary of a workshop held on March 24 and 25, 1999, at which regulatory issues for health-related quality of life (HRQL) were discussed. This summary reflects the viewpoints of workshop participants and as such is not intended to reflect FDA policy, nor is it a consensus of the field of health-related quality of life.

Health-related quality-of-life measures are more and more frequently included in clinical trial assessments of drug treatment. These patient-reported outcome measures are often assessed in addition to more traditional clinical measures such as blood pressure, cholesterol, morbidity events, and physician-based disease assessments. Implicit in the use of patient-reported outcome measures such as HRQL is the concept that pharmaceutical interventions can affect parameters of HRQL such as physical function, social function, or mental health functioning [1].

Patient-reported outcome measures such as HRQL may provide better information, both positive and negative, about the actual impact of therapy on the patient's life than more objective or traditional clinical measures. HRQL measures provide a patient perspective on the impact of the disease and its treatment on the patient's daily life and functioning. Additionally, HRQL may provide more relevant information to third-party payers and patients making decisions regarding the use of a therapy. The pharmaceutical industry would like to be able to use patient-reported outcome data, including HRQL data, on drug labels and in promotional campaigns. As a result of increased interest in health-related quality of life, there has been an effort to provide guidance to researchers, particularly the pharmaceutical industry, on the conduct of HRQL research.

The Federal Food, Drug, and Cosmetic Act (FFDCA) Section 505(d) defines evidence of effectiveness of new drugs as follows: “… substantial evidence [of drug effect] means evidence consisting of adequate and well-controlled investigations.” Health-related quality-of-life research fits within this definition of substantial evidence [2]. In the context of an approved label for a drug, a single, well-controlled, clinical trial of HRQL could constitute a labeled claim for improvement in HRQL. Standards for the assessment and inclusion of HRQL data as a patient-reported outcome measure on labeling and in promotional campaigns are not currently available. To assist in the development of guidance on HRQL measures and research, a workshop on regulatory issues for HRQL was held on March 24 and 25, 1999. The workshop was planned jointly by the members of the Pharmaceutical Research and Manufacturers of America Health Outcomes Work Group (PhRMA HOWG, now called the PhRMA Health Outcomes Committee [PhRMA HOC]) and the Division of Drug Marketing, Advertising, and Communications of the Food and Drug Administration (DDMAC FDA). To facilitate discussion, the workshop was limited to six primary regulatory issues dealing with health-related quality-of-life measures collected in clinical trials of pharmaceutical therapies.

The workshop included introductory presentations by experts to provide an overview of the current state of HRQL for each of the six issues outlined in Table 1. Among the topics discussed was the relationship of HRQL and safety assessment. A consensus position that HRQL is a measure of effectiveness was agreed to before the meeting. While it can be argued that HRQL can be used to measure both bother and tolerability associated with a disease or a treatment, the group felt that HRQL measures were not developed for, nor should they be used for, detection of safety problems, and that HRQL measures are not a substitute for collection of spontaneously reported adverse events. Individual questions or overall scores may be sensitive to and inform about safety issues. However, they reflect events and effects without attribution and as such should not require review or individual reporting with the aim of “signaling” safety problems. HRQL when used as a research measure should be accompanied by all customary protocol-driven protections and monitoring of safety as is done for any other effectiveness measure. In some cases, researchers have excluded specific questions from validated HRQL questionnaires in order to avoid including questions that may deal with adverse events (e.g., the Sickness Impact Profile asks about suicidal ideation). However, this invalidates the instrument or, at a minimum, the domain. Therefore, in most cases the complete validated measure should be used, although exceptions to this rule may be appropriate.

Table 1.  Introductory presentations

Title of presentation | Speaker | Affiliation
What constitutes health-related quality of life for labeling and promotion? | Nancy Kline-Leidy, PhD, RN | Health Outcomes Research, MEDTAP International
Instrument selection: How much is enough? | Dennis Revicki, PhD | Health Outcomes Research, MEDTAP International
Collection of HRQL data: Are there special considerations for study design? | Albert Wu, MD, MPH | School of Hygiene and Public Health, Johns Hopkins University
Analysis of HRQL data: Are there special considerations for analysis? | Lisa Kammerman, PhD | Division of Biometrics, Center for Drug Evaluation and Research, Food and Drug Administration
Interpretation of HRQL: What does it mean? | Geoffrey R. Norman, PhD | Clinical Epidemiology and Biostatistics, McMaster University
HRQL and safety: How are they related? | Hugh Tilson, MD, DrPH | Health Affairs, Glaxo Wellcome, Inc.
HRQL and the FDA: What can we say? | Laurie Burke, RPh, MPH | Managed Care Outcomes and Labeling Staff, Food and Drug Administration

Separate work groups were convened for each of the remaining five issues. Table 2 lists the five work groups and the primary purpose of each. Work-group participants were representatives of the pharmaceutical industry and FDA who were interested in health-related quality-of-life research. Each participant either chose (based on interest) or was assigned to (based on need to balance groups) one of the five work groups. Two facilitators were assigned to lead and record each group's discussions. Guidelines for facilitating participation and discussion were identified and used, and each work-group facilitator followed the format outlined in Table 3.

Table 2.  Health-related quality-of-life work groups

Work group | Purpose of work group
Instrument selection and validation | To discuss selection, development, and validation of health-related quality-of-life measures for use in clinical trials of pharmaceutical therapies.
Study design | To discuss clinical trial study design issues related to planning trials of pharmaceutical therapies conducted to evaluate measures of health-related quality of life.
Data analysis | To discuss analytic issues related to health-related quality-of-life measures collected in clinical trials of pharmaceutical therapies.
Interpretation of clinical meaning in health-related quality-of-life measures | To discuss issues related to definition and calculation of clinically meaningful changes and differences in health-related quality-of-life measures within clinical trials of pharmaceutical therapies.
Promotional use of health-related quality-of-life results | To discuss promotional use of health-related quality-of-life measures from clinical trials of pharmaceutical therapies.

Table 3.  Organization of workshop groups

  • Introductions and objectives
  • Identify and prioritize key issues
  • Develop problem statements
      • Current state
      • Impact
      • Desired state
  • Develop technical recommendations
  • Present results to workshop

Facilitators reviewed the objectives for the work-group participants. Work-group members were asked to brainstorm to develop a list of items on each issue. After assembling a list of issues, participants voted to determine priority items and narrow the selection to those items that the majority of participants identified as being of primary importance. Once the primary items were identified, facilitators proactively solicited input from each participant to ensure that all viewpoints were being considered. The current state of HRQL research, the impact of the current state, and the desired state were discussed for each issue. This paper summarizes the issues and discussions regarding the current state for each issue and regulatory recommendations from each of the five work groups of this workshop pertaining to health-related quality of life. It is important to recognize that the recommendations reached by the work groups do not represent FDA policy but rather are a consensus of the various work groups.

Work Group 1: Instrument Selection and Validation

Purpose: to discuss selection, development, and validation of health-related quality-of-life measures for use in clinical trials of pharmaceutical therapies.

The work group identified the following five key issues pertaining to instrument selection and validation:

  1. What are minimally acceptable standards of evidence for validity and reliability in health-related quality-of-life measures?
  2. Can the same clinical trial data be used for instrument validation and measurement of treatment effects?
  3. Should the same standards of evidence be applied to previously developed, well-accepted instruments as well as newly developed and validated instruments?
  4. What is an acceptable or unacceptable relationship between clinical and health-related quality-of-life measures?
  5. What standards should be used to decide whether to use a disease-specific and/or general health-related quality-of-life instrument in a clinical trial?

What are minimally acceptable standards of evidence for validity and reliability in health-related quality-of-life measures?

The workshop participants recommended that guidelines request that any promotional use of HRQL measures include a description of methods used to evaluate the measurement characteristics of specific HRQL measures. This description could include such validation measures as reliability, construct and discriminant validity, and responsiveness as appropriate and known within the context of the disease, study population, and use of the measure. The description could be in the form of a footnote or as a reference in the promotional piece to provide information dealing with validation of the measure.

It was clear from the discussion that there is no consensus among researchers or the FDA regarding acceptable standards for validation and use of instruments that will allow HRQL claims to be included in promotion and advertising. In general, HRQL measures have been developed and validated more rigorously than many physician-assessed clinical measures or patient-completed clinical measures. The literature provides some guidance in terms of evaluating HRQL measures for reliability and validity [3–11]. The generally accepted level for intraclass correlation coefficients for test-retest reliability is 0.7 or greater. However, this may not apply to certain diseases that are not stable or that vary over time, and some measures may not be suitable for evaluation of test-retest reliability. Validation needs to be understood within the framework of a disease, a population, and the use of the measure because the validity of an HRQL measure may vary among diseases and populations. The measurement characteristics of HRQL measures need to be understood in a manner similar to any other measure of disease and treatment. A clinical test (such as blood pressure or cholesterol level) that does not show good test-retest reliability, validity, or responsiveness could not be used to assess disease status or response to therapy.
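
To make the 0.7 threshold for test-retest reliability concrete, the following is a minimal sketch of one common form of the intraclass correlation coefficient, the one-way random-effects ICC(1,1). The workshop did not specify which ICC form should be used, so the choice of form, the function name `icc_oneway`, and the scores are assumptions made purely for illustration.

```python
from statistics import mean

def icc_oneway(scores):
    """One-way random-effects ICC(1,1).

    `scores` holds one row per subject and one column per administration
    of the same instrument (e.g., test and retest). Illustrative helper,
    not a method prescribed by the workshop.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # administrations per subject
    grand = mean(x for row in scores for x in row)
    subject_means = [mean(row) for row in scores]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(scores, subject_means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical test and retest scores for five stable patients.
retest_data = [(50, 52), (60, 58), (70, 71), (80, 79), (90, 92)]
icc = icc_oneway(retest_data)
meets_threshold = icc >= 0.7   # compare with the commonly cited 0.7 level
```

In this invented data set the retest scores track the first administration closely, so the coefficient falls well above 0.7. As the text notes, such a result is interpretable only when the underlying condition is stable between administrations.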

The work group recommended that descriptive (rather than prescriptive) guidelines for what constitutes validity of HRQL measures should be established. That is, guidelines should not mandate a minimally acceptable level of reliability or validity, but rather should request that the measurement characteristics of the HRQL measure be described. This would allow reviewers to determine whether the measurement characteristics were well understood within the disease and population in which they were used. Furthermore, the work group recommended that guidelines not provide a list of acceptable HRQL measures.

Can the same clinical trial data be used for instrument validation and measurement of treatment effects?

Use of clinical trial data for instrument validation and measurement of treatment effects should be limited to only those situations in which it can be clearly shown that treatment does not affect item selection or validation.

When measuring HRQL within the context of a clinical trial, there is often a desire to provide additional information about the measurement characteristics of an instrument by maximally using the clinical trial data. Participants agreed that under certain conditions, using the same trial data for both validation and measurement would be acceptable, such as in the case where baseline measurements are used to determine reliability. However, under other conditions this would not be acceptable, such as in the case where a large pool of possible items for an HRQL measure is included in a clinical trial of a particular therapy and items are subsequently selected to comprise a final HRQL measure based on a potential benefit associated with that treatment. In the former situation, only baseline data are used for evaluation of an HRQL measure, whereas in the latter, item selection for HRQL measurement could be biased in favor of the therapy being evaluated, thereby creating a measure limited to only those items for which the therapy has a positive effect. It would be preferable to select items that are important to patients regardless of the treatment received. Data from early-phase clinical trials may be useful in determining the validity, and particularly the responsiveness, of HRQL instruments to be used in later trials. Guidelines could provide a comprehensive set of situations that would more broadly demonstrate the limits of acceptability.

Should the same standards of evidence be applied to previously developed, well-accepted instruments as well as newly developed and validated instruments?

A description of the development methods and measurement characteristics (e.g., reliability, validity, and responsiveness) should be provided for all instruments, both new and old.

It was recognized that familiarity with old, commonly used instruments imparts a degree of credibility that may not be justified. A description of the development methods and validation procedures should be provided for all measures of HRQL, including well-established instruments. If a set of minimally acceptable standards of evidence has been developed, all instruments should be required to meet these, and use of those instruments that do not meet such standards, even if they have been in routine use, should be discouraged.

What is an acceptable or unacceptable relationship between clinical and health-related quality-of-life measures?

HRQL measures provide additional information beyond that obtained with more traditional clinical measures such as physician assessments and physiologic, anatomic, and laboratory measures. In cases where the relationship between clinical measures and HRQL is discordant, regulatory review has generally accorded greater credibility to clinical measurements than to measurements of HRQL.

Patient-reported outcome measures such as HRQL are not surrogate measures for more traditional clinical measures such as physician assessments and anatomic, physiologic, and laboratory measures. Rather, HRQL measures provide additional information regarding efficacy, and sometimes tolerability, and ways in which a therapeutic intervention affects the lives of patients from the patient's perspective. HRQL assessments may deal with the impact of disease and therapy and their combined effect on certain aspects of patients' quality of life such as social, emotional, physical, and cognitive functioning that cannot be collected with more traditional clinical or tolerability measures. Wilson and Cleary propose a conceptual model that integrates biological and physiological (or clinical) factors with other measures of health such as symptoms, function, health perceptions and quality of life [1].

Results of an HRQL measure may or may not be consistent with other clinical measures depending on the disease state and the therapy. There was agreement that in the current hierarchy of interpretation, when there is discordance between a clinical measure and an HRQL measure, the HRQL measure has generally been regarded as secondary. It was not expected that a perfect relationship between a clinical outcome and a quality-of-life measure would exist, but it was generally expected that the HRQL and clinical measures should not be in conflict if a health-related quality-of-life claim were to be used. The issue of conflicting outcomes could exist even when a validated HRQL instrument is included in certain clinical outcomes (e.g., positive quality-of-life results with negative mortality). It was recognized that clinical measures often prevail over HRQL in terms of regulatory review, and that the field would need to mature before an HRQL measure would take precedence in these circumstances.

What standards should be used to decide whether to use a disease-specific and/or general health-related quality-of-life instrument in a clinical trial?

Regulatory guidance should not mandate that both general and disease-specific measures be used in the same clinical trial. Rather, the most appropriate HRQL measure(s) should be selected based on knowledge of available disease-specific measures, the degree of validation of both general and disease-specific HRQL measures in the population to be studied, and the feasibility of implementing the HRQL measure.

Health-related quality of life can be measured with validated general HRQL instruments in most diseases and with disease-specific instruments in other cases. However, in many clinical trials, a general HRQL measure is not specific enough to detect the impact of the intervention. The workshop participants advised against a mandate for the use of a general or a disease-specific instrument for all disease categories. The availability of validated instruments for the population under study should drive each case. In certain situations, it may be desirable to use both general and disease-specific instruments in the same study. Cases in which a general measure may be useful could include trials of chronic diseases where long-term treatment may have delayed benefits or side effects, where the evidence for the validity of a disease-specific instrument is less than optimal, or where it is known that a general measure is responsive and captures most of the domains affected by a disease or therapy. Which measure(s) to use should be determined on an individual trial level based on availability of validated instruments, knowledge of performance characteristics, the disease, population, duration of the trial, and feasibility of implementing the HRQL instrument.

Work Group 2: Study Design

Purpose: to discuss issues related to the design of clinical trials of pharmaceutical therapies aimed at evaluating health-related quality of life.

What elements should be considered when designing clinical trials to assess health-related quality of life?

HRQL trials are frequently designed and conducted without adequate preplanning or thought given to achieving an HRQL claim. This often means that the HRQL hypotheses, an HRQL data analysis plan, the meaning of HRQL differences, and HRQL data-handling issues may not be specified in the protocol. In addition, the rationale for selection of the HRQL instrument(s) in the study is often not provided.

Work-group participants recommended that clinical trials of HRQL be incorporated into the clinical development plan. The work group believed that clinical research standards that are well documented in the scientific literature should be applied to HRQL trials. The rationale for instrument selection should be provided in the protocol along with a discussion of meaningful change to be detected by the instrument.

A fundamental issue regarding use of HRQL data in labeling and promotion of drug products is whether the trials that produce such data have special design considerations. Many traditional clinical trials attempt to generate data to answer the research question "Is treatment A better than placebo (or treatment B) as measured by clinical end points?" The end points in such trials are typically mortality, disease markers, physician- or patient-assessed signs and symptoms, and/or laboratory or physiologic abnormalities. There is a tradition of conducting clinical trials to compare treatment interventions using these end points. The literature regarding HRQL study methodology for use in clinical trials is less comprehensive.

Frequently, HRQL measures are included in a clinical trial that has been designed to detect changes in a traditional clinical (anatomic or physiologic) end point. It is asserted that because clinical trials are expensive and time-consuming, this “piggybacking” of HRQL measures may be an efficient use of resources. Including HRQL measures in this way, however, may produce scientific difficulties with the use of the HRQL data. For example, HRQL hypotheses, an HRQL data-analysis plan, the meaning of HRQL differences, and HRQL data-handling issues may not be specified in the protocol. In addition, the inclusion criteria of the trial may be based on clinical end points only. Patients will thus be selected who will respond to the treatment as measured by the clinical end point but who may not have impairments in HRQL scores. The duration of the trial may be designed to increase the chances of detecting clinical changes. The time required to detect changes in HRQL may be different from the time required for clinical changes, or lack thereof, to be recognized or manifested. Moreover, the selection of instrument(s) to measure HRQL can often be based on availability only and may be unrelated to the intervention or the intended clinical outcomes of the study.

The work group agreed that HRQL should be included in the clinical development plan. This would ensure that proper consideration is given to issues dealing with HRQL claims to be addressed in the trial. Furthermore, it was agreed that design standards for clinical trials should be applied to trials designed to support HRQL claims. For example, pre-specification of HRQL hypotheses, data analysis, data handling, and meaningful changes of the HRQL measures should be included in the protocol. This could also be accomplished by designing a separate protocol to address HRQL or other secondary outcome measures.

HRQL trials that are intended for promotional or labeling use should be designed with the intended HRQL claim(s) in mind. In these trials, patients who have impairments in HRQL, as measured by the HRQL instrument used in the trial, should be selected. The duration of the study should be designed to allow the detection of HRQL changes by the instrument used. The sample size should be large enough to detect statistically significant differences between treatments as measured by the HRQL instrument. Finally, sound rationale for the selection and timing of administration of the HRQL instrument should be provided in the protocol. This rationale should include considerations such as the sensitivity of the instrument, disease state, population to be studied, goals of the study, treatment intervention, and time period for measuring the instrument.
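
The sample size recommendation above can be illustrated with the standard normal-approximation formula for comparing two group means. This is only a sketch: the function name `n_per_group`, the assumed meaningful difference of 5 points, and the standard deviation of 10 points are hypothetical values chosen for illustration, not figures from the workshop.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per arm for a two-sided, two-sample
    comparison of mean HRQL scores (normal approximation, equal arms).

    delta: smallest between-treatment difference considered meaningful
    sd:    assumed common standard deviation of the HRQL score
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Hypothetical planning values: detect a 5-point difference, SD of 10 points.
planned_n = n_per_group(delta=5, sd=10)
# Halving the target difference roughly quadruples the required sample.
larger_n = n_per_group(delta=2.5, sd=10)
```

A trial powered only for its clinical end point may fall short of the number obtained this way for the HRQL instrument, which is one reason the work group asked that the HRQL claim be considered at the design stage.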

Work Group 3: Study Analysis

Purpose: to discuss analytic issues related to health-related quality-of-life measures collected in clinical trials of pharmaceutical therapies.

In terms of labeling and advertising of products, the analysis of HRQL outcomes data should meet the same standards as other clinical outcomes considered for claims. The analysis of HRQL should be specified as part of the overall statistical analysis plan and be in accordance with sound statistical principles. The analysis plan should specify the objectives and hypotheses, sample size calculations, method of analysis including the handling of missing data, and adjustments for multiple testing. Simplicity of analyses should be the rule when appropriate.

The work group identified the following three key questions regarding analysis of HRQL data:

  1. How should multiple comparisons of pre-specified health-related quality-of-life scale(s) or domain(s) be handled?
  2. What are acceptable methods to deal with potential biases from missing data (unit nonresponse)?
  3. What approaches should be used for the analysis of longitudinal HRQL data?

How should multiple comparisons of pre-specified health-related quality-of-life scales or domains be handled?

It was recommended that protocols pre-specify how Type I error will be controlled: by using one of the accepted methods (e.g., O'Brien, step-down), by limiting the number of key end points for statistical testing and analysis, or by using summary measures when appropriate.

Univariate tests of each health-related quality-of-life scale and each time point can seriously inflate the Type I (false-positive) error rate. One solution proposed for handling the issue of multiple comparisons is to specify a priori three or fewer key end points for statistical testing and analysis. The use of summary measures (global statistics) such as area under the curve and O'Brien's method [12] should be considered when appropriate. The analyses of the remaining scales or time points, which do not require an adjustment for multiplicity, can be designated as secondary and exploratory or can be presented descriptively and graphically. An alternative method to address the problem of multiple comparisons is to use one of the accepted multiple-comparison methods for controlling the experimental error rate. These include step-down procedures, global or composite assessment measures (e.g., O'Brien's methods), adjustment methods for Type I error (e.g., Holm's sequential rejective Bonferroni procedure), p-value adjustment methods (e.g., Westfall and Young's adjustment), and critical value adjustments (e.g., Tukey, Ciminera, and Heyse's adjustment). The relevance and use of multiple comparison methods in different scenarios are documented elsewhere [13–18].
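
As an illustration of one of the methods named above, the following is a minimal sketch of Holm's sequentially rejective Bonferroni procedure applied to p-values from several HRQL domains. The function name `holm_reject` and the p-values are hypothetical and serve only to show how the step-down logic works.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns a list of booleans, in the original order of `p_values`,
    indicating which hypotheses are rejected while controlling the
    familywise Type I error rate at `alpha`.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The smallest p-value is tested at alpha/m, the next at alpha/(m-1), ...
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject

# Hypothetical p-values for three pre-specified HRQL domains.
domain_p = [0.010, 0.040, 0.030]
decisions = holm_reject(domain_p)
```

In this invented example all three p-values are below 0.05 when each domain is tested on its own, yet only the first survives the adjustment. This is the kind of inflation of false-positive findings that pre-specifying a small number of key end points is meant to avoid.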

What are acceptable methods to deal with potential biases from missing data (unit nonresponse)?

Missing data may involve data that are missing for certain patients or items that are missing within a domain or scale. For missing HRQL data from patients, several recommendations were given: Prospectively document reasons for missing data, collect covariates to help analyze the missing data pattern, conduct sensitivity analyses with different models or strategies, and pre-specify the use of existing methods such as complete case analysis, available case analysis, summary measures, maximum likelihood-based approaches, or imputation approaches. For missing data within a domain or scale, the data analysis plan should address how measures will be scored in the event of missing data.

The main cause for concern regarding missing data is the potential for bias. Bias affects the interpretation of data, and the results of the trial become questionable. In the case of unit nonresponse, where all items on a scale are missing for a subject, HRQL data should be analyzed in different ways (sensitivity analyses) and confident conclusions made only when consistency is achieved. Several methods exist for unit nonresponse when assessments are collected over time. One method [15] involves a description of individuals with nonmissing data and missing data at each time assessment, along with the reasons for missing data. It is advisable to collect additional covariates such as subjects' survival status, disease status, and number of completed assessments. This information can be combined with a graphical representation of subjects with different numbers of completed assessments to observe whether patterns of change over time are comparable.

Two methods have been routinely applied to analyze missing data: in a complete case analysis, only subjects with data at all relevant time assessments are analyzed; in an available case analysis, subjects with data on a certain variable at any assessment form the cohort analyzed. These two methods are best used when cases left out of the analysis can be assumed to be similar to those included; i.e., cases are missing completely at random. Another method to be considered reduces each subject's data over time to a single summary measure (e.g., the mean) that reflects some important aspect of response. Maximum likelihood-based statistical approaches to the analysis of missing data, which use all the available data, should be considered as well. These approaches include analysis of repeated measures using mixed-model analysis of variance that incorporates noninformative missing data (i.e., missing at random) and allows for comparisons of treatment by time interactions. For missing data that are informative (i.e., not missing at random), selection models and pattern-mixture models are two recommended techniques when there is evidence that subjects have stopped filling out quality-of-life forms because of side effects, disease progression, or therapeutic effectiveness. Methods of imputation, which may be used for cross-sectional and time-series data, may also be valuable. These methods include mean imputation, last observation carried forward, regression imputation, hot deck imputation, and multiple imputation, of which the last two are likely to show the most promise. Imputing the missing values completes the data set, which in many cases can then be analyzed using standard statistical methods (e.g., analysis of covariance). Determination of what methods are appropriate or acceptable depends on distinguishing the pattern of missing data and identifying the mechanism that generated the missing data. The continued development of easy-to-use software was encouraged to implement these techniques and examine their suitability.
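The difference between a complete case analysis, an available case analysis, and last observation carried forward can be illustrated with a small sketch. The data, subject labels, and function names below are hypothetical and chosen only for illustration; the sketch is not a recommended analysis, and the simple methods shown are defensible only when data are missing completely at random.

```python
from statistics import mean

# Hypothetical HRQL scores for four subjects at three assessments.
# None marks a missed assessment (unit nonresponse).
scores = {
    "s1": [50.0, 55.0, 60.0],
    "s2": [40.0, None, 50.0],
    "s3": [60.0, 62.0, None],
    "s4": [45.0, 50.0, 55.0],
}

def complete_case_means(data):
    """Mean at each assessment using only subjects with no missing assessments."""
    complete = [v for v in data.values() if all(x is not None for x in v)]
    return [mean(list(col)) for col in zip(*complete)]

def available_case_means(data):
    """Mean at each assessment using every subject observed at that assessment."""
    return [
        mean([x for x in col if x is not None])
        for col in zip(*data.values())
    ]

def locf(values):
    """Last observation carried forward; leading gaps stay missing."""
    filled, last = [], None
    for x in values:
        if x is not None:
            last = x
        filled.append(last)
    return filled

print(complete_case_means(scores))   # uses s1 and s4 only: [47.5, 52.5, 57.5]
print(available_case_means(scores))  # uses all observed values at each assessment
print(locf(scores["s3"]))            # [60.0, 62.0, 62.0]
```

Comparing the output of the two sets of means is one simple form of the sensitivity analysis described above: if the conclusions differ, the missing data mechanism deserves closer study.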

In situations where data on specific items within a domain or scale is missing because of patient nonresponse, the data analysis plan should a priori specify how the domains or scales will be scored if items are missing. As an example, if there are five items in a scale, the data analysis plan should indicate that a minimum of three item responses are required to calculate a scale score.
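A prespecified scoring rule of this kind might be sketched as follows. The function name is hypothetical, and scoring the scale as the mean of the answered items is an assumption made for illustration; the text specifies only that a minimum number of item responses be required.

```python
def score_scale(responses, min_items=3):
    """Score a scale from item responses, where None marks a skipped item.

    Returns the mean of the answered items (an illustrative scoring choice),
    or None when fewer than `min_items` items were answered.
    """
    answered = [r for r in responses if r is not None]
    if len(answered) < min_items:
        return None  # too few items answered: scale score is treated as missing
    return sum(answered) / len(answered)

print(score_scale([4, None, 3, 5, None]))     # three of five answered: 4.0
print(score_scale([4, None, None, 5, None]))  # two of five answered: None
```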

More detailed information about the suitability of different methods for handling missing data in different contexts is documented elsewhere [16].

What approaches should be used for the analysis of longitudinal HRQL data?

Appropriate methods for analysis of longitudinal data exist and should be described a priori.

The choice of an appropriate method for analysis of longitudinal HRQL data depends, in part, on the nature of the missing data and the number of multiple end points and comparisons. Two basic approaches were considered for analysis of longitudinal data. The first was analysis of variance of repeated measures, which includes multiple univariate analyses and multivariate analyses. Multiple univariate analyses involve an analysis at each time point or at specified time points and include procedures such as a t-test, an analysis of variance, or a Wilcoxon rank-sum test. These analyses should be considered when the number of tests is small and interest centers on a particular time assessment. Multivariate analyses capture the richness of longitudinal data over repeated time assessments. They include multivariate analysis of variance when the proportion of missing data is small. They also include growth-curve modeling and mixed-model analysis of variance, to be considered when data are missing at random, and pattern-mixture models and selection models, to be considered when data are not missing at random.

The second approach to analyzing longitudinal data, which reduces the problems of missing data and multiple testing [17], is to calculate a single summary measure, such as the rate of change or area under the curve, for each subject. Often, these summary measures can then be analyzed by simple statistical techniques such as analysis of variance as though they were raw data. The statistical analysis plan, completed before the data analysis phase of a study, should include the analytic procedures to be used in handling the longitudinal structure of the HRQL data. Appropriate methods for longitudinal analysis exist and depend on the research questions of interest as well as on the influence of missing data and multiple testing.
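The two summary measures named above can be sketched for a single subject as follows. The function names, the assessment times, and the scores are hypothetical; the trapezoidal rule and an ordinary least-squares slope are common choices assumed here for illustration, not methods prescribed by the work group.

```python
def area_under_curve(times, scores):
    """Trapezoidal area under one subject's score-versus-time curve."""
    return sum(
        (t1 - t0) * (y0 + y1) / 2.0
        for t0, t1, y0, y1 in zip(times, times[1:], scores, scores[1:])
    )

def rate_of_change(times, scores):
    """Ordinary least-squares slope of score on time for one subject."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(scores) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, scores))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx

# Hypothetical subject assessed at weeks 0, 4, 8, and 12.
weeks = [0, 4, 8, 12]
hrql = [50, 54, 58, 62]
print(area_under_curve(weeks, hrql))  # 672.0
print(rate_of_change(weeks, hrql))    # 1.0 point per week
```

Each subject contributes one number to the subsequent analysis. When a subject has missed assessments, only the observed time points would be passed in, and the area under the curve then depends on the span of time actually observed, which is one reason the handling of missing data still needs to be specified in advance.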

In summary, the same standards of statistical analysis used for clinical measures also apply to HRQL measures with respect to the three key analytic issues: multiple comparisons, missing data, and longitudinal analysis. For each issue, the choice of an acceptable method depends on the situation. This section prioritized the analytic issues and highlighted potential solutions. Although it is beyond the scope of this summary report to detail the methods themselves, an experienced statistician can explain the advantages and disadvantages of each method in a given situation. The references cited in this section [12–18], along with their citations and other reports [19], can provide further guidance.

Work Group 4: Interpretation of Clinical Meaning in Health-Related Quality of Life

Purpose: to discuss issues related to definition and calculation of clinically meaningful changes and differences in health-related quality-of-life measures within clinical trials of pharmaceutical therapies.

The group identified approximately 20 issues relevant to the topic of how to interpret HRQL scores. Two issues of primary importance were chosen and discussed in depth: 1) whether the interpretation of HRQL should be different from that of any other subjective clinical measure; and 2) the methodological flaws that have been shown in the current research methods used to estimate minimal important difference [20].

Should the interpretation of HRQL be different from any other subjective clinical measure?

HRQL measures should be regarded as equivalent to any other clinical end point when determining what constitutes a clinically important change from baseline or difference between treatments. Statistical significance should be used until methodological approaches for minimal important difference are developed and accepted.

The interpretation of HRQL has received attention as the use of HRQL measures in clinical trials has grown. Currently, minimal important difference is relevant for HRQL interpretations because the FDA would like to know how to judge changes due to drug treatment within groups and/or differences observed between groups. The work group acknowledged that it is important to be able to provide some assistance with the interpretation of HRQL measures. Consumers and health-care decision makers need to understand the meaning of change in clinical measures and in HRQL measures to make informed decisions about treatment. Furthermore, it was generally agreed that the interpretative methods for HRQL are not well defined, although they are probably no worse off than those for many other subjective, or in some cases objective, clinical measures. Currently, what constitutes either a clinically important change from baseline or a clinically important difference from placebo for many traditional clinical measures is not well understood except when related to a life-threatening condition. One definition might be that the clinically important change is that magnitude of change or difference from placebo required for physicians to prescribe a drug to patients. However, this magnitude may vary depending on the type of physician and on his or her interpretation of the change or difference as being important. HRQL measures are more challenging to interpret than some clinical measures because there is no standard metric for all HRQL instruments, unlike, for example, those that measure blood pressure and report results in mmHg or measure cholesterol and report results in mg/dl. Additionally, more than 40 years of epidemiological data demonstrate the impact of treating blood pressure and/or cholesterol on morbidity and mortality end points and assist in understanding the implications of a change of 3 mmHg in the diastolic blood pressure or of 5 mg/dl in free cholesterol for patients as a group. In contrast, measurement standards and metrics for HRQL measures vary by instrument, and the meaning of improving the score on a group mean scale by 0.5 or by 10% is not well understood.

In an attempt to address the issue of how to interpret a certain level of change in an HRQL instrument, some HRQL instrument developers have published the estimated minimal (clinical) important difference for their instrument. Examples are the Asthma Quality of Life Questionnaire of Juniper et al. [21,22] (anchoring to a global question of change for an uncontrolled asthma group that experienced change) and the Short Form 36 Health Survey (providing normative values for different populations and diseases) [23]. Others have adapted these methods and estimated minimal important difference using various methodological approaches for existing instruments. This is appealing from a pragmatic perspective because it allows easy determination of whether the change is meaningful. However, in some cases the FDA has rigidly adhered to these published cut points; e.g., if the estimated minimal important difference is 0.5, a change of 0.49 is not considered relevant. This rigidity may lead to inappropriate interpretation of the data and implies the cut points have no variance. In other cases, a cut point calculated as a change from baseline has been interpreted as meaning a difference between groups or a difference from placebo, an interpretation that is not supported by the methodology used to determine the cut point. The appropriateness of using specific cut points was questioned. One option would be to recognize that the cut points are associated with a certain degree of variance and that a certain degree of flexibility with respect to those cut points is justified. The minimal important difference may also differ depending on the starting point of a scale. An instrument for benign prostatic hypertrophy was used as an example wherein the minimal important difference depended on the patient's global classification of mild, moderate, or severe disease.

There is also a certain degree of confusion with respect to the minimal important difference for group differences versus individual differences. A difference of less than one point for a group may be considered unimportant, whereas the smallest amount of clinically meaningful change on a Likert-type scale for an individual would need to be at least one point. For diseases where the patient's perception of change in HRQL can be measured, each individual patient decides if a therapy is providing a clinically meaningful change within the context of efficacy, safety, and cost for him/herself by either continuing on therapy or discontinuing therapy.

Inconsistent handling of HRQL data relative to other measures of clinical effectiveness creates confusion regarding where HRQL as a measure stands relative to other measures and how to interpret the data.

A change in HRQL can be a component of a risk-to-benefit decision that would also include safety, efficacy, and cost. If one measures efficacy alone, then any improvement is good. The work group recommended that the risk-to-benefit ratio for a drug be analyzed. HRQL results also need to be understood within the context of other clinical trial results. Other measures of efficacy and safety should be consistent with what is found with an HRQL instrument.

The work group's recommendation is to treat HRQL as any other clinical end point to determine what constitutes a clinically important change or difference between groups. That is, statistical significance should be used until good methodological approaches for minimal important difference are established. Another option would be to use confidence intervals as a means of expressing results.
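As an illustration of the confidence-interval option, the sketch below computes an approximate interval for the difference in mean change scores between two groups. The function name and the data are hypothetical. The unpooled standard error and the normal approximation are assumptions made to keep the example short; for small samples a t-based interval would normally be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def mean_difference_ci(treated, control, level=0.95):
    """Approximate confidence interval for the difference in group means.

    Uses an unpooled standard error and a normal quantile (an assumption
    for illustration). Returns (difference, lower limit, upper limit).
    """
    diff = mean(treated) - mean(control)
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return diff, diff - z * se, diff + z * se

# Hypothetical change-from-baseline scores on a single HRQL domain.
treated = [0.9, 0.4, 0.7, 0.2, 0.8, 0.6]
control = [0.1, 0.3, -0.2, 0.4, 0.0, 0.2]
diff, lower, upper = mean_difference_ci(treated, control)
print(f"difference {diff:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Reporting the interval shows the range of differences compatible with the data, rather than only whether a single cut point was reached.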

The current research methods to estimate minimally important clinical difference have been demonstrated to have methodological flaws.

A multidisciplinary research group should be mandated to develop a research process to establish scientifically sound and widely accepted methods for determining what constitutes a minimal important difference for clinical measures.

The second issue, which is closely related to the first issue regarding HRQL interpretation, is that current research methods to estimate minimal important difference have been shown to suffer from certain methodological flaws. It is apparent that the current terminology requires consistency and understanding. The group felt that while the terms are used at times interchangeably, minimal important difference (MID), minimal perceptible difference (MPD), and minimal important change (MIC) all had different meanings. There is a lack of obvious consensus with respect to the methodological issues involved in determining clinically relevant change or difference. Recent approaches that use retrospective global assessments as an anchor are known to have significant limitations.

The impact of this current state of affairs is that decision makers are left with inaccurate estimates of MID. This makes it confusing to interpret HRQL results for drugs that fail to meet the MID but for which statistically significant clinical differences were found in comparison with placebo or an active comparator. The desired state would be to achieve a sound, scientifically credible research process to estimate and interpret the MID.

The work group's recommendation is to establish a multidisciplinary research group to develop a research process and be responsible for defining and clarifying the terminology and establishing a research agenda. A list of issues dealing with the interpretation of clinical meaning was generated for this multidisciplinary research group to resolve, and includes the following:

  • 1
    Define the perspective for minimal important difference—patient, clinician, or third party.
  • 2
    Determine scientifically sound and widely accepted methods for determining MID.
  • 3
    Determine whether a gold standard should be used to estimate MID. It was concluded that use of a single-item global measure as a gold standard would be fraught with psychometric problems. Topics of discussion should include what alternative measures could be used as benchmarks in lieu of a global measure and defining an MID for other clinical parameters, including physiological measures and symptoms.
  • 4
    Determine whether MID should be measured from baseline to study end within a group or between two groups at study end.
  • 5
    Determine how the relevance of change in HRQL fits into the context of change in the other study measures. How consistent does HRQL need to be with the primary end point if it is a secondary end point, and vice versa?
  • 6
    Determine the clinical relevance of maintaining HRQL, or of the attenuation of decline in HRQL. How should lack of change be put in perspective? Is there a “negative” minimal important difference such that no change is actually useful?
  • 7
    Determine whether global recommendations can be made regarding MID given disease differences.
  • 8
    Terminology suggestions: minimal perceptible difference = patient perspective versus clinically relevant change = physician perspective.

Given the lack of consensus regarding a research approach and the methodological flaws in what is currently accepted, the work group recommended that no absolute standard for measuring MID be imposed until additional research has been completed. One option discussed, but not adopted, was to use effect size until consistency can be reached. Another option discussed was to use case-by-case evaluation. Because this seems to be the current modus operandi, it did not seem appropriate to list it as a recommendation.
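For reference, the effect-size option that was discussed can be sketched as a standardized mean difference. The function name and data are hypothetical, and dividing by the pooled standard deviation is one common convention assumed here; other denominators, such as the baseline standard deviation, are also used in practice.

```python
from math import sqrt
from statistics import mean, variance

def standardized_effect_size(treated, control):
    """Difference in group means divided by the pooled standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled_var = (
        (n1 - 1) * variance(treated) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)

# Hypothetical scores: the means differ by 1 point and each group has SD 2.
print(standardized_effect_size([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))  # 0.5
```

Because the result is expressed in standard deviation units, it can be compared across instruments that use different metrics, although it says nothing by itself about whether patients or clinicians regard the difference as important.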

Work Group 5: Promotional Issues

Purpose: to discuss promotional use of health-related quality-of-life measures from clinical trials of pharmaceutical therapies.

This workshop was divided into two groups based on the interest manifested by workshop participants to discuss promotional issues related to HRQL. The two groups focused on four major issues:

  • 1
    Should HRQL results appear in approved labeling before being used in promotion, or is HRQL promotion possible with two well-controlled trials without mention in the label?
  • 2
    Should positive, neutral, and negative findings all be presented in advertising and promotion, or only positive ones?
  • 3
    What is the minimum disclosure regarding health-related quality-of-life claims in promotion?
  • 4
    Should declared versus undeclared hypotheses be disclosed in promotion?

Should HRQL results appear in approved labeling before being used in promotion, or is HRQL promotion possible with two well-controlled trials?

HRQL promotion should be based on substantial evidence, consistent with existing promotional guidelines.

Currently, inclusion in the label or “substantial evidence” enables promotion. Both work groups agreed that promotion should be allowed if HRQL is in the label or supported by substantial evidence, consistent with promotion of other efficacy parameters. One work group suggested that the FDA use an outside advisory panel to conduct the reviews.

Should positive, neutral, and negative findings be presented in advertising and promotion?

Full results should be presented with fair balance to positive, neutral, and negative findings and should be consistent with pre-specified data-analysis plans.

The second issue concerns the presentation of findings in promotion. The current state is not defined; thus, interpretation can be problematic. The recommendation is to present the full range of results with fair balance and transparency. For example, if a trial found that three of five domains of a particular HRQL measure were positive and two were neutral, promotion should clearly state both the positive and neutral findings. However, a priori hypotheses and operational definitions should be provided to demonstrate that the findings on specific domains were expected.

What is the minimum disclosure concerning health-related quality-of-life claims in promotion?

The HRQL instrument used needs to be disclosed and supported scientifically. Currently, there is confusion in this area regarding the type of instrument, representation of domain, effect size, and manuscript citation. As a result there is a lack of understanding of what health-related quality of life really means. The work group supported the need for clarity and minimum standards and recommended that promotions report specifics concerning the measurement characteristics of the HRQL instrument.

Should declared versus undeclared hypotheses be disclosed in promotion?

Hypotheses should be stated in advance and linked to domains where feasible.

The last major issue concerns the use of hypotheses. The work group noted that in the current state the practice is variable. What is desired are clear hypotheses, consistent policy between divisions, and clarity of direction with regard to the objectives of HRQL research. The group recommended that HRQL researchers follow a scientific approach by defining hypotheses that are linked to domains and instruments.

Conclusions

A workshop format was utilized to generate regulatory issues for health-related quality-of-life (HRQL) research. Each work group explored and listed numerous regulatory issues for HRQL research. It was beyond the scope of the work groups to attempt to address every issue generated in detail because of time limitations. Therefore, facilitators and participants were asked to prioritize and limit the work-group dialogue to a few key issues for more in-depth discussions and recommendations. The work-group discussion included the current state for each prioritized issue, the impact of the current state, the desired state, and recommendations to achieve that desired state.

The work groups recommended that HRQL measures and research be handled in a manner similar to more traditional clinical measures and research. In fact, it was noted that HRQL measures are often developed and validated with more rigor than many traditional clinical measures. However, it was also acknowledged that physicians and researchers have less familiarity and experience with HRQL measures than with clinical measures. The work groups anticipate that as experience is gained with these measures and methods, greater acceptance by physicians, regulators, managed-care organizations, and other users of health-care information will be achieved. Guidance on the regulatory use of health-related quality-of-life results could be helpful to the industry in overcoming some of the current uncertainty concerning HRQL claims.

Consensus among members of the planning committee was reached before the workshop that HRQL measures were not developed for, nor should they be used for, detection of safety problems. Individual questions or overall scores may be sensitive to and informative about safety issues; however, they reflect events and effects without attribution and as such should not be required to be reviewed or individually reported for “signals” of safety problems. It was agreed that use of HRQL measures in clinical research should require the inclusion of all customary protocol-driven protections and monitoring of safety, as is done with any other effectiveness measure. Furthermore, excluding questions about specific adverse events from an instrument such as the Sickness Impact Profile, which asks about suicidal ideation, would invalidate the questionnaire or, at a minimum, the domain. Therefore, in most cases the complete validated measure should be used.

Why is it important to provide guidance on how health-related quality-of-life research is conducted? Industry is interested in providing useful information about the actual impact of pharmaceutical treatments on patients and their lives, both in drug labeling and in promotion. Health-related quality-of-life measures may provide more relevant patient-level information to third-party payers, physicians, and patients than traditional clinically oriented measures that primarily address anatomic or physiologic aspects of disease and treatment. Currently, inclusion of HRQL information in drug labels and in promotion is limited because of concerns about how the research has been conducted. The work groups clearly advised that any recommendations regarding HRQL measures and research should provide a guide rather than a prescription on how to conduct research or which HRQL instruments are acceptable, and they recommended that HRQL measures be treated in the same way as any other clinical end point. Additionally, the work groups recognized that research in HRQL methods is ongoing and that any guidance should be flexible enough to allow for changes in the field. Although members of the FDA participated in the organization of the meeting and in the work groups, it is important that any recommendations that resulted from this workshop not be interpreted as FDA policy, but rather, as a consensus on each issue reached by members of the work groups.

References

  • 1
    Wilson IB & Cleary PD. Linking clinical variables with health-related quality of life: a conceptual model of patient outcomes. JAMA 1995;273: 59–65.
  • 2
    Burke L. Development of health-related quality of life and other guidance for labeling and promotion of FDA-regulated medical products. Value Health 2001;4: 5–11.
  • 3
    Testa MA & Simonson DC. Assessment of quality of life outcomes. New Engl J Med 1996;334: 835–40.
  • 4
    Ware JE, Brook RH, Davies AR, et al. Choosing measures of health status for individuals in general population. Am J Public Health 1981;71: 620–5.
  • 5
    Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Control Clin Trials 1991;12: 142S–58S.
  • 6
    Gill TM & Feinstein AR. A critical appraisal of the quality of quality-of-life measurements. JAMA 1994;272: 619–26.
  • 7
    Guyatt GH, Veldhuyzen Van Zanten SJO, Feeny DH, et al. Measuring quality of life in clinical trials: a taxonomy and review. CMAJ 1989;140: 1441–8.
  • 8
    Guyatt GH, Feeny DH, Patrick DL. Measuring health-related quality of life. Ann Intern Med 1993;118: 622–9.
  • 9
    Guyatt GH, Kirshner B, Jaeschke R. Measuring health status: What are the necessary measurement properties? J Clin Epidemiol 1992;45: 1341–5.
  • 10
    Williams JI & Naylor CD. Dissent. How should health status measures be assessed? Cautionary notes on procrustean frameworks. J Clin Epidemiol 1992;45: 1347–51.
  • 11
    Guillemin F, Bombardier C, Beaton D. Cross cultural adaptation of health-related quality of life measures: literature review and proposed guidelines. J Clin Epidemiol 1993;46: 1417–32.
  • 12
    O'Brien PC. Procedures for comparing samples with multiple endpoints. Biometrics 1984;40: 1079–87.
  • 13
    Zhang J, Quan H, Ng J, et al. Some statistical methods for multiple endpoints in clinical trials. Control Clin Trials 1997;18: 204–21.
  • 14
    Hochberg Y & Tamhane AC. Multiple Comparison Procedures. New York: John Wiley and Sons, 1987.
  • 15
    Hsu JC. Multiple Comparisons. Theory and Methods. New York: Chapman & Hall, 1996.
  • 16
    Fairclough DL & Gelber RD. Quality of life: statistical issues and analysis. In: Spilker B, ed. Quality of Life and Pharmacoeconomics in Clinical Trials (2nd ed.). Philadelphia: Lippincott-Raven Publishers, 1996.
  • 17
    Curran D, Fayers PM, Molenberghs G, et al. Analysis of incomplete quality of life data in clinical trials. In: Staquet MJ, Hays RD, Fayers PM, eds. Quality of Life Assessment in Clinical Trials: Methods and Practice. New York: Oxford University Press, 1998.
  • 18
    Fairclough DL. Methods of analysis for longitudinal studies of health-related quality of life. In: Staquet MJ, Hays RD, Fayers PM, eds. Quality of Life Assessment in Clinical Trials: Methods and Practice. New York: Oxford University Press, 1998.
  • 19
    Leidy NK, Revicki DA, Geneste B. Recommendations for Evaluating the Validity of Quality of Life Claims for Labeling and Promotion. Value Health 1999;2: 113–27.
  • 20
    Norman GR, Stratford P, Regehr G. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol 1997;50: 869–79.
  • 21
    Juniper EF, Guyatt GH, Willan A, et al. Determining a minimal important change in a disease-specific quality of life questionnaire. J Clin Epidemiol 1994;47: 81–7.
  • 22
    Guyatt GH, Kirshner B, Jaeschke R. Response. A methodologic framework for health status measures: Clarity or oversimplification? J Clin Epidemiol 1992;45: 1353–5.
  • 23
    McHorney CA, Kosinski M, Ware JE Jr. Comparisons of the costs and quality of norms for the SF-36. Med Care 1994;32: 551–67.