Establishing the Cost-Effectiveness of New Pharmaceuticals under Conditions of Uncertainty—When Is There Sufficient Evidence?


Mark Sculpher, Centre for Health Economics, University of York, Heslington, York, YO10 5AY, UK. E-mail:


Decisions about which health-care interventions represent adequate value to collectively funded health-care systems are as widespread as they are unavoidable. In the case of new pharmaceuticals, many countries now require formal cost-effectiveness analysis to inform this decision-making process. This requires evidence on parameters associated with health-related utilities, treatment effects, resource use, and costs, for which data from available regulatory trials are invariably absent or highly uncertain. This uncertainty results from a number of factors including the predominance of intermediate end points in the clinical evidence-base and the limited period of follow-up of patients in clinical studies. Despite these imperfections in the evidence base, decisions about whether new pharmaceuticals are sufficiently cost-effective for reimbursement cannot be side-stepped. Data limitations do, however, require the use of rigorous analytical methods to support decision making. Probabilistic decision models and value of information analysis offer a means of structuring decision problems, synthesizing all available data, characterizing the uncertainty in the decision, quantifying the cost of uncertainty, and establishing the expected value of perfect information. This analytical framework is important because it addresses two fundamental questions about new pharmaceuticals. First, is the product expected to be cost-effective on the basis of existing evidence? Second, is additional research concerning the product itself cost-effective? In addressing these questions, the analytical framework can establish when sufficient evidence exists to sustain a claim for a new pharmaceutical to be cost-effective.


It is now widely recognized among decision-makers that value for money represents a key criterion in deciding which health-care interventions should be made available in collectively funded health-care systems. This applies as much to national systems, such as those in western Europe, as to the more fragmented arrangements in the United States. A clear manifestation of this trend is the policy, which now exists in many jurisdictions, of establishing which new pharmaceutical products represent sufficient value for money to justify reimbursement from those systems’ limited budgets. A growing number of jurisdictions formally use cost-effectiveness evidence as part of their reimbursement decision making [1]. Many more systems, including managed care organizations in the United States, consider economic evidence in making decisions and have described their preferred forms of analysis [2].

The use of economic analysis to support decision making poses some very fundamental questions about the most appropriate methods with which to assess the claims of cost-effectiveness for a new pharmaceutical. One of these issues relates to how decisions should be taken under conditions of uncertainty regarding the evidence-base for a particular product. A cost-effectiveness analysis needs to incorporate evidence on numerous parameters relating to treatment effects, health-related preferences (utilities), resource use, and costs. Nevertheless, ideal data will never exist for all of these parameters. As a result of the way evidence accumulates for pharmaceuticals, major gaps will inevitably exist in a product's evidence-base. This is because the primary objective of the research and development undertaken by pharmaceutical manufacturers is to run trials to generate appropriate efficacy and safety data to obtain a product license. Nevertheless, most reimbursement authorities will require evidence relating to the cost-effectiveness of the product before its widespread use in routine practice.

The evidence gaps which result from this process include limited, or highly imprecise, estimates of nonclinical parameters such as resource use and health utilities; Phase 3 trials being undertaken in a range of countries, raising questions regarding how transferable the data are to particular jurisdictions; and treatment effect estimates not relating to comparisons with all relevant existing treatments. Arguably the most widespread data limitations relate to the focus of Phase 3 trials on intermediate estimates of effect rather than ultimate measures of health gain, and the short duration of observation of treatment effect and outcomes that such trials typically afford.

In the face of such uncertainties in the evidence-base, one option is for decisions about reimbursement to be delayed until the data gaps are filled. Of course, this amounts to a de facto decision, namely that the new product is not as cost-effective as one or more existing treatments and should not be reimbursed. There are several arguments against this refusal to consider the cost-effectiveness of new products until “ideal” evidence is available. The first is that such evidence will be difficult to generate unless some form of reimbursement is agreed, even if limited to the context of research. This could be addressed by permitting some form of conditional reimbursement for a limited period until the data are generated. Nevertheless, in order for this to be successful, a second problem would have to be addressed. Namely, the nature and extent of the additional evidence that is needed will depend on how limitations in the evidence base translate into uncertainty in the cost-effectiveness of a treatment; for example, the imprecision of a resource use estimate may or may not lead to uncertainty over the appropriate reimbursement decision. A third, and related, problem is that the process of generating additional evidence draws on a pool of limited resources, and it may be inefficient to require that they are used for research that would not be expected to lead to a change in decision compared with one based on existing data. Finally, the concept of an “ideal” evidence base is itself a chimera: whatever the value of resources devoted to research, it will never remove all uncertainty in the reimbursement decision.

If decisions about the cost-effectiveness of new products are inevitably taken under conditions of uncertainty, there needs to be a clear analytical framework within which to address two related questions. First, does existing imperfect evidence suggest that an intervention is, in terms of mean costs and effects, cost-effective? Second, given the uncertainty associated with answering the first question, is additional research, to allow the decision to be revisited in the future, itself efficient? This article argues that this analytical framework should be based around probabilistic decision modeling and value of information analysis. The second section of the article considers what features an analytical framework should have to address the first question above (the adoption decision based on current evidence). Using a case study from the technology appraisal program of the National Institute for Health and Clinical Excellence (NICE) in the UK, the third section describes the elements of the analytical framework necessary to address the second question (the cost-effectiveness of additional research). The article then considers in more detail the types of uncertainty that need to be adequately handled using these methods. The next section identifies some of the policy implications of using this framework, and offers some conclusions.

Analytical Framework for the Adoption Decision

Given that the evidence base associated with new pharmaceuticals will always have weaknesses and limitations, the adoption decision requires an analytical framework which is explicit in its handling of uncertainty [3]. This framework will inevitably have to pull data together from a range of sources, as no single study is likely to provide estimates of all relevant parameters. Decision models provide a structure for evidence synthesis within which numerous sources of evidence can be brought together to estimate the differential costs (the value of resources consumed) and effects (defined by the system's objectives, e.g., health gain) of the options under comparison. Probabilistic modeling allows the uncertainty in individual parameters, estimated from available evidence, to be fully characterized as random variables which are propagated through the model using second-order Monte Carlo simulation [4]. Expected cost-effectiveness can be taken from the resulting distribution of costs and effects: that is, a measure of cost-effectiveness, such as an incremental cost-effectiveness ratio (ICER), based on mean differential costs and benefits between the options under comparison. It should be emphasized that expected cost-effectiveness is not derived by calculating a distribution of ICERs based on the Monte Carlo simulation [5]. Although estimated with uncertainty, mean cost-effectiveness represents the best estimate of cost-effectiveness.
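The distinction between an ICER based on mean differential costs and effects and an average of simulated ICERs can be illustrated with a minimal sketch. The distributions and their parameters below are hypothetical and are not taken from any model discussed in this article; they serve only to show the calculation.

```python
import random
import statistics


def simulate_increments(n_draws=10_000, seed=1):
    """Draw hypothetical incremental costs and QALYs (new treatment vs. comparator)."""
    rng = random.Random(seed)
    d_cost = [rng.gauss(500.0, 150.0) for _ in range(n_draws)]  # assumed values
    d_qaly = [rng.gauss(0.025, 0.02) for _ in range(n_draws)]   # assumed values
    return d_cost, d_qaly


def expected_icer(d_cost, d_qaly):
    """ICER based on mean differential costs and mean differential effects."""
    return statistics.fmean(d_cost) / statistics.fmean(d_qaly)


d_cost, d_qaly = simulate_increments()
icer = expected_icer(d_cost, d_qaly)

# Averaging per-draw ratios is unstable: some draws have effects close to zero
# or negative, so individual ratios can be huge or change sign.
mean_of_ratios = statistics.fmean(c / q for c, q in zip(d_cost, d_qaly))

print(f"ICER from mean costs and effects: {icer:,.0f} per QALY")
print(f"Mean of per-draw ratios (not a valid summary): {mean_of_ratios:,.0f}")
```

Re-running with a different seed typically leaves the first figure almost unchanged while the second moves erratically, which is one practical reason for working with mean costs and effects.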

Given a decision-maker's objective of maximizing health gain from available resources, the decision about reimbursement based on existing evidence should then be based on how the expected ICER compares with the decision maker's threshold willingness to pay for additional health benefits which, in resource constrained systems, should reflect the costs and benefits of interventions which are displaced by the new treatment (i.e., the shadow price of the budget constraint).

This analytical framework also needs to incorporate some other key features. The first is the need to compare the new intervention to the full range of alternative treatments which currently constitute routine clinical practice. This may include nonpharmaceutical interventions, watchful waiting and palliative/supportive care. This contrasts with the practice of simply adopting the comparator which was used in the regulatory trials, which is unlikely to represent the full picture of current practice in the jurisdiction of interest. The second feature is the need to take a consistent perspective on costs. There are strong arguments in favor of a societal cost perspective for cost-effectiveness analysis [6], although many reimbursement authorities focus on a health system perspective given that they have no responsibility for altering the total “pot” of resources available for health care [1].

A third important characteristic of the analytical framework used to inform the adoption decision is that there needs to be a clear view of the system's maximand, that is, the measure which the system is trying to maximize from available resources. Although there is on-going debate in the theoretical literature about the most appropriate objective for health-care systems [7], the maximization of health gain has been explicitly adopted by some systems to guide their decisions, including NICE in the UK [8]. The way applied economic evaluation has developed, with its focus on cost-effectiveness analysis, is consistent with health gain being the maximand. A fourth, and related, issue is how health gain should be measured and valued within the analytical framework. A key criterion is the need for reimbursement decisions to use measures which are consistent over time. Given that decisions are taken across a range of disease areas, the measure should be generic. For use in cost-effectiveness analysis, all relevant dimensions of health-related quality of life (HRQL) and mortality should be expressed on a single scale, which will require the use of multiattribute valuation methods. The quality-adjusted life-year (QALY) is the measure of health gain most frequently used in cost-effectiveness analysis. Although it satisfies the generic and single index criteria, the need for consistency across decisions would also require the use of a single approach to describing health states which is sufficiently sensitive to the characteristics of a disease and intervention, and a choice of whose preferences are to be used to value health states. Although most methods guidelines from reimbursement agencies support the use of QALYs, at least as one measure of outcome, with the exception of NICE [9] few have been prescriptive about the descriptive system, the framed perspective, and the source of preferences [1].

Analytical Framework for Decisions about Further Research

Bayesian Decision Theory

The expected cost-effectiveness of an intervention indicates whether reimbursement is warranted given existing evidence. It does not, however, show whether sufficient evidence exists. In other words, whether or not a reimbursement agency agrees to reimburse an intervention, it may ask for additional research in order for the adoption decision to be re-examined in the future. Although the uncertainty in cost-effectiveness should play no direct role in the adoption decision, it is an essential part of establishing whether additional research is required. Therefore, basing the adoption decision on expected cost-effectiveness does not mean these decisions can simply be based on little or poor quality evidence, as long as the decision to conduct further research to support adoption, or rejection, is made simultaneously [10].

For a regulatory agency to establish whether there is sufficient evidence to support the reimbursement of a particular pharmaceutical, a measure of the societal value of particular research is required. An appropriate methodological framework for this would consider the uncertainty surrounding reimbursement in terms of the likelihood of making a “wrong” decision if the technology is adopted based on current evidence. It would also view the value of research as the extent to which further information will reduce this decision uncertainty. The value of the additional information generated by research would be expressed in a way which is consistent with the objectives and the resource constraints of health-care provision. Bayesian decision theory and value of information analysis provide such an analytic framework. These methods have firm foundations in statistical decision theory [11,12], and have been successfully used in other areas of research such as engineering and environmental risk analysis [13–15]. More recently, these methods have been extended to setting priorities in the evaluation of health-care technologies [16,17]. The methods have been applied to different health technologies [18,19] including a series of case studies taken from guidance issued by NICE [20]. They have also been assessed as an input to publicly funded research priority-setting [21].

An Illustrative Example

Background.  The framework outlined above is illustrated using a stylized example which is based on the NICE appraisal of orlistat in the treatment of obesity undertaken in 2001 [22]. The guidance imposed restrictions and conditions on continued use in terms of minimum body mass index and weight loss using dietary control. In addition, it required that the patients should only continue with therapy beyond 3 months if they lose at least 5% of body weight, and only continue beyond 6 months if they lose at least 10% of body weight. The guidance suggests that patients are not expected to be on therapy beyond 12 months.

The appraisal of orlistat and its guidance was based on the independent assessment report [23] which identified 14 randomized controlled trials (RCTs) and two economic models. None of the RCTs measured changes in HRQL or utilities, resource use, regain in weight after cessation of treatment or any longer-term consequences for mortality or morbidity. Clearly, RCT evidence alone was not sufficient to establish the cost-effectiveness of this technology, and there were a number of extrapolations and generalizations which needed to be made. These included establishing how many patients continue with treatment, which affects both outcomes to patients and resource use; how changes in body weight translate into changes in HRQL; and some assessment of the long-term impact on weight, HRQL, and mortality and morbidity after 12 months of treatment.

Decision model.  The structure of this decision problem can be represented as a simple decision tree (Fig. 1). The structure compares orlistat with dietary management and reflects the nature of the NICE guidance outlined above. It involves two chance nodes: the probability of greater than 5% weight loss at 3 months and, if this is achieved, the probability of greater than 10% weight loss at 6 months. The key end points are weight loss compared with dietary control, changes in HRQL given weight loss and the additional cost of 3, 6, and 12 months of treatment. In this simple case study, long-term mortality and morbidity, and the possibility of sustained weight loss, have not been modeled. This is in order, first, to simplify the example and, second, because no evidence seems to exist to support these effects which were not considered credible by NICE. The RCT evidence is a crucial source for parameter estimates for the model, particularly the magnitude of treatment effects (weight loss and probability of weight loss at 3 and 6 months). Other sources of data, however, have been taken from nontrial sources detailed in the assessment report [23]. This applies, for example, to changes in HRQL and the resource implications of weight loss.

Figure 1. Structure of the orlistat model used in the case study.
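As a rough illustration of how the two chance nodes drive resource use, the sketch below rolls back the tree to an expected duration of therapy. It assumes that every patient receives the first 3 months, that only those meeting the 3-month target continue to 6 months, and that only those also meeting the 6-month target continue to 12 months. The function name is hypothetical, and the probabilities are simply the means of the Beta distributions reported in Table 1, used here as point values.

```python
def expected_months_on_treatment(p_5pct_3m, p_10pct_6m):
    """Expected months of therapy per patient under the stopping rules in the tree.

    Assumes all patients receive months 1 to 3, those meeting the 3-month target
    receive months 4 to 6, and those also meeting the 6-month target receive
    months 7 to 12.
    """
    return 3.0 + 3.0 * p_5pct_3m + 6.0 * p_5pct_3m * p_10pct_6m


# Illustrative point values only: means of the Beta distributions in Table 1.
p1 = 286 / (286 + 214)  # probability of 5% weight loss at 3 months
p2 = 170 / (170 + 230)  # probability of 10% weight loss at 6 months, given p1
months = expected_months_on_treatment(p1, p2)
print(f"expected months on treatment: {months:.2f}")
```

In a probabilistic analysis the same calculation would be repeated for each simulated pair of probabilities rather than evaluated once at the means.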

All decisions about the cost-effectiveness of interventions are based on uncertain information. The extent of the evidence available, for each of the inputs, can be reflected in probability distributions assigned to these estimates, where less information and more uncertainty about an input will be reflected in greater variance from the distribution assigned. The quality and exchangeability or relevance of the evidence available may also be represented by linking the uncertain estimate to model inputs through additional uncertain parameters which can represent potential bias or exchangeability. These may be based on evidence of the potential bias of alternative designs or on expert judgment. Without access to patient-level data, these distributions are assigned based on secondary sources (e.g., published literature reviewed in the assessment report [23]) and some judgment about which type of distribution would be appropriate [24]. The parameter estimates, their distributions and data sources are described in Table 1. In general, the quality of this evidence is very low, and the shape of the distributions assigned to these parameters attempts to represent the substantial uncertainty surrounding these estimates. For an overview of the process of selecting and fitting probability distributions in decision models, see Briggs et al. (2002) [4].

Table 1.  Data inputs for the orlistat case study

Input parameter | Distribution used to characterize parameter uncertainty | Source
Probability of 5% weight loss at 3 months | Beta (286, 214) | Meta-analysis of trials with 3-month follow-up (n = 500) [23]
Probability of 10% weight loss at 6 months | Beta (170, 230) | Meta-analysis of trials with 6-month follow-up (n = 500) [23]
Weight loss at 12 months | Normal (95% CI 2.19–3.69) | Meta-analysis of trials with 12-month follow-up (n = 548) [23]
Health value (utility) gain per 10 kg weight loss | Log-normal (95% CI 0.0767–0.26) | [23]
Total costs per annum | Log-normal (95% CI £554–£887) | [23]

CI, confidence interval.
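One way the inputs in Table 1 could be turned into sampling distributions for a Monte Carlo simulation is sketched below. It assumes that each reported 95% CI is a central interval that is symmetric on the natural scale for the normal distribution and on the log scale for the log-normal distributions. That interpretation, and the function and key names, are assumptions made for illustration rather than details taken from the assessment report.

```python
import math
import random

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def normal_from_ci(lo, hi):
    """Return (mean, sd) of a normal distribution with central 95% interval (lo, hi)."""
    return (lo + hi) / 2.0, (hi - lo) / (2.0 * Z95)


def lognormal_from_ci(lo, hi):
    """Return (mu, sigma) on the log scale for a log-normal with central 95% interval (lo, hi)."""
    mu = (math.log(lo) + math.log(hi)) / 2.0
    sigma = (math.log(hi) - math.log(lo)) / (2.0 * Z95)
    return mu, sigma


def draw_inputs(rng):
    """One Monte Carlo draw of the Table 1 inputs (dictionary keys are illustrative)."""
    wl_mean, wl_sd = normal_from_ci(2.19, 3.69)
    u_mu, u_sigma = lognormal_from_ci(0.0767, 0.26)
    c_mu, c_sigma = lognormal_from_ci(554.0, 887.0)
    return {
        "p_5pct_3m": rng.betavariate(286, 214),
        "p_10pct_6m": rng.betavariate(170, 230),
        "weight_loss_12m": rng.gauss(wl_mean, wl_sd),
        "utility_gain_per_10kg": rng.lognormvariate(u_mu, u_sigma),
        "annual_cost": rng.lognormvariate(c_mu, c_sigma),
    }


rng = random.Random(0)
draws = [draw_inputs(rng) for _ in range(1000)]
print(draws[0])
```

Each dictionary is one possible realization of the uncertain inputs; passing every draw through the decision tree yields the simulated costs and effects on which the results below are based.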

Results of the case study—the adoption decision.  The model indicates that orlistat is more effective but more costly than dietary control alone, with an incremental cost per additional QALY of £21,400. Hence, if the decision-maker's threshold willingness to pay is more than £21,400 per QALY, orlistat should be adopted given existing evidence. Nevertheless, there is uncertainty in cost-effectiveness, and this is shown in the cost-effectiveness acceptability curve (CEAC) in Figure 2. Detailed descriptions of the derivation and interpretation of CEACs are available elsewhere [25–28]. In brief, the CEAC shows the proportion of the simulations in which (i.e., the probability that) orlistat is considered cost-effective for a given maximum willingness to pay on the part of the decision maker; that is, the proportion of simulations in which orlistat has an ICER which is less than the maximum willingness to pay. One minus this probability reflects the decision uncertainty around adoption: the probability that, in adopting orlistat on current evidence, a “wrong” decision would have been made. The figure shows that, unless the cost-effectiveness threshold is very high, there will be substantial decision uncertainty surrounding this decision to adopt. For example, at a threshold willingness to pay of £30,000, the probability that orlistat is cost-effective is 0.758, giving an error probability of 0.242. Although this probability is strictly Bayesian, it is possible to interpret it in terms of a conventional (“frequentist”) P value on a one-tailed test of a null hypothesis of no difference in expected cost-effectiveness [26]. As such, this error probability is much greater than the 0.05 or 0.1 conventionally required under traditional rules of inference and statistical significance.

Figure 2. Cost-effectiveness acceptability curve for orlistat in the case study.
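A CEAC of this kind can be computed directly from the simulation output. The sketch below counts, for each threshold, the share of iterations in which the incremental net monetary benefit is positive; when the incremental effect is positive this is the same as counting iterations with an ICER below the threshold. The simulated increments are hypothetical and will not reproduce the orlistat curve.

```python
import random


def ceac(d_cost, d_qaly, thresholds):
    """Probability that the new treatment is cost-effective at each threshold."""
    n = len(d_cost)
    curve = {}
    for lam in thresholds:
        # An iteration favours the new treatment if lam * dQALY - dCost > 0.
        wins = sum(1 for c, q in zip(d_cost, d_qaly) if lam * q - c > 0)
        curve[lam] = wins / n
    return curve


rng = random.Random(2)
d_cost = [rng.gauss(500.0, 150.0) for _ in range(5000)]  # assumed values
d_qaly = [rng.gauss(0.025, 0.02) for _ in range(5000)]   # assumed values
curve = ceac(d_cost, d_qaly, [0, 10_000, 20_000, 30_000, 50_000])
for lam, p in curve.items():
    print(f"threshold {lam:>6}: P(cost-effective) = {p:.3f}, error probability = {1 - p:.3f}")
```

The error probability printed here is one minus the CEAC value, which is the probability of a wrong decision only when the new treatment is the option adopted at that threshold.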

Results of the case study—the decision about further research.  How can this error probability be interpreted? If the wrong decision about adoption is made, there will be costs in terms of health benefits and resources forgone. Therefore, the expected cost of uncertainty is determined jointly by the probability that a decision based on existing information will be wrong and the consequences of a wrong decision. The expected costs of uncertainty can be interpreted as the expected value of perfect information (EVPI) because perfect information can eliminate the possibility of making the wrong decision. This is also the maximum that the health-care system should be willing to pay for additional evidence to inform this decision in the future, and it places an upper bound on the value of conducting further research [10,16].

More formally, EVPI is simply the difference between the payoff with perfect information and the payoff with current information. The payoff can be seen in terms of expected net benefit, for example, expected net monetary benefit which, for a given option, is (expected QALYs × λ) − expected costs, where λ is the decision maker's threshold willingness to pay [5]. More specifically, if there are two alternative interventions (j = 1, 2) with unknown parameters θ then, given the existing evidence, the optimal decision is the intervention that generates the maximum expected net benefit, $\max_j E_\theta \mathrm{NB}(j, \theta)$. This is found by averaging the net benefit of each intervention over all the iterations from the Monte Carlo simulation and taking the maximum of these means, because each iteration represents a possible future realization of the existing uncertainty (a possible value of θ). With perfect information, we would know how the uncertainties would resolve (which value θ will take) before making a decision, and could select the intervention that maximizes the net benefit given a particular value of θ, $\max_j \mathrm{NB}(j, \theta)$. Nevertheless, the true values of θ are unknown; we do not know which value θ will take. Therefore, the expected value of a decision taken with perfect information is found by averaging these maximum net benefits over the distribution of θ, $E_\theta \max_j \mathrm{NB}(j, \theta)$. The EVPI for an individual patient is simply the difference between the expected value of the decision made with perfect information about the uncertain parameters θ, and the expected value of the decision made on the basis of existing evidence:

$$\mathrm{EVPI} = E_\theta \max_j \mathrm{NB}(j, \theta) - \max_j E_\theta \mathrm{NB}(j, \theta) \qquad (1)$$
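Equation 1 translates into a few lines of code once the simulated net benefits are available. The sketch below is a minimal illustration with two options, the comparator's net benefit normalized to zero, and hypothetical distributions for incremental QALYs and costs; none of these values come from the orlistat model.

```python
import random
import statistics


def evpi(nb_draws):
    """Per-patient EVPI from simulated net benefits.

    nb_draws: list of tuples, one per Monte Carlo iteration; element j of each
    tuple is NB(j, theta) for that iteration's value of theta.
    """
    n_options = len(nb_draws[0])
    # Current information: pick the option with the highest mean net benefit.
    value_current = max(
        statistics.fmean(d[j] for d in nb_draws) for j in range(n_options)
    )
    # Perfect information: pick the best option within each iteration, then average.
    value_perfect = statistics.fmean(max(d) for d in nb_draws)
    return value_perfect - value_current


lam = 30_000.0  # assumed threshold willingness to pay per QALY
rng = random.Random(3)
nb_draws = []
for _ in range(10_000):
    d_qaly = rng.gauss(0.025, 0.02)   # assumed incremental QALYs
    d_cost = rng.gauss(500.0, 150.0)  # assumed incremental cost
    nb_draws.append((0.0, lam * d_qaly - d_cost))  # (comparator, new treatment)

print(f"EVPI per patient: {evpi(nb_draws):.2f}")
```

Because the maximum is taken inside the average for the first term and outside it for the second, the result can never be negative.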

This provides the EVPI surrounding the decision problem each time the decision is made, that is, for an individual patient or individual patient episode. Nevertheless, once information is generated to inform the decision for an individual patient or patient episode, it is also available to inform the management of all other current and future patients. The per-patient EVPI therefore needs to be scaled up to the population of current and future patients whose treatment the decision affects. If this “population” EVPI exceeds the expected costs of additional research, then it is potentially cost-effective to conduct further research: current evidence is not sufficient, and additional research should be undertaken.
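One common way to scale a per-patient EVPI to the population is to multiply it by the discounted number of patients expected to face the decision over the period for which the information remains useful. The sketch below follows that formulation; the patient numbers, time horizon, discount rate and research cost are all hypothetical, and the article does not state which values underlie the case study.

```python
def population_evpi(evpi_per_patient, annual_patients, years, discount_rate):
    """Per-patient EVPI scaled by the discounted number of patients over the horizon."""
    effective_population = sum(
        annual_patients / (1.0 + discount_rate) ** t for t in range(1, years + 1)
    )
    return evpi_per_patient * effective_population


# Hypothetical inputs for illustration only.
pop_evpi = population_evpi(
    evpi_per_patient=40.0, annual_patients=5000, years=10, discount_rate=0.035
)
research_cost = 1_000_000.0  # assumed cost of a proposed study
print(f"population EVPI: {pop_evpi:,.0f}")
print("further research potentially worthwhile:", pop_evpi > research_cost)
```

Exceeding the research cost is a necessary condition for further research to be worthwhile, not a sufficient one, since no real study delivers perfect information.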

Figure 3 illustrates the population EVPI for the orlistat guidance. At a cost-effectiveness threshold of £30,000, the population EVPI is just more than £1.5 million. This may well exceed the costs of further investigation and suggests that further research is needed to support the adoption of orlistat. When the threshold for cost-effectiveness (the maximum value placed on health outcome) is low (much less than £21,400), the technology is not expected to be cost-effective and additional information is unlikely to change that decision (the EVPI is low). Similarly, when the threshold willingness to pay is high (i.e., much higher than £21,400), the ICER is much lower than the threshold, so orlistat would be considered cost-effective in terms of expected costs and QALYs, and this decision is unlikely to be changed by further research. In this case the population EVPI reaches a maximum when the threshold is equal to the expected ICER; that is, where there is most uncertainty about whether to adopt or to reject orlistat based on existing evidence. Nevertheless, EVPI does not always reach a maximum at this point. This is because, although the probability of error falls as the threshold increases, the value of changing the decision (the cost of error) also increases, so the maximum point is determined by the balance of these two factors.

Figure 3. The population EVPI for orlistat in the case study.

The value of reducing the uncertainty surrounding individual input parameters in the decision model can also be established. This type of analysis can be used to focus further research by identifying those inputs for which more precise estimates would be most valuable. In some circumstances, this will indicate which end points should be included in further experimental research; in others, it may focus research on inputs which may not necessarily require experimental design and can be provided relatively quickly.

This analysis of the value of information associated with each of the model inputs (parameter EVPI) is, in principle, conducted in a very similar way to the EVPI for the decision as a whole [29,30]. The EVPI for a parameter or group of parameters (ϕ) is again simply the difference between the expected net benefit with perfect information about the parameter(s) ϕ and the expected value with current information. The expected value with current information is the same as before, $\max_j E_\theta \mathrm{NB}(j, \theta)$. With perfect information, the decision maker would know how the uncertainties about ϕ would resolve (which value ϕ will take) before making a decision and could select the intervention that maximizes expected net benefit, which must now be calculated over all the other remaining uncertain parameters (ψ) in the model, $\max_j E_{\psi \mid \phi} \mathrm{NB}(j, \phi, \psi)$. As before, the true value of ϕ is unknown, so these maximum expected net benefits must be averaged over the distribution of ϕ, $E_\phi \max_j E_{\psi \mid \phi} \mathrm{NB}(j, \phi, \psi)$. The EVPI for ϕ is the difference between the expected net benefit with perfect information about ϕ and the expected value with current information:

$$\mathrm{EVPI}_\phi = E_\phi \max_j E_{\psi \mid \phi} \mathrm{NB}(j, \phi, \psi) - \max_j E_\theta \mathrm{NB}(j, \theta) \qquad (2)$$
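Equation 2 can be estimated with a two-level (nested) Monte Carlo simulation: an outer loop draws ϕ, and an inner loop averages net benefit over ψ for that value of ϕ. The sketch below uses a toy two-option model in which ϕ is an incremental QALY gain and ψ an incremental cost, drawn independently from hypothetical distributions; the model, names and values are illustrative assumptions only.

```python
import random
import statistics

LAM = 30_000.0  # assumed threshold willingness to pay per QALY


def net_benefit(option, phi, psi):
    """Toy model: option 0 is the comparator; option 1 gains phi QALYs at extra cost psi."""
    return 0.0 if option == 0 else LAM * phi - psi


def draw_phi(rng):
    """Parameter of interest (assumed distribution)."""
    return rng.gauss(0.025, 0.02)


def draw_psi(rng):
    """Remaining parameters (assumed distribution, independent of phi)."""
    return rng.gauss(500.0, 150.0)


def partial_evpi(n_outer=500, n_inner=500, seed=4):
    rng = random.Random(seed)
    options = (0, 1)
    # Current information: maximum over options of the mean over all parameters.
    joint = [(draw_phi(rng), draw_psi(rng)) for _ in range(n_outer * 10)]
    value_current = max(
        statistics.fmean(net_benefit(j, phi, psi) for phi, psi in joint)
        for j in options
    )
    # Outer loop over phi; inner loop takes the expectation over psi given phi.
    outer_values = []
    for _ in range(n_outer):
        phi = draw_phi(rng)
        psis = [draw_psi(rng) for _ in range(n_inner)]
        outer_values.append(
            max(
                statistics.fmean(net_benefit(j, phi, psi) for psi in psis)
                for j in options
            )
        )
    return statistics.fmean(outer_values) - value_current


print(f"EVPI for phi (per patient): {partial_evpi():.2f}")
```

The number of model evaluations grows with the product of the outer and inner loop sizes, which is why this calculation becomes demanding for larger models.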

This does require substantial additional computation for models where the relationship between the model's inputs and expected cost and outcomes is not linear, for example, in Markov models [19,30]. It should also be noted that, in general, the EVPIs for individual model inputs will not sum to the EVPI for the decision problem. This is because both decision and parameter EVPI depend entirely on whether additional research would be predicted to change the decision about the preferred option. In the simulation process undertaken to estimate this, if a value of a specific parameter is drawn some distance from its mean, it may be insufficient in itself to change the decision. Nevertheless, when that value is drawn together with similar extreme values for other parameters, this combination may well be enough to change the decision. So there is no simple relationship between individual parameter and decision EVPI.

Figure 4 illustrates the EVPIs for individual parameters associated with the overall population EVPI at a cost-effectiveness threshold of £21,400. In this example, it is the EVPI associated with the changes in HRQL, due to modification in body weight, which is highest. This should not be surprising as there was limited evidence to link changes in weight to HRQL, but it is this relationship which is crucial to establishing the cost-effectiveness of orlistat. The EVPIs associated with resource use parameters are also relatively high for the same reasons. Although the EVPI analysis in Figure 3 suggests that further research may be required to support the adoption of orlistat, the analysis of the EVPIs for individual parameters indicates that this may not need to have an experimental design. This is because more precise estimates of HRQL changes and elements of resource use can be established without an additional clinical trial and could be based on an observational survey.

Figure 4. EVPI for individual parameters in the orlistat model in the case study.

There remains, however, substantial value of information associated with the expected loss in body weight at 12 months, and more precise estimates of this input would require experimental design. It is also interesting to note that the probability of remaining on treatment is not associated with the highest values of information. This is partly because substantial evidence from the previous trials exists already. The other reason is that, when patients come off treatment, the potential gains in HRQL are lost, but these are offset by a reduction in the intervention costs. It should be noted that the relative value of information associated with model inputs will also change with the cost-effectiveness threshold. Specifically, those inputs which are more closely related to differences in expected costs will be relatively more important at low threshold values, and those more closely related to differences in outcomes will be more important at high values.

The case study highlights the fact that economic considerations are central, not only to establishing how much evidence is required to support the adoption of a technology, but also what type of evidence will be required and the appropriate research design. Given an objective to maximize health gain from limited resources, this framework demonstrates that, for a particular technology, the amount and type of evidence required depends on decision uncertainty and economic decision rules, rather than on rules of statistical significance applied to the trial end point. It is also clear that different amounts and types of evidence will be required for different types of technology relevant to different patient populations.

Types of Uncertainty in Cost-Effectiveness Decision Models

Reflecting Precision and Quality of Evidence in Uncertainty

The orlistat case study highlights the importance of identifying, quantifying, and incorporating parameter uncertainty in decision models. In addition to the precision of the data, the quality of the evidence available on a particular part of the model may be limited. In the case study, this was seen in the link between weight loss and HRQL. In some situations, there will be a complete absence of formal evidence, and informed judgments may be required. In such situations, the weakness of the evidence can be reflected in a model through additional uncertainty; that is, by adding variance to the distribution around the parameter of interest.
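One simple way to add variance of this kind is to draw the parameter from its evidence-based distribution and add an independent, zero-mean term whose spread reflects a judgment about the weakness of the evidence. The sketch below assumes normal distributions, and all of the numerical values and names are hypothetical.

```python
import random
import statistics


def draw_with_quality_adjustment(rng, mean, sd, extra_sd):
    """Draw a parameter value with an added zero-mean term reflecting weak evidence."""
    return rng.gauss(mean, sd) + rng.gauss(0.0, extra_sd)


rng = random.Random(5)
# Hypothetical parameter: mean 0.17, sampling sd 0.04, judged extra sd 0.03.
unadjusted = [rng.gauss(0.17, 0.04) for _ in range(20_000)]
adjusted = [draw_with_quality_adjustment(rng, 0.17, 0.04, 0.03) for _ in range(20_000)]

print(f"sd without adjustment: {statistics.stdev(unadjusted):.4f}")
print(f"sd with adjustment:    {statistics.stdev(adjusted):.4f}")
```

Under these assumptions the mean is unchanged and the variances add, so the adjusted standard deviation is the square root of the sum of the two squared standard deviations. A non-zero mean for the added term could be used instead if the evidence were thought to be biased in a known direction.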

Parallel concerns about precision and quality in evidence will be part of all reimbursement decisions. In the context of the NICE appraisal program, for example, there is a need to compare a number of new interventions with each other and with several existing treatments, but there is inevitably an absence of head-to-head RCTs to compare all options. The recent appraisal of new interventions for epilepsy in adults was an example with this characteristic [31]. In such situations, the use of indirect methods to estimate treatment effects is necessary. These can involve relative treatment effects being estimated for each intervention through the use of a common comparator [32]. More general statistical models to combine mixed comparison evidence to provide a consistent set of treatment effect estimates have been developed [33–35]. Again, any additional uncertainty associated with the use of indirect evidence would be factored into a model through additional variance in the parameter distributions.
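One widely used form of indirect comparison through a common comparator works on the log scale: the effect of A versus B is taken as the difference between the A versus C and B versus C effects, and the variances of the two estimates are added. The sketch below assumes this form, which may or may not correspond exactly to the method described in [32]; the log odds ratios and standard errors are invented for illustration.

```python
import math


def indirect_effect(log_or_ac, se_ac, log_or_bc, se_bc):
    """Indirect A-versus-B log odds ratio via common comparator C.

    Assumes the two sets of trials are independent, so variances add.
    """
    log_or_ab = log_or_ac - log_or_bc
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    return log_or_ab, se_ab


# Hypothetical inputs: A vs C and B vs C, each estimated from separate trials.
log_or_ab, se_ab = indirect_effect(-0.50, 0.15, -0.30, 0.20)
lower = math.exp(log_or_ab - 1.96 * se_ab)
upper = math.exp(log_or_ab + 1.96 * se_ab)
print("indirect OR, A vs B:", round(math.exp(log_or_ab), 3))
print("95% interval:", round(lower, 3), "to", round(upper, 3))
```

The standard error of the indirect estimate is larger than that of either direct estimate, which is the additional variance that would be carried into the parameter distribution.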

Modeling Beyond Trial Evidence

Arguably the biggest challenge that reimbursement agencies have to face in terms of the uncertainty surrounding existing evidence relates to costs and outcomes which have not been observed directly in trials. There are two frequent manifestations of this: linking intermediate outcomes to ultimate measures of health gain, and extrapolating costs and benefits over a longer-term time horizon.

Linking changes in intermediate measures to health gain.  The measure of treatment effect which is likely to be the main driver of the cost-effectiveness of an intervention will ideally be taken from one or more RCTs. Nevertheless, such studies are often designed to show differences in end points other than measures of health gain such as survival or HRQL. Indeed, the use of intermediate end points is common with most chronic diseases, where these end points are either surrogate markers for outcomes or intermediate measures of severity. Examples of the former include blood cholesterol for cholesterol-lowering drugs and CD4 count or viral load in HIV treatments. An example of the latter is the Kurtzke Expanded Disability Status Scale in multiple sclerosis. The reasons for the trial focus on these intermediate end points are clear: trials designed to show differences in ultimate health gain would have to be very large and/or continued for many years, and licensing authorities have been satisfied with intermediate end points for many products. Nevertheless, as described above, there is a need for reimbursement authorities to understand the impact of interventions in terms of generic measures of health such as QALYs. The need to develop a link between intermediate end points and health gain represents an important role for decision models.

The use of intermediate end points in clinical trials is usually acceptable to licensing authorities because there is evidence of some degree of correlation between the intermediate measure and ultimate health. In the context of cholesterol-lowering drugs, for example, a number of epidemiological studies have shown that serum cholesterol and low-density lipoprotein (LDL) are risk factors for coronary heart disease (CHD) [36,37]. To establish the cost-effectiveness of cholesterol-lowering therapies in terms of changes in (quality-adjusted) survival duration, most published studies have had to rely on risk equations to make the link between changes in blood cholesterol and health events [38].

A small number of RCTs of cholesterol-lowering drugs have been undertaken which provide direct evidence on the implications of therapies for health outcomes [39–41]. These studies have provided a platform for cost-effectiveness analysis using patient-level trial data [42,43], but still require modeling to extrapolate over time. They have also generated more evidence on the general relationship between changes in the intermediate end point of blood cholesterol and health effects. For example, analyzing the data from the 4S trial using a Cox proportional hazards model, Pedersen et al. estimated that each additional 1% reduction in LDL would generate a 1.7% reduction in the risk of major coronary events [44].
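To illustrate how an estimate of this kind can link an intermediate end point to events in a decision model, the sketch below scales the 1.7% figure to a larger LDL reduction. It assumes that the effect compounds multiplicatively on the hazard for each additional 1% reduction, as a log-linear proportional hazards model would imply; the 30% LDL reduction is a hypothetical input, not a trial result.

```python
def relative_hazard(ldl_reduction_pct, risk_reduction_per_pct=0.017):
    """Relative hazard of major coronary events for a given % reduction in LDL.

    Assumption: each additional 1% reduction multiplies the hazard by
    (1 - risk_reduction_per_pct).
    """
    if ldl_reduction_pct < 0:
        raise ValueError("LDL reduction must be non-negative")
    return (1.0 - risk_reduction_per_pct) ** ldl_reduction_pct


# Hypothetical therapy that lowers LDL by 30%.
hazard_ratio = relative_hazard(30)
print("relative hazard of major coronary events:", round(hazard_ratio, 3))
```

In a probabilistic model, the 1.7% coefficient would itself be entered as a distribution, so that uncertainty in the intermediate-to-final relationship carries through to the results.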

Extrapolating future costs and benefits.  Another feature of many trials is their short-term follow-up. This is particularly true of Phase III regulatory trials where there is a strong need to satisfy the licensing authorities and hence to get the product to market as swiftly as possible. For those interventions between which costs and benefits are likely to differ over an extensive time period, there will inevitably be a mismatch between trial follow-up and the appropriate time horizon of the cost-effectiveness analysis. This will require the decision model to estimate the costs and health outcomes beyond the trial, together with the uncertainty associated with the extrapolation. An example of the need for extrapolation is for interventions which aim to reduce mortality. To estimate differences in expected survival duration (i.e., life-years gained), the area between the full survival curves needs to be calculated. Unless the trial has followed up patients until all have died (which is only likely in diseases with poor prognoses, such as advanced cancer), extrapolating survival curves beyond the trial will be necessary. This was the case, for example, with the NICE appraisal of implantable cardioverter defibrillators for arrhythmias, where trial evidence existed on mortality over a 3- to 4-year follow-up period, and assumptions about future mortality were needed as a basis for extrapolation [45].
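The area calculation can be sketched as follows. The survival proportions are hypothetical and cover only a 4-year follow-up, so the area between the two curves gives the life-years gained within the trial period alone; because a large proportion of patients in both arms are still alive at 4 years, any difference accruing after follow-up is omitted unless the curves are extrapolated.

```python
def area_under_curve(times, survival):
    """Trapezoidal area under a survival curve (mean survival restricted to follow-up)."""
    if len(times) != len(survival):
        raise ValueError("times and survival must have the same length")
    return sum(
        (t1 - t0) * (s0 + s1) / 2
        for t0, t1, s0, s1 in zip(times, times[1:], survival, survival[1:])
    )


# Hypothetical proportions alive at the end of each year of follow-up.
years = [0, 1, 2, 3, 4]
control = [1.00, 0.90, 0.80, 0.72, 0.65]
treated = [1.00, 0.93, 0.86, 0.80, 0.75]

within_trial_gain = area_under_curve(years, treated) - area_under_curve(years, control)
print("life-years gained within follow-up:", round(within_trial_gain, 3))
print("still alive at end of follow-up:", control[-1], "vs", treated[-1])
```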

A key issue with extrapolation relates to the duration of the treatment effect, that is, the extent to which the additional effectiveness of an intervention, relative to its comparator(s), is maintained after the period of observation in the trial. The need to estimate this future effect is important both when treatment has been discontinued before the end of trial follow-up, and when patients are still undergoing treatment at that point. In an early cost-effectiveness analysis of zidovudine in HIV, Schulman et al. estimated the cost-effectiveness of therapy based on extrapolation beyond the trial period under a number of assumptions about the post-trial survival curves [46]. The most “optimistic” assumption, with respect to the benefits of the therapy, was that the survival curves continued to diverge after the trial, indicating a continuing additional treatment effect over time. The “pessimistic” assumption was that the curves gradually came back together after the trial, suggesting a rebound effect in the death rate (i.e., the zidovudine patients died at a faster rate during the extrapolation period). The third scenario effectively lay between the other two and assumed that, after the trial follow-up, patients in both groups died at the same rate. The authors showed that, depending on which assumption was considered the most realistic, the cost per life-year gained for zidovudine varied markedly. This particular study was deterministic but, in a probabilistic framework, a possible extension would have been to model a family of possible survival curves reflecting different extrapolation assumptions. Appropriately parameterized, it would have been possible to assess the EVPI regarding additional research on long-term survival.
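The effect of the three extrapolation assumptions on life-years gained can be illustrated with a simple piecewise-constant hazard model. This is a sketch under stated assumptions, not a reconstruction of the analysis in [46]: the 4-year trial period, the annual hazards, and the length of the rebound period are hypothetical, and the rebound is set so that the two survival curves just meet again.

```python
import math


def mean_survival(segments):
    """Expected survival in years for a piecewise-constant hazard.

    segments: list of (duration_years, annual_hazard); the final duration
    should be math.inf so that the curve is followed until all have died.
    """
    alive, total = 1.0, 0.0
    for duration, hazard in segments:
        if math.isinf(duration):
            return total + alive / hazard
        total += alive * (1 - math.exp(-hazard * duration)) / hazard
        alive *= math.exp(-hazard * duration)
    return total


TRIAL_YEARS, H_CONTROL, H_TREATED, H_REBOUND = 4, 0.10, 0.07, 0.13  # hypothetical

control = [(TRIAL_YEARS, H_CONTROL), (math.inf, H_CONTROL)]
scenarios = {
    # Treatment effect continues: curves keep diverging.
    "optimistic": [(TRIAL_YEARS, H_TREATED), (math.inf, H_TREATED)],
    # Both groups die at the same rate after the trial.
    "intermediate": [(TRIAL_YEARS, H_TREATED), (math.inf, H_CONTROL)],
    # Rebound: treated patients die faster for 4 years until the curves meet.
    "pessimistic": [(TRIAL_YEARS, H_TREATED), (4, H_REBOUND), (math.inf, H_CONTROL)],
}

gains = {name: mean_survival(seg) - mean_survival(control) for name, seg in scenarios.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.2f} life-years gained")
```

In a probabilistic version, the post-trial hazard for the treated group could be given a distribution spanning these scenarios, which would allow the value of further research on long-term survival to be assessed.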

Uncertainty about future treatment effects has also been a feature of the cost-effectiveness literature on cholesterol-lowering therapies. In the economic evaluation of the WOSCOPS study, the trial provided estimates of reductions in mortality over a period of 5 years. Nevertheless, to establish the increase in mean survival duration associated with the use of pravastatin, an estimate of the life expectancy of patients who had survived that period was necessary. The authors assumed that, after the 5-year period, patients would cease taking the therapy and those in both arms of the trial would die at the same rate as the general population [43].

One element of the uncertainty associated with the intermediate-to-final outcome relationship is whether the nature of the relationship is stable across different therapies or varies depending on what intervention is used to affect the intermediate measure. These features of the model can be dealt with by adding further uncertainty to the parameter distributions. This allows the measure of decision uncertainty to include all relevant uncertainty associated with cost-effectiveness. All things being equal, the more uncertain the intermediate-to-final outcome relationship or the nature of the treatment effect in the future, the greater the decision uncertainty associated with reimbursing a given therapy. The overall EVPI associated with a decision about a therapy will also reflect uncertainty about these parameters. Furthermore, an EVPI associated with the specific parameter(s) which define the intermediate-outcome relationship or the nature of treatment effects beyond the trial can be calculated. This will indicate whether additional research is potentially efficient to estimate these parameters.
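The overall EVPI can be estimated directly from the output of a probabilistic model, as the difference between the expected net benefit of choosing the best option in each simulation and the expected net benefit of the option that is best on average. The sketch below assumes a two-option decision, normally distributed incremental costs and QALYs, and a threshold of 30,000 per QALY; all of these values are hypothetical.

```python
import random
import statistics


def evpi(net_benefits):
    """Per-patient EVPI from simulated net benefits.

    net_benefits: list of tuples, one tuple per simulation, holding the net
    benefit of each option under that simulation's parameter values.
    """
    n_options = len(net_benefits[0])
    expected = [statistics.fmean(row[d] for row in net_benefits) for d in range(n_options)]
    value_with_current_information = max(expected)
    value_with_perfect_information = statistics.fmean(max(row) for row in net_benefits)
    return value_with_perfect_information - value_with_current_information


def simulate_net_benefits(n=10000, threshold=30000.0, seed=1):
    """Hypothetical probabilistic model: comparator (net benefit 0) vs new product."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        d_qaly = rng.gauss(0.05, 0.04)      # incremental QALYs (assumed)
        d_cost = rng.gauss(1000.0, 300.0)   # incremental cost (assumed)
        rows.append((0.0, threshold * d_qaly - d_cost))
    return rows


per_patient_evpi = evpi(simulate_net_benefits())
print("EVPI per patient:", round(per_patient_evpi, 1))
```

The EVPI for a subset of parameters requires a further step, in which expectations are taken over the remaining parameters; that step is not shown here.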

Policy Implications of the Framework

In making decisions about whether or not to reimburse new pharmaceuticals, health-care systems will frequently face the question of when they have enough information to justify reimbursement. This question can only be addressed when there is a clear and explicit objective from delivering health care—that is, the objective function is defined—and an explicit budget constraint is indicated. The assumed objective function in the case study in this article is health gain measured in terms of QALYs; the assumed budget constraint relates to the perspective of the health-care system. The value of the framework is not, however, tied to these perspectives. The insistence on being 95% sure that the new product is more cost-effective than its appropriate comparator(s)—which would be implied if the standard rules of statistical inference were adhered to—will be inconsistent with any objective function.

If the appropriate focus is on expected cost-effectiveness, why is it necessary to place so much emphasis on quantifying the uncertainty? The answer to this is that the decision uncertainty faced by the decision maker is a key element in determining whether additional research should be undertaken and, if so, the nature of that additional research. The appropriate design of additional research, including the optimal sample size, is determined by the balance between the marginal cost and marginal benefit, in terms of reducing the cost of uncertainty, of collecting additional data [30]. A decision about whether or not to reimburse a new intervention based on existing evidence will therefore only be an interim one if it is still efficient to undertake additional research, that is, if the value of perfect information is greater than the cost of collecting additional data. Therefore, the answer to the question “when is there sufficient evidence to reimburse a new product?” is “when it is inefficient to collect additional data.”
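This comparison can be sketched at the population level. The per-patient EVPI, annual number of eligible patients, lifetime of the technology, discount rate, and research cost used below are all hypothetical. The check is a necessary condition only, since no feasible study can be worth more than perfect information.

```python
def population_evpi(evpi_per_patient, patients_per_year, years, discount_rate=0.035):
    """Per-patient EVPI scaled to the discounted number of patients affected by the decision."""
    discounted_patients = sum(
        patients_per_year / (1 + discount_rate) ** t for t in range(years)
    )
    return evpi_per_patient * discounted_patients


def research_potentially_efficient(pop_evpi, research_cost):
    """True if the upper bound on the value of research exceeds its cost."""
    return pop_evpi > research_cost


# Hypothetical inputs.
pop_value = population_evpi(evpi_per_patient=50.0, patients_per_year=20000, years=10)
print("population EVPI:", round(pop_value))
print("further research potentially efficient:",
      research_potentially_efficient(pop_value, research_cost=2_000_000))
```

If the population EVPI falls below the cost of any proposed study, the existing evidence can be regarded as sufficient in the sense used here.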

This framework for reimbursement decision making has some profound implications. The first is that the decision about whether to reimburse, based on existing data, needs to be taken simultaneously with the decision regarding whether additional research is to be undertaken. Currently, most reimbursement agencies internationally focus largely on the first of these decisions. NICE is unusual in that it has the responsibility to recommend topics for additional research. Nevertheless, it has limited authority to ensure that this research is undertaken to particular timelines. “Joining up” decisions about reimbursement and additional research will therefore be an important feature of any rational system for reimbursement decisions.

A second implication is that there will be no “standard” type or amount of evidence which will be required for every new product for each patient group. Rather, the amount and type of evidence will depend on the value of additional research generally and, in particular, that relating to individual parameters. The type and extent of evidence will therefore be an empirical question, itself informed by analysis. This contrasts with the existing formulaic approach to what constitutes adequate evidence. For a given intervention, the decision theoretic framework may, then, lead to greater or lesser research demands than the existing framework.

A third implication is that the framework will rest on the adequacy of the model and, in particular, on how fully all forms of uncertainty have been characterized. This is particularly important given the fact that, for most reimbursement authorities internationally, the models are submitted by the manufacturers who will have an incentive to underestimate the uncertainty associated with the cost-effectiveness of their products. It is essential that the decision makers are informed by adequate critical review and interrogation of the manufacturers’ analyses and, if necessary, that these are augmented or replaced by independent analyses. This mirrors closely the existing arrangements for modeling to inform the NICE technology appraisal process [47]. Ultimately, it is important that all features of a model are explicit and open to challenge, with the opportunity to assess the implications for cost-effectiveness and value of information of alternative formulations.

A fourth implication relates to whether a new product should be reimbursed while additional evidence is gathered. The framework suggests that if the intervention's expected costs and effects, relative to appropriate comparators, suggest it is cost-effective based on existing evidence (e.g., its ICER is less than the opportunity cost of implementing it), then it should be reimbursed, even if it is efficient to require additional evidence. Nevertheless, this raises some important issues such as whether the reimbursement agencies have the powers to reverse a decision if the additional research subsequently suggests that the product is not cost-effective. Although there are ways of reflecting the “costs” of the possible need to reverse a decision in the analysis informing the adoption and future research decisions [48], it may be considered too difficult to reimburse a new product, require additional evidence and then risk having to withdraw funding. The alternative would be to delay reimbursement until an efficient level of research has been undertaken. Nevertheless, it is important to recognize that such a delay will have opportunity costs in terms of health benefits which are not conferred on patients and/or additional resource costs. This “cost of delay” can be formally quantified using the decision theoretical framework, and this can provide a further source of information to decision-makers [10].
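The adoption rule referred to above can be written either in terms of the ICER or, equivalently when the new product is more effective than its comparator, in terms of incremental net benefit. A minimal sketch, with hypothetical incremental costs, incremental QALYs, and threshold values:

```python
def icer(d_cost, d_qaly):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    if d_qaly == 0:
        raise ValueError("ICER is undefined when incremental QALYs are zero")
    return d_cost / d_qaly


def incremental_net_benefit(d_cost, d_qaly, threshold):
    """Expected incremental net benefit in monetary terms at a given threshold."""
    return threshold * d_qaly - d_cost


def expected_to_be_cost_effective(d_cost, d_qaly, threshold):
    """Adopt on existing evidence if expected incremental net benefit is positive."""
    return incremental_net_benefit(d_cost, d_qaly, threshold) > 0


# Hypothetical expected values from a decision model.
print("ICER:", round(icer(1000.0, 0.05)))
print("adopt at 30,000 per QALY:", expected_to_be_cost_effective(1000.0, 0.05, 30000.0))
print("adopt at 15,000 per QALY:", expected_to_be_cost_effective(1000.0, 0.05, 15000.0))
```

The net benefit form has the advantage of remaining well defined when the new product is less effective or cost saving, where the sign of the ICER alone is ambiguous.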

If adopted, this framework will face pharmaceutical manufacturers with a range of new incentives. Some of these are desirable. For example, companies are more likely to gather evidence, during the drug development process, which directly informs cost-effectiveness, and to ensure that this is sufficient at the point at which they apply for reimbursement. Indeed, there is an important role for decision theory and value of information analysis to inform intracompany decisions about drug development [49]. Nevertheless, there are some potentially negative incentives which would need to be addressed. Perhaps most importantly, companies who are second- (or subsequent) to-market within a new class of drug are likely to free-ride on the research undertaken by the company which was first-to-market [50]. For example, this may relate to research undertaken by those first-to-market to establish the link between an intermediate measure of effect and ultimate health gain which could be used by companies which are later to market because it would be largely common to all products in that class. If not addressed, free-riding of this type could lead to market failure and inefficient levels of drug development. To address this issue, which is a classic public good problem, policy makers may have to create an environment whereby the company which is first-to-market is guaranteed some property rights over the research they undertake.


Reimbursement decision-making needs to be supported by an analytical framework which is explicit about all forms of uncertainty relating to a product's cost-effectiveness, which is able to establish whether the intervention is expected to be cost-effective based on existing evidence, and whether additional research to reduce uncertainty when this adoption decision is revisited is itself efficient. An implication of such a framework is that, only when the costs of undertaking additional research are greater than its benefits in terms of reducing uncertainty, is there sufficient evidence regarding the cost-effectiveness of an intervention.

Some may consider the decision theoretical framework too speculative. It is certainly true that, when attempting to characterize the uncertainties in a decision problem where evidence is lacking or absent, speculation and judgment are inevitable. In such a situation, the available options would be, first, to ignore those elements of a model for which evidence of an “acceptable quality” is unavailable, in which case the analysis will be partial and biased. The second option would be only to appraise technologies where complete and good quality evidence has already been produced, in which case research will focus on relatively simple questions where solutions already exist. The third, and only rational, option is to address complex and uncertain problems in an explicit way, based on evidence when available, but to accept speculation and judgment when it is not, and to require additional research when that would represent an efficient use of resources.

Source of financial support: In preparing this article, the authors received funding from AstraZeneca Pharmaceuticals, and the authors retained full control over its content and publication. Mark Sculpher also receives funding from the NHS Research and Development program for a career award in public health. The Centre for Health Economics receives funding from the UK Medical Research Council Health Services Research Collaboration.