Classical randomized trial approach to evidence-based decisions
Across many disciplinary traditions, the randomized trial has been considered to yield the strongest (most internally valid) level of evidence. The Evidence-Based Medicine movement and its precursors have for many years emphasized randomized studies as providing the highest possible level of evidence. To be recommended for adoption under this approach, a treatment should ideally be shown in a randomized trial to yield positive health benefits at the 5% significance level. Recent reviews have discussed application of these concepts to the related area of Evidence-Based Public Health.1,2 A key difficulty highlighted is that while some public health interventions are amenable to randomized study, many population-level social determinant interventions have to date not been feasible to study using randomization designs. If randomized evidence were required for policy decisions, then this would remove many population-based interventions from consideration, in favor of more individualized interventions that are more amenable to randomization, such as medical care.3 However, other researchers have argued that the lack of randomized trial results should not imply lack of action; decisions still must be made, and should use the best available evidence.4 In fact, many health policies and interventions are based on evidence that is relatively weak or sometimes missing altogether. While the evidence base in both medicine and public health is growing, it is still relatively limited. A majority of medical practices have weak to no formal evaluation.5 Within public health, a recent study found that more than 40% of programs lack an evidence base.6 Thus while it is of course desirable to have the strongest evidence possible, it would be a counterproductive double standard to insist on randomized evidence before backing any policies or interventions relating to the social determinants of health.
Potts et al. went further than this, arguing that where nonexperimental evidence is strongly suggestive, it would be unethical not to act.7 They approvingly cite Smith and Pell's satirical essay, which provocatively argues that because randomized trials have not proven the efficacy of parachutes, parachutes should not be used until a randomized, placebo-controlled parachute trial has been conducted.8 Extensive commentary in letters submitted to the journal in response criticized Potts et al., however, on two key counts. First, there was concern about possible cherry picking of the nonexperimental evidence in support of some favored approach. Indeed, Ioannidis warns that nonrandomized studies may systematically produce larger results than randomized ones, presumably due to data mining (possibly inadvertent) and publication bias problems, and that “claimed research findings may often be simply accurate measures of the prevailing bias” in a field.9,10 Randomized trials do not wholly avoid this problem, however, as their design and framing may reflect underlying assumptions and they often produce conflicting results. Concern about publication bias and data mining highlights the importance of critical, comprehensive, and balanced literature reviews; however, it should not preclude the use of nonexperimental evidence when it is the best available for policy. A second critique of the Potts et al. approach is potentially more serious: that the health field is replete with examples of poorly tested and now discontinued practices and interventions that have turned out to be useless or even harmful on net after further study. However, such cases are not the norm, and this critique ignores the many medical practices that have turned out to indeed be beneficial despite lack of early rigorous testing.
What these concerns highlight is the importance of having a framework for prospectively using the best available information to choose which interventions are expected to have net benefits that are greater than net costs.
Decision science and Bayesian policy analysis
Here, we turn to the field of Decision Science, an area of study devoted to making optimal use of evidence for decision making, based on the tools of cost-effectiveness, cost-utility, and cost–benefit analysis.11 A considerable portion of this field has focused on decisions about clinical medicine, but similar conceptual principles apply to nonmedical health policies. Among the most important conceptual differences between decision analysis and the traditional focus on randomized trials is that, unlike classical statistics, decision analysis does not focus solely on 5% significance levels to determine whether an intervention is recommended. Cost–benefit analysis (the conceptually most appropriate but most challenging decision science method in practice) is guided instead by an analysis of whether, after appropriately considering all of the benefits and costs of a policy or intervention, and their estimated uncertainty, the expected benefits outweigh the expected costs.
Consider an intervention that would increase health by ΔH units over the status quo, at an incremental cost of ΔC. To reflect uncertainty about our estimates, the future time path of benefits, and the value of health, we will represent the health improvement as having a present discounted (i.e., as valued today) expected value of EV[ΔH]. Similarly, represent the present discounted expected value of intervention costs as EV[ΔC]. The benefit–cost decision rule would be to recommend this intervention for populations where: EV[ΔH] > EV[ΔC]. Note that proponents of an intervention often argue that the “cost of inaction” (maintaining the status quo) must be considered as well. This cost of inaction is already incorporated into the decision rule, as it typically refers to the value of the foregone health improvement, EV[ΔH], which would not be obtained if the intervention were not to be adopted. But it is useful to state the rule both ways, as a reminder not to focus solely on the costs of the intervention, EV[ΔC], but instead to compare which of these amounts is larger. More precisely, the net “cost of inaction” can be defined as EV[ΔH] − EV[ΔC]; that is, the net surplus to society that would be foregone if the intervention were not adopted.
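The decision rule and the net cost of inaction can be written out in a few lines. The following is a minimal sketch only: the function names and the example values (5 and 3 units) are hypothetical, and it assumes that EV[ΔH] and EV[ΔC] have already been expressed in the same present-value units.

```python
def net_benefit(ev_health_benefit: float, ev_cost: float) -> float:
    """Expected net benefit, EV[dH] - EV[dC], in common present-value units."""
    return ev_health_benefit - ev_cost


def recommend(ev_health_benefit: float, ev_cost: float) -> bool:
    """Benefit-cost decision rule: recommend when EV[dH] > EV[dC]."""
    return net_benefit(ev_health_benefit, ev_cost) > 0


def net_cost_of_inaction(ev_health_benefit: float, ev_cost: float) -> float:
    """Net surplus foregone if the intervention is not adopted."""
    return net_benefit(ev_health_benefit, ev_cost)


# Hypothetical illustration: 5 units of expected benefit, 3 units of expected cost.
print(recommend(5.0, 3.0))             # True
print(net_cost_of_inaction(5.0, 3.0))  # 2.0
```

The net cost of inaction and the net benefit are the same number viewed from opposite sides of the decision, which is why the two functions share one calculation.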
Actual estimation of the full EV[ΔH] and EV[ΔC] is of course complex, but focusing here on this simple relation is instructive for thinking about common features of policies and interventions discussed in the latter half of the chapter. Consider what is included in the cost calculation EV[ΔC]. As the cost-effectiveness literature has emphasized, this should capture any incremental change in direct and indirect costs, both now and in the future, as compared to what costs would have been in the absence of the intervention. Whose costs are included will depend on whether the analysis is from the perspective of an individual, family, or some larger population group. A typical population health analysis might take the perspective of the nation, and thus would include both personal and public budgetary costs of everyone in the nation. From this perspective, any unintended efficiency loss caused by the intervention should also be taken into account. For interventions with public budgetary costs (including many key social determinants discussed below), an important efficiency loss to take into account would be the “marginal cost of public funds”;12 that is, the welfare loss from reduced work effort caused by raising taxes. Raising $1 of budget revenue may cost the economy $1.15 for a relatively nondistortionary tax, or it could cost substantially more for other types of taxes that have bigger negative impacts on behavior. It would also be appropriate to include among the costs any other losses induced by the program, such as negative work incentives induced by means-testing programs; in practice, however, such costs are rarely estimated and thus are left to be considered more qualitatively alongside the numeric benefit–cost result.
Calculation of benefits EV[ΔH] also requires careful consideration. Many health investments may not pay off for many years, while costs are incurred upfront; benefits may therefore be less striking after appropriate discounting. For example, if an early childhood program improves chronic disease outcomes by one unit when the child reaches age 50, then discounting at 3% annually yields a present value of that improvement equal to less than one-fourth of a unit. At the same time, when considering health benefits of nonhealth policies, it is important to take into account the value of nonhealth benefits as well. For example, Dow and Schoeni13 calculated the EV[ΔH] of health improvements from investing in college education so as to raise the health of less educated Americans up to the level of college-educated Americans; the health benefit was roughly one trillion dollars, but the wage benefit of education would likely be valued at more than double this amount. This finding, that the health value of nonhealth policy may be somewhat smaller than the nonhealth value, may be common when examining policies targeted at the social determinants of health. To accomplish a full analysis across different types of outcomes, it is necessary to use a common economic value metric. It is tempting to cite the health benefits only in health units, such as reduced asthma cases or increased quality-adjusted life years (QALYs), but comparing the full costs and benefits of such policies will require going the next step and placing a dollar value on health (e.g., the commonly used value of $100,000 per QALY). Valuing both health and nonhealth benefits is likely to result in more favorable assessments of the benefits of such policies.
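The discounting example can be checked directly. The sketch below assumes, for illustration only, that the benefit arrives 50 years after the program (i.e., the child is treated at roughly age 0) and that the "unit" of health is one QALY valued at $100,000; the function names are hypothetical.

```python
def present_value(amount: float, annual_rate: float, years: int) -> float:
    """Discount a future amount back to today at a constant annual rate."""
    return amount / (1.0 + annual_rate) ** years


def dollar_value(qalys: float, value_per_qaly: float = 100_000.0) -> float:
    """Convert health units (assumed here to be QALYs) into dollars."""
    return qalys * value_per_qaly


# One unit of health gained 50 years from now, discounted at 3% per year.
pv_units = present_value(1.0, 0.03, 50)
print(round(pv_units, 3))            # 0.228, i.e. less than one-fourth of a unit
print(round(dollar_value(pv_units))) # present value in dollars under the assumed QALY value
```

Under these assumptions the improvement is worth about 0.23 units today, consistent with the "less than one-fourth" figure in the text; a shorter horizon or lower discount rate would raise that value.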
What about the many sources of uncertainty in decision science analyses? Best practice in decision science modeling involves extensive sensitivity analysis. One type of sensitivity is with respect to modeling decisions and assumptions, such as the value of a QALY. A second type of sensitivity is with respect to statistical uncertainty surrounding the value of parameter inputs, such as the estimated effectiveness of an intervention on health. Stochastic decision analysis techniques have been developed from either a classical frequentist perspective, using traditional confidence intervals from a randomized trial, or alternatively from a Bayesian perspective using a prior distribution influenced by both confidence intervals and potentially subjective factors, such as strength of study design as well as theoretical priors. While one could insist on only using classical statistical inference from randomized trials when conducting decision analyses, such approaches are in the minority; a Bayesian-like choice of sensitivity thresholds has long been employed, and formal Bayesian practices are now emerging as well.14–16
In fact, Claxton goes one step further, arguing for a full Bayesian-theoretic decision approach in which decisions are based only on whether mean net benefit is positive (EV[ΔH] > EV[ΔC]), regardless of confidence intervals (surrounding, for example, the estimate of the health effect ΔH).17 When a policy decision must be made, he argues that the best available estimates should be used and acted upon, without favoring one outcome over another based solely on arbitrary significance level choices. Claxton, Neumann, Araki, and Weinstein (2001) provide an example in which relying on a classical approach insisting on 5% significance levels would result in an unduly conservative decision to withhold an Alzheimer's treatment, and thus would result in net harm to the population.18 It is important to note though that this does not imply that confidence intervals and quality of evidence are irrelevant—on the contrary, in a Bayesian analysis the quality of evidence is likely to factor even more importantly than in a classical statistical paradigm.
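The contrast between the two decision rules can be illustrated with a small sketch. The numbers below are hypothetical and are not taken from the Alzheimer's example; the classical rule is simplified here to a two-sided 5% z-test on the health effect, and all names are illustrative.

```python
def classical_rule(effect_estimate: float, standard_error: float,
                   critical_z: float = 1.96) -> bool:
    """Recommend only if the estimated effect is positive and significant at 5%."""
    return effect_estimate / standard_error > critical_z


def expected_value_rule(ev_benefit: float, ev_cost: float) -> bool:
    """Recommend whenever mean net benefit is positive, ignoring significance."""
    return ev_benefit > ev_cost


# Hypothetical estimate: benefit of 2.0 units (standard error 1.5), cost of 1.0 unit.
effect, se, cost = 2.0, 1.5, 1.0
print(classical_rule(effect, se))         # False: z is about 1.33, below 1.96
print(expected_value_rule(effect, cost))  # True: expected benefit exceeds expected cost
```

In this illustration the classical rule withholds the intervention even though its expected net benefit is positive, which is the kind of conservative outcome the mean-net-benefit approach is meant to avoid.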
In a Bayesian policy analysis, an analyst begins with theoretical arguments and evidence from related areas to form a prior distribution (i.e., prior to bringing to bear direct evidence on the policy) of the likely net empirical effect of the policy (on ΔH and ΔC). Although we discuss this prior distribution in formal statistical terms, it is of course common that these priors are held quite informally (by both researchers and policy makers). In a situation in which the analyst has a relatively diffuse prior, acknowledging, for example, the theoretical potential that the intervention has some chance of either helping or harming health, then before considering the weight of evidence, the analyst's prior is likely to be that a costly intervention would not yield net benefit. This situation is illustrated in the left-most distribution of Figure 1, which depicts a probability density function of the likelihood that the net health effects of an intervention would take on various values. The prior distribution of Figure 1 shows a mean estimated net health benefit that is positive (size one unit), but with considerable uncertainty. As new evidence develops, the analyst uses this new knowledge to update this prior distribution, resulting in a new “posterior distribution” (reflecting the original theoretical information plus the new empirical evidence) of the net health benefits. If a strong randomized trial were conducted that precisely estimated large benefits, of say a magnitude four units net health improvement in the Figure 1 example, then the posterior distribution of ΔH might become quite tight and pulled close to that estimate. This is illustrated by the right-most (strong) posterior distribution in Figure 1, which results in an updated mean effect of three units. If the costs are less than three units, then the Bayesian analysis might accord well with the classical analysis.
But if instead only a small observational study is available, even if it estimated the same effect size of four units, the prior distribution would not be moved as far to the right. Instead, the prior distribution may only weakly move to the right, as in the posterior distribution in the middle of Figure 1, which depicts an updated mean of two units. In this case, depending on the net cost, the Bayesian analysis may or may not yield an estimate that EV[ΔH] > EV[ΔC]. If the net cost were greater than two units then the Bayesian analysis would accord with a classical perspective, but if costs were below two units then even in the absence of a strong randomized trial the Bayesian analysis would support the policy proposal. Thus in Bayesian policy analysis, the validity and precision of estimated ΔH only indirectly affect the policy recommendation, rather than dominating it. Instead of solely focusing on the statistical significance of randomized trial evidence, the Bayesian policy analysis is better able to incorporate both theoretical priors and weaker types of evidence, as well as the crucial information on intervention costs.
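One simple way to reproduce the updating pattern described for Figure 1 is a normal prior combined with a normally distributed study estimate. The sketch below is an assumption-laden illustration, not the model behind the figure: the prior variance (1.0) and the study variances (0.5 for the strong trial, 2.0 for the small observational study) are hypothetical values chosen so that the posterior means come out at three and two units.

```python
def normal_update(prior_mean: float, prior_var: float,
                  estimate: float, estimate_var: float) -> tuple[float, float]:
    """Conjugate normal-normal update: returns (posterior mean, posterior variance)."""
    prior_precision = 1.0 / prior_var
    data_precision = 1.0 / estimate_var
    total_precision = prior_precision + data_precision
    # The posterior mean is a precision-weighted average of prior and estimate.
    post_mean = (prior_precision * prior_mean + data_precision * estimate) / total_precision
    return post_mean, 1.0 / total_precision


# Prior: mean benefit of one unit, with considerable uncertainty (assumed variance 1.0).
# Both studies estimate a four-unit benefit but differ in precision (assumed variances).
strong_mean, strong_var = normal_update(1.0, 1.0, 4.0, 0.5)  # precise randomized trial
weak_mean, weak_var = normal_update(1.0, 1.0, 4.0, 2.0)      # small observational study

for cost in (1.5, 2.5):  # hypothetical net costs, in the same units
    print(cost, strong_mean > cost, weak_mean > cost)
```

With these assumed variances the strong trial pulls the posterior mean to about three units and the weak study to about two, so at a cost of 2.5 units only the strong evidence supports adoption, while at a cost of 1.5 units both do.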
Finally, Claxton points out that this same Bayesian method of indirectly incorporating study quality and confidence intervals may be used to derive the value of producing additional research on ΔH, by estimating how likely different types of new evidence would be to substantially move the posterior distribution. Thus the Bayesian approach has the further benefit of helping guide efficient investment into future research agendas.
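A common way to put a number on the value of further research is the expected value of perfect information (EVPI): the expected gain from learning the true net benefit before deciding. The sketch below uses the standard closed form for a single adopt-or-reject decision and assumes, purely for illustration, that current beliefs about net benefit (ΔH − ΔC) are normally distributed; it is not necessarily the exact formulation Claxton uses, and the function names are hypothetical.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def evpi_normal(mean_nb: float, sd_nb: float) -> float:
    """EVPI when net benefit is believed to be Normal(mean_nb, sd_nb**2)."""
    z = mean_nb / sd_nb
    # With perfect information we adopt only when net benefit is positive: E[max(NB, 0)].
    value_with_perfect_info = mean_nb * normal_cdf(z) + sd_nb * normal_pdf(z)
    # With current information we adopt only if the mean net benefit is positive.
    value_with_current_info = max(mean_nb, 0.0)
    return value_with_perfect_info - value_with_current_info


# Same mean net benefit (one unit), different degrees of uncertainty (assumed values).
print(evpi_normal(1.0, 0.5))  # tight posterior: little to gain from more research
print(evpi_normal(1.0, 2.0))  # diffuse posterior: more research is worth more
```

The more diffuse the current distribution relative to its distance from zero, the more likely new evidence is to change the decision, and the larger the value of additional research.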
We next turn to considering types of policies and interventions that could reduce socioeconomic disparities in health. Many of these have characteristics discussed above. They generally lack evidence from randomized trials, while having varying degrees of observational evidence, as well as varying levels of theoretical agreement regarding likely orders of magnitude and direction of effects. The feasibility of conducting various types of decision analysis will also vary across these interventions. As yet there are no formal Bayesian policy analyses to report, and given the nature of the political processes in which the evidence must be used, formal Bayesian analyses are unlikely to gain widespread use in considering these policies beyond the research community. But the conceptual framework of Bayesian policy analysis is crucial to apply even if informally: combining all available information from theory, randomized trials, and observational studies will provide a stronger basis for choosing those policies and interventions with the highest expected positive social benefit after accounting for full social costs.