Transportability of Comparative Effectiveness and Cost-Effectiveness between Countries


  • Andrew Briggs has no conflicts to declare.

Andrew Briggs, Public Health & Health Policy, University of Glasgow, 1 Lilybank Gardens, Glasgow G12 8RZ, UK. E-mail:


Most clinical trials performed today are multicenter and many are multinational. The inclusion of multiple centers and countries usually allows the enrollment of a larger sample size in a shorter period of time, which affords greater statistical power. Other advantages include the perception of greater generalizability and the opportunity for the sponsor, in the case of a drug trial, to use the results for registration in more than one country. At the same time, this design poses a number of challenges for interpreting the resulting cost-effectiveness ratio(s). One of these challenges is whether the data required for cost-effectiveness analysis—and thus the cost-effectiveness recommendations—can be assumed to be equivalent across the different countries.

In this article, a brief overview of the literature on economic analysis of multinational studies is presented. The first section relates to the background on the transferability of economic data and the second section discusses the results of a consensus conference on economic analysis of multinational trials. The third section considers the two main statistical approaches that have been advocated for handling the analysis of multinational data—fixed effect and random effect modeling methods—plus a third approach that has only recently been employed in the literature based on modeling the components of the cost-effectiveness calculus. This approach is then illustrated using the results of the recently published cost-effectiveness analysis of the TOwards a Revolution in COPD Health (TORCH) trial [1]. A final section offers some conclusions for the use of such methods in comparative effectiveness and cost-effectiveness studies.

Background to the Transferability of Economic Data

In an early contribution to the literature, O'Brien [2] identified six threats to the transferability of data for economic analysis for questions of whether treatments that are cost-effective in one country might not necessarily be cost-effective in another. These six threats are also useful to consider in the context of multinational clinical trials.

Demography and Epidemiology of Disease

The underlying premise of a multinational clinical trial is that the treatment effect on the underlying biological process is constant across countries. Nevertheless, differences in demography and epidemiology of disease between countries may threaten this assumption, particularly with respect to the absolute benefit of treatment in different countries. The treatment effect in most clinical trials is reported as a relative measure, such as a hazard ratio, relative risk, or odds ratio. In such cases, the most that can reasonably be assumed is a constant relative treatment effect across countries, especially when baseline epidemiology/demography differs [3]. Even this assumption may not be supportable. Although large numbers of studies claim to rule out country by treatment interactions for the clinical findings, the power of these tests is usually very limited. A conclusion of no interaction may therefore misrepresent an absence of evidence of a difference as evidence of an absence of a difference.
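The distinction between a constant relative effect and a varying absolute benefit can be illustrated with a minimal sketch. The country labels, baseline risks, and relative risk below are hypothetical values chosen for illustration, not results from any trial.

```python
def absolute_risk_reduction(baseline_risk: float, relative_risk: float) -> float:
    """Absolute risk reduction implied by applying a relative risk to a baseline risk."""
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a probability between 0 and 1")
    if relative_risk < 0.0:
        raise ValueError("relative_risk must be non-negative")
    return baseline_risk * (1.0 - relative_risk)


# Hypothetical inputs (assumptions for illustration only).
baseline_risk_by_country = {"Country A": 0.10, "Country B": 0.20}
common_relative_risk = 0.80  # assumed identical in every country

arr_by_country = {
    country: absolute_risk_reduction(risk, common_relative_risk)
    for country, risk in baseline_risk_by_country.items()
}

for country, arr in arr_by_country.items():
    print(f"{country}: absolute risk reduction = {arr:.3f}")
```

Under these assumed inputs, the same relative risk gives Country B twice the absolute risk reduction of Country A, purely because its baseline risk is twice as high.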

Clinical Practice and Conventions

In a clinical trial, the differences in clinical practice and conventions between countries may be limited by the trial's protocol. Nevertheless, there is the potential for differences in provision of “usual care” (as opposed to the treatments under evaluation) to impact the comparison between countries. For example, the rate of surgical intervention, compared to medical management, is known to be higher in the United States than in many other health systems [4,5]. Country-specific differences in lengths of stay in the hospital have also been observed in some trials [6].

Incentives and Regulations for Health-Care Providers

Different countries will have different incentives and regulations which will result in practice variations between countries. These in turn may result in different levels of resource use across different categories of care.

Relative Price Levels

Absolute price levels clearly differ between countries, but these can be accounted for directly in the valuation of the trial's resource use. Country-specific relative price differences between different categories of resource use, on the other hand, are potentially more problematic for multinational studies. Economic theory suggests that differences in relative prices will result in substitution from relatively more expensive resources to relatively cheaper ones. Therefore, differences in relative prices between countries should lead to different practice patterns.

Consumer (Patient) Preferences

Quality of life measures that are used in the calculation of quality-adjusted life-years (QALYs) are based upon individual preferences. There is no reason to suppose that these preferences are not culturally dependent such that we might expect differences to be observed between countries. At the same time, use of pre-scored instruments, such as the EuroQol 5-D or the Health Utilities Index 2 or 3 to assess preferences will tend to mask such differences.

Opportunity Costs of Resources

Different countries will have different levels of ability to pay for improved health outcomes. What is considered cost-effective in a health system in North America or Western Europe may not be considered affordable in South America or Eastern Europe. This fundamentally limits the usefulness of an overall conclusion of any cost-effectiveness analysis of a trial, but does not invalidate the cost-effectiveness results.

The first five of these six threats to transferability of economic data provide reasons why we might be concerned with making a single estimate of cost-effectiveness across all countries. Yet, as was argued in the Introduction, the rationale for multinational clinical trials is usually related to obtaining a large study with power to detect treatment effects. The fundamental problem, therefore, relates to whether data are pooled to maximize power or split to maximize the credibility of the economic analysis in each individual country.

Results from a Consensus Workshop

A taxonomy of different approaches to economic appraisal in multinational clinical trials was recently developed as part of a workshop to explore whether it was possible to gain consensus on how to analyze such trials [7]. The categorization relates to the intersection between three factors: whether the clinical effectiveness data were pooled across all countries or split by country; whether the resource use data were pooled across all countries or split by country; and whether service use was valued by use of unit costs from multiple countries or by use of a single set of unit costs from one country. The first two factors are combined to define a fully pooled analysis (clinical outcomes and resource use averaged across all countries); a fully split analysis (clinical outcomes and resource use from an individual country or from individual countries); or a partially split analysis (clinical effect averaged across all countries and resource use from an individual country or from individual countries). The fourth option, clinical effect from an individual country and resource use averaged across all countries, was not considered in the article, presumably because it was felt to be an unlikely approach in practice. For each of these three broad categories, two subcategories were created to distinguish the approach used for costing resource use: a study used “one country” costing if it used costs (prices) from a single country; it used “multicountry” costing if it used costs from multiple countries.

The authors then reviewed 18 economic analyses conducted alongside clinical trials published in the cardiology field and found that the fully pooled approach has been the most prevalent approach to date, with half of all the studies presenting their analysis in this way, although only two used multicountry costing. Fully split analyses are much less common, with only two studies presenting this approach. The second most common approach was to present partially split, one country costing results. Although some may consider that this method provides an insight into the results for a single country, and while its adoption may satisfy decision-makers who are located in these single countries, there is no evidence that analyses of this type provide information about the cost-effectiveness of the therapy in any one of the individual countries that participated in the trial.

Analytical Approaches to Analyzing Multinational Trials

Fixed Effect Approaches

One of the earliest attempts to address the statistical analysis of multinational clinical trials for cost-effectiveness analysis was presented by Willke and colleagues [8]. They examined how clinical and economic outcomes interact using data from a multinational clinical trial of treatment of subarachnoid hemorrhage. Using a series of regression analyses, they developed a novel approach that explored the treatment by country interactions in both outcome (death) and cost, and which allowed the treatment effect on cost to be estimated independently of the outcome effect on cost. Use of a fully pooled analysis with multicountry costing produced just a single cost-effectiveness ratio for the whole trial. Use of a fully pooled analysis with one country costing produced ratios for each country that had very little variability between them. Partial splitting with multicountry costing provided a much greater spread, but the widest variation came from the fully split, multicountry costing analysis.

The increasing spread of results as the data are more widely split is entirely consistent with expectations. The smaller sample sizes involved in the split analyses will increase variability. The key question is to what extent this variability is related to random error or to what extent it reflects systematic differences in the cost-effectiveness between countries because of the sorts of factors discussed previously.

In a more recent contribution, Cook and colleagues [9] proposed the use of standard tests of heterogeneity in the comparison of cost and effects [10] to inform decisions about whether it is appropriate to pool economic data across countries. They outline methods based both on incremental net benefit (INB) and the angular transformation of the incremental cost-effectiveness ratio and illustrate them by use of the 4S study of cholesterol reduction with simvastatin [11,12]. The results of their INB analysis for the countries of Denmark, Finland, Iceland, Norway, and Sweden were presented for a willingness to pay threshold of $75,000 per additional survivor.
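For readers unfamiliar with the INB statistic, the following minimal sketch shows the standard calculation, INB = λ × ΔE − ΔC, together with its standard error. Only the $75,000 threshold comes from the text; the incremental effect, incremental cost, standard errors, and correlation are hypothetical inputs, and the function names are illustrative rather than taken from Cook and colleagues.

```python
import math


def incremental_net_benefit(delta_effect: float, delta_cost: float, wtp: float) -> float:
    """INB = wtp * delta_effect - delta_cost."""
    return wtp * delta_effect - delta_cost


def inb_standard_error(se_effect: float, se_cost: float,
                       correlation: float, wtp: float) -> float:
    """Standard error of INB from the standard errors of the two increments.

    Var(INB) = wtp^2 * Var(dE) + Var(dC) - 2 * wtp * Cov(dE, dC)
    """
    variance = (
        (wtp * se_effect) ** 2
        + se_cost ** 2
        - 2.0 * wtp * correlation * se_effect * se_cost
    )
    return math.sqrt(variance)


WTP = 75_000.0  # willingness to pay per additional survivor, as quoted in the text

# Hypothetical country-level inputs (assumptions for illustration only).
delta_effect = 0.03   # additional survivors per patient treated
delta_cost = 1_500.0  # additional cost per patient treated

inb = incremental_net_benefit(delta_effect, delta_cost, WTP)
se = inb_standard_error(se_effect=0.02, se_cost=400.0, correlation=0.0, wtp=WTP)
ci_lower, ci_upper = inb - 1.96 * se, inb + 1.96 * se

print(f"INB = {inb:.0f}, 95% CI = ({ci_lower:.0f}, {ci_upper:.0f})")
```

A positive INB favors the new treatment at the chosen threshold. With these assumed inputs the point estimate is positive but the interval includes zero.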

The results show that there is some variability when country-specific subsets are analyzed. Positive net benefit is observed for Denmark, Finland, and Sweden, whereas negative net benefit is observed in Norway and Iceland. Nevertheless, all of the confidence intervals include zero, and tests for both quantitative and qualitative interactions are not statistically significant. The authors suggest that, in the absence of strong evidence of heterogeneity, it is appropriate to consider pooling these data. The overall pooled estimate (ignoring country) is clearly much more precise, with a much tighter confidence interval (which nevertheless still includes zero net benefit).
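A test for quantitative heterogeneity of this general kind can be sketched with Cochran's Q statistic applied to country-specific INB estimates. This is one standard heterogeneity test and is offered as an assumption about the general form of such a test, not as the specific procedure used in the 4S analysis. The estimates and standard errors are hypothetical.

```python
def cochran_q(estimates, standard_errors):
    """Return (pooled estimate, Q statistic, degrees of freedom).

    Uses inverse-variance weights; under homogeneity Q is approximately
    chi-square distributed with k - 1 degrees of freedom.
    """
    if len(estimates) != len(standard_errors) or len(estimates) < 2:
        raise ValueError("need matching lists with at least two countries")
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return pooled, q, len(estimates) - 1


# Hypothetical country-specific INB estimates and standard errors.
inb_estimates = [1000.0, -500.0, 2000.0]
inb_standard_errors = [1000.0, 1000.0, 1000.0]

pooled_inb, q_stat, df = cochran_q(inb_estimates, inb_standard_errors)

# 5% critical value of the chi-square distribution with 2 degrees of freedom.
CHI2_CRITICAL_DF2 = 5.991
heterogeneity_detected = q_stat > CHI2_CRITICAL_DF2

print(f"pooled INB = {pooled_inb:.1f}, Q = {q_stat:.3f} on {df} df, "
      f"heterogeneity detected: {heterogeneity_detected}")
```

In this hypothetical example the country estimates differ in sign, yet Q falls well short of the critical value, which illustrates how little such a test can detect when the country-level standard errors are large.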

The authors are careful to point out that these tests often suffer from low power. This is perhaps unsurprising given that part of the rationale for multinational trials is to achieve sufficient power overall on the main clinical end point. The authors suggest that evidence of a country-by-treatment interaction is likely to provide an argument against pooling the data, but that absence of evidence should not necessarily be interpreted as a rationale to pool.

Given the relative similarity of the Scandinavian countries and their health systems, the lack of heterogeneity in this case is not unexpected. For multinational trials covering a broader range of countries, evidence of heterogeneity may be more likely.

Random Effects Approaches

The potential problem with the fixed effect approaches identified previously is that they require a choice to be made between pooling or splitting. When the data are split, random error is important, but systematic differences between countries are also likely to be important. Random effects models offer the potential to estimate systematic differences between countries while simultaneously adjusting for the random error expected from splitting the data. In this regard, they offer something of a statistical middle ground between a fully split analysis and a fully pooled analysis, while reflecting the natural hierarchy in the data structure of subjects belonging to countries or regions. Random effects modeling was identified as a promising method at the consensus conference discussed previously and there have recently been a number of published examples of its application [13–15].
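The "middle ground" property can be illustrated with a summary-level sketch. It uses the DerSimonian-Laird moment estimator of the between-country variance and then shrinks each country estimate toward the pooled mean. This is a simplification adopted here for illustration; the published applications cited above fit hierarchical models to patient-level data, and the estimates below are hypothetical.

```python
def random_effects_shrinkage(estimates, standard_errors):
    """Return (tau2, pooled random-effects mean, shrunken country estimates)."""
    if len(estimates) != len(standard_errors) or len(estimates) < 2:
        raise ValueError("need matching lists with at least two countries")

    # Fixed-effect (inverse-variance) pooling and Cochran's Q.
    weights = [1.0 / se ** 2 for se in standard_errors]
    sum_w = sum(weights)
    pooled_fixed = sum(w * b for w, b in zip(weights, estimates)) / sum_w
    q = sum(w * (b - pooled_fixed) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1

    # DerSimonian-Laird moment estimate of between-country variance.
    scale = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / scale)

    # Random-effects pooled mean.
    re_weights = [1.0 / (se ** 2 + tau2) for se in standard_errors]
    pooled_re = sum(w * b for w, b in zip(re_weights, estimates)) / sum(re_weights)

    # Shrink each country toward the pooled mean; the shrinkage factor is 1
    # (fully pooled) when tau2 = 0 and approaches 0 (fully split) as tau2 grows.
    shrunken = []
    for b, se in zip(estimates, standard_errors):
        shrink = se ** 2 / (se ** 2 + tau2)
        shrunken.append(shrink * pooled_re + (1.0 - shrink) * b)
    return tau2, pooled_re, shrunken


# Hypothetical country-specific estimates (for example, INB) and standard errors.
country_estimates = [1000.0, -500.0, 2000.0]
country_standard_errors = [1000.0, 1000.0, 1000.0]

tau2, pooled_re, shrunken = random_effects_shrinkage(
    country_estimates, country_standard_errors
)
for raw, adj in zip(country_estimates, shrunken):
    print(f"split estimate {raw:8.1f} -> shrunken estimate {adj:8.1f}")
```

Each shrunken estimate lies between the fully split estimate for that country and the pooled mean, with the degree of shrinkage governed by how large the between-country variance is relative to the sampling error.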

Statistical Modeling of Cost-Effectiveness Components

An alternative to the two approaches outlined previously is to consider separate statistical modeling of the components of cost and effect. This sort of modeling is common in decision analysis. Indeed, when first introducing decision analysis-based cost-effectiveness to a clinical audience, Weinstein and Stason emphasized that cost and effect differences are made up of components. These components relate to the cost of treatment, the effect of treatment on length of life and associated costs, the effect of treatment on the morbidity of the disease and the consequent effects on quality of life and cost, and the effect of treatment side effects on quality of life and cost [16]. This sort of approach offers a number of advantages over the traditional approach to cost-effectiveness analysis based on analyzing the trial directly via the mean cost and effect in each arm. For example, it is possible to choose the appropriate statistical model for each component, with explanatory variables that vary by component, and with different scales for different components. It can be easier to incorporate external evidence where required (for example, quality of life weights attached to events) and an analysis based on components may form a more logical basis for extrapolation.
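How the components combine into a cost-effectiveness result can be shown with a deliberately simplified sketch. It assumes each component has already been estimated (by whatever model suits it), ignores discounting and uncertainty, and uses hypothetical values throughout. The function and variable names are illustrative and are not drawn from any published analysis.

```python
def arm_totals(treatment_cost: float, other_medical_cost: float,
               life_years: float, mean_utility: float):
    """Combine separately estimated components into total cost and QALYs for one arm."""
    total_cost = treatment_cost + other_medical_cost
    qalys = life_years * mean_utility
    return total_cost, qalys


def incremental_results(treated, control):
    """Return (incremental cost, incremental QALYs) from two (cost, QALY) pairs."""
    return treated[0] - control[0], treated[1] - control[1]


# Hypothetical component estimates (assumptions for illustration only).
control_arm = arm_totals(treatment_cost=0.0, other_medical_cost=10_000.0,
                         life_years=2.5, mean_utility=0.70)
treated_arm = arm_totals(treatment_cost=2_000.0, other_medical_cost=9_500.0,
                         life_years=2.6, mean_utility=0.72)

delta_cost, delta_qalys = incremental_results(treated_arm, control_arm)
icer = delta_cost / delta_qalys if delta_qalys > 0 else float("nan")

print(f"incremental cost = {delta_cost:.0f}, "
      f"incremental QALYs = {delta_qalys:.3f}, ICER = {icer:.0f}")
```

Because each component enters separately, a country-specific input (such as a local unit cost or baseline survival) can be substituted for one component without re-estimating the others.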

One of the consequences of allowing different scales of measurement and different explanatory factors (including treatment effects) is that heterogeneity is directly estimated and this can lead to a form of sub-group analysis that is not based on splitting the data and which may therefore avoid some of the problems associated with standard approaches to sub-group analysis. For example, in their cost-effectiveness analysis of the TORCH study, Briggs and colleagues [1] separately modeled study treatment cost, other medical costs, health related quality of life (HRQoL) and survival including an assessment of treatment by country interaction terms to estimate jurisdiction-specific cost-effectiveness from this multinational study. Regional cost-effectiveness results were estimated from a combination of significant treatment by region interactions on treatment cost, with multiplicative models with main effects only for survival and other medical costs. The resulting region-specific estimates had tighter confidence intervals than an analysis based on splitting the trial into separate regions, while still allowing for some regional variation based on different baseline characteristics. The results of this analysis are shown in Figure 1.

Figure 1.

Comparative estimates of effectiveness, in quality-adjusted life-years (QALYs), for four different regions in the TOwards a Revolution in COPD Health (TORCH) trial [1]. White circles show estimates based on splitting the data into four regions; the black diamond shows the pooled result over all regions; crosses show the estimated regional results based on separate estimation of quality of life and life expectancy by region, allowing for different baseline characteristics. Horizontal lines show 95% confidence intervals. Region codes: US, United States; EE, Eastern Europe; WE, Western Europe; Other, all other countries in TORCH except for Asia Pacific.

Of course, these potential advantages come at the price of the assumptions that are introduced. The conventional cost-effectiveness approach to trial-based evaluation requires little in the way of assumptions and that is its principal strength. The alternative approach based on statistical modeling of clinical events, or components of cost-effectiveness, introduces additional assumptions to the analysis. The validity of the results is therefore conditional on those assumptions holding. Nevertheless, the assumptions employed are often the “natural assumptions” that in any case underpin the clinical analysis of events from a trial. Furthermore, any concerns over potential manipulation of the results can be guarded against by fully prespecifying the analysis plan for the health economic analysis. Although it may be too early to be sure whether such event-based analysis will become the new standard, it offers sufficient advantages that it should be considered seriously by all those embarking on trial-based economic evaluation studies.

Closing Remarks

In studying comparative effectiveness (and cost-effectiveness) in multinational studies, it is important to acknowledge the potential for the results of the study to vary across different countries/geographic regions. Simply pooling results across a multinational trial without regard for this potential risks inappropriate decision-making in at least some of the jurisdictions covered by the trial. It was argued that an analysis based on separate components making up the effectiveness (and cost-effectiveness) calculus offers a robust way forward that can avoid splitting the data. The utility of QALYs has long been recognized in the health economics community as a measure that can bring together different aspects of treatment effect on quality and length of life, and this unit of measurement is also well suited to quantifying comparative effectiveness. In terms of multinational studies, it seems clear that we need to have absolute measures of effectiveness by country. Nevertheless, this does not mean that the estimates have to be based on additive models. Absolute estimates of treatment benefit can be determined by combining relative measures with baseline risk estimates from individual countries or regions, which can adjust for the differences that we know exist between countries or regions.

Source of financial support: Oxford Outcomes, the National Pharmaceutical Council, and Shire Pharmaceuticals. Andrew Briggs holds the William R. Lindsay Chair in Health Policy & Economic Evaluation at University of Glasgow.