Analyses of Cost Data in Economic Evaluations Conducted Alongside Randomized Controlled Trials

Authors


Jalpa A. Doshi, 1222 Blockley Hall, Philadelphia, PA 19104, USA. E-mail: jdoshi@mail.med.upenn.edu

ABSTRACT

Objective:  The adoption and diffusion of new medical treatments depend increasingly on evidence of costs and cost-effectiveness, and this evidence is increasingly generated from economic data collected in randomized clinical trials. The objective of this article is to evaluate the statistical methods used for the analysis of cost data in economic evaluations conducted alongside randomized controlled trials.

Methods:   Systematic review of economic evaluations published in 2003 that were based on patient-level cost or resource-use data collected in randomized trials. One hundred fifteen articles were identified from the MEDLINE database. The use of statistical methods for 1) joint comparison of costs and effects and assessment of stochastic uncertainty, 2) incremental cost estimation, and 3) handling of incomplete or censored cost data was evaluated.

Results:   Only 42 (37%) of the 115 economic evaluations presented a cost-effectiveness ratio or estimated net benefits and 24 (57%) of these reported the uncertainty of this statistic. A comparison of costs alone was more common with 92 (80%) of the 115 studies statistically comparing costs between treatment groups. Of these, about two-thirds (62; 68%) used at least one statistical test appropriate for drawing inferences for arithmetic means. Incomplete cost data were reported in 67 (58%) studies with only two using a published statistical approach for handling censored cost data.

Conclusion:   The quality of statistical methods used in economic evaluations conducted alongside randomized controlled trials was poor in the majority of studies published in 2003. Adoption of appropriate statistical methods is required before the results from such studies can consistently provide valid information to decision-makers.

Introduction

With the growing demands on limited health-care budgets and the need to control the rapid growth in medical expenditures, the adoption and diffusion of new medical treatments depend increasingly on evidence of cost and cost-effectiveness. A critical source of this evidence comes from analyses of economic information collected prospectively in randomized clinical trials alongside clinical end points. Yet a systematic review of all clinical trial-based economic evaluations published in 1995 found “major deficiencies in the way cost data in randomized controlled trials were summarized and analyzed”[1]. In the past decade, however, the field has matured substantially, including the advancement of, and a growing consensus about, appropriate statistical methods [2].

In light of these changes, we assessed the methods used in clinical trial-based economic evaluations published in 2003. We specifically evaluated the methods used for 1) joint comparison of costs and effects and assessment of sampling uncertainty; 2) estimation of incremental costs; and 3) handling of incomplete or censored cost data.

Methods

Statistical Methods for Cost-Effectiveness Evaluation

Joint comparison of costs and effects and assessment of sampling uncertainty. A joint comparison of costs and effects using the incremental cost-effectiveness ratio (ICER) or the incremental net monetary (health) benefit (INB) is a useful decision tool to help determine whether the new therapy offers good value relative to the alternative. The use of this tool is particularly important when there is a trade-off between costs and effects; that is, one therapy is both significantly more effective and more costly compared with the other therapy. If there is no trade-off between costs and effects, that is, when one therapy is significantly more effective and less costly when compared with the other therapy, this decision tool may not be necessary because that therapy is unambiguously dominant over its alternative. A third possibility occurs when the two treatments have the same effect. In this case, some authors have interpreted textbooks and guidelines on health economic evaluations to suggest that a cost-minimization approach is sufficient (i.e., the lowest-cost treatment is the treatment of choice) and there is no need to perform a joint comparison of costs and effects [3–6]. Nevertheless, as our understanding of sampling uncertainty for the comparison of costs and effects has grown, the cases where this interpretation is appropriate have shrunk.
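In standard notation, with mean costs and effects (C̄1, Ē1) in the new-therapy arm and (C̄0, Ē0) in the comparator arm, these two decision tools are

\[
\mathrm{ICER} = \frac{\bar{C}_1 - \bar{C}_0}{\bar{E}_1 - \bar{E}_0} = \frac{\Delta C}{\Delta E},
\qquad
\mathrm{INB}(\lambda) = \lambda\,\Delta E - \Delta C,
\]

where λ is the decision-maker's willingness to pay per unit of effect. When ΔE > 0, the new therapy is considered good value if the ICER falls below λ or, equivalently, if INB(λ) is positive.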

Because cost-effectiveness ratios and net monetary benefits estimated from trial data are the result of samples drawn from the population, one should report the uncertainty in these outcomes that derives from such sampling [7]. The development of methods for measuring this uncertainty, such as confidence intervals for cost-effectiveness ratios [8–11], acceptability curves [12], and confidence intervals for net monetary benefit [13], has been an important methodologic advance in the economic evaluation of medical therapies [14]. When one uses these methods, a finding of significantly lower cost and an indistinguishable clinical outcome does not guarantee that one can be confident that the significantly less expensive therapy is good value. As a result of uncertainty, the cost-minimization approach has been shown to be rarely appropriate as a method of analysis, and the need for a joint comparison remains under most circumstances [15]. Conversely, because it is possible to have more confidence in the combined outcome of differences in costs and effects than in either outcome alone, observing no significant difference in costs and effects does not rule out that one can be confident that one of the two therapies is good value. In these cases, one should compare costs and effects jointly and report their sampling uncertainty.
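To make the acceptability-curve idea concrete, the following is a minimal sketch, not a reproduction of any published implementation, of how such a curve can be traced by nonparametric bootstrapping of patient-level data. The data, sample sizes, and willingness-to-pay grid are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)

def ceac(cost_new, eff_new, cost_old, eff_old, lambdas, n_boot=2000):
    """Cost-effectiveness acceptability curve: for each willingness-to-pay
    value, the fraction of bootstrap replicates in which the incremental
    net benefit INB = lambda * dE - dC is positive. Patients are resampled
    within arms so each patient's cost-effect pair stays together."""
    inb = np.empty((n_boot, len(lambdas)))
    for b in range(n_boot):
        i = rng.integers(0, len(cost_new), len(cost_new))
        j = rng.integers(0, len(cost_old), len(cost_old))
        d_cost = cost_new[i].mean() - cost_old[j].mean()
        d_eff = eff_new[i].mean() - eff_old[j].mean()
        inb[b] = lambdas * d_eff - d_cost
    return (inb > 0).mean(axis=0)

# Hypothetical skewed costs and QALY-like effects for two trial arms
cost_new = rng.lognormal(8.5, 1.0, 300)
cost_old = rng.lognormal(8.3, 1.0, 300)
eff_new = rng.normal(0.70, 0.20, 300)
eff_old = rng.normal(0.65, 0.20, 300)

lambdas = np.linspace(0, 100_000, 21)
print(np.round(ceac(cost_new, eff_new, cost_old, eff_old, lambdas), 2))
```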

Estimation of incremental costs.  For economic analysis, costs and cost differences between treatment groups should be expressed as arithmetic means, not medians, because this summary measure permits a budgetary assessment of treatment (N × arithmetic mean = total cost) and is the statistic of interest for health-care policy decisions [1,2]. The most common statistical test for arithmetic mean differences between treatment groups is the parametric t-test. Because of the often highly skewed distribution of cost data, the normality assumption underlying this test is often called into question, and standard nonparametric tests (e.g., Mann–Whitney U-test or Wilcoxon rank sum test) or parametric tests on normalizing transformations (e.g., log transformation) are often used as substitutes. Yet these popular alternatives are not appropriate for drawing statistical inferences about differences in arithmetic mean costs [16–18]. For example, when one uses a t-test to evaluate the log of costs, the resulting P-value applies directly to the difference in the log of costs and to the difference in the geometric mean of costs; it may or may not apply to the difference in arithmetic mean costs. Similarly, when one uses a Mann–Whitney U-test, one is testing differences in the median of costs. Thus, statistical inferences about these other statistics may not be representative of inferences about the difference in arithmetic means, which is the statistic of interest.
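A small simulation sketch (hypothetical lognormal "cost" data; NumPy and SciPy assumed) makes the mismatch concrete: the two samples below share a median and geometric mean but differ substantially in arithmetic mean, so the log-scale t-test and the Mann–Whitney U-test both miss a difference that matters for budgets.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Same median and geometric mean, different arithmetic means
# (the spread on the log scale differs between groups).
a = rng.lognormal(mean=7.0, sigma=0.5, size=5000)
b = rng.lognormal(mean=7.0, sigma=1.2, size=5000)

print(round(a.mean()), round(b.mean()))          # arithmetic means differ
print(round(np.median(a)), round(np.median(b)))  # medians nearly identical

# t-test on log costs addresses geometric means: finds no difference
print(stats.ttest_ind(np.log(a), np.log(b), equal_var=False).pvalue)
# Mann-Whitney U-test addresses medians/location: also misses it
print(stats.mannwhitneyu(a, b, alternative="two-sided").pvalue)
# Welch t-test on raw costs addresses arithmetic means: detects it
print(stats.ttest_ind(a, b, equal_var=False).pvalue)
```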

If one does not want to adopt a parametric t-test to directly test for differences in arithmetic mean costs, one can compare the arithmetic means by using a nonparametric bootstrap. This procedure has the added advantage of avoiding a parametric assumption about the distribution of costs. As a result, the nonparametric bootstrap has increasingly been recommended either as a check on the robustness of standard parametric t-tests, or as the primary statistical test for making inferences about arithmetic means for moderately sized samples of highly skewed cost data [18–20].
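The following is a minimal sketch of such a nonparametric (percentile) bootstrap comparison of arithmetic mean costs; the cost data are hypothetical and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_mean_diff(x, y, n_boot=10_000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the difference in
    arithmetic mean costs between two independent treatment groups."""
    diffs = np.array([
        rng.choice(x, size=len(x)).mean() - rng.choice(y, size=len(y)).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return x.mean() - y.mean(), (lo, hi)

# Hypothetical skewed cost data for two arms of a modest-sized trial
treat = rng.lognormal(8.6, 1.1, 150)
control = rng.lognormal(8.4, 1.1, 150)

diff, (lo, hi) = bootstrap_mean_diff(treat, control)
print(f"difference in mean costs: {diff:.0f} (95% CI {lo:.0f} to {hi:.0f})")
```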

Even when treatment is assigned in a randomized setting, some authors use multivariable techniques to analyze costs. Multivariable analysis of costs may be superior to univariate analysis because it improves the power of tests of differences between groups (by explaining variation due to other causes). It also facilitates subgroup analyses of cost-effectiveness (e.g., among more and less severe patients, or across countries and centers). Finally, it accounts for potentially large and influential variations in economic conditions and practice patterns by provider, center, or country that may not be balanced by randomization.

Adoption of multivariable analysis does not, however, avoid the issues that arise in the univariate analysis of costs. For example, regressions on the logarithmic transformation of costs were previously considered an ideal remedy for the violation of the assumption of a normally distributed error term that underlies ordinary least squares (OLS) regression. Nevertheless, as the shortcomings of multiple regression models of log-transformed costs became more widely publicized [17], generalized linear models have become the accepted alternative [21–23].
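As an illustrative sketch, a gamma-family generalized linear model with a log link, one common choice for skewed costs, might be fit as follows; the data are simulated and the statsmodels library is assumed. The recycled-predictions step at the end is one way, among others, to express the adjusted incremental cost on the original dollar scale.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 400

# Simulated trial data: randomized treatment plus a baseline covariate
df = pd.DataFrame({
    "treat": rng.integers(0, 2, n),
    "age": rng.normal(60, 10, n),
})
mu = np.exp(7.5 + 0.3 * df["treat"] + 0.01 * df["age"])
df["cost"] = rng.gamma(shape=2.0, scale=mu / 2.0)  # skewed costs, mean = mu

# Gamma GLM with log link models mean cost on the original scale,
# avoiding the retransformation problem of log-OLS.
model = smf.glm(
    "cost ~ treat + age", data=df,
    family=sm.families.Gamma(link=sm.families.links.Log()),
).fit()
print(model.summary())

# Adjusted incremental cost via recycled predictions: predict every
# patient's cost under treat=1 and under treat=0, then average.
inc = (model.predict(df.assign(treat=1)) - model.predict(df.assign(treat=0))).mean()
print(f"adjusted incremental cost: {inc:.0f}")
```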

Handling of incomplete cost data.  Incomplete or censored cost data occur in most randomized trials that follow participants for clinically meaningful lengths of time. Whether cost data were incomplete, the amount of incomplete data, and the statistical method adopted to address the problems posed by censored data should routinely be reported in trial-based analyses [2]. Although a mix of approaches exists for imputing cost data, recent statistical interest in censored cost data has led to the proposal of several methods of estimation that explicitly account for data that are incomplete due to loss to follow-up [24–30].

It is well established that these methods are prone to less bias than naive estimation methods wherein censored observations are either excluded from the analysis (i.e., complete-case analysis) or included as though they were complete observations (i.e., full-sample analysis) [25,26,28,31–33]. In the first naive approach, only the uncensored cases are used in the estimation of mean cost; this method is biased toward the costs of patients with shorter survival times, because patients with longer survival times are more likely to be censored [25,32]. Moreover, completely discarding patients with censored data leads to a loss of information and statistical power, which can be problematic if the percentage of censored cases is high. The second naive approach, which uses all cases without differentiating between censored and uncensored observations, is always biased downward because the costs incurred after the censoring times are not accounted for [32].
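To convey the flavor of these estimators, the following is a deliberately simplified sketch of the partitioned "Kaplan-Meier sample average" idea underlying methods such as that of Lin et al. (1997) [25]: partition follow-up into intervals and weight the observed mean cost accrued in each interval by the Kaplan-Meier probability of being alive at its start. The published estimators differ in important details (e.g., how within-interval means are defined and how variance is estimated), so this is an illustration, not a substitute for the original methods; all data are hypothetical and NumPy is assumed.

```python
import numpy as np

def km_at(eval_times, time, event):
    """Kaplan-Meier survival S(t), evaluated at each t in eval_times.
    time = follow-up time per patient; event = 1 if death observed."""
    s = np.ones(len(eval_times))
    for u in np.unique(time[event == 1]):
        at_risk = (time >= u).sum()
        deaths = ((time == u) & (event == 1)).sum()
        s[eval_times >= u] *= 1.0 - deaths / at_risk
    return s

def kmsa_mean_cost(time, event, interval_costs, starts):
    """Simplified partitioned estimator of mean total cost: sum over
    intervals of (KM survival at interval start) x (mean cost accrued
    in the interval among patients still observed in it).

    interval_costs : (n_patients, n_intervals) array with np.nan once
                     a patient is no longer observed in an interval.
    starts         : start time of each interval."""
    s_start = km_at(np.asarray(starts, dtype=float), time, event)
    total = 0.0
    for j in range(interval_costs.shape[1]):
        observed = ~np.isnan(interval_costs[:, j])
        if observed.any():
            total += s_start[j] * interval_costs[observed, j].mean()
    return total

# Tiny hypothetical example: 3 patients, two intervals of length 1
time = np.array([2.0, 0.8, 1.5])   # follow-up times
event = np.array([0, 1, 1])        # 0 = censored, 1 = died
costs = np.array([[500.0, 300.0],
                  [800.0, np.nan],
                  [400.0, 250.0]])
print(kmsa_mean_cost(time, event, costs, starts=[0.0, 1.0]))
```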

Study Selection

This review included published studies evaluating economic outcomes based on patient-specific cost or resource-use data collected in randomized controlled trials. A search was conducted in the MEDLINE database as of September 2004 for all studies that included terms related to costs (e.g., “cost(s),” “economic evaluation(s),” or “health economic(s)”) and clinical trials (e.g., “trial(s)” or “randomized controlled trials”) in the title, abstract, or MeSH headings. The search was limited to publications in English, involving human subjects, and published during 2003. This search identified approximately 650 eligible articles. Most of these were excluded upon review of the study abstract because it was clear that they did not report clinical trial-based economic results. The full text was reviewed for 162 articles. Studies were excluded if they did not collect or analyze patient-level costs, if clinical trial data were applied in a decision-analytic model, or if the study was not in fact a randomized trial. This resulted in a final sample of 115 articles.

Data Abstraction

Data were extracted by using a specially designed data abstraction form. The first part of the form collected general study information, such as the country where the trial was conducted, the broad clinical area, and the type of intervention. The second part collected specific information on the economic outcome studied, the analysis of costs, and the approach to handling incomplete data. We first determined whether a joint comparison of costs and effects was performed in each study and, if not, whether its absence was justified. For studies that estimated an ICER or INB, we examined whether and how stochastic uncertainty was estimated. We then focused on the analysis of cost data in terms of how costs were summarized, the statistical test used to compare costs across treatment groups, and any multivariate technique used to report an adjusted incremental cost estimate. Lastly, we collected information on whether the study reported incomplete cost data and the technique, if any, used to address the problem.

Assessments were carried out by one assessor (J.A.D.). The reliability of the data abstraction was monitored using an independent assessment by a second author (H.A.G.) of a 20% random sample of the 115 studies. Agreement was complete for the items reported in this article. Only in the case of one item (i.e., technique for handling incomplete cost data) was discussion needed to determine the classification, because the reporting of methods to account for censored cost data was unclear in several studies.

Reporting of Results

We report the number and proportion of the 115 studies that do and do not conform with each of the principles for statistical evaluation of cost-effectiveness set forth above. We also investigate whether the statistical methods used were associated with the number of participants in the randomized trial. To do so, we report selected results stratified by the sample size of the study (fewer than 200 subjects, between 200 and 999 subjects, and 1000 or more subjects).

Results

General Study Information

The 115 studies covered a variety of clinical areas such as cardiovascular disease, musculoskeletal conditions, cancer, and psychiatry. A total of 98 (85%) studies were published in general medical, surgical, or subspecialty clinical journals and the remainder were in methods or policy journals. The trials in which these economic analyses were performed were conducted in either the United States (27; 24%), the UK (27; 24%), multinationally (24; 21%), or in other countries (37; 31%). The economic analysis in 50 (43%) of the 115 studies was based on a sample size of less than 200 subjects, 47 studies (41%) had between 200 and 999 subjects, and 18 (16%) had 1000 or more subjects.

Joint Comparison of Costs and Effects and Assessment of Sampling Uncertainty

Only 42 (37%) of the 115 identified studies conducted a joint comparison of costs and effects (INB [n = 3]; ICER [n = 38]; and cost–benefit ratio [n = 1]) (Table 1). Six studies (5%) divided each therapy’s costs by its effects, but did not compare incremental costs with incremental effectiveness. Sixty-seven (58%) studies reported differences in costs only and did not jointly compare costs and effects. The lack of a joint comparison could be justified in nine of the 73 studies because one treatment was statistically dominant (significantly lower costs and significantly better outcomes) compared with the alternative. In 38 other studies, the lack of a joint comparison could possibly be justified based on an absence of significant differences in effects between the treatment groups. Hence, depending on the strictness of the criteria, either 26 (23%) or 64 (56%) of the 115 studies should have estimated costs and effects jointly, but failed to do so.

Table 1.  Joint comparison of costs and effects and sampling uncertainty

                                                               n      %
Total number of studies                                       115    100
Type of cost analysis*
  INB                                                           3      3
  ICER                                                         38     33
  Cost–benefit ratio                                            1      1
  Cost-effectiveness ratio comparison across treatment arms     6      5
  Mean cost comparison                                         63     55
  Median cost comparison only                                   4      3
Was joint comparison of costs and effects conducted?
  No                                                           73     63
  Yes                                                          42     37
If no, was the lack of joint comparison justified?†
  No                                                           26     36
  Possibly                                                     38     52
  Yes                                                           9     12
If yes, was stochastic uncertainty measured?
  No                                                           18     43
  Yes                                                          24     57
If yes, how was stochastic uncertainty measured?*
  Acceptability curves                                         12     50
  95% CI using bootstrapping                                    9     38
  95% CI using Fieller’s theorem                                1      4
  Other‡                                                        2      8

*Mutually exclusive hierarchical classification in the order listed.
†“No” refers to studies wherein one treatment arm was significantly more effective but not less costly than the other treatment arm(s), or one treatment arm was more effective but no statistical comparison of costs was reported, or no statistical comparison of either costs or effects was reported; “Possibly” refers to studies wherein there was no significant difference in effects across the treatment arms; “Yes” refers to studies wherein one treatment arm was significantly more effective and less costly (i.e., dominant) than the other treatment arm(s).
‡One study calculated a 95% CI for the ICER based on 95% CI values for only the numerator (i.e., costs), and another calculated a 95% CI but did not specify how it was estimated.
CI, confidence interval; ICER, incremental cost-effectiveness ratio; INB, incremental net benefits.

When we stratified by sample size, we found that studies with smaller sample sizes were less likely to conduct a joint comparison of costs and effects. Again, depending on the strictness of the criteria, either 14 (28%) or 35 (70%) of the 50 studies with sample sizes less than 200 failed to conduct a joint comparison. On the other hand, of the 18 studies with sample sizes of 1000 or more, only two (11%) or four (22%) failed to estimate costs and effects jointly.

Among the 42 studies that compared costs and effects, only 24 (57%) reported sampling uncertainty (Table 1). There was no significant difference in this proportion when we stratified by sample size (proportions ranged from a low of 54% among studies with 1000 or more participants to a high of 61% among studies with 200 to 999 participants). When sampling uncertainty was reported, it was reported appropriately 92% of the time. The methods used in these studies included acceptability curves (n = 12), 95% CI obtained from bootstrapping (n = 9), or Fieller’s theorem (n = 1).

Estimation of Incremental Costs

One hundred ten (96%) of the 115 studies reported arithmetic mean costs by treatment group and 92 (80%) statistically compared costs between treatment groups (Table 2). Of the 92 statistical comparisons, 62 (68%) used at least one statistical test that was appropriate for drawing inferences for arithmetic means (i.e., nonparametric bootstrapping [23%] or a parametric test on untransformed means [45%]). Less appropriate tests included t-tests on transformed mean costs (6; 7%) and nonparametric tests of medians or distributions (17; 18%). In six (7%) of the studies, the type of test performed was unclear. More than one type of test was performed in approximately 23% of these studies: t-tests on untransformed mean costs were performed 50% of the time, t-tests on transformed mean costs 14% of the time, and other nonparametric tests 27% of the time.

Table 2.  Analysis of costs

                                                               n      %
Total number of studies                                       115    100
Type of cost comparison reported across treatment arms
  Mean costs only                                              96     84
  Mean and median                                              14     12
  Median costs only                                             5      4
Was statistical comparison of costs across treatment arms made?
  No                                                           23     20
  Yes                                                          92     80
If yes, what type of statistical test was conducted?*
  Nonparametric bootstrapping                                  21     23
  t-test on untransformed mean costs                           41     45
  Other parametric test on untransformed mean costs†            1      1
  t-test on transformed mean costs‡                             6      7
  Mann–Whitney U-test/Wilcoxon rank sum test to infer
    differences in mean costs                                   9     10
  Mann–Whitney U-test/Wilcoxon rank sum test on median costs    5      5
  Kruskal–Wallis test on distribution of costs                  1      1
  Kolmogorov–Smirnov test on distribution of costs              2      2
  Not clear                                                     6      7
Were adjusted incremental costs reported?
  No                                                          105     91
  Yes                                                          10      9
If yes, what type of multivariate model was estimated?
  OLS                                                           7     70
  Log OLS with smearing retransformation                        1     10
  Generalized linear models                                     0      0
  Other§                                                        2     20

*Mutually exclusive hierarchical classification in the order listed.
†Wald-type test proposed by Zhou et al. (1999).
‡Mainly includes logarithmic transformations; only one study conducted a square-root transformation.
§Includes one study which conducted two-stage probit and tobit regressions and another which used a linear mixed model.
OLS, ordinary least squares.

Only 10 (9%) of the 115 studies conducted multivariate adjustment of the incremental costs. Seven of these estimated multiple regressions using OLS on untransformed costs. In addition to producing inefficient estimates in the face of non-normality, OLS has the disadvantage of not being robust in small- to medium-sized data sets and in large data sets with extreme observations. One study estimated OLS models on log transformed costs and conducted a smearing retransformation of the estimates to the untransformed scale. Nevertheless, this study did not provide sufficient information to judge whether the specific smearing retransformation that was adopted was appropriate. No study estimated generalized linear models. Studies with sample sizes less than 200 (1; 2%) were less likely to conduct multivariate adjustment than studies with sample sizes of 200 or more (9; 14%).

Handling of Incomplete Cost Data

In our sample of economic evaluations, 26 (23%) reported no attrition, 67 (58%) reported some incomplete cost data, and the remaining 22 (19%) did not indicate whether or not complete follow-up was achieved (Table 3). In our examination of how the 67 studies with incomplete cost data dealt with the issue, we found that 19 (28%) used various approaches to impute the missing costs. Two of these studies used a published statistical approach to address the issues posed by censored cost data: one used the regression-based method of Carides et al. (2000) [30] and the other used the Kaplan–Meier sample average estimator of Lin et al. (1997) [25]. More than one-third (26; 39%) of the studies conducted a complete-case analysis. This method can pose problems when there are a large number of subjects with incomplete cost data; we found that half of the studies using complete-case analysis had more than 15% of observations with censored cost values. Another three studies (4%) used the naive approach of a full-sample analysis, ignoring censoring and averaging costs over all patients. It was not clear how the remaining 19 (28%) studies dealt with incomplete cost data. Only seven (10%) of the 67 studies conducted a sensitivity analysis using an alternative method of handling incomplete cost data.

Table 3.  Handling of incomplete cost data

                                                               n      %
Total number of studies                                       115    100
Was there incomplete cost data?
  No                                                           26     23
  Yes                                                          67     58
  Not reported                                                 22     19
If yes, what was the primary method used to handle incomplete cost data?
  Imputation approach                                          19     28
    Lin et al. (1997) method                                    1      2
    Carides et al. (2000) method                                1      2
    Imputation of group-specific univariate mean or median      6      9
    Imputation using regression                                 3      4
    Repeated measures analysis*                                 3      4
    Per person per unit time measure                            2      3
    Last observation carried forward                            3      4
  Full-sample analysis                                          3      4
  Complete-case analysis                                       26     39
  Not clear                                                    19     28
If yes, was sensitivity analysis conducted using an alternative
method of handling incomplete cost data?
  No                                                           60     90
  Yes                                                           7     10

*Use of linear mixed models or generalized estimating equations.

Discussion

The number of clinical trial-based economic evaluations has increased considerably over the last decade. This review identified 115 such studies published in 2003, compared with the 45 studies published in 1995 that were identified in a previous review [1]. Over the same period, research has also improved the methodologies for the analysis and reporting of cost data collected alongside clinical trials. In light of these advances, the comparison of our key findings with those of the 1995 review suggests that published studies have begun to use some of these newer statistical techniques. For example, the proportion of studies performing a statistical test for the cost comparison increased from 53% to 80%, and the proportion of these studies performing nonparametric bootstrapping increased from 0% to 23%. Similarly, the number of studies reporting an INB or ICER increased more than threefold; in addition, more than half of these studies also reported stochastic uncertainty around these joint estimates, relative to none in 1995.

Nevertheless, in absolute terms, our review reveals that a substantial number of clinical trial-based economic studies still use statistical methods of poor quality. For example, about one in five studies did not perform a statistical test for the cost comparison and yet made claims about cost-effectiveness or cost savings. Although the remainder statistically evaluated costs, about a quarter of these either performed an inappropriate statistical test for arithmetic means or compared only median costs. About one in 10 studies estimated adjusted costs, and a majority of these used multivariate models that potentially faced issues of bias or efficiency of estimation. Only half of the studies calculating an incremental cost-effectiveness estimate reported some appropriate measure of stochastic uncertainty. Finally, although almost two-thirds of the studies reported some incomplete cost data, only two used a published statistical approach to handle censored cost data. Both of these studies had at least one coauthor who was also an author on the original statistical methods paper [25,30].

The International Society for Pharmacoeconomics and Outcomes Research has recently published a best practices document for the design, conduct, and reporting of economic analyses alongside clinical trials [2]. Whether explicit guidelines alone will foster improvements in the quality of future studies remains a question, given previous research suggesting that such guidelines have had minimal or slow impact on the quality of subsequent studies [6,34–36]. Part of the problem may be that most of the advances in statistical techniques for analyzing cost data have been published in highly technical economic or biostatistics journals. Some applied researchers may not read this literature, and many may have difficulty understanding the rationale for and implementation of these technical methods. There is a clear need for publications in more applied journals that explain these technical advances in an easily understandable format, thereby transferring knowledge to the researchers who need to apply these newer methods. Additional efforts to improve the quality of future studies may involve peer reviewers, for both funding agencies and journals, being more critical of studies that fail to apply best practices in economic evaluations in clinical trials. Regulatory and reimbursement authorities should also explicitly adopt best practices guidelines and hold all economic data submissions to these high standards when making reimbursement decisions.

The growing number of economic evaluations incorporated in prospective clinical trials reflects an increasing investment of both private and public dollars in the collection and analysis of economic end points alongside clinical trials. Hence, to obtain “value” for this money, it is necessary to ensure the quality and consistency of these studies so that they support efficient health-care resource allocation decisions.

Conclusion

Our review finds that the quality of the methods used for the analysis of costs collected alongside clinical trials has improved over the decade; however, large room for progress remains. Meanwhile, decision-makers need to be cautious when interpreting the results of current clinical trial-based economic evaluations, because some may provide misleading conclusions given the substantial variation in the statistical methodology and reporting of cost analyses revealed in this review. Efforts are needed from different stakeholders to ensure that future clinical trial-based cost-effectiveness analyses address these issues, enhance the validity of their findings, and thus ensure their usefulness in health-care decision-making.

Source of financial support: This study was funded by the National Institute on Alcohol Abuse and Alcoholism (R01-AA12664) and the National Institute of Drug Abuse (RO1-DA017221).