
Handling missing values in cost effectiveness analyses that use data from cluster randomized trials



Public policy makers use cost effectiveness analyses (CEAs) to decide which health and social care interventions to provide. Missing data are common in CEAs, but most studies use complete-case analysis. Appropriate methods have not been developed for handling missing data in complex settings, exemplified by CEAs that use data from cluster randomized trials. We present a multilevel multiple-imputation approach that recognizes the hierarchical structure of the data and is compatible with the bivariate multilevel models that are used to report cost effectiveness. We contrast this approach with single-level multiple imputation and complete-case analysis, in a CEA alongside a cluster randomized trial. The paper highlights the importance of adopting a principled approach to handling missing values in settings with complex data structures.

1. Introduction

Public policy makers use cost effectiveness analyses (CEAs) in deciding which health and social care interventions to prioritize (National Institute for Health and Clinical Excellence, 2008; Canadian Agency for Drugs and Technologies in Health, 2006; Pharmaceutical Benefits Advisory Committee, 2008). CEAs exploit evidence from randomized studies and, if they adopt appropriate statistical methods, can provide accurate assessments of which interventions are most worthwhile (Gold et al., 1996; O'Hagan et al., 2001; Willan and Briggs, 2006; Glick et al., 2007; Gray et al., 2010). CEAs raise major challenges for the analytical approach as the data tend to have complex structures, with correlated cost and effectiveness end points (Willan et al., 2003; Willan, 2006), hierarchical data (Manca et al., 2005; Pinto et al., 2005), and costs with right-skewed distributions (Manning, 2006). Most CEAs that use individual level data have observations with incomplete information (Noble et al., 2012). Statistical methods have not been developed that can simultaneously address all these issues. Hence studies may fail to provide the unbiased, precise cost effectiveness estimates that decision makers require.

This paper is motivated by CEAs that use data from cluster randomized trials (CRTs), but the approach that we propose addresses three issues of general relevance. The first is raised by the bivariate nature of the outcomes, which implies the need for joint modelling. Here, one end point is highly skewed, but inferences about means are still required on the original scales of measurement. The second is that randomization is at the cluster level, which implies that the data are hierarchical. The third issue, and the focus of this paper, is the presence of missing data.

Approaches have been proposed for jointly modelling costs and health outcomes while acknowledging that individual costs tend to have right-skewed distributions (Thompson and Nixon, 2005). There is a large literature on methods for handling clustered data; see for example Hayes and Moulton (2009), Eldridge and Kerry (2012) and Goldstein (2011). Methods for CEAs alongside CRTs include a ‘two-stage’ non-parametric bootstrap procedure (Flynn and Peters, 2005; Bachmann et al., 2007), bivariate generalized estimating equations with robust standard errors and bivariate normal mixed models estimated by maximum likelihood (Gomes et al., 2012b), or multilevel models estimated with Bayesian Markov chain Monte Carlo methods (Grieve et al., 2010; Bachmann et al., 2007).

However, a problem that is common to CEAs is that of non-response or missing data, e.g. because of incomplete patient medical records, non-response to patient questionnaires or because patients or study sites withdraw from the study; if one component of resource use or health-related quality of life is missing then the overall cost or outcome per patient will also be missing. Multiple-imputation (MI) approaches have been proposed for simple CEA settings (Blough et al., 2009; Briggs et al., 2003; Ramsey et al., 2005). However, a recent review of published CEAs based on clinical trials found that complete-case (CC) analysis was the most popular approach, and its use has increased over time (Noble et al., 2012).

In CEAs that use CRT data, an extension to a previous systematic review (Gomes et al., 2012b) found that 38 out of 62 (over 60%) studies reported missing data, of which 27 (over 70%) presented only CC analyses. Of those 11 studies that attempted to address missing data, eight used mean imputation and last observation carried forward techniques, and three studies used MI. The studies that used MI ignored the hierarchical structure of the CRT data. Such single-level MI approaches may be inadequate because, for example, the prevalence of missing end point data may differ according to individual and cluster level characteristics (e.g. cluster size).

Although multilevel MI approaches have been proposed for handling missing data with a clustered structure (Carpenter and Goldstein, 2004; Schafer and Yucel, 2002), no previous study has developed methods for handling missing hierarchical data in complex settings, such as those seen in CEAs that use cluster trials. Further research is required to provide appropriate methods for addressing missing data in CEAs that use data from more complex study designs such as CRTs, and to contrast these approaches with the single-level MI and CC analysis approaches that are traditionally taken in CEAs.

The aim of this paper is to develop and illustrate an overall approach to analysing studies which have bivariate outcomes with one highly skewed end point, a clustered structure and missing data. We do this by using MI within a frequentist paradigm. At the same time, we explore the implications of failing to acknowledge relevant features of the set-up in the handling of the missing data: in particular the potential consequences of ignoring clustering in the imputation step, and departures from normality. We also compare the results that we obtain with those from an analysis restricted to those individuals with complete data.

In Section 2, we introduce our case-study, a typical CEA that uses CRT data. In Section 3, we develop a simple modelling framework for a clustered bivariate pair of continuous outcomes, one of which has a potentially non-normal distribution. Section 4 considers in detail the handling of missing data in this set-up and explores the use of multilevel MI for this problem. In Section 5, we compare the results that were obtained from a range of alternative strategies. We close with a discussion in Section 6.

2. Motivating example: the ‘Psychological interventions for post-natal depression trial and economic evaluation’ study

The ‘Psychological interventions for post-natal depression trial and economic evaluation’ study (which is known as the ‘PoNDER’ study) was a CRT evaluating an intervention for preventing postnatal depression (Morrell et al., 2009). It included 2659 patients who attended 101 primary care providers in the UK (general practices). The intervention comprised health visitor training to identify and manage patients with postnatal depression. Clusters were randomly allocated in a ratio 2:1 to intervention (treatment) or to receive usual care (control). This resulted in 63 clusters randomized to the intervention arm and 38 clusters allocated to standard treatment. As is common, this CRT had an unbalanced design; the number of patients who were recruited in each cluster varied widely (from 1 to 101 in the control group and from 1 to 81 in the treatment group).

Patients were followed up for 18 months, with costs (in pounds sterling) and health-related quality of life recorded at 6-monthly intervals. This paper considers costs and health-related quality of life reported at 6 months. The health-related quality-of-life data were used to adjust life-years and to present quality-adjusted life-years (QALYs) gained over 6 months from baseline. Intracluster correlation coefficients (ICCs) were moderate for QALYs ($\mathrm{ICC}_q = 0.04$) but high for costs ($\mathrm{ICC}_c = 0.17$). Whereas QALYs were approximately normally distributed, costs were right skewed. Fig. 1 shows the empirical distribution of the observed costs, with log-normal and gamma distributions superimposed.

Figure 1. Histograms of costs at 6 months for all women with available data, with log-normal and gamma distributions superimposed

Baseline measurements were collected from mothers at 6 weeks post natally, for variables that were expected to be prognostic for either cost or effectiveness end points. One cluster in the control group withdrew from the study. Table 1 reports the percentage of observations with missing data for the remaining 100 clusters, by treatment group. For each baseline variable, fewer than 2.5% of participants had missing data, but a relatively high proportion of individuals had missing data for the cost end point; 30 clusters were without any observed cost data (14 in the control arm).

Table 1. Description of missing data in the case-study, by treatment group

                                                        Control group (n = 911)    Intervention group (n = 1730)
Variable                                Type            Missing n     %            Missing n     %
Outcome variables
  Cost                                  Continuous      402           41.1         460           26.6
  QALY                                  Continuous      39            4.3          59            3.4
Baseline variables
  Edinburgh postnatal depression scale  Continuous      0             0.0          0             0.0
  Ethnicity                             Binary          0             0.0          0             0.0
  Economic status                       Binary          0             0.0          0             0.0
  Age                                   Continuous      1             0.1          0             0.0
  English as first language             Binary          0             0.0          0             0.0
  Living alone                          Binary          9             1.0          7             0.4
  Partner's economic status             Ordinal         7             0.8          10            0.6
  Benefits                              Binary          19            2.1          38            2.2
  History of depression                 Binary          5             0.5          6             0.3
  Any major life events                 Binary          8             0.9          9             0.5
  Relationship with baby                Ordinal         12            1.3          20            1.2

The original CEA presents incremental QALYs and costs as the differences in means between the treatment and control groups, based on CCs (70 clusters; 1732 women), and with no adjustment for baseline covariates (Morrell et al., 2009b). Cost effectiveness is then reported as the incremental net monetary benefit (INB); see equation (6) in Section 3 for a definition.

To simplify the exposition, we restrict our analyses to those individuals with positive costs, by excluding 18 observations with zero costs (15 in the treatment group). See Section 6 for further discussion.

3. Substantive model

We are principally concerned with estimating the linear additive effect of treatment on population mean costs and health outcomes with no additional covariate adjustment (Jones and Rice, 2011; National Institute for Health and Clinical Excellence, 2008). Statistical methods for performing overall cost effectiveness analyses need to allow for the correlation between costs and health outcomes (O'Hagan and Stevens, 2001), and for the fact that cost data are usually very skewed (Briggs and Gray, 1998).

Because of the simplicity of our set-up, we can model the data from the two treatment groups entirely separately, and then make the comparison. So, in what follows, we show the development for one treatment group; exactly the same arguments apply to the other.

Let $c_{ij}$ and $q_{ij}$ be the cost and QALY outcomes respectively from the $j$th patient in cluster $i$ of a two-armed CEA alongside a CRT. We assume that the observations from different clusters are independent.

First, we introduce bivariate normal latent variables $(u_i, v_i)$ to represent possible cluster effects for cost and QALYs respectively, with

$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} \sim N\!\left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix},\; \begin{pmatrix} \sigma_u^2 & \rho\,\sigma_u\sigma_v \\ \rho\,\sigma_u\sigma_v & \sigma_v^2 \end{pmatrix} \right\}, \qquad (1)$$

where $\sigma_u^2$, $\sigma_v^2$ and $\rho$ are the variances and correlation of the two latent variables respectively.

We now build the bivariate substantive model on the expectations of the two outcomes, $\mu^{c}_{ij} = E(c_{ij} \mid u_i)$ and $\mu^{q}_{ij} = E(q_{ij} \mid c_{ij}, v_i)$, defined conditionally on the two cluster effects, first for cost,

$$\mu^{c}_{ij} = \beta_1 + u_i, \qquad (2)$$

with $\beta_1$ the mean appropriate for the first treatment group, and then for QALYs, conditional on the costs and cluster effects,

$$\mu^{q}_{ij} = \gamma_1 + \alpha\, c_{ij} + v_i, \qquad (3)$$

with $\gamma_1$ the intercept for $q_{ij}$ for the first treatment group, and $\alpha$ the corresponding regression coefficient for the costs.
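
To make the structure of equations (1)-(3) concrete, the following R sketch simulates clustered cost and QALY pairs for one treatment group under a normal cost distribution; all parameter values are invented for illustration, not estimates from the PoNDER data.

# Simulate one treatment group from the substantive model, equations (1)-(3)
set.seed(101)
n_clusters <- 40; n_per_cluster <- 25
sigma_u <- 80; sigma_v <- 0.005; rho <- 0.2      # cluster level parameters
beta1 <- 270; gamma1 <- 0.026; alpha <- 1e-5     # fixed effects
sigma_c <- 200; sigma_q <- 0.015                 # individual level SDs

# correlated cluster effects (u_i, v_i) via the Cholesky factor of equation (1)
Sigma <- matrix(c(sigma_u^2, rho * sigma_u * sigma_v,
                  rho * sigma_u * sigma_v, sigma_v^2), 2, 2)
uv <- matrix(rnorm(2 * n_clusters), n_clusters, 2) %*% chol(Sigma)

cluster <- rep(seq_len(n_clusters), each = n_per_cluster)
cost <- beta1 + uv[cluster, 1] +
        rnorm(length(cluster), 0, sigma_c)       # equation (2) plus noise
qaly <- gamma1 + alpha * cost + uv[cluster, 2] +
        rnorm(length(cluster), 0, sigma_q)       # equation (3) plus noise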

We now introduce distributions for $c_{ij}$ and $q_{ij}$, conditional on the cluster effects. It is assumed that the conditional distribution of $q_{ij}$ given $c_{ij}$ and $v_i$ is normal, with variance $\sigma_q^2$. We consider three possible distributions for $c_{ij}$: normal, log-normal and gamma. Other distributions could of course also be considered if thought appropriate. The choice of the normal distribution is straightforward; the mean is given by equation (2), with some variance $\sigma_c^2$ say. The gamma alternative is introduced with a parameterization that implies that the coefficient of variation, $\surd\eta$ say, is constant across clusters, in contrast with the normal distribution, which implies constant variance. For $\mu^{c}_{ij}$, the conditional mean as given in equation (2), the chosen gamma density can then be written

$$f(c_{ij} \mid u_i) = \frac{c_{ij}^{1/\eta - 1}}{\Gamma(1/\eta)\,(\eta\,\mu^{c}_{ij})^{1/\eta}} \exp\!\left( -\frac{c_{ij}}{\eta\,\mu^{c}_{ij}} \right). \qquad (4)$$

To maintain comparability with the gamma distribution, we introduce the log-normal distribution with a somewhat unusual parameterization, in which the coefficient of variation is again constant across clusters. This gives

$$\log(c_{ij}) \mid u_i \sim N\!\left\{ \log(\mu^{c}_{ij}) - \tfrac{1}{2}\log(1 + \eta),\; \log(1 + \eta) \right\}. \qquad (5)$$
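
As a quick numerical check of these two parameterizations (an R sketch with arbitrary values of $\mu$ and $\eta$), both distributions should reproduce a mean of $\mu$ and a squared coefficient of variation of $\eta$:

mu <- 270; eta <- 0.5                        # illustrative mean cost and squared CV
x <- rgamma(1e6, shape = 1/eta, scale = eta * mu)   # gamma, equation (4)
c(mean(x), var(x) / mean(x)^2)               # approximately mu and eta

s2 <- log(1 + eta)                           # log-scale variance, equation (5)
y <- rlnorm(1e6, meanlog = log(mu) - s2/2, sdlog = sqrt(s2))
c(mean(y), var(y) / mean(y)^2)               # again approximately mu and eta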

We assume that, conditional on the cluster effects, $(c_{ij}, q_{ij})$ is independent of $(c_{ij'}, q_{ij'})$ for $j \neq j'$, and so the required joint density, still conditional on the cluster effects, can be obtained from the product of the densities for $c_{ij}$ and $q_{ij} \mid c_{ij}$.

Finally, to obtain the marginal likelihood for the data for one treatment group, it is then necessary to combine this joint density over all relevant patients, and then to integrate over the distribution of the cluster effects. This needs to be done numerically. There are several approaches for this; here we have used adaptive Gaussian quadrature as implemented in SAS procedure NLMIXED. We provide sample code for this in Appendix A.

Using conventional likelihood procedures we can then obtain estimated means for cost and QALYs ($\hat\mu_{ck}$ and $\hat\mu_{qk}$ say) for treatment groups $k = 1, 2$ respectively, together with their estimated variances and covariances. Note that the separate modelling of the two treatment groups implies that estimates are independent between groups. The increments between the two groups are then estimated as $\hat\delta_c = \hat\mu_{c2} - \hat\mu_{c1}$ and $\hat\delta_q = \hat\mu_{q2} - \hat\mu_{q1}$.

The relative cost effectiveness of treatment 2 against treatment 1 can be summarized by the incremental net monetary benefit, defined as

$$\mathrm{INB}(\lambda) = \lambda\,\hat\delta_q - \hat\delta_c, \qquad (6)$$

for $\lambda$ a given threshold willingness to pay for a unit of health gain. Its standard error can be calculated from the estimated variances and covariances of $\hat\delta_c$ and $\hat\delta_q$ in the usual way.
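
As a small illustration of this calculation, the following R sketch (the function name inb and all inputs are hypothetical, and the numbers are purely illustrative) computes $\mathrm{INB}(\lambda)$ and its standard error from the increments, their estimated variances and their covariance:

# INB(lambda) = lambda * delta_q - delta_c; vq, vc and cqc are the variance of
# delta_q, the variance of delta_c and their covariance, each obtained by
# summing the independent contributions from the two treatment groups
inb <- function(lambda, delta_q, delta_c, vq, vc, cqc) {
  est <- lambda * delta_q - delta_c
  se  <- sqrt(lambda^2 * vq + vc - 2 * lambda * cqc)
  c(INB = est, SE = se)
}
inb(20000, delta_q = 0.004, delta_c = -15.5,
    vq = 0.001^2, vc = 26.4^2, cqc = 0)      # invented inputs for illustration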

4. Missing data

4.1. Handling the missing data

It is well known that missing data can be the source of selection bias, and we can rarely construct analyses in which we can be confident that such bias has been eliminated. Rather, we use what information is available both in the data and in the substantive setting in an attempt to reduce potential bias. Following this, carefully targeted sensitivity analysis can play a valuable role. There are many ways in which analyses can attempt to deal with missing data, and in which sensitivity analysis can be constructed; see for example Little and Rubin (2002) and Molenberghs and Kenward (2007).

One important source of information that can be used potentially to reduce bias is contained in observed variables that are associated both with the outcome and with the missing value process itself. If these variables are not part of the substantive model, they are termed auxiliary variables in the missing value context.

There are several potential auxiliary variables in the current setting, and we shall use an approach which can incorporate them. To explain the intended role of these variables we need to introduce some definitions due to Rubin (1976). We use these in a fairly loose way here; more formal expositions can be found in Little and Rubin (2002) and Molenberghs and Kenward (2007), and in the references given there. One important distinction here from Rubin's original definitions is our use of these terms in a frequentist framework, which implies rather stronger conditions than Rubin's likelihood-based definitions.

Let $y_{ij} = (c_{ij}, q_{ij})$ be the pair of observations from subject $(i,j)$ and define the random variable $r_{ij}$ to take the value 1 if $y_{ij}$ is observed and 0 if missing. We say that the missing data are missing completely at random if $y_{ij}$ and $r_{ij}$ are independent. By contrast, the data are missing at random if there are observed variables, contained in $X$ say, such that $y_{ij}$ and $r_{ij}$ are conditionally independent given $X$. It can be seen that missingness completely at random implies missingness at random (MAR). We can reject the missingness completely at random assumption in favour of MAR if we see associations between observed variables and $r_{ij}$, which is of course completely observed.

If neither missingness completely at random nor MAR holds, we say that the missing data are missing not at random. It is usually impossible to rule out missingness not at random in practice from the data at hand, because this depends critically on the existence of associations between unobserved variables and the $r_{ij}$, which the observed data cannot exclude. It is this dependence between $y_{ij}$ and $r_{ij}$ that is the potential source of bias.

It is therefore usually sensible to try at least to reduce this dependence by identifying potential auxiliary variables from among those observed, and this will form the first step in handling the missing data. This will be done separately for the two outcomes, because it is entirely plausible that very different missing value mechanisms will operate with the two outcomes. We make the simplifying assumption that our auxiliary variables are completely observed. This is not strictly necessary, and in principle the approach that is used here can be extended to the situation when they are not, but for our present purposes the restriction to complete variables permits a simpler exposition.

4.2. Multiple imputation

Having identified potential auxiliary variables, it is necessary to incorporate them in the analysis. If these variables were part of the substantive model, we could simply include them and so condition on them, and in this way reduce or remove the unwanted dependence between $y_{ij}$ and $r_{ij}$. But these auxiliary variables are not in the substantive model, because the target of inference is the overall population mean treatment effect, so this route is not available to us. Alternatively, we could construct an overall joint model in which these auxiliary variables are included as additional outcome variables. In the current setting this is awkward, although not infeasible, because of the clustered structure.

We instead choose to use MI (Rubin, 1978; Kenward and Carpenter, 2007). This has the advantage of retaining the original substantive model, adding to this an imputation model. This is essentially determined by the conditional distribution of the missing data given the observed data, which we allow to differ between outcomes and treatment arms. In the present setting, in which we are only considering missing data in the outcome, the conditional model follows from the substantive model.

Let $\mathbf{x}_{ij}$ denote the vector of all auxiliary variables, including individual and cluster level variables. The single-level imputation model (SL), which ignores clustering, can be written as

$$c_{ij} = \beta_0^{(1)} + \mathbf{x}_{ij}^{\mathrm T}\boldsymbol{\beta}^{(1)} + e_{ij}^{(1)},$$
$$q_{ij} = \beta_0^{(2)} + \mathbf{x}_{ij}^{\mathrm T}\boldsymbol{\beta}^{(2)} + e_{ij}^{(2)}, \qquad \bigl( e_{ij}^{(1)}, e_{ij}^{(2)} \bigr)^{\mathrm T} \sim N(\mathbf{0}, \Omega_e),$$

where $\boldsymbol{\beta}^{(l)}$ is the vector of regression coefficients corresponding to the auxiliary variables, $l = 1, 2$ for cost and QALYs respectively, and $\Omega_e$ is the individual level variance-covariance matrix. With single-level MI, the imputed values are drawn from the conditional distribution of the missing observations given the observed data, ignoring any dependence between observations within a cluster that is not explained by the cluster level auxiliary variables included in the model.

However, the observations from within one cluster are mutually dependent, and the conditional distribution of a missing value involves all the other observed values in the same cluster. Therefore, the single-level imputation model does not properly represent the conditional distribution of the missing data given the observed data.

Instead, the appropriate imputation model acknowledges the clustered structure of the data by including a random cluster effect (this is also called multilevel imputation, denoted ML):

$$c_{ij} = \beta_0^{(1)} + \mathbf{x}_{ij}^{\mathrm T}\boldsymbol{\beta}^{(1)} + u_i^{(1)} + e_{ij}^{(1)},$$
$$q_{ij} = \beta_0^{(2)} + \mathbf{x}_{ij}^{\mathrm T}\boldsymbol{\beta}^{(2)} + u_i^{(2)} + e_{ij}^{(2)}, \qquad \bigl( u_i^{(1)}, u_i^{(2)} \bigr)^{\mathrm T} \sim N(\mathbf{0}, \Omega_u),$$

where $\Omega_u$ is the cluster level variance-covariance matrix and the individual level residuals $e_{ij}^{(l)}$ are assumed normally distributed as before and independent of the cluster random effects $u_i^{(l)}$.
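
To see what the random cluster effect changes in practice, consider a single continuous outcome with no covariates. Under the multilevel model, the conditional distribution of a missing value in cluster $i$ is centred on a shrunken cluster mean; the following minimal R sketch (invented parameter values) makes this explicit:

# Conditional distribution of a missing value in cluster i under the ML model:
# given n_i observed values with mean ybar_i, the cluster effect has posterior
# mean w * (ybar_i - beta0), with w = sigma_u^2 / (sigma_u^2 + sigma_e^2 / n_i)
conditional_imputation <- function(ybar_i, n_i, beta0, sigma_u, sigma_e) {
  w <- sigma_u^2 / (sigma_u^2 + sigma_e^2 / n_i)   # shrinkage weight
  mean_mis <- beta0 + w * (ybar_i - beta0)         # conditional mean
  var_mis  <- sigma_u^2 * (1 - w) + sigma_e^2      # conditional variance
  rnorm(1, mean_mis, sqrt(var_mis))
}
# The single-level model corresponds to w = 0: every missing value is drawn
# around the overall mean beta0, regardless of its cluster's observed data
conditional_imputation(ybar_i = 350, n_i = 12, beta0 = 270,
                       sigma_u = 80, sigma_e = 200)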

Given the substantive and imputation models, conventional MI procedures can be followed. These are set out in detail for the single-level MI procedures in many references, including Little and Rubin (2002) and Molenberghs and Kenward (2007). The overall multilevel MI procedure is as follows.

  • Step 1: the imputation model is fitted to the observed data and Bayesian draws are taken from the posterior distribution of the model parameters $\boldsymbol{\beta}^{(l)}$, $\Omega_u$ and $\Omega_e$.
  • Step 2: the cluster effects $u_i^{(1)}$ and $u_i^{(2)}$ are sampled from their posterior distributions, conditional on the parameters drawn in step 1 and the observed data.
  • Step 3: the missing data are imputed from the imputation model, using the draws from steps 1 and 2.
  • Step 4: the substantive model is fitted (here by using maximum likelihood) to the data set that has been completed by using the imputations from step 3, producing parameter estimates and their estimated covariance matrix.
  • Step 5: steps 1-4 are repeated a fixed number, K say, of times.
  • Step 6: the K sets of parameter and covariance estimates from step 4 are then combined by using Rubin's formulae (Rubin, 1987) to produce a single MI estimate of the substantive model parameters and associated covariance matrix; a sketch of this combination is given after this list.
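
For step 6, Rubin's formulae pool the K completed-data estimates; the following is a minimal R sketch for a scalar parameter, with invented inputs:

# Rubin's rules for a scalar parameter: `est` holds the K completed-data
# point estimates and `var_within` their squared standard errors
rubin_combine <- function(est, var_within) {
  K <- length(est)
  qbar <- mean(est)                            # pooled point estimate
  W <- mean(var_within)                        # within-imputation variance
  B <- var(est)                                # between-imputation variance
  Tvar <- W + (1 + 1/K) * B                    # total variance (Rubin, 1987)
  df <- (K - 1) * (1 + W / ((1 + 1/K) * B))^2  # degrees of freedom for t-based inference
  c(estimate = qbar, se = sqrt(Tvar), df = df)
}
rubin_combine(est = c(82.1, 85.7, 83.9, 86.2, 84.6),   # invented INB estimates
              var_within = rep(37^2, 5))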

Under the MAR assumption, this will produce consistent estimators and, in the absence of auxiliary variables, is asymptotically (as K increases) equivalent to maximum likelihood.

For the current analysis, the whole MI procedure has been carried out separately for the two treatment groups. We need facilities to make the required Bayesian draws from the imputation model, which is bivariate and includes a cluster random effect, as described in steps 1-3 above. This can be done in a variety of MI software packages (see Section 6 for a discussion).

As current implementations assume a multivariate normal distribution for continuous variables, it is recommended that the imputation step is undertaken on the logarithm of the costs, transforming back to the original scale in step 4 above (Schafer, 1997). The approximation that is implied under the log-normal and gamma substantive models is unlikely to be critical.

4.3. Specifying the imputation model for the case-study

To investigate the associations of observed variables with the missingness indicators $r_{ij}^{(l)}$, it is natural to use logistic regression, in this example with and without random cluster effects. This has been done separately here for cost ($l = 1$) and QALYs ($l = 2$), and also for each treatment group ($k = 1, 2$).
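
This screening can be done with standard software; a sketch in R, where the data frame dat, the indicator r_cost and the covariate names are all hypothetical, using glm for the individual level model and glmer from the lme4 package for the version with a random cluster effect:

library(lme4)                        # random-effects logistic regression

# r_cost = 1 if cost is observed, 0 if missing
dat$r_cost <- as.integer(!is.na(dat$cost))

# ignoring clustering
fit_sl <- glm(r_cost ~ epds + ethnicity + econ_status + cluster_size,
              family = binomial, data = dat)

# allowing for clustering through a random cluster intercept
fit_ml <- glmer(r_cost ~ epds + ethnicity + econ_status + cluster_size +
                (1 | cluster), family = binomial, data = dat)
summary(fit_ml)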

In addition to the patient level covariates that were described, we added the cluster level variable cluster size, $n_i$, defined as the number of participants who were randomized in each cluster. Previous studies suggest that cluster size may be associated with costs or health outcomes (Campbell et al., 2000; Omar and Thompson, 2000; Neuhaus and McCulloch, 2011). We also consider that the number of participants who were recruited in each cluster may be associated with missingness. In the ‘PoNDER’ case-study, because clinical protocols were less restrictive in the control than in the treatment group, it was expected that any relationship between the cluster size and the end points would be stronger in the control group.

In the control group, before allowing for clustering, ethnicity, economic status and cluster size were associated with missing costs, but, after including a random effect to allow for clustering, no covariates were associated with missing costs. Cluster size and the depression scale variable epds appeared to be associated with unobserved QALYs. In addition, epds and cluster size $n_i$ were associated with the value of costs, and economic status, ethnicity and epds with the value of QALYs. In the treatment group, cluster size was associated with missing costs at the individual level, whereas, after adjusting for clustering, economic status was predictive of missing costs. Only age was predictive of QALY missingness, both ignoring and accounting for clustering. In addition, epds was associated with the value of both costs and QALYs.

The MI algorithm that is implemented here assumes that all variables included in the model are multivariate normally distributed. We exploit this by choosing the same imputation model for both outcomes, adding all auxiliary variables that appeared to be associated with either end point or its missingness, and modelling the two outcomes simultaneously. The imputation models that were chosen are summarized in Table 2.

Table 2. Single-level (ignoring clustering) and multilevel (accounting for clustering) imputation models used for the cost and QALY end points in the case-study†

Type            Model    Specification
Single level    SL       Individual level auxiliary variables only; clustering ignored
                SL–C     As SL, with the cluster level variable $n_i$ (cluster size) added
Multilevel      ML       Individual level auxiliary variables, with a random cluster effect
                ML–C     As ML, with the cluster level variable $n_i$ (cluster size) added

†Models that included a cluster level auxiliary variable are indicated by ‘C’. The individual level auxiliary variables were selected separately for the control and intervention groups, as described above.

5. Multiple-imputation estimates for the example data set

The imputation models that ignore clustering (models SL and SL–C) were implemented with the ice command in Stata (MI by chained equations), whereas the multilevel imputation models (models ML and ML–C) used the multivariate normal Markov chain Monte Carlo algorithm implemented in the MLwiN mi macros.

For each imputation model in Table 2, we obtained five imputed data sets. (Rubin (1987) showed that the relative efficiency of an estimate based on K, rather than an infinite number of, completed data sets is approximately $(1 + p/K)^{-1}$, where $p$ is the proportion of missing information. So, with 50% missing information, an estimate based on K = 5 completed data sets has a standard deviation that is about 5% wider than that of an estimate based on an infinitely large K.)
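The 5% figure can be verified directly in R:

p <- 0.5; K <- 5          # 50% missing information, K = 5 imputations
sqrt(1 + p / K) - 1       # approximately 0.049, i.e. about 5% wider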

Fig. 2 highlights the effect that accounting for clustering in the MI model can have on the distributions of ‘imputed’ values. It shows imputed cost data for the six clusters in the control arm with the highest number of observations with missing cost data. Fig. 2 contrasts data imputed under the single-level imputation model with data imputed under the multilevel imputation model that included cluster size as an auxiliary variable. The cost distribution appears somewhat less clustered after the single-level imputation than after the multilevel imputation.

Figure 2. Difference in the ‘spread’ of imputed data, depending on whether the imputation model ignored or accounted for clustering (model SL versus model ML): distribution of costs in the six control clusters with the highest number of missing values in the original data set, including observed and imputed data points

The five multiply imputed data sets were each analysed with the three substantive models defined in Section 3, i.e. random cluster effects models with bivariate normal (model N–N), log-normal–normal (model L–N) and gamma–normal (model G–N) distributions. Table 3 reports the MI estimates for mean cost and QALYs by treatment arm and, for comparison, also includes estimates from CCs.

Table 3. Mean (with standard errors in parentheses) costs and QALYs, and estimated correlations between the two end points in each treatment arm, by MI approach and choice of bivariate substantive model

Missing data   Estimate         Control group                  Intervention group
approach                        N–N      L–N      G–N          N–N      L–N      G–N
CC             Mean cost (£)    273.3    286.6    277.2        256.8    258.5    254.6
               Mean QALYs       0.027    0.027    0.027        0.030    0.030    0.030
SL             Mean cost (£)    295.0    299.1    295.0        251.4    253.1    251.3
               Mean QALYs       0.027    0.027    0.027        0.030    0.030    0.030
SL–C           Mean cost (£)    268.1    275.5    270.4        257.2    257.5    255.4
               Mean QALYs       0.026    0.026    0.026        0.030    0.030    0.030
ML             Mean cost (£)    264.6    265.0    262.5        256.9    257.9    255.1
               Mean QALYs       0.026    0.026    0.026        0.030    0.030    0.030
ML–C           Mean cost (£)    270.0    280.6    275.0        262.1    262.5    259.5
               Mean QALYs       0.026    0.026    0.026        0.030    0.030    0.030

Table 3 shows that, as expected, the standard errors for both end points are larger for the control than for the treatment group. It is also clear that ignoring the hierarchical structure of the data in the imputation model results in different point estimates for the mean cost, especially in the control arm. For all approaches, the estimated correlations between cost and QALYs are small.

We use the estimates from Table 3 to obtain incremental costs, QALYs and INBs for a willingness to pay λ of £20 000 per QALY. These are reported in Table 4.

Table 4. Estimated incremental cost, QALYs and INB, according to choice of MI approach and bivariate substantive model†

                           Model    N–N              L–N              G–N
Incremental cost (£), δc   CC       −16.5 (27.9)     −28.1 (25.8)     −22.5 (27.0)
                           SL       −43.6 (19.5)     −46.0 (19.4)     −43.7 (21.2)
                           SL–C     −10.9 (20.9)     −18.0 (19.2)     −15.0 (20.1)
                           ML       −7.7 (34.1)      −7.1 (28.8)      −7.4 (31.8)
                           ML–C     −7.9 (25.2)      −18.1 (26.4)     −15.5 (26.4)
Incremental QALYs, δq      CC       0.003 (0.002)    0.003 (0.002)    0.003 (0.002)
                           SL       0.004 (0.001)    0.004 (0.001)    0.004 (0.001)
                           SL–C     0.004 (0.001)    0.004 (0.001)    0.004 (0.001)
                           ML       0.004 (0.001)    0.004 (0.001)    0.004 (0.001)
                           ML–C     0.004 (0.001)    0.004 (0.001)    0.004 (0.001)
INB (£)                    CC       76.5 (43.7)      81.3 (41.6)      75.3 (43.5)
                           SL       117.0 (34.0)     117.4 (34.9)     117.9 (34.0)
                           SL–C     82.6 (33.8)      94.0 (32.4)      90.8 (33.3)
                           ML       82.7 (42.8)      82.5 (38.6)      82.6 (41.2)
                           ML–C     84.5 (38.0)      96.0 (38.2)      93.0 (38.5)

†Standard errors are given in parentheses.

Table 4 shows that the estimates of incremental cost, incremental QALYs and INB are relatively insensitive to the choice of cost distribution. In addition, for the incremental QALYs, where there are few missing values and the ICCs are low, the estimates and their standard errors are virtually identical following each missing data approach.

However, inferences about the estimated incremental costs and the INBs differ somewhat depending on the approach that is taken to handling the missing data. Firstly, our CC estimates are very close to those reported in Morrell et al. (2009b), which found an estimated incremental QALY of 0.002, with incremental costs of £19.97. These estimates are likely to be biased, as the missingness mechanism is probably not missingness completely at random and the substantive model does not adjust for any covariates. Secondly, single-level MI approaches produce smaller standard errors than those obtained with multilevel MI and CCs. This is because cost has a large ICC and we are considering a between-cluster estimator. As a consequence, there is an increased risk of type I error, regardless of the choice of cost distribution that is used for the substantive model.

Moreover, ignoring informative cluster size in the multilevel imputation model increases the magnitude of the estimated standard errors. This cluster level covariate (cluster size) is associated with cost and with cost missingness, and so excluding it from the imputation model reduces the precision of the estimate, as information is lost. By contrast, including cluster size in the single-level imputation model results in point estimates for the incremental cost which are similar to those following multilevel MI, although estimates for the standard errors are still smaller than the corresponding multilevel MI estimates.

Fig. 3 shows the INB (with 95% confidence interval) at alternative thresholds of willingness to pay for a QALY gained. With the single-level MI, the INB and its 95% confidence interval are positive throughout, indicating that the treatment is cost effective. For the multilevel MI that acknowledges informative cluster size, the 95% confidence intervals around the INB are wider and include zero at realistic thresholds of willingness to pay for a QALY gained. Although, for both approaches, the estimated INB remains positive throughout, the single-level MI approach appears to overstate both the absolute level of the INB and the precision surrounding the estimate.

Figure 3. MI estimates of mean incremental net benefit, with 95% confidence intervals, from model ML–C versus model SL, using the bivariate gamma–normal substantive model

Hence, in the case-study, once a more appropriate approach is taken to handling the missing data, it is less certain that the intervention is cost effective, even though the substantive cost effectiveness conclusion still favours the intervention.

6. Discussion

This paper provides a principled approach to handling missing data with complex structures, exemplified by CEAs that use data from CRTs. The approach proposed follows the general principle that the imputation model should reflect the structure of the analytical model. In the context of cluster trials, just as the substantive model can account for clustering with a random-effects model, so must the imputation model. Moreover, because the analytical models that are typically used in CEAs estimate the linear additive effects of treatment on mean costs and health outcomes without covariate adjustment, MI has particular appeal in this setting. By separating the imputation and substantive models, information on those auxiliary variables, such as baseline patient characteristics, associated with missingness and the end points of interest can and should be used, without the analyst having to modify the substantive model.

Our study highlights that a single-level imputation model can underestimate the uncertainty surrounding the estimates of interest. More generally, Taljaard et al. (2008) showed that MI approaches that ignore clustering can increase type I errors. Another common approach to handling missing data in CRTs is to include cluster as a fixed effect in a single-level imputation model (White et al., 2011; Graham, 2009), but this does not produce an imputation model that properly captures the conditional distribution of the missing data given the observed. Indeed, including cluster as a fixed effect represents the limiting case where ICC tends to 1, and does not reflect the variability of the imputed values. The simulation study by Andridge (2011) found that including cluster as a fixed effect in the imputation model can overestimate the variance of the estimates, especially when ICCs are low, and there are few clusters. Both features are common in our setting; a recent review found that, out of 63 published CEAs alongside cluster trials, 40% had fewer than 15 clusters per treatment arm, and a third reported ICCs of 0.01 or less for health outcomes (Gomes et al., 2012b).

Like most published CEAs that use data from cluster trials, the primary analysis of the ‘PoNDER’ case-study (Morrell et al., 2009b) undertook CC analysis. Our reanalysis suggests that, in this example, the conclusion that the intervention is relatively cost effective remains after adopting single-level or multilevel MI approaches. This is not always so for other examples (Briggs et al., 2003), and the information that is gained about the robustness of the cost effectiveness of the intervention must be of value to policy makers and other stakeholders. In the case-study, when we adopted a single-level MI approach that completely disregarded clustering (model SL), the resultant point estimates and uncertainty intervals differed somewhat from those from multilevel MI, or CC analysis. This single-level MI approach ignored the information from a cluster level covariate, cluster size, which was important in predicting both missingness and the cost end points and had a differential effect on cost according to the randomized treatment.

More generally when cluster level covariates are associated with the missingness patterns and the value of the outcomes to be imputed, then including them even in a single-level imputation model may provide more accurate point estimates than MI or CC approaches that ignore clustering. This is because including covariates that predict dependence on cluster effectively reduces ICC. However, unless such covariates fully explain the between-cluster variance, such single-level MI approaches would still overstate precision. Indeed, in our example, the standard errors for the incremental effects after the single-level MI approaches are smaller than following the multilevel MI. Hence, for future CEAs that use cluster trials we propose imputation models with random effects for clusters, rather than the single-level MI or CC analyses that are currently adopted.

A general challenge in CEA is choosing appropriate statistical models for costs, which tend to have right-skewed distributions. The bivariate models that were developed here use marginal log-likelihoods for one outcome and conditional log-likelihoods for the other, by expressing the relationship between the two responses as a linear regression (see the sample code that is provided in Appendix A). In principle, these models are generalizable to allow mixed distribution log-likelihoods, provided that the conditional likelihood of the dependent outcome is known explicitly and can be optimized. The advantage of this approach is that, by parameterizing the density according to the coefficient of variation, and maximizing the log-likelihood obtained, we avoid log-transforming and retransforming costs in the presence of heteroscedasticity (Manning, 1998; Duan, 1983; Mullahy, 1998; Manning and Mullahy, 2001). We consider three cost models that make alternative distributional assumptions but keep an essentially log-normal imputation model throughout, and we use standard optimization routines to obtain maximum likelihood estimates. Our findings suggest that assuming a different distribution for the imputation versus analytical model appears to have little effect, whereas the choice of whether or not the imputation model accounts for clustering can be important.

A previous barrier to adopting principled MI approaches for hierarchical data was the lack of available software, but this is no longer so. There are now three options for performing multilevel MI based on multivariate normal Markov chain Monte Carlo algorithms: PAN (Schafer and Yucel, 2002) which is available as an R package (R Development Core Team, 2011), the mi macro (Carpenter and Goldstein, 2004) which operates within MLwiN (Rasbash et al., 2011) and can handle up to four hierarchical levels and binary variables, and REALCOM impute macros (Carpenter et al., 2011), which can also handle categorical variables and cluster level variables with missing data.

The approach that is presented in this paper has some limitations. For simplicity, we assume that the missing data mechanism is MAR throughout. This assumption cannot be tested from the data at hand. Commentators have suggested that, in CEA, the probability of non-response for cost and health outcome data may be conditional on the level of the end points themselves (Briggs et al., 2003; Oostenbrink and Al, 2005). For example, in our case-study, it may be possible that women with lower health-related quality of life are more likely to have missing outcomes, so the data may be missing not at random. As the intervention has been found to be cost effective across all the analyses that were considered here, it is unlikely that small departures from MAR would affect inferences. However, it is not clear how large departures from MAR have to be to invalidate the results of an MAR-based analysis effectively. The data collection methods that were used in the case-study make a strong missingness not at random mechanism unlikely. Nevertheless, there are examples in the literature where results have been shown to be sensitive to the assumed missing data mechanism (Kaambwa et al., 2012). A sensitivity analysis is therefore recommended, and MI provides a flexible and convenient route for investigating the influence that alternative missingness not at random mechanisms have on the conclusions (e.g. Carpenter et al. (2007)). In principle standard procedures should apply without much modification, though implementation would require extensions to the current methodology.

An additional advantage of MI, which this case-study could not exploit, is that the imputation model may include post-randomization variables that are associated with missingness and end points, which should not be included in the substantive model.

A further concern is that the imputation and analytical models may make incorrect distributional assumptions. Simulation studies by Schafer (1997) showed that MI can be fairly robust to model misspecification, but those simulation settings did not include multilevel structures. Yucel and Demirtas (2010) recently investigated the effect of misspecifying the multilevel imputation model, focusing on violations of the distributional assumptions for the random effects. They found that, when the imputation model has sufficient auxiliary variables, inferences are insensitive to non-normal random effects, unless the rates of missingness are very high or the sample size is small. They obtained similar results when the assumption that level 1 residuals are normally distributed was violated.

Although we propose a general approach to handling missing data in cluster trials, it is illustrated through a single case-study which cannot represent all the circumstances that are faced by CEAs that use CRTs. There may be circumstances when the data display structures that are quite different from those considered here, e.g. a high proportion with zero costs (Mullahy, 1998), QALYs with highly irregular distributions (Basu and Manca, 2012) or many auxiliary variables available.

This paper suggests further extensions. Here, we combine multilevel MI with a multilevel model estimated by maximum likelihood, but there may be circumstances where it would be advantageous to combine multilevel MI with multilevel modelling estimated by Bayesian Markov chain Monte Carlo methods (Lambert et al., 2005), e.g. when synthesizing evidence across multiple sources (Welton et al., 2008) or indeed adopt a fully Bayesian approach to handling the missingness and specifying the analytical models (Mason et al., 2012).

Future simulation studies could be useful in contrasting the relative performance of the alternative approaches across a broad range of settings including those where there are a high proportion of observations with zero costs, health outcomes with irregular distributions and few clusters. Clearly, ignoring clustering in the imputation model will have less effect as ICC decreases. One way of reducing the outstanding variation at the cluster level within a single-level imputation model is to introduce more cluster level covariates. Further work is needed to assess under what circumstances this simple MI approach would provide reliable inferences.

Finally, it would also be useful to extend the approaches to handling missing data to other settings with hierarchical data. These could include trials with repeated measures over time, studies with a high proportion of zero costs, censored costs or non-randomized studies where covariate adjustment is required.


Acknowledgements

We are grateful to Jane Morrell (PI) and Simon Dixon for permission to use, and for providing access to, the PoNDER data. We thank James Carpenter, Simon Thompson, Richard Nixon, John Cairns, Manuel Gomes and Edmond Ng for helpful discussions. KDO was supported by a National Institute for Health Research Research Methods Fellowship while at Queen Mary University of London, and RG was partly funded by the UK Medical Research Council.

Appendix A: Implementation in SAS

We have developed a method that allows us to exploit the optimization of general likelihood functions available in SAS procedure NLMIXED. Briefly, we duplicated the data and created an indicator variable for the first copy of the data, flag=1. We then used an ‘if’ statement indicating that we wished to estimate cost parameters if flag=1.

With this method, we could use marginal expressions of the corresponding log-likelihoods to estimate the parameters for costs using in turn a normal, log-normal or gamma log-likelihood, the last two parameterized by the coefficient of variation. We use Gauss–Hermite quadrature, with 70 quadrature points, and the Newton–Raphson maximization technique to obtain the maximum likelihood estimates. As likelihood maximization is sensitive to the initial parameter values chosen in the NLMIXED model, we ran this twice, using different initial values, to ensure that optimization had achieved convergence.

Here is some sample SAS code:

proc nlmixed data=ponder2copies method=gauss qpoints=70 cov corr;
   title "Control group bivariate lognormal-normal with 2 cluster effects";
   where group=0;
   x=cost; y=qalygain;
   /* starting values; cv is the squared coefficient of variation of costs,
      and its initial value of 3 here is illustrative */
   parms b0=268 c0=0.27 cv=3 lsyx=7 lnsc=8 lnse=2 r=0.01;
   varyx=exp(lsyx);                     /* conditional variance of y given x */
   mux=b0+u1;                           /* conditional mean cost, equation (2) */
   muyx=c0+(varyx/(cv*mux**2))*x+u2;    /* conditional mean QALY given cost, equation (3) */
   var=log(cv+1);                       /* log-scale variance, equation (5) */
   mu=log(mux)-(1/2)*var;               /* log-scale mean, equation (5) */
   if (flag=1) then                     /* first copy of the data: lognormal cost log-likelihood */
      ll=(-1/2)*log(2*constant('pi')*var)-log(x)-(1/(2*var))*(log(x)-mu)**2;
   else                                 /* second copy: normal QALY log-likelihood given cost */
      ll=(-1/2)*log(2*constant('pi')*varyx)-(y-muyx)**2/(2*varyx);
   if (flag=1) then z=x; else z=y;
   model z ~ general(ll);
   random u1 u2 ~ normal([0,0],[exp(lnsc),r,exp(lnse)]) subject=cluster;
   estimate 'my' c0+(varyx/(cv*b0**2))*b0;   /* marginal mean QALY */
run;