Keywords:

  • Bias analysis;
  • Cox model;
  • Omitted covariates;
  • Sensitivity analysis;
  • Survival analysis;
  • Unmeasured confounding

Abstract

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. Bias Formulae
  5. 3. Bias Analysis
  6. 4. Sensitivity Analysis
  7. 5. Real Examples for Sensitivity Analysis
  8. 6. Discussion
  9. 7. Supplementary Materials
  10. Acknowledgements
  11. References
  12. Supporting Information

Summary

Omission of relevant covariates can lead to bias when estimating treatment or exposure effects from survival data in both randomized controlled trials and observational studies. This paper presents a general approach to assessing bias when covariates are omitted from the Cox model. The proposed method is applicable to both randomized and non-randomized studies. We distinguish between the effects of three possible sources of bias: omission of a balanced covariate, data censoring and unmeasured confounding. Asymptotic formulae for determining the bias are derived from the large sample properties of the maximum likelihood estimator. A simulation study is used to demonstrate the validity of the bias formulae and to characterize the influence of the different sources of bias. It is shown that the bias converges to fixed limits as the effect of the omitted covariate increases, irrespective of the degree of confounding. The bias formulae are used as the basis for developing a new method of sensitivity analysis to assess the impact of omitted covariates on estimates of treatment or exposure effects. In simulation studies, the proposed method gave unbiased treatment estimates and confidence intervals with good coverage when the true sensitivity parameters were known. We describe application of the method to a randomized controlled trial and a non-randomized study.

1. Introduction

Treatment or exposure effects are commonly estimated from survival or other time-to-event data using the Cox model. The gold standard design for conducting such evaluations is the randomized controlled trial because randomization acts to balance measured and unmeasured confounders. Although it is common for researchers to present unadjusted analyses, it is recommended to adjust proportional hazards models for all measured covariates in randomized studies to maximise power to detect treatment effects (Hernandez, Eijkemans, and Steyerberg, 2006). Gail, Wieand, and Piantadosi (1984) derived asymptotic formulae for the bias in estimates of treatment effects when balanced covariates are omitted from the Cox model. It was shown that when censoring is moderate, the Cox model yielded more biased estimates of treatment effect than analysis with the exponential model.

In practice, randomized experiments may be difficult to conduct for reasons of cost, logistics or ethics (Black, 1996). The increasing availability of electronic medical record databases and population-based studies is creating new opportunities for using observational data to assess the effect of medical treatments and exposures, Ghani et al., 2001;, 2009. A major challenge in using clinical databases in this way is addressing the potential bias introduced by unmeasured differences between the treatment groups (Klungel et al., 2004). Lin, Psaty, and Kronmal (1998) presented approximate formulae for the bias due to omission of a binary or continuous confounder when estimating treatment effects from censored survival time data using the Cox model. The bias formulae were used as the basis for a method of conducting sensitivity analysis to assess how the point and interval estimates of the treatment effect vary under a range of assumptions about the unmeasured confounder. The idea behind this approach is that the plausibility of the estimated treatment effects will increase if the inferences are insensitive over a wide range of relevant scenarios.

In this paper, we develop a general framework for estimating bias and conducting sensitivity analysis when covariates are omitted from the Cox model. Formulating the problem more broadly than previous work, we consider the combined influence of three different sources of bias: (1) bias due to omitting a balanced covariate; (2) bias due to censoring; (3) bias due to the missing covariate being a confounder. The proposed approach is applicable to both randomized trials and observational studies, and provides explicit formulae for arbitrary distributions of measured and unmeasured confounders. We consider the general case in which the censoring distribution can depend on treatment or other covariates. The treatment variable can be either a binary or continuous exposure.

The paper is organized as follows. Asymptotic bias formulae, derived from the large sample properties of the partial maximum likelihood estimators, are presented in Section 2. Simulation studies conducted to investigate the accuracy of the bias formulae and to characterize the impact of the different sources of bias are presented in Section 3. Section 4 discusses how the bias formulae can be used to develop a new method of sensitivity analysis for treatment effects in proportional hazards models. The method is applied to data from a randomized controlled trial and a non-randomized study in Section 5.

2. Bias Formulae

We denote random variables by upper case letters and their values by lower case letters. Suppose inline image are K measured covariates with joint distribution inline image, and inline image are q unmeasured covariates with conditional joint distribution inline image. Let T and inline image represent the true event/failure time and possible censoring time respectively. We assume failure and censoring times are independent conditional on x (i.e., inline image). We observe inline image, where inline image, and inline image if inline image and 0 otherwise. The true hazard is assumed to be

  • display math(1)

where inline image is the baseline hazard function and inline image and inline image are coefficients for X and C, respectively. But since C is omitted, one is forced to fit the model

  • display math(2)

where inline image are the coefficients when C is missing. Let inline image be inline image independent replicates of inline image. Then the average partial log-likelihood based on (2) is

  • display math(3)

where inline image if inline image and 0 otherwise. It is shown in Web Appendix A that as inline image, the score function inline image has the limit

  • display math(4)

for inline image, where inline image inline image is the mean over the uncensored subjects, inline image is under the density inline image and inline image is the survival function of censoring time conditional on X. Inclusion of inline image allows the censoring distribution to depend on covariates.

The system of equations (4), inline image, relates inline image and inline image, and the asymptotic biases inline image can therefore be evaluated from it. The first-order Taylor series approximation is

  • display math(5)

2.1. The Distributions of Uncensored Subjects

Let

  • display math

be the uncensoring probability conditional on x and c, where inline image is the density of model (1) and inline image is the survival function of censoring time.

The density of the observed event times is then given by

  • display math

The mean of inline image for uncensored subjects is

  • display math(6)

2.2.

Lin et al. (1998) proposed bias formulae for survival analysis with unmeasured confounders based on the assumption of rare events (small inline image) or small inline image. For binary x, the proposed bias approximation is

  • display math(7)

The simulation of Lin et al. (1998) showed that the formulae in (7) are good approximations when inline image was generated from the uniform inline image distribution and the censoring percentage was 90%.

Using the assumption of rare events and the simulation settings in Lin et al. (1998), Web Appendix B shows that Equation (4) reduces to a simple equation of inline image and inline image:

  • display math(8)

which leads to the formulae (7) when inline image and inline image or inline image. Equation (8) therefore provides a general extension of the results in Lin et al. (1998) to arbitrary distributions of X and C.

3. Bias Analysis

3.1. Bias Analysis for a Binary Treatment with a Single Omitted Covariate

We now show the asymptotic formula for the bias in the important special case of a single missing covariate C and a binary exposure variable X taking values 1 or 0 with probabilities p and inline image, respectively.

Equation (4) leads to (see proof in Web Appendix C)

  • display math(9)

with

  • display math

where the expectations inline image and inline image are under inline image and

  • display math

respectively, and inline image is the ratio of uncensoring rates between control and treatment groups.

From (9), it can be seen that the relation between inline image and inline image mainly depends on three factors (corresponding to the three sources of bias): the effect of the missing covariate, inline image; censoring mechanism, inline image, inline image and inline image; and the ratio of conditional expectations, inline image. The latter ratio represents how much the density of C varies between inline image and inline image and, hence, measures the extent to which C is a confounder.

The bias is also affected by the cumulative baseline hazard function inline image. But if times are not censored, inline image is an exponential variable with the rate inline image and (9) reduces to

  • display math(10)

where inline image. As a result, the bias is independent of the form of inline image in the absence of censorship.

When inline image, C is not a confounder. In this case, Equation (10) shows that inline image and, consequently, the MLE of the Cox model is still biased even if C is a balanced covariate. Bretagnolle and Huber-Carol (1985) studied the bias in this case and showed that the estimated effect is biased toward zero as inline image increases. This is because the event times with inline image tend to zero as inline image and tend to inline image as inline image. Consequently the subjects with inline image cannot provide information about inline image in the limiting case. However, the subjects with inline image do still supply information about inline image and hence the limit of inline image as inline image is not zero for binary C. An illustration of this explanation is given in Web Figure 1.

Following (5), the first-order Taylor series approximation is

  • display math(11)

3.2. Accuracy of Asymptotic Formulae and Taylor Series Approximations

Figure 1 compares the asymptotic biases, the simulated biases and the Taylor series approximations under the influence of different sources of bias. We generated 10,000 values of x from inline image. The confounder C was generated from inline image for the binary confounder, and from inline image for the normal confounder. The event times t were generated from model (1) with inline image, inline image and inline image taking a sequence of 100 values from inline image to 10. For the censored cases, we let inline image with inline image. The observed times were given by inline image.

Figure 1. Comparison of simulated biases, asymptotic biases and first-order Taylor series approximations for different types of omitted covariate and censorship. Since inline image is the asymptotic value of the MLE inline image and the sample size (10,000) is large, we calculated the simulated bias as inline image. The asymptotic biases and Taylor series approximations were obtained from (9) and (11), respectively. Monte Carlo integration was used to approximate the expectations in the formulae. (a) Binary confounder c: (inline image), censored; (b) Normal confounder c: (inline image), censored; (c) Binary confounder c: (inline image), censored; (d) Normal confounder c: (inline image), censored; (e) Binary balanced c: (inline image), uncensored; (f) Normal balanced c: (inline image), uncensored.

Figure 1 shows that the simulated and asymptotic biases agree closely, confirming that the asymptotic formulae adequately describe the biases. The accuracy of the Taylor series approximations decreases as inline image gets large, because the approximation error is of the order inline image.

For more modest values of inline image, for example inline image and inline image, the biases will have similar patterns but be shifted up as inline image (see Web Figures 2 and 3). In Web Figure 8, we let inline image and inline image to allow the distribution of censoring to depend on treatment group. The figure illustrates how different choices of censoring function can impact on the biases.

3.3. Bias of Omitting a Balanced Covariate in Randomized Studies

Figure 1e and f show the biases when a balanced covariate is omitted. It is clear that omission of a relevant covariate leads to biased treatment estimates for the Cox model, even in randomized studies.
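
The attenuation can be reproduced with a small simulation. The sketch below is illustrative only: the sample size, the coefficients (beta = 1, gamma = 2), the unit exponential baseline hazard, the absence of censoring and the helper name `cox_fit_1d` are all assumptions made for this example and are not the settings behind Figure 1.

```python
import numpy as np

def cox_fit_1d(time, event, x, n_iter=50, tol=1e-10):
    """Newton-Raphson maximiser of the Cox partial likelihood for a single
    covariate. Hypothetical helper; assumes there are no tied event times."""
    order = np.argsort(-np.asarray(time, dtype=float))   # latest time first
    x = np.asarray(x, dtype=float)[order]
    d = np.asarray(event, dtype=float)[order]
    b = 0.0
    for _ in range(n_iter):
        w = np.exp(b * x)
        s0 = np.cumsum(w)            # sum over the risk set {j: t_j >= t_i}
        s1 = np.cumsum(w * x)
        s2 = np.cumsum(w * x * x)
        xbar = s1 / s0               # weighted mean of x in each risk set
        score = np.sum(d * (x - xbar))
        info = np.sum(d * (s2 / s0 - xbar ** 2))
        step = score / info
        b += step
        if abs(step) < tol:
            break
    return b

rng = np.random.default_rng(2024)
n, beta, gamma = 20_000, 1.0, 2.0          # assumed values, for illustration
x = rng.binomial(1, 0.5, n)                # randomized binary treatment
c = rng.binomial(1, 0.5, n)                # balanced covariate, independent of x
t = rng.exponential(1.0 / np.exp(beta * x + gamma * c))
event = np.ones(n)                         # no censoring in this sketch

beta_star_hat = cox_fit_1d(t, event, x)    # model fitted with c omitted
print(f"true beta = {beta:.2f}, estimate with c omitted = {beta_star_hat:.3f}")
```

Under these assumptions the fitted coefficient should fall noticeably below the true value even though c is independent of x, which is the pattern described for panels (e) and (f).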

The reason is that the parameters inline image and inline image are measuring different features of the population. When we model the hazard as

  • display math

the interpretation of inline image is the hazard ratio between inline image and inline image while the values of c are fixed. But in randomized studies (where we assume inline image), when we model the marginal hazard as

  • display math(12)

the interpretation of inline image is the hazard ratio between inline image and inline image while c is marginalized. Similarly, inline image is the hazard when inline image, and inline image is the hazard when inline image and c is marginalized. The superscript inline image emphasizes that they do not have the same interpretation.

When c is integrated out, the marginal hazards (12) for inline image and inline image are not proportional over time, and the MLE of inline image represents an average over time of the log marginal hazard ratios between inline image and inline image (Lin and Wei, 1989). Using a marginal hazard ratio inline image to estimate a conditional hazard ratio inline image therefore leads to bias. In randomized studies, as outlined in Section 3.1, usually inline image and inline image will attenuate to some limit between 0 and inline image as inline image.
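
The loss of proportionality can be checked numerically. The following sketch assumes a binary covariate C with prevalence 0.5 that is independent of x, a unit exponential baseline hazard, and coefficients beta = 1 and gamma = 2; these values and the function name `marginal_hazard` are chosen purely for illustration.

```python
import numpy as np

def marginal_hazard(t, x, beta, gamma, p_c=0.5):
    """Marginal hazard at times t for treatment value x after integrating out
    a binary covariate C ~ Bernoulli(p_c) that is independent of x.
    Assumes a unit exponential baseline hazard (illustrative choice)."""
    t = np.asarray(t, dtype=float)
    rates = np.array([np.exp(beta * x), np.exp(beta * x + gamma)])  # c = 0, 1
    probs = np.array([1.0 - p_c, p_c])
    surv = probs[:, None] * np.exp(-np.outer(rates, t))  # P(C=c) * S(t | x, c)
    dens = rates[:, None] * surv                         # P(C=c) * f(t | x, c)
    return dens.sum(axis=0) / surv.sum(axis=0)

beta, gamma = 1.0, 2.0                                   # assumed values
times = np.array([0.0, 0.05, 0.2, 0.5, 1.0, 3.0])
ratio = marginal_hazard(times, 1, beta, gamma) / marginal_hazard(times, 0, beta, gamma)
for t_k, r_k in zip(times, ratio):
    print(f"t = {t_k:4.2f}  marginal hazard ratio = {r_k:.3f}")
print(f"conditional hazard ratio exp(beta) = {np.exp(beta):.3f}")
```

In this setting the marginal hazard ratio equals exp(beta) at time zero and then varies with t, so a single fitted coefficient can only summarise an average of these ratios.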

3.4. The Limits of Biases as inline image

Figure 1 shows that all biases increase with inline image but always tend to some limits, whether or not C is a confounder. The reason is that the marginal hazard ratio has finite limits as inline image tends to inline image and inline image. For example, for inline image, the marginal hazard is

  • display math

The ratio, inline image, tends to inline image as inline image and

  • display math

3.5. The Effect of Censoring

Figure 2a shows the effect of censoring on the bias of omitting a balanced covariate. The event times were generated from (1) with inline image inline image and inline image. The possible censoring times inline image were simulated from uniform inline image with inline image.

Figure 2. The effect of overall censoring and confounding on bias: (a) biases of omitting a balanced covariate where inline image data are censored; (b) biases under different strengths of confounding, inline image and inline image when inline image data are censored.

Following the result A-4 in Web Appendix, the uncensoring probability can be written as

  • display math

where inline image is the density of possible censoring times.

Under the simulation settings, inline image and inline image. The probability of censoring conditional on x is thus

  • display math(13)

The values of inline image and inline image were then solved from (13) such that the probabilities of censoring were the same for inline image and inline image and could be inline image, inline image, inline image, inline image, and inline image. The number of event times n was fixed at 100,000 and the total sample size was inline image.
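
Solving for censoring parameters that give a target censoring percentage can be sketched as below. This is not Equation (13) itself: the sketch assumes censoring times uniform on (0, a_x) within each treatment group, a unit exponential baseline hazard, a balanced binary covariate and coefficients beta = 1 and gamma = 2, all of which are assumptions made for illustration, as are the function names.

```python
import math

def censoring_prob(a, rates, probs):
    """P(T > Tc) when Tc ~ Uniform(0, a) and T | c ~ Exponential(rate_c),
    averaged over the distribution of c. For one rate r this equals
    (1 - exp(-r * a)) / (r * a)."""
    return sum(p * (-math.expm1(-r * a)) / (r * a) for r, p in zip(rates, probs))

def solve_upper_bound(target, rates, probs, lo=1e-8, hi=1e8, n_iter=200):
    """Bisection on the log scale for the upper bound a, using the fact that
    the censoring probability decreases as a increases."""
    for _ in range(n_iter):
        mid = math.sqrt(lo * hi)
        if censoring_prob(mid, rates, probs) > target:
            lo = mid      # too much censoring: widen the censoring window
        else:
            hi = mid
    return math.sqrt(lo * hi)

beta, gamma, target = 1.0, 2.0, 0.5        # assumed values, for illustration
probs = [0.5, 0.5]                         # P(C = 0), P(C = 1)
upper = {}
for x in (0, 1):
    rates = [math.exp(beta * x), math.exp(beta * x + gamma)]
    upper[x] = solve_upper_bound(target, rates, probs)
    print(f"x = {x}: a_x = {upper[x]:.4f}, "
          f"P(censored) = {censoring_prob(upper[x], rates, probs):.4f}")
```

Solving separately for each group is what keeps the censoring probability equal for the two treatment values in this sketch.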

Figure 2a shows that censoring influences the bias in two different ways. The bias increases as the censoring percentage increases from inline image to inline image, but decreases as the censoring percentage increases from inline image to inline image. The bias is plotted for a wider range of censoring percentages in Web Figure 4.

The reason for this inconsistent effect of censoring is as follows: when the censoring percentage increases (0–50%) and inline image, the subjects with inline image, which provide most of the information about inline image, are likely to be censored, and consequently, the bias is increased. But as the censorship rate increases further (50–90%), almost all of the few events occur with inline image and almost all the times with inline image are censored. So nearly all the subjects supplying information about inline image have the same value of inline image (Chastang, Byar, and Piantadosi, 1988). If the sample size is sufficiently large, the bias will tend to zero as the censoring percentage tends to inline image. A similar explanation applies for inline image. An illustration of this explanation is given in Web Figure 5.

3.6. The Effect of Confounding

Of particular relevance to non-randomized studies, we considered the influence of different levels of confounding on the bias function when 50% of the data are censored (Fig. 2b). We generated inline image and considered three scenarios with inline image. The difference inline image represents the imbalance of the distributions of inline image between inline image and inline image and so measures the strength of confounding. As inline image increases, it can be seen that the estimate is biased upwards for inline image and downwards when inline image. For the case inline image, the bias would be affected in the other direction.
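
The direction of the confounding bias can be illustrated with a short simulation. The sketch below departs from the setting of Figure 2b in several ways that are assumptions of this example: there is no censoring, the true treatment effect is zero, gamma = 1, the baseline hazard is unit exponential, and the prevalences 0.7 and 0.3 are arbitrary. The helper names are hypothetical.

```python
import numpy as np

def cox_fit_1d(time, event, x, n_iter=50, tol=1e-10):
    """Newton-Raphson maximiser of the Cox partial likelihood for a single
    covariate. Hypothetical helper; assumes there are no tied event times."""
    order = np.argsort(-np.asarray(time, dtype=float))   # latest time first
    x = np.asarray(x, dtype=float)[order]
    d = np.asarray(event, dtype=float)[order]
    b = 0.0
    for _ in range(n_iter):
        w = np.exp(b * x)
        s0, s1, s2 = np.cumsum(w), np.cumsum(w * x), np.cumsum(w * x * x)
        xbar = s1 / s0
        step = np.sum(d * (x - xbar)) / np.sum(d * (s2 / s0 - xbar ** 2))
        b += step
        if abs(step) < tol:
            break
    return b

def estimate_with_c_omitted(p1, p0, beta=0.0, gamma=1.0, n=20_000, seed=1):
    """Fit the reduced model when P(C=1 | X=1) = p1 and P(C=1 | X=0) = p0."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, n)
    c = rng.binomial(1, np.where(x == 1, p1, p0))   # C depends on X
    t = rng.exponential(1.0 / np.exp(beta * x + gamma * c))
    return cox_fit_1d(t, np.ones(n), x)             # uncensored sketch

est_pos = estimate_with_c_omitted(p1=0.7, p0=0.3)   # positive association
est_neg = estimate_with_c_omitted(p1=0.3, p0=0.7)   # negative association
print(f"true beta = 0, positive association: {est_pos:.3f}")
print(f"true beta = 0, negative association: {est_neg:.3f}")
```

With a harmful omitted covariate (gamma > 0), a positive association between x and c pushes the estimate upwards and a negative association pushes it downwards, even though the true treatment effect is zero in this sketch.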

3.7. The Effect of Additional Measured Covariates

In practice, the analyst is likely to have access to additional measured covariates (possibly confounders) that would need to be adjusted for, in addition to the exposure variable X (and the unmeasured confounder, C).

Under the approach of Lin et al. (1998), an additional covariate Z does not affect the bias if the mean of C conditional on x and z is additive in x and z, that is inline image (VanderWeele, 2008). However, our simulation results in Figure 3a–c show that an additional covariate may introduce a small degree of bias when inline image is large. We generated 100,000 inline image and inline image. The additional covariate Z was simulated from inline image, inline image and inline image for Figure 3a–c, respectively. Under these data-generating processes, inline image and the additivity assumption is satisfied.

Figure 3. The effect of additional measured covariates on the simulated bias inline image: (a) inline image; (b) inline image; (c) inline image; (d) the effect of increasing the number of measured covariates on the simulated bias when inline image and inline image and 3.

A sample of 100,000 survival times was generated from inline image and inline image. The data were then fitted by the reduced model inline image. It can be seen that the bias is not impacted by the distribution of Z, but is affected by inline image, when inline image is large. The results were similar when we allowed censoring to depend on X and Z by assuming inline image (see Web Figure 9).

It is then natural to investigate the influence of more than one additional covariate when inline image is large. To simplify the problem, we examine the case where all the additional covariates are binary and independent of each other, with the same coefficient inline image. As the bias is only significant for large negative inline image, we set inline image. The results, displayed in Figure 3d, show that the bias increases slightly with the number of covariates and that the increments are not linear.

4. Sensitivity Analysis

The aim of our proposed method of sensitivity analysis is to assess how the point and interval estimators for inline image or associated P-value would change given clinically plausible values of the sensitivity parameters inline image and inline image.

4.1. Point Estimate

For a sample with inline image observed times inline image, of which n are uncensored inline image, it follows from (4) that the relation between inline image and inline image is approximately determined by the equations inline image with

  • display math(14)

where the expectation inline image can be calculated analytically or approximately with respect to inline image.

Write inline image. Due to the functional invariance property of MLE, the point estimate of the true value inline image is then inline image. The function inline image and its inverse inline image relate inline image and inline image, and play a key role in sensitivity analysis.

The baseline survivor function inline image in (14) is estimated by solving

  • display math

where inline image is the Breslow (1972) estimator:

  • display math

The survival function of the censoring times can also be approximated by the Breslow (1972) estimator, by treating events as “censored” observations and censored observations as “events” (Satten and Datta, 2001).
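
A minimal sketch of the Breslow estimator of the cumulative baseline hazard is given below. It uses the standard form in which each event time contributes one over the sum of exp(linear predictor) across the subjects still at risk. The function name, the simulated data and the assumption of no tied times are choices made for this illustration, not part of the original analysis.

```python
import numpy as np

def breslow_cumhaz(time, event, lin_pred):
    """Breslow estimate of the cumulative baseline hazard at each event time.
    Hypothetical helper; assumes there are no tied times."""
    order = np.argsort(np.asarray(time, dtype=float))
    t = np.asarray(time, dtype=float)[order]
    d = np.asarray(event, dtype=float)[order]
    w = np.exp(np.asarray(lin_pred, dtype=float)[order])
    at_risk = np.cumsum(w[::-1])[::-1]     # sum of exp(lp_j) over t_j >= t_i
    cumhaz = np.cumsum(d / at_risk)        # only events add an increment
    keep = d == 1
    return t[keep], cumhaz[keep]

# Illustrative data: exponential event times with independent censoring.
rng = np.random.default_rng(11)
n, beta_star = 2_000, 0.5                  # assumed values
x = rng.binomial(1, 0.5, n)
t_event = rng.exponential(1.0 / np.exp(beta_star * x))
t_cens = rng.uniform(0.0, 3.0, n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)

# Baseline cumulative hazard and survivor function for the event process.
ev_times, ev_cumhaz = breslow_cumhaz(time, event, beta_star * x)
baseline_surv = np.exp(-ev_cumhaz)

# Censoring distribution: swap the roles of events and censored observations.
cens_times, cens_cumhaz = breslow_cumhaz(time, 1 - event, np.zeros(n))
cens_surv = np.exp(-cens_cumhaz)
print(baseline_surv[:3], cens_surv[:3])
```

With a zero linear predictor and no censoring the function reduces to the Nelson-Aalen estimator, which gives a simple way to check it by hand.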

Table 1. Simulated bias of point estimates and coverage of 95% confidence intervals for the hazard ratio associated with treatment under two methods of sensitivity analysis, when censoring is moderate. Equations (14) and (16) were used to estimate inline image and its confidence interval in the last two columns

                                                                                             unadjusted           Lin et al. (1998)    inline image
inline image  inline image  inline image  inline image  inline image  Fraction censored (%)  Bias   Coverage (%)  Bias   Coverage (%)  Bias   Coverage (%)
100           0.56          1             0.1           0.9           50                     0.78   39            0.01   97            −0.04  95
              0.57                        0.3           0.7           51                     0.35   83            −0.02  91            0.00   98
              0.58                        0.5           0.5           51                     −0.08  90            −0.08  90            −0.03  99
              0.35          2             0.1           0.9           50                     1.38   3             −0.04  96            0.07   96
              0.36                        0.3           0.7           50                     0.42   78            −0.21  87            −0.06  97
              0.35                        0.5           0.5           50                     −0.24  82            −0.24  82            0.01   100
              0.20          3             0.1           0.9           50                     1.71   0             −0.12  91            0.15   91
              0.22                        0.3           0.7           49                     0.36   84            −0.40  72            −0.11  99
              0.21                        0.5           0.5           50                     −0.44  68            −0.44  68            0.00   100
500           0.57          1             0.1           0.9           50                     0.76   0             −0.02  95            0.00   95
              0.58                        0.3           0.7           50                     0.32   42            −0.06  90            −0.01  98
              0.57                        0.5           0.5           50                     −0.10  90            −0.10  90            −0.01  99
              0.34          2             0.1           0.9           50                     1.27   0             −0.15  82            −0.04  92
              0.34                        0.3           0.7           50                     0.43   11            −0.20  70            0.03   99
              0.34                        0.5           0.5           50                     −0.30  44            −0.30  44            0.01   100
              0.20          3             0.1           0.9           50                     1.65   0             −0.18  81            0.04   89
              0.20                        0.3           0.7           50                     0.38   21            −0.38  28            −0.02  100
              0.21                        0.5           0.5           50                     −0.48  4             −0.48  4             0.01   100
1000          0.57          1             0.1           0.9           50                     0.73   0             −0.04  93            −0.02  96
              0.57                        0.3           0.7           50                     0.30   11            −0.07  89            0.01   99
              0.58                        0.5           0.5           50                     −0.10  80            −0.10  80            −0.01  100
              0.34          2             0.1           0.9           50                     1.29   0             −0.13  80            0.00   94
              0.34                        0.3           0.7           50                     0.40   1             −0.23  41            0.00   99
              0.34                        0.5           0.5           50                     −0.30  7             −0.30  7             −0.01  100
              0.20          3             0.1           0.9           50                     1.65   0             −0.18  65            0.03   90
              0.20                        0.3           0.7           50                     0.40   0             −0.36  6             0.00   99
              0.21                        0.5           0.5           50                     −0.49  0             −0.49  0             −0.01  100

4.2. P-Values

In many applications, we are interested in evaluating the evidence the data give about a null hypothesis inline image (for example, that a hazard ratio equals one). Using inline image, this null hypothesis is equivalent to inline image and the two-sided P-value is therefore

  • display math(15)

where inline image is the cumulative distribution function of inline image and inline image is the standard error of inline image.

4.3. Confidence Intervals

Since the distribution of inline image might be slightly skewed (see example in Web Figure 6), the traditional way of using standard error to calculate confidence intervals (CI) could be misleading. An alternative way is to construct CI by the highest density interval. To do this, we generate B bootstrap samples inline image from the multivariate normal distribution inline image, where inline image is the covariance matrix of inline image. The sample of the kth parameter inline image, inline image is then obtained from inline image for inline image. The highest density interval of inline image can be computed from the sample inline image by using the emp.hpd function in the R package TeachingDemos.
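
The resampling steps above can be sketched as follows. The function `emp_hpd` is a rough Python analogue of an empirical highest density interval (the shortest interval containing the requested share of the sample), not a port of the R function. The mean vector, the covariance matrix and the map `g_inv_placeholder` are hypothetical stand-ins, since the actual inverse function depends on the sensitivity parameters and on (14).

```python
import numpy as np

def emp_hpd(sample, conf=0.95):
    """Shortest interval that contains at least a fraction conf of the sample."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    k = int(np.ceil(conf * n))               # points the interval must contain
    widths = x[k - 1:] - x[: n - k + 1]      # width of every window of k points
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

def g_inv_placeholder(b_star):
    """Hypothetical monotone map standing in for the inverse bias function."""
    return b_star + 0.5 * np.expm1(b_star)

rng = np.random.default_rng(7)
beta_star_hat = np.array([0.4, -0.2])        # assumed fitted coefficients
sigma_hat = np.array([[0.04, 0.01],
                      [0.01, 0.09]])         # assumed covariance matrix
B = 10_000

draws = rng.multivariate_normal(beta_star_hat, sigma_hat, size=B)
transformed = g_inv_placeholder(draws)       # one transformed draw per row

lo, hi = emp_hpd(transformed[:, 0])          # interval for the first parameter
print(f"95% highest density interval: ({lo:.3f}, {hi:.3f})")
```

Because the transformed draws are skewed, the highest density interval is generally not symmetric about the point estimate, which is the motivation given above for preferring it to a standard-error interval.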

However, the bootstrap method may become computationally inefficient when the dimension of inline image is high (e.g., inline image). We therefore give an approximation based on the confidence bounds of inline image. Suppose we are interested in the parameter inline image and its confidence interval inline image. As shown in Section 3.7, the effect of additional measured covariates is negligible. This means that the solution of inline image would not change appreciably if we ignored all the covariates except inline image in (14).

In addition, inline image is usually a monotonically increasing function of inline image in practice. Let inline image be the confidence interval of inline image. The lower bound inline image then can be estimated from the equation

  • display math(16)

Similarly, inline image can be obtained from the above equation by substituting inline image by inline image. Our simulation shows that this approximation is sufficiently accurate and very efficient.

4.4. Simulation Study

Lin et al. (1998) proposed a simple method for sensitivity analysis. Here we conducted simulation studies to compare their method with our approach.

Table 1 shows the biases of point estimators and coverage of inline image CIs in 1,000 simulation replications, when given the true inline image and inline image. To compare with the method of Lin et al. (1998), we used similar simulation settings to theirs: inline image, inline image, inline image, inline image and inline image was solved from (13) so as to ensure moderate levels of censorship (fraction censored was about 50%). It is clear that our proposed method gives almost unbiased point estimates and good coverage of confidence intervals. The method of Lin et al. (1998) gets worse as inline image increases, because it only addresses the bias attributable to confounding. The results for light (inline image) and heavy (inline image) censorships are presented in Web Tables 1 and 2, respectively. We note that both methods of sensitivity analysis gave biased treatment estimates when censoring was heavy and the sample size was small (inline image). However, since the accuracy of approximation (14) increases with the number of observed events, the proposed method is asymptotically unbiased irrespective of the censoring rate. The minimum sample size at which the method achieves approximately unbiased estimates increases with the censoring rate, and for a censoring rate as high as 90% is about inline image.

5. Real Examples for Sensitivity Analysis

5.1. Vitamin and Minerals Trial

Ellis et al. (2008) conducted a randomized controlled trial assessing the effect of antioxidant and folinic acid supplementation on developmental outcomes for children with Down syndrome. Comparing infants allocated to folinic acid (inline image) with those who were not (inline image), the estimated hazard ratio for age of sitting was 1.25 (95% confidence interval 0.88–1.78). These results did not change appreciably after adjustment for area of residence, maternal ethnicity, birth weight, and social class.

We now assess the impact on the treatment estimates for age at sitting of assuming a binary confounder, c, has been omitted from the model, where inline image. As this is a randomized controlled trial and any random imbalance in the prevalence of the unmeasured confounder between treatment groups is likely to be small, we restrict inline image. Assuming the true prevalence of the omitted covariate for treatment groups combined is 0.5, the probability of a confounding effect inline image of more than 0.2 by chance is 0.02 for the trial sample of size inline image.

Figure 4a shows the sensitivity of the lower limit of the confidence interval for the hazard ratio of folinic acid to adjustment for an unmeasured binary covariate of specified properties, where we set inline image. For inline image, the difference in probabilities inline image must be inline image for the treatment effect to become significant. The same conclusion can be obtained from the contour plot in Web Figure 3 which shows results of a similar sensitivity analysis for the P-value of the treatment estimate. The results for antioxidant supplementation in Web Table 3 show that the treatment effect is significant only when inline image and inline image. Given the nature of the study design, the conditions required for the treatment effects to be significant are implausible, suggesting that the original findings of non-significance are robust to the presence of realistic levels of unmeasured confounding.

Figure 4. Contour plots of sensitivity analysis results: (a) the lower bounds of the 95% confidence intervals (using (16)) for the hazard ratio of folinic acid on age of sitting for children with Down syndrome; (b) the P-values (using (15)) for the two-sided test that the log-hazard ratio of deprivation score inline image.

A simulation study was conducted with similar sample size and censoring rates to the vitamin and mineral trial, providing support for the validity of the treatment estimates presented in the sensitivity analysis (see Web Table 4). However, we note that in this illustrative application, the width of the confidence intervals suggests the sensitivity analysis, in common with the original analysis, lacks power to establish statistical significance for small studies.

5.2. Leukaemia and Deprivation Study (Non-Randomized)

Henderson, Shimakura, and Gorst (2002) analyzed the effect of a social deprivation score X (where higher values indicate less affluent areas) on the time in years since diagnosis with acute myeloid leukemia to death (inline image). The estimated hazard ratio for a 1 point increase in x was 1.03 (P-value = 0.0012) after adjustment for age, gender and white blood cell count, indicating that prognosis is worse if the patient lives in a more deprived residential location.

We now consider a potential unmeasured binary confounder C, which affects both survival time T and the deprivation score X. We generated c from inline image, where inline image were solved from

  • display math

such that the marginal distribution is inline image and the desired inline image is obtained.

Figure 4b shows the sensitivity of the P-value to different choices of inline image and inline image. It shows that even if the correlation is strong, that is inline image, the hazard ratio of the confounder needs to be inline image for the hazard ratio of x to become non-significant at the inline image level. It seems unlikely that such an important covariate would be missed, suggesting that the original finding of a significant effect of deprivation score is robust to the presence of realistic levels of unmeasured confounding.

A simulation study was conducted with the same sample size (inline image), covariate X and censoring rate (inline image) as this non-randomized study. To extend the range of scenarios considered, survival times were simulated assuming the true value of inline image was 0 (i.e., assuming the continuous exposure has no effect on survival). Here the emphasis was on comparing the extent to which the sensitivity analysis methods avoid false rejection of the null hypothesis inline image. The results are summarized in Web Table 5 and provide further support for the validity of the proposed formulae when applied to data from non-randomized designs.

6. Discussion

We explored a general framework for assessing bias in treatment estimates from the Cox model with omitted covariates. Bias formulae based on asymptotic properties of the likelihood estimator were presented and validated in simulation experiments. The results showed that the confounding biases for censored survival data are typically complicated. However, the proposed approach made it possible to describe the influence of three different sources of bias: omission of a balanced covariate, data censoring and unmeasured confounding. Figure 5 characterises the sources of bias:

  1. In the absence of a missing covariate, the bias curve remains at zero (the solid line); when a balanced covariate is omitted, the effect is underestimated to a limit as inline image increases (the dashed line).
  2. When the data are censored, the bias is maximized at inline image censoring but decreases under heavy censoring.
  3. When the missing covariate is a confounder, the shape of the bias curve changes. If the association between x and c is positive, the limits increase on the right side but decrease on the left side, and hence the slope of the bias curve increases. Conversely, if the association between x and c is negative, the limits decrease on the right side but increase on the left side.

Figure 5. An illustration of the influence of the different sources of bias when estimating binary treatment effects from the Cox proportional hazards model with an omitted binary covariate. (a) solid: no missing data, no bias; dashed: bias due to omitting a balanced covariate. (b) solid: bias due to omitting a balanced covariate; dashed: bias due to omitting a balanced covariate and censoring. (c) solid: bias due to omitting a balanced covariate and censoring; dashed: bias due to omitting a confounder and censoring.
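The first source of bias (underestimation when a balanced covariate is omitted) can be reproduced with a small Monte Carlo experiment. The sketch below is illustrative only and is not the authors' code: it assumes uncensored exponential survival times, a randomized binary treatment x, an independent binary covariate c, and arbitrary parameter values; `cox_fit_single` and `simulate` are hypothetical helper names.

```python
import math
import random


def cox_fit_single(times, events, x, iters=25):
    """Newton-Raphson for the Cox partial likelihood with one covariate (no tied times assumed)."""
    order = sorted(range(len(times)), key=lambda i: -times[i])  # latest time first
    beta = 0.0
    for _ in range(iters):
        s0 = s1 = s2 = 0.0  # running risk-set sums of w, w*x and w*x^2
        score = info = 0.0
        for i in order:
            w = math.exp(beta * x[i])
            s0 += w
            s1 += w * x[i]
            s2 += w * x[i] * x[i]
            if events[i]:
                mean = s1 / s0
                score += x[i] - mean
                info += s2 / s0 - mean * mean
        beta += score / info
    return beta


def simulate(n, beta, gamma, rng):
    """Uncensored exponential survival times with hazard exp(beta*x + gamma*c)."""
    times, x = [], []
    for _ in range(n):
        xi = rng.randint(0, 1)  # randomized binary treatment
        ci = rng.randint(0, 1)  # binary covariate, independent of x (balanced)
        times.append(rng.expovariate(math.exp(beta * xi + gamma * ci)))
        x.append(xi)
    return times, x


rng = random.Random(2024)
n, beta_true = 4000, 1.0

# gamma = 0: c has no effect, so omitting it should leave the estimate near beta_true.
t0, x0 = simulate(n, beta_true, 0.0, rng)
est_none = cox_fit_single(t0, [1] * n, x0)

# gamma = 2: c is strongly prognostic; fitting x alone attenuates the estimate.
t2, x2 = simulate(n, beta_true, 2.0, rng)
est_omitted = cox_fit_single(t2, [1] * n, x2)

print("estimate with no covariate effect:", round(est_none, 3))
print("estimate with omitted balanced covariate:", round(est_omitted, 3))
```

Repeating the second fit over a grid of gamma values traces out a curve of the kind shown by the dashed line in panel (a); adding censoring or an association between x and c would correspond to panels (b) and (c).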

Although the bias formula is applicable under a range of assumptions, this paper has focused on considering the simple case of a binary exposure and a single unmeasured confounder. Further simulation work showed that the bias increased slightly in the presence of one or more measured confounders for large values of inline image. The extension to multiple unmeasured confounders is straightforward. If there are several missing covariates inline image with coefficients inline image, then we can interpret c as the composite score, inline image, with inline image (Lin et al., 1998). Lin et al. (1998) also argue that the choice of a single unmeasured confounder is a less severe restriction when all the known confounders are adjusted for in the survival model.

The bias formula was used as the basis for proposing a new method to assess the sensitivity of estimates of treatment effects to omission of relevant covariates. Simulation experiments were conducted to compare the method with the approach of Lin et al. (1998), a special case of the proposed method when the rate of censoring is high. The method of Lin et al. (1998) has the benefit of ease of implementation, being based on a simple adjustment formula, but its relative performance deteriorates as the magnitude of inline image increases. In contrast, the simulations indicate that the proposed method can provide sufficiently unbiased treatment estimates, and associated confidence intervals with good coverage, over a wide range of scenarios, when the true sensitivity parameters inline image and inline image are known.
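For orientation, the simple adjustment referred to above can be sketched as follows. This is the binary-confounder form of the Lin et al. (1998) adjustment as it is commonly stated, for a binary exposure and a confounder whose hazard ratio is the same in both exposure groups; it is an assumption of this sketch rather than a formula reproduced from the present paper, and the function and argument names are hypothetical.

```python
def lin_adjusted_hr(hr_observed, hr_confounder, prev_exposed, prev_unexposed):
    """Divide the observed hazard ratio by the approximate confounding bias factor.

    hr_confounder  : assumed hazard ratio of the unmeasured binary confounder
    prev_exposed   : assumed prevalence of the confounder among the exposed
    prev_unexposed : assumed prevalence of the confounder among the unexposed
    """
    bias = (hr_confounder * prev_exposed + 1.0 - prev_exposed) / (
        hr_confounder * prev_unexposed + 1.0 - prev_unexposed
    )
    return hr_observed / bias


# Sensitivity grid over assumed confounder strengths (illustrative values only).
for gamma_hr in (1.0, 1.5, 2.0, 3.0):
    adjusted = lin_adjusted_hr(1.5, gamma_hr, prev_exposed=0.5, prev_unexposed=0.2)
    print(gamma_hr, round(adjusted, 3))
```

The adjustment involves no numerical integration, which is the ease of implementation noted above; the same simplicity is why it should be used with caution when the assumed confounder effect is large.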

Sensitivity analysis is a flexible approach to addressing omission of covariates that makes it possible to assess the impact of 'clinically plausible' levels of unmeasured confounding and other sources of bias on the treatment estimates (Groenwold et al., 2010). However, it does not provide a single precise estimate of treatment effectiveness, nor does it help identify the nature of any bias from omitting covariates. A number of alternative strategies for tackling unmeasured confounding have been proposed that do attempt to provide explicit estimates of causal effects. An overview of these different methods was given in Klungel et al. (2004), including instrumental variables and the prior event rate ratio method (Tannen et al., 2009).

The method of sensitivity analysis proposed in this paper could be extended in a number of ways. First, incorporating adjustment for the propensity score into the sensitivity analysis would provide an efficient way of controlling for the effect of measured covariates (Rosenbaum, 1991). Other possible developments include consideration of specific distributional forms (both univariate and multivariate) for the unmeasured confounder(s) to provide special cases of the generic bias formulae for a wider range of common confounding models.

Omission of relevant covariates is a common source of bias when estimating treatment or exposure effects from survival data. Although we cannot directly adjust for unmeasured covariates, their potential impact can be assessed by means of sensitivity analyses. Indeed, Groenwold et al. (2010) argue that all analyses of causal associations in observational data should include an assessment of robustness to unmeasured confounding. The current study provides new tools for conducting sensitivity analysis for survival outcomes, with applicability to both randomized controlled trials and observational studies. Implementation of the methods requires numerical evaluation of the appropriate bias formulae. This can be achieved using Monte Carlo methods and illustrative R code is available on request from the authors.

7. Supplementary Materials

Web appendices, tables and figures referenced in Sections 2, 2.2, 3.1, 3.2, 3.5, 4.3, 4.4, 5.1 and 5.2 are available with this paper at the Biometrics website on Wiley Online Library.

Acknowledgements

We thank Prof. Robin Henderson for providing the leukaemia and deprivation data. We are grateful for the helpful comments of the editor, associate editor and two referees. This research was funded by the Medical Research Council [grant number G0902158]. William Henley and Stuart Logan were supported by the National Institute for Health Research (NIHR) Collaboration for Leadership in Applied Health Research and Care (CLAHRC) for the South West Peninsula. The views expressed in this publication are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.

References

  • Black, N. (1996). Why we need observational studies to evaluate the effectiveness of health care. British Medical Journal 312, 1215–1218.
  • Breslow, N. E. (1972). Discussion of the paper by D. R. Cox. Journal of the Royal Statistical Society, Series B 34, 216–217.
  • Bretagnolle, J. and Huber-Carol, C. (1985). Sous-estimation des contrastes due a l'oubli de variables pertinentes dans le modele de Cox pour des durees de survie avec censure. C.R.A.S. 300, 359–363.
  • Chastang, C., Byar, D., and Piantadosi, S. (1988). A quantitative study of the bias in estimating the treatment effect caused by omitting a balanced covariate in survival models. Statistics in Medicine 7, 1243–1255.
  • Ellis, J. M., Tan, H. K., Gilbert, R. E., Muller, D. P., Henley, W. E., Moy, R., Pumphrey, R., Ani, C., Davies, S., Edwards, V., Green, H., Salt, A., and Logan, S. (2008). Supplementation with antioxidants and folinic acid for children with Down syndrome: Randomised controlled trial. British Medical Journal 336, 594–597.
  • Gail, M., Wieand, S., and Piantadosi, S. (1984). Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika 71, 431–444.
  • Ghani, A. C., Henley, W. E., Donnelly, C. A., Mayer, S., and Anderson, R. M. (2001). Comparison of the effectiveness of non-nucleoside reverse transcriptase inhibitor-containing and protease inhibitor-containing regimens using observational databases. AIDS 15, 1133–1142.
  • Groenwold, R. H., Nelson, D. B., Nichol, K. L., Hoes, A. W., and Hak, E. (2010). Sensitivity analyses to estimate the potential impact of unmeasured confounding in causal research. International Journal of Epidemiology 39, 107–117.
  • Henderson, R., Shimakura, S., and Gorst, D. (2002). Modelling spatial variation in leukaemia survival data. Journal of the American Statistical Association 97, 965–972.
  • Hernandez, A. V., Eijkemans, M. J., and Steyerberg, E. W. (2006). Randomized controlled trials with time-to-event outcomes: How much does prespecified covariate adjustment increase power? Annals of Epidemiology 16, 41–48.
  • Klungel, O. H., Martens, E. P., Psaty, B. M., Grobbee, D. E., Sullivan, S. D., Stricker, B. H., Leufkens, H. G., and de Boer, A. (2004). Methods to assess intended effects of drug treatment in observational studies are reviewed. Journal of Clinical Epidemiology 57, 1223–1231.
  • Lin, D. Y., Psaty, B. M., and Kronmal, R. A. (1998). Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics 54, 948–963.
  • Lin, D. Y. and Wei, L. J. (1989). The robust inference for the Cox proportional hazards model. Journal of the American Statistical Association 84, 1074–1078.
  • Rosenbaum, P. R. (1991). Discussing hidden bias in observational studies. Annals of Internal Medicine 115, 901–905.
  • Satten, G. A. and Datta, S. (2001). The Kaplan–Meier estimator as an inverse-probability-of-censoring weighted average. American Statistician 55, 207–210.
  • Tannen, R. L., Weiner, M. G., and Xie, D. (2009). Use of primary care electronic medical record database in drug efficacy research on cardiovascular outcomes: Comparison of database and randomised controlled trial findings. British Medical Journal 338, b81.
  • VanderWeele, T. J. (2008). Sensitivity analysis: Distributional assumptions and confounding assumptions. Biometrics 64, 645–649.

Supporting Information

All Supplemental Data may be found in the online version of this article.

Filename                             Format  Size  Description
biom12096-sm-0001-SuppData-S1.pdf    PDF     293K  Supporting Information

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.