The purpose of this study was to evaluate, both analytically and via simulation, a statistical method, prior event rate ratio (PERR) adjustment, and an alternative, PERR-ALT, both of which have the potential to overcome “unmeasured confounding.”

Methods

Formulae were derived for the target estimates of both PERR methods, which were compared with results from simulations to ensure their validity. In addition to the theoretical insights gained, relative biases of both PERR methods for estimating exposure effects were also investigated via simulation studies and compared empirically with electronic medical record database study results.

Results

Theoretical derivations closely matched simulated results. In simulation studies, both PERR methods significantly reduce bias from unmeasured confounding compared with the standard Cox model. When there is no interaction between unmeasured confounders and time intervals, the estimate from PERR-ALT is unbiased, whereas the estimate from PERR has well-controlled relative bias. When interactions exist, relative biases tend to increase but not greatly, especially when the exposure effect is relatively large in comparison with the interaction effects. When the event rate is low and the sample size is limited, PERR is more computationally stable than PERR-ALT. In empiric study comparisons with randomized controlled trials, both PERR methods show potential to reduce bias from the standard Cox model similarly when unmeasured confounding is present.

Widespread implementation of a strategy to perform reliable comparative effectiveness research[1] using large electronic medical record databases will require methodology that overcomes the potential influence of unmeasured confounding or bias on the results.

We initiated studies[2-5] to address this issue by using the United Kingdom General Practice Research Database[6] as a model electronic medical record database. Previously performed randomized controlled trials (RCTs) were replicated from the observational data except for randomization, and validity of the results was assessed by comparison with the replicated RCT.

In 14 cardiovascular outcomes from five replicated RCTs, Cox-adjusted hazard ratios (HRs) did not differ significantly from the RCT in six instances but differed significantly in the other eight, suggesting the presence of unmeasured confounding (Table 1). These findings prompted us to search for a method that could overcome unmeasured confounding, leading to development of the prior event rate ratio (PERR). As shown in Table 1, the PERR-adjusted results did not differ significantly from the Cox-adjusted results when the Cox results were similar to the RCT, but they differed significantly from the Cox results in seven of the eight outcomes where Cox was significantly different from the RCT results. This suggested that the PERR method could identify the presence of unmeasured confounding. Furthermore, in 11 of 14 outcomes, the PERR-adjusted HRs were not significantly different from the RCT, and in two of the remaining three outcomes, they were closer to the RCT than the Cox results were, suggesting that PERR adjustment could also produce reliable HR point estimates.

Table 1. GPRD studies—comparison of outcome hazard ratios

MI | CVA | CABG/PTCA

* Women's Health Initiative (WHI) trial of post-menopausal women treated with combined hormone replacement.[3, 7, 9]

† Randomized controlled trial (RCT) values for myocardial infarction (MI) reflect the WHI re-analysis by age encompassing 50–70 years.

‡ WHI trial of post-menopausal women with a prior hysterectomy treated solely with conjugated estrogen.[8-10]

§ Significant difference (p < 0.05) compared with the RCT.

¶ Significant difference (p < 0.05) compared with the General Practice Research Database (GPRD) Cox-adjusted hazard ratio.

** Scandinavian Simvastatin Survival Study (4S) of hypercholesterolemic subjects with coronary artery disease treated with simvastatin.[4, 11]

†† Prior event rate ratio (PERR) could not be performed because cerebrovascular accident (CVA) was a study exclusion criterion.

‡‡ Heart Outcomes Prevention Evaluation (HOPE) study of ramipril (angiotensin-converting enzyme inhibitor) treatment of patients either with established or at high risk for coronary artery disease.[5, 12]

§§ European Trial on Reduction of Cardiac Events with Perindopril (angiotensin-converting enzyme inhibitor) in Patients with Stable Coronary Artery Disease (EUROPA).[5, 13]

This prompted us to pursue rigorous investigation of the PERR method, regarding both its statistical and numerical properties. In addition, we examined an alternative formulation of the PERR method that may reduce bias.

METHODS

The subscripts “_{P}”, “_{S}”, “_{uE}”, and “_{E}” are used to represent quantities associated with prior interval, study interval, unexposed group, and exposed group, respectively. Therefore, a quantity with subscripts “_{uE,P}” pertains to the unexposed group in the prior interval. Our methods and results are applicable to both continuous and categorical confounders.

The prior event rate ratio method

The PERR adjustment method rests on one assumption: the ratio of outcome event rates between the exposed and unexposed cohorts before the start of the study, when neither group has yet been exposed to the study medication, incorporates the effect of all confounders (both measured and unmeasured) on that outcome. That prior ratio can therefore be used to adjust the event rate ratio for the same outcome during the exposure interval. Simply put, the method estimates the incidence rate ratio (IRR) during the study interval and divides it by the IRR during the prior interval, that is, IRR_{PERR} = IRR_{S}/IRR_{P}; the analogous adjustment for HRs is HR_{PERR} = HR_{S}/HR_{P}. In contrast, the standard Cox model uses only the study interval data, so in this simple case it estimates the exposure HR by HR_{S} alone. Generally, the Cox HR can adjust for measured, but not unmeasured, confounders.
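The ratio-of-ratios arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function names and the event counts are hypothetical and are not taken from any of the studies discussed here.

```python
def incidence_rate_ratio(events_exposed, time_exposed, events_unexposed, time_unexposed):
    """Crude IRR: events per unit person-time in the exposed group
    divided by the same quantity in the unexposed group."""
    return (events_exposed / time_exposed) / (events_unexposed / time_unexposed)


def perr_adjusted_ratio(ratio_study, ratio_prior):
    """PERR adjustment: study-interval ratio divided by prior-interval ratio."""
    if ratio_prior <= 0:
        raise ValueError("prior-interval ratio must be positive (prior events are required)")
    return ratio_study / ratio_prior


# Hypothetical counts and person-years, for illustration only.
irr_prior = incidence_rate_ratio(40, 10_000.0, 20, 10_000.0)  # about 2.0
irr_study = incidence_rate_ratio(30, 10_000.0, 20, 10_000.0)  # about 1.5
irr_perr = perr_adjusted_ratio(irr_study, irr_prior)          # about 0.75
```

In this made-up example the unadjusted study-interval ratio suggests harm (about 1.5), but the exposed group already had twice the event rate before exposure, so the adjusted ratio falls below 1. The same division applies when HR_{S} and HR_{P} come from fitted Cox models.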

In reality, the performance characteristics of the PERR method can be more complicated. Factors that may require consideration include the following: (i) differences in the degrees of association between the confounders and the exposure; (ii) differences in the effects of confounders on the prior and post events or intervals; and (iii) influence of prior events on post events within the same subject that may differ between the exposed and the unexposed groups. Our investigation was designed to incorporate the possibility of all these effects but focused on (i) and (ii), especially in theoretical derivation, to simplify the presentation. Simulation results are also reported when (iii) is applicable.

Behavior of the PERR method was investigated as depicted in Table 2A, assuming the possibility of different underlying unmeasured confounders U and V for the unexposed and the exposed groups, respectively. Unmeasured confounder effects are represented by γ, the temporal effect by α, and exposure effect by β. Unmeasured confounder effects are considered different for the two intervals, implying a potential interaction between the unmeasured confounders and time intervals. The interaction can differ depending on the groups. The temporal effect α represents a universal change in risks applicable to both groups.

Table 2. Underlying hazard models for prior event rate ratio method investigation

Model | Group | Prior interval | Study interval
A. Cox model | Unexposed | HR_{baseline} exp(γ_{P}U) | HR_{baseline} exp{γ_{S}U + α}
A. Cox model | Exposed | HR_{baseline} exp(γ_{P}V) | HR_{baseline} exp{γ_{S}V + α + β}
B. General model | Unexposed | HR_{baseline} g_{P}(U) | HR_{baseline} g_{S}(U) exp(α)
B. General model | Exposed | HR_{baseline} h_{P}(V) | HR_{baseline} h_{S}(V) exp(α + β)

The particular mathematical forms for the HR are taken from the Cox proportional hazard model with the baseline hazard HR_{baseline} unspecified. By varying the distributions of U and V and the magnitudes of γ, α, and β, we can investigate the bias of HR_{PERR} in estimating the exposure effect. In particular, we provide theoretical justification for scenarios in which small differences between HR_{PERR} and targeted exposure effect exp(β) can be expected.

Our theoretical derivation is actually based on more general underlying models than Table 2A. In particular, we derive the limiting bias of HR_{PERR} under the setting of Table 2B. Here, instead of using the exponential function with specific forms for the HRs, we use generic functions as long as they are positive. The goal is to characterize the bias of the PERR method in estimation of the targeted exposure effect exp(β) if the underlying data arise from the general hazard functions in Table 2B.
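A minimal simulation sketch of the Table 2A setting may help make the roles of γ, α, and β concrete. It is not the simulation design used in this paper. It assumes a constant baseline hazard, normally distributed confounders with a higher mean in the exposed group, administrative censoring at the end of each interval, prior and study event times that are independent given the confounder (so effect (iii) is absent), and crude rate ratios standing in for the fitted HR_{P} and HR_{S}. All parameter values and function names are hypothetical.

```python
import math
import random


def simulate_group(n, conf_mean, gamma_p, gamma_s, alpha, beta, base_rate, tau, rng):
    """Simulate one group; return (prior events, prior person-time,
    study events, study person-time)."""
    ev_p = ev_s = 0
    pt_p = pt_s = 0.0
    for _ in range(n):
        c = rng.gauss(conf_mean, 1.0)  # unmeasured confounder (U or V)
        # Hazards from Table 2A with a constant (assumed) baseline hazard.
        rate_p = base_rate * math.exp(gamma_p * c)
        rate_s = base_rate * math.exp(gamma_s * c + alpha + beta)
        t_p = rng.expovariate(rate_p)
        t_s = rng.expovariate(rate_s)
        # Administrative censoring at the end of each interval (length tau).
        ev_p += t_p < tau
        ev_s += t_s < tau
        pt_p += min(t_p, tau)
        pt_s += min(t_s, tau)
    return ev_p, pt_p, ev_s, pt_s


def simulate_perr(n=20000, beta=math.log(0.7), gamma_p=0.5, gamma_s=0.5,
                  alpha=0.1, base_rate=0.02, tau=3.0, seed=1):
    """Return (unadjusted study-interval rate ratio, PERR-adjusted ratio)."""
    rng = random.Random(seed)
    # Unexposed: confounder U ~ N(0, 1), no exposure effect.
    u = simulate_group(n, 0.0, gamma_p, gamma_s, alpha, 0.0, base_rate, tau, rng)
    # Exposed: confounder V ~ N(1, 1), exposure effect beta in the study interval.
    e = simulate_group(n, 1.0, gamma_p, gamma_s, alpha, beta, base_rate, tau, rng)
    ratio_prior = (e[0] / e[1]) / (u[0] / u[1])
    ratio_study = (e[2] / e[3]) / (u[2] / u[3])
    return ratio_study, ratio_study / ratio_prior


unadjusted, perr = simulate_perr()
```

With γ_{S} = γ_{P}, as in the defaults, the unadjusted ratio should sit well above exp(β) = 0.7 because the exposed group carries the higher confounder mean, whereas the PERR-adjusted ratio should land much closer to 0.7. Setting `gamma_s` different from `gamma_p` introduces the confounder by interval interaction discussed in the text.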

Alternative formulation of the prior event rate ratio method

On the basis of preliminary simulations and prior evidence that comparisons of population hazard differences frequently lead to attenuated estimates in the case of nonlinear models, as in Cox[14-17] or logistic regression models,[18] it appeared that the original PERR method could result in bias. Therefore, we examined an alternative formulation of the PERR method, PERR-ALT, which uses paired Cox regression[19] to estimate HR_{E} (the HR between the study and prior intervals for the exposed group) divided by HR_{uE} (the HR between the study and prior intervals for the unexposed group). The resulting estimate for the exposure HR is HR_{PERR-ALT} = HR_{E}/HR_{uE}.
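The following sketch illustrates one way HR_{E}, HR_{uE}, and their ratio could be computed when each subject contributes exactly one prior-interval and one study-interval observation. It relies on an assumption not spelled out in the text: with two records per subject and no tied times, the subject-stratified partial likelihood has a closed-form maximizer, namely the number of pairs whose study event comes first divided by the number whose prior event comes first, with times measured from the start of each interval. The function names and toy data are hypothetical; in practice a stratified Cox routine from a survival package would be used.

```python
def paired_interval_hr(pairs):
    """Closed-form paired estimate of the study-versus-prior hazard ratio.

    pairs: iterable of (t_prior, event_prior, t_study, event_study), where
    each time is measured from the start of its own interval.
    """
    study_first = prior_first = 0
    for t_p, d_p, t_s, d_s in pairs:
        if d_s and t_s < t_p:
            study_first += 1   # study event occurs while the prior record is still at risk
        elif d_p and t_p < t_s:
            prior_first += 1   # prior event occurs while the study record is still at risk
        # All other pairs carry no information about the interval effect.
    if prior_first == 0:
        raise ValueError("no informative prior events; the estimate is undefined")
    return study_first / prior_first


def perr_alt(exposed_pairs, unexposed_pairs):
    """HR_PERR-ALT = HR_E / HR_uE."""
    return paired_interval_hr(exposed_pairs) / paired_interval_hr(unexposed_pairs)


# Toy data (hypothetical): intervals of length 3, censored at 3.0 when no event occurs.
exposed = [(1.0, True, 3.0, False), (3.0, False, 2.0, True),
           (3.0, False, 1.5, True), (3.0, False, 3.0, False)]
unexposed = [(0.5, True, 3.0, False), (3.0, False, 1.0, True)]
hr_perr_alt = perr_alt(exposed, unexposed)  # (2 / 1) / (1 / 1)
```

Pairs in which neither record has an event contribute nothing to either count, which is one way to see why the effective sample size shrinks when events are rare.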

The data extraction step is the same under both PERR methods. We also studied how the large-sample limit of HR_{PERR-ALT} relates to the targeted exposure effect exp(β) under the setting of Table 2B.

RESULTS

Our primary goal was to characterize the biases of the PERR methods for estimation of the targeted exposure effect exp(β). Our investigation was carried out in two steps. First, we investigated the statistical limits of HR_{PERR} and HR_{PERR-ALT} for database studies with large sample sizes. In particular, we studied the statistical limits of HR_{P} and HR_{S} for HR_{PERR} and those of HR_{uE} and HR_{E} for HR_{PERR-ALT}. Then, we characterized the biases of HR_{PERR} and HR_{PERR-ALT} by using these statistical limits.

In Appendix A, we show that the limiting value of HR_{P} under large sample sizes can be obtained by solving the following equation for b based on the prior interval:

where τ_{P} is the duration of the prior interval and f, S, and G are the corresponding event marginal density, event survival function, and censoring survival function, respectively, for the two groups in the prior interval. The statistical limit of HR_{S} follows from a derivation similar to Equation (1), using the study interval functions. However, no explicit expression for b can be obtained from Equation (1); therefore, no explicit expression exists for the statistical limit of HR_{PERR}. To gain theoretical insight, we show in Appendix A that, when the disease is rare, the limiting value of HR_{PERR} under large sample sizes can be approximated by

where T and C represent event and censoring variables, respectively, with subscripts indicating the corresponding group and interval. This approximate form allows us to draw some insightful conclusions about HR_{PERR} in the following section.

In contrast, an explicit formula can be obtained for the limiting value of HR_{PERR-ALT} under large sample sizes (Appendix B):

From Equation (2), we conclude that when the disease is rare, HR_{PERR} ≈ exp(β) when the fraction on the right-hand side of Equation (2) is close to unity. This is true if (i) the effects of unmeasured confounders are similar between the two intervals, hence h_{P}(V)/g_{P}(U) ≈ h_{S}(V)/g_{S}(U); or (ii) the interactions between time intervals and unmeasured confounders are relatively small, hence g_{S}(U) ≈ g_{P}(U) and h_{S}(V) ≈ h_{P}(V). Settings (i) and (ii) also make the fraction on the right-hand side of Equation (3) close to unity; therefore, HR_{PERR-ALT} ≈ exp(β) under these same settings. We emphasize, however, that because Equation (2) is based on a rare disease assumption, these conclusions hold for HR_{PERR} only when the disease is rare, whereas they hold for HR_{PERR-ALT} whenever the underlying data follow the general models of Table 2 and the sample size for the database study is large. Nevertheless, our numerical results suggest that the rare disease assumption on Equation (2) is not too restrictive.

Numerical results

We report the performance of both Equations (2) and (3) in approximating the limiting values of HR_{PERR} and HR_{PERR-ALT} when the sample size is large. Because Equation (2) requires a rare disease assumption, it is crucial to see how well it performs when the disease is not rare. For simplicity, in Figure 1, we present only results for continuous confounders; similar results have been obtained for categorical confounders. As expected, the exact limits agree well with the simulation results in all cases for both PERR methods. Note that approximation (2) works well even when the confounder–temporal interaction effect is twice as large as the main exposure effect. The relative discrepancy is still well controlled (below 6%) even when the interaction effect is three times as large. We also note that, although the event rate varies depending on the magnitudes of the interactions, it is about 20% on average. Therefore, the rare disease assumption on Equation (2) does not seem to be too restrictive.

Simulation results

Here, we report the relative biases of HR_{PERR} and HR_{PERR-ALT} for estimation of the true exposure HR, exp(β). Owing to limitations of space, we report only a limited number of scenarios. We let both exp(β) and confounder–interval interaction (γ_{S} − γ_{P}) vary and investigate relative biases, which are |HR_{PERR} − exp(β)|/exp(β) and |HR_{PERR-ALT} − exp(β)|/exp(β).
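The relative bias defined above is straightforward to compute. A small helper, with a hypothetical name and a made-up estimate, might look like this:

```python
import math


def relative_bias(hr_estimate, beta):
    """Relative bias |HR_hat - exp(beta)| / exp(beta) of a hazard ratio estimate."""
    true_hr = math.exp(beta)
    return abs(hr_estimate - true_hr) / true_hr


# Hypothetical example: an estimate of 0.77 when the true HR is exp(beta) = 0.7.
example_bias = relative_bias(0.77, math.log(0.7))  # roughly 0.10, i.e. 10%
```

The same function applies to either HR_{PERR} or HR_{PERR-ALT}; in a simulation study it would typically be applied to the estimate averaged over replicates.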

Figure 2 presents the relative biases. When there is no interaction between unmeasured confounder and time interval (γ_{S}/γ_{P} = 1), the estimate from PERR-ALT is unbiased, whereas the estimate from PERR has well-controlled relative bias (<5%). When interaction exists, relative biases are <10% when the exposure effect is moderate to large. Of course, in more extreme cases of interaction, the relative biases will increase.

For confidence interval (CI) coverage in Figure 3, we see that even if there is bias, the resulting 95% CIs from both PERR methods still cover the true value of the exposure effect in all scenarios. The CI width of HR_{PERR} tends to be slightly smaller than that of HR_{PERR-ALT}. This is expected because the paired Cox analysis that forms the basis for HR_{PERR-ALT} adopts a more general model than regular Cox analysis,[19] which usually leads to wider CIs. On the other hand, the standard Cox CIs miss the true effect in all cases.

We also performed simulation studies of cases in which the prior event has an effect on the study interval events by incorporating prior event indicators in the study interval models for both the unexposed and the exposed groups. Biases from the PERR methods can increase depending on the influence of prior events. Detailed theoretical investigation is still being conducted. However, our simulation studies (results not shown here) suggest that biases depend mainly on both the magnitude of the prior event influence and the interaction effect between unmeasured confounders and time intervals. When the interaction effects are small compared with the exposure effect and when the magnitudes of the prior event influence are similar between the unexposed and the exposed groups, relative biases are well controlled by both methods (<10%) even in the presence of relatively large prior event effects. In the empiric studies performed (Table 1), approximately 90% of study events occurred in subjects without prior events.

Empirical results

Table 1 shows the results from our prior empirical studies using the PERR-ALT method (detailed methods and results of these empiric studies are provided in the primary publications[2-5, 10]). In no instance was there a significant difference between the PERR-ALT results and those from the original PERR method. Furthermore, the PERR-ALT method results largely followed a pattern similar to the PERR original method in regard to its comparisons with the RCT and the Cox-adjusted results (propensity score and Cox-adjusted results in the empiric studies were similar[2]). It should be noted that the PERR confidence limits in the Women's Health Initiative intact uterus and hysterectomy studies are very wide, and the point estimates appear inaccurate. This reflects the limited number of prior events in the exposed cohorts of both these studies (fewer than 10 for both cerebrovascular accident and coronary artery bypass grafting/percutaneous transluminal coronary angioplasty, and 22 and 16 prior myocardial infarction events for the two studies). Nevertheless, in the hysterectomy study, where the Cox results differ significantly from the RCT, the PERR-adjusted data differ significantly from the Cox results, apparently reflecting the presence of unmeasured confounding.

DISCUSSION AND CONCLUSION

In this paper, we studied the PERR and PERR-ALT methods by using both theoretical and numerical techniques under general settings of a proportional hazards model. Both methods have the potential to overcome unmeasured confounding in a variety of settings.

The PERR-ALT compares favorably with PERR, especially when the unmeasured confounder effect does not vary temporally. However, the original PERR method seems to be more computationally stable, especially in the setting of rare prior or study events. We think one of the reasons is that the paired Cox model fitting that underlies the PERR-ALT method is based on stratified analysis, assuming every paired observation has its own baseline hazard.[7] Therefore, only subjects with at least one event (either prior or study event) have the potential to enter the computation. This can decrease effective sample sizes when events are rare.

Several features and limitations of the PERR method should be acknowledged. Our empiric studies used a database with comprehensive longitudinal patient medical records, excellent outcome capture, and virtually complete drug-prescribing information. One advantage of our large database studies was the ability to capture information on the rate of an event prior to the defined study start time. In general, events encompassing up to 3–5 years prior to study start were used. In our empiric studies, exposed subject start time was the day of their initial drug prescription. Unexposed subjects were randomly age-matched and sex-matched to exposed subjects via computer, and their start time delineated as identical to their matched exposed subject, thereby eliminating potential start-time bias. Another important element may be use of a quasi-experimental design replicating an RCT, which might eliminate some degree of confounding. The major limitation of the PERR method is the requirement for prior events. Thus, death cannot be evaluated with this method, nor can other outcomes lacking prior events. Additional investigation is needed to define non-cardiovascular outcomes for which the method can be used.

Because both the PERR and the PERR-ALT methods estimate the exposure effect, although from different angles, they (along with PERR IRR-adjusted results, which may reduce bias[17]) can be used as an “internal checking” tool for the presumed magnitude of unmeasured confounding in the absence of RCT results. When the results of the three PERR methods from database studies differ, caution should be exercised in the interpretation of the findings. Finally, we recommend that the PERR results always be compared with Cox and/or propensity score-adjusted results: similarities strongly suggest a valid result; disparities suggest that unrecognized bias affecting the Cox/propensity score results is likely, and the PERR results are more likely to be correct.

Although we believe the PERR method, along with our overall experimental design, is an important advance for performing valid outcomes research using observational data, it alone will not be sufficient to overcome all the limitations that confront this field.

Other causal inference methods such as the propensity score[20-22] combined with regression calibration[23] and instrumental variable (IV)[24] analyses also have been used to address hidden confounding. In contrast to these two methods, PERR has been validated by detailed empiric study comparisons to specific RCTs. Furthermore, the propensity score calibration technique requires a validation study and therefore cannot deal with unmeasurable confounding. The IV analysis needs an appropriate, strong, and valid instrument, which, although hard to find, can be effective.[25-27] The IV method can also be hard to apply for nonlinear models such as the Cox model considered here.[28] Nevertheless, these methods along with others to be developed will be important in advancing the value of observational research.

The difference-in-differences method,[29] which is used in economics but has never been applied to clinical epidemiology, has some similarities to the PERR method in that it compares the differences between post and prior behavior in two groups. Although the difference-in-differences analysis differs from the PERR technique, its use of prior behavior to adjust for subsequent behavior does employ a somewhat similar rationale to address confounding.

Finally, we would like to point out that our PERR-ALT method shares its intuition with the case–time–control (CTC) design of Sammy Suissa.[30-32] However, there are major differences. First, our method deals with time-to-event outcomes that are subject to censoring, whereas the CTC design deals with binary outcomes. Second, the CTC design assumes a common period effect for both cases and controls, whereas our method does not. Third, the CTC design assumes that unmeasured confounding is constant over time within the same subject, whereas our method does not. Therefore, our method employs fewer assumptions and may be generalized to study the CTC design when the CTC assumptions are violated.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

KEY POINTS

Extensive simulation studies and theoretical derivations were performed for the original prior event rate ratio (PERR) adjustment technique and also for an alternative (PERR-ALT) analytic approach. Both methods may reduce bias from unmeasured confounders when the exposure effect is relatively large in comparison with confounder-exposure interaction.

In empiric study comparisons with randomized controlled trials, both PERR methods show potential to reduce bias from the standard Cox model similarly when unmeasured confounding is present.

ACKNOWLEDGEMENTS

This project was supported by a research grant funded by the PENN/Pfizer Alliance, a competitive research grant collaboration between the University of Pennsylvania and Pfizer Inc. We also would like to thank the late Tom Tenhave for his encouragement and insightful discussions during the development of this manuscript.

APPENDIX A

STATISTICAL PROPERTY OF HR_{PERR} BASED ON COX MODELS

We first derive the statistical limit of HR_{P} for the prior interval. Assume that two groups of subjects were studied: subjects 1 to n are from the unexposed group (Z = 0) and subjects (n + 1) to 2n are from the exposed group (Z = 1). The observed data are represented by (Z_{i,P}, T_{i,P} ∧ C_{i,P}, 1{T_{i,P} < C_{i,P}}), where a ∧ b is the minimum of a and b and 1{a < b} is the indicator function for any a, b. Here, T_{i,P} and C_{i,P} are event and censoring variables. For the two groups, the underlying (true) hazards for T_{i,P} are λ_{uE,P}(t) = λ_{0}(t) g_{P}(U) and λ_{E,P}(t) = λ_{0}(t) h_{P}(V), respectively, where U and V are unmeasured confounders. This corresponds to the first column of Table 2.

Now a Cox model using Z as covariate is fitted, and the estimating equation based on the Cox partial likelihood is n^{− 1}Ψ_{P}(b) = 0, where

The solution to n^{− 1}Ψ_{P}(b) = 0 is the Cox partial likelihood estimator b̂ for the effect of Z. Under large sample sizes, it can be shown that n^{− 1}Ψ_{P}(b) = 0 converges to

where τ_{P} is the duration of the prior interval and f_{uE,P}(u), S_{uE,P}(u), G_{uE,P}(u), f_{E,P}(u), S_{E,P}(u), and G_{E,P}(u) are the corresponding event marginal density, event survival function, and censoring survival function for the two groups. The derivation of Equation (A.1) builds on the general results for the asymptotic behavior of the Cox partial likelihood estimator under a possibly misspecified model.[14-17] An explicit form for b̂ is not readily available. However, if the disease is rare, we have S_{uE,P}(u)G_{uE,P}(u) ≈ S_{E,P}(u)G_{E,P}(u) ≈ 1; therefore, we can treat the denominator in Equation (A.1) as a constant. This leads to an approximate solution:

where T_{uE,P}, C_{uE,P}, T_{E,P}, and C_{E,P} represent event and censoring variables from the two groups in the prior interval.

We can similarly derive an approximate solution for the study interval Cox model fitting. Therefore, an approximation of HR_{PERR} under large sample sizes is
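The displayed formulas for this derivation can be sketched as follows. This is a reconstruction from the standard large-sample theory for the partial likelihood score with a single binary covariate and equal group sizes, written in the notation defined above; the arrangement of terms in the published displays may differ, and the final step rests on additional assumptions stated before it.

```latex
% Sketch of the limiting estimating equation for the prior interval.
% An event in the exposed group at time u contributes 1 - p(u) and an event
% in the unexposed group contributes -p(u), where p(u) is the limiting
% weighted fraction of the risk set that is exposed.
\[
\int_0^{\tau_P}
\frac{ f_{E,P}(u) G_{E,P}(u)\, S_{uE,P}(u) G_{uE,P}(u)
       - e^{b}\, f_{uE,P}(u) G_{uE,P}(u)\, S_{E,P}(u) G_{E,P}(u) }
     { S_{uE,P}(u) G_{uE,P}(u) + e^{b}\, S_{E,P}(u) G_{E,P}(u) }
\, du = 0 .
\]

% Rare disease: S G \approx 1 in both groups, so the denominator is the
% constant 1 + e^{b}, and the equation can be solved for e^{b}:
\[
e^{\hat b_P} \approx
\frac{\int_0^{\tau_P} f_{E,P}(u) G_{E,P}(u)\, du}
     {\int_0^{\tau_P} f_{uE,P}(u) G_{uE,P}(u)\, du}
= \frac{P(T_{E,P} < C_{E,P})}{P(T_{uE,P} < C_{uE,P})},
\]
% taking the end of the interval to be part of the censoring variable.

% Applying the same argument to the study interval and dividing:
\[
HR_{PERR} \approx
\frac{P(T_{E,S} < C_{E,S}) \,/\, P(T_{uE,S} < C_{uE,S})}
     {P(T_{E,P} < C_{E,P}) \,/\, P(T_{uE,P} < C_{uE,P})} .
\]

% Additional assumptions for this last step only: censoring is similar in
% the two groups and the cumulative baseline hazard over each interval is
% small, so each event probability is roughly proportional to the mean
% hazard multiplier from Table 2B.
\[
HR_{PERR} \approx e^{\beta}\,
\frac{E[h_S(V)] \,/\, E[g_S(U)]}{E[h_P(V)] \,/\, E[g_P(U)]} .
\]
```

In the last line the temporal effect exp(α) cancels because it multiplies the study-interval hazard in both groups, and the remaining fraction equals one when the confounder effects do not change between intervals.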

APPENDIX B

STATISTICAL PROPERTY OF HR_{PERR-ALT} BASED ON PAIRED COX MODELS

Here, we have observed data (Z_{i,P} = 0, T_{i,P} ∧ C_{i,P}, 1{T_{i,P} < C_{i,P}}) and (Z_{i,S} = 1, T_{i,S} ∧ C_{i,S}, 1{T_{i,S} < C_{i,S}}) for 2n subjects with i = 1, …, 2n, where subjects 1 to n are from the unexposed group and subjects (n + 1) to 2n are from the exposed group. The data from the two intervals are paired because they come from the same subjects. Z_{i,P} and Z_{i,S} indicate the interval status. Now, the paired Cox models using Z_{i,P} and Z_{i,S} as covariates are fitted to both unexposed and exposed groups to obtain HR_{PERR-ALT}. Consider the paired Cox model for the unexposed group.

The corresponding paired estimating equation for the Cox model [19] is n^{− 1}Ψ_{uE}(b) = 0, where

The solution to n^{− 1}Ψ_{uE}(b) = 0 therefore is the Cox partial likelihood estimator b̂ for the effect of the interval indicator. Through straightforward algebra, here, we can obtain b̂ explicitly as

We can similarly derive an explicit solution for the paired Cox model fitting of the exposed group. Therefore, the limiting value of HR_{PERR-ALT} under large sample sizes is

One important difference between our derivation for the limiting behaviors of HR_{PERR} and HR_{PERR-ALT} is that no approximation is needed for HR_{PERR-ALT} because of the existence of explicit expressions for the resulting paired Cox model fittings.
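The explicit solution for the paired fit can be sketched as follows. This is a reconstruction under the assumption that each subject forms its own stratum with exactly two records (prior and study) and that there are no tied times; the notation X = T ∧ C and δ = 1{T < C} is introduced here for brevity and is not from the original displays.

```latex
% A pair is informative only when the earlier of its two observed times is
% an event. Such a pair contributes e^{b}/(1+e^{b}) to the partial
% likelihood if the study event comes first and 1/(1+e^{b}) if the prior
% event comes first, so the score for the unexposed group is
\[
\Psi_{uE}(b) = \sum_{i=1}^{n}
\left[ \frac{\delta_{i,S}\, 1\{X_{i,S} < X_{i,P}\}}{1 + e^{b}}
     - \frac{e^{b}\, \delta_{i,P}\, 1\{X_{i,P} < X_{i,S}\}}{1 + e^{b}} \right].
\]

% Setting the score to zero gives the estimator in closed form:
\[
e^{\hat b_{uE}} =
\frac{\sum_{i=1}^{n} \delta_{i,S}\, 1\{X_{i,S} < X_{i,P}\}}
     {\sum_{i=1}^{n} \delta_{i,P}\, 1\{X_{i,P} < X_{i,S}\}}
\;\longrightarrow\;
\frac{P(T_{uE,S} < C_{uE,S},\; T_{uE,S} < T_{uE,P} \wedge C_{uE,P})}
     {P(T_{uE,P} < C_{uE,P},\; T_{uE,P} < T_{uE,S} \wedge C_{uE,S})} .
\]

% The exposed group is handled identically, and
\[
HR_{PERR\text{-}ALT} = \frac{e^{\hat b_{E}}}{e^{\hat b_{uE}}} .
\]
```

Because the estimator is a ratio of two counts, its limit follows directly from the law of large numbers, which is consistent with the remark above that no rare disease approximation is required for HR_{PERR-ALT}.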