Prior event rate ratio adjustment for hidden confounding in observational studies of treatment effectiveness: a pairwise Cox likelihood approach

Observational studies provide a rich source of information for assessing effectiveness of treatment interventions in many situations where it is not ethical or practical to perform randomized controlled trials. However, such studies are prone to bias from hidden (unmeasured) confounding. A promising approach to identifying and reducing the impact of unmeasured confounding is prior event rate ratio (PERR) adjustment, a quasi‐experimental analytic method proposed in the context of electronic medical record database studies. In this paper, we present a statistical framework for using a pairwise approach to PERR adjustment that removes bias inherent in the original PERR method. A flexible pairwise Cox likelihood function is derived and used to demonstrate the consistency of the simple and convenient alternative PERR (PERR‐ALT) estimator. We show how to estimate standard errors and confidence intervals for treatment effect estimates based on the observed information and provide R code to illustrate how to implement the method. Assumptions required for the pairwise approach (as well as PERR) are clarified, and the consequences of model misspecification are explored. Our results confirm the need for researchers to consider carefully the suitability of the method in the context of each problem. Extensions of the pairwise likelihood to more complex designs involving time‐varying covariates or more than two periods are considered. We illustrate the application of the method using data from a longitudinal cohort study of enzyme replacement therapy for lysosomal storage disorders. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.


Introduction
Observational studies, based on routinely collected patient data or data from population-based cohorts, offer a rich source of information for evaluating real-world effectiveness of medical treatments [1]. With the anticipated growth in implementation of electronic health record systems, there is increasing scope for using large observational datasets to inform the design of randomized trials and to address clinical questions for which trials are unlikely to be conducted because of ethical or logistical considerations. However, a major challenge in adopting this approach is the need to remove bias introduced due to confounding by indication or other biases due to the effect of unmeasured covariates [2]. Analyses that fail to account for relevant confounders may have important negative consequences for health policy and patient safety.
Tannen et al. [3] introduced prior event rate ratio (PERR) adjustment, a new quasi-experimental analytic approach to identifying and reducing hidden (unmeasured) confounding in the analysis of time-to-event data from clinical database studies.
The PERR approach replicates a randomized trial by identifying a group of individuals from a clinical database using the same inclusion and exclusion criteria, a similar study time frame and a similar treatment regimen as the trial. The exposed group is the individuals who received treatment within the recruitment interval of the trial. The unexposed group are individuals not receiving the treatment within the recruitment window. Previous studies using the PERR method have generally matched unexposed patients to exposed patients on an index date (often, the date that the exposed patients first received treatment). The method relies on a before-and-after design and assumes that the hazard ratio of the exposed to unexposed for a specific outcome before the start of the trial reflects the combined effect of all confounders independent of any influence of treatment. Let HR s be the unadjusted hazard ratio of treated versus control during the study from a Cox regression model, and HR p be the corresponding unadjusted hazard ratio of treated versus control in the prior period. To account for unmeasured confounding, the PERR-adjusted hazard ratio is given by HR PERR = HR s ∕HR p . The schematic of this method is displayed in Figure 1.
Two studies by Tannen and colleagues reported good concordance between PERR results and those from published RCTs. First, observational data from the clinical practice research datalink were used to replicate the Scandinavian Simvastatin Survival Study of statin treatment for hypercholesterolemic subjects with coronary heart disease [4]; second, data from the same source were used to replicate two randomized trials of angiotensin-converting enzyme inhibitors in patients without congestive heart failure at high risk for cardiovascular disease [5]. Tannen et al. [5] conducted a broader validation study in which the PERR adjusted HRs were not significantly different from the trial HRs in five out of eight comparisons of cardiovascular outcomes where unadjusted HRs were inconsistent with the trial results (suggesting the presence of unmeasured confounding). However, Yu et al. [6] noted that the original PERR method frequently led to attenuated treatment estimates in simulation experiments and introduced an alternative, referred to as PERR-ALT, based on a paired Cox model. This approach gave unbiased estimates in simulation studies where the unmeasured confounder effect did not vary temporally. More generally, both PERR methods performed well in reducing bias when the treatment effect was large compared with any confounder-treatment interaction. The original PERR estimator was more computationally stable than PERR-ALT when the event rate was low and the sample size was limited.
Given the promising results in these early studies, the PERR method is gaining acceptance as a useful approach for addressing hidden confounding when comparing or evaluating treatments in observational studies [2]. For example, PERR adjustment has been used to assess the incidence of Campylobacter and Salmonella infection in patients-prescribed proton pump inhibitors compared with controls, using electronic health records from the secure anonymized information linkage databank [7]. Tannen et al. [8] proposed a strategy for performing comparative effectiveness research using the THIN database in which PERR adjustment is used to remove bias because of unmeasured confounding.
The aim of this paper is to set out a detailed statistical framework for using a pairwise approach to PERR adjustment and to address some of the methodological challenges. Yu et al. [6] present a simple and convenient formula for PERR-ALT adjustment. We extend their approach by deriving a flexible pairwise Cox likelihood function and using this to show that the PERR-ALT method is consistent, under relevant assumptions. The pairwise likelihood can be used to obtain consistent estimates of treatment effects, other measured covariates, and period effects. We also consider ways in which the likelihood can be extended to relax the assumptions of the original PERR method. We address the issue of estimating standard errors and confidence intervals for the treatment effect estimates. Previous work has used a bootstrapping technique to estimate confidence intervals for the PERR-ALT method because of difficulties in estimating the covariance between HR s and HR p [8]. We provide a direct approach based on the observed information matrix.
The paper is organized as follows. The definitions of prior and study event times, the hazard models, and assumptions required for using the pairwise Cox method (as well as PERR and PERR-ALT ) in the analysis of observational data are clarified in Section 2. In Section 3, we consider the nature of the bias in the original PERR method using asymptotic bias formulae. In Section 4, we propose a statistical test for detecting hidden confounding using prior event data. The pairwise Cox likelihood function is derived in Section 5 and formulae given for estimating standard errors. We also consider how the method could be applied in the context of crossover trials. Section 6 discusses situations in which the PERR method will be prone to bias. Extensions of the pairwise likelihood that permit more flexible modeling are considered in Section 7. The method is applied to data from a longitudinal cohort study of enzyme replacement therapy for lysosomal storage disorders in Section 8.

Definitions, assumptions, and hazard model
We start by introducing some notation and setting out a general framework for quasi-experimental analytic studies using a pairwise Cox approach. Here, random variables are denoted by upper case letters, and their values are denoted by lower case letters.
In what follows, we assume that there are two types of events of interest: the study event and the prior event. Usually, the study event is defined so that some individuals receive a treatment or exposure during or before the time of the study event (the exposed group) and some do not (the unexposed group). The prior event refers to an earlier event when no individuals are treated. However, these requirements are not compulsory. As illustrated later in an example related to treatment for diabetic retinopathy, there could be no order between the prior and study events, and patients can be treated during or before the time of the prior event. We note that the approach can easily be generalized to more than two events as considered in Section 7.2.
Designing studies based on the pairwise Cox likelihood method requires the user to define the events and time origins carefully to reflect the requirements of the research problem. First, the prior and study events need to be defined. Usually, the prior and study events should be of the same nature so that the unknown confounders will have the same effects on the hazards of prior and study events. For example, this may be a reasonable assumption in a study investigating the protective effects of estrogen replacement therapy on risk of myocardial infarction (MI) in post-menopausal women where previous MI is used as the prior event [8].
In order to define the event times used in the statistical analysis, the time origins when individuals first become at risk for the prior and study events need to be defined in the context of the research problem. This fixes the prior and study periods that provide the framework for the pairwise design. We define T * p for the variable of prior event time measured from the prior start point to a prior event and T + p to be the corresponding variable for the prior censoring time. The prior start point is defined as the time origin when the individual is considered to be at risk of experiencing a prior event. The variable for the observed prior time is T p = min with the variable for the prior censoring indicator Δ p = I is an indicator function such that Δ p = I and 0 otherwise. We call the time span that the individual is at the risk of a prior event (or being observed for a prior event) the prior period, which is measured from the prior start point to the observed time T p . The definition of prior period means that an individual is at risk of a prior event (or being observed for a prior event) until the prior event is observed . Similar definitions apply to the study start point, study period, and the variables ( for the study event, censoring and observed times, and censoring indicator, respectively. Note that it is generally not appropriate when using the pairwise and PERR methods to measure T s from the time of receiving treatment as this is often not the time when an individual starts to be under the risk of a study event (see later).
To illustrate the process of model formulation, we consider the example of assessing the effectiveness of influenza vaccination in older adults in the UK using electronic medical record data. Influenza is a major cause of illness and death amongst the elderly, and reliable estimates of vaccine effectiveness are important for informed vaccine policies and programs [9]. To apply the pairwise Cox method, we first need to define the prior and study events: If the focus is on more serious outcomes, then one possible event would be hospitalization for an influenza-related illness. We make the simplifying assumption that individuals are at risk of influenza during the winter season (from October 1 to March 31 each year) but not during the summer season (from April 1 to September 30 each year). Suppose data are available from the 2013/2014 and 2014/2015 influenza seasons. The prior and study time origins are October 1, 2013 and October 1, 2014, respectively ( Figure 2). Let X = (X 1 , … , X q ) tr be q-measured covariates in the study period, where tr is the notation of transpose and C be the column vector for hidden covariates (or confounders). Conditional on X and C, (T p , T s ) and are assumed to be independent. The independent censoring assumption is sensible and has been widely used in multiple events models such as Prentice, Williams, and Peterson [10] and Wei, Lin, and Weissfeld [11] (hereafter, referred to as PWP and WLW). For example, in the influenza vaccination study, if patients are not infected with flu from October 2013 to April 2015, the prior and study events are right censored with T + p = T + s = 6 months (i.e., the length of flu season). In this case, the censoring is similar to administrative censoring.
In the presence of death, if death is related to the prior or study events, a competing risk model will be needed [12]. In this paper, we focus on the case where death is not related to the prior or study events and can be regarded as right censoring. If the individual died after the prior event and before the study period, T si = 0 and Δ si = 0. We will show that this kind of data has no contribution to the pairwise Cox likelihood. Let . It is assumed that the hazard of a prior event at T * pi = t is as follows: where h 0p (t) is the unspecified baseline hazard for the prior event, is the vector of coefficients for C i , and I(t ⩽ T pi ) indicates whether t is within the prior period, that is, whether the individual is at the risk of a prior event (or being observed for a prior event) until the prior event is observed or a censorship occurs. Similarly, we assumed the hazard of a study event at T * si = t is as follows: where h 0s (t) is the unspecified baseline hazard for the study event, = ( 1 , … , q ) is the vector of coefficients for X i , and I(t ⩽ T si ) is the indicator for the study period.
In (1) and (2), C i is assumed to be time invariant. In practice, the prior and study events need to be carefully defined to satisfy this assumption. As we said, usually the prior and study events should be of the same nature. As a consequence, the pairwise Cox likelihood approach is only applicable to problems where the prior and study events are non-terminal events. In particular, the method cannot be applied when death is an outcome of interest.
We maintain that a careful clarification of research purpose is needed before one can choose a statistical model. Like the PWP and WLW models, the basic models (1)-(2) allow for the different events (prior and study) to have different baseline hazards (h 0p and h 0s ). The WLW model usually operates with a common time origin, while the basic models (1)-(2) allow the different event times to have different time origins. In some cases, the Andersen and Gill model is suitable if the events shared a common baseline hazard, while in other cases, it is reasonable to assume different baseline hazards for different events. For example, in the study of influenza vaccination, it is reasonable to allow different baseline hazards, h 0p (t) and h 0s (t), for the prior and study events because of the unknown changes in conditions between the two flu seasons (e.g., due to differences in the circulating strains of influenza). The basic models (1)-(2) can be used wherever the PWP, WLW, and the Andersen and Gill (AG) [13] models are applicable. For example, if the prior and study events are the first and second events, respectively, then T pi and T si are the times for the first and second events, respectively. In this case, by adopting a time-varying treatment indicator X i (t), the basic models (1)-(2) are the same as follows: • the PWP model, if T si is measured from the time point of the prior event; • the WLW model, if T si is measured from the same time origin as T pi ; and • the Andersen and Gill model, if T si is measured from the same time origin of T pi and we assume The definition of events and measurement of times are important when using the PERR/pairwise method. Returning to the estrogen replacement therapy example, it is not appropriate to define the prior event as the first MI (no matter whether the patient is treated or not) and the study event as the first MI after the treatment, and measure T s from i , the treatment time for the ith individual. Consider, for example, if the treatment has no effect ( = 0), a first MI after treatment at T si = t without a prior MI can be regarded as both the prior and study events and will consequently have two different hazards h 0p ( i + t) exp( c i ) ≠ h 0s (t) exp( c i ) (the common baseline hazard h 0s could not be the same as h 0p ( i + t), which depends on individual i). For this example, it is more appropriate to define the prior event as the first MI and the study event as the second MI. The time T s can be measured from the first MI (like the PWP model) or from the prior start point (like the AG and WLW models). The pairwise approach can be easily generalized to the cases where the third, fourth, or more MIs are of interest by using a time-varying treatment variable.
Another example is an observational study of the effectiveness of laser photocoagulation in delaying the onset of blindness in patients with diabetic retinopathy. A patient could potentially experience blindness in both eyes; we define the prior and study events to be left and right eye blindness, respectively. In this example, there is no order between the prior and study events, and the prior and study periods overlap. The right eye can be blind before the left eye. The patient starts to be at the risk of blindness for both eyes from the same time origin. A WLW-type model can be used: where X ip = 1 if the left eye was on treatment and 0 otherwise, X is = 1 if the right eye was on treatment and 0 otherwise, and p and s are the treatment effect on the left and right eyes, respectively. The process of observing the left eye blindness is stopped (i.e., . The likelihood function for this example is given in (16). For the influenza vaccination example, the basic models (1)-(2) can be extended as follows: where T pi and T si are respectively measured from October of 2013 and 2014 to the times of hospitalizations for an influenza-related illness or to April of 2014 and 2015 (if the times are censored). The time origins are defined as the start of the 2013 and 2014 influenza seasons because these are the points at which patients first become at risk for a prior or study event, respectively. We note that in this example, there is a gap in being under risk between the prior and study periods (during the summer period in 2014). In this type of problem, it is not appropriate to define the study time origin at the first event (like the PWP model) or at the start of the prior period (like the AG and WLW models). The definition of the time origins in this example requires the indicators of vaccinations X ip (t) and X is (t) to be time-varying: X ip (t) = 1 if the individual i was vaccinated at or before the time t pi = t and 0 otherwise; X is (t) = 1 if the individual i was vaccinated at or before the time t si = t and 0 otherwise. If there is a prior event at time t * pi , then T pi = t * pi . The indicator for the prior period I(t ⩽ T pi ) = 0 for t > t * pi reflecting the assumption that there is no risk of flu after a previous hospitalization related to influenza within the same flu season. This is because an individual contracting influenza develops antibodies to the circulating strain. If there is no prior event, the prior event time is censored, and the observed prior time T pi is equal to the length of the flu season, that is, 6 months. The indicator for the prior period I(t ⩽ T pi ) = 0 for t > 6 reflecting the assumption that there is no risk of flu after the flu season.

Bias in the prior event rate ratio method
Suppose the true hazards are (1) and (2). The PERR method fits the following models for the prior and study event times: respectively. The parameters * p = log(HR p ) and * s = log(HR s ) are the log-hazard ratios without adjustment for C i for prior and study events, respectively. The PERR-adjusted log-HR is given by log(HR PERR ) = PERR = * s − * p . The parameters * p and * s are estimated from the log partial likelihoods: respectively. However, the PERR adjustment cannot entirely remove the bias from the estimate of the treatment effect, that is, PERR ≠ . Lin et al. [14] studied the problem of hidden confounding in the Cox regression model and showed that due to the nonlinearity of the Cox model, the resulting bias is a complicated combination of three sources: (i) bias due to omission of hidden covariates, even if they are not confounders; (ii) bias due to censoring; and (iii) bias due to the hidden covariates being confounders. Here, we conducted a simple simulation study to illustrate the impact of these three sources of bias on the performance of the PERR method. We generated 100,000 paired times (t p , t s ) for prior and study events from the models (1) and (2), respectively, with h 0p (t) = = 1, h 0s (t) = exp(1), X ∼ Bin(1, 0.5) and the log HR of the hidden covariate, , taking 100 sequenced values from −10 to 10. The data were fitted by the models (3) and (4). Because the sample size is large (n = 100, 000), the bias of the unadjusted estimatê * s for the naive Cox model and the bias of the PERR estimatêP ERR were approximated bŷ * s − and̂ * s −̂ * p − , respectively.
We present bias curves in Figure 3 showing the relationship between the bias of the PERR estimate and the value of under three different scenarios to separate out the effect of the three potential sources of bias: Figure 3(a) shows the bias when there is a hidden balanced covariate C ∼ Bin(1, 0.5) in the absence of censoring; Figure 3(b) shows the bias when again C ∼ Bin(1, 0.5), but in addition, both t p and Figure 3. Bias curves for the prior event rate ratio (PERR) estimate PERR and the unadjusted estimate * of the naive Cox model in the presence of omitted covariates. Graphs are based on simulated dataset with n = 100, 000 and true log-hazard ratio of treatment = 1. An exponential baseline hazard was used with a rate parameter of h 0 (t) = 1. The censoring mechanism was uniform. t s are 50% censored; Figure 3(c) shows the bias when again t p and t s are 50% censored, but C is now a hidden confounder with distribution Bin(1, 0.3 + 0.4X) (i.e., the omitted covariate has the same marginal distribution as Figure 3(a) and (b), but the conditional distribution depends on X). The three figures show that PERR adjustment is biased even if the hidden covariate C is not a confounder. In Figure 3(a) and (b), it is hard to see the difference between the PERR method and the naive unadjusted Cox model because the bias curves for * s and PERR overlap closely for all values of : PERR adjustment fails to remove any of the bias attributable to censoring or to hidden covariates that are balanced across treatment groups. The figures show that the bias of PERR adjustment increases with the absolute value of the log-hazard ratio of the hidden covariate. The direction of the bias is always the same: Treatment effect estimates are attenuated. The PERR adjustment is only successful in removing the bias attributable to confounding, as shown in Figure 3(c). The unadjusted Cox model fails to remove the hidden confounding bias, and the direction of the bias depends on the sign of and the distribution of C conditional on X.
Although the models (3) and (4) are misspecified, the maximal likelihood estimateŝ * p and̂ * s converge to well-defined constants as n → ∞ [15]. The limiting values of̂ * p and̂ * s are sometimes called the 'least false' value [16]. For simplicity of notation, we use the same * p and * s to denote these limits. By calculating the limits of the score functions of the likelihood (5) and (6) as n → ∞ under the true models (1) and (2) and following the result of Lin et al. [14], it can be shown that the limits * p and * s are the solutions of the equations: where dt, E xc is the expectation with respect to X and C, and S + p (T p |X) and S + s (T s |X) are the survival functions of the prior and study censoring times conditional on X, respectively.
The first-order Taylor series approximation for the bias of PERR adjustment is as follows:

Use of prior event data to detect unmeasured confounding
A hypothesis test for detecting unmeasured confounding can be developed based on the first equation in (7). Suppose C⊥X so that C is an unmeasured balanced covariate but not a confounder, the first equation in (7) reduces to the following: Under the independent censoring assumption T * p ⊥T + p |(X, C) [11], the survival function for the prior censoring time can be written as S + p (t|X, C). Because X is the set of covariates associated with the study period, S + p (t|X, C) = S + p (t|C). With C⊥X, it follows: Therefore, (8) is further simplified to the following: which has the solution * p = 0. This provides the basis for the following procedure to test for the presence of an unmeasured confounder using the prior event data: (1) Suppose the null hypothesis is H 0 ∶ C⊥X so that C is an unmeasured balanced covariate but not a confounder.
(2) To assess (9), fit the model (3) to the data t p and x but with the censoring indicators reversed (i.e., by considering events as 'censored' observations and censored observations as 'events'). If the p-value for the estimatê * p is greater than 0.05, it indicates that (9) is deemed valid. (3) Fit the model (3) to t p and x again but without reversing the censoring indicators. Under the null hypothesis H 0 and with (9), the distribution of the Wald test statistic for̂ * p can be approximated by the standard normal distribution N(0, 1). (4) Calculate the p-value. Reject H 0 , if the p-value is less than 0.05.
Note that even if H 0 is not rejected, the pairwise method is still recommended because the unadjusted estimatê * s may still incorporate bias because of hidden balanced covariates and censoring, as illustrated in Figure 3

Pairwise Cox likelihood
For the hazards (1) and (2), we first write the baseline hazard ratio as follows: When evaluating effectiveness of treatments for many common chronic conditions such as cardiovascular disease, it is likely that h 0s (t) > h 0p (t) and so exp{ (t)} > 1 for all T pi = T si = t ⩾ 0. If we assume the baseline hazards h 0s (t) and h 0p (t) are proportional over time (analogous to the familiar Cox proportional hazards assumption), the log baseline hazard ratio is then equal to a constant (t) = .
The parameter can be regarded as a period effect. The hazards (1) and (2) where the hazards (10) and (11) now share the same term h 0pi (t) = h 0p (t) exp( C i ), which can be regarded as an individual-specific baseline hazard. For n individuals, the full likelihood is as follows: where H 0pi (t) = ∫ t 0 h 0pi (u)du is the cumulative hazard function. It is clear that we cannot obtain a consistent estimate of using (12), because the number of nuisance parameters h 0p1 (t), … , h 0pn (t) increases with the sample size n. We face the infinitely many nuisance parameters problem. The nuisance parameter h 0pi (t) needs to be removed.
Suppose that and are given, then as shown in Appendix A, the maximum likelihood estimators (MLEs) of h 0pi (t) and H 0pi (t) obtained from L full are as follows: where P i = I(T pi ⩽ T si ) and S i = I(T si ⩽ T pi ).
Plug the results (13) and (14) back into (12), and we get the pairwise Cox likelihood: The likelihood (15) is now free of the unknown term h 0pi (t) = h 0p (t)exp( C i ).
As shown in web Appendix A, if we present the hazard models (10)-(11) and the likelihood (15) in the mathematical framework of counting processes using the same techniques in the proofs of Gross and Huber [17], it can be shown that the estimateŝand̂from (15) are consistent and asymptotically normal. The covariance matrix can be estimated by the following: , and the confidence interval can be constructed based on a normal approximation. We can also show the consistency of the PERR-ALT adjustment in Yu et al. [6]. For the exposed group (X i = 1), the hazards are as follows: The pairwise likelihood is then Because (̂,̂) are consistent for ( , ), the HR estimate for the exposed group (as denoted byĤR E in Yu et al. [6])ĤR E = exp(+ ) is consistent for exp( + ).
For the unexposed group (x i = 0), the hazards are the following: and the paired likelihood is as follows: Similarly, it can be shown that that the HR estimate for the unexposed group (as denoted byĤR uE in Yu et al. [6])ĤR uE = exp(̂) is consistent for the baseline hazard ratio exp( ).
Therefore, the PERR-ALT estimateĤR PERR-ALT =ĤR E ∕ĤR uE is consistent for the true treatment effect exp( ).
Note that the proportional baseline hazards assumption, log{h 0s (t)∕h 0p (t)} = (t) = , is not compulsory for the pairwise Cox method. The pairwise Cox likelihood is flexible. We can, for example, assume the log-baseline hazard ratio is linearly associated with time and thus specify (t) = 0 + 1 t. The pairwise likelihood is then as follows: For the example of diabetic retinopathy, the likelihood function is as follows: Because there are no biological differences between the left and right eyes, we can assume a common baseline hazard (h 0p (t) = h 0s (t)) and a common treatment effect on both eyes ( p = s = ). Therefore, = log(h 0s (t)∕h 0p ) = 0, and the likelihood becomes where X i = X is − X ip . We note that only patients receiving treatment on one of the eyes (i.e., X i = 1 or −1) will contribute information to the likelihood (17). Figure 4 compares the biases of the unadjusted, PERR, and pairwise methods in the case of a time-varying baseline hazard. We generated the prior and study event times from a Weibull model with h 0p (t) = 2t, = 1, = −1, ∈ (−10, 10), X ∼ B(1, 0.5), C ∼ B(1, 0.3+0.4X), and both the prior and study data were 50% censored. The simulation results are similar to those in Figure 3(c) and show that the pairwise method is consistent. Table I gives some examples of the variances for the unadjusted, PERR, and pairwise estimates under different censoring proportions for the prior and study periods. We generated t p and t s from (1) and (2), respectively, with h 0p (t) = = = 1, h 0s (t) = exp(−0.5), X ∼ Bin(1, 0.5), and C|X ∼ Bin(1, 0.3 + 0.4X). We set censoring proportions for the prior and study periods as (10%, 50%, 90%). The average variance for each of the three methods and each value of the censoring proportion were calculated over 1000 simulation replications. For each replication, we set the number of events as 200 and the sample size as 200∕{1 − min(CP p , CP s )}, where CP p and CP s are the censoring proportions for prior and study periods, respectively. Table I shows that generally, the variances of the PERR and pairwise estimates are larger than those for the unadjusted estimates. The square root of the ratio between average variances shows how much the confidence interval of the pairwise method will be wider than that of the unadjusted method. It can been seen that the square root of variance ratio will generally increase with CP s − CP p .

Treatment of ties and left truncation
In the special case where the prior and study times are tied , we assume the equality occurs because the measurement of event times is not accurate enough. The exact likelihood of the tied pair is hence the sum of the likelihoods in (13) The sum is as follows: Therefore, the tied pairs make no contribution to the full likelihood and should be removed before using the pairwise Cox method. If the tied pairs are not removed, the partial likelihood in (15) will be the following: which is not identical to the exact likelihood, and thus, estimates of and will be biased Another issue that will be relevant in some studies is that of left truncation. For example, this can arise when the time at which a patient first becomes at risk of a prior event is unknown. For left truncation, there are two options. One is to redefine the events with different starting points to avoid the problem. For example, it may be reasonable to define the prior origin as the date of the patient's first medical record after reaching a particular age (or the date of diagnosis), even though the patient may have been at risk before that time. However, this option is not always feasible and may change the interpretation of parameters, potentially leading to violation of the assumptions of the hazard models. The other option is to impose a distributional assumption on the time of left truncation [12]. The difficulty is that it is necessary to have reliable information on how to specify the distribution. More details can be found in section 4.5 of the book by Cook and Lawless [12].

Applications in crossover trials
As Figure 3(a) and (b) shows, the unadjusted Cox model will give biased estimates of treatment effects, even in randomized trials, if needed covariates are omitted. As it is never possible to capture all component causes [18], this potential for bias is likely to be present to some degree in all randomized controlled trials. In particular, it can have practical consequences for effectiveness estimates in smaller trials.
The pairwise Cox method can be applied with the crossover trial design to remove the bias from hidden covariates. As we have shown, even if there are infinitely many background covariates and it is impossible to measure all of them, as long as their values and effects are constant in (1) and (2), the relevant terms will only rescale the individual-specific baseline hazard h 0pi (t) and will be eliminated by the pairwise partial likelihood (15). However, as shown in Table I, there is a trade-off between bias and variance when using the pairwise method with crossover trials.

Limitations of prior event rate ratio adjustment
As with other methods for addressing hidden confounding, the validity of PERR adjustment based on the pairwise Cox likelihood relies on assumptions about the underlying causal model linking treatment (exposure) to outcomes. To assist applied researchers in deciding when the PERR method is likely to be of use in practice, we now review the key assumptions of the method and consider robustness of the method to violations of these assumptions:
If the models are misspecified, for example, if there is an interaction between treatment and the unmeasured confounder(s) or the value and effect of the unmeasured confounder are time-varying, the pairwise method will be biased. Uddin et al. [19] studied the performance of the PERR estimator when the effect of the unmeasured confounder varies between periods and showed that the PERR method gives biased estimates of the rate ratio, but the bias is generally smaller than for conventional analyses. Figure 5 shows an example of the biases for the pairwise Cox likelihood method, the original PERR adjustment method, and the standard Cox regression model for different degrees of confounder-treatment interaction. The paired event times t p and t s were generated from (1) and h 0p (t) exp( X i + C i + + 1 X i C i ) with h 0p (t) = = = = 1, X ∼ B(1, 0.5), and C|X ∼ B(1, 0.3 + 0.4X). The sample size was 100,000, and there was no censorship. The bias was estimated for different values of the interaction effect 1 .
It can be seen that when the true hazard model contains an interaction, the bias curve for the pairwise Cox method has a similar sigmoid shape to the curves for the Cox model ( i.e.h * 0s (t)e * s X ) and the original PERR method. When the magnitude of the interaction effect is less than that of the main effect of treatment, that is, | 1 | < 1, the pairwise method reduces the bias compared with the Cox model. However, the figure illustrates that there may be regions, corresponding to more extreme degrees of model misspecification, where the Cox model performs no worse than or better than PERR adjustment.

Assumptions 2
Prior event occurrence should not influence likelihood of future treatment/exposure.
Gallagher et al. [20] and Uddin et al. [19] showed that the original PERR is biased when the likelihood of treatment is affected by the prior event occurrence. We assessed the performance of the pairwise method under this problem and found that it also gave biased estimates of treatment effects. An example of this bias is illustrated in Figure 6. The paired times t p and t s were generated from (1) and (2) with h 0p (t) = = = = 1. The sample size was 100, 000. The censoring mechanism was uniform, and both the prior and study event data were 50% censored. We generated   (1) and (2). An exponential baseline hazard was used with a rate parameter of 1. The censoring mechanism was uniform, and both the prior and study event data were 50% censored. The main effect of treatment was = 1, and the period effect was = 1.
C and X i were generated from (19) so that the treatment variable was associated with the unmeasured confounder and the prior censoring indicator pi .
so that the treatment variable was associated with the unmeasured confounder and the prior censoring indicator pi . Understanding the mechanism underlying the bias and developing possible solutions is an interesting open question in its own right.
Remarks on the problem of differential case fatality: Gallagher et al. [20] also showed that the original PERR method is biased in the case of differential case fatality (i.e., where high-risk patients are more likely to die before reaching exposure).
We conducted a simulation study to further explore the performance of the PERR and pairwise methods in the presence of differential case fatality (web Appendix B). The underlying bias of the PERR estimator (as shown in Section 3) was shown to change when case fatality was differential (both increases or decreases in bias were possible). However, we found that the pairwise method was unbiased in this situation. The reason is that the pairwise method compares the outcomes of the same patient before and after study and thus differential case fatality will not introduce bias. In contrast, for the PERR method, the distribution of the unmeasured confounder is changed when case fatality is differential. For example, suppose C ∼ Bin(1, 0.5) at the beginning of the prior period and the groups with c = 0 (low risk) and c = 1 (high risk) have differential case fatality of 10% and 50%, respectively, then the distribution of C for the subjects in the PERR analysis becomes Bin . As a result, differential case fatality changes the distribution of the unmeasured confounder and thus changes the bias of PERR. In fact, the bias of PERR in this case is the same as the bias in the absence of differential case fatality with C ∼ Bin ) . This change in the bias due to differential case fatality corresponds to the difference between the PERR biases for the two unmeasured confounding distributions (web Appendix B). We note that in the extreme case that all the subjects with c = 1 die before exposure and all the subjects with c = 0 survive to exposure, the PERR method (as well as the unadjusted Cox model) will be unbiased because all the subjects in the analysis have the same value of c = 0.

Time-varying covariates
The pairwise likelihood (15) can be extended to allow flexible modeling in more general situations. For example, in practice, we may have some covariates measured in the prior period, and their values and effects might change over time. Adopting a commonly used model for time-varying covariates, the hazards can be written as follows: where pi is the time span between the starting points of the prior and study periods for the ith individual. The models (20) and (21) assume that the hazards at time t pi = t si = t are only affected by the current value of the covariate and its effect at the time t pi = t si = t. Unbiased estimates of the model parameters can be obtained from the following: The simple hazard models (10) and (11) and the likelihood (15) can be regarded as the special cases of (20), (21), and (22), respectively, when Such a model could be used for considering the effects of treatments, such as statins, for which effects on outcomes or side effects may not become apparent until after an initial period on treatment. The corresponding hazards are as follows: where s is the parameter of interest. The likelihood is then A simulation study was conducted to compare the performance of the likelihood (25) with the unad- . We generated 100,000 paired event times (t pi , t si ) from (23) and (24) with h 0p (t) = s = 1, p = = −1, = (−10, 10), X p ∼ Bin(1, 0.5), X s ∼ Bin(1, 0.7 − 0.4X p ), and C ∼ Bin(1, 0.3 + 0.4X s ). The censoring mechanism was uniform, and both the prior and study event data were 50% censored. The biases of the pairwise method were calculated bŷ s − s , wherês was obtained from (25). The bias of the unadjusted hazard ratio waŝ * s − s . The results presented in Figure 7 show that the pairwise method eliminates the bias in the unadjusted model. The PERR method was not considered here because it is not applicable in this case.

More than two periods
The likelihood can also be extended to the case of more than two periods. Suppose the hazard of the ith individual of experiencing the kth event is as follows: where h 0 (t) is the baseline hazard of the first event, ik is the time span between the starting points of the first period and the kth period, and k is the log ratio of baseline hazards between the kth and the first events with 1 = 0.
in the presence of time-varying covariates and effects. Graphs are based on a simulated dataset with n = 100, 000 and log-hazard ratio of treatment s = 1 and p = −1. An exponential baseline hazard was used with a rate parameter of h 0 (t) = 1. The censoring rate is 50% for both prior and study periods.
For a sample of n individuals and the ith individual with m i events, the partial likelihood is as follows: In the case of ties, (26) is a Breslow approximation to the exact likelihood, which should work well if the ratio between the number of ties and m i is small. However, if this condition is not satisfied, for example, m i = 2, the approximation can be poor as shown in (18). In this situation, the exact likelihood or Efrons approximation would be suggested.

Background and model specification
We consider application of the pairwise Cox approach to data from a longitudinal cohort study of enzyme replacement therapy (ERT) for Fabrys disease. Fabry disease is a rare, inherited metabolic disorder resulting from absolute or partial deficiency of the enzyme -galactosidase A and the progressive accumulation of undigested macromolecules in cells throughout the body, particularly in the heart, kidneys, and nerve tissue [21]. There are two forms of enzyme replacement products approved for use in the USA and Europe: agalsidase alfa and agalsidase beta. Studying the comparative efficacy and safety of these two therapies is of current clinical relevance as there is a tendency for the drug of choice to vary between treatment centers, often for historical reasons [21]. The rarity and severity of this condition have resulted in a lack of adequately powered randomized trials for making such comparisons, and so researchers have instead had to rely on observational studies, with the attendant risk of bias from hidden confounding. We fitted the pairwise Cox model to estimate comparative effectiveness of the two forms of ERT using data from the National Collaborative Study for Lysosomal Storage Disorders, a longitudinal cohort study collecting both prospective and historical data [22]. Here, the focus is on illustrating usage of the methods rather than drawing firm conclusions about the treatments involved, and so we refer to the treatments simply as therapy A and therapy B. Our analyses necessarily involve simplification of the issues that would be involved in substantive analyses of the source data, and so the results we present should not be used to draw inferences for clinical practice.
Overall, a total of 211 adults with Fabry disease were being treated with ERT on recruitment to the National Collaborative Study for Lysosomal Storage Disorders study. For this illustrative example, the outcome of interest was estimated glomerular filtration rate (eGFR), a measure of kidney function, and we analyzed data for the 45 patients with at least two measures of eGFR, both before and after starting therapy. The data for the 45 patients are provided in web Appendix C and is available to download as a CSV file. Of these patients, 26 received therapy A, and 19 were on therapy B. Two potential confounding variables, age and gender, were available for this analysis. The patients on therapy A were younger and more likely to be female (mean age of starting therapy = 41.0 years; 35% female) than patients on therapy B (mean age of starting therapy = 48.1 years; 5% female). We were unable to consider other potential confounding variables, such as genotype and disease severity, because of a lack of consistency in the way that they were assessed and recorded. The aim of the pairwise Cox analysis was to account for these unmeasured differences. Given the modest sample size, the analysis can be seen as a pilot study to inform the design of a more extensive follow-up study.
To facilitate application of the pairwise Cox method, the multiple renal function measures were converted into time-to-event data by defining the prior and study events as the first times when eGFRs were greater than those in the first visits before treatment and the visits at or following treatment, respectively (in the absence of measurement error, increases in eGFR correspond to improvements in kidney function). According to these definitions of the prior and study events, the start of the prior period was the first visit recorded in the database before treatment, and the corresponding start of the study period was the visit at or following the time of initiating ERT therapy. The observed prior and study event times, T p and T s , were measured from the start of the prior and study periods to the prior and study events (or to the times of receiving therapies and the end of study), respectively. The prior and study periods are terminated at T p and T s , respectively. We note that defining the study origin as the visit at or following the point of starting ERT therapy was a consequence of the definition of a study event. This helped simplify the problem for the purposes of this illustrative example. In practice, it is inappropriate to set the time of receiving treatment as the study origin, if the time of receiving treatment is not the time when an individual starts to be under the risk of the defined study event.
Let C i be an unmeasured confounder, X i5 be the indicator of gender (0 = male; 1 = female), X i3 and X i4 be the ages in years at the beginning of the prior and study periods, respectively, and X i1 and X i2 be the treatment indicators (no treatment: X i1 = X i2 = 0; therapy A: X i1 = 1, X i2 = 0; therapy B: X i1 = 0, X i2 = 1). The hazard functions of the prior and study events for the ith patient are assumed to be as follows: where , 5 , 3 , 1 , and 2 are the coefficients and = log { h 0s (t)∕h 0p (t) } . Note that in the hazard models (27)-(29), age is a time-varying covariate measured in both the prior and study periods and taking the values X i3 + t and X i4 + t at the times of prior and study events, respectively. Because the prior and study events are of the same nature, that is, renal function increases, the coefficient 5 (or 3 ) is assumed to be the same in the hazards (27) and (28). Otherwise, we would need to specify different 5 (or 3 ), such as using 5p in (27) and 5s in (28). As all the patients received one of the two types of ERT therapy and no one was untreated, X i2 = 1 − X i1 , and we are unable to estimate 1 and 2 . But the difference between therapy A and therapy B, 1 − 2 , can be estimated.

Estimates under the pairwise, unadjusted Cox models and prior event rate ratio
The pairwise likelihood is as follows: The likelihood of the Cox model fitted to the study data without adjustment for C i is as follows: The likelihood of the Cox model fitted to the prior data without adjustment for C i is as follows: The estimates of e 1 − 2 under the pairwise model (30) and the unadjusted Cox models (31) and (32) are presented in Table II with 95% confidence intervals obtained from the observed information. The R code for the pairwise model (30) is provided in Appendix B. The PERR estimate is 1.25∕0.93 = 1.34, and the 95% confidence interval was obtained by the bootstrap method. The estimates for the pairwise, unadjusted and PERR methods are consistently above 1, but the differences are not significant at the 5% level. Use of the pairwise method has led to an increased point estimate of 1.55, suggesting there may be the potential to uncover genuine differences between therapies in a larger study.

Using the prior data to detect unmeasured confounding
To make use of the method in Section 4, we first test whether the assumptions (X 5 , X 3 )⊥X 1 and S + p (t|X 1 , X 5 , X 3 ) = S + p (t|X 5 , X 3 ) hold. We assessed these by fitting a logistic regression to X 1 on X 3 and X 5 , and a Cox model to t p on X 1 , X 3 , X 5 with the censoring indicator swapped. There was no significant evidence against either of these assumptions (the p-values for likelihood ratio tests are 0.14 and 0.88, respectively ). Under these assumptions, the p-value of the unadjusted Cox model (32) provides a test for hidden confounding. In this example, there was no significant evidence of hidden confounding (p = 0.86), perhaps due to lack of power to detect an effect in this small sample. However, as mentioned in Section 4, the unadjusted Cox model and the PERR will underestimate the treatment effect, even if the unmeasured covariate is not a confounder. The pairwise method is still recommended in the absence of significant evidence of hidden confounding.

Discussion
This paper is concerned with identifying and reducing the impact of hidden confounding in observational studies of treatment effectiveness. Building on the work of Tannen and colleagues, we set out a general framework for quasi-experimental analysis using the PERR approach and derived a flexible pairwise Cox likelihood function that can be used to estimate unbiased treatment estimates, under appropriate assumptions. The pairwise likelihood was used to demonstrate the consistency of the simple and convenient PERR-ALT estimator introduced by Yu et al. [6]. We showed how to estimate standard errors and confidence intervals for treatment effect estimates based on the pairwise model and provided R code to illustrate how to implement the method. A simple test to detect unmeasured confounding was also developed based on data from a prior period.
Much of the previous work on PERR adjustment has focused on analysis of electronic medical record databases. While the growth of large-scale data registries makes this a particularly important application, the approach also has potential value in population-based research studies. This was illustrated through an example in which the pairwise method was applied to a longitudinal cohort study combining prospective data collection with retrospective use of routine health records.
Designing future quasi-experimental studies that exploit the pairwise Cox method requires an understanding of the power of the method. We showed that the variance for the pairwise method is usually larger than that for a conventional Cox regression analysis. In the era of large electronic medical record databases, the additional variance is unlikely to be overly burdensome unless the focus is on narrow segments of the population.
One of the strengths of the pairwise Cox likelihood approach is its inherent flexibility. Like the conventional Cox model, the pairwise method can easily be extended to accommodate time-varying treatment covariates and coefficients. Furthermore, the form of (t), corresponding to the period effect, can be chosen by comparing models with different forms of (t), making it possible to relax the proportional baseline hazards assumption. In addition, we leave the user free to define the prior and study events for different research problems, without the need for the prior and study periods to be successive.
Several limitations of the PERR approach need to be considered by investigators conducting observational research. Yu et al. [6] highlight the fundamental requirement for prior events. Consequently, in its current form, the pairwise Cox likelihood cannot be used for evaluating treatment effects on mortality or other terminal events. The method also relies on strong assumptions about model specification including the absence of time-varying hidden confounders and confounder by treatment interaction. In practice, the method may be robust to realistic levels of departure from these assumptions [6], but as we have demonstrated in extreme cases, the method can be as biased, or even more biased, than conventional approaches.
An additional challenge to the validity of the method arises when treatment allocation is associated with prior events, such as if a patient is switched to a new treatment following failure of an existing treatment. Consistent with the findings of Gallagher et al. [20], our simulations highlighted that both pairwise and PERR methods are likely to be biased in this situation. These findings reinforce the need to consider the context of the problem carefully when assessing the suitability of the PERR approach. In its current form, the pairwise method may be most useful where the decision to start treatment is not likely to be related to the risk of future events, for example, in drug safety monitoring or evaluations of national vaccination programs.
A number of other biostatistical methods for tackling hidden confounding have been proposed. Common approaches that provide estimates of causal effects include propensity scoring combined with regression calibration [23], instrumental variables [24], and regression discontinuity designs [25]. In practice, no one method is likely to be best in all problems, and it is essential for investigators to carefully assess the potential biases in each proposed study, where possible tailoring the methods or combination of methods to address these biases [2]. Tannenet al. [5] provided preliminary evidence that the PERR approach can produce reliable results when replicating outcomes of cardiovascular trials, but further empirical studies are needed to establish the validity of the method for use in other clinical problems, and to determine the strengths and limitations of the PERR approach relative to other methods. Where possible, this should include proof of concept studies to replicate results of randomized trials as well as clinically informed simulation studies.
Like the instrumental variable method that requires a suitable instrument and propensity score calibration that needs a suitable validation study, the pairwise method can be only used when suitable data are available. For example, data on prior events will not be available in some problems, such as a primary prevention study of statin therapy for patients with no prior history of cardiovascular disease. However, the pairwise method is likely to become more accessible with the increasing availability of large electronic medical record databases, because such datasets can often provide the necessary volumes of longitudinal data (albeit of variable quality and completeness) before and after patients receive treatments.
In conclusion, the PERR and pairwise Cox methods offer a promising approach to addressing biases that can arise in observational studies because of lack of randomization and through further development could become a highly cost-effective way of using established datasets to answer questions about treatment effectiveness in clinical practice. The flexibility of the pairwise Cox likelihood offers a basis for generalizing the method, but widespread adoption of the approach will require further progress in addressing the challenges of dealing with prior events that influence treatment, terminal events, and the presence of time-varying confounding.

Appendix A:Estimate of h 0pi (⋅)
The log full likelihood is as follows: Using the argument similar to Johansen [26], we assume the hazard is 0, h 0pi (t) = 0, except at event times and the cumulative hazard is the summation of the hazards, that is, Then l full becomes Keeping and fixed, the MLEs of h pi and h si are the following: These together with (A.1) and (A.2) lead to the results (13) and (14).