Marginal hazard ratio estimates in joint frailty models for heart failure trials

Abstract This work is motivated by clinical trials in chronic heart failure disease, where treatment has effects both on morbidity (assessed as recurrent non‐fatal hospitalisations) and on mortality (assessed as cardiovascular death, CV death). Recently, a joint frailty proportional hazards model has been proposed for these kind of efficacy outcomes to account for a potential association between the risk rates for hospital admissions and CV death. However, more often clinical trial results are presented by treatment effect estimates that have been derived from marginal proportional hazards models, that is, a Cox model for mortality and an Andersen–Gill model for recurrent hospitalisations. We show how these marginal hazard ratios and their estimates depend on the association between the risk processes, when these are actually linked by shared or dependent frailty terms. First we derive the marginal hazard ratios as a function of time. Then, applying least false parameter theory, we show that the marginal hazard ratio estimate for the hospitalisation rate depends on study duration and on parameters of the underlying joint frailty model. In particular, we identify parameters, for example the treatment effect on mortality, that determine if the marginal hazard ratio estimate for hospitalisations is smaller, equal or larger than the conditional one. How this affects rejection probabilities is further investigated in simulation studies. Our findings can be used to interpret marginal hazard ratio estimates in heart failure trials and are illustrated by the results of the CHARM‐Preserved trial (where CHARM is the ‘Candesartan in Heart failure Assessment of Reduction in Mortality and morbidity’ programme).

considered, which precludes the occurrence of further recurrent events. Such kind of data particularly arise in clinical heart failure trials where treatment effects both on mortality (assessed as cardiovascular death, CV death) and morbidity (assessed as recurrent non-fatal hospitalisations) are to be evaluated (Zannad et al., 2013;Zanolla & Zardini, 2003). One example is the CHARM-Preserved trial (where CHARM is the 'Candesartan in Heart failure Assessment of Reduction in Mortality and morbidity' programme), a large multicentre trial in heart failure disease. Here treatment with the drug Candesartan did only affect the hospitalisation rate, but not mortality (Yusuf et al., 2003).
The CHARM-Preserved trial is only one of many heart failure trials, where a composite of heart failure hospitalisations (HFHs) and CV death was defined as the primary endpoint, with a time to first event analysis as primary analysis (Anker et al., 2016). This procedure is criticised for a long time, as only a part of the data is integrated in the analysis (Anker & McMurray, 2012). In order to capture the whole disease burden, alternative statistical approaches were suggested, for example to simultaneously consider death and hospitalisations as events in Poisson or negative binomial regression (Rogers et al., 2012;Rogers, Jhund, et al., 2014;. Hence, all available data are incorporated in the treatment effect estimate. However, every approach of aggregating all the available information into a single summary measure for the treatment effect does not differentiate between the treatment effects on mortality and morbidity (Ferreira-González et al., 2007). For this reason statistical models that allow to investigate both the effect on mortality and the effect on the hospitalisation rate, are of scientific interest.
Recently, a joint frailty regression model has been recommended to simultaneously evaluate treatment effects on HFHs and CV death whilst accounting for a potential association between the two processes (Rogers, Yaroshinsky, Pocock, Stokar, & Pogoda, 2016). Here a frailty term acts multiplicatively on both hazard rates, reflecting unexplained heterogeneity and inducing an association between the two event processes. In heart failure trials a positive association in the sense that, even conditional on covariates, a high hospitalisation risk comes along with a high mortality risk, seems plausible. Conditional on frailty, that is, on the subject's level, the joint frailty model implies a proportional hazards assumption for both risk processes (Huang & Liu, 2007;Liu, Wolfe, & Huang, 2004;Rondeau, Commenges, & Joly, 2003;Rondeau, Mathoulin-Pelissier, Jacqmin-Gadda, Brouste, & Soubeyran, 2007).
Even though the joint frailty model seems to reflect cardiovascular disease courses appropriately, in practice the effects on morbidity and mortality are still more often quantified by means of marginal proportional hazards models, that is, by the Cox model (Cox, 1972) for CV death and the Andersen-Gill model (Andersen & Gill, 1982) for the hospitalisation rate, respectively. Marginally (i.e. on the population's level), the proportional hazards assumption will be violated in the presence of a joint frailty term and treatment effect estimates relying on marginal proportional hazards models are expected to differ from those of conditional proportional hazards models (Aalen, Cook, & Røysland, 2015). Although results derived from joint frailty and marginal models have been contrasted for single case studies (Rogers, Jhund, et al., 2014;Rogers, Yaroshinsky, et al., 2016), there exists no systematic investigation of the differences in effect estimates. However, this is strongly needed not only for a proper interpretation of trial results but also for a careful planning of a clinical trial as power depends on effect size. For this reason, we investigate analytically the properties of marginal model estimates in the situation where the data-generating process actually corresponds to a joint frailty model.
In our derivations we use, that a maximum partial likelihood estimator (MPLE) in a misspecified proportional hazards model converges with sample size to a least false parameter, that is, a weighted average of the marginal hazard ratios over time (Struthers & Kalbfleisch, 1986). This result has already been used to investigate the consequences of non-proportional event-specific hazards in multistate models (Grambauer, Schumacher, & Beyersmann, 2010) and of covariate omission (Bretagnolle & Huber-Carol, 1988;Schmoor & Schumacher, 1997) or functional misspecification of covariates (Gerds & Schumacher, 2001) in the Cox model. Furthermore, Henderson and Oman (1999) and Cécilia-Joseph, Auvert, Broët, and Moreau (2015) used Struther's results to study the effects of non-consideration of unobserved heterogeneity in univariate survival analysis. However, the theory of least false parameters has not yet been applied to investigate the consequences of an association between the risk rates for recurrent and terminal events on marginal treatment effect estimates and rejection probabilities for the recurrent event outcome. This paper fills that gap: our findings explain the relation between marginal and joint frailty estimates, which is a relevant and as yet unanswered question in heart failure trials. Thus, they can contribute to the ongoing debate on defining an appropriate statistical model for recurrent hospitalisations and CV death (Anker et al., 2016).
The paper is organised as follows. In Section 2, we will introduce to the general framework of modelling recurrent events in the presence of a terminating event. The marginal hazard ratios as a function of time are derived in Section 3. In Section 4, we analytically derive results on the asymptotic marginal treatment effect estimates. In Section 5, these findings are related to clinical trials, for example the CHARM-Preserved trial in chronic heart failure. Simulation studies for the investigation of finite sample properties of the marginal analysis including rejection probabilities are given in Section 6. Finally, in Section 7 we conclude with a discussion where clinical trial results are interpreted in the light of our new results.
The hazards of the two counting process components are defined as the instantaneous rates of occurrence for a recurrent and for a terminal event, respectively, among individuals still alive and conditional on the process history: Let ( ) = ( ≥ ) be the at-risk-indicator. Then ( ) ( | ( )) ( = 1, 2) is the intensity of the respective counting process component.
The processes 1 and 2 are by definition dependent, as no recurrent events occur after death. But also beyond that, there might be an association between the event processes. To define that formally, consider the rate of the recurrent event process given the process history and the terminal event time = : * Then * 1 ( | ( ), = ) = 0 for ≥ . We consider the two processes as 'associated', if the rate * 1 ( | ( ), = ) does, for < , depend on . This means that a subject's time of death is informative for its prior recurrent event rate. The use of frailty variables, which represent risk factors that are not covered by the known covariates, has been proposed for modelling such an association of the event processes (Huang & Liu, 2007;Liu et al., 2004;Rondeau et al., 2003Rondeau et al., , 2007. We will first introduce a correlated frailty model, which is very flexible regarding the form of association. Thereafter we will consider a joint frailty model, which is more restrictive with respect to the kind of dependency between the processes. Instead of modelling the hazards ( | ( )) directly, frailty models are modelling conditional hazards. In general, we may think of positive random variables and , without making any assumptions on their joint distribution. The frailties and are assumed to be independent of the known covariates . The conditional hazards are defined as follows: The conditional hazards within the correlated frailty model are modelled as In the correlated frailty model 10 ( ), 20 ( ) and 1 , 2 are unspecified baseline hazard rates and regression coefficient vectors for the two event processes. The model contains a proportional hazards assumption on the subject's level (i.e. for the conditional hazards) and the censoring time is assumed to be independent. The association between the two event processes is solely captured by the dependency between the two frailty variables -meaning that the process history (except for covariates) is excluded from the conditional hazards and that 1 ( | , ) is a Poisson process. The processes 1 and 2 are associated on the marginal level, as shown in Appendix A.1.
Without pre-specification of the joint distribution of the frailties and , model (5) is not identifiable. For this reason, further assumptions, for example = or = with having a particular distribution were discussed (Huang & Liu, 2007;Liu et al., 2004;Rondeau et al., 2003Rondeau et al., , 2007. Conveniently, a gamma or a log-normal distribution with mean ( ) = 1 and variance Var( ) = is used for that purpose.
Below we will mainly focus on the case = .
In line with other authors (see Huang & Liu, 2007;Liu et al., 2004;Rondeau et al., 2003Rondeau et al., , 2007 we will refer to that model as joint frailty model: Both the frailty variance and the -parameter determine the degree of association between the two event processes. The frailty variance is additionally a measure for the correlation between subsequent recurrent event times.

MARGINAL HAZARDS IN THE CORRELATED FRAILT Y MODEL
As mentioned before, the hazard functions 1 ( | ( )) and 2 ( | ( )), which are defined in (2), are not modelled explicitly within the correlated frailty model. However, for a particular distribution of and , they are implicitly given by ) .
As we introduced 1 ( | , ) and 2 ( | , ) as the conditional hazards, we will call 1 ( | ) and 2 ( | ) the marginal hazards from now on. If the two event processes are associated according to the correlated frailty model (5), the marginal hazards are in general not proportional anymore, as shown below for both endpoints. Due to selection effects, a time-constant treatment effect on the subject's level will become time-dependent on the population's (marginal) level. In the following, we will briefly derive the marginal hazards and their ratios as a function of time. Thereby, we will focus on the situation of a two-arm randomised controlled trial (RCT). Hence, we consider to be a binary (1, ) distributed covariate indicating the treatment group. The results will afterwards be used to investigate how regression coefficient estimation is affected, if the estimation relies on (in general misspecified) marginal proportional hazards models.
For the derivation of the marginal hazards we will use the Laplace transform of the distribution of a random variable , which is defined as with being the probability density function of . The first derivative of  is given by If the two risk processes are associated according to the correlated frailty model (5), the marginal hazards are given by with conditional expectations being dissolved as Here Λ 20 ( ) = ∫ 0 20 ( ) d denotes the cumulative baseline hazard for mortality. Details of these derivations are given in Appendix A.2. In the special case of = the numerator in (12) simplifies to − ′ (exp( 2 )Λ 20 ( )) so that (12) and (13) coincide.
Using (10)-(13), the marginal hazard ratios for the two endpoints are given by Both marginal hazard ratios can be dissected into the product of the conditional hazard ratio and a deviation factor. Again, in the special case of = , the deviation factors in the hazard ratios of both endpoints coincide. The deviation factors of both endpoints depend on the frailty distributions, on the covariate-effect 2 on the terminal event rate and on the cumulative baseline terminal event rate Λ 20 ( ). As the latter is a function of time, the marginal hazard ratios are time-dependent in contrast to the time-constant conditional hazard ratios, It is worth mentioning that the deviation between the marginal and the conditional hazard ratio over time is independent of the regression coefficient 1 and the cumulative baseline hazard Λ 10 ( ) that describe the recurrent event process. This holds for both endpoints (recurrent and terminal).

ASYMPTOTICS OF MARGINAL HAZARD RATIO ESTIMATES
We will now derive asymptotic characteristics of marginal parameter estimates (log-hazard-ratio estimates) that erroneously rely on marginal proportional hazards models, when in fact the proportional hazards assumption holds on the conditional (subject's) level. Thus, instead of model (5), an Andersen-Gill model for the recurrent events and a Cox model for the terminal event is assumed. Both do not include the frailty variables and . As before, we consider a two-arm RCT setting with being binary. Under the Andersen-Gill and Cox modelling assumptions, regression coefficients are estimated by maximum partial likelihood estimators (MPLEs) (Andersen & Gill, 1982;Cox, 1972).
Let 1 < 2 < ⋯ < ≤ 0 = ∞ denote the recurrent event times, the right-censoring time, the time of death and the binary covariate for the th of subjects. The MPLEs relying on the marginal models are given bŷ ( ) is the set of subjects being at risk at time , that is, ( ) = { | ( ∧ ) ≥ }. In the Andersen-Gill estimator̂1, death is handled as independent censoring.
As shown by Struthers and Kalbfleisch (1986), the MPLE'ŝ1 and̂2 converge (in probability) to some least false parameters * 1 and * 2 , respectively. By applying Struther and Kalbfleisch's general results to our situation, the least false parameter * ( = 1, 2) can be identified as being the unique solution of the parameter integral equation ( ) = 0 with Here denotes the maximum of observation times. In addition,̄( ) ( ) denotes the probability that a randomly selected subject belongs to treatment group ∈ {0, 1} and is still at risk at time , that is, In the following, we apply (19) to investigate the asymptotic behaviour of marginal estimates for various settings. Thereby we will focus on the Andersen-Gill estimator̂1, as the properties of the Cox estimator̂2 in the presence of unexplained heterogeneity are already well investigated by other authors (see Cécilia-Joseph et al., 2015;Henderson & Oman, 1999;Schmoor & Schumacher, 1997).

No treatment effect on mortality ( = )
First of all, we will make some weak assumptions regarding censoring and baseline hazards: we will assume that the baseline hazards 10 ( ) and 20 ( ) are continuous functions which implies that the marginal hazard 1 ( | = ) and the marginal survival function ( | = ) are likewise continuous. By further assuming censoring to be continuous on (0, ), we obtain̄( ) ( ) to be continuous on (0, ). After Cauchy's mean value theorem there exists a * ∈ (0, ) so that As the integral is strictly positive, we obtain * 1 = log So the least false parameter * 1 is in the value range of the logarithmised marginal hazard ratio over the interval (0, ). If the treatment has no effect on mortality, that is, 2 = 0, the marginal hazard ratio is constant over time and coincides with the conditional hazard ratio exp( 1 ) as is evident from (14). Applying (22) shows that the Andersen-Gill estimator asymptotically coincides with the conditional treatment effect for recurrent events, that is, * 1 as the latter equation is true for all according to (14). This is illustrated in Figures 2-4 and in Table 3 for various settings in a joint gamma frailty model. But the result generally holds in every kind of correlated frailty model. Even though the marginal hazards 1 ( | = 1) and 1 ( | = 0) do not correspond to the their conditional counterparts 1 ( | = 1, ) and 1 ( | = 0, ), their respective ratios are equal. If treatment has no effect on mortality, the selection due to unexplained heterogeneity affects the treatment and the control group in the same way, resulting in a still proportional time course of the marginal hazards. Hence this is in fact the only situation, where the proportional hazards assumption is not violated on the marginal level. However, it should be underlined, that the processes are associated even if 2 = 0.

Joint frailty model
The least false parameter * 1 can be derived by numerically solving 1 ( ) = 0 (see formula (19)) after specifying the correlated frailty model. To characterise the marginal treatment effect estimate for recurrent events more in detail, we will consider the = , see formula (6)) from now on. We will show results for a gamma-distributed frailty with mean ( ) = 1 and variance Var( ) = . However, subsequent results do qualitatively not change if a log-normal distribution is adopted (results not shown). Furthermore, we will consider a 1:1 allocation ratio and administrative censoring after time units. The subject-specific hazards are assumed to originate from Weibull distributions. Thereby the conditional baseline hazard for the th endpoint (1 = recurrent, 2 = terminal) is given by 0 ( ) = −1 , with being the scale parameter and being the shape parameter. The latter determines, if the hazard is decreasing ( < 1), constant ( = 1) or increasing ( > 1) over time. Figure 1a shows the time course of the marginal recurrent event hazards for the situation where both processes are positively associated ( = 0.8), meaning that patients with a high mortality risk also have a high risk for recurrent events and vice versa. We further consider that treatment is protective with regard to both endpoints ( 1 < 0, 2 < 0) and that the conditional baseline hazards for both endpoints are constant over time. At the time point of randomisation ( = 0), treatment and control group do not differ with regard to the distribution of the frailty variables. Due to dropouts of frail patients, the marginal hazard is decreasing over time in both groups. But as selection effects differ between the groups ( 2 < 0), that decrease is non-proportional. Hence, the marginal hazard ratio corresponds to the conditional one only at time point 0 and is afterwards increasing over time, as shown as 'True' in Figure 1b. It is further demonstrated that the estimated marginal hazard ratio exp( * 1 ) at time point = is a weighted average of the marginal hazard ratios over the interval (0, ), shown as 'Estimate' in Figure 1b. For this reason, the marginal hazard ratio estimate depends on the length of follow-up.
In the following we focus on the evaluation of the estimate * 1 after a fixed follow-up period of = 2 time units. In each of Figures 2-4, the difference ( * 1 − 1 ) between the marginal treatment effect estimate (after two time units of follow-up) and the conditional treatment effect is shown in dependence of various parameters of the joint frailty model.
First, we will focus on situations where * 1 and 1 coincide: this is the case if either 2 = 0 (Figures 2b, 3 and 4) or = 0 (Figure 4b) or = 0 (Figures 3 and 4). It has already been shown analytically in Subsection 4.1 that * 1 = 1 , if treatment has no effect on mortality ( 2 = 0). If = 0, the joint frailty model reduces to a model with inter-individual heterogeneity in recurrent events but without an association between recurrent and terminal event rates. In this situation, the Andersen-Gill estimator is consistent for the regression parameter (i.e. * 1 = 1 ), as shown by other authors (Lin, Wei, Yang, & Ying, 2000;Liu, 2014). There is likewise no association if = 0 because then the submodel for recurrent events reduces to an Andersen-Gill model and of course 1 is estimated consistently by the Andersen-Gill estimator in that situation.
Next, we will take a closer look at parameters that do not affect the difference ( * 1 − 1 ): the scale parameter 1 of the conditional recurrent event hazard, which mainly determines how many recurrent events are expected within the trial time, has this property. This can easily be derived analytically, as 1 can be extracted as a factor in (19) and thus does not affect the solution of 1 ( ) = 0. The second parameter not influencing the difference ( * 1 − 1 ) is 1 itself, as illustrated in Figure 2 (however, an analytical proof for that result is lacking). Accordingly, results in Figures 3 and 4 do not depend on 1 , which is for that reason no longer specified there. ν 1 = 1,ν 2 =1 ν 1 = 2,ν 2 =0.5 F I G U R E 2 Asymptotic difference between the marginal treatment effect estimate * 1 and the conditional treatment effect 1 in a joint gamma frailty model with frailty variance = 1 and association parameter = 0.8. Note: Subject-specific hazards originate from Weibull distributions (scale parameters 1 = 3, 2 and shape parameters 1 , 2 ). The scale-parameter 2 is not shown here, but was in each scenario selected in such a way that the conditional survival probability at the end of follow-up is (2| = 0, = 1) = 0.7. Subjects are censored administratively after = 2 time units F I G U R E 4 Asymptotic difference between the marginal treatment effect estimate * 1 and the conditional treatment effect 1 in a joint gamma frailty model with frailty variance and association parameter . Note: Subject-specific hazards are constant (Weibull scale parameters 1 = 3 and T A B L E 1 Event numbers and (unadjusted) treatment effect estimates of the CHARM-Preserved trial according to  and Yusuf et al. (2003) Hazard ratio (95% CI) Marginal treatment effect estimates rely on the Cox model (CV death) or the Andersen-Gill model (heart failure hospitalisations). Conditional treatment effect estimates rely on a joint gamma frailty model. Now we focus on parameters that determine the absolute difference | * 1 − 1 | between the marginal treatment effect estimate and the conditional treatment effect. An increase in the frailty variance (Figures 3 and 4) and an increase in the absolute value of the association parameter | | (Figure 4) result in an increase of that absolute difference. In addition, increasing the mortality risk during follow-up leads to the same result ( Figure 3). As illustrated in Figure 2, the absolute difference further depends on the ratio 1 ∕ 2 between the shapes of the baseline hazards. The larger the 1 ∕ 2 ratio, the less recurrent events can occur before death and the more the marginal treatment effect estimate deviates from the conditional treatment effect.
In particular, the treatment effect on mortality 2 and the association parameter determine, if the marginal treatment effect estimate * 1 is smaller, equal or larger than the conditional treatment effect 1 , that is, they specify the sign of the difference ( * 1 − 1 ). This becomes apparent in Figure 4: if > 0, the difference is positive in case of 2 < 0 and negative in case of 2 > 0 ( Figure 4c). For the special situation of = 1, an analytical proof for that finding is given in Appendix A.3. The effects turn around for < 0, that is, the difference is negative in case of 2 < 0 and positive in case of 2 > 0 (Figure 4a). The latter scenario is however less reasonable for heart failure trials. So let us have a look at particular situations that might be most realistic in heart failure trials: here a positive association ( > 0) seems reasonable because patients with a high hospitalisation rate will probably die earlier than patients who are rarely in need for hospitalisation. In addition, most heart failure drugs either have a protective or negligible effect on mortality ( 2 ≤ 0). So in these settings, the marginal treatment effect estimate is larger than the conditional treatment effect ( * 1 ≥ 1 ). Hence, a conditional (joint frailty) analysis of the data will result in an estimate that favours treatment more compared to a marginal (Andersen-Gill) analysis. To illustrate that, we will consider the joint frailty setting that is shown in Figure 4c ( = 0.8, = 1) in greater detail: if 2 = −0.5, the difference of interest is given by ( * 1 − 1 ) = 0.045. Hence in case of 1 = −0.3 the marginal estimate is * 1 = −0.3 + 0.045 = −0.255. As already emphasised before, the parameter 1 itself does not affect the deviation of * 1 from 1 . Hence in case of 1 = −0.02, the marginal estimate is given by * 1 = −0.98 + 0.045 = 0.025. This example shows that there may exist situations, in which even the direction does not match between the marginal estimate and the conditional treatment effect. Generally said, it may happen that sign( * 1 ) ≠ sign( 1 ), corresponding to a misjudgement of the treatment effect direction for recurrent events.

APPLICATIONS
The 'Candesartan in Heart failure Assessment of Reduction in Mortality and morbidity' (CHARM) programme comprised three independent, randomised clinical trials that evaluated the benefit of Candesartan as a supplementary therapy in patients with chronic heart failure. The CHARM-Preserved trial focussed on heart failure patients with preserved ejection fraction (defined as >40% left ventricular ejection fraction). A total of 3,023 patients met the inclusion criteria and were randomised to receive either Candesartan or Placebo in addition to the recommended standard therapy. The median follow-up was 36.6 months. Regarding the primary endpoint of the trial, a composite of CV death and HFHs that was evaluated by a time to first event analysis, no treatment benefit could be shown Yusuf et al., 2003). Table 1 shows results on CHARM-Preserved that were reported in  and Yusuf et al. (2003). As apparent, CV death did not differ between the two groups. But patients in the Candesartan group had fewer HFHs compared to the Placebo group, which is reflected by a HFH hazard ratio of 0.71 in a marginal analysis (Andersen-Gill model) and of 0.69 in a conditional analysis with the joint gamma frailty model. We used our findings from Section 4 to derive the expected marginal hazard ratio estimates, thereby varying the frailty variance, and compare these with the observed estimates. For that, formula (19) is applied with parameters, which match the situation in CHARM-Preserved (see Table 2). Thereby we assumed an association parameter of = 0.8, as published joint frailty results from other heart failure trials (CHARM-Added, T A B L E 2 Expected marginal hazard ratio estimates in joint gamma frailty models that reflect the situation of the CHARM-Preserved trial (using constant baseline hazards with rates that match the observed event numbers and administrative censoring after = 3.05 years follow-up) CHARM-Alternative) suggest that heterogeneity might be greater with respect to HFHs than for mortality (Rogers et al., 2016). We can conclude that our results can well explain that there are only slight differences between marginal and conditional estimates in CHARM-Preserved: in the presence of a negligible treatment effect on the terminal event (CV death), the hazard ratio estimates for recurrent events (HFHs) from marginal and conditional models nearly coincide (see 2 = 0 from Subsection 4.1). As in CHARM-Preserved, also in many other heart failure trials the randomised treatment groups differ only slightly in their risk for CV death (Zheng et al., 2018). Therefore, marginal and conditional hazard ratio estimates are supposed to coincide in these trials. HFHs and CV death are also the outcomes of interest in type 2 diabetes studies. Opposite to heart failure trials, substantial treatment effects for CV death were shown here (Schnell, Rydén, Standl, & Ceriello, on behalf of the D&CVD EASD Study Group, 2017). For example, we consider the EMPA-REG trial (Zinman et al., 2015) that reported marginal hazard ratios of 0.65 for HFHs and 0.62 for CV death with a median observation time of 3.1 years. Note that the reported marginal HFH hazard ratio estimate is derived from a Cox analysis (time to first HFH) only and may therefore not exactly match to the (Andersen-Gill) estimate investigated in this paper. Following our findings, in this trial the conditional hazard ratio estimate for HFHs ought to be smaller than the marginal one, that is, exp( 1 ) < exp( * 1 ) = 0.65, if the rates for HFHs and CV death are positively associated ( > 0, > 0, see Subsection 4.2). However, as conditional estimates are not published, an empirical verification cannot be provided at this point.

SIMULATION STUDY
As differences between marginal and conditional hazard ratio estimates may also affect the rejection probability of the corresponding two-sided statistical Z-test (based on the Andersen-Gill estimate in combination with its robust standard error) for evaluation of the null hypothesis 0 ∶ 1 = 0, we performed simulations to investigate this effect. Data were simulated according to a joint gamma frailty model with association parameter = 0.8. We consider administrative censoring after = 2 time units. As before, the conditional baseline hazards for the two processes (1 = recurrent, 2 = terminal) originate from Weibull distributions with scale parameters 1 , 2 and shape parameters 1 , 2 . In all scenarios, the cumulative baseline hazard for recurrent events at the end of follow-up is Λ 10 (2) = 6, corresponding to six expected recurrent events during follow-up if no terminal event was present. This is achieved both in a setting with a constant baseline hazard ( 1 = 3, 1 = 1, see Figure 5 and Table 3) and a setting with a decreasing baseline hazard ( 1 = 4.24, 1 = 0.5, see Table 3) for recurrent events. Furthermore, we always consider a constant baseline mortality hazard ( 2 = 0.178, 2 = 1), corresponding to a probability of (2| = 0, = 1) = 0.7 to survive the follow-up.
In all scenarios shown in Figure 5 and in half of the scenarios shown in Table 3, no treatment effect on the recurrent event rate is present (i.e. 1 = 0). Here the rejection probability of the test corresponds to its type I error probability. The test is exact if 2 = 0 and anticonservative if 2 ≠ 0. In the latter case, the rejection probability under 0 increases with sample size (Figure 5) because the Andersen-Gill estimator̂1 converges in probability to * 1 ≠ 1 = 0, while its robust standard error (̂1) converges in probability to 0. Besides 2 , the rejection probability under 0 is of course also affected by all the additional joint frailty parameters that determine how much * 1 deviates from 1 = 0 (see Subsection 4.2). For example, it increases by increasing the frailty variance ( Figure 5, Table 3) and by switching from a constant to a decreasing baseline hazard for recurrent events (Table 3). Table 3 further shows scenarios with an existing treatment effect on the recurrent event rate ( 1 = −0.5). Here the rejection probability of the test corresponds to its power. In these selected scenarios, the null hypothesis is rejected almost for certain. The reason for this is that both baseline recurrent event rate and treatment effect were chosen to be relatively large. However, power would be rather poor in scenarios where * 1 = 0 but 1 ≠ 0. Furthermore, the simulation results of Table 3 confirm the effects that were already observed in the numerical findings shown in Figures 2-4: the difference (̂1 − 1 ) is zero in case of 2 = 0, while being positive for 2 < 0 and negative for 2 > 0. These directions of deviation are caused by the positive association of = 0.8 (compare with Figure 4c). In addition, the magnitude of the difference (̂1 − 1 ) is not affected by the treatment effect 1 itself (compare with Figure 2), but is increasing with the frailty variance (compare with Figures 3 and 4). The difference is smaller in the scenario with a decreasing subject-specific recurrent event hazard because here a subject has on average more recurrent events before death may terminate its follow-up compared to the scenario with a constant recurrent event hazard (compare with Figure 2a).
Although this paper is primarily dealing with the recurrent event endpoint, Table 3 also shows estimates and rejection probabilities of a marginal analysis for the terminal event (using the Cox model). Of course, the difference (̂2 − 2 ) is not affected by parameters that solely refer to recurrent events, for example the treatment effect 1 and the shape of the hazard for recurrent events. But the difference is increasing with the frailty variance and strongly depends on 2 : althougĥ2 = 2 in case of 2 = 0,̂2 is always shifted towards zero (relative to 2 ) in case of 2 ≠ 0. Importantly, the Cox analysis is keeping the type I error despite unobserved heterogeneity. These results are in line with those of other authors who already investigated the behaviour of Cox model estimation in the presence of unobserved heterogeneity (see Cécilia-Joseph et al., 2015;Henderson & Oman, 1999;Schmoor & Schumacher, 1997).
In each scenario shown in Table 3, we see slight differences between̂1 (or̂2), that is, the result obtained by simulation, and * 1 (or * 2 ), that is, the result obtained by numerically solving 1 ( ) = 0 (or 2 ( ) = 0). These differences may be due to asymptotics and/or slight numerical imprecision.

DISCUSSION
In the present paper, we derive the properties of marginal hazard ratio estimates for a recurrent event in the presence of an associated terminal event. Our findings are answering the up to now unresolved question if, why and how terminal events will affect marginal hazard ratio estimates for recurrent events. These results will contribute to a proper interpretation of marginal hazard ratios that are commonly presented from clinical trials. An important field of applications are clinical studies in heart failure, which have motivated our research: although the rate of hospitalisations due to heart failure disease will most probably be associated with the risk of CV death, the treatment effect on the hospitalisation rate is often quantified by the Andersen-Gill model -an approach that is assuming proportional hazards on the marginal level.
We have derived the least false parameter * 1 that is consistently estimated by the Andersen-Gill estimator in a correlated frailty model by applying theory on misspecified proportional hazards models (Struthers & Kalbfleisch, 1986). Within the correlated T A B L E 3 Simulation results (10,000 simulated datasets, each with 1,000 subjects) for a marginal analysis of data from a joint gamma frailty model with association parameter = 0.8, frailty variance and treatment effects 1 (recurrent events) and 2 (mortality) Subject-specific hazards originate from Weibull distributions with scale parameters 1 , 2 and shape parameters 1 , 2 . We consider constant subject-specific mortality hazards ( 2 = 0.178, 2 = 1) in combination both with (a) constant ( 1 = 3, 1 = 1) and (b) decreasing ( 1 = 4.24, 1 = 0.5) subject-specific recurrent event hazards. Subjects are censored administratively after = 2 time units. The table shows the estimates (̂), the robust standard errors ( (̂)) and rejection probabilities (rp) resulting from the simulation of a marginal model analysis. In addition, the asymptotically valid least false parameters ( * , numerical calculation) are shown. frailty model, an association between the recurrent and the terminal event process is induced by correlated random effects and . It implies the assumption of proportional hazards with treatment effects 1 (recurrent events) and 2 (terminal event) on the subject's level. Moreover, it assumes the subject-specific counting process for the recurrent events to be of Poisson type. On the marginal level, the proportional hazards assumption does in general not hold anymore, leading to a misspecification of the Andersen-Gill model. We have shown that the least false parameter is implicitly defined by the true underlying datagenerating process, that is, it depends on parameters of the correlated frailty model (treatment effects, baseline hazards, frailty distributions). In addition, the asymptotic estimate strongly depends on the censoring distribution including especially the length of follow-up. Using these derivations we proved that the Andersen-Gill estimator asymptotically coincides with the conditional (i.e. subject-specific) treatment effect, if the treatment does not affect mortality. Further results were obtained numerically for the case of a joint frailty model, where the frailties affecting the two processes have a deterministic dependency: here we have shown that the sign of the difference ( * 1 − 1 ) is, amongst others, determined by the treatment effect on mortality 2 . Moreover, our results suggest that the size of the difference ( * 1 − 1 ) is not affected by the treatment effect 1 itself. In particular, a misjudgement of the treatment effect direction on recurrent events is possible, that is, sign( * 1 ) ≠ sign( 1 ). In this paper, we focussed on the gamma frailty as it is most widely applied (Wienke, 2010). Equal conclusions are obtained when a lognormal frailty is applied. To investigate other frailty distributions, further numerical investigations or simulation studies could be performed. Some authors use the term 'bias' for describing deviations of marginal model estimates from conditional ones. As both approaches have different purposes depending on the aim of the study, differences between marginal and conditional model estimates are not a matter of bias. Marginal estimates provide an information about the treatment effect on the population's level, whereas conditional models, which account for the association by using frailty terms, are intended for the assessment of a treatment effect on the subject's level.

Simulation parameters Recurrent events
As pointed out by Bretagnolle and Huber-Carol (1988), Schmoor and Schumacher (1997) and Aalen et al. (2015), omission of relevant covariates in a usual Cox analysis for a survival outcome leads to non-causal treatment effect estimates despite randomisation. In the present paper, we transfer these concepts to the analysis of recurrent events and show that the same issue arises in the assessment of a treatment effect on recurrent events if an associated terminal event is present. If the treatment affects mortality, a constant conditional hazard ratio (i.e. on the subject's level) translates to a time-dependent marginal hazard ratio (i.e. on the population's level), resulting in a time-dependent marginal hazard ratio estimate exp( * 1 ). However, if treatment does not affect mortality, unobserved risk factors that affect both endpoints, remain balanced over time between the survivors in the treatment and the control group. Hence, the marginal treatment effect estimate coincides with the conditional treatment effect despite an association between the processes.
Recently, Rogers et al. (2016) recommended to evaluate future heart failure trials by using a joint frailty model. The authors performed simulations to study the behaviour of different marginal treatment effect estimates for the recurrent event endpoint, when informative dropouts according to a joint frailty model occur. Our findings extend these simulation results by providing numerical solutions for the Andersen-Gill estimates. Our analytical results are supported by clinical data of the CHARM-Preserved heart failure trial. Here the Andersen-Gill estimate does only slightly differ from the joint frailty estimate for the recurrent events because mortality seems to be unaffected by the Candesartan treatment Yusuf et al., 2003). Nowadays, lots of heart failure drug trials fail to show a treatment effect on mortality (Zheng et al., 2018), suggesting that marginal and conditional treatment effect estimates for the hospitalisation rate would coincide -provided the assumptions of a joint frailty model hold. Our results can also be applied for diabetes trials where cardiovascular outcomes such as HFHs and CV death are also of interest (Schnell et al., 2017). Furthermore, the findings presented here are relevant for the analysis of safety data in randomised clinical trials, where the treatment effect on the rate of recurrent, adverse events is of interest (Allignol, Beyersmann, & Schmoor, 2016;Hengelbrock, Gillhaus, Kloss, & Leverkus, 2016;Schmoor, Bender, Beyersmann, Kieser, & Schumacher, 2016). In that context, treatment discontinuation (e.g. due to death) represents the terminal event that is probably associated with the risk of adverse events.
In a recent article, Putter and van Houwelingen critically discussed the use of frailties in multistate models to account for possible associations between transition times. They have pointed out that deviations from the modelling assumptions can be absorbed by the frailty term in terms of estimating an increased frailty variance (Putter & van Houwelingen, 2015). Consequently, it can be difficult to disentangle (true) heterogeneity and violations of the modelling assumptions when applying the joint frailty model. Besides assuming proportional hazards on the subject's level, a further crucial assumption of this model is that the subject-specific hazards for both endpoints only depend on time, but not on the history of recurrent events. This is questionable and the model could be improved by allowing for event dependency on the subject's level. However, such complex models will result in identifiability problems, as it is difficult, if not impossible, to distinguish between event dependency and unexplained heterogeneity in the data.
As aforementioned, several potential pitfalls have to be considered when treatment effects are assessed by more and more complicated models. Therefore, simpler marginal models may be preferred and we need to be able to interpret their estimates, if the underlying data-generating processes are rather complex. We contribute to the interpretation of the Andersen-Gill estimate for the treatment effect on the recurrent event rate, if informative dropouts due to an associated terminal event are present. Our results complement previous articles reporting on treatment effect estimation in a misspecified Andersen-Gill model: Lin et al. (2000) and Liu (2014) showed that marginal and conditional treatment effects coincide, if unexplained heterogeneity, but no associated terminal event is present. Cheung, Xu, Tan, Cutts, and Milligan (2010) analysed asymptotic properties of the Andersen-Gill estimator in a model with unexplained heterogeneity and event dependency, again without considering an associated terminal event. To complete the picture, a straightforward next step would be to characterise the estimator's properties if the data-generating process contains both informative dropouts and event dependency, probably reflecting the most realistic scenario in heart failure trials.

SUPPORTING INFORMATION
Additional supporting information including source code to reproduce the results may be found online in the Supporting Information section at the end of the article. As and are dependent (via ), the conditional expectation ( | , = ) depends on . Consequently the rate * 1 ( | ( ), = ) does, for < , depend on . This is considered as an association of the processes throughout the paper.

A.3 Direction of deviation for the marginal treatment effect estimate in a joint gamma frailty model with =
Here we provide a proof regarding the sign of the difference ( * 1 − 1 ) for the special situation of a joint gamma frailty model with association parameter = 1. As shown by other authors (see, e.g. Wienke, 2010), the Laplace transform of a gamma distribution (with mean 1 and variance ) and its derivative are given by respectively. Using (14) for the special case of = , the marginal hazard ratio for the recurrent event endpoint simplifies to 1 ( | = 1) 1 ( | = 0) = exp( 1 ) 1 + Λ 20 ( ) 1 + exp( 2 )Λ 20 ( ) The sign of the treatment effect on mortality determines, in which direction the marginal hazard ratio deviates from the conditional one for every time-point during follow-up. Applying (22) results in