Multilevel joint frailty model for hierarchically clustered binary and survival data

Hierarchical data arise when observations are clustered into groups. Multilevel models are practically useful in these settings, but these models are elusive in the context of hierarchical data with mixed multivariate outcomes. In this article, we consider binary and survival outcomes and assume the hierarchical structure is induced by clustering of both outcomes within patients and clustering of patients within hospitals which frequently occur in multicenter studies. We introduce a multilevel joint frailty model that analyzes the outcomes simultaneously to jointly estimate their regression parameters and explicitly model within‐patient correlation between the outcomes and within‐hospital correlation separately for each outcome. Estimation is facilitated by a computationally efficient residual maximum likelihood method that further predicts cluster‐specific frailties for both outcomes and circumvents the formidable challenges induced by multidimensional integration that complicates the underlying likelihood. The performance of the model and estimation procedure is investigated via extensive simulation studies. The practical utility of the model is illustrated through simultaneous modeling of disease‐free survival and binary endpoint of platelet recovery in a multicenter allogeneic bone marrow transplantation dataset that motivates this study.


INTRODUCTION
In medical research, it is a common practice to analyze two or more outcomes to examine the relative efficacy of an experimental drug, identify important risk factors or investigate the surrogacy of one endpoint for the other. In many applications, the primary endpoint is a time-to-event outcome, such as overall survival, disease-free survival, and progression-free survival. Secondary endpoints of interest often include a biomarker response that is measured to examine a specific biologic activity in relation to survival. We consider the case where the biomarker response is a binary endpoint. For example, in prostate cancer clinical trials, measurements of prostate-specific antigen (PSA) are frequently taken, and the binary endpoint can be whether a patient achieves a PSA decline greater than 30% as compared to baseline. 1 Similarly, it could be a binary measure of tumor response, for example, define by a percentage shrinkage in tumor size. 2 Burzykowski and Molenberghs 3 developed a method for validating binary tumor endpoint as a surrogate for survival in advanced colorectal cancer trial of fluoropyrimidine-based therapies. Other examples include binary measures of disability status in physical functioning studies defined by activities of daily living or the global activity limitation indicator. 4,5 This present research is motivated by features of the bone marrow transplantation data detailed in Copelan et al. 6 The data come from a multicenter study in which patients received allogeneic bone marrow transplantation for their leukemia. In posttransplantation, depressed platelet count is an immediate event, and recovery to their normal levels may occur for some patients at which time their prognosis for treatment related relapse or death may change. 7 The study enrolled 137 patients and transplants were conducted at four hospitals. The data provide measurements of time-to-relapse or death (ie, disease-free survival), binary indicator of platelet recovery (0-platelets never returned to normal, 1-platelets returned to normal) and some potentially important risk factors measured at the time of transplantation. Lai and Yau 8 analyzed this set of data to illustrate the methodology of mixture cure modeling of clustered survival data. In allogeneic transplantation studies, for example, as in the work of Ramirez et al 9 and Akahoshi et al 10 time-to-event endpoint and platelet recovery constitute two important outcome measures and it is frequently of scientific interest to identify the associated risk factors and explore their relations. In the light of this, the bone marrow transplantation data can reasonably be regarded as multilevel data, taking the observations of disease-free survival and platelet recovery as lower level units clustered within patients and patients, on the other hand, as higher level units clustered within hospitals. The data are actually correlated because each patient contributes two outcomes, and patients in the same hospital share the same environmental effect. Neglecting the multilevel structure or correlation may conceal important information in the data and potentially lead to wrong inferences. This, however, presents a formidable challenge in analyzing the data as to our knowledge there are no multilevel models in the literature that simultaneously analyze survival and binary endpoint. The existing methodological studies on multilevel modeling are predominantly customized to a single outcome measure, that is, either survival, continuous or categorical outcome, for example, as in Ng et al, 11 Austin and Merlo, 12 and Tawiah et al. 13 Joint modeling remains the centerpiece to the simultaneous modeling of outcomes to account for their correlation. This modeling approach has received a considerable attention in developmental toxicology experiments of laboratory animals clustered within litters and various variants of the model have been proposed therein for outcomes that are categorical (or binary) and continuous, termed as mixed responses. 14 Fitzmaurice and Laird 15 introduced a marginal modeling approach that treats the correlation as a nuisance characteristic of the data. In the presence of clustering, the number of nuisance parameters that need to be estimated proliferates rapidly with increasing cluster size and a solution to the underlying likelihood equation quickly becomes intractable. Prentice and Zhao 16 applied the generalized estimating equations for covariance matrix of mixed responses and used a class of quadratic exponential models to develop joint estimating equations for the mean and covariance parameters. Catalano and Ryan 17 presented a latent variable approach on the assumption that both outcomes have a common bivariate normal distribution. On this assumption, the categorical outcome is conceived to be a discretization of an unobserved continuous variable. Their model includes standard random effects model for continuous outcome and a correlated probit model for discrete outcome as special cases and it provides quasi-likelihood techniques for estimation of parameters. A similar approach is provided in Teixeira-Pinto and Normand 18 and a Bayesian latent variable method is considered by Dunson. 19 The latent variable approach has long history in the social sciences as a statistical tool for characterizing the relationship between categorical variables, through factor analytic methods. 20 A similar latent variable modeling approach has been used in a thresholding modeling technique to formalize the relationship between longitudinal mixed responses within the mixture modeling methodology and estimation is handled in a Bayesian way by the MCMC method. 21 The model is applied to identify typical patterns of temporal evolution of several factors that are related to poverty and material deprivation, measured by continuous, binary and ordinal variables. In the pattern recognition literature, Ng et al 22 developed an unsupervised mixture regression model of multivariate generalized Bernoulli distributions for cluster analysis of categorical outcome features with mixed risk features. Lin et al 14 proposed a bivariate random effect joint modeling approach with maximum likelihood (ML) estimation method. The model encompasses the logistic and linear regression as submodels. Wu and de Leon 23 developed a copula-based random effect approach. These studies neither considered joint modeling of binary and time-to-event outcomes nor multilevel structure modeling.
The development and application of joint models also appear in survival analysis. One such approach is introduced by Tawiah et al 24 and Ng et al 25 using a multivariate frailty (or random effect) to adjust for dependent censoring in the sense of modeling the association between longitudinal recurrent event gap times and a terminal event time. Other authors proposed joint models for concurrent modeling of progression-free survival and overall survival in oncology clinical trials. 26,27 In longitudinal studies, the concept of joint modeling has evolved considerably to jointly analyze a survival endpoint and repeated measures of a biomarker process such as CD4-lymphocyte counts in HIV/AIDS clinical trials and antibody measurements in cancer vaccine trials. 28,29 These approaches to joint modeling predominantly apply the proportional hazards (PH) model for the survival endpoint and incorporate the longitudinal biomarker process as a time-dependent continuous covariate with possibly missing data and measurement error, where the covariate is adjusted for in the PH model and modeled by repeated measures mixed models. Bartolucci and Farcomeni 30 introduced  a common random effect with distribution following a continuous-time hidden Markov chain to jointly model the longitudinal and survival process, whereby parameters are estimated by maximum likelihood. Zhou et al 31 used a Bayesian  approach with MCMC sampling for estimation, allowing dynamic covariate process. Only a few studies have developed methodologies for simultaneous modeling of survival and binary outcomes. Among these, Inoue et al 2 used the mixture modeling concept, and more recently, Chen and Wang 32 considered a joint modeling approach via the use of random effects and developed a maximum penalized likelihood method for estimation. This study considered a joint model with a one-level multivariate random effect incorporated to correlate the outcomes of different individuals within a cluster. In contrast, our present paper deals with hierarchically clustered binary and survival data, and proposes a multilevel joint frailty modeling approach that admits nested random effects at the levels of clustering in the data, using the bone marrow transplantation data as a motivating example. At patient level, we reasonably suspect within-subject correlation between the binary and the survival outcome and so we incorporate a multivariate random effect with a specified correlation structure. At hospital level, within-hospital correlation may be present separately for each outcome. In consequence, we introduce two independent random hospital effects to characterize the correlation separately for the binary and survival outcome. The model adjusts for covariate effects through the logistic and the PH intensity function. The multilevel random effect structure increases model complexity and difficulty in estimation and computation. Particularly, specifying the multivariate random effect at patient level slows down computation considerably and leads to computational memory problems. To avoid these problems and ease difficulties, we develop a computationally efficient residual maximum likelihood (REML) method for estimation and inference. The REML method arises from the context of the generalized linear mixed model (GLMM) methodology. 33,34 It allows the fixed effects and the random effects to be estimated alike by maximizing a best linear unbiased predictor (BLUP)-type log-likelihood, and extending these estimates to compute the variance component estimates via a system of estimating equations formed from the first-order derivative of the REML log-likelihood.
The article is organized as follows. In Section 2, we present the proposed multilevel joint frailty model and the REML estimation procedure. In Section 3, simulation studies are presented in a small sample setting to validate the model and the estimation procedure. Section 4 examines the practical relevance of the multilevel joint model and the REML estimation procedure through real numerical application using the multicenter bone marrow transplantation data. Concluding remarks in the form of discussion are presented in Section 5.

Data notation and joint model
We suppose both survival and binary outcomes are each measured from n number of patients nested within m number of hospitals in a multicenter study setting.
the jth patient in the ith hospital, where n i is the number of patients in the ith hospital. We define t ij = min as the observed survival time and the associated censoring indicator on the jth patient in the ith hospital, where t * ij and c ij are the uncensored and censored survival time, respectively. We assume that c ij is noninformative and independent of t * ij . Throughout this article, right-censoring is assumed. Furthermore, we denote y ij as the binary biomarker response measured on the jth patient in the ith hospital and z ij = (z ij1 , … , z ijp ) T is considered as a p dimensional covariate vector, where the superscript T denotes vector transpose. The covariate vector may contain data for treatment arms and prognostic variables. The binary measure y ij takes on values 0/1 to indicate the absence or presence of a biomarker response.
We assume that the binary biomarker response y ij and the survival time t ij from the same patient tend to be correlated due to patient-specific latent risk factors (eg, lifestyle, genetic predisposition), collectively termed as frailty that is associated with both outcomes. As there inherently exists individual differences in frailty it is reasonable to expect between-patient heterogeneity in the distribution of the outcomes. Moreover, observations are taken from different hospitals, therefore, inter-hospital variation in the baseline characteristics of the patients as well as differences in service practice across the hospitals could induce heterogeneity in the outcomes. Note that many of these hospital-level risk variables are often latent and thus could similarly be regarded as frailty in a collective sense.
Let (.) = Pr(y ij = 1) denote the probability of observing a biomarker response. The survival time t ij may be modeled in terms of a hazard function or a survival function. Consistent with previous studies, 8,35 we choose the former and let (.) denote the hazard function for a survival outcome. Furthermore, we let q ij = (u T ij , U T ij ) T denote a multivariate random effect for the jth patient in the ith hospital, where u ij and U ij represent the patient-specific frailty terms associated with the binary and the survival endpoint, respectively. For patient j, we assume that the correlation between y ij and t ij can be modeled by the random effects u ij and U ij . Denote by v i and V i the random effects for hospital i associated with the binary and the survival outcome, respectively. We assume that y ij and t ij are independent conditional on u ij , U ij , v i , and V ij . This provides a two-level model that jointly analyzes y ij and t ij to account for the correlation and heterogeneity in the data. The model is stated as follows: ( where w ij = (1 z T ij ) T , ij and ij are the linear predictors corresponding to biomarker response probability and the hazard rate for the survival outcome, = ( 0 , 1 , … , p ) is the fixed effect parameter vector that measures the association between z ij and y ij , = ( 1 , … , p ) is the fixed effect parameter vector that relates z ij to t ij and 0 (.) is the baseline hazard function for the survival outcome which may be parametrically or nonparametrically specified. In fully parametric survival models, the Weibull distribution is commonly used as is flexible in the baseline hazard for either decreasing, constant, or increasing hazard. 11 However, such parametric models involve stronger distributional assumptions than is desirable and inferences may not be robust to departures from these assumptions. 36 Additionally, verifying the distributional assumptions could be challenging in practice. 37 Hence, we do not impose any distributional assumptions on the baseline hazard 0 (.) in (1). Therefore, the model in (1) is semiparametric in that 0 (.) is nonparametric, while exp ( are parametrically specified. The specification of 1 in w ij allows the probability submodel in (1) to incorporate an intercept term 0 . As in our previous studies, 13,38 the linear predictors can be extended to incorporate random slopes, for example, to examine the variability in treatment effect across hospitals. Such analysis is often warranted in multi-center clinical trials when response to treatment tend to vary across trial centers (ie, treatment-by-center interaction) such that the fixed effect estimate for treatment is not sufficient to describe the trial results. Note that the covariate information is available at the discretion of the practitioner or data analyst based on certain scientific rationale, for example. Therefore, the covariates may not be necessarily the same for the biomarker response and the survival outcome.
Let u = (u 1 , … , u n ) T and U = (U 1 , … , U n ) T denote random vectors for the realizations of u ij and U ij such that q = (u T , U T ) T . The random effects u and U are frailties at patient-level and can be regarded as unobserved or missing patient-level covariates associated with y ij and t ij , respectively. To allow for correlation between y ij and t ij , we assume that q follows a multivariate normal distribution N(0, Σ), where Σ = Γ ⊗ I n defines the variance covariance matrix of the distribution, ⊗ is a Kronecker product, I n is an n × n identity matrix, and Γ is given by The variance component parameters u and U in Γ measure between-patient heterogeneity in the distribution of y ij and t ij , respectively. Larger values depict substantial heterogeneity between patients. Also, is an association parameter that quantifies the correlation between y ij and t ij within patients. In practice, admitting into the model is realistic because y ij and t ij originate from the same patient and so they may share a common latent trait that characterizes their correlation. Exploring such association between a biomarker response and a survival outcome affords the potential to identify a new biomarker which is essential for monitoring and predicting prognosis. From statistical viewpoint, it is appropriate to allow y ij and t ij to be correlated for the same patient because omitting the correlation may lead to incorrect inferences when in fact it is present in the data. Analogously, we choose v = (v 1 , … , v m ) T and V = (V 1 , … , V m ) T to denote random vectors of v i and V i assumed independent of each other and also independent of q. We further assume that v has the distribution N(0, 2 v I m ) and that of V is N(0, 2 V I m ), where I m is an m × m identity matrix, and 2 v and 2 V quantify the extent of unobserved hospital heterogeneity (or within-hospital correlation) in y ij and t ij , respectively. The normality assumption for the random effects is considered mainly for mathematical convenience. It is important to mention that the model in (1) includes several existing models as special cases. For instance, when = 0 the model degenerates to two independent models, that is, the multilevel mixed effect logistic model and the multilevel PH frailty model. When ≠ 0, (1) yields a two-level joint model which has not appeared in the literature yet and so software packages for fitting the existing special case models are not directly applicable. In Section 2.2, we develop a multivariate REML estimation method for the proposed multilevel joint model (1) based on a generalization of the GLMM approach of McGilchrist and Yau. 33,34

Estimation procedure
To this end we develop an estimation method for the proposed multilevel joint frailty model in (1). The marginal likelihood for the model is an integral over the random effects space, which is usually not available in a closed form, except in few scenarios, for example, gamma frailty models with only one random effect. 39 For log-normal or normal random effect frailty models a closed form expression for the marginal likelihood does not exist in general. Integral approximation by numerical quadrature methods may be considered. Nevertheless, these methods are usually infeasible when multiple random effects are involved. Moreover, they require the baseline hazard function to be parametrically specified, otherwise non-parametric terms would appear in the integrand which cannot be estimated directly by the quadrature. 40 The Monte-Carlo EM algorithm could be considered as an alternative method treating the random effects as missing data. Similarly, this approach often has intractable E-step and complicated M-step due to the necessity of finding and maximizing the conditional expectation of the complete data log-likelihood in the presence of nonparametric baseline hazard function. 41 Given the multiple random effect terms presented in our model, application of these methods can be challenging. Instead, we develop a multivariate REML estimation approach that circumvents these complications. In GLMMs, the ML is a natural competitor, nevertheless, the REML method performs well in terms empirical bias of model parameters. 13,33 Let W, Z, R 1 , and R 2 denote the design matrices corresponding to , , u∕U, and v∕V respectively. In matrix notation, the linear predictors of model (1) may be expressed as To proceed, we define  1 as a joint likelihood function for the conditional independence of y ij and t ij , given by u, v) and  ( , U, V) are the respective likelihood functions for y ij and t ij with u, U, v, and V conditionally fixed. More precisely, we may write where is the cumulative hazard function of t ij . Let t r = {t 1 < · · · < t K } denote a reordering of t ij in ascending order with corresponding reordered vector of censoring indicator Δ r . The binary response y ij and the linear predictors ij and ij are reordered accordingly, with their reordered vectors denoted by y r , r , and r , respectively. Given that the baseline hazard 0 (.) is nonparametrically specified, it is admissible to regard it as a nuisance parameter and therefore eliminate it from the conditional joint likelihood in (2). We achieved this by means of profile likelihood construction, 42 which provides an appropriate partial likelihood for conditional on U and V in terms of . Letting 1 = log( 1 ) it follows that where R(r) is the risk set at t r , that is, the set of individuals who have not encountered the survival endpoint at that time and l is the corresponding linear predictor. McGilchrist and Yau 33,34 extended the BLUP concept of normal mixed models to construct a penalized likelihood for estimating the parameters of random effect models in GLMMs. Following their work and letting Ω = ( , , u, U, v, V) and Φ = we define Ω = 1 + 2 as a multivariate BLUP-type log-likelihood, where 2 is a penalty term denoting the logarithm of the joint density function on the basis of q = (u T , U T ) T , v and V given by The term multivariate is adopted because our model and the above BLUP log-likelihood deals with a multivariate out- as the maximizer of the BLUP log-likelihood. Standard numerical algorithms like the Newton-Raphson iterative method is an excellent choice forΩ. This method involves the calculation of the observed information matrix G using G = − 2 Ω ∕ Ω Ω T (Appendix A) and application of its inverse in each iteration. Note that G forms a block matrix with partitions conformable to | |q|v|V. For ease of presentation, we write G as G can be very massive in dimension, particularly in applications with large cluster size and/or large sample size because the cluster size at patient-level increases with increasing sample size. For instance, if we consider a data example with a sample size of n = 500, p = 3 covariates and m = 30 hospitals, matrix G will present 1067 × 1067 dimension. Obtaining a direct inverse of G is computationally demanding and thus presents a high cost of computational time and memory problems to the Newton-Raphson estimators. The bone marrow transplantation data analyzed in Section 4 consist of n = 137 patients and m = 4 hospitals, yet this hurdle was encountered. For the simple frailty model, Therneau et al 43 suggested a sparse computation in which a piece of the information matrix is discarded and the Cholesky decomposition is then applied to it within each Newton-Raphson iteration to provide significant savings in computational space and time. However, this approach has substantial non-convergence issues and it leads to unacceptable results when the frailty variance component is large. 44 Moreover, for REML estimation ignoring a piece of the information matrix could be problematic because the REML estimators of the variance components depend on the inverse of the full information matrix. Contrary to the sparse computation, we applied the Cholesky factorization to the full observed information matrix. This speeds up estimation dramatically and eliminates memory issues. Let G −1 denote the Cholesky factorization of G, the Newton-Raphson method can be specified asΩ whereΩ (k) is the vector of BLUP estimates of Ω in the kth iteration and the score vector Ω ∕ Ω has the following simplification See Appendix A for the derivation of 1 ∕ and 1 ∕ . From above we may write G −1 as The estimates of the variance component parameters 2 u , 2 U , , 2 v , and 2 V are obtained from REML estimators, given by a solution to the first order derivative of the REML log-likelihood. 34 Details can be found in Appendix B. The solution provides the following estimating equations for 2 where tr denotes the trace of a matrix, and K 1 , K 2 , and K 3 are block matrices defined as , and K 3 = For computational implementation of the estimation procedure, we allow the initial values of the fixed effect parameters and to be estimated by the standard logistic regression model and the Cox PH model, respectively. By standard, we mean these models do not include random effects. In addition, we thoroughly investigated the scenario where these parameters are initialized at zero and found that this approach and the above procedure converge to the same estimates for the foregoing REML estimation. For the variance components, their initial values are set to be relatively small and those of the random effects are set to zero. The asymptotic standard error estimates are obtained as part of the estimation procedure at maximum. For the fixed effect parameters and , their standard error estimates are readily computed from H , and H , , respectively. On the other hand, the asymptotic standard errors of the variance components are obtained by inverting the REML information matrix. 34 Details can be found in Appendix B. Complete details of the computational procedure are summarized below: Step 2. Given the current estimates of Ω and Φ, use (5) to update the BLUP estimates in Ω.
Step 3. Update the variance components in Φ by use of the REML estimating Equations (6).
Step 5. Compute the asymptotic standard errors of , and those of Φ = Convergence of estimation is attained when both max |Ω k − Ω k−1 | < 1 and max |Φ k − Φ k−1 | < 2 criteria are met simultaneously, where 1 and 2 are small values and 1 ≤ 2 . On convergence, the information matrix G provides the variance estimates of the random effects which can be used to calculate their prediction intervals. 13 For example, using the empirical Bayes (EB) method 45,46 the variances of the random effects u ij and U ij for the jth patient in the ith hospital can be computed from where O ij denotes the observed data on the jth patient in the ith hospital. Note that − 2 Ω ∕ u u T and − 2 Ω ∕ U U T in (7) can be obtained from the matrix G q,q in the observed information matrix G. Alternatively, the conditional mean square error of prediction (CMSEP) method can be used to estimate the variances of the random effects. 46 This is likewise obtained from matrix G q,q , but unlike the EB method, the CMSEP approach accounts for the extra variability caused by estimating the fixed effects and the baseline hazard. It is important to mention that in the proposed REML estimation, the CMSEP will exclude the variability of the baseline hazard as it is eliminated from the estimation process. Once the variance estimates of u and U are calculated by either of these methods, their standard errors are obtained by the square root of the variance estimates and a 95% prediction interval of the random effects can be constructed based on the normal approximation. Prediction inference of random effects has merits in the analysis of clustered data. For example, it is employed in litter mates tumorigenesis experiments, multi-center clinical trials, small area estimation and disease mapping, and in some cases it is used as a diagnostic tool for identifying outlying clusters or checking for violations of the normality assumption for the random effects. 46,47 Note that a slight modification of our proposed REML approach yields ML estimators for the model. The ML estimation analogously maximizes the BLUP log-likelihood to estimate the regression parameters Ω = ( , , u, U, v, V) and obtain ML estimating equations to estimate the variance component parameters Φ = . Nonetheless, we do not consider the ML estimation in our article because its inferiority to the REML method, in terms of accuracy of empirical bias is extensively documented; for example, see References 13 and 33 and references therein. Our proposed methodology is computationally documented as an R coded program and can be found at the link provided in Data Availability Statement.

SIMULATION STUDIES
Simulation studies are presented to examine the performance of the proposed multilevel joint frailty model and the multivariate REML estimation method. We generated the data O ij = { t ij , ij , y ij , z ij } for two samples, that is, n = 480 and n = 510 according to two clustering designs, (a) the former with m = 10 hospitals and n i = 48 patients within each hospital, and (b) the later involving m = 15 hospitals and n i = 34 patients per hospital. We considered two covariates, given by where z ij1 is a binary and z ij2 is a continuous covariate. First, the binary covariate z ij1 representing a treatment variable, for example, is generated from the Bernoulli distribution with a 0.5 chance of randomization and the continuous variable z ij2 is sampled from the standard normal distribution. The random effect q ij = (u T ij , U T ij ) T for the jth patient nested in the ith hospital is generated from the multivariate normal distribution . Analogously, the random effects v i and V i for the binary and survival outcome from the ith hospital are independently generated from N ( , respectively. We varied the true parameter values of Σ ij , 2 v and 2 V to reflect moderate 02 in the second sample, we attained an approximate censoring rate of 10%. Each simulation involves 500 replications of the above data generation process and application of the proposed multilevel joint frailty model and the REML method for estimation. For comparison purpose, we additionally applied the independent modeling approach for estimation. The independent modeling ignores the correlation between y ij and t ij , and applies the multilevel mixed effect logistic model and the multilevel Cox-frailty model to separately analyze these outcomes, respectively. Both models provide ML estimates via the R packages lme4 and coxme, respectively. The current version of these packages do not provide the standard error estimates for the variance component parameters. The results are summarized in Table 1 and Appendix Table C1 in terms of average bias (bias), average of the asymptotic standard error (ASE), the empirical standard error (ESE) and the coverage probability (CP) of the 95% confidence interval based on the normal approximation. The empirical standard error is calculated to be the standard deviation of the parameter estimates over 500 replications. For all the simulations presented, the proposed multilevel joint frailty model provides minimal bias for both the fixed effect and the variance component parameters, confirming applicability of the model in a variety of numerical contexts. Its ASE for the fixed effect parameters are fairly comparable to their corresponding ESE and it produces CPs that are close to the nominal level, suggesting a good accuracy of standard error estimates for these parameters. The standard errors for the variance components 2 v and 2 V have acceptable accuracy, comparing ASE and ESE. However, for the variance components 2 u , 2 U and their ESE estimates are consistently smaller than their corresponding ASE estimates and their CPs are consistently estimated above the nominal level. This pattern is also seen in related models on the use of the REML method. 13,48 A possible reason is: the analytic simplification that derives the estimating equations for 2 u , 2 U , and has much increased the accuracy of their estimates. One should, therefore exercise caution when interpreting the statistical significance of these variance component parameters. Even in the case where the standard errors of the variance components are accurately estimated, it is not advisable to directly apply their ASE estimates to interpret statistical significance because the null hypotheses of the variance components lie on the boundary of parameter space, and so application of the normal approximation is inappropriate. 45 This could be remedied by using TA B L E 1 Results for simulated data with m = 10 hospitals and n i = 48 patients per hospital. the score test of homogeneity, the likelihood ratio test with corrected null distribution or prediction interval of the random effects. 13,45,46 The later approach allows one to interpret the significance of the random effects themselves, rather than their variance components; see for example, Ha et al 46 and Tawiah et al. 38 Compared to the proposed multilevel joint frailty model, the independent modeling approach yields larger bias for both the fixed effects and the variance component parameters. Due to the large magnitude of the bias of the fixed effects, their CPs tend to shrink from the nominal level. This is particularly apparent for 1 and 2 when takes on a moderate or a high level correlation in both samples. The variance component parameter 2 u is consistently estimated near zero by the ML estimator of the independent model and so it leads to substantial downward bias for this variance parameter. Similarly, 2 U has downward bias and it can be severe depending on the magnitude of its true value; larger true values tend to have severe negative bias than smaller true values. Note that there are cluster sizes of two at patient-level paired by the random effects u and U. These small cluster sizes might be the obvious reason for the poor performance of the ML estimators for 2 u and 2 U . However, in spite of these the proposed REML estimators provide small and acceptable biases for 2 u and 2 U . We further investigate the performance of the proposed model and the REML method in the setting of unbalanced number of patients across hospitals. We considered a sample of 137 patients under two clustering designs. The first design comprises m = 4 hospitals with number of patients per hospital (n i ) ranging from 17 to 76 (median, n i = 22) as in the bone marrow transplantation data analyzed in Section 4. In the second design, there are m = 11 hospitals, range of 6-22 number of patients per hospital and a median of 12. As reported in Table 2 the results are compared across censoring rates of 49% and 65% to gauge the performance of the model in different censoring contexts and unbalanced hospital sizes. In both clustering considerations, the model and the REML method perform satisfactorily well by providing acceptable biases for both the fixed effects and the variance component parameters. Also, the standard error estimates for the fixed effects have good accuracy. As censoring rate increases to high level, the bias of the fixed effect parameters of the binary TA B L E 2 Results for simulated data with unbalanced number of patients across hospitals based on the proposed model. submodel increases marginally. On the other hand, the bias of the fixed effect parameters of the survival submodel increases noticeably but remains in acceptable limits. This notwithstanding we noticed that in scenarios with high censoring rates, fewer number of hospitals and large intra-hospital correlation the bias of the variance components and that of the fixed effect parameters in the survival submodel increase rapidly.

APPLICATION TO DATA EXAMPLE
For leukemia patients, depressed platelet count is a common phenomenon after allogeneic bone marrow transplantation. Recovery of the platelet count to a self-sustaining level ≥ 40 × 10 9 ∕1 may occur for some patients at which time their prognosis for treatment related death or relapse may change. 7 The multicenter bone marrow transplantation study introduced in Section 1 enrolled 137 patients with acute myelocytic leukemia (AML) and acute lymphoblastic leukemia (ALL). The patients received allogeneic transplant at four hospitals in Australia and the United States. The number of patients per hospital ranged from 17 to 76 with median of 22. The data provide some potentially important risk factors measured at the time of transplantation. These include disease risk groups (ie, ALL, AML low risk and AML high risk) and characteristics of the patients/recipients and their donors such as age, gender and cytomegalovirus immune (CMV) status. Of the entire patient sample, 50% had positive CMV status and 42% donors were CMV positive. Both patients and donors had the median age of 28 years, with range of 7-52 and 2-56 years, respectively. The study comprised 58% male patients and 64% male donors and received a maximum follow-up of 7 years. Following transplantation, 61% of the patients experienced disease-free survival events, of which 21% had failed platelet recovery. Platelet recovery is provided as a binary measure (0-platelets never returned to normal, 1-platelets returned to normal) and disease-free survival is defined as the time from transplantation until relapse or death. Survival and platelet recovery are both important outcome measures in allogeneic TA B L E 3 The proposed model and the independent model fitted to the bone marrow transplantation data.

Proposed model
Proposed model with = 0 Independent model a transplantation research. 9 We simultaneously analyze platelet recovery and disease-free survival within the framework of the proposed model (1) to ascertain the associated risk factors, adjusting for the correlation between the outcomes and the multilevel structure of the data. The proposed REML method in Section 2.2 is used for estimation of model parameters. To allow comparison, the independent modeling approach is applied to the data and we further analyze the data within the context of the proposed model and the REML method under the setting where is restricted to zero. The results are reported in Table 3. The continuous covariates, that is, recipient and donor age are centered at their means. From the multilevel joint model the interpretation of the risk factors for platelet recovery is given by the parameters of the binary or probability submodel. We found no significant difference in the probability of platelet recovery for ALL vs AML low risk patients, and AML high risk vs AML low risk patients. The AML high risk group provides a negative coefficient estimate with an estimated odds ratio of 0.584 which suggests a lower chance of platelet recovery, but this is not statistically significant (95% CI: 0.12-2.93; P-value: 0.513). The probability of platelet recovery is higher in favor of female patients, but this is similarly not significant. From the independent modeling approach, increasing age of patients is identified to be a significant predictor of higher platelet recovery probability. However, the proposed multilevel joint frailty model slightly attenuates the effect of patient age on platelet recovery probability and it becomes statistically insignificant. Similarly, the proposed model identifies donor age as a statistically insignificant predictor for platelet recovery, but the independent modeling approach finds it to be significant. The probability of platelet recovery is higher for patients who were matched with female donors, compared to those who had male donors, nonetheless, the difference is not significant. Donors' positive CMV status is not a significant predictor of platelet recovery across both models. For disease-free survival, we found an insignificant difference between the ALL and AML low risk patients, but a significant difference between the AML high risk and the AML low risk patients in the context of both models. Compared with the AML low risk patients, the AML high risk patients have significantly elevated risk for disease-free survival events. The risk for disease-free survival events is lower for female patients, but did not differentiate significantly when compared to male patients. There is no significant difference in disease-free survival rate, comparing patients with positive and negative CMV status. Recipient age and donor characteristics did not reveal significant association with the risk for disease-free survival events.

Model components Parameter Estimate (SE) P-value Estimate (SE) P-value Estimate (SE) P-value
In terms of variance components, the estimates are large for the proposed model compared to the independent modeling approach. For instance, the independent model estimates 2 u near zero while the proposed multilevel joint model provides an estimate of 1.868. The estimate of 2 U is very large which depicts substantial between-patient heterogeneity for disease-free survival. At hospital level, there is a relatively large inter-hospital variability for platelet recovery compared to disease-free survival. Note that the P-values for the variance components have been reported in Table 3 but they have not been used to interpret the results. The proposed model finds a strong negative correlation (̂: −0.912) between platelet recovery probability and disease-free survival. The correlation depicts that a higher platelet recovery probability predicts a reduced risk for disease-free survival events, while a lower chance of platelet recovery predicts an increased risk for disease-free survival events. It is worth mentioning that the proposed multilevel joint frailty model alters the fixed effect estimates from those of the independent model when there is a meaningful correlation between the outcomes. As observed in the real data application, almost all the variables lost their statistical significance in the multilevel joint frailty model. This is because the strong correlation between platelet recovery probability and disease-free survival provides a considerable contribution in explaining the outcomes. Consequently, it weakened the effect (or contribution) of other variables and in turn rendered them to be non-significant. For instance, in Table 3 we observe that the fixed effect estimates of the significant variables (patient age and donor age for platelet recovery) identified by the independent model are shrunken in the multilevel joint frailty model in the application where the outcomes are strongly correlated. The shrinking effects manifested themselves in the calculation of P-values which led to loss of significance for those variables. To benchmark these results, the correlation parameter in the multilevel joint frailty model is restricted to zero (Table 3). Under this consideration, the fixed effect estimates of the aforementioned variables increased in magnitude and gained statistical significance. As expected, the fixed effect estimates are close and the statistically significant variables are identical to those of the independent modeling approach. For instance, the fixed effect parameter estimates for positive CMV status of donor in the binary submodel and that of patient age in the survival submodel changed from negative to positive consistent with the corresponding estimates produced by the independent model.
The negative association between platelet recovery and disease-free survival is apparent in Figure 1, the point estimates and the prediction intervals for the frailties u and U estimated from (7) using the EB method. From the figure we see that patients with lower probability for platelet recovery, for example, Patient 4, 5, and 21 have significantly higher risk for disease-free survival events, and those with higher probability for platelet recovery, for example, Patient 10, 11 and 50 have significantly better prognosis for disease-free survival. As mentioned previously, the prediction intervals for the frailties could be considered as an alternative way of gaining understanding and interpreting the amount of heterogeneity in the data. This is particularly useful, given that the P-values and the CIs of the variance components may not be accurately estimated. For instance, many of the interval lengths of the frailty term U do not include zero and thus could be used as evidence for between-patient heterogeneity for disease-free survival. For the frailty term u we similarly see several interval lengths that depart from zero. Correspondingly, the proposed multilevel joint frailty model provides larger estimates for the variance components of u and U ( Table 3). The prediction intervals for the hospital level frailties v and V can similarly be computed.
The bone marrow transplantation data have often been analyzed within the framework of semi-competing risks models with a prime focus on modeling the bivariate association between terminal (mortality) and non-terminal (morbidity) event time distribution, both subject to censoring (see, for example, Ghosh 49 ). These existing models account for dependence between two event time endpoints. Taking advantage of the multilevel structure of the data, our newly proposed method concurrently models patient and hospital effects, specifying a bivariate association between event time and binary endpoint which leads to formidable challenges as a common multivariate distribution for these variables is not in existence. The method adds to the existing knowledge by allowing simultaneous modeling of binary and event time F I G U R E 1 Patient-specific frailties and their 95% EB prediction interval for heterogeneity in (A) platelet recovery probability and the hazard rate for (B) disease-free survival events. endpoint to account for their dependence and estimation of hospital effects for each endpoint. Basing our method on REML estimation, it provides a joint prediction of random-patient and -hospital effects for both outcomes which also adds to the existing knowledge.
Given the complexity of the proposed model, it is worth documenting the calculation time of the proposed REML estimation method. Utilizing the bone marrow transplantation data (n = 137), the method completes estimation in 18.71 s using windows machine with Core i7-10510U CPU@1.80GHz processor. We further provide details on calculation time for the method by considering three simulated datasets with censoring rate ranging from 48% to 54% and sampling characteristics as follows: (n = 400, m = 10, n i = 40), (n = 750, m = 25, n i = 30), and (n = 1000, m = 40, n i = 25). Their corresponding calculation times are 30.25, 317.03, and 479.44 s, respectively. Comparing across different censoring rates above and below the reported rates for the same samples, we observed marginal differences in calculation times.

DISCUSSION
We introduce a multilevel joint modeling approach for hierarchically clustered/correlated binary and survival data. The multilevel structure is attributed clustering of binary and survival observations at patient level and clustering of patients at institutional level, for example, hospital. In the proposed modeling framework, both outcomes are fastened together and jointly modeled by a multivariate random patient effect with a specified variance-covariance structure and two independent random effects are considered to adjust for clustering effect at hospital level. Unlike the conventional approach that analyzes the outcomes in isolation and independently, the proposed model allows the outcomes to be analyzed simultaneously, taking advantage of their correlation to allow for valid inference. The shared frailty concept could be considered as an alternative approach for simultaneous modeling of the outcomes, for example, as adopted in the joint models of recurrent events and terminal event time. 50 However, one of the drawbacks of the shared frailty approach is that it does not permit negative correlation and it forces the frailties to be the same for both outcomes, 24 which may not fit well in some applications. Our proposed model, on the other hand, provides frailties that are unique to the outcomes and allows flexibility in modeling both positive and negative intra-patient correlation. An attractive feature of the proposed multilevel joint model is that it retains the characteristics of the widely used mixed effect logistic regression model and the Cox PH frailty model while providing extra parameter to characterize the correlation. Therefore, it is easy to use and interpret parameters because prior experience on the use of these featured models are generally extensible to the proposed multilevel joint model.
For estimation, we developed a computationally efficient multivariate REML method. We acknowledge that a Bayesian approach could be considered as an alternative estimation method for the proposed model. The likelihood of the model is complicated by a multidimensional integration that needs to be evaluated though is not a simple task as is not available in a closed form. When using a Bayesian approach, approximate calculation of the integral may be obtained by, for example, the Laplace approximation method or numerical integration methods, yet estimation can be time-consuming particularly when using the numerical methods. Comparatively, the proposed REML method is mathematically and computationally appealing as it circumvents the intractable integration. This method maximizes a multivariate BLUP-type log-likelihood formed from the joint distribution of the outcomes and the intervening random effect terms, in principle with the GLMM methodology. 33,34 Besides avoiding the formidable challenges posed by the intractable multidimensional integration, the BLUP preserves the baseline hazard elimination property of the partial likelihood to provide a semiparametric estimation for the joint model. Therefore, it is relatively easy to tolerate. The proposed REML estimation method shown desirable properties in simulation. In terms of bias, the proposed multilevel joint model outperformed the conventional independent modeling approach in all the correlation scenarios considered. Therefore, our proposal could be considered as a bias correction method for correlated survival and binary response data.
The proposed REML method is feasible when the number of clusters increases. For verification purpose, we checked the feasibility of method in simulated samples with much larger number of clusters (i.e., m = 100, and m = 200). Nevertheless, the limit of the method lies in the magnitude of sample size. Thus, like most statistical methods, the proposed method is prohibitive in extreme applications with super large sample sizes.
The advantages of the proposed model and the REML method include their ability to predict patient-level and hospital-level frailty effects. This provides useful information for identifying patients and hospitals with inferior outcomes. The model has broad applicability and could be adopted to answer many important biomedical and health sciences questions concerning simultaneous modeling of morbidity and mortality data, for example. The proposed model can be extended to handle longitudinal data with binary and survival endpoint, and a multivariate random effect with a more general covariance structure could be adopted to generalize the model to fit situations with more than two outcomes. Robustness of the normal assumption for the random effects is usually of statistical interest and could be investigated in future research. Let A = − 2 1 ∕ T , A = − 2 1 ∕ T , A = 2 1 ∕ T = 0 and A = 2 1 ∕ T = 0. The observed information matrix G is obtained from − 2 Ω ∕ Ω Ω T which equals  .