Investigation of one‐stage meta‐analysis methods for joint longitudinal and time‐to‐event data through simulation and real data application

Background: Joint modeling of longitudinal and time‐to‐event data is often advantageous over separate longitudinal or time‐to‐event analyses as it can account for study dropout, error in longitudinally measured covariates, and correlation between longitudinal and time‐to‐event outcomes. The current literature on joint modeling focuses mainly on the analysis of single studies with a lack of methods available for the meta‐analysis of joint data from multiple studies. Methods: We investigate a variety of one‐stage methods for the meta‐analysis of joint longitudinal and time‐to‐event outcome data. These methods are applied to the INDANA dataset to investigate longitudinally measured systolic blood pressure, with each of time to death, time to myocardial infarction, and time to stroke. Results are compared to separate longitudinal or time‐to‐event meta‐analyses. A simulation study is conducted to contrast separate versus joint analyses over a range of scenarios. Results: The performance of the examined one‐stage joint meta‐analytic models varied. Models that accounted for between study heterogeneity performed better than models that ignored it. Of the examined methods to account for between study heterogeneity, under the examined association structure, fixed effect approaches appeared preferable, whereas methods involving a baseline hazard stratified by study were least time intensive. Conclusions: One‐stage joint meta‐analytic models that accounted for between study heterogeneity using a mix of fixed effects or a stratified baseline hazard were reliable; however, models examined that included study level random effects in the association structure were less reliable.


INTRODUCTION
longitudinal outcome, and a Cox Proportional Hazards (PH) model with an unspecified baseline hazard for the time-to-event outcome. The two sub-models are linked through shared zero mean random effects, with common association parameter (represented using terms) for the random effects acting at the same level. Unlike joint models for single study data, the proposed models must account for the clustering of individuals within studies and model potential heterogeneity between these studies.
The one-stage joint model follows the structure W 2 (t) = (2) ( Z (2) b (2) ) Studies are identified by k = 1 … K, where K is the total number of studies in the meta-dataset. Individuals within each study are represented by i = 1 … n k , where n k denotes the total number of individuals in study k. The longitudinal measurement points are identified using j = 1 … m ki , where m ki represents the total number of longitudinal measurements recorded for individual i in study k.
The longitudinal measurement recorded for individual i in study k at time-point j is represented by Y kij , with the longitudinal error term kij . Fixed effects are represented using terms, with the first element of the subscript identifying the sub-model they belong to (such that 1 = 11 , 12 , 13 , … are the longitudinal sub-model fixed effects and 2 = 21 , 22 , 23 , … are the time-to-event sub-model fixed effects). Random effects are represented by b, with individual level (level 2) random effects represented by b ( ) ki and study level (level 3) random effects by b ( ) k . Design matrices are represented by X for the fixed effects and Z for the random effects. X 1 represents the longitudinal sub-model fixed effects design matrix, and X 2 represents the time-to-event sub-model fixed effects design matrix. Additionally, Z ( ) ki represents the design matrix for the individual level (level 2) random effects, and Z ( ) k represents the design matrix for the study level (level 3) random effects. The individual level random effects follow distribution b (2) ∼ N( , D), whereas the study level random effects follow distribution b (3) k ∼ N ( , A), and the error terms each follow distribution ∼ N(0, 2 e ). The individual level and the study level random effects are considered independent of each other and of the error terms. The random effects are intended to represent how covariate effects differ for units at the respective levels (individuals or studies) from those estimated for the overall population by the fixed effects, for example, how the individuals contained within a particular study differ from those in the overall population. As such, the Z matrices are assumed to be subsets of the X 1 matrix.
In the time-to-event sub-model, 0 (t) represents the unspecified baseline hazard. The sub-models are linked through shared zero mean random effects, with common association parameters (2) for the individual level random effects and (3) for the study level random effects. Note that if a particular component of the joint model is not required (eg, the study level random effects), terms involving this component (eg, Z ( ) k b ( ) k ) do not appear in the model. A range of model groups are investigated, which represent a variety of methods to account for between study heterogeneity. The specifications of the model groups are stated in Table 1. These models involve only longitudinal time (t kij ), a binary treatment assignment variable (treat ki ), and study membership (study ki ) as covariates. However, the models examined can be easily extended if other covariates are of interest to the MA. Note that instances of longitudinal time t kij in the association structure term W 2ki (t) (which is present in the time-to-event sub-model) are replaced by the individual's survival time T Ski .
Model group 0 in Table 1 is a naïve model which does not account for between study heterogeneity in any way. This model is presented here to highlight the consequence of ignoring the clustered nature of multi-study joint data. Note that any instances of longitudinal time in the association structure are replaced with the individual's survival time (denoted T Ski , equal to the minimum of their event and censoring times).
Model group 1 accounts for between study heterogeneity using a fixed study membership variable, along with its interaction with treatment assignment, in both sub-models. Study membership is expected to be a factor variable, and so a separate 13 , 14 , 22 , and 23 parameter will be produced for each study k in the meta-analysis (apart from the reference or baseline study), denoted 13k , 14k , 22k , and 23k . The study considered to be the reference study should be the representative of the population of interest. In model group 1, inclusion of the fixed study membership variable allows calculation of study specific fixed longitudinal trajectory intercepts (with 10 representing the fixed intercept for the reference study, and 10 + 14k for non-reference study k). Likewise, study specific longitudinal treatment effects can be calculated (with 13 representing the fixed longitudinal treatment effect for the reference study, and 13 + 15k for non-reference study k). In the time-to-event sub-model, the 22k parameter represents the difference in risk of an event between study k, and the Longitudinal sub-model Y kij = 10 + 11 t kij + 12 treat ki + (2) 0 + (2) 1 + ε Time-to-event sub-model ki (t) = 0 (t) exp ( 21 treat ki + W 2ki (t)) Association structure 2 ( ) = (2) ( (2) 0 + (2) 1 ) 1 Longitudinal sub-model Y kij = 10 + 11 t kij + 12 treat ki + 13 study ki + 14 treat ki * study ki + (2) 0 + (2) 1 + ε Time-to-event sub-model ki (t) = 0 (t) exp ( 21 treat ki + 22 study ki + 23 treat ki * study ki + W 2ki (t)) Association structure 2 ( ) = (2) Longitudinal sub-model Y kij = 10 + 11 t kij + 12 treat ki + 13 study ki Longitudinal sub-model Y kij = 10 + 11 t kij + 12 treat ki Longitudinal sub-model Y kij = 10 + 11 t kij + 12 treat ki + 13 study ki + 14 treat ki * study ki Longitudinal sub-model Y kij = 10 + 11 t kij + 12 treat ki + 13 study ki reference study. The deviation in risk of an event due to treatment group is equal to 21 for the reference study, and by 21 + 23k for non-reference study k. Model group 2 accounts for between study heterogeneity using a fixed study membership variable in both sub-models, and a study level zero-mean random treatment effect (b (3) 1k ). Study specific longitudinal trajectory intercepts and log-hazard ratio risks of an event for each study can be calculated from the fixed effects as for model group 1. The interpretation of the study specific random treatment effect b (3) 1k is more complex than for separate longitudinal or time-to-event one-stage MA-models due to its presence in both sub-models. In the longitudinal sub-model, the b (3) 1k term adjusts the overall population treatment effect coefficient 12 to give the observed treatment effect in study k of 12 + b (3) 1k . Through the association structure, b (3) 1k is present in the time-to-event sub-model. As such, the population treatment effect coefficient 21 is altered to give a study specific estimate of the deviation in the risk of an event due to treatment group ( 21 1k ). Model group 3 accounts for between study heterogeneity solely using study level random effects, as it involves a study level random intercept (b (3) 0k ) and random treatment effect (b (3) 1k ). Again, the interpretation of these random effects is more complex than for separate one-stage longitudinal or time-to-event MA-models due to their presence in both sub-models through the association structure. The study level random intercept b (3) 0k causes the longitudinal intercept for study k to equal 10 +b (3) 0k , but also (3) 0k represents the deviation in the risk of an event in the kth study from the population average taken across all studies in the meta-analysis. The interpretation of the random treatment effect (b (3) 1k ) is the same as for model group 2.
Model group 4 has a longitudinal sub-model with the same specification (and so interpretation) as model group 1. However, the baseline hazard in the time-to-event sub-model is stratified by study ( 0k (t)), and the time-to-event sub-model contains only a fixed treatment assignment term. As such, between study heterogeneity in the time-to-event model is captured by the study specific baseline hazards.
Model group 5 accounts for between study heterogeneity in a variety of ways. A fixed study membership term is included in the longitudinal sub-model, a study level random treatment effect (b (3) 1k ) is present in both sub-models through the association structure, and the baseline hazard of the time-to-event sub-model is stratified by study. Each component of the model has interpretations as already discussed.
In addition to the one-stage joint MA-models, we also fit separate longitudinal and time-to-event one-stage MA-models for the comparison with the joint estimates. These separate models have the same specification as the corresponding joint model sub-models, except for the W 2ki (t) term that is removed from the time-to-event one-stage MA-models.

MODEL FITTING
The models described in Section 2 were fitted using the Expectation Maximization (EM) algorithm, 22 whose use in single study joint modeling analyses has been described by Wulfsohn and Tsiatis 1 and Rizopoulos. 4 Starting values for the algorithm were extracted from initial separate longitudinal and time-to-event model fits (of the same specification as the corresponding sub-models of the joint model, excluding the association structure). In the Expectation or E-step, estimates of functions of random effects were calculated using pseudo-adaptive Gaussian quadrature procedures, 23 where conditional modes of the random effects calculated in the initial separate longitudinal model fit were used to calculate appropriate locations for the abscissa to be used throughout the model fitting process. In the Maximization or M-step, these estimated functions of the random effects were used to calculate maximum likelihood estimates of model parameters. The derived maximum likelihood estimators have been made available as Supplemental Material.

SOFTWARE
We developed a flexible R 24 code to fit one-stage multi-study joint models described in this article, which will be available as joineRmeta package; the R codes can currently be downloaded at https://github.com/mesudell/joineRmeta/. This software is an extension of the single study joint modeling package joineR 25 to the multi-study case. Example code and simulated data are available in the supplemental information, demonstrating methods discussed in this article.

Example data
To investigate the behavior of the proposed methods in a real world scenario, the methods were applied to a subset of the INDANA dataset. 26 This is a multi-study dataset compiled to investigate the effect of patient characteristics on the efficacy of pharmacological treatment for high blood pressure. The subset analyzed here (henceforward referred to as the INDANA dataset) contains any study identified by the INDANA collaboration 26 that supplied both longitudinal and time-to-event data and contains 6 studies (EWPHE, 27 COOP, 28 STOP, 29 SHEP, 30 MRC1, 31 and MRC2 32 ). The INDANA dataset concerns hypertensive patients assigned to one of two treatment groups, that is, any treatment for hypertension versus placebo, no treatment, or usual care. Longitudinally measured Systolic and Diastolic Blood Pressure were available, referred to as SBP and DBP. Three time-to-event outcomes were measured, namely, time to death, time to myocardial infarction (MI), and time to stroke.
The data contained 9 possible longitudinal time-points at baseline, 6 months, 1 year, and annually thereafter to a maximum of 7 years. The SHEP study recorded individuals at only 6 measurement times, while STOP and MRC1 presented 7 measurement times, with the remaining studies presenting data at each of the 9 possible measurement times. Only longitudinal data recorded prior to an individual's survival time contributed to the analyses. Tables of the number of measurements provided by each study at each time point are available in the supplemental information (supplemental tables S1-S3).
Analyses of SBP and each time-to-event outcome are presented in Tables 2 to 4. For EWPHE, an intention to treat analysis was only possible for fatal endpoints, and so the study only contributes to the analysis of SBP and time to death. As such, the final dataset examined contained a maximum of 6 studies totalling at most 29825 individuals. The exact number of individuals involved in each analysis is stated in the captions of Tables 2 to 4.
The aim of this investigation was to illustrate the proposed one-stage joint meta-analytic models, rather than to investigate potential treatment modifiers. As such, while the INDANA dataset contained a range of patient covariates that could influence the outcomes, models in this investigation included only treatment assignment, study membership, and the longitudinal time covariate.
The models of specification shown in Table 1 were fitted to the data for each combination of outcomes (SBP and each of time to death, time to MI, and time to stroke, with longitudinal outcome Y kij = SBP kij ). However, plots of the longitudinal trajectories for each study paneled by event type (Supplemental Figures S1-S3) indicated a changepoint early in the trajectories. A range of terms were tested to account for non-linearity due to the changepoint including t 2 , exp(−t kij ) and exp(−a * t kij ). Comparison of the log-likelihoods and AIC values of the models determined that inclusion of the term exp(−3 * t kij ) gave the best fit. Consequently, in addition to the terms stated in Table 1, each longitudinal sub-model also  −0.06 (−0.14, 0.03) (2) 0.013 (0.006, 0.017) (3) 0.000 (−0.039, 0.048)    (2) 0.034 (0.027, 0.041) (3) −0.076 (−0.171, 0.005) contained a exp(−3 * t kij ) term (for clarity, full model specifications for real data analyses are available in Supplemental  Table S4).

Results
In the models examined, a statistically significant negative treatment assignment coefficient in the time-to-event model would indicate that assignment to any treatment for hypertension versus placebo, no treatment, or usual care significantly reduced the risk of the event in question. Model groups 0, 2, 3, 4, and 5 each produce a single global time-to-event treatment effect estimate ( 21 ), whereas model group 1 produces study specific treatment effect estimates (calculated by 21 for the reference study, and by 21 + 23k for non-reference study k).
A statistically significant negative treatment assignment coefficient in the longitudinal sub-model would indicate that assignment to any treatment for hypertension significantly decreased SBP. Model groups 0, 2, 3, and 5 each produce a single global longitudinal treatment effect estimate ( 12 ), whereas model groups 1 and 4 produce study specific estimates (calculated by 12 for the reference study, and 12 + 14k for non-reference study k).
A statistically significant positive study level association parameter ( (3) ) indicates that individuals in studies with longitudinal outcome values above the corresponding overall population mean are at higher risk of experiencing the event at a given time point. A statistically significant positive individual level association parameter ( (2) ) indicates that individuals with longitudinal values above that predicted by the terms in the longitudinal sub-model (apart from the individual level random effects) are at higher risk of experiencing the event at a given time point. Association parameters were only estimated for joint analyses. Across all pairwise combinations of outcomes investigated, the estimated treatment effect from the separate longitudinal one-stage IPD-MA and the joint one-stage IPD-MA longitudinal sub-model was significant and negative, indicating that assignment to treatment for hypertension significantly reduced SBP compared to placebo, no treatment, or usual care. The estimated treatment effect from the separate and joint analyses agreed well across model groups examined, apart from model group 3 (which solely accounted for between study heterogeneity using study level random effects). Here, the separate results were similar to those produced by the other model groups; however, the results from the joint analysis, while still significant, were much smaller in magnitude than the joint results from the other modeling groups. In the separate group 3 model, the study level random effects accounted for between study heterogeneity in the longitudinal trajectory. However, in the joint model, they also accounted for between study heterogeneity in the time-to-event sub-model through their presence in the association structure. It was important to determine if sharing study level random effects in this way between sub-models caused bias in covariate estimates, examined through simulations in Section 5.

Results from the INDANA dataset meta-analyses
Throughout the analyses, the estimated time-to-event treatment coefficient from the joint one-stage IPD-MA models was smaller in magnitude than those from the separate one-stage IPD-MA model. However, the direction of the results agreed between the separate and the joint analyses. For SBP and time to death, the separate and joint analyses agreed in the significance of results, with a significant reduction in risk of death due to assignment to any treatment for hypertension estimated only for the STOP trial for model group 1. For SBP and time to MI, model groups 0, 2, 3, 4, and 5 for both the separate and joint analyses estimated significant negative global treatment effect estimates, indicating a significant reduction in risk of MI due to assignment to treatment for hypertension. However, for model group 1, only the study specific estimate for the SHEP trial from the joint analysis was significant. For SBP and time to stroke, model groups 0, 2, 3, 4, and 5 for both the separate and joint analyses estimated significant negative global treatment effect estimates, indicating a significant reduction in risk of stroke due to assignment to treatment for hypertension. These treatment assignment coefficients were larger in magnitude than the results for time to death or time to MI. For model group 1, the separate time-to-event model identified study specific significant treatment effects for COOP and MRC1; however, the joint analysis additionally identified significant effects for SHEP and STOP.
Individual level random effects were included in all model groups examined, causing the individual level association parameter (2) to be present in all model groups. For each set of outcomes examined, all model groups estimated significant positive values for (2) , indicating that individuals with SBP values above the corresponding population average are at higher risk of an event. We should note that model group 0 consistently estimated (2) values of larger magnitude than the other model groups (which were consistent in the magnitude of (2) estimated). This highlights the importance of accounting for between study heterogeneity in joint analyses of multi-study data.
Study level random effects were only employed in model groups 2, 3, and 5, meaning that the study level association parameter (3) was only estimated in these model groups. There was a noticeable discrepancy between results from model group 3 and model groups 2 or 5. Model group 3 contained both a study level random intercept and treatment effect, whereas model groups 2 and 5 contained only a study level random treatment effect. Model group 3 estimated a significant positive study level association parameter across all three sets of analyses (with interpretation that studies with SBP values above the population average were at higher risk of an event). However, as noted earlier, for the joint analysis, estimated parameters from model group 3 were inconsistent with the results produced by the other model groups. Model groups 2 and 5 estimated insignificant (3) values across the three sets of analyses, which were different in magnitude to model group 3, and had wide confidence intervals. These results motivated a simulation study to investigate when the use of shared study level random effects may be recommended.

SIMULATION INVESTIGATIONS
In practice, meta-analyses involve data with very different characteristics to those displayed in our real data example. For example, associations between the longitudinal and time-to-event outcomes may be different in significance and/or magnitude. The number of studies included in the meta-analysis might differ. There might be a different level of variability or heterogeneity between studies involved in the meta-analysis. To assess the behavior of the models stated in Table 1 under a range of these different conditions, a range of simulation investigations were conducted. These simulations can be split into three main sets: Simulation Set 1 investigates the models under different levels of association, Simulation Set 2 investigates differing numbers of studies included in the meta-analysis, and Simulation Set 3 investigates differing levels of between study heterogeneity. During the simulation investigations, data were firstly simulated using the models and methods discussed in Section 6.1. The models stated in Table 1 were then fitted to each simulated dataset, the results of which are presented in Section 6.2.

Data simulation
Data for each set of simulations were simulated under the same model structure, but with different model parameter values, which we will now describe. For each set of simulations, for each scenario, 1000 datasets were simulated.
For each dataset within each set of simulations, multi-study joint data were generated containing a single continuous normally distributed longitudinal outcome and a single censored time-to-event outcome. The number of included studies varies between simulation sets; however, each simulated study contained 500 individuals randomized equally to two treatment groups. A maximum of 10 longitudinal measurements at times 0, 0.25, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, and 4 were permitted, with measurements recorded only up to the individual's survival time (T Ski ). Data for all studies were simulated simultaneously, with any between study heterogeneity generated through specification of the distribution of study level random effects. The longitudinal data were simulated under Equation (2).
In Equation (2), the longitudinal outcome Y kij follows a linear mixed effects model containing fixed intercept, time and treatment assignment terms (with coefficients 10 , 11 , and 12 ), individual level random intercept and slope terms (b (2) 0 and b (2) 1 ), study level random intercept and treatment effect terms (b (3) 0k and b (3) 1k ), and an error term kij . The random effects follow multivariate normal distributions, with individual level random effects distributed b (2) ∼ N( , D), and study level random effects distributed b (3) k ∼N( , A). The random effects are independent of each other and of the error terms, which are considered to be independently and identically distributed ∼N(0, 2 e ). The simulation of time-to-event data under a proportional hazards model with time varying covariates is described by Bender et al 33 and Austin. 34 In these simulations, the time-to-event data were generated under Equation (3), where 0 (t) is an unspecified baseline hazard.
As a time varying covariate is present in the time-to-event sub-model (the individual level random time term b (2) 1 , present through the association structure), event times are simulated under a Gompertz distribution, as it has a baseline hazard that can vary over time. Consequently, individual event times T Eki are generated under Equation (4) In Equation (4), U ki is an individual specific realization from a Uniform U(0, 1) distribution. The parameters 0 (the exponential of which is the scale parameter of a Gompertz distribution) and 1 (the shape parameter of a Gompertz distribution) are used along with the coefficients in the model to control the distribution of the event times. The event times T Eki were specified to be Gompertz distributed with mean 0 = 3 and standard deviation 0 = 0.5. Using the extreme value distribution (as recommended by Bender et al, 33 with ≈ 0.5772 representing Euler's constant), this lead to the parameters controlling the event times distributions to be set to A Gompertz distribution has increasing hazard for a positive shape parameter, constant hazard for a shape parameter equal to 0 (equivalent to an exponential distribution), and a decreasing hazard for negative shape parameters. Under the above model, the probability density function of the event times takes form If the shape parameter is negative, if time is allowed to tend toward infinity, there is a non-zero probability of living forever. As such, in the function used to simulate event times (available in the aforementioned joineRmeta package), when the Gompertz distribution is employed, event times are simulated under a two step process. First, for each individual i within study k, the following two conditions are checked (using the realization from the U(0, 1) distribution, U ki ): If the conditions are both true, the individual is automatically assigned an event time of infinity; otherwise, their event time is generated under Equation (4). The censoring times were simulated under an exponential distribution with parameter cens . As such, individual censoring times T Cki are generated using equation The event rate of the simulated data was controlled through the censoring process. Due to the volume of planned simulations, only datasets with a "low" (∼25%) event rate were generated. A range of censoring parameters were tested to obtain datasets with mean event rate at 25%, leading to setting cens = exp (−0.426). The survival time for each individual was the minimum of their censoring and event times (T Ski = min (T Eki , T Cki )). All data used in the simulation studies were simulated under the models shown in Equations (2) and (3), although certain parameter values were altered between different sets of simulations. Parameter values in the simulation sets were chosen such that deviations of different methods from the true parameter values would be clearly discernible. A summary of the values used for the different sets of simulations is given in Table 5. All simulation groups utilized the same fixed effect and error term variance values ( 10 = 1, 11 = 3, 12 = 2, 21 = 3, 2 e = 0.01). Additionally, throughout different sets of simulations, the individual level random effects covariance matrix D remained constant (defined in Table 5). However, the remaining aspects of the datasets (association parameters, number of included studies, level of between study heterogeneity) varied between simulation sets. These aspects are stated in Table 5 and are briefly discussed in the following  ( 21 ) Individual level association (2) = (0, 0.5, 1) (2) = 0.5 (2) = 0.5 parameter ( (2)

Simulation set 1: Varying levels of association
In practice, the magnitude of the association between the longitudinal and time-to-event outcomes at the individual and the study level of the data could impact the performance of the model groups defined in Section 2. Consequently, we performed a simulation investigation to assess the effect of varying magnitudes of association at different levels.
The individual level association parameter (2) and the study level association parameter (3) were permitted to take values 0, 0.5, and 1, giving a total of 9 unique scenarios. The number of included studies in each dataset equalled 5, whereas the study level random effects covariance matrix A (Table 5) remained constant across scenarios.

Simulation set 2: Varying numbers of studies included in the meta-analysis
The models introduced in Section 2 that include study level random effects may not reliably estimate the distribution of the study level random effects unless the number of studies included in the meta-analysis is large. In addition, models including fixed interaction terms between study membership and treatment group may become unwieldy or difficult to estimate as the number of included studies increases. To investigate this, simulations were conducted comparing one-stage analyses of joint data for datasets containing 5, 10, or 15 studies.
During this set of simulations, the association parameters were held constant across scenarios (with (2) = (3) = 0.5). Additionally, the study level random effects covariance matrix A (Table 5) remained constant across scenarios.

Simulation set 3: Varying levels of between study heterogeneity
Finally, the level of between study heterogeneity could affect the behavior of the different one-stage models described in Section 2. As such, the third set of simulations alters the study level random effects covariance matrix A across different scenarios, to increase or reduce between study heterogeneity. Values taken for A, labeled A 1 , A 2 , and A 3 are specified in Table 5, representing cases for no between study heterogeneity and, then, two increasing levels of between study heterogeneity.
During this simulation set, across all scenarios, 5 studies were simulated for each dataset, with association parameters held constant across scenarios at (2) = (3) = 0.5.

Models fitted to simulated data
Model groups 0 through 5 (as defined in Table 1) were fitted to each of the datasets simulated for each scenario within each set of simulations. As the data were simulated under a joint model of structure from Model Group 3, the results of fitting examples of Model Group 3 to the data could be expected to provide less biased results than the other model groups.

Reporting of simulation results
For model groups that estimated study specific parameters (the longitudinal treatment effect in model groups 1 and 4, and the time-to-event treatment effect in model group 1), overall pooled effects have been reported by combining study specific estimates using methods equivalent to conducting a random effects MA of study level results. 14,35 Results are reported as the mean estimate produced across studies (SE between simulation estimates) [coverage], where SE is the standard error (the standard deviation) of the produced estimates. As defined by Burton et al, 36 and using a significance level of = 0.05, coverage is calculated as the proportion of times the 100(1 − )% confidence intervals for parameter estimatêv, defined bŷv ± Z 1− ∕2 SE(̂v), includes the "true" value of parameter that the data were simulated under (where Z 1 − /2 ≈ 1.96 for significance level 0.05, and v takes values 1 to the total number of simulations performed, here 1000). Where parameters are not estimated for a model group (eg, (3) for model groups not including study level random effects), an NA is printed. The total number of successful model fits are also reported. As the joint models were fitted using the EM algorithm, 22 separate longitudinal and time-to-event models were automatically fitted to determine suitable starting values for the algorithm. Consequently, the number of failed fits was equal for the separate and joint model analyses.

Results of simulation investigations 6.2.1 Results of simulation set 1: Differing levels of association
The results of Simulation Set 1 are presented in Tables 6 and 7. Graphical representations of the mean estimates displayed in Tables 6 and 7

Longitudinal treatment effect ( 12 )
Throughout Simulation Set 1, the mean pooled longitudinal treatment effect estimate was similar in magnitude between the separate and joint one-stage analyses. The coverage for model group 0 was poor for both the separate and joint analyses; however, the coverage for the remaining model groups for the separate longitudinal one-stage MA-model was consistently high. Conversely, the joint one-stage MA-model results displayed high coverage for models that did not include study level random effects, but low coverage across all levels of association for any model group that involved study level random effects. The reason for the comparable mean estimates, but differing coverage, between the separate and joint one-stage MA-models, is identifiable through examination of the results from each separate scenario (Supplemental Figures S17-S28). The confidence intervals for 12 for joint one-stage models involving study level random effects were quite narrow, leading to poor coverage even though the point estimates are clustered about the "true" value of 12 .

Time-to-event treatment effect ( 21 )
For all scenarios investigated in Simulation Set 1, the width of confidence intervals for estimates of 21 increased for both separate and joint analyses, as (3) increased in magnitude. The results from separate or joint analyses for model group 0 (which ignored between study heterogeneity) were poor when there was non-zero association.
When individual level association was zero, the estimates produced by the separate analyses for 21 were close to their "true" value of 3; however, the separate analyses underestimated 21 when (2) was non-zero. For the separate analyses, for (2) = 0, coverage for 21 estimates decreased as study level association increased; however, when (2) ≠ 0, coverage was close to 0. For the joint analyses, for any model group that accounted for between study heterogeneity in some way (model groups 1-5), the mean estimates were close to the "true" value of 21 for all model groups; however, model groups 2, 3, and 5 displayed mean estimates diverging from the "true" value of 21 as the magnitude of the "true" (3) value increased. Coverage was good across all scenarios for model group 1. For the remaining model groups, coverage decreased as the magnitude of the "true" (3) value increased, although coverage was good for joint models from any of model groups 1 to 5 when (3) = 0.

Association parameters ( (2) , (3) )
The individual level association parameter (2) was poorly estimated by model group 0. However, the estimates of (2) were close to the "true" parameter value for model groups 1, 2, 4, and 5, with good coverage. However, for model group 3, which solely accounted for between study heterogeneity using study level random effects, where the "true" (2) was non-zero, as the magnitude of (3) increased from zero, the mean parameter estimate decreased in magnitude, with corresponding decrease in coverage.
The estimation of the study level association parameter was poor in model groups 2 and 5, with large coverage values explained by wide confidence intervals (Supplemental Figures S22-S24). Mean estimates of (3) were closer to the "true" values in model group 3, although they were still underestimated. Coverage for all model groups that estimated (3) decreased as the value of the "true" (3) increased.

Summary
Under a one-stage joint model containing a single longitudinal and single time-to-event outcome, with association structure sharing both individual and study level random effects (when present), with common association parameter at each level, separate time-to-event one-stage MA-models appeared to behave poorly when (2) ≠ 0; however, joint one-stage MA-models displayed issues when study level random effects were shared between sub-models.

Results of simulation set 2: Differing numbers of included studies
The results of Simulation Set 2 are presented in Table 8. Graphical representations of the mean estimates displayed in Table 8

Longitudinal treatment effect ( 12 )
Across all scenarios investigated, for both the separate and the joint analyses, the mean estimate for the longitudinal treatment effect 12 was close to the "true" value of 2. Coverage was poor for both the separate and joint analyses for model group 0, which ignores between study heterogeneity. Coverage was consistently good for the separate analyses in the remaining model groups and good for joint models from model groups 1 and 4. However, coverage was poor from joint models for model groups involving study level random effects.

Time-to-event treatment effect ( 21 )
For the time-to-event treatment effect 21 , we saw mean estimates from the joint analyses closer to the "true" value of 3 for the joint analyses than the separate analyses. Coverage for the separate analyses was below 6% for all scenarios investigated, whereas coverage for the joint models appeared best for model group 1 (above 85%), followed by model groups 4 and 5 (above 69%). Coverage was noticeably lower for model group 0, which ignored between study heterogeneity, and coverage decreased for model groups 2 and 3 as the number of included studies increased.

Association parameters ( (2) , (3) )
The mean estimate for the individual level association was close to the "true" value of 0.5 for model groups 1-5, with slightly worse estimates from model group 0. Coverage was good for model groups 1, 2, 4, and 5. However, coverage decreased with increasing the number of studies for model groups 0 and 3. Study level association was poorly estimated in model groups 2 and 5, with estimates closer to the "true" value of 0.5 for model group 3. However, coverage was consistently poor and decreased with increasing the number of included studies.

Summary
Under a one-stage joint model containing a single longitudinal and single time-to-event outcome, with association structure sharing both individual and study level random effects, with common association parameter at each level, there appeared to be little benefit of increasing the number of included studies. However, this result may not hold for other association structures, eg, just sharing individual level random effects between studies.

Results of simulation set 3: Differing levels of between study heterogeneity
The results of Simulation Set 3 are presented in Table 9. Graphical representations of the mean estimates displayed in Table 9 are provided in Supplementary Figures S37-S40, with representations of the point estimates from each simulation given in Supplemental Figures S41-S44. There were issues with model fitting for a large proportion of simulations for model groups involving study level random effects when there was no between study heterogeneity (A = A 1 ); otherwise, the proportion of successful fits was 99.8% or over.

Longitudinal treatment effect ( 12 )
Across scenarios investigated, the mean estimated longitudinal treatment effect produced by both the separate and joint one-stage MA-model were close to the "true" parameter values. Coverage of estimates produced by model group 0 was good from both the separate and the joint one-stage MA-models when no between study heterogeneity existed; however, coverage decreased as between study heterogeneity increased. For the remaining model groups, coverage was consistently good for the separate analyses, but joint analyses involving study level random effects displayed decreasing coverage as between study heterogeneity increased.

Time-to-event treatment effect ( 21 )
Throughout the scenarios investigated, the time-to-event treatment effect was consistently underestimated by the separate analyses compared to the joint (which displayed estimates closer to the "true" value of the parameters). Models involving study level random effects showed estimates diverging slightly from the "true" value as between study heterogeneity increased. Coverage was consistently good for model group 1; however, the remaining model groups displayed decreasing coverage as between study heterogeneity increased.

Association parameters ( (2) , (3) )
The mean estimate for individual level association (2) was good for model groups 1, 2, 4, and 5, with corresponding high coverage. However, model groups 0 and 3 showed mean estimates increasingly below the true value, with corresponding decreasing coverage as between study heterogeneity increased. Mean estimates for study level association (3) were poor for model groups 2 and 5, and closer to the true value for model group 3. Coverage was good for model groups 2 and 5 for the case of no between study heterogeneity, and decreased as between study heterogeneity increased. However, examination of Supplemental Figure S44 indicates that wide confidence intervals explained the higher coverage at no between study heterogeneity, with the width of confidence intervals decreasing as between study heterogeneity increases. Coverage was relatively constant but not good for model group 3 across examined levels of between study heterogeneity.

Summary
Under a one-stage joint model containing a single longitudinal and single time-to-event outcome, with association structure sharing both individual and study level random effects, with common association parameter at each level, model group 1 appeared to be the most consistently reliable modeling option. However, as noted earlier, this result may not hold for other joint model specifications.

DISCUSSION
In this research, we have presented and investigated a variety of models for use when analyzing multi-study joint longitudinal and time-to-event data. Analyses of single study joint datasets are increasing. 8 Ensuring availability of appropriate methods for the meta-analysis of such data is vital, in order to maximize use of available data and better inform healthcare decisions. We have examined a range of the possible modeling options; however, other combinations of the approaches discussed here to account for between study heterogeneity are also possible. Each of the model groups examined presents a range of advantages and disadvantages. Models that use fixed effects to account for between study heterogeneity estimate K − 1 parameters for each term involving study membership (one for each study apart from the reference study). As such, results may not be generalizable to external studies, and the number of parameters estimated quickly increases as the number of studies included in the meta-analysis increases. However, such methods do allow calculation of effect sizes within each study (although this is not a primary aim of meta-analyses). Conversely, use of study level random effects accounts for between study heterogeneity, but study specific effect estimates are not generally automatically provided (unless the estimates of the random effects can be extracted from models fitted). However, this should not be an issue, as meta-analyses aim to pool data rather than provide study specific estimates. The number of parameters to be fitted due to study level random effects does not increase as the number of included studies increases. However, the distribution of the random effects may be poorly estimated unless a large number of studies are included in the meta-analysis.
Additionally, model groups with a common baseline hazard across studies assume proportional hazards across all studies included in the meta-analysis. However, model groups that stratify the baseline hazard by study assume proportional hazards within but not across studies. This may be a more reasonable assumption, especially if the demographics of the studies differ.
The simulation investigation displayed poor performance for models that ignored any between study heterogeneity present in the data. Consequently, it is clear that accounting for any between study heterogeneity present in multi-study joint data is vital. The most consistently well-performing model group was model group 1, which accounted for between study heterogeneity using fixed study membership and interaction between study membership and treatment assignment in both sub-models. The remaining model groups for the joint analyses showed issues under various scenarios. As the coverage was good for separate models for any model group that accounted for between study heterogeneity, the poor coverage in the joint analyses for model groups 2, 3, and 5 may be due to the dual use of the study level random effects to account for between study heterogeneity and account for study level behavior in the link between the longitudinal and time-to-event outcomes. It may be that this dual use is not possible, unless an unrealistically large number of studies are included in the meta-analysis.
While point estimates were similar in magnitude between the separate and joint analyses for the longitudinal treatment effect, we note bias in the estimates of the time-to-event treatment effect from separate analyses where a non-zero association between the longitudinal and time-to-event outcomes is present. This behavior has previously been noted in single study cases by Guo and Carlin, 18 and in two-stage joint MA analyses by Sudell et al, 16 our research confirms that this issue persists for one-stage analyses. This behavior may be comparable to the established situation where omission of covariates from Cox models causes bias in estimated effect parameters. [37][38][39] The W 2ki (t) term is included in the joint time-to-event sub-model but is not present in the separate time-to-event sub-model. Where association is present (ie, where (2) ≠ 0 or (3) ≠ 0), the joint analyses model risk of an event associated with the longitudinal outcome through the W 2ki (t) term. This term (which has an effect on risk of an event when association is present) is not included in the separate time-to-event model, giving a possible explanation for the observed biased treatment effect estimates. As noted in the work of Sudell et al, 16 similar behavior was not observed between the separate and joint longitudinal analyses as the model specifications for the longitudinal trajectory are identical in both cases. As such, it is recommended that joint one-stage MA-models are used in place of separate time-to-event one-stage MA-models where significant association exists. This can be assessed prior to analyses through plotting of the longitudinal trajectories paneled by event type 16 ; differences between the trajectories between those censored and experiencing an event can indicate presence of such an association.
The models investigated utilized an unspecified baseline hazard in the time-to-event sub-model. Hseih et al 40 noted that when unspecified baseline hazards are used in a joint model, standard errors should be obtained through bootstrapping procedures to avoid their underestimation. As such, the time commitment to perform bootstrapping procedures on large meta-datasets was considerable. Performing bootstrapping procedures on a standard computing environment took several days for the real dataset. Consequently, bootstraps were performed in parallel using the University of Liverpool's HTCondor system (see the work of Litzkow et al, 41 https://research.cs.wisc.edu/htcondor/, and http://condor.liv.ac.uk/), which was also used to run the simulations, with the results compiled using purpose written code rather than relying on single computer bootstrapping procedures. Researchers using large datasets without coding experience or access to such computer systems may experience issues conducting large scale joint data meta-analyses.
In our clinical example, we assume common association parameters across treatment groups. However, in reality, the association between the longitudinal blood pressure and risk of an event could differ between those assigned to any treatment for hypertension versus those assigned to placebo, no treatment, or usual care. 42 In single study cases, association structures involving interactions between baseline covariates and the association parameters have been presented 2,4 ; however, this association structure has not yet been investigated in meta-analytic joint models.
The research presented here prompts a range of future areas of research. Investigation of one-stage joint MA-models with varying association structures, including sharing only individual level random effects or sharing the current value of the longitudinal trajectory, is vital. Additionally, it is vital to investigate alternative modeling options, such as alternative baseline hazard specifications, which could reduce model fitting times by removing the necessity of bootstrapping. In addition, in our simulation study, we assumed common longitudinal measurement schedules across the included studies, identical numbers of individuals recruited to each study, and common association parameter across studies. Further simulation investigations varying these aspects could provide additional useful information for future joint data meta-analyses.
In conclusion, this research indicates the benefit of the one-stage meta-analysis of joint longitudinal and time-to-event data where significant association exists between the longitudinal and time-to-event outcomes. Given the current research, it is recommended that analyses do not rely on models that share study level random effects between sub-models. Further research into one-stage joint MA-models is required.