Mixed effects models for healthcare longitudinal data with an informative visiting process: a Monte Carlo simulation study

Electronic health records are being increasingly used in medical research to answer more relevant and detailed clinical questions; however, they pose new and significant methodological challenges. For instance, observation times are likely correlated with the underlying disease severity: patients with worse conditions utilise health care more and may have worse biomarker values recorded. Traditional methods for analysing longitudinal data assume independence between observation times and disease severity; yet, with healthcare data such assumptions unlikely holds. Through Monte Carlo simulation, we compare different analytical approaches proposed to account for an informative visiting process to assess whether they lead to unbiased results. Furthermore, we formalise a joint model for the observation process and the longitudinal outcome within an extended joint modelling framework. We illustrate our results using data from a pragmatic trial on enhanced care for individuals with chronic kidney disease, and we introduce user-friendly software that can be used to fit the joint model for the observation process and a longitudinal outcome.

Informative visiting process; Longitudinal data; Electronic health records; Mixed-effects models; Selection bias; Inverse intensity of visiting weighting; Monte Carlo simulation; Recurrent event models;

| INTRODUCTION
The analysis of longitudinal data is essential to understand the evolution of disease and the effect of interventions over time. A source of longitudinally recorded data that is being used increasingly often in medical research is health care consumption data: that is, data sources that have been constructed by extracting and linking electronic health records from primary, specialist, and hospital care with other data sources such as nationwide registries for epidemiological surveillance. Several examples of cohorts constructed in such a way are emerging in a variety of medical fields: among others, kidney disease (Hemmelgarn et al., 2009;Runesson et al., 2015), cardiovascular disease (Denaxas et al., 2012), and end-of-life healthcare (Tanuseputro et al., 2015). Data cohorts constructed by extracting medical records have thousands -if not millions -of individuals with hundreds of measurements each: the availability to researchers of such vast amount of data allows answering more relevant and detailed clinical questions but poses new challenges. In terms of reporting, guidelines have emerged to improve discovery, transparency, and replicability of research finding utilising routinely collected data (Benchimol et al., 2015). In terms of methodological challenges, first and foremost, observation times are likely to be correlated with the underlying disease severity in healthcare consumption datasets.
For instance, individuals tend to have irregular observation times as patients with more severe conditions (or showing early symptoms of a disease) tend to visit their doctor or go to the hospital more often than those with milder conditions (and no symptoms). Their worse disease status is also likely to be reflected in worse biomarkers being recorded at such visits, causing abnormal values of such biomarkers to be overrepresented and normal values to be under-represented.
Taking this pattern to the extreme, healthy individuals may not appear in health records at all leading to cohort selection bias; this is a separate issue that is not dealt with in this manuscript.
Traditional methods used to analyse longitudinal data rely on the assumption that the underlying mechanism that controls the observation time is independent of disease severity; however, that is unlikely with health care consumption data. It can be shown that failing to account for informative dropout in a longitudinal study could yield biased estimates of the model parameters (Wu and Carroll, 1988;McCulloch et al., 2016), and so does näively applying traditional methods when the follow-up is irregular and related to the outcome (Pullenayegum and Lim, 2016). Despite the potential for bias, there is some evidence pointing towards a lack of awareness of the potential for bias in longitudinal studies with healthcare data irregularly collected over time: in a recent literature review on the topic, Farzanfar et al. (2017) showed that 86% of studies did not report enough information to evaluate whether the visiting process was informative or not, and only one study used a method capable of dealing with an informative observation process. This is concerning when the aim of a research project is aetiology.
Bias may arise when data on covariates and outcomes is collected at irregular, subject-specific intervals: in fact, when analysing data originating from electronic health records, data is collected only when study subjects consume health care (e.g. by visiting their doctor or going to the hospital). As a consequence, visit times are likely to be informative and to depend on the clinical history of an individual. The visiting process in this setting is therefore deemed to be informative (or dynamic, outcome-dependent). The bias that one may encounter when the observation process is informative can be classified in two types: selection bias or confounding (Hernán et al., 2009). Selection bias arises because of the selection of observed individuals only in the analysis. This bias is the same bias induced by informative censoring due to loss to follow-up (Hernán et al., 2004): censoring is the extreme case of an observation process where an individual is not observed ever again. Conversely, confounding arises when there are common causes of both the exposure and the outcome, e.g. when the consequent visit times are decided by physicians or patients based on e.g. current health status, which itself is associated with the observed longitudinal outcome. Hernán et al. (2009) describe selection bias and confounding originating from dynamic observation processes more in detail, including directed acyclic graphs (DAGs) that illustrate the underlying causal mechanism.
In the past years several methods have been developed to deal with longitudinal data terminated by informative dropout (Kurland et al., 2009); conversely, the problem of informative visit times has received considerably less attention.
Despite that, a few methods emerged that can be broadly categorised in two families: methods based on inverse intensity of visit weighting (IIVW, an extension of inverse probability of treatment weighting, Robins et al. (1995)) and methods based on shared random effects (Liu et al., 2008). An introduction to the various methods is presented elsewhere (Pullenayegum and Lim, 2016). Nevertheless, to the best of our knowledge, there is only one comparison existing in the current literature, which yielded negative results: Neuhaus et al. (2018) conclude that fitting ordinary linear mixed models disregarding the observation process yielded the smallest bias and showed that adding regular visits to the observation schedule (if possible) reduced that bias even further.
Throughout this paper, we focus on the problem of informative visiting process by assuming that the dropout process is not informative. First, we describe characteristics of the observation process and we define when it can be deemed informative in Section 2. Then, we introduce a joint model for the observation and longitudinal processes that can be easily extended within a multivariate generalised linear and non-linear mixed effects models framework (Crowther, 2017) in Section 3, and introduce the IIVW method in more detail in Section 4. We compare the performance of this model against other alternatives that have been introduced in the literature via Monte Carlo simulation in Section 5. Finally, we illustrate the use of the joint model using data from a pragmatic trial in chronic kidney disease and discuss our conclusions in Sections 6 and 7, respectively.

| CHARACTERISTICS OF THE OBSERVATION PROCESS
An observation process can have regular or irregular visits. With regular visits, the j th visit time for the i th individual T i j is the same for all individuals: T i j = t j i , j , with i = 1, 2, . . . , n and j = 1, 2, . . . , n i . Conversely, with irregular visits that is no longer true. With irregular visits, the observation process -denoted by the counting process N i (t ) -can be defined to be completely at random when visit times and outcomes are independent (Pullenayegum and Lim, 2016): , and t − being the instant of time right before t .Ȳ i (∞) andX i (∞) denote the values of outcome and covariates for any t > 0.
The observation process can be deemed informative when it is not completely at random, i.e. when the condition above is not verified. In that case, it is possible to identify the following two scenarios: • Observation process at random, when visiting at time t is independent of the outcome at time t given data recorded up to time t : • Observation process not at random, where the definition of missing at random does not hold. That is, the scenario where visiting at time t is not independent of the outcome at time t , even after conditioning on data recorded up to time t : Gruger et al. (1991) illustrate four possible models that could be linked to the above-mentioned scenarios: 1. the examination at regular intervals model, consisting of observation times that are pre-defined and equal for all patients. This scenario yields so-called balanced panel data; 2. the random sampling model, consisting of a sampling scheme (e.g. an observation process) that is not pre-defined, but still independent of the disease history of the study subjects; 3. the doctor's care model, consisting of an observation process that depends on the characteristics of the patient at the moment of the current doctor's examination. For instance, a doctor could require stricter monitoring for subjects with more advanced disease status, or with abnormal values of a biomarker; 4. the patient self-selection model, yielding observations that are triggered by the patients themselves. According to this model, patients may choose to visit their doctor when they feel unwell, or they may choose to skip a visit that was pre-planned when they feel the treatment they are receiving is not beneficial to their health status. Unfortunately, the factors that cause patients to self-select themselves are generally unknown or not recorded.

Models
(1) and (2) could be characterised as observation completely at random; model (3) could be characterised as observation at random; finally, model (4) could be characterised as observation not at random.

| A JOINT MODEL FOR THE OBSERVATION PROCESS AND A LONGITUDI-NAL OUTCOME
Let D i j (t ) = I (T i j = t ) denote the presence of an observation at time t for the i th individual: at each D i j (t ) = 1 a new observation of the longitudinal outcome Y i j is recorded. Lett i j be the gap time between the j th and (j + 1) th measurement for the i th individual. Letd i j be the binary indicator variable that denotes whether the gap-timet i j is observed (or not). In practice, gap-time are always observed except when the observation process is censored at the end of follow-up, e.g. the date when the data extraction occurs. Let z i j be the covariate vector for the longitudinal outcome, and w i the covariate vector for the observation process; z i j and w i do not necessarily overlap, and it is assumed that both could be extended to include time-dependent exogenous covariates (e.g. w i j ). We model the observation process and the repeated measures process using a joint longitudinal and survival model. Conditional on random effects u i , the submodel for the time to each observation is a proportional hazards model with hazard for gap timet i j : where θ t = β . The submodel for the j th longitudinal observation for the i th individual is where i j ∼ N (0, σ 2 ) and θ y = {α, γ, σ 2 }.
Equation 1 is a recurrent events model for the observation process, with r 0 (t i j ) any parametric or flexible parametric (Royston and Parmar, 2002) baseline hazard function (also referred to as baseline intensity -we use the terms hazard and intensity interchangeably throughout this manuscript). Equation 2 is a linear mixed model for the longitudinal outcome with a random intercept v i . The two processes are linked together via the shared, individual-specific, random effect u i . Including the γ parameter in the longitudinal model allows for an association between the two equations, association that will be estimated from data; when γ = 0, the two processes are independent of each other; that is, the observation process is not informative. Finally, we assume that the random effects follow a multivariate normal distribution with null mean vector and variance-covariance matrix Σ u,v .
The model is fitted using maximum likelihood; the individual-specific contribution to the likelihood can be written as: is the contribution to the likelihood of the time to the j th observation in individual i, is the contribution of the j th longitudinal observation, and p(b i |θ b ) is the density of the random effects. The likelihood does not have a closed form, as it is necessary to integrate out the distribution of the random effects; methods such as Gaussian quadrature and Monte Carlo integration can be used for that purpose (Pinheiro and Bates, 1995).
A simplified DAG that illustrate how the joint model accounts for the correlation between a longitudinal outcome Y and its observation process R is included as Figure 1 (Liu et al., 2018); X represents covariates included in the model, and U represents the shared random effects. After adjusting for all covariates (e.g. confounders) X , the longitudinal outcome and the observation process are associated only through the shared U . However, when estimating the joint model we assume a distribution for U (e.g. Gaussian) and we integrate it out of the marginal likelihood, blocking the path between Y and R . Therefore, for the joint model to be valid the observation process has to be at least at random, according to the definition of Section 2.
This model is nested within a wide family of multivariate generalised linear and non-linear mixed effects models (Crowther, 2017). The model presented in this section can easily be extended to multiple random effects (potentially nested within each other), to different parametric and flexible parametric baseline hazard formulations for the recurrent Y X R U F I G U R E 1 Simplified DAG depicting a joint model for a longitudinal outcome and its observation process.
events model, and to include other outcomes (e.g. a dropout process, or a second longitudinal outcome); we focus on the model formulated in this section for simplicity. Finally, this joint model (and several extensions) can be easily fitted in Stata using the user-written command merlin (Crowther, 2018). We produce example code that is included in the Online Supplementary Material.

| INVERSE INTENSITY OF VISIT WEIGHTING
The bias induced by an informative observation process can be adjusted for by using the inverse intensity of visit weighting [IIVW] method first proposed by Robins et al. (1995) as an extension of the inverse probability of treatment [IPW] method (Cole and Hernán, 2008). This method was further developed by Bůžková and Lumley (2007), and there are a few examples of this method applied in practice (Van Ness et al., 2009;Bůžková et al., 2010). The IIVW approach accommodates an informative observation process in a marginal regression model by weighting each observation by the inverse of the probability of each measurement to be recorded. This approach creates a pseudo-population in which the observation process is static and can be ignored. The weights can be estimated by fitting a regression model including all covariates that inform the observation process and further stabilised to increase efficiency (Cole and Hernán, 2008). The weighting model could include current and past values of any covariate that may affect the visiting process; however, as with IPW, all covariates that might be related to the observation process should be included in the weighting model -otherwise, bias will incur.
The approach we illustrate follows from Van Ness et al. (2009). The model used to estimate weights is an Andersen-Gill recurrent events model (Andersen and Gill, 1982) for the observation process, assuming a gap-time scale (as described in Section 3): wheret are gap-times between consecutive observations, r i (t ) is the intensity of visit for individual i at gap-timet , r 0 (t ) is the unspecified baseline intensity at gap-timet , and z i is a vector of coefficients that are assumed to accurately describe the observation process for individual i . η is a vector of regression coefficients that is estimated using the Cox partial likelihood method and a robust jack-knife estimator for the variance of the regression coefficients. The inverse intensity of visit weights are estimated by taking the inverse of the linear predictor exp(z iη ) at each time point, and further normalised by subtracting the mean inverse weight and adding the value 1 to each weight; the distribution of the weights is therefore centred on the value 1. Finally, two further adjustments are needed. First, since the last data entry for each individual represents the end of follow-up of the study, each weight is shifted by one time point. Second, given that each individual is observed at least once (i.e. at baseline), a weight of one is assigned to the first observation of each individual.
The marginal model for the longitudinal outcome is then fit using generalised estimating equations and including the normalised inverse intensity of visit weights as probability weights in the model. The model has the form E (y i j ) = α 0 + Z i α 1 + t i j α 2 , and can be fit using readily available statistical software. We use the Stata command glm.

Aim
We design a simulation study aimed to assess the impact of ignoring the observation process in longitudinal mixed-effects models when the observation process is informative.

Data-generating mechanisms
We simulate data from the following joint model: representing a binary treatment, simulated from a Bernoulli random variable with probability 0.5: Z i ∼ Bern(1, 0.5). The coefficient associated to the treatment variable is β = 1 for the observation process, α 1 = 1 for the longitudinal process. The fixed intercept of the longitudinal model is α 0 = 0, and the fixed effect of time is α 2 = 0.2. The random effects u i and v i are simulated from a Normal random variable with null mean and variance σ 2 u = 1 and σ 2 v = 0.5, respectively. The residual error of the longitudinal model is assumed to follow a Normal distribution with null mean and variance σ 2 = 1. We assume independence between the random effects and the residual variance, and between random effects (i.e. Σ u,v is a diagonal matrix with diag (Σ u,v . We assume independent random effects for simplicity, but we show in the Online Supplementary Material how to fit a joint model with correlated random effects. The joint model with correlated random effects can be thought of as a reparameterisation of the joint model with independent random effects, where the association parameter γ is related to the correlation between the two random effects in the bivariate version. The baseline hazard from the recurrent visit process is assumed to follow a Weibull distribution with shape parameter p = 1.05; we vary the scale parameter λ and therefore the baseline intensity of the visiting process, with λ = {0.10, 0.30, 1.00}. This baseline intensities along with the value of β correspond to an expected median gap time between observations of 5.83 and 2.25 years for unexposed and exposed individuals if λ = 0.10, 2.05 and 0.79 years if λ = 0.30, and 0.65 and 0.25 years if λ = 1.00, respectively. Each observation time is simulated using the inversion method of Bender et al. (2005), assuming a gap time scale (where the time index is reset to zero after the occurrence of each observation; the resulting recurrent events model is then a semi-Markov model). We vary the association parameter γ between the two sub models, with γ = {0.00, 1.50}; we expect all models to perform similarly when γ = 0, that is, when the longitudinal process is independent of the observation process.
In addition to simulating data from the joint model above, we generate the observation process by drawing from a Gamma distribution. Specifically, we draw the observation times from a Gamma distribution with shape = 2.00 and scale: where ξ i is simulated from a Normal distribution with null mean and variance σ 2 ξ = 0.1. Z i is the same binary treatment covariate as before, with the same associated parameter β = 1. The value of ψ defines the association between the observation, e.g. when ψ = 0 the observation process is not informative; we set ψ = {0.00, 2.00}. We also simulate a scenario where the observation process depends on treatment and on previous values of the longitudinal outcome Y . In this setting, we draw observation times from a Gamma distribution with shape = 2.00 and scale: for the j th observation time of the i th individual, with ψ = 2.00 and ω = 0.20. Finally, we simulate a scenario from a joint model to which we add regular (i.e. planned) visits every year, as suggested by Neuhaus et al. (2018). We simulate this scenario from the above-mentioned joint model, and we set γ = 3.00 and λ = 0.05 to obtain an observation process that is sparse and strongly associated with the longitudinal outcome.
We simulate 200 study individuals under each data-generating mechanism and the recurrent observation process continues for each individual until the occurrence of administrative censoring -which we simulated from a Unif(5, 10) random variable.
We define the last gap time for each individual as the difference between the last observation and the censoring time.

Estimands
The main estimand of interest is the vector of regression coefficients α = {α 0 , α 1 , α 2 }, with specific focus on the treatment effect α 1 . In the Online Supplementary Material, we also report on the estimated association parameter γ and on the estimated variance of the random effects and the residual errors: σ 2 u , σ 2 v , and σ 2 .

Methods
We fit five competing models to each simulated dataset: 1. model A, the joint model described above (at the beginning of the "Data-generating mechanisms" section) and corresponding to the true data-generating mechanisms when simulating data from a joint model; Model A is fit using merlin (Crowther, 2018) and gsem in Stata. Model B follows from previous work by Goldstein et al. (2016), where they demonstrate that conditioning on the number of health-care encounters it is possible to remove bias due to an informative observation process (they denote this bias as "informed presence bias"). We therefore include the number of observations per individual, centred on the mean value, in a mixed effects model for the longitudinal outcome: with v i a random intercept and n c i the number of observations for the i th individual. Model C is analogous to model B, adjusting for the cumulative number of measurements up to time j instead, denoted asn i t j : Model D is analogous to model B and C, assuming α 3 = 0. Model B, C, and D are fit using the mixed command in Stata.
Model A, B, C, D are fit assuming an independent structure for the variance-covariance matrix of the random effects.
Finally, model E is fitted following the two-stage procedure presented in Van Ness et al. (2009) and illustrated in Section 4.

Performance measures
We will assess average estimates and standard errors, empirical standard errors, bias, and coverage probability ofα m , with m = {0, 1, 2}. However, the main performance measures of interest are bias and coverage probability: the former quantifies whether an estimator targets the true value on average, while the latter represents the proportion of times that a confidence interval based onα m,k andŜE(α m,k ) contains the true value α m , with k indexing each replication.
We compute and report Monte Carlo standard errors to quantify the uncertainty in estimating bias and coverage . If we assume that Var(α m )≤ 0.1 (or, equivalently, SE(α m )≤ 0.32) and we require a Monte Carlo standard error for bias of 0.01 or lower, given that MCSE(Bias) = Var(α m )/K , we would require a number of replications K = 1,000. The assumed standard error is larger than the standard errors reported by Liu et al. (2008) for a model similar to model A. The expected Monte Carlo standard error for coverage, assuming a worst-case scenario of coverage = 0.50, would be 0.02 -which we deem acceptable. Therefore, we proceed by simulating 1,000 independent data sets for this simulation study.

Software
The simulation study is coded and run using Stata version 15, built-in functions (such as mixed, glm, gsem), and the user-written commands survsim (Crowther and Lambert, 2012) and merlin (Crowther, 2018); results of the simulation study are summarised using R (R Core Team, 2019) and the R package rsimsum (Gasparini, 2018). All the code required to simulate data, fit each model, and produce summary tables and figures is publicly available on the GitHub page of the first author (https://github.com/ellessenne/infobsmcsim).

Results
We focus on results for the estimated treatment effect α 1 , which are depicted in Figure 2. Tabulated values are included in the Online Supplementary Material, alongside results for the other regression coefficients α 0 , α 2 , estimated variances of the random effects, summaries for the association parameter γ, and convergence rates of each model under each data-generating mechanism.
Descriptive results Each simulated dataset had 200 distinct individuals; summary descriptive statistics for each data-generating mechanism are included in Table 1. The median sample size per simulated dataset varied between 666 and 4,482, the median number of measurements per individual varied between 2 and 13, and the median gap time between observations varied between 0.13 and 1.31 (years). In simulated scenarios from the joint model, as expected, a higher baseline intensity of visit process yielded more frequent measurements and a larger number of measurements overall; values were not affected by the association parameter. Results for non-informative observation processes When the observation process was not informative, all models estimated regression coefficients with null to negligible bias. Coverage probability of the regression coefficients was also optimal, with slight under coverage for the intercept term α 0 and the treatment effect α 1 for estimates originating from the IIVW model. Mean squared errors were similar across the range of scenarios with a non-informative observation process. Bias for the variance of the residual error term was null to negligible as well, with good coverage. Conversely, the variability of the random intercept v was estimated with slight negative bias from all models, with sub-par coverage (between 90% and 95 %); this is expected as we use maximum likelihood and not restricted maximum likelihood. Finally, the estimated variance of the random effect linking the two outcomes in the joint model was positively biased with coverage of approximately 75%; the magnitude of bias decreased as the baseline intensity λ increased.
Results for informative observation processes When generating data from a Γ distribution depending on treatment only, all models were able to estimate the regression coefficients with no bias, optimal coverage probability, and Results for the association parameter γ The estimating procedure worked well when the two sub models were not associated. For instance, there was no bias, coverage probabilities were optimal, and mean squared errors were small -irrespectively of the baseline intensity of visit λ. Conversely, when the sub models were associated (γ = 1.50) the estimated association parameter was slightly negatively biased (−0.11 to −0.06), with sub-optimal coverage (75% to 83%). Mean squared error decreased when the baseline intensity of visit increased. Finally, the scenario simulated from a joint model with a strong association parameter γ = 3.00 and regular visits showed the worst performance, with large negative bias (-3.7289), poor coverage, and large mean squared error. Including regular visits caused γ to shrink towards the null, with a median estimate of -0.7289.

Convergence rates
Convergence rates for all models included in this comparison were generally good. All models showed a perfect convergence rate of 100% except the joint model, which showed a lower convergence rate of 96% and 99% in two simulated scenarios, both with an informative observation process. However, the remaining scenarios showed a perfect convergence rate for the joint model as well.

| APPLICATION
We fit the models included in this comparison to data obtained from the "Primary-Secondary Care Partnership to Prevent Adverse Outcomes in Chronic Kidney Disease" (PSP-CKD) study (ClinicalTrials.gov Identifier: NCT01688141, Major et al. (2019)). PSP-CKD is a cluster randomised controlled pragmatic trial of enhanced chronic kidney disease (CKD) care against usual primary care management. 49 primary care practices from the Nene Clinical Commissioning Group, Northamptonshire, United Kingdom, were randomised to either enhanced care or usual care; informed consent was provided at the practice level. Adult individuals with CKD were identified from each practice by using a research version of the web-based CKD management and audit tool IMPAKT (available at http://www.impakt.org.uk/); all data was anonymised prior to removal from the primary care practice. Individuals were included if a recorded estimated glomerular filtration rate (eGFR) below 60 ml/min/1.73m 2 was found during 5 years before the date of randomisation; eGFR was estimated using the MDRD equation (Levey et al., 1999).
We extracted baseline data (collected retrospectively at the date of randomisation and up to 5 years prior) from the . We aim to evaluate whether the longitudinal eGFR trajectory before randomisation to treatment differs between males and females.
We start by evaluating whether the visiting process could be informative. First, we computed the Spearman's rank correlation between gap time and gender: (ρ = 0.01). The correlation coefficient was significantly different than zero. Second, we fitted a linear mixed model for gap time versus gender with a random intercept and a random gender effect, and we found a significant association, as females had an 8.56-days longer gap time (95% CI: 5.58 -11.54).
Finally, fitting the Andersen-Gill model for the observation process as described in Section 4 with gender as the only covariate included in the model yielded a hazard ratio of 0.9589 (with 95% C.I.: 0.9398 -0.9783) for females compared to males. In conclusion, we found gap time to be associated with gender; hence, we deem the visiting process to likely be informative.
We fit the models included in the comparison, with gender as the binary exposure variable. The joint model included gender as the only covariate in the observation process submodel, and so did the recurrent events model utilised to fit weights for the IIVW model. Overall all models estimated a similar longitudinal trajectory (Figure 4), with the IIVW model being the exception.
We saw in the results of our simulations in Section 5 that the IIVW model yielded biased results for the exposure and the intercept of the longitudinal model under a variety of scenarios, and we observe this difference in our applied setting as well. Interestingly, all other models performed similarly -even the mixed model adjusting for the total number of measurements; our simulations showed that the effect of a binary exposure was estimated with bias, but we did not saw this difference in practice.

| DISCUSSION
In this article, we formalise the problem of informative visiting process within a framework of multivariate gen- F I G U R E 4 Predicted longitudinal trajectories from the models fit to the application data from the PSP-CKD study.
The solid lines represent estimated trajectories for males, while the dashed lines represent trajectories for females. Colours identify the model.
it. To the best of our knowledge, there is only one comparison currently in the literature (Neuhaus et al., 2018), albeit they include different models in their comparison and simulate an informative observation process differently by first generating a grid of potential observation times and then relating the probability of being observed to a given functional form of current (or lagged) covariates. They also do not include a joint model analogous to the model introduced in our manuscript in Section 3 in their comparison.
As expected, the joint model that accounts for the informative observation process by modelling it via a recurrent events survival model performed best. Interestingly, the mixed effects model that disregarded completely the observation process performed worse than the joint model but outperformed other methods; the inflation in the variance of the random intercept of the plain mixed model seemed to capture part (if not most) of the variability due to the observation process, although this result needs to be thoroughly tested in more complex scenarios (e.g. with random effects of time, etc.). The mixed models adjusting for the total number of measurements or the cumulative number of measurements (as a time-varying covariate) performed worst, and we would not recommend their usage in practice in these settings; this finding contrasts the findings of Goldstein et al. (2016), although their settings were quite different than ours. Further to that, they acknowledged the potential for collider bias (due to conditioning on a collider, the number of measurements) when the phenotyping algorithm for determining the exposure has high sensitivity; indeed, in our settings, the sensitivity is perfect as there is no misspecification of the exposure. An additional possible explanation could be that in our settings the model adjusting for the total number of measurements is in fact conditioning on the future, as the total number of observations is not determined at the beginning of the study. This may be explaining the poor performance of this method in the settings of our simulations. The performance of the marginal model fitted using generalised estimating equations and inverse intensity of visit weights laid between the plain mixed model and the remaining mixed models; furthermore, its performance seemed to improve when the observation pattern became denser, except for the intercept term α 0 . This pattern was generally observed throughout all scenarios and models, as the performance seemed to increase with more frequent observation patterns; this finding is consistent with Hernán showed bias in all the settings of their simulation where the observation process was informative, even when adding regular visits to the study. To compute the weights of the IIVW approach applied researchers need to correctly specify the model for the visit process, a challenging task -especially when not all the information required to fit the correctly specified model is observed (or known). We also observed that the IIVW model performed quite differently than the other methods in our applied example, although the observed difference does not seem to be clinically relevant.
Most importantly, our simulations show that under the null all the approaches compared in this study produce unbiased estimates of the regression coefficients, the implication being that over modelling the observation process does not seem to introduce bias in the analysis. In settings where it is not clear whether the observation process is informative or not, fitting the joint model would provide applied researchers with a method for estimating (and testing) the association between the two outcomes: this could be especially useful e.g. as a sensitivity analysis of standard mixed-effects models.
The joint model for the observation process and a longitudinal outcome that we described in Section 3 can be further extended. For instance, additional random effects could be introduced in the model to account for, say, heterogeneity in the trajectory of the longitudinal outcome over time. The functional form of the effect of time (both fixed and random) could also be generalised by using fractional polynomials or splines; the longitudinal trajectories need to be modelled appropriately and best fit could be assessed via information criteria such as the AIC and BIC. In fact, in the applied example of Section 6 we assumed a linear effect of time on eGFR for simplicity; in actual applied projects one should assess whether the final model is correctly specified. One could also extend the model to account for time-varying treatments, in both the observation process and longitudinal outcome sub models. That would however require further investigations to assess the performance of the joint model in those settings.
We assumed the treatment to be constant over time for simplicity, but in real-life settings individuals are likely to start and drop treatment when deemed necessary by their treating physician. We assumed the baseline hazard of the recurrent events model for the observation process to follow a Weibull distribution: this assumption could be further relaxed, and one could assume any parametric function, or even use flexible, spline-based formulations (e.g. Royston and Parmar (2002)). Additionally, for diseases with a high mortality rate, a terminal event that truncates observation of the longitudinal process is likely to be informative in the sense that it likely correlates with disease severity. That is, dropout is likely to be informative as the tendency to drop out after the occurrence of a terminal event is related to the current level of the longitudinally recorded biomarker. The proposed model could be easily extended to include a third equation with a time to event submodel for the dropout process, as in Liu et al. (2008). All of these extensions can be fit within the general framework of Crowther (2017) using the Stata command merlin. Finally, we could explore the association structure between the two sub models. For instance, we could reverse the association structure and include γ in the observation submodel: in that setting, assuming a positive association, higher values of the longitudinal process would lead to a more frequent visiting process (and vice-versa in the setting of negative association). The observation process could also depend on lagged values of the longitudinal outcome or of the exposure; this would relax the semi-Markov assumption in some of our data-generating mechanisms. More biologically (and clinically) plausible association structures (such as the current value, current slope, cumulative effect parametrisations) could also be investigated; more details in Rizopoulos (2012).
In conclusion, it is important to account for the visiting process when analysing health care utilisation data and we showed that ignoring it leads to biased estimates. Given the wide range of applied settings in which this could be relevant, the review of Farzanfar et al. (2017) points towards a lack of awareness of the problem and the lack of readily available, user friendly software to fit more complex joint models; throughout this paper, we outlined a framework in which merlin could be easily used to fit complex joint model and help to reduce this translational gap. We provide example code using Stata in the Online Supplementary Material.