Individualized dynamic prediction of survival with the presence of intermediate events

Often, in follow‐up studies, patients experience intermediate events, such as reinterventions or adverse events, which directly affect the shapes of their longitudinal profiles. Our work is motivated by two studies in which such intermediate events have been recorded during follow‐up. In both studies, we are interested in the change of the longitudinal evolutions after the occurrence of the intermediate event and in utilizing this information to improve the accuracy of dynamic prediction of their risk. To achieve so, we propose a flexible joint modeling framework for longitudinal and time‐to‐event data, which includes features of the intermediate event as time‐varying covariates in both the longitudinal and survival submodels. We consider a set of joint models that postulate different effects of the intermediate event in the longitudinal profile and the risk of the clinical endpoint, with different formulations for the association structure while allowing its functional form to change after the occurrence of the intermediate event. Based on these models, we derive dynamic predictions of conditional survival probabilities which are adaptive to different scenarios with respect to the occurrence of the intermediate event. We evaluate the predictive accuracy of these predictions with a simulation study using the time‐dependent area under the receiver operating characteristic curve and the expected prediction error adjusted to our setting. The results suggest that accounting for the changes in the longitudinal profiles and the instantaneous risk for the clinical endpoint is important, and improves the accuracy of the dynamic predictions.


S2 Predictive performance comparison between extrapolation method and joint models with intermediate events in the SPRINT data
To illustrate the use of the predictive performance measures presented in Section 3, we hereby present a comparison in terms of the time-dependent area under the receiver operating characteristic curve (AUC) and the expected prediction error (PE) for the SPRINT data.
More specifically, we are interested to compare the joint model used for the analysis of the SPRINT data which postulates effects of both the current value and current slope of the systolic blood pressure trajectory on the instantaneous risk of the composite endpoint (as presented in Section 4.2) against the corresponding model that ignores the longitudinal data after the occurrence of the serious adverse event (Extrapolation method). Specifically, for the extrapolation method the following model was used for the evolution of the systolic blood pressure: while for the instantaneous risk the same model was used for both methods. To compare the two approaches, we randomly split the data in half to a training and test part. Then we fitted the models to the training part and evaluated the AUC and PE for both models using the test part. The evaluation was done at 6 time-intervals: t = 2.5, t = 2.75, t = 3, t = 3.25, t = 3.5, and t = 3.75 assuming a clinically relevant time interval of a quarter of a year ∆t = 0.25.
The results are shown in table S3.7. In the specific application the two approaches are virtually similar. This can be explained by the fact that the occurrence of a serious adverse event did not have a strong effect on the evolution of systolic blood pressure. However, even in this case both approaches achieve similar performance in terms of predictive accuracy.

S3 Sample R Code
In this section we provide a sample R code for fitting the joint models proposed in the paper and then utilizing them to derive predictions under different scenarios for the occurrence of intermediate events.
As an illustrative example we use one of the 500 datasets generated for the simulation study, presented in the paper, under scenario 1. Under this scenario we assume that the trajectory of the longitudinal outcome immediately changes (drops) at the occurrence of the intermediate event while the rate of increase (slope) also changes after the occurrence of the intermediate event, implying two special features of the longitudinal outcome that need to be considered.
To start with the code, the R package JMbayes needs to be installed and loaded.
# Install package JMbayes install.packages("JMbayes") # or for the latest development version of JMbayes # Note that package devtools needs to be installed and loaded install_github("drizopoulos\JMbayes") For the installation of JMbayes we do recommend the second option (via github) since it is frequently updated.
Before we begin fitting the models it is important to have a look in the dataset, since it needs to be appropriately prepared before we use it. Table S3.1 shows the data from two subjects in the dataset, one who experienced the intermediate event during follow-up and one that did not. The data are in long format and the following information are given: • ID: Subject's identification number.
• time: Time the measurement of the longitudinal outcome was obtained.
• Y: Observed longitudinal outcome. • Tstart: Starting time point for interval regarding the survival outcome in counting process format.
• Tstop: Stop time point for interval regarding the survival outcome in counting process format.
• Event: Event indicator corresponding to the interval defined by Tstart and Tstop. In Table S3.1 it is important to note that for subject 2, for which the intermediate event did not occur, data on the timing of the occurrence of the intermediate event are most likely to appear as missing observations in real life datasets. This causes a practical issue since in order to account for the special features, the occurrence of the intermediate event imposes on the longitudinal trajectory, we need to introduce appropriate time-varying covariates in the dataset, such as R (t) and t i+ which were discussed in Section 2 of the manuscript. That is we need to use the information we have regarding the timing of the occurrence of the intermediate event in creating these new variables, but doing so using missing observations will also result in missing observations for the new variables. To surpass this, we propose to change the missing observations for the timing of the occurrence of the intermediate event to arbitrarily large values. That is: # For the remainder the R symbol 'dat' will be used to refer to the dataset # Replace NAs for Interm.Evnt.Time with 1e+6 dat$Interm.Evnt.Time <-ifesle(is.na(dat$Interm.Evnt.Time), 1e+5, dat$Interm.Evnt.Time) Here we chose a large unrealistic value of 100000 years that is impossible to occur in reality. Table S3.2 shows how the data for the 2 nd subject changed.  Table S3.3 shows the updated data on the subset of the data that includes the two subjects we use in the example. We see that the two time-varying covariates are now included in the dataset. For subject 1, for all the time points following the occurrence of the intermediate event, the two time-varying covariates change accordingly whereas for subject 2 they are zero since the occurrence of the intermediate event was not observed during follow-up. Now that the data are ready, we can proceed to fitting the joint model. This is no different than fitting any other joint model with package JMbayes apart from including in the model specification the newly introduced time-varying covariates. Hence we first fit separately the Cox relative risk and mixed-effects submodels and then we pass the corresponding objects to function jointModelBayes() to obtain the final joint model fit: # Fit time-dependent Cox model cox.fit <-coxph(Surv(Tstart, Tstop, Event)~Int.Evnt.Index + cluster(ID), data = dat) # Fit the mixed-effects model mix.fit <-lme(Y~time + Int.Evnt.Index + time.relative, random =~time + Int.Evnt.Index + time.relative | ID, data = dat) # Fit the Joint model joint.fit <-jointModelBayes(mix.fit, cox.fit, timeVar = "time") After obtaining the joint model fit, we can proceed in deriving dynamic predictions under different scenarios for the occurrence of the intermediate event for new subjects. In Table S3.4 the data for a new subject which has not experienced the intermediate event are shown. For the code illustration this data are assumed to be included in a data.frame called newdata. Since this subject has not experienced the intermediate event, all the relevant information regarding the occurrence of the intermediate event are shown as missing observations. Again in order to derive predictions under different scenarios regarding the occurrence of the intermediate event we need to manipulate the data accordingly. We will illustrate how the data need to be changed for investigating three different scenarios regarding the occurrence of the intermediate event: The changes in newdata1 are shown in Table S3.5. In a similar manner we create datasets newdata2 and newdata3 for scenarios 2 and 3 respectively. The only notable difference here, is that we add a new row in the dataset for which we only specify the covariates associated with time and the intermediate event accordingly but we specify the longitudinal outcome as missing for the specific row, since no data are available. The changes in newdata2 and newdata3 are shown in Tables S3.6 S3.7, respectively. Finally, the predictions under the different scenarios can be obtained by applying function survfitJM() to the fitted models and datasets we just created: # Predictions for scenario 1 preds.1 <-survfitJM(object = joint.fit, newdata = newdata1, idVar = "ID") # Predictions for scenario 2 preds.2 <-survfitJM(object = joint.fit, newdata = newdata2, idVar = "ID") # Predictions for scenario 3 preds.3 <-survfitJM(object = joint.fit, newdata = newdata3, idVar = "ID")

S4 Data Availability Statement
The data used for the Pulmonary Gradient analysis are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions. The data used for the SPRINT trial are available from BioLINCC.