Estimating the Effect of Early Treatment Initiation in Parkinson's Disease Using Observational Data

ABSTRACT Background Both patients and physicians may choose to delay initiation of dopamine replacement therapy in Parkinson's disease (PD) for various reasons. We used observational data to estimate the effect of earlier treatment in PD. Observational data offer a valuable source of evidence, complementary to controlled trials. Method We studied the Parkinson's Progression Markers Initiative cohort of patients with de novo PD to estimate the effects of duration of PD treatment during the first 2 years of follow‐up, exploiting natural interindividual variation in the time to start first treatment. We estimated the Movement Disorder Society–Unified Parkinson's Disease Rating Scale (MDS‐UPDRS) Part III (primary outcome) and several functionally relevant outcomes at 2, 3, and 4 years after baseline. To adjust for time‐varying confounding, we used marginal structural models with inverse probability of treatment weighting and the parametric g‐formula. Results We included 302 patients from the Parkinson's Progression Markers Initiative cohort. There was a small improvement in MDS‐UPDRS Part III scores after 2 years of follow‐up for patients who started treatment earlier, and similar, but nonstatistically significant, differences in subsequent years. We found no statistically significant differences in most secondary outcomes, including the presence of motor fluctuations, nonmotor symptoms, MDS‐UPDRS Part II scores, and the Schwab and England Activities of Daily Living Scale. Conclusion Earlier treatment initiation does not lead to worse MDS‐UPDRS motor scores and may offer small improvements. These findings, based on observational data, are in line with earlier findings from clinical trials. Observational data, when combined with appropriate causal methods, are a valuable source of additional evidence to support real‐world clinical decisions. © 2020 The Authors. Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society

A simplified version of the underlying causal model we assume in this work is shown in Figure 1S. L in this figure denotes the time-varying confounders at each stage. These are formed by patient characteristics at each visit, and we assume they are adequately captured by the MDS-UPDRS part scores, MoCA and MSE-ADL in addition to the background characteristics of the patient: Age, Sex, Years of Education, disease duration and history of cardiovascular disease. An important insight from this model is that merely correcting for baseline measurements and characteristics is not enough to estimate the causal effect of intervening on the length of therapy in the first two years. Failing to properly adjust for the patient characteristics at each treatment decision could, for instance, improperly attribute longer therapy with bad outcomes, simply because the groups of patients that started medication therapy earlier, already had a more advanced state of the disease when the decision to start medication therapy was made. This confounding effect is known as time-varying confounding. The inverse probability of treatment weighting (IPTW) and parametric g-formula methods used in this work offer a way to adjust for this time-varying confounding, either by reweighting patients in such a way that in the reweighted sample the chance of receiving each treatment no longer depends on the covariates at each stage, or by modelling and simulating disease trajectories under all possible treatment assignments for each patient.

IPTW models
In IPTW, different weights are assigned to different patients to correct for the fact that the probability of starting treatment is not equal for each patient, like in a randomized experiment, but is influenced by the current disease state. Weighting patients differently is a way of equalizing these probabilities in the observational data to estimate what would have happened if the treatment initiation was properly randomized. To find the weights, at each visit we estimate the prior probability of starting treatment for those that have not started yet, which will be used to stabilize the weights. We then estimate the probability of starting treatment for those that have not started yet given the untransformed covariates (age, sex, years of education, disease duration, a binary indicator for history of cardiovascular disease, MDS-UPDRS part I, II and III scores, MSE-ADL scores and, if available, MoCA score) of only the previous visit, using a linear logistic regression model. Note that we do not pool the data across visits, which allows the model to be different for the different visits.
We divide the prior of the observed decision (started vs. not-started) by the probability of the observed decision given by this logistic regression model. By multiplying these weights for the different time-points, we get a final weight that is used for estimating the working marginal structural model. For censoring in the IPTW models, we use the same procedure as for the treatment weights, and multiply the censoring weight by the weight given by the treatment models to get an overall weight which we use in the working marginal structural model.
In the working marginal structural model (MSM), we use a linear model of the time on treatment. To avoid a mismatch between the continuous time treatment information and the discrete time confounding adjustments in our models, we define the time on treatment as the number of treatment periods the patient is recorded to be using PD medication, multiplied by the length of the time periods (0.5 year). In other words, we define it to be 0.5 ∑ 4 =1 , rather than the precise length of the treatment recorded in the study. In most analyses, in the working MSM, we adjust for the outcome measured at baseline (indicated here as "Baseline Adjusted"). In this supplement, we also report results when adjusting for all baseline variables, not just the particular outcome of interest (indicated by "Baseline Adjusted All"), and with and without adjustment for censoring using inverse probability of censoring weighting (indicated by "Censoring"). We also show results when adjusting for outcome at baseline and the Levodopa Equivalent Daily Dose of the patient at the time of the outcome measurement (indicated using "LEDD"). While this latter model may seem to be sensible to adjust for the long duration response of medication on the measurement, the causal interpretation of these results is not straightforward, since LEDD is likely influenced by the treatment decision whose effect we are trying to estimate. Finally, we use a linear regression model both for the continuous as well as for the dichotomous outcomes, which allows for both a uniform implementation of the models and presentation of the results. For the dichotomous outcomes, a logistic regression model would have been a sensible alternative choice.

Parametric g-formula models
When using the parametric g-formula, the goal was to estimate the effect of an intervention by constructing a model of disease progression over time, and we then used this model to simulate the effect of different interventions on the outcomes, starting from the observed values at baseline. We constructed models for the confounders and outcomes at each visit, based on data from the previous visit and the patient's treatment status, using linear regression models. We allow for treatments earlier than the current visit to influence the prediction of the next observation. These links are not shown in the causal model in Figure  1S, but are reflected in the fitted model which contain the observed values of the covariates at the previous available time-point (and the baseline measurements, for the predictions at the final time-point) and all available previous treatment decisions. The variables that are included are treatment status, age, sex, years of education, disease duration, an indicator for the presence of cardiovascular disease, MDS-UPDRS part I, II and III scores, MSE-ADL and, for yearly measurements, MoCA score. To take the correlations between the timevarying covariates at each time point into account, we iteratively add the variables for the current time-point to the models, in the order just mentioned. For instance, the model to generate the current MDS-UPDRS part I score only contains variables from the previous visit, while the model for the current part II score includes the current part I score, and the model for the part III score includes the current part I and part II scores. Using these models, for each patient from the subpopulation of interest, for each of the 5 possible treatment decisions decisions (starting therapy at year 0, 0.5, 1, 1.5 or 2), we simulate 100 trajectories for the covariates and outcomes, which, in the analysis where we consider 5 options for the treatment length and the whole sample at baseline, gives (416*5*100=208000) trajectories. We use the same working marginal structural model on these simulated outcomes as used in the IPTW method to generate the final estimates.

Missing data
Since the efficiency of the estimators of some of our models improved by having access to complete observations at all time points under consideration, we imputed missing values. For the imputation of PD treatment status, we used the procedure described below in 'Missing treatment status'. For both the outcome measures and potential confounders, we imputed missing data as follows. If two consecutive measurements were missing we considered a patient as having dropped out of the study, and measurements at future time points were considered missing. If a single measurement was missing, we used the previous and next measurements to impute the missing values using a regression model estimated based on the observations of other patients that had no missing data at that time point.

Missing treatment status
If the information on treatment status was missing in a visit we instead use whether or not there was a positive LEDD in the medication log. For the use of levodopa specifically, we imputed as follows: if a patient used levodopa at the previous visit, we assumed the patient had started treatment and a positive levodopa status was imputed; if the patient had a negative levodopa status at the subsequent visit, we assumed the patient had not started levodopa and a negative levodopa status was imputed; if the levodopa status was negative at the previous visit but positive in the next visit, we assumed the patient also used levodopa at the current visit.

Non-linearity of the effect
In the results above we assumed in the marginal structural model that the effect of treatment was linear: adding half a year of treatment has the same effect regardless of whether one goes from 0 years of treatment to 0.5 year of treatment or from 1.5 years of treatment to 2 years of treatment. Here we estimate how the effect differs for different time points. "Treatment @ 2" refers to having treatment at year 2 (so in the final six months), Treatment 1.5 is between year 1 and 1.5, etc.

Effect modification
To estimate the potential modification of the effect for different baseline characteristics of the patient, using the inverse probability of treatment weights for each outcome/year we estimate a working marginal structural model containing main effects and interaction terms of treatment length with age, sex, disease duration and three MDS-UPDRS subscores at baseline in a single model.

Treatment Probability Weights
An important assumption for the causal effect of interest to be identified is the positivity assumption of the different treatments: for each treatment decision, both starting treatment and non-treatment have to have a positive probability of occuring, for all strata in the population of interest. Apart from identifiability issues, small treatment probabilities can also cause estimation issues in inverse probability weighting, since they lead to some observations receiving large weights. For our main analysis (MDS UPDRS part III, measured at year 2), we report the distribution of the estimated probability of the assigned treatment, the estimated probability of non-censoring, and the final stabilized weight used in our working marginal structural model. The bi-modal distribution in the propensity scores for treatment seems to be caused by the changing prior of starting treatment for the different time points. Because of the use of stabilized weights (where we multiply by the prior probability of the observed treatment assignment), the stabilized inverse probability weights do not show this strong bi-modality.

Sensitivity to alternative model specifications
Constructing models of complex disease and decision processes using limited data requires trade-offs in bias and variance of the model estimates. In the main analyses, we have focused on simple linear logistic and linear regression models, using a limited history of measurements (usually constrained to the previous measurements and constant patient characteristics, rather than a longer measurement history). Here we investigate the effect of making different choices in the probability of treatment modelling.
We report the results of the IPTW estimates at year 2 when taking only the previous measurement into account (as in our main results, denoted here as "Markov" to reflect the Markov assumption), when taking full measurement history into account while still using linear logistic regression models (denoted as "Non-Markov"), as well as when using this full history for a non-parametric model, in this case a random forest classifier (denoted as "Non-parametric"). For the random forest classifier, we make use of R's "randomForest" package, using the default settings, without cross-validation: 500 trees, a minimum node size 1, and the square root of the number of variables as the number of variables considered at each split. These defaults are also used in the "SuperLearner" implementation that we discuss in the next section.
The results show the increased variance caused by the larger number of features in the unregularized models. The non-parametric models lead to similar point estimates as our main analysis. We do not report the estimated confidence interval for these models, as these would likely be an underestimate of the true uncertainty. Figure 13S: Effect of a year of PD treatment (any medication) during the first two years of follow-up on outcomes at year 2 (estimated using only those patients who were using medication at the time of the outcome measurement) using the IPTW estimator and different propensity score models.

Targeted Maximum Likelihood Estimation
In our main analysis, we report on the performance of linear working MSMs with inverse probability weighting, as well as estimates using the parametric g-formula, and interpret the combination of their resulting estimates. An alternative is to combine these approaches into a single doubly robust estimate, with the guarantee that if either the probability of treatment model or the disease progression model (or both) is correctly specified, the combined estimator is a consistent estimator for the causal effect of interest. One elegant and efficient procedure to do this is targeted maximum likelihood estimation (van der Laan, M. J. (2010). Targeted maximum likelihood based causal inference: Part I. The international journal of biostatistics, 6(2).). We have attempted to apply this method to our problem by using the ltmle R package (Lendle, S. D., Schwab, J., Petersen, M. L., & van der Laan, M. J. (2017). ltmle: an R package implementing targeted minimum loss-based estimation for longitudinal data. Journal of Statistical Software, 81(1), 1-21.). While this package covers a large number of modelling scenarios, it does not exactly map on to the problem setup we used in our main analyses, which makes it challenging to directly compare the results. Most importantly, the package uses logistic regression models on transformed outcomes (where outcomes are mapped to be in [0,1]) as the working MSM. While being an interesting modelling choice, the coefficients in these models do not directly correspond to the coefficients of the linear working MSM that we report in the main analyses.
The figure below shows the results of the estimates using the ltmle package for MDS-UPDRS Part III 'OFF' scores at the end of year 2, in a working MSM that includes the baseline MDS-UPDRS Part III scores. The propensity score and "Q" models include all previous measurements of the covariates (described as the "Non-Markov" setting in the previous section). We use both logistic regression for these models (results shown at the top of the figure) as well as a combination of logistic regression, Bayesian logistic regression and random forest models, that are combined using the SuperLearner package (see Van Table 3S shows the contribution of the different models to the propensity score estimates at the different time points. Figure 14S: Effect estimate of PD treatment (any medication) during the first two years of follow-up on outcomes at year 2 (estimated using only those patients who were using medication at the time of the outcome measurement) on MDS UPDRS Part III using the ltmle package for estimation. Note that the working MSM is a logistic regression model on the scaled outcomes, and the estimated beta is the parameter in this model, not the slope in the linear MSM used in the main analysis.