Combining randomized and non-randomized data to predict heterogeneous effects of competing treatments

Some patients benefit from a treatment while others may do so less or do not benefit at all. We have previously developed a two-stage network meta-regression prediction model that synthesized randomized trials and evaluates how treatment effects vary across patient characteristics. In this article, we extended this model to combine different sources of types in different formats: aggregate data (AD) and individual participant data (IPD) from randomized and non-randomized evidence. In the first stage, a prognostic model is developed to predict the baseline risk of the outcome using a large cohort study. In the second stage, we recalibrated this prognostic model to improve our predictions for patients enrolled in randomized trials. In the third stage, we used the baseline risk as effect modifier in a network meta-regression model combining AD, IPD RCTs to estimate heterogeneous treatment effects. We illustrated the approach in the re-analysis of a network of studies comparing three drugs for relapsing-remitting multiple sclerosis. Several patient characteristics influence the baseline risk of relapse, which in turn modifies the effect of the drugs. The proposed model makes personalized predictions for health outcomes under several treatment options and encompasses all relevant randomized and non-randomized evidence.


Introduction
Applications of network meta-analysis to health care questions typically report population-average results about the effects of competing treatments. 1,2The applicability of such results is limited for decision-making purposes, as some patients might benefit greatly from a treatment while others may do so less or do not benefit at all.Network meta-regression models of studies with individual patient data (IPD) can be used to estimate such heterogeneous treatments effects and indicate the preferable treatment for each patient based on their characteristics. 3,4,5e frequently used method to examine whether treatment effects vary across individuals is to divide patients into subgroups based on possibly relevant baseline characteristics, e.g., by performing separate univariate network meta-regression models by age, gender, etc. 6,7 In practice however there are numerous characteristics that can influence the treatment effects.A series of 'one-variable-at-a-time' subgroup analyses is a suboptimal approach; underpowered and prone to false-positive results because of multiple comparisons, which fails to address jointly all clinically relevant effect modifiers. 6,7Therefore, network meta-regression models aiming to predict the optimal treatment for each patient should preferably account for all relevant individual's baseline characteristics simultaneously.The most commonly used methods for individualized predictions are the effect modeling and the risk modeling approaches. 8,9,10,11,12,13e effect modeling approach is a regression model with multiple variables and interaction terms between them and the treatment. 7The effect modeling approach is a flexible method for predicting individualized treatment effects but presents some practical difficulties. 11,7First, it is prone to overfitting because often a large number of model parameters must be estimated from a small or insufficient sample size. 14,15,16though penalization approaches could potentially alleviate the risk of overfitting, research on penalization in (network) meta-regression models is still at an experimental phase. 12Risk modeling approaches have been developed as a solution to these shortcomings.
The risk modelling approach is a two-stage method to estimate heterogeneous treatment effects within a trial.The approach assumes that the risk of the outcome estimated at baseline (often a proxy for the severity of the condition, the presence of comorbidities etc.) could moderate the treatment effects. 3,4,5,7,11,17In the first stage, the outcome risk for each patient is predicted according to their characteristics at baseline.In the second stage, the interaction between the baseline risk and the treatment effect is estimated. 3,6,11,18,19,20,21The same trial data (internal risk modelling) or different datasets (external risk modelling) can be used at each stage. 3,4,5,7The risk modelling approach can be thought of as a parameter reduction method, which reduces the risk of overfitting which is one of the most important problems when dealing with a large number of treatment-covariate interactions.Risk modelling outperforms the effect modification method in terms of dimensionality, power, and limited prior knowledge by taking advantage of well-established penalization and variable selection methods in multivariable prognostic models. 11In addition, risk modelling method can take advantage of the numerous existing established penalization and variable selection methods in multivariable prognostic models (e.g., as the required one in the first stage).Risk modelling method is already used in clinical practice, 7,11,12 and was in detail described and developed, obtaining information from a single RCT with IPD. 12,19Analysis of RCTs using risk modelling has been successfully applied in various clinical areas to estimate heterogeneous treatment effects. 7,11,12,19The internal risk modelling approach was recently extended into a network meta-regression model of RCTs with IPD. 22e aim of this paper is to extend the previously developed model in two directions. 22First, to use observational rather than randomized studies to develop the prognostic model at the first stage.Second, to increase the power and precision of the estimated treatment effects by including studies that report only aggregated data (AD). 23We implemented the network meta-regression model in a Bayesian framework, and we used it to predict the probability of experiencing at least one relapse within the next two years for three drugs and placebo in patients with relapsing-remitting multiple sclerosis (RRMS).

Motivating example and data
Multiple sclerosis is an immune-mediated disease of the central nervous system with various subtypes.Its most common subtype is RRMS. 24Patients with RRMS present with acute or subacute symptoms (relapses) followed by periods of complete or incomplete recovery (remissions). 25Reduction in relapse rates has been commonly used as the primary efficacy endpoint in phase III randomized clinical trials leading to market approval of drugs. 24Some of the drugs, like natalizumab, are associated with rare but serious side effects, while others, like dimethyl fumarate, are considered to be safer options. 26,27 illustrate the methods in datasets including patients with confirmed RRMS.Table 1 presents the outcome and the patients' baseline characteristics for the included datasets.The outcome of interest is the number of patients that experience at least a relapse within two years from baseline.

Observational evidence
We included 935 patients enrolled in the Swiss Multiple Sclerosis Cohort (SMSC). 28Each patient contributed one, two or three cycles of twoyears of follow-up.At the beginning of each cycle, several patient characteristics were recorded, such as age, gender, Expanded Disability Status Scale (EDSS), etc., and we considered them as 'baseline characteristics' for the respective cycle.We included 1752 patient cycles in total.

Randomized evidence
We had access to IPD from three phase-III RCTs with 3590 patients assigned to placebo, natalizumab, dimethyl fumarate or glatiramer acetate. 29,30,31We included a subset of 2150 patients with complete covariate and outcome information, assuming that any missingness does not depend on the risk of relapsing.

Methods
We proposed a three-stage model.In the first stage, we built a prognostic model using the SMSC.In the second stage, we recalibrated the model to estimate the baseline risk in patients enrolled in RCTs.In the third stage, we estimated heterogeneous treatment effects from a network meta-regression model that synthesizes AD with IPD from RCTs and includes the baseline risk as a prognostic factor and effect modifier.
All our analyses were done in R 32 version 3.6.2and in JAGS 33 (called through R).The code can be found in the GitHub repository: https://github.com/htx-r/Reproduce-results-from-papers/tree/master/ThreeStageModelRRMSNotation Consider a set of treatments ℋ each denoted by ℎ = 1, 2, … , .Let  ℎ denote the dichotomous outcome for individual =1, 2, …,  under treatment ℎ in the  th trial, and a total of  trials.An individual can experience the outcome ( ℎ = 1) or not ( ℎ = 0).  is the  th prognostic factor,  = 1, 2, … .The  prognostic factors are used to estimate the baseline risk   (independent of treatment), for each participant.The probability of the outcome to occur for individual  in study  under treatment ℎ is denoted by  ℎ and depends on treatment, baseline risk   and the interaction between the baseline risk and the treatment.We use asterisk (*), to differentiate between the estimations before and after re-calibration:   * indicates the baseline risk before the re-calibration, estimated using the SMSC, while   indicates the baseline risk after re-calibration, for the RCTs population.

Stage 1: Development and internal validation of the baseline risk prognostic model
There is plenty of guidance about how to develop and validate a prognostic model. 34,35,36,37Good practice involves the use of appropriate model selection methods (or pre-specifying the model), shrinkage in the coefficients to avoid extreme predictions, accounting for missing data and correcting for optimism when the model performance is evaluated internally.
In our approach we developed the prognostic model using a non-randomized study for the baseline risk,   * , for each individual .We used a logistic mixed-effects model, which accounts for information about the same patient from different cycles.(, where  = 1, 2, … , ), in a Bayesian framework as_ (  (  ) =  0 + (  * ) (Equation 2) to the RCTs data.The intercept  0 could be assumed exchangeable ( 0 ~( 0 ,   0 2 ), or common ( 0 =  0 ) across studies.
Another option is to recalibrate the intercept and the overall calibration slope,   . 43This will also update the overall effect of the prognostic factors for the RCTs setting.We first estimate the uncalibrated predictions   * for the RCT population, and then we estimate the following model (Equation 3) where the intercept and the overall regression coefficient of (  * ) could be assumed exchangeable ( 0 ~( 0 ,   0 2 ), )) or common ( 0 =  0 ,  overall =   ) across studies.The recalibrated predicted risk score   is obtained from equation 3 after estimating  0 and  overall .
A more comprehensive option is to re-calibrate the intercept, the overall slope (as above) and in addition re-estimate some of the regression coefficients as needed. 43The re-estimated baseline risk for RCT patients,   , will be finally estimated as: (Equation 4) where  0j and   are the recalibrated intercept and regression coefficients, and as before can be assumed exchangeable ( 0 ~( 0 ,   0 2 ), )) or common ( 0 =  0 ,  k =   ) across studies. 43e common-effects assumption for  0 ,  overall ,    , ignores trial differences, and could be used only in cases where RCTs were designed similarly, have similar protocols and inclusion criteria.In addition, these parameters could be also estimated independently (e.g., each  0 ,  overall ,    is given a prior distribution); however, this would lead to different baseline risk scores per study, which in turn, would pose challenges in the baseline risk score estimation of an external dataset (other than the RCTs used for the recalibration).
In the application we use the re-calibration method associated with the best balance between model's calibration (i.e., the agreement between the observed outcome's proportions and the predicted probabilities) and discrimination ability (i.e., Area Under the Curve (AUC)).

Stage 3: Network meta-regression with individual and aggregate data using the baseline risk as prognostic factor and effect modifier
In the third stage, we used the calibrated (  ) from stage 2 as covariate in a network meta-regression model. 23We extended the metaregression model suggested by Saramago et.al. to combine IPD and AD in a network of trials comparing multiple treatments. 23In the first part, we modelled studies with IPD (Equation 5) where () �����������  is the mean logit baseline risk from all patients in study , and each study  has a reference treatment ℎ , ∈ ℋ.
The parameters of interest are the relative treatment effects  ℎ , ℎ .We estimated independently the nuisance parameters   for each study (i.e., the log odds of experiencing the outcome under the study's reference treatment).The coefficients g 0j measure the prognostic impact of baseline risk and can be assumed independent, exchangeable ( 0 ~N(γ 0 , σ γ 0 2 )), or common ( 0 = γ 0 ) across studies.To estimate the average baseline risk () �����������  we simulated pseudo-IPD using a multivariate normal distribution with mean equal to the study-level covariate values (  ������ ) as reported in the published trials, and variance covariance matrix between the included covariates as estimated through the available IPDs.In this way, we avoid the aggregation bias which would be induced if we simply plugged-in the mean value of each prognostic factor   ������ in the prediction model from stage 2.
The mean values of some of the prognostic factors might not be reported in the original studies.In that case we used imputations to allow studies with partial information on covariates to be included in the meta-regression model, as previously described by Hemming et al (described in Appendix). 44n the third part, the relative treatment effects,  ℎ , ℎ , can be combined across studies in a random-effect ( represents an estimate of ecological bias (i.e., the difference between across-study associations and within-study associations, due to study-level confounding). 45To ensure we do not introduce bias, the within-study effect modification ) is estimated through the IPD studies alone (Equation 5), whereas both IPD and AD studies are used for the between-study effect modification estimation ( ℎ , ℎ

Making treatment-specific outcome predictions
To predict the probability p   h of the outcome in a new patient   under treatment ℎ, we first estimate �   � from stage 2, and then, under the common treatment effects assumption, we use the following equation 7) The values for  ℎ , γ ℎ W , and γ ℎ B , are those estimated in the third stage of the network meta-regression prognostic model.What data to use to obtain values for -the logit-probability of the outcome under the reference treatment (placebo, in our example),--the regression coefficient of (),and () ����������� -the mean of logit baseline risk across all individuals-depends on the context within which we plan to make predictions.If our aim to is make predictions for RCT populations, then placebo arms can be used to estimate ,  and all RCT patients to estimate the () ����������� .If we aim to make predictions in a real-world population, then registry or a cohort data should be used instead.Under the random effects assumption, �   ℎ � would be normally distributed with mean as in Equation 7, and a variancecovariance matrix determined by the variance-covariance matrix of  ℎ , and γ ℎ W .
Figure 1 presents a schematic presentation of the aim, data and parameters of each stage of the approach.Information about the studies used in the example are also presented.

Application: heterogeneous effects of treatments for RRMS
We developed the prognostic model using the SMSC data. 28The development of the prognostic model in stage one has been previously published and is implemented as Shiny app in https://cinema.ispm.unibe.ch/shinies/rrms/. 38 first selected the prognostic factors via a review of the literature. 46,47,48,49,50,51We included all prognostic factors that were included in at least two previously published prognostic models.The model includes eight prognostic factors: age, sex, EDSS, prior or current treatment (yes or no), months since last relapse, disease duration, number of relapses in the previous two years, number of gadolinium enhanced lesions.We then fitted a logistic mixed-effects regression model in a Bayesian framework accounting for correlations induced by individuals contributing data to more than one cycle.The SMSC includes 1752 observations from two-years cycles of 935 patients, and 302 of those patients experienced at least one relapse.The full model had 22 degrees of freedom (for 10 predictors with random intercept and slope) and the number of events per variable was 13.7.To shrink the coefficients of the regression and avoid extreme predictions, we used Laplace prior distributions. 52We used multiple imputation to account for missing covariate data. 53,54After internal validation, the bootstrap optimism-corrected AUC was 0.65 and the bootstrap optimism-corrected calibration slope 0.91.The calibration plot and the evaluation of the model's clinical usefulness are presented elsewhere. 38The model's accuracy and clinical performance are overall suggesting a useful prediction model.
We then re-calibrated the prognostic model for the RCT setting (stage 2).All three RCTs used for the re-calibration of the baseline risk model in stage 2 were designed under similar protocols from the same company, sharing similar distribution of baseline characteristics as presented in Table 1.Therefore, we assumed common effects across the three RCTs for estimating the intercept (in Equations 2 to 4), the overall regression coefficient (in Equation 3), as well as the regression coefficient for each prognostic factor (Equation 4).Alternatively, when studies share different designs, protocols, and inclusion criteria random effects across studies should be assumed in stage 2. We assessed the predictions of the developed re-calibrated models in a calibration plot with loess smoother (Appendix figure 1).All three models appear to predict accurately the probability to relapse within the next two years.The re-calibration method resulting in the highest AUC (AUC=0.61)was "the recalibration and selective re-estimation" approach (Equation 4); the other two methods resulted in AUC=0.58.Based on the existing literature, risk models with a low predictive ability (0.6 -0.65) are often adequate to detect risk-based heterogeneous treatment effects. 19,55 herefore, we selected the baseline risk from "the re-calibration and selective re-estimation" method to use in the next stage of the risk modelling approach.
Appendix table 1 presents the re-calibrated regression coefficients for each prognostic factor.
In Figure 2 we show the distributions of the predicted baseline risk by relapsing status in the populations included in the three RCTs.The overall mean predicted baseline risk was 36.8% (95% Credible Interval (CrI) 36.4% to 37.2%).The overlap in the distributions was large, as reflected by the low AUC.For patients who experienced a relapse, the mean predicted risk was 39.2% (95% CrI 38.5% to 39.8%) whereas for patients who did not, it was 35.4% (95% CrI 34.9% to 35.9%).
The predicted baseline risk in the RCT populations was then used in the network-meta-regression model (equations 5 and 6, stage 3).
Because only two AD studies were available, we assumed that  ℎ ).We also assumed common coefficients for the prognostic effect of the baseline risk (g 0j = γ 0 ), as all three studies with IPD data were very similar in terms of design and patient characteristics.
None of the 2 AD studies provided information about the number of patients with prior treatment, gadolinium enhanced lesions, and months since last relapse; we performed imputations as described in the Appendix.Then, we created 2 pseudo-IPDs (one for each AD study) via a multivariate normal distribution with mean the reported (or imputed) mean covariate values and variance covariance matrix as estimated via the available IPDs.We estimated the baseline risk for each observation in each of the pseudo-IPD datasets.The estimated mean baseline risk for each study is presented in Table 1.
Table 2 shows the estimated parameters from the network meta-regression model (stage 3).The estimated values of  0 indicate that baseline risk is an important prognostic factor for relapse.We first make predictions for the RCT populations and hence we estimate , and  by synthesizing data from placebo individuals across the three RCTs with IPD and () ����������� as the mean of (  ) across all individuals in RCTs with IPD.The treatment effects as a function of the baseline risk are shown in Figure 3.In addition, Appendix figure 2 presents the final predictions with their 95% CrIs.Natalizumab gives the lowest probability of relapsing over almost the entire baseline risk range.However, its advantage over dimethyl fumarate for patients with low baseline risk (below 30%, on average) is very small.We also assessed the prediction model's accuracy using the calibration plot with loess smoother (Appendix figure 3).The prediction model predicts accurately the probability of relapsing within the next two years.
Additionally, we performed a sensitivity analysis comparing the final predictions from stage 3 under all three recalibration methods (Appendix table 2, and Appendix figure 4).All three recalibration methods lead to similar final predictions.On average, natalizumab minimizes the predicted probability of relapsing within the next two years (about 7% to 12% mean absolute difference compared to dimethyl fumarate).For low-risk patients (baseline risk≤30%), the probability to relapse is similar under dimethyl fumarate and natalizumab.For highrisk patients (baseline risk≥50%), natalizumab is the drug that minimizes the predicted probability of relapsing within the next two years (about 13% to 19% mean absolute difference compared to dimethyl fumarate).
To make predictions for the Swiss real-world population we estimate  as the logit-probability of relapse in untreated patients in the SMSC and () ����������� as the mean of (  ) across all individuals in the SMSC;  was also estimated using SMSC.The results for the SMSC population are presented in Appendix figure 5. Often the patients' baseline disease condition is more severe in RCTs than in observational studies, and hence as expected, the distribution of baseline risk is different between the SMSC and the RCTs.However, the relative ranking of therapies for a given baseline risk does not deviate from this of RCTs (presented in Figure 3).As in the RCTs population, the advantage of natalizumab over dimethyl fumarate for patients with low baseline risk is non-existing (Appendix figure 5).An interactive version of Figure 3 and Appendix figure 5 has been implemented in https://cinema.ispm.unibe.ch/shinies/srrms/.
Table 3 summarizes the information in Figure 3 for patients at baseline risk below 30% (low risk) or more than 50% (high risk).These cutoffs were chosen arbitrarily for illustrative purposes.For high-risk patients (8.5% of patients in RCTs) the risk difference for relapse between natalizumab and dimethyl fumarate is 19% favoring natalizumab.For low-risk patients (25% of patients in RCTs), the risk difference between natalizumab and dimethyl fumarate is 1.4%.

Discussion
We developed a three-stage network meta-analysis approach, where data from different sources and study designs can be synthesized to make predictions for heterogeneous treatment effects.We exemplified our method by predicting the probability of relapse under three active treatments and placebo in patients with RRMS, we made the code available (https://github.com/htx-r/Reproduce-results-frompapers/tree/master/ThreeStageModelRRMS), and we created an online tool to show the predictions in an interactive way ( https://cinema.ispm.unibe.ch/shinies/srrms/).
Central to our work is the risk modelling approach.We preferred the risk modelling method for predicting individualized treatment effects over other available methods; first, we consider the risk of overfitting high in effect modification approach, and, secondly, we expected high risk of false-positive results due to numerous 'one-variable-at-a-time' in univariate network meta-regression models.Risk modelling approach has been originally used in a single randomized trial, 3,6,11,18,19,20,21 and extended to meta-analysis 3,11,56 and more recently to network meta-analysis 22 of randomized trials.However, none of these approaches examined combining and making the best use of all available data sources.
Observational studies reflect better the real-world populations and conditions 39,40,41,42 and this is why we used a cohort in the first stage of our approach to develop a model that predicts the baseline risk.In addition, we combined AD and IPD to increase the power and precision of the estimated treatment effects.
The approach has several limitations.It requires that at least one IPD dataset per intervention is available.The access to IPD data entails many challenges and difficulties described in detail elsewhere. 57,58,59The risk modelling approach assumes that the selected variables adequately capture both prognosis and effect modification.This assumption is difficult to evaluate unless the outcome is well studied, and many prognostic studies exist on the topic, which is rarely the case.We developed the three stages in different models instead into a single Bayesian framework.
We used an existing prognostic model for the baseline risk (stage 1), and we developed the re-calibration of the baseline risk model (stage 2) within a frequentist setting to take advantage of the software's re-calibration options.Consequently, uncertainty was not accounted between the different stages and the results from stage three might be over-precise (i.e., we expect that the results in Table 3 will have wider intervals).In addition, the imputation method used, although it allows the use of AD studies even if study-level covariates are missing, may not be the optimal one.Other methods, like advanced multiple imputations techniques for study-level characteristics, may be used. 60Finally, in the RRMS application, we used common treatment effects model (stage 3) to enable model convergence, because of the small number of studies.This assumption can be relaxed if more studies are available.
The implementation of our approach in the RRMS example shows that several patient characteristics influence the baseline risk of relapse, which in turn modifies the effect of treatments.Natalizumab appears to be the optimal treatment (i.e., minimizes the predicted probability of relapsing) over almost the entire baseline risk range.However, its advantage over dimethyl fumarate for patients with low baseline risk is very also influence the treatment effects.Such variables can be added to the network meta-regression model, if the number of studies permits.Finally, the implementation of the three stages into a single Bayesian model will allow naturally to incorporate uncertainty from all stages in the final result and avoid spuriously overprecise conclusions.

Conclusion
The proposed approach combines all relevant evidence sources and can be applied to estimate individualized predictions of treatment effects for any health condition.Consequently, it has the potential to assist clinical practice and decision-making towards treatment recommendations and precision medicine.

What is already known
• Recently, a two-stage model which allows for individualized treatment effects predictions between several competing treatments was developed, using individualized participant data from randomized clinical trials

What is new
• We extend this model by combining several data sources, such as observational studies on the top of randomized clinical trials as well as incorporation of aggregate data on the top of individual participant data

Potential impact for RSM readers outside the authors' field
• Readers will be able to reproduce the suggested method, in any clinical area, to make individualized predictions of treatment effects between several competing treatments into a network meta-analysis framework, while combining several data sources; the methods are described in detail and the codes used for the illustrative example are publicly available

Figure 2 The distribution of predicted baseline risk of any relapse within the next two years for individuals by relapse status in the RCTs dataset (stage2). The dashed lines indicate the mean of predicted baseline risk for individuals who did experience a relapse (purple) and for those who did not (yellow). Figure 3 Predicted probability of relapsing within the next two years as a function of the baseline risk (stage 3) into the randomized clinical trials population. The x-axis shows the baseline risk of relapsing within the next two years (after re-calibration, stage 2) and the y-axis shows the predicted probability to relapse within the next two years under each one of the available treatments. Between the two red vertical dashed lines are the baseline risk values observed in the three randomized clinical trials
with individual participant data. 29 , 30 , 31The distribution of the baseline risk in these three trials is presented at the bottom of the graph.The conditional distribution for the outcome (  ) ������������ is normal with mean a linear combination of all covariates and a precision parameter  (  )

FiguresFigure 1
Figures Figure 1 Schematic presentation of each stage's aim, suggested data design ant type, and estimated parameters.

Table 1 Treatment, sample size, outcome, baseline characteristics and baseline risk (stage 2) of patients in the included datasets.
EDSS: Expanded Disability Status Scale; sd: standard deviation; RCT: Randomized clinical trial; IPD: individual participant data; AD: aggregate data; NA: not available