Bayesian meta-analysis for evaluating treatment effectiveness in biomarker subgroups using trials of mixed patient populations

During drug development, evidence can emerge to suggest a treatment is more effective in a specific patient subgroup. Whilst early trials may be conducted in biomarker-mixed populations, later trials are more likely to enrol biomarker-positive patients alone, thus leading to trials of the same treatment investigated in different populations. When conducting a meta-analysis, a conservative approach would be to combine only trials conducted in the biomarker-positive subgroup. However, this discards potentially useful information on treatment effects in the biomarker-positive subgroup concealed within observed treatment effects in biomarker-mixed populations. We extend standard random-effects meta-analysis to combine treatment effects obtained from trials with different populations to estimate pooled treatment effects in a biomarker subgroup of interest. The model assumes a systematic difference in treatment effects between biomarker-positive and biomarker-negative subgroups, which is estimated from trials which report either or both treatment effects. The estimated systematic difference and proportion of biomarker-negative patients in biomarker-mixed studies are used to interpolate treatment effects in the biomarker-positive subgroup from observed treatment effects in the biomarker-mixed population. The developed methods are applied to an illustrative example in metastatic colorectal cancer and evaluated in a simulation study. In the example, the developed method resulted in improved precision of the pooled treatment effect estimate compared to standard random-effects meta-analysis of trials investigating only biomarker-positive patients. The simulation study confirmed that when the systematic difference in treatment effects between biomarker subgroups is not very large, the developed method can improve precision of estimation of pooled treatment effects while maintaining low bias.


Introduction
Traditionally, randomised controlled trials (RCTs) have aimed to obtain a reliable estimate of the average treatment effect in a broad patient population.However, over recent years there has been an increased interest in precision medicine where patient characteristics or biomarkers are used to characterise differences in disease risk, severity and efficacy of treatments and thus tailor healthcare to patients in order to maximise patient benefit [1].Tailoring treatments to patients who are likely to benefit can also generate improvements in the cost-effectiveness of therapies [2].
During the course of drug development, evidence (for example from exploratory subgroup analyses) can emerge to suggest that a treatment is more effective in a particular subset of patients.When predictive biomarkers are used to identify subsets of a population which can benefit from treatments targeted on these biomarkers, they are investigated in clinical trials with mixed populations and trial designs [3,4].For example, in metastatic colorectal cancer (mCRC), some novel therapeutic treatments were initially developed targeting the epidermal growth factor receptor (EGFR), overexpression of which (present in 50-80% of colorectal tumours) can contribute to progression of the cancer [5].However, evidence suggested that anti-EGFR treatments were only effective in a subgroup of patients and subsequent trials suggested that mutations in the kirsten rat sarcoma (KRAS) biomarker were predictive of resistance to EGFR-targeted treatments [6,7].This resulted in early trials of anti-EGFR treatments investigating a mixture of patients with wild-type (WT) and mutant (MT) status of KRAS biomarker, while later trials only enrolled patients with KRAS WT status.Therefore, the development of EGFR-targeted therapies has resulted in trials with different designs and mixed populations.
When assessing new therapies for their clinical and cost-effectiveness; for example, within a health technology assessment (HTA) framework, data from a range of trials are often combined in a metaanalysis.However, when RCTs are conducted in different populations, it is challenging to pool treatment effects in a single meta-analysis.One approach would be to only include studies which are conducted or report treatment effects in the subgroup of interest (e.g.biomarker-positive patients).However, this results in a loss of information (and therefore statistical power) as information on biomarker-positive patients contained within studies investigating patients with mixed biomarker status will not be utilised when subgroup results are unavailable.Individual participant data (IPD) would be required to conduct subgroup analyses of mixed populations where they were not reported, but it is unlikely that IPD could be obtained for all trials of mixed populations.An alternative approach is to include all studies in a single meta-analysis regardless of biomarker status.However, this will systematically increase between-trial heterogeneity and lead to an inconclusive treatment effect for the total population [8].
In this paper we modify the standard random-effects meta-analysis model to include treatment effects obtained from biomarker-positive, biomarker-negative and biomarker-mixed populations accounting for the differences in treatment effects across the biomarker groups in order to obtain a more precise estimate of the pooled treatment effect in the biomarker-positive subgroup.
The remainder of this paper is structured as follows.An illustrative example in mCRC is described in Section 2 and the methods are described in Section 3. Section 4 discusses the results from the application of the methods to the illustrative example.Section 5 describes the methods and results of a simulation study used to evaluate the developed methods.Finally, discussion and conclusions are presented in Sections 6 and 7.

Illustrative Example
We use data from randomised controlled trials (RCTs) investigating anti-EGFR therapies for the treatment of mCRC.Over time, evidence has emerged suggesting that anti-EGFR therapies are effective in patients with WT KRAS biomarker status but ineffective in patients with MT KRAS biomarker [9].Anti-EGFR therapies have been evaluated in trials with different designs and with varying KRAS biomarker populations.
In the main analysis, where a study reported treatment effects from analysis of mixed patients in addition to reporting results from separate analyses of WT and MT patients (e.g.Bokemeyer 2009) only the treatment effect for the mixed population was included in the model (not using subgroup information).This resulted in 5 studies with mixed populations in the data and allowed us a better assessment of whether addition of results from the mixed biomarker population could improve estimation of pooled treatment effects for the WT group.The PFS and OS data used in the main analysis can be seen by the solid and dotted lines in Figure 1.In a sensitivity analysis, where a study reported treatment effects from analysis of mixed patients in addition to reporting results from separate analyses of WT and MT patients, the treatment effects for the WT and MT subgroup analyses were included in the model and the treatment effect from analysis of the mixed population was excluded (to avoid duplication of data).Inclusion of more information on the subgroups allowed us to assess the robustness of the developed methods.The PFS and OS data used in the sensitivity analysis can be seen by the solid and dashed lines in Figure 1 respectively.
To include studies from biomarker-mixed populations using the developed methods, the proportion of MT patients in the biomarker-mixed studies were required.For the studies Van Cutsem 2009, Guren 2017 and Bokemeyer 2009, KRAS status was known for 89%, 86% and 94% of the total biomarkermixed population giving proportions of KRAS MT patients of 0.37, 0.40 and 0.43 respectively.To account for uncertainty around these proportions, as not all patients have known KRAS status, we defined informative beta prior distributions for the proportion of KRAS MT patients using method of moments.However, for the studies Sobrero 2008 and Modest 2019, the proportion of MT patients was not reported.For these studies beta prior distributions were constructed using information on the prevalence of KRAS mutations suggesting that frequency of mutations was between 30% and 54% [23].This is consistent with proportions of KRAS MT patients in studies in this illustrative example which range from 0.37-0.45.Details of how the prior distributions were constructed are available in Appendix A.

Methods
To make the methods discussed in this paper generalisable to different scenarios, we shall henceforth refer to biomarker-positive, biomarker-negative and biomarker-mixed patient populations, where biomarker-positive patients are thought to respond to treatment and biomarker-negative patients are thought not to respond to treatment.Therefore, in the illustrative example in mCRC, patients with WT KRAS biomarker status are biomarker-positive and patients with MT KRAS biomarker status are biomarker-negative.In the following methods we are interested in estimating the pooled treatment effect in the biomarker-positive subgroup while utilising available information on treatment effects in the biomarker-negative and biomarker-mixed populations.
The model described in Section 3.1 below is a standard random-effects meta-analysis which we use to synthesise treatment effects from the biomarker-positive subgroup.In the illustrative example in mCRC, this model is applied to studies reporting HRs from analysis of patients with KRAS WT biomarker status only.The data utilised in the main analysis can be seen in the columns 2-3 of Table 1.In Section 3.2 the model is extended to include treatment effects from the biomarker-negative subgroup in order to estimate the systematic difference in treatment effects between the two subgroups.The model described in Section 3.2 is applied to studies reporting HRs from (1) analysis of patients with the KRAS WT biomarker only or (2) subgroup analysis of patients with KRAS WT and KRAS MT biomarkers.The data utilised in the main analysis can be seen in columns 2-5 of Table 1.Finally, in Section 3.3 the model is extended to include treatment effects from the biomarker-mixed subgroup by utilising information on the systematic difference in treatment effects between biomarker subgroups, estimated in the second part of the model described in Section 3.2 and the proportion of biomarkernegative patients included in each biomarker-mixed study, in order to estimate treatment effects in the biomarker-positive subgroup.The model described in Section 3.3 is applied to studies reporting HRs from (1) analysis of patients with the KRAS WT biomarker only, (2) subgroup analysis of patients with KRAS WT and KRAS MT biomarker or (3) analysis of mixed patients.The data utilised in the main analysis can be seen in Table 1.

Model 1: REMA for Biomarker-Positive Patients
To estimate the pooled treatment effect in biomarker-positive patients, a standard approach would be to use random-effects meta-analysis (REMA) to combine treatment effects obtained from analysis of biomarker-positive patients only.In REMA, we assume that the normally distributed observed treatment effects in the biomarker-positive subgroup, Y +i , are estimates of the true treatment effect, δ +i , in this group of patients: with corresponding within-study variances, σ 2 +i , which are assumed to be known and equal to the squared observed standard error of the treatment effect estimate in each study i = 1, ..., n + .The true effects are assumed to come from a normal distribution with a mean, d + , and between-study variance, τ 2 + : To implement this model using a Bayesian framework, the vague prior distributions d + ∼ N (0, 100 2 ) and τ + ∼ HN (0, 10 2 ) (where HN indicates a half normal distribution) were placed on the pooled mean and between-study standard deviation.
This model only uses information from studies which provide estimates of the treatment effect in biomarker-positive patients.In our illustrative example in mCRC, the model was applied to logHR data obtained from the analysis of patients with the KRAS WT biomarker status.The data for the main analysis are listed in columns 2-3 of Table 1 and the data for the sensitivity analysis are listed in columns 2-3 of Table B1 in Appendix B.
3.2 Model 2: REMA for Biomarker-Positive and Biomarker-Negative Patients Bayesian REMA, described above, can be used to estimate pooled treatment effects for biomarkerpositive patients when sufficient data exist.However, when there are a limited number of studies that report treatment effects in the biomarker-positive group such analysis can result in high uncertainty around the pooled estimate of the treatment effect.Therefore, we propose an extension to REMA to allow for inclusion of additional data obtained from the analysis of biomarker-negative patients.This can be achieved by assuming a systematic, albeit random, difference between the treatment effects in the two biomarker groups.
The model for treatment effects from the analysis of biomarker-positive patients alone remains the same as described in Section 3.1.For i = n + + 1, ..., n + + n ± where n ± is the number of studies with effectiveness estimates available from both analysis of biomarker-positive and biomarker-negative patients, the treatment effects from analysis of biomarker-negative patients, Y −i , are assumed to be normally distributed with a true underlying mean δ −i , and within-study variance, σ 2 −i .
The underlying true treatment effects in each study investigating biomarker-negative patients, δ −i , are then assumed to be equal to the underlying treatment effect estimated in biomarker-positive patients in study i, δ +i , plus a systematic difference, β i : The systematic difference between treatment effects in the biomarker-positive and biomarkernegative patients, β i , can differ across studies and is assumed to come from a normal distribution with a mean, µ β , and variance, τ 2 β : To implement this model using a Bayesian framework, the vague prior distributions τ β ∼ HN (0, 10 2 ) and µ β ∼ N (0, 100 2 ) were placed on the between-study standard deviation of the systematic difference and the mean of the systematic difference and δ +i follow a common distribution as in equation ( 2) in Section 3.1.The prior distributions for d + and τ + remain the same as specified in Section 3.1.This model uses information from studies which provide treatment effects from biomarker-positive patients only or both biomarker-positive and biomarker-negative patients.In the illustrative example in mCRC, the model was applied to data obtained from studies reporting results from analysis of KRAS WT patients only or studies reporting results from analysis of both the KRAS WT and KRAS MT biomarker subgroups.The data for the main analysis are listed in columns 2-5 of Table 1 and the data for the sensitivity analysis are listed in columns 2-5 of Table B1 in Appendix B.
It is worth acknowledging that this model can be used to include treatment effects from trials which only investigate biomarker-negative patients and details of this model are available in Appendix C.However, it is unlikely that studies will report treatment effects from analysis of biomarker-negative patients only as it is extremely unlikely that a clinical trial would be exclusively conducted in a subgroup where there is evidence that a treatment does not work.Therefore, studies which report treatment effects from biomarker-negative patients are also likely to report treatment effects from biomarker-positive patients and thus we focus on this scenario in this paper.

Model 3: REMA for Biomarker-Positive, Biomarker-Negative and Biomarker-Mixed patients
To include treatment effects from studies with a mix of biomarker-positive and biomarker-negative patients where there is no biomarker subgroup analysis reported, we extend the model further using information about the proportion of biomarker-negative patients in each study.The model for treatment effects from analysis of biomarker-positive and biomarker-negative patients remains the same as described in Section 3.2.
For i = n + + n ± + 1, ..., n + + n ± + n mix , where n mix is the number of studies reporting only the analysis of data from biomarker-mixed patients, the observed treatment effects from analysis of biomarker-mixed patients, Y mixi are assumed normally distributed with an underlying mean, δ mixi and within-study variance, σ 2 mixi .The underlying true treatment effects in each study i investigating biomarker-mixed patients, δ mixi , are assumed to be equal to the underlying treatment effect for biomarker-positive patients in study i, δ +i , plus the systematic difference in treatment effects between biomarker-positive and biomarker-negative patients, β i , multiplied by the proportion of biomarkernegative patients in the biomarker-mixed study, p i .
In equation (7) we assume that the treatment effect in the biomarker-mixed population has a linear relationship with the proportion of biomarker-negative patients in the study.It is important to acknowledge that the assumption of linearity is an approximation for outcomes such as logHRs.We consider this assumption further in the simulation study and discussion.For p i = 0 (biomarkerpositive population) the model reduces to model M1 in Section 3.1 and for p i = 1 (biomarker-negative population) the model reduces to model M2 in Section 3.2.As in the models described in Sections 3.1 & 3.2, the true treatment effects for biomarker-positive patients and systematic difference come from normal distributions with a mean and variance as specified in equations ( 2) & ( 5).To implement this model using a Bayesian framework the prior distributions described in Sections 3.1 and 3.2 were used.This model uses information from studies which provide treatment effects from analysis of biomarkerpositive, biomarker-negative or biomarker-mixed populations.In the illustrative example in mCRC, the model was applied to logHR data obtained from studies reporting results from analysis of KRAS WT, KRAS MT or KRAS mixed biomarker groups.The data for the main analysis are listed in Table 1 and the data for the sensitivity analysis are listed in Table B1 in Appendix B.
In model M2, described in Section 3.2, the biomarker-positive and biomarker-negative estimates can be assumed to be independent and thus model M2 can be applied using estimates from both subgroups.However, if a study were to provide treatment effect estimates from all groups (biomarkerpositive, biomarker-negative and biomarker-mixed) these estimates would not be independent and would be highly correlated.Therefore, where a study reports all three treatment effects, it is not desirable for all three treatment effects to be included in model M3 (described here).Therefore, a decision must be made to either include (1) the subgroup estimates from biomarker-positive and biomarker-negative subgroups or (2) the estimates from the biomarker-mixed population only.In the main analysis of the illustrative example in this paper, where a study reported all three treatment effects the estimates from the biomarker-mixed population were included in model M3.This decision was made to more clearly illustrate whether inclusion of studies where only treatment effects on the biomarker-mixed population were available could improve precision of estimation of pooled treatment effects for the biomarker-positive subgroup.However, in a sensitivity analysis, where a study reported all three treatment effects the subgroup estimates were included in model M3.A comparison of results from model M3 in the main analysis and model M1 in the sensitivity analysis will give an indication of whether the developed model can reliably extract information on treatment effects in the biomarkerpositive subgroup from observed treatment effects in a biomarker-mixed population in order to improve precision of estimation of pooled treatment effects in the biomarker-positive subgroup.

Results
In this Section we report the results from applying the methods described in Section 3 to the example dataset described in Section 2. REMA, described in Section 3.1, was applied to treatment effects from analysis of patients with KRAS WT status only.In the main analysis, eight studies reported treatment effects (logHRs) on PFS and OS for the KRAS WT biomarker subgroup in the colorectal cancer applied example (solid green lines in Figure 1).In the sensitivity analysis, eleven studies reported treatment effects on PFS and OS for the KRAS WT biomarker subgroup (solid and dashed green lines in Figure 1).
REMA extended to include treatment effects obtained from the biomarker-negative subgroup, described in Section 3.2, was applied to treatment effects obtained from KRAS WT and KRAS MT biomarker subgroups separately.In the main analysis, five studies reported treatment effects on PFS and OS for the KRAS WT biomarker subgroup only and three studies reported treatment effects on PFS and OS for the KRAS WT and KRAS MT biomarker subgroups separately (solid green and red lines in Figure 1).In the sensitivity analysis, five studies reported treatment effects on PFS and OS for the KRAS WT biomarker subgroup only and six studies reported treatment effects on PFS and OS for KRAS WT and KRAS MT biomarker subgroups separately (solid and dashed green and red lines in Figure 1).
REMA extended to include treatment effects obtained from biomarker-negative and biomarkermixed populations, described in Section 3.3, was applied to treatment effects obtained from KRAS WT, KRAS MT and KRAS mixed populations.In the main analysis, in addition to the studies reporting treatment effects in KRAS WT and KRAS MT subgroups, five studies reported treatment effects in a mixed KRAS population (solid and dotted blue lines in Figure 1).In the sensitivity analysis, two studies reported treatment effects in a mixed KRAS population (solid blue lines in Figure 1).
Tables 2 and 3 show results from the main analysis and sensitivity analysis respectively.We consider the PFS outcome first.When only treatment effects from analysis of patients in the WT biomarker subgroup are used, the pooled logHR is estimated to be -0.24(95% CrI: -0.37, -0.11).This provides strong evidence for a meaningful treatment effect of anti-EGFR therapies in KRAS WT patients with mCRC as the point estimate is below zero and the 95% CrI does not contain zero.The addition of treatment effects from analysis of patients with MT KRAS biomarker status to the meta-analysis slightly increases the CrI of the estimate of the pooled treatment effect for the biomarker-positive subgroup while the point estimate remains the same.However, the further addition of treatment effects from analysis of patients with mixed KRAS biomarker status estimates a pooled logHR for WT patients of -0.24 (95% CrI: -0.35, -0.14).This indicates that addition of treatment effects from analysis of patients with mixed biomarker status, relative to using treatment effects from analysis of patients with WT biomarker status alone, results in a 19% improvement in precision of estimation of pooled treatment effects for the WT biomarker subgroup.This improvement in precision is obtained without altering the point estimate of the pooled treatment effect.These results are illustrated in Figure 2.
We now consider the OS outcome.When only treatment effects from analysis of WT patients are used, the pooled logHR for WT patients is estimated to be -0.11(95% CrI: -0.29, 0.057).While the point estimate is below zero, the 95% CrI contains zero indicating that for the OS outcome there is not strong evidence of a treatment effect for anti-EGFR therapies in the KRAS WT biomarker subgroup.As for the PFS outcome, the addition of treatment effects from analysis of MT patients does not change the results.However, the further addition of treatment effects from analysis of patients with mixed KRAS biomarker status estimates a pooled logHR for WT patients of -0.11 (95% CrI: -0.21,     -0.017).This provides stronger evidence for a treatment effect on OS of anti-EGFR therapies in the KRAS WT biomarker subgroup as the 95% CrI is completely below zero.Addition of treatment effects from analysis of patients with mixed biomarker status, relative to using treatment effects from analysis of patients with WT biomarker status alone, results in a 44% improvement in precision of estimation of pooled treatment effects for the WT biomarker subgroup.As for the PFS outcome, this improvement in precision is obtained without altering the point estimate of the pooled treatment effect.These results are illustrated in Figure 2.
Pooled treatment effects on PFS and OS estimated by the sensitivity analysis utilising treatment effects from the biomarker-positive subgroup only were very similar to those estimated in the main analysis.Estimation of pooled treatment effects in the sensitivity analysis were generally more precise as a result of utilising data from subgroup analysis rather than mixed analysis.However, addition of treatment effects from analysis of biomarker-negative and biomarker-mixed groups still resulted in around 14% and 20% improvements in precision of estimation of the pooled treatment effect for the biomarker-positive subgroup on PFS and OS outcomes respectively, relative to using data on biomarker-positive patients alone.Potentially more importantly, the results from models M1 and M2 in the sensitivity analysis, where more subgroup data were available, are similar to results from model M3 in the main analysis, indicating robustness of the developed method.Results from the sensitivity analysis can be seen in Table 3 and Figure 3.It is not surprising that improvements in precision are limited in the sensitivity analysis as only two additional studies reporting treatment effects in the biomarker-mixed population are available.

Simulation Study
Results from the illustrative example suggest that the developed methods have the potential to improve precision of estimation of pooled treatment effects in the subgroup of interest.However, to formally assess whether the developed methods perform well and can improve upon standard methods, a simulation study is required.Therefore in this Section we report the methods and results of a simulation study designed to evaluate the performance of the developed methods.

Methods
In this Section we report aims, data-generation methods, estimands, methods and performance measures for the simulation study, as recommended by Morris et al [24].

Aims
The simulation study aimed to compare the performance of the methods described in Section 3, under a number of scenarios varying; (1) the proportion of studies reporting treatment effects on biomarkerpositive and biomarker-negative subgroups, (2) the proportion of studies reporting treatment effects on the biomarker-mixed population, (3) the true mean systematic difference in treatment effects between biomarker subgroups, (4) the true variance of the systematic difference in treatment effects between biomarker subgroups and (5) the total number of studies in the meta-analysis.The data scenarios are discussed in more detail in Section 5.1.6.

Data-Generation Methods
We set the total number of studies in each simulated meta-analysis to be n studies .For i = 1, ..., n studies the true treatment effects in the biomarker-positive subgroup, δ +i , were drawn from a normal distribution with a mean, d + , and variance, τ 2 + as in equation ( 2).For i = 1, ..., n studies the true systematic difference in treatment effects between biomarker-positive and biomarker-negative subgroups, β i , was drawn from a normal distribution with a mean, µ β , and variance τ 2 β as in equation (5).The true treatment effects in the biomarker-negative subgroup, δ −i , were then assumed to be equal to the sum of the true treatment effects in the biomarker-positive subgroup and the systematic difference as in equation (4).
Having generated true treatment effects for each study i = 1, .., n studies , in the meta-analysis, IPD were generated separately for biomarker-positive and biomarker-negative subgroups using the simsurv command in R [25].To do this we set the number of participants in each individual trial n participants = 350, the probability of receiving treatment, p trt = 0.5 and the probability of having negative biomarker status, p − ∼ Beta(9.2,13.8).The probability of being biomarker-negative was based on the prevalence of KRAS mutations in mCRC [23].
We assumed that survival data for the biomarker-positive and biomarker-negative subgroups in each study i were drawn from two separate exponential distributions each with a single rate parameter λ + and λ − defining the baseline hazard.We assumed that λ = λ + = λ − , that is, that the baseline hazard (that is the hazard in the control arm) was the same for biomarker-positive and biomarkernegative subgroups.We set λ = 0.15 as this was the baseline hazard estimated from analysis of the illustrative example in mCRC.
Survival data for the biomarker-positive subgroup were drawn from an exponential distribution with a rate parameter of λ and an assumed treatment effect of δ +i .Survival data for the biomarkernegative subgroup were drawn from an exponential distribution with a rate parameter λ − and an assumed treatment effect of δ −j .This resulted in the hazard rates for the treatment and control arms in the biomarker-positive and biomarker-negative subgroups as shown in Table 4.
Table 4: Hazard rates for control and treatment arms in biomarker-positive and biomarkernegative subgroups for study i in simulation study.

Control Treatment
Biomarker-positive Exp(λ) Exp(λ + δ +i ) Biomarker-negative Exp(λ) Exp(λ + δ −i ) Individual participant data for the biomarker-mixed population in study i were simply created by combining the survival times from the biomarker-positive and biomarker-negative subgroups into a single dataset.
For each study i in each simulated meta-analysis, the biomarker-positive and biomarker-negative IPD were analysed by subgroup using the Cox proportional hazards model with a single covariate for treatment arm which can be expressed using the following hazard function: h j (t) = h 0 (t)×exp(b 1 trt j ), where trt j is a binary variable for treatment group (trt j = 0 indicating control and trt j = 1 indicating treatment).
The biomarker-mixed IPD were analysed in two ways; the first was using the Cox proportional hazards model with a single covariate for treatment arm (as in the equation above) and the second was using a Cox proportional hazards model with a covariate for treatment and a covariate for biomarker status which can be expressed using the following hazard function: h j (t) = h 0 (t) × exp(b 1 trt j + b 2 biomarker j ), where biomarker j is a binary variable for biomarker status (biomarker j = 0 indicating biomarker-positive status and biomarker j = 1 indicating biomarker-negative status).
This resulted in four treatment effects being estimated for each study i in each simulated metaanalysis.These were; treatment effect in the biomarker-positive subgroup, treatment effect in the biomarker-negative subgroup, treatment effect in biomarker-mixed population and treatment effect adjusted for biomarker status in the biomarker-mixed population.The treatment effects and associated standard errors estimated by the Cox model were saved as the observed treatment effects, Y +i , , Y −i , Y mixi and Y Adjmixi and within-study standard deviations, σ +i , σ −i , σ mixi and σ Adjmixi .
To replicate the illustrative example where treatment effects are not available from biomarkerpositive, biomarker-negative and biomarker-mixed groups in every study, observed treatment effects are assumed to be missing for some of the n studies in the meta-analysis.We assume there are n studies+ reporting treatment effects in the biomarker-positive subgroup alone, n studies± reporting treatment effects in biomarker-positive and biomarker-negative subgroups and n studiesmix reporting treatment effects in the biomarker-mixed population.To ensure no double counting of data, we assume that studies which report treatment effects in the biomarker-mixed population do not report treatment effects in the biomarker-positive or biomarker-negative subgroups.

Estimand
The estimand of interest is the pooled treatment effect in the biomarker-positive subgroup.

Methods
Model M1 was applied to treatment effects from analysis of biomarker-positive patients only.Model M2 was applied to treatment effects from subgroup analysis of biomarker-positive and biomarker-negative patients.Model M3 was applied to treatment effects from analysis of biomarker-positive, biomarkernegative and biomarker-mixed populations.However, in the simulation, estimated treatment effects for the biomarker-mixed population could be obtained from a Cox proportional hazards model without adjustment for biomarker status or with adjustment for biomarker status.Therefore, in the following, application of model M3 to treatment effects from the biomarker-mixed population unadjusted for biomarker status will be referred to as model M3-unadj and application of model M3 to treatment effects from the biomarker-mixed population adjusted for biomarker status will be referred to as M3adj.It is important to note that the only difference between M3-unadj and M3-adj are the estimated treatment effects for the biomarker-mixed population which are used as inputs for model M3.

Performance Measures
We evaluated performance by calculating the percentage bias, coverage and mean width of the credible interval for the pooled treatment effect in the biomarker-positive subgroup.The methods were implemented via MCMC sampling in the WinBUGS software, using a burn-in of 50,000 iterations and 100,000 iterations for posterior estimation [26].

Data Scenarios
The data were simulated under a number of scenarios adapted from scenario 1 (S1), where the number of studies in the meta-analysis was n studies = 15, the number of studies reporting observed treatment effects in the biomarker-positive subgroup only was n studies+ = 5, the number of studies reporting observed treatment effects in the biomarker-positive and biomarker-negative subgroup was n studies± = 5, the number of studies reporting treatment effects in the biomarker-mixed group was n studiesmix = 5, the true pooled treatment effect in the biomarker-positive subgroup was d + = −0.25, the true betweenstudy variance in the biomarker-positive subgroup was τ 2 + = 0.0056, the true mean of the systematic difference was µ β = 0.25 and the true variance of the systematic difference was τ 2 β = 0.01.We arrange the scenarios into five groups, where in each group the scenarios focus on varying a common set of parameter values.In group one, [S1-S5], the number of trials reporting observed treatment effects from the biomarker-positive subgroup only decreases (from n studies+ = 5 to n studies+ = 1) as the number of trials reporting observed treatment effects from biomarker-positive and biomarkernegative subgroups increases (from n studies± = 5 to n studies± = 9).This was intended to clearly demonstrate the performance of the methods as the number of studies used to estimate the systematic difference increases (i.e.S4 and S5 where 80% and 90% of studies reporting biomarker-positive treatment effects also report biomarker-negative treatment effects relative to S1 where only 50% of studies reporting biomarker-positive treatment effects also report biomarker-negative treatment effects).
In group two [S1, S6-S9], the number of treatment effects from biomarker-positive analysis only gradually decreases (from n studies+ = 5 to n studies+ = 1) and the number of treatment effects from biomarker-mixed groups gradually increases (from n studiesmix = 5 to n studiesmix = 9).This was intended to clearly demonstrate the performance of the methods as the number of studies with biomarker-mixed analysis, relative to the number of studies with biomarker-positive analysis increases.
In group three, [S1, S10-S13] the mean of the systematic difference increases (from µ β = 0.25 to µ β = 1.25).This was intended to clearly demonstrate the performance of the methods as the difference between the true treatment effects observed in the biomarker-positive and biomarker-negative subgroups increases.
In group four, [S1, S14-S17] the variance of the systematic difference gradually increases (from τ 2 β = 0.01 to τ 2 β = 0.3).This was intended to clearly demonstrate the performance of the methods as the variability of the systematic difference in the treatment effects between biomarker-positive and biomarker-negative subgroups increases.
In group five, [S1, S18-S21] the total number of studies in the meta-analysis increases (from n studies = 9 to n studies = 90).This was intended to clearly demonstrate the performance of the methods as the availability of data increases.
For all scenarios the true pooled treatment effect and between-study variance for the biomarkerpositive subgroup were set as d + = 0.25 and τ 2 + = 0.0056.A full description of the parameter values specified in each scenario is provided in Table 5.

Results
In this Section, we present the simulation study results for each method in terms of percentage bias, coverage and mean width of the credible interval.Here the methods described in Sections 3.1 and 3.2 are referred to as methods M1 and M2 respectively.The method described in Section 3.3 utilising treatment effects from biomarker-mixed populations without adjustment for biomarker group is referred to as model M3-unadj and the same method utilising treatment effects from biomarkermixed populations adjusted for treatment group and biomarker status is referred to as model M3-adj.We illustrate the results for each scenario group with a line plot of the performance measures for the estimand.

Scenarios S1-S5
Across scenarios [S1-S5], the proportion of studies reporting observed treatment effects in both biomarkerpositive and biomarker-negative subgroups increases whilst the proportion of studies reporting observed treatment effects in the biomarker-positive subgroup alone decreases.The results for these scenarios are presented in Figure 4.There is reasonably small percentage bias (below 1.5%) and over-coverage (i.e.coverage above the nominal value 0.95) for all models in all scenarios.However, for models M3-unadj and M3-adj, which include treatment effects from biomarkerpositive, biomarker-negative and biomarker-mixed populations, the mean CrI is much lower than for models M1 and M2 which only include treatment effects from biomarker-positive and biomarkerpositive and biomarker-negative subgroups respectively.This indicates that inclusion of treatment effects from biomarker-mixed populations informed by the systematic difference in treatment effects between biomarker-positive and biomarker-negative subgroups, improves the precision of estimation of pooled treatment effects for the biomarker-positive group of interest.For all performance measures and for all models it does not appear that changing the proportion of studies reporting treatment effects from the biomarker-negative subgroup, impacts performance.

Scenarios S1 & S6-S9
Across scenarios [S1, S6-S9], the proportion of studies reporting observed treatment effects from the biomarker-positive subgroup decreases (from n studies+ = 5 to n studies+ = 1) and the proportion of studies reporting observed treatment effects from the biomarker-mixed population increases (from n studiesmix = 5 to n studiesmix = 9).
The results for these scenarios are presented in Figure 4.As for scenarios [S1-S5], for all models the percentage bias is relatively small, there is over-coverage and models M3-unadj and M3-adj achieve lower uncertainty than models M1 and M2 in all scenarios.However, for scenarios [S1, S6-S9], the mean width of the 95% CrI increases as the number of studies reporting treatment effects from the biomarker-positive subgroup decreases and the number of studies reporting treatment effects from the biomarker-mixed population increases.This is to be expected as when there are fewer treatment effects available from the biomarker-positive subgroup there is greater uncertainty around the estimate of the pooled treatment effect for this subgroup.However, the ability to utilise treatment effect estimates on the biomarker-mixed subgroup when using models M3-unadj and M3-adj results in a shallower increase in uncertainty around estimates of the pooled treatment effect compared to using treatment effects from the biomarker-positive subgroup alone.
The results for these scenarios are presented in Figure 5. Percentage bias is similar and close to zero for models M1, M2 and M3-adj.As in the previous groups, models M3-unadj and M3-adj result in more precise estimates than models M1 and M2.However, for M3-unadj the percentage bias increases as the true mean of the systematic difference in treatment effects between biomarker-positive and biomarker-negative subgroups increases.This is likely due to non-collapsibility of the hazard ratio.However, including treatment effects from the biomarker-mixed population adjusted for biomarker status in model M3-adj results in only a minimal increase in the percentage bias as the systematic difference increases.Thus, using treatment effects from biomarker-mixed populations adjusted for biomarker status moderates the effect of the increase in systematic difference.
The results for these scenarios are presented in Figure 5. Percentage bias is small and similar for models M1, M2 and M3-adj while the percentage bias is larger for M3-unadj and increases as the variance of the systematic difference increases.Once again, there is over-coverage for all models.Models M3-unadj and M3-adj including treatment effects from biomarker-mixed populations achieve lower CrIs than models M1 and M2 which do not utilise treatment effects from the biomarker-mixed population.However, as the variance of the systematic difference increases the improvement in precision when using models M3-unadj and M3-adj compared to models M1 and M2 decreases.
The results for these scenarios are presented in Figure D1 in Appendix D. For all four models percentage bias and coverage are generally similar and decreasing as the number of studies in the meta-analysis increases.As for all other simulation scenarios, the mean CrI is always lower when using models M3-unadj and M3-adj compared to models M1 and M2.However, as the number of studies in the meta-analysis increases, there is less potential to improve the precision of pooled treatment effect estimates by utilising additional data from biomarker-negative and biomarker-mixed populations.

Discussion
In this paper, we describe methods for synthesis of data on treatment effects from studies with varying reporting of biomarker subgroup analyses in order to estimate pooled treatment effects for a biomarker  subgroup of interest.We applied these methods to an illustrative example in mCRC, where a proportion of studies did not report subgroup analysis.This resulted in a 19% and 44% improvement in precision of estimates of the pooled treatment effects on PFS and OS respectively compared to the standard method applied to data available from subgroup analyses only.These results were consistent with results from a sensitivity analysis utilising more subgroup information.This indicated that the addition of treatment effects from KRAS MT and KRAS mixed populations improved precision of estimates of pooled treatment effects in the KRAS WT population without introducing bias.
To formally assess whether the model can improve precision without introducing bias, we carried out a simulation study.Based on the simulation study, we conclude that model M3-unadj provides a consistent reduction in uncertainty compared to models M1 and M2.However, this reduction in uncertainty is accompanied by a consistent increase in percentage bias for model M3-unadj compared to models M1 and M2.Furthermore for model M3-unadj, unlike models M1 and M2, the percentage bias obtained increases as (a) the true systematic difference between treatment effects in the two biomarker subgroups increases and (b) the variance of the systematic difference increases.It is not surprising that the bias increases as the true systematic difference in treatment effects between biomarker-positive and biomarker-negative subgroups increases.This is because model M3-unadj assumes that the treatment effect in the biomarker-mixed population is linear with the proportion of biomarker-negative patients in the study and this assumption of linearity is known not to hold when the effect size is non-collapsible.However, in data scenarios where the true systematic difference is low, the linearity assumption made in model M3-unadj has the potential to improve precision without inducing any significant bias on the estimated pooled treatment effect in the biomarker-positive subgroup.
Moreover, the model M3-adj (where treatment effects from biomarker-mixed populations are adjusted for biomarker status) improves on model M3-unadj by consistently achieving lower uncertainty and percentage bias than model M3-unadj.Furthermore, model M3-adj is reasonably robust to (a) increases in the systematic difference and (b) increases in the variance of the systematic difference.While percentage bias does increase for model M3-adj across scenarios [S1, S10-S13], percentage bias only increases significantly above that obtained by models M1 and M2 in scenarios S12 and S13 where the true systematic difference is very large and unlikely to be seen in practice.For example, in scenario S12 the HR in the biomarker-positive subgroup is 0.77 and the HR in the biomarker-negative subgroup is 2.12.This suggests that in cases where the true systematic difference is large, but still plausible, model M3-adj has the potential to improve precision without inducing any significant bias on the estimated pooled treatment effect in the biomarker-positive subgroup.Therefore, in data scenarios where the systematic difference between treatment effects in the subgroups is likely to be large, it is preferable to apply model M3 to treatment effects from biomarker-mixed populations which are adjusted for treatment group and biomarker status.However, we acknowledge that in practice, studies conducted in biomarker-mixed populations are unlikely to report treatment effects adjusted for biomarker status.
There are some limitations of the analyses and methods described in this paper.First, in the meta-analysis model it is assumed that the logHR treatment effects are normally distributed, which is a strong assumption.Second, the within-study variances which are used as inputs to the meta-analysis models are obtained from standard errors of analysis of the IPD using the Cox proportional hazards model.Therefore standard error estimates will not be accounting for the random-effects structure of the data.Finally, it should be acknowledged that the assumption of linearity of treatment effects in the biomarker-mixed population with the proportion of biomarker-negative patients in a study is a strong assumption which will not hold exactly when the effect size is non-collapsible (for example when treatment effects are logHRs or log odds ratios) [27].The first two assumptions of normality of the treatment effects and standard errors used as within-study variances are assumptions which are commonly made in meta-analysis and further discussion of such assumptions is beyond the scope of this paper.The final assumption of linearity of the treatment effects however, is an assumption introduced to allow interpolation of treatment effects from biomarker-mixed populations.While this assumption is shown to be too strong in scenarios where the true systematic difference between biomarker-positive and biomarker-negative subgroups is large, in scenarios where the systematic difference is not very large, this assumption appears to be reasonable.

Conclusions
We have described methods for incorporating treatment effects from studies with varying reporting of biomarker subgroup analyses in a single meta-analysis to estimate pooled treatment effects for a biomarker subgroup of interest.These methods have been applied to an illustrative example in mCRC and evaluated using a simulation study.We conclude that our method described to incorporate treatment effects from biomarker-positive, biomarker-negative and biomarker-mixed populations provides a consistent reduction in uncertainty around the estimated pooled treatment effect for biomarker-positive patients compared to using biomarker-positive data alone or biomarker-positive and biomarker-negative data.When the method is applied to treatment effects from biomarker-mixed populations which are not adjusted for biomarker status, the model is not robust to increasing values of the mean and variance of the systematic difference resulting in increases in bias.However, this effect is mitigated when applying the method to treatment effects from biomarker-mixed populations which are adjusted for biomarker status.We hope these methods and simulation study are informative for researchers seeking to conduct a pairwise meta-analysis of trials conducted in varying populations.

Figure 1 :
Figure 1: Observed hazard ratios and 95% confidence intervals on progression-free survival and overall survival.Solid lines are used in both main and sensitivity analysis, dotted lines are used in main analysis only and dashed lines are used in sensitivity analysis only.

Figure 4 :
Figure 4: Percentage bias, coverage and mean credible interval width for pooled treatment effects for the biomarker-positive subgroup, across scenarios; (a) S1-S5 where the proportion of studies reporting treatment effects in the biomarker-negative subgroup in addition to treatment effects in the biomarker-positive subgroup increases and (b) S1 & S6-S9 where the proportion of studies reporting observed treatment effects from the biomarker-positive subgroup decreases and the proportion of studies reporting observed treatment effects from the biomarker-mixed population increases.

Figure 5 :
Figure 5: Percentage bias, coverage and mean credible interval width for pooled treatment effects for the biomarker-positive subgroup, across scenarios; (a) S1, & S10-S13 where the true mean systematic difference between treatment effects in the biomarker-positive and biomarkernegative group gradually increases and (b) S1 & S14-S17 where the true variance of the systematic difference between treatment effects in the biomarker-positive and biomarker-negative group gradually increases.

Table 1 :
LogHRs (Y ) and corresponding standard errors (σ) on overall survival in KRAS WT (+), KRAS MT (−) and KRAS WT and MT (mix) populations from illustrative example in metastatic colorectal cancer used in main analysis.NA indicates no such treatment effect estimate is used from this study.

Table 2 :
Main analysis results from application of models M1 (WT), M2 (WT & MT) and M3 (WT, MT & Mixed) to illustrative example in mCRC for the outcomes PFS and OS.

Table 3 :
Sensitivity analysis results from application of models M1 (WT), M2 (WT & MT) and M3 (WT, MT & Mixed) to illustrative example in mCRC for the outcomes PFS and OS.

Table 5 :
Model parameters specified to simulate datasets under 21 scenarios Scenario Group n studies n studies+ n studies± n studiesmix µ β τ 2

Table C1 :
Dummy example data to illustrate the structure of data required for the model including studies reporting treatment effects from biomarker-positive patients only, biomarkernegative patients only and biomarker-positive and biomarker-negative patients.