Comparative efficacy and safety of bortezomib, thalidomide, and dexamethasone (VTd) without and with daratumumab (D‐VTd) in CASSIOPEIA versus VTd in PETHEMA/GEM in transplant‐eligible patients with newly diagnosed multiple myeloma, using propensity score matching

Abstract

Background
Traditional bortezomib, thalidomide, and dexamethasone (VTd) regimens for patients with newly diagnosed multiple myeloma (NDMM) include doses of thalidomide up to 200 mg/day (VTd‐label). Clinical practice has evolved to use a lower dose (100 mg/day) to reduce toxicity (VTd‐mod), which was evaluated in the phase III CASSIOPEIA study, without or with daratumumab (D‐VTd; an anti‐CD38 monoclonal antibody). We used propensity score matching to compare efficacy and safety for VTd‐mod and D‐VTd with VTd‐label.

Methods
Patient‐level data for VTd‐mod and D‐VTd from CASSIOPEIA (NCT02541383) and data for VTd‐label from the PETHEMA/GEM study (NCT00461747) were analyzed. Propensity scores were estimated using logistic regression, and a nearest‐neighbor matching procedure was used. Outcomes included overall survival (OS), progression‐free survival (PFS), time to progression (TTP), postinduction and posttransplant responses, as well as rates of treatment discontinuation and grade 3/4 peripheral neuropathy.

Results
VTd‐mod was noninferior to VTd‐label for OS, PFS, TTP, postinduction very good partial response or better (≥VGPR) and overall response rate (ORR). VTd‐mod was significantly better for posttransplant ≥VGPR and ORR versus VTd‐label. VTd‐mod safety was not superior to VTd‐label despite the lower thalidomide dose. D‐VTd was significantly better than VTd‐label for OS, PFS, TTP, postinduction and posttransplant ≥VGPR and ORR, and was noninferior to VTd‐label for safety outcomes.

Conclusions
In transplant‐eligible patients with NDMM, D‐VTd had superior efficacy compared with VTd‐label. Despite a lower dose of thalidomide, VTd‐mod was noninferior to VTd‐label for safety and was significantly better for posttransplant ≥VGPR/ORR. These data further support the first‐line use of daratumumab plus VTd.

As higher doses of thalidomide have been associated with peripheral neuropathy [3], clinical practice has evolved to use a modified version of VTd (VTd-mod), which features a lower dose of thalidomide (100 mg daily) to potentially reduce toxicity. This dosing regimen recently gained approval in the United States, Europe, and Brazil in combination with daratumumab [4][5][6], a human monoclonal antibody targeting CD38 that has an immunomodulatory mechanism of action. Approval was based on the results of the phase III CASSIOPEIA trial Part 1 in transplant-eligible patients with NDMM [7]. The dosing regimen in CASSIOPEIA Part 1 comprised four 28-day cycles of pre-ASCT induction therapy and two 28-day cycles of post-ASCT consolidation therapy with bortezomib, thalidomide (100 mg daily), and dexamethasone, without or with daratumumab (D-VTd). Treatment with D-VTd improved the depth of response and progression-free survival (PFS) in patients with NDMM [7]. Part 2 of this study, which is investigating daratumumab monotherapy maintenance (16 mg/kg every 8 weeks until progression, or for a maximum of 2 years) versus observation in patients who achieved a partial response or better, is ongoing.
To date, no randomized clinical trials (RCTs) have directly compared the efficacy and safety of VTd-mod or D-VTd versus VTd-label. It is difficult to draw meaningful conclusions from indirect comparisons between published aggregated clinical trial data because unadjusted comparisons of outcomes are prone to confounding bias arising from variation in patient characteristics between treatment populations. However, statistical methods that control for differences in baseline covariates, such as propensity score matching (PSM), can be used to estimate differences between treatment regimens in the absence of a head-to-head comparison [8,9]. The objective of the current PSM analysis was to compare the efficacy and safety of the VTd-mod and D-VTd regimens versus VTd-label in patients with NDMM who are transplant eligible.
Data for the VTd-label group were taken from the PETHEMA/GEM study, in which patients were randomized to one of three regimens: the alternating chemotherapy regimens vincristine/carmustine/melphalan/cyclophosphamide/prednisone and vincristine/carmustine/doxorubicin/dexamethasone, followed by bortezomib; vs thalidomide/dexamethasone; vs VTd [10,11]. Patients

Propensity score matching
PSM was used to correct for differences in baseline characteristics between the CASSIOPEIA and PETHEMA/GEM trials. The National Institute for Health and Care Excellence (NICE) decision tree was used to determine which propensity score methodology best suited the data for this analysis [12]. As there was some imbalance in baseline characteristics before matching, and good balance could be achieved after matching, analysis on matched samples was deemed more appropriate in both the primary and sensitivity analyses.
In an exploratory analysis, several matching methods were applied to identify the best-performing one. For each method, the distribution of propensity scores before and after matching and the postmatch balance between treatment groups (VTd-mod vs VTd-label or D-VTd vs VTd-label) were assessed. To determine how adequately PSM balanced the covariates, pre- and postmatch balance between treatment groups was assessed using standardized mean differences for the included covariates (described below), with values >0.1 suggesting potentially important imbalances [13]. Additionally, chi-square tests were performed to assess the statistical significance of differences in covariates between treatment groups before and after matching. This assessment determined that the best-performing PSM method was nearest-neighbor matching (without replacement). A 2:1 ratio was used (number of VTd-mod or D-VTd patients matched to each VTd-label patient). Propensity scores were estimated using logistic regression, and matching was carried out using the MatchIt R package [14].
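As an illustration only, the two building blocks described above (a standardized mean difference balance check and 2:1 nearest-neighbor matching without replacement) can be sketched in plain Python. This is not the MatchIt implementation used in the analysis; the function names, the greedy round-by-round matching order, and the pooled-standard-deviation form of the standardized mean difference are assumptions made for the sketch.

```python
import math
import statistics

def standardized_mean_difference(group_a, group_b):
    """Difference in covariate means divided by the pooled standard deviation.

    Assumption: the pooled SD is the square root of the average of the two
    sample variances; other definitions exist.
    """
    pooled_sd = math.sqrt(
        (statistics.variance(group_a) + statistics.variance(group_b)) / 2
    )
    if pooled_sd == 0:
        return 0.0
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

def nearest_neighbor_match(reference_scores, candidate_scores, ratio=2):
    """Greedy nearest-neighbor matching on the propensity score, without replacement.

    reference_scores: propensity scores of the reference group (e.g. VTd-label).
    candidate_scores: propensity scores of the group to draw matches from
                      (e.g. VTd-mod or D-VTd).
    Returns a dict mapping each reference index to a list of matched candidate
    indices (up to `ratio` per reference patient).
    """
    available = set(range(len(candidate_scores)))
    matches = {i: [] for i in range(len(reference_scores))}
    # One match per reference patient per round, so that second matches are
    # only assigned after every reference patient has had a first pick.
    for _ in range(ratio):
        for i, score in enumerate(reference_scores):
            if not available:
                return matches
            best = min(available, key=lambda j: (abs(candidate_scores[j] - score), j))
            matches[i].append(best)
            available.remove(best)  # without replacement
    return matches

def is_imbalanced(smd, threshold=0.1):
    """Flag a covariate whose absolute SMD exceeds the 0.1 threshold."""
    return abs(smd) > threshold
```

In this sketch a covariate would be checked with `standardized_mean_difference` before and after restricting both groups to matched patients, and flagged with `is_imbalanced` when the absolute value exceeds 0.1.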
Propensity score distribution in both treatment groups was assessed before and after matching to assess the degree of overlap. Additionally, propensity score distributions in matched and unmatched patients were assessed to determine whether the individuals not matched were in some specific part of the propensity score continuum. After excluding unmatched samples, outcomes observed in the matched sample were compared directly using a suitable measure of treatment effect for different endpoints.
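One simple way to check whether unmatched patients cluster in a particular part of the propensity score continuum is to tabulate matched and unmatched counts per score interval. The sketch below assumes equal-width bins and a hypothetical helper name; the section does not state how the distributions were actually inspected.

```python
def counts_by_score_bin(scores, matched_flags, n_bins=10):
    """Count matched and unmatched patients in equal-width propensity score bins.

    scores: propensity scores in [0, 1].
    matched_flags: booleans, True if the patient was retained after matching.
    Returns a list of n_bins dicts with keys "matched" and "unmatched".
    """
    counts = [{"matched": 0, "unmatched": 0} for _ in range(n_bins)]
    for score, matched in zip(scores, matched_flags):
        # A score of exactly 1.0 is placed in the last bin.
        bin_index = min(int(score * n_bins), n_bins - 1)
        counts[bin_index]["matched" if matched else "unmatched"] += 1
    return counts
```

A bin with many unmatched and few matched patients would indicate a region of the score range with poor overlap between the two groups.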

Covariates
The following covariates were identified for matching (based on

Analysis variables and statistical methodology
The efficacy endpoints included in the PSM analysis were overall sur- other CR criteria were considered to have CR [7]. Safety endpoints included in the analysis (for the induction phase only) were treatment discontinuation due to any grade of adverse events (AEs), treatment discontinuation due to grade 3 or 4 AEs, and grade 3 or 4 peripheral neuropathy.
For time-to-event outcomes (OS and PFS), hazard ratios (HRs) with two-sided 95% confidence intervals (CIs) were estimated using stratified Cox regression models, fitted with treatment arm; P values for HRs and Kaplan-Meier curves were based on the Wald test and log-rank test, respectively. Comparison of HRs between treatment groups was reported with point estimates and 95% CIs. Response rates and AEs were analyzed using odds ratios with two-sided 95% CIs, estimated by fitting a logistic regression model. Results that did not achieve statistical significance at the 5% level were interpreted using noninferiority margins [15,16]. A targeted literature review identified noninferiority margins for response, safety, PFS, and OS as follows: 13% (rate difference), 13% (rate difference), 1.333 (HR), and 1.298 (HR), respectively [15]. Results that neither achieved significance nor met the noninferiority criteria were treated as inconclusive.
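The interpretation rule described above can be sketched as a small decision function. The margins (1.333 for PFS, 1.298 for OS, 13% for rate differences) come from the text; everything else is an assumption for illustration, including the function names, the convention that HR < 1 favors the CASSIOPEIA regimen, the use of the CI bound nearest the margin, and the explicit "significantly worse" branch, which the text does not describe.

```python
def classify_hazard_ratio(ci_lower, ci_upper, margin):
    """Classify a hazard ratio from its two-sided 95% CI.

    Assumption: HR < 1 favors the test regimen over VTd-label, and
    noninferiority requires the upper CI bound to lie below the margin.
    """
    if ci_upper < 1.0:
        return "significantly better"
    if ci_lower > 1.0:
        return "significantly worse"
    if ci_upper < margin:
        return "noninferior"
    return "inconclusive"

def classify_rate_difference(ci_lower, ci_upper, margin=0.13, higher_is_better=True):
    """Classify a rate difference (test minus reference) from its 95% CI.

    higher_is_better: True for response rates, False for safety rates
    (discontinuation, peripheral neuropathy), where a lower rate is favorable.
    """
    if not higher_is_better:
        # Flip the sign so that positive values always favor the test regimen.
        ci_lower, ci_upper = -ci_upper, -ci_lower
    if ci_lower > 0:
        return "significantly better"
    if ci_upper < 0:
        return "significantly worse"
    if ci_lower > -margin:
        return "noninferior"
    return "inconclusive"
```

Under these assumptions, an OS hazard ratio would be passed with `margin=1.298` and a PFS hazard ratio with `margin=1.333`.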

Patients, treatments, and baseline characteristics
The median duration of follow-up was 35.9 months for VTd-label and Therefore, a comparison of outcomes on the matched sample was warranted.

Efficacy outcomes
Naïve, unadjusted comparisons between groups significantly favored VTd-mod over VTd-label for OS, PFS, and TTP, as well as D-VTd versus VTd-label (Figures 1 and 2 and Table 3). For response endpoints, naïve unadjusted comparisons found VTd-mod to be inferior (≥CR) or noninferior (≥VGPR, ORR) to VTd-label postinduction, whereas posttransplant responses for VTd-mod were either inferior (≥CR) or superior (≥VGPR, ORR) to VTd-label (Table 4). Similar results were observed for naïve unadjusted comparisons of D-VTd with VTd-label, with the exception of postinduction ≥VGPR and ORR, which were superior with D-VTd versus VTd-label (Table 4).

D-VTd versus VTd-label
After matching, D-VTd was significantly better than VTd-label for OS  Table 3). The sensitivity analysis supported the primary analysis for all time-to-event endpoints (Table 3 and Figure 4).   [7] and VTd-label (PETHEMA/GEM) [10,11]

Safety
Baseline characteristics for the safety analyses before and after matching VTd-mod to VTd-label, and D-VTd to VTd-label, are summarized in Additional files 3 and 4 (in the Supporting Information). As with the efficacy analyses, matching balanced the groups in terms of baseline variables.

D-VTd versus VTd-label
For all evaluated safety endpoints, D-VTd was noninferior to VTd-label (Table 5). The rate of discontinuation due to AEs of all grades was 5.6% for D-VTd versus 6.4% for VTd-label (P = .752), whereas the rate of discontinuation due to grade 3 or 4 AEs was 4.0% versus 3.2%, respectively (P = .695), and the incidence of grade 3 or 4 peripheral neuropathy was 2.8% versus 5.6% (P = .186). assignments. This method reduces the impact of confounding, thereby strengthening the validity and confidence of findings [19,20].

DISCUSSION
In the current PSM analysis, the VTd-mod regimen was found to be  for safety endpoints, which also agrees with the CASSIOPEIA trial results, demonstrating that the addition of daratumumab to VTd does not increase overall toxicity or affect the ability of patients to undergo successful transplantation [7].
Propensity score-based methods do have some limitations, which should also be considered when interpreting these findings. First, the PSM analysis could not be adjusted for unreported or unobserved confounding factors that may influence patient outcomes (residual confounding). If any important variables were omitted, then the groups may remain unbalanced and study results can be seriously biased [21].
However, both CASSIOPEIA and PETHEMA/GEM are RCTs, with data from most of the clinically relevant baseline variables collected and included in this analysis; therefore, the risk of unobserved confounding may be minimized. Second, PSM requires large samples because matching reduces the sample size, negatively affecting the precision of the estimates. PSM also cannot correct for selection bias, and regional differences in local standard-of-care regimens may have further contributed to differences in the CASSIOPEIA and PETHEMA/GEM studies. Lastly, a substantial number of patients without cytogenetic risk data in the PETHEMA/GEM study were excluded from the primary analysis, thereby reducing the power of the sensitivity analysis.
In addition to limitations of the PSM methodology, longer median follow-up and differences in maintenance treatments between the CASSIOPEIA and PETHEMA/GEM studies may have biased the results in favor of the VTd-label arm, particularly for long-term survival.
Median follow-up in PETHEMA/GEM for VTd-label was 35.2 months [10,11] compared with 18.8 months in CASSIOPEIA [7]. Thus, more patients in the VTd-label arm of the analysis were exposed to maintenance therapies compared with VTd-mod/D-VTd. Fewer patients in the CASSIOPEIA study had the opportunity to receive maintenance treatment because, per protocol for Part 2 of the study, patients with partial response or better were rerandomized 100 days post-ASCT in a 1:1 ratio to either observation only or daratumumab monotherapy every 8 weeks for a maximum of 2 years; thus, 50% of patients did not receive maintenance treatment. In PETHEMA/GEM, patients were rerandomized 3 months post-ASCT in a 1:1:1 ratio to one of three maintenance therapies. Comparisons may also be biased in favor of VTd-label due to differences in how response was assessed between the studies.

CONCLUSIONS
This PSM analysis demonstrated the noninferiority of VTd-mod versus VTd-label for OS, PFS, TTP, and postinduction ≥VGPR and ORR, and posttransplant superiority of VTd-mod for ≥VGPR and ORR. Safety outcomes for VTd-mod were also noninferior to VTd-label outcomes despite a lower dose of thalidomide in the VTd-mod regimen. In addition, D-VTd, using a modified dose of thalidomide 100 mg, was significantly better than VTd-label for efficacy outcomes (OS, PFS, TTP, and postinduction and posttransplant ≥VGPR and ORR) and was noninferior to VTd-label for safety outcomes. Taken together, these findings confirm those of Part 1 of the CASSIOPEIA study wherein D-VTd had superior efficacy to VTd, with both regimens using a modified thalidomide dose, and support the use of daratumumab in the first-line treatment for NDMM in patients who are transplant-eligible.