A new risk score for patients after first recurrence of stage 4 neuroblastoma aged ≥18 months at first diagnosis

Abstract Background The prognosis of patients with recurrences from stage 4 neuroblastoma is not uniformly dismal. The evaluation of new therapies therefore needs to consider the individual risks of the treated patients. This study aims to define clinically useful risk criteria. Patients and Methods Inclusion criteria were: first recurrence of neuroblastoma stage 4 aged ≥18 months and enrollment in first line trials between 1997 and 2016. Patients were randomized into a training set (N = 310) and an independent validation set (N = 159). The primary endpoint was secondary event‐free survival. The individual treatment elements the patients received during initial and recurrent disease were analyzed as binary and time‐dependent variables. A five‐step multiple time‐dependent Cox regression analysis was performed on the training set to identify prognostic variables adjusted for the individual frontline treatment. The selected variables resulted in a prognostic index (PI) and were used to build a risk score system. The score was validated with the validation set. Results Of the 469 patients, 372 were treated with curative intent and 97 with palliative intent. The PI included the variables number of recurrence organs (hazard ratio [HR] = 2.27), time to recurrence (HR = 2.03), liver metastasis at diagnosis (HR = 1.77), first recurrence at site of the primary tumor (HR = 1.55), and age (HR = 1.29). Three risk groups were built and confirmed in the validation set. The scoring system was likewise useful for the curatively or palliatively treated subgroups. Conclusion A new risk score system for patients with first recurrence of stage 4 neuroblastoma aged ≥18 months at diagnosis is proposed.


| INTRODUCTION
The prognosis of children with recurrences of high-risk neuroblastoma is dismal. [1][2][3][4][5][6][7][8][9] The reported time periods from the observation of first recurrence to the subsequent progression are short (median intervals 58 days, 1 4.7 months, 2 6.4 months 3 ), and the overall survival proportions have been poor (20% after 4 years, 1 20% after 5 years, 4 7% after 10 years 5 ). Risk factors for an inferior outcome included stage 4, 4,5 age ≥18 months at first diagnosis, 4,5 MYCN amplification, 1,2,4,6,7 loss of heterozygosity of chromosome 11q, 1 shorter time from diagnosis to first recurrence, 4,7 abdominal primary tumor, 5 bone marrow metastasis at first diagnosis, 2 recurrent disease (vs refractory disease), 3,8,10 and increased lactate dehydrogenase blood levels. 5 Other predictors for poor outcome were measurable tumor on computed tomography/ magnetic resonance imaging at second-line trial enrollment, high Curie score by metaiodobenzylguanidine scintigraphy, and stem cell supported therapy. 10 The analyzed cohorts varied and included recurrences from localized tumors, 1,2,[4][5][6][7][8]10 patients aged less than 18 months at diagnosis, 1,2,4,5,9,10 and patients with refractory disease. 3,8,10 For parents, the decision to undergo therapy a second time after recurrence of stage 4 neuroblastoma is heavily dependent on the potential survival chances. For clinical scientists, reporting on phase 2/3 trials requires a comparison with equivalent risk groups. The study objective was to develop a robust score system taking into account the individual treatment the patients had received. A large data set was retrospectively analyzed, focusing on the largest and worst group, ie stage 4 neuroblastoma aged ≥18 months at initial diagnosis, including patients treated with palliative intent.

| Study design
This is a retrospective analysis of data obtained from the national trials NB97 and NB2004 of the German Pediatric Oncology Society. The data lock was 30 April 2019. The primary endpoint for this analysis was secondary event-free survival (secEFS), and the secondary endpoint was secondary overall survival (secOS). Table S2 lists the non-time-dependent variables used for the analysis.
The trials were conducted in 66 pediatric oncology university and community hospitals in Germany and Switzerland. All protocols were approved by the ethics committees of the University of Cologne. Written informed consent was obtained from the parents or legal guardians before enrollment and included the follow-up after recurrences until death. Yearly routine check-up reports were requested by the trial office from the local institutions and/or by the German Childhood Cancer Registry (www.kinde rkreb sregi ster.de).
If medical care had been finished, the local registration offices were approached in accordance with the national law. Of all German neuroblastoma patients known to the German Childhood Cancer Registry, 99.3% participated in one of the two trials between 1997 and 2016. 11 High-dose chemotherapy with autologous blood stem rescue and anti-GD2 immunotherapy was used systematically from 1997 onwards. Palliative therapy was assumed if the medical report clearly stated the intent of the therapeutic measure was palliative. The decision for palliative or curative treatment was made exclusively at the treating institution. Epidemiological, diagnostic, and therapeutic details are described elsewhere. 11 One hundred and eighty three study patients have been the subject of an earlier report. 7

| Participants
The inclusion criteria were (a) neuroblastoma stage 4 according to international criteria (INSS) 12 ; (b) age at diagnosis ≥18 months and <21 years; (c) diagnosis between 5 March 1997 (trial NB97 first patient in), and 31 December 2016 (trial NB2004 last patient in); (d) enrollment in the national neuroblastoma trials NB97 or NB2004; (e) first recurrence diagnosed. Exclusion criteria were (a) death due to the tumor or early progression <90 days without response to first line therapy; (b) death due to toxicity in first line therapy; (c) no or inadequate first line therapy; (d) diagnosis of a second malignancy at any time; (e) insufficient information regarding recurrence site, recurrence therapy or unknown cause of death in first line therapy.

| Statistical analysis methods
Secondary event-free survival was defined as the time from first recurrence until second recurrence or until death of any cause or until the last examination. For simplicity, recurrences where tumors had completely disappeared as well as progressions where tumors had partially disappeared or had been stable have been comprised under the term "recurrences". Secondary overall survival was defined as the time from first recurrence until death of any cause or until the last examination. Kaplan-Meier estimates for secEFS and secOS were compared using the log-rank test. For multivariable survival analyses, the Cox regression model was used. Estimated hazard ratios (HR) with 95%-confidence intervals (95%-CI) and Wald P-values were calculated. For all analyses, IBM SPSS statistical package version 24 and SAS version 9.4 were used.

| Explanatory variables and missing data
Continuous variables (eg age and time from diagnosis to first recurrence) were entered in continuous and in binary form (median as cut-off) into the model. Count variables (eg number of recurrence sites) were additionally used in the binary form (one vs more than one). Missing data were treated as missing completely at random. First line therapies were included as binary variables, and therapies with curative intent after the first recurrence were included as time-dependent variables (effective during treatment time and the following month). Palliative treatment was considered as a binary variable.

| Analysis sets
Patients were stratified according to the variables palliative treatment (yes/no), MYCN amplification (yes/ no), 1,2,4,6,7 time from diagnosis to first recurrence (≤18/>18 months), 4,7 and number of recurrence organs (1/>1) and then randomized with allocation ratio 2:1 into a training set (n = 310) and a validation set (n = 159). All patients treated with palliative (n = 97) or curative (n = 372) intent were included in the analysis in order to obtain the full spectrum.

| Score building/development
A multiple time-dependent Cox regression analysis was performed on the training set to identify prognostic variables that are available at the time of first recurrence diagnosis. As some of the considered variables had missing values and regression analysis is dependent on complete information, variables were categorized into three groups: variables with a high number of missing values (≥15%), variables with some missing values (>0-<15%) and the variables without any missing data.
A 5-step approach for the selection of variables was used based on consecutive forward selection (inclusion criterion: P-value of score test ≤.05) and backward selection steps (exclusion criterion: P-value of likelihood ratio test >.05). The procedure is summarized in Figure 1.
The first two steps were used to test whether any variables with missing values showed to be significant. In the first step (forward selection), all variables were entered into the model to identify significant variables with ≥15% missing values. The second step of forward selection included variables with <15% missing values plus the variables with ≥15% missing values selected in the first step. This was done in order to have a higher number of complete cases available for analysis. The final model was built in steps three to five. In the third step, only variables without missing values plus the selected variables from the previous step were considered and again chosen through forward selection. In the fourth step, the model from the previous step was expanded by successively adding the treatments as time-dependent covariates into the forward selection model in order to adjust for the therapy the patients received. Finally, in the fifth step the set of variables was reduced via backward selection, resulting in the final model where X is the design matrix of the nontherapy variables, Y is the design matrix of non-time-dependent therapy variables, and Z(t) contains the time-dependent (second-line therapy) variables. The vectors β, γ, and δ depict the corresponding vectors of estimated regression coefficients. The model fit was assessed by estimating the integrated timedependent area under the curve and Harrell's concordance statistic.

| Prognostic index and risk score
The parameter estimates of the selected nontherapy variables were then used to produce the prognostic index (PI), PI = Xβ. The PI is a measure of the risk for an event independent of the received therapy. It can be calculated for each patient based on his/her characteristics regarding the prognostic variables. Based on an optimal stratification of the PI, three risk groups were built. This was done using all possible pairs of cut-off values. Each cut-off pair defined three risk groups and was evaluated in a Cox regression model including the risk groups plus the therapeutic variables of the final model. Finally, the cut-off pair with the smallest P-value in the likelihood ratio test was chosen (unadjusted for the multiple tests of all possible cut-off values).
The resulting risk groups defined the risk score.

| Score validation
The risk score was applied to the independent validation set in order to test the reliability to discriminate the three subgroups regarding secEFS. The overall confirmatory null hypothesis was: H0: the secEFS of patients in the lower risk group is equal to or worse than the secEFS of patients in the intermediate risk group or the secEFS of patients in the higher risk group is equal to or better than the secEFS of patients in the intermediate risk group.
The P-value is calculated as P:= max[P1,P2] where P1 [P2] is the one-sided Wald P-value of the variable risk group (higher [lower] risk group vs intermediate risk group) in the Cox regression model including risk group and the therapy variables from the last step of model building. The same hypotheses were exploratively evaluated for secOS. To investigate the robustness of the score, the analysis was repeated with risk group as the only explanatory variable. The results obtained were exploratively reassessed for the cohort of curatively treated patients. Additionally, the five diagnostic variables from the final model were separately analyzed in a Cox regression model including the therapy variables from the last step.

| RESULTS
Four hundred and sixty nine cases with recurrences of stage 4 neuroblastoma aged ≥18 months at initial diagnosis were identified. The CONSORT diagram (Figure 2) shows the numbers of patients who were excluded from the analysis.  Table S1 shows that the most frequently involved recurrence organs were the osteomedullary site and the primary tumor area. In one quarter of the patients, both sites were jointly affected. The recurrence presented in 56.3% of cases in one site/organ, in 30.7% in two, in 10.2% in three, and in 2.8% in four sites.

| Score building/development
The training set for the Cox regression analysis had 310 cases. Sixty-one non-time-dependent variables were available at the time of diagnosis of the first recurrence, including the variable "palliative treatment" (Table S2). In the first step, all 61 variables were considered for selection, resulting in 152/310 complete cases available. No variable with more than 15% missing values was selected. In the second step, the 60 variables with less than 15% missing values were considered, resulting in 252/310 complete cases. Again, none of the variables with missing data was selected and the whole training set could be used only including the 53 variables without missing data. The model chosen in the third step was then expanded in the fourth step by adding second-line therapies as time-dependent variables ( Table S3). The final model from the fifth step (backward selection) is depicted in Table 1. All nontherapy parameters are generally available at the time of recurrence and were therefore chosen to build a PI. Aberrations of chromosome 1p and amplification of the MYCN were prognostic in univariable analysis (Table S2), but not relevant in multivariable analysis ( dehydrogenase elevation was univariably and multivariably not prognostically relevant. The PI allows the individual risk of each patient to be calculated (Table S4) according to the formula (coefficients rounded): Ranging from 0 to 2.787 with a median of 1.11 (95% CI 1.01-1.14), the optimal two cut-offs were at C 1 = 0.50 and C 2 = 1.20. Figure 3 shows the Kaplan-Meier curves of the three risk groups for secEFS in the training set. Twenty percent of patients belonged to risk group 1 (lower, PI 0 to ≤0.50), 36% to risk group 2 (intermediate, PI > 0.50 to ≤1.20), and 44% to risk group 3 (higher, PI > 1.20). The distribution of the identified risk factors for the total group (N = 469) is shown in Table S5.

| Score validation
The overall null hypotheses that secEFS is independent of risk group could be rejected at a significance level of 2.5% (P < .001), thus supporting the validity of the risk score. The validation set consisted of 159 cases, of which 16% were attributed to risk group 1, 38% to risk group 2, and 45% to risk group 3. The risk score reliably discriminated between the groups for secondary event-free as well as for overall survival. The HRs of the multivariable Cox model for secEFS including the therapeutic variables were 0.82 (95% CI 0.49-1.38, lower vs intermediate risk, Wald P = .462) and 1.97 (95% CI 1.37-2.83, higher risk vs intermediate risk, P < .001). For secOS, the HRs were 0.77 (95% CI 0.43-1.40, lower vs intermediate risk, P = .393) and 2.07 (95% CI 1.41-3.03, higher vs intermediate risk, P < .001). Figure 4 shows the Kaplan-Meier estimates.
In the subgroup of curatively treated patients (n = 126), the score was different between higher vs intermediate risk as well for secEFS (Wald P = .002) and secOS (P < .001), while intermediate vs lower risk were not discriminated (secEFS P = .253; secOS P = .236). In the statistically small subgroup of patients with palliative care (n = 33), the score discriminated also higher risk vs intermediate risk (sec EFS P = .016) and lower vs intermediate risk (secEFS P = .015, secOS P = .046), but not secOS higher vs intermediate risk (P = .118). Figure S1 shows the Kaplan-Meier estimates.

| DISCUSSION
A new scoring system defining three risk groups for patients with recurrences of high-risk stage 4 neuroblastoma aged ≥18 months is proposed. The data needed for the score are available at the diagnosis of recurrence (age, time to recurrence, liver metastasis at diagnosis, recurrence at the site of the primary tumor, number of recurrence sites).
The The study has limitations. One is the retrospective nature of the analysis. The main information was obtained through medical reports and yearly routine queries while other structured detailed reports were only rarely available. Second, the diagnostic work-up at the time of recurrence was not always complete. For example, bone marrow cytology was rarely performed in the event of osteomedullary metaiodobenzylguanidine accumulation or in cases of palliative treatment intent. Nonetheless, the date and the sites of recurrences used for the analysis were unambiguous. Thus, more recurrence sites were possible, but not fewer. Third, the therapeutic elements applied to treat the recurrence were comprised under one heading, eg "chemotherapy". Specific data on drugs used, dosages, and timing were rarely available and not analyzed. Thus, only "presence" or "absence" of a therapeutic element and the time period were used for the adjustment. But even this limited information concerning consecutive treatment approaches is rarely available in most data bases. Fourth, the decision to treat a patient curatively or palliatively was up to the patient's family and the local treating team. The information about the therapeutic intent was communicated in all cases from the treating institution. It cannot be excluded that some of the palliative treatment patients may have been categorized as curative intent patients. The portion of patients treated palliatively decreased with time. Fifth, it was not our goal to analyze the effect of different treatments patients received. As the secondary treatment variables are included in the analysis as time-dependent variables assuming a constant effect of therapy during treatment plus four weeks not considering the doses and other factors, the parameters describing the effect of the treatment must be interpreted with care and we do not suggest any therapy recommendation based on the found effects. Sixth, the designations "lower", "intermediate", and "higher" are relative and must be viewed in the context of the generally poor survival of this specific cohort of stage 4 patients with first recurrence. Finally, the risk score is not applicable to patients refractory to or without or with inadequate first line therapy (15/741 = 2%, Figure 1).
The combination of the individual risk factors resulting in a score to predict the individual risk is new. Through the inclusion of the therapeutic variables, the resulting parameters from the multivariable analysis yield the effect of the risk factor when corrected for the treatment the patients received. This is the important point of our study. The five described variables appear to cover the impact of MYCN oncogene (MYCN) amplification, 1,2,4,6,7 increased lactate dehydrogenase levels, 5 and other risk factors mentioned. In the future, novel omics technologies may provide better insights into the molecular mechanism and potentially better explain clinical courses right from the first diagnosis. 13,14 Tumor material for molecular investigations was not available in most cases of this series. The match of phenotype and genotype is an important task for the future.
The authors of this study see two implications: the estimation of the individual risk of a patient for treatment selection and the presentation of basic risk group data for comparisons with outcomes obtained by new therapeutic approaches. The gold standard to evaluate a new therapy is a randomized