A nomogram that incorporates traditional and newer prognostic factors to identify patients with chronic lymphocytic leukemia (CLL) who are at high risk of receiving therapy was developed by investigators at The University of Texas M. D. Anderson Cancer Center (MDACC). Because the model required validation before its extensive use could be recommended, the authors sought to externally validate the nomogram in an independent, community-based cohort of patients with CLL.
In total, 328 previously untreated patients with newly diagnosed, asymptomatic, Binet stage A CLL from different primary hematology centers who were registered on a prospective basis during 2006 to 2010 on an observational database of the Italian Lymphoma Study Group were considered suitable for external validation of the model.
A total point score was calculated for each patient using a formula proposed by MDACC investigators, and the median score was 19.9 (range, 0-69.5). Furthermore, when the score was evaluated as a continuous variable (ie, by measuring the risk of each point increase), the total point score was associated with the time to first treatment (hazard ratio [HR], 1.04; 95% confidence interval [CI], 1.02-1.05; P < .0001). Receiver operating characteristic analysis identified a point score of 25 (area under the curve, 0.64; sensitivity, 61.5; specificity, 72.1; P < .0001) as the best threshold capable of separating patients who needed therapy from patients who did not (HR, 3.27; 95% CI, 2.07-5.18; P < .0001). The prognostic index category also remained a predictor of the time to first treatment when the analysis was limited to patients with Rai stage 0 disease (HR, 4.05; 95% CI, 2.25-7.52; P < .0001). Finally, a goodness-of-fit test demonstrated that the nomogram model had a significantly good fit at 2 years (correlation coefficient [r2] = 0.966; P = .002).
In parallel with significant improvements in treatment outcomes, there has been dramatic progress in understanding the biology of chronic lymphocytic leukemia (CLL), and several new prognostic factors have been identified.1 However, it is important to clearly define the clinical endpoint for analysis and not to assume that the same factors apply across all clinical endpoints.2 Recently, Wierda et al3 performed an analysis to identify traditional and newer prognostic factors that are associated independently with the time to first treatment for patients with CLL who do not have an indication for treatment at the time of presentation. On the basis of their findings, a nomogram was developed that calculates the 2-year and 4-year probability of treatment and estimates the median time to first treatment.3
Caution must be used when extrapolating results from regression models that are built on different populations, because a nomogram derived from a certain patient cohort may not be applicable to an independent cohort. Consequently, external validation is essential to ensure that the nomogram is applicable to clinical practice before its extensive clinical use can be recommended.4
In the current study, we used the Italian Lymphoma Study Group (GISL) database to validate the model proposed by investigators at The University of Texas M. D. Anderson Cancer Center (MDACC) for assessing the time to first treatment in patients with CLL. It is noteworthy that our independent, prospective, multicenter series consisted of community-based patients who had early stage disease at presentation.3 We also extended the utility of the index to patients with Rai stage 0 disease, who represent approximately 70% of patients with newly diagnosed early CLL in nonreferral primary hematology centers in Italy.5 Finally, because, in their original study, Wierda et al3 did not identify a test group of patients to internally validate their nomogram, the results of the current study represent the first valid attempt to verify the reliability of the newly proposed MDACC index.
MATERIALS AND METHODS
Three hundred twenty-eight patients with previously untreated, Binet stage A CLL from several institutions who were diagnosed between January 2006 and December 2010 were registered prospectively within 12 months from diagnosis in a national database (O-CLL1 protocol; clinicaltrials.gov identifier NCT00917540). The median time elapsed between the date of diagnosis and the date of database registration was 3 months (range, <1-12 months). The date of CLL diagnosis was recorded, and the time-to-event endpoint was defined as the time elapsed between the time of database registration and the time of first CLL treatment. The inclusion criteria for CLL diagnosis, which were used at the time of study design and initiation, followed the National Cancer Institute-sponsored Working Group guidelines,6 which require absolute lymphocytosis with a lower threshold of >5000 mature-appearing lymphocytes/µL in the peripheral blood. Because the objective of this observational study was to evaluate the role of novel prognostic variables in younger patients with CLL, only those aged <70 years were considered eligible.
Diagnoses were centrally confirmed with flow cytometry (positive for cluster of differentiation 5 [CD5; a type I transmembrane protein]/weak expression of surface membrane immunoglobulin [SmIg]) by a biologic review committee according to flow cytometric analyses that were centralized at the National Institute of Cancer Laboratory in Genoa. Both traditional and newer prognostic parameters were assessed at the time of database registration. Evaluations of newer prognostic factors, including immunoglobulin heavy-chain variable gene (IgHV) mutation status, CD38 (glycoprotein) expression, and ζ-chain–associated protein kinase 70 (ZAP-70) expression, by flow cytometry and Western blot analysis were centralized at the National Institute of Cancer Laboratory in Genoa; and the study of chromosome abnormalities by fluorescence in situ hybridization (FISH) analysis was centralized at the Research Center for the Study of Leukemia, Institute for Cancer Research and Treatment Foundation at the University of Milan. Information regarding traditional prognostic factors and clinical and laboratory variables was obtained at the local hematology centers and included sex, age, Rai stage, the number of involved lymph node sites (cervical, axillary, and inguinal) and size (ie, greatest dimension in cm), measurement of liver and spleen size, absolute lymphocyte count, hemoglobin level, platelet count, β2-microglobulin, and lactate dehydrogenase (LDH).
The expression of CD38 was analyzed by 3-color immunofluorescence, and ZAP-70 was detected according to previously reported methods.7 The cutoff level of 30% positive cells was chosen to discriminate CD38-negative CLL from CD38-positive CLL.6 Cutoff levels of 20% or 30% were used to distinguish ZAP-70-negative CLL from ZAP-70-positive CLL, depending on the anti–ZAP-70 antibody used and the laboratory's standardization of ZAP-70 flow cytometric protocols.6 IgHV gene presence and mutation status were determined using combinational DNA according to previously published methods.7-9
Interphase FISH analyses were carried out for the detection of trisomy 12 and chromosome deletion at 17p13.1, 11q22.3, and 13q14.3 loci. Dual-color hybridizations, using appropriate centromeric-specific probes and unique, sequence-specific probes for the tumor protein 53 (TP53) (locus-specific identifier probe [LSI] P53) and ataxia telangiectasia mutated (ATM) (LSI ATM) loci, were performed for the 17p13.1 and 11q22.3 deletions, respectively.10 To detect the 13q14.3 deletion, the LSI D13S25 was cohybridized with the 13q34 telomeric probe as an internal control for nullisomy. A chromosome 12-specific α-satellite probe was used to identify trisomy 12. All probes were purchased from Vysis Inc. (Downers Grove, IL), and FISH procedures were performed according to the manufacturer's specifications. For each hybridization, we assessed a minimum of 200 interphase nuclei.8 Patients were categorized into high-risk (17p13.1 and 11q22.3 deletions), intermediate-risk (trisomy 12), and low-risk (13q14.3 deletion and normal) groups for subsequent analysis.
Indication for Therapy
All patients underwent sequential monitoring and were managed according to a “watch-and-wait” policy. The frequency of follow-up visits was individualized according to patient risk and ranged between 3 months and 6 months (median, 3 months). All physicians who registered patients into this observational database stated that they had used the National Cancer Institute-sponsored Working Group guidelines as reference criteria for starting therapy.6, 11 The absolute lymphocyte count was not used as the sole indicator for treatment. Active disease that required therapy was defined on the basis of at least 1 of the criteria set out in the National Cancer Institute-sponsored Working Group guidelines.6, 11
The Wierda et al Nomogram Score
The variables required for the Wierda et al3 nomogram were the number of lymph node sites involved, the size of cervical lymph nodes, LDH level, IgHV mutational status, and the presence of 11q or 17p deletion established by FISH analysis. A visual representation of the relative contribution of each prognostic factor to the total point score is illustrated in Figure 1.
The formula for calculating the total point score for each patient is as follows: total point score = I(no. of lymph node sites involved = 3) × 7.370 + I(FISH = 11q del) × 9.312 + I(FISH = 17p del) × 11.285 + (greatest dimension of largest cervical lymph node in cm) × 4.172 + (LDH/100) × I(IgHV gene = mutated) × 5.000 + (LDH/100) × I(IgHV gene = unmutated) × 1.065 + 35.467. The indicator function (I) is equal to 1 if the statement in the parentheses is true and, otherwise, is equal to zero.
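The point-score formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, argument names, and FISH labels are hypothetical, and the coefficients and the placement of the constant term are taken literally from the formula as printed here, so they should be checked against the original MDACC publication before any use.

```python
def wierda_total_points(n_node_sites, fish, cervical_node_cm, ldh, ighv_mutated):
    """Total point score, following the formula as printed in the text.

    n_node_sites     -- number of involved lymph node sites (0 to 3)
    fish             -- "11q del", "17p del", or any other label (no points)
    cervical_node_cm -- greatest dimension of the largest cervical node, in cm
    ldh              -- lactate dehydrogenase level (units of the original model)
    ighv_mutated     -- True if the IgHV gene is mutated, False if unmutated
    """
    points = 0.0
    # Indicator terms: each contributes only when its condition is true.
    points += 7.370 if n_node_sites == 3 else 0.0
    points += 9.312 if fish == "11q del" else 0.0
    points += 11.285 if fish == "17p del" else 0.0
    # Continuous terms.
    points += cervical_node_cm * 4.172
    # The LDH slope depends on IgHV mutation status.
    ldh_slope = 5.000 if ighv_mutated else 1.065
    points += (ldh / 100.0) * ldh_slope
    # Constant term, placed as printed (assumption: applies to every patient).
    points += 35.467
    return points
```

For a hypothetical patient with 3 involved sites, no 11q or 17p deletion, a 1.0-cm cervical node, an LDH of 200, and mutated IgHV, the function adds 7.370 + 4.172 + 10.000 + 35.467.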
The primary endpoint, time to first treatment, was defined as the interval between the date of database registration and date of first CLL treatment. Patients who did not receive any treatment were censored at their last confirmed treatment-free follow-up date. The Kaplan-Meier method was used to estimate distribution of the time to first treatment, and the log-rank test was used to compare patient subgroups. Likelihood-ratio tests were used to test the effects of individual factors, either in univariate analysis or jointly. Hazard ratios (HRs) and confidence intervals (CIs) for HRs were calculated from the Cox models.
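As an illustration of how censored follow-up enters the Kaplan-Meier estimate of treatment-free probability, a minimal pure-Python sketch follows. The function name and the example data are hypothetical; the study itself would have used a statistical package, which is not specified in the text.

```python
def kaplan_meier(times, events):
    """Return [(time, treatment_free_probability)] at each distinct event time.

    times  -- follow-up from registration (eg, in months)
    events -- 1 if first treatment started at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_total = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_total += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the fraction not treated at t.
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        # Treated and censored patients both leave the risk set after t.
        at_risk -= n_total
    return curve
```

Censored patients contribute to the risk set up to their last treatment-free follow-up date and are then removed without lowering the curve, which matches the censoring rule described above.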
Nomogram validation consisted of discrimination and calibration. Discrimination, which refers to the nomogram model's ability to correctly distinguish 2 classes of outcome, was quantified by means of the concordance index (the Harrell C-statistic).12 We also used the area under the receiver operating characteristic curve (AUC) to measure model discrimination. The AUC can range from 0.5 (which indicates a test with no information) to 1.0 (which indicates a perfect test).
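To make the concordance index concrete, the sketch below computes a Harrell-type C-statistic for a risk score against censored time-to-treatment data. It is a simplified illustration, not the authors' implementation: the function name is hypothetical, ties in follow-up time are ignored, and tied scores are counted as half-concordant.

```python
def harrell_c(scores, times, events):
    """Fraction of usable pairs in which the higher score belongs to the
    patient who was treated earlier (tied scores count as 0.5)."""
    concordant = 0.0
    usable = 0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i was treated (event = 1)
            # strictly before patient j's treatment or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / usable if usable else float("nan")
```

A value of 0.5 corresponds to a score that orders patients no better than chance, and 1.0 corresponds to a score that orders every usable pair correctly.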
We also assessed nomogram calibration, which compares the predicted and actual time to first treatment. For this purpose, a Poisson log-linear model was used.
In total, 328 patients were included in this analysis. The median age at diagnosis was 61 years (range, 33-70 years), and 58.2% were men. The majority of patients had Rai stage 0 disease (76.5%), and virtually all were local, nonreferred patients who were first diagnosed and then managed at different primary Italian hematology centers. In this patient cohort, 33.4% had unmutated IgHV status, and 9.7% had high-risk cytogenetic features (ie, 17p or 11q deletion) (Table 1).
Table 1. Patient Characteristics and Univariate Analysis of the Time to First Treatment
Patients were followed for a total of 2038 person-years (median, 30 months; range, 1-65 months), during which 68 patients (20.7%) required therapy. The probability of remaining free from therapy was 57% at 5 years, and no plateau was reached (Fig. 2). When the analysis was restricted to patients who received therapy, the median time to treatment was 24 months (range, 2-56 months).
Univariate and Multivariate Analysis
Univariate analysis of the time to first treatment included a series of traditional and novel variables. Traditional clinical and laboratory parameters that were associated with a shorter time to first treatment were a higher absolute lymphocyte count (P < .0001), increased β2-microglobulin (P < .0001), advanced Rai substage (P < .0001), and multiple lymph node involvement (P < .0001). With regard to novel prognostic variables, we observed that the presence of 11q or 17p deletion by FISH analysis identified a patient subset at higher risk of requiring early treatment (P = .002) (Fig. 3).13 Also, patients with unmutated IgHV status (P < .0001) or CD38 expression (P < .0001) had a shorter time to first treatment. Regardless of the method used for ZAP-70 assessment (ie, flow cytometry or Western blot analysis), a close correlation with the time to first treatment was observed (P < .0001 for both). However, ZAP-70 expression was not included in the multivariable analysis because of a lack of standardized testing in the community. Among the variables that were significant in univariate analysis, only Rai substage (P = .01), absolute lymphocyte count (P < .0001), IgHV mutation status (P < .0001), and β2-microglobulin (P < .05) retained independent prognostic significance for the time to first treatment.
Validation of the Wierda et al Nomogram
In each of the 328 patients with Binet stage A disease, we calculated the total point score using the formula proposed by Wierda et al.3 Total point scores ranged from 0 to 69.5 (median, 19.9) and were similar to the total point scores reported by the MDACC group (median, 21.0; range, 0-87.4).3
We wondered whether the Wierda et al nomogram3 could predict the time to first therapy in our patient cohort. The total point score, as a continuous variable (ie, measuring the risk of each point score increase), was associated with the time to first treatment (HR, 1.04; 95% CI, 1.02-1.05; P < .0001). In addition, we conducted a C-statistic analysis, which is considered a measure of concordance between observed and predicted time-dependent events. The results from that analysis clearly demonstrated that the total point score correctly predicted the time to first treatment (C = 0.62; 95% CI, 0.55-0.70).
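Because the score was entered as a continuous covariate in a Cox model, the per-point hazard ratio compounds multiplicatively over larger score differences. The short sketch below illustrates this under the usual assumption of a log-linear effect; the function name is hypothetical, and the result inherits the rounding of the reported HR of 1.04.

```python
def hazard_ratio_for_increase(hr_per_point, points):
    """Hazard ratio for a score difference of `points`, assuming the
    log-hazard is linear in the score (standard Cox model form)."""
    return hr_per_point ** points


# Example: a 10-point higher total score with HR 1.04 per point.
hr_10 = hazard_ratio_for_increase(1.04, 10)  # roughly 1.48
```

Under this assumption, two patients whose scores differ by 10 points would differ in their hazard of requiring treatment by a factor of about 1.48.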
Next, we investigated the threshold value that best differentiated patients who required treatment from patients who had stable disease. A receiver operating characteristic analysis performed for this purpose identified a point score of 25 (AUC, 0.64; sensitivity, 61.5; specificity, 72.1; P < .0001) (Fig. 4) as the best threshold capable of discriminating patients who needed therapy from patients who did not. A graphic representation illustrating how this threshold works in predicting the time to first treatment is presented in Figure 5. The likelihood of treatment for patients with total point scores ≥25 was substantially greater than that of patients with total point scores <25 (HR, 3.27; 95% CI, 2.07-5.18; P < .0001), as indicated in Figure 5. This threshold allowed an accurate prediction of the time to first treatment (C = 0.65; 95% CI, 0.57-0.71); and, during the first 5 years of follow-up, the estimated rate of progression to an active phase of the disease requiring treatment was approximately 1.8% (95% CI, 1.6%-1.9%) per year among patients who had total point scores <25, whereas the estimate increased to 2.5% (95% CI, 2.2%-2.8%) per year for patients who had total point scores ≥25.
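The following sketch shows one common way to pick a cutoff from a receiver operating characteristic analysis. It is an illustration only: the text does not state which criterion defined the "best" threshold, so maximizing Youden's J (sensitivity + specificity - 1) is an assumption, the function names and data are hypothetical, and treatment status is treated as a simple binary outcome that ignores follow-up time.

```python
def sens_spec(scores, treated, threshold):
    """Sensitivity and specificity of the rule 'score >= threshold'.
    Requires at least one treated and one untreated patient."""
    pairs = list(zip(scores, treated))
    tp = sum(1 for s, t in pairs if t and s >= threshold)
    fn = sum(1 for s, t in pairs if t and s < threshold)
    tn = sum(1 for s, t in pairs if not t and s < threshold)
    fp = sum(1 for s, t in pairs if not t and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(scores, treated):
    """Observed score that maximizes Youden's J (assumed criterion)."""
    best_cut, best_j = None, None
    for cut in sorted(set(scores)):
        sens, spec = sens_spec(scores, treated, cut)
        j = sens + spec - 1.0
        if best_j is None or j > best_j:
            best_cut, best_j = cut, j
    return best_cut
```

Each candidate cutoff trades sensitivity against specificity, and the criterion used to balance the two determines which point on the curve is reported as the threshold.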
Calibration of the Wierda et al3 nomogram was assessed by comparing the nomogram-predicted time to first therapy with the actual time to first therapy in patient subgroups divided by decades according to their total point score. This was verified by a linear analysis, and the goodness-of-fit test indicated that the nomogram had a significantly good fit at 2 years (correlation coefficient [r2] = 0.966; P = .002) (Fig. 6), which means that the model was accurate in predicting the individual 2-year time to treatment in the external validation set. We avoided evaluating the fit of the model at 4 years, because only 44 patients had a follow-up >48 months.
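A calibration check of this kind compares predicted and observed values across score groups. The sketch below computes the squared Pearson correlation (r2) between two such lists; it is a simplified stand-in, assuming group-level predicted and observed 2-year values are already available, and it does not reproduce the Poisson log-linear model used in the study. The function name and inputs are hypothetical.

```python
def r_squared(predicted, observed):
    """Squared Pearson correlation between predicted and observed values.
    Requires equal-length inputs with nonzero variance in both lists."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    syy = sum((o - mean_o) ** 2 for o in observed)
    return (sxy * sxy) / (sxx * syy)
```

An r2 close to 1 indicates that the observed group-level outcomes rise and fall in step with the predictions, although it does not by itself show that the predicted and observed values are numerically equal.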
Because Rai stage 0 accounted for 76.5% of patients in our cohort with early stage CLL, we wondered whether the Wierda et al3 nomogram also would retain its discriminant power in this patient subset. When 251 patients with Rai stage 0 CLL in the current series were classified according to the previously identified threshold (ie, a point score of 25), we observed that 66.9% fell into the low-risk category, and 33.1% fell into the high-risk category. Differences in the time to first treatment were observed between patients with Rai stage 0 disease based on whether they were classified as low risk or high risk according to the nomogram score (HR, 4.05; 95% CI, 2.25-7.52; P < .0001) (Fig. 7), indicating that the Wierda et al3 nomogram accounts, at least in part, for some of the heterogeneity in clinical outcomes observed within clinical stage categories. Finally, the total point score also permitted us to accurately predict the time to first treatment among patients with Rai stage 0 disease (C = 0.65; 95% CI, 0.56-0.74).
Wierda et al Score and Patients Requiring Therapy
The interaction between the Wierda et al3 nomogram categories and the time to first treatment was assessed in patients who had progressive disease that required therapy. Theoretically, we expected that the time to first treatment would be longer for patients in the low-risk category and shorter for patients in the high-risk category. In our experience, the median time to first treatment for 68 patients who required therapy, when stratified according to the Wierda et al3 nomogram into 2 groups with total point scores <25 (n = 30) and ≥25 (n = 38), was virtually the same (median time to first treatment, 24 months for both; HR, 1.19; 95% CI, 0.74-1.99; P = .43). This observation suggests that, in patients with early stage CLL who eventually require therapy, the Wierda et al nomogram3 does not help to identify patient subsets that have different patterns of progression.
Previous published reports evaluated the impact of either clinical or biologic prognostic factors on the clinical outcome of patients with CLL.14-22 Further work was performed to combine clinical variables in a prognostic index that was able to predict overall survival more accurately than clinical stage alone.23 The model was subsequently validated by different groups,5, 24 and a revised version of this score was useful to predict the time to first treatment in a previously published series5 and in the current patient cohort. Indeed, the 2-year probability of remaining free from therapy was 93%, 85%, and 63%, in patients who scored 0 to 2, 3 or 4, and 5 to 7, respectively (chi-square test for trend, 9.94; 1 degree of freedom; P = .001). However, currently, the inclusion of traditional and newer prognostic factors in the same model is a novelty.3 Remarkably, the time to first treatment is a more suitable clinical endpoint than overall survival for patients with early CLL, because it does not reflect competing risks between successive relapses, histologic transformation, deaths in remission, or the impact of new therapies.5
The current results obtained in an independent patient cohort confirm the ability of the model recently developed by Wierda et al3 to predict the time to first treatment among patients with previously untreated CLL who have early disease. The challenges for this type of analysis include subjectivity in computing some of the parameters evaluated, like assessing bulk of disease by measuring lymph nodes and organomegaly by physical examination. It should be pointed out that palpation and measurement of lymph node, liver, and spleen size are somewhat subjective and approximate and may limit the precision of prognostic tools. In addition, the nomogram is a graphic representation, which potentially is less accurate than using a tool that is purely calculated. When Wierda et al3 evaluated the parameters, they included either Rai stage in the model or hemoglobin level and platelet count, because there are significant interactions between these parameters. Our analyses, which extend the utility of the MDACC score to a homogeneous subset of patients with Rai stage 0 disease, support the value of the model above the possible interactions between different parameters.
Although our findings indicate that this prognostic index is a significant advance in predicting the time to first treatment in patients with early CLL, several qualifications are required. For instance, in multivariate analysis of the time to first treatment, we observed that only 4 covariates had independent prognostic value (ie, Rai stage, absolute lymphocyte count, IgHV mutation status, and β2-microglobulin). In contrast, the MDACC study identified 5 covariates with independent prognostic value, which were included in the nomogram: the size of cervical lymph nodes, the number of lymph node areas, FISH abnormalities, LDH concentration, and IgHV mutation status.3 These differences reflect the different distribution of either traditional or newer prognostic variables among patients registered in our observational database compared with patients in the MDACC database. Indeed, only 36% of patients evaluated at the MDACC had Rai stage 0 CLL; whereas, in our prospective multicenter patient cohort, Rai stage 0 accounted for 76.5% of all cases. Differences also were observed when the distribution of patients displaying mutated IgHV status (GISL cohort, 66.6%; MDACC cohort, 55%) or low-risk genetic features defined on the basis of the presence of 13q deletion or normal cytogenetics (GISL cohort, 79.2%; MDACC cohort, 69%) was compared between the 2 series.
The current analysis has some weaknesses. First, patients enrolled in our prospective database were selected intentionally on the basis of an age threshold of ≤70 years, because we wanted to specifically address the issue of prognosis in younger patients who were diagnosed with early stage CLL. Few studies have evaluated interactions between age and the utility of prognostic testing, because it is not completely clear whether age may determine differences in the natural history of CLL.23-25 Our study reinforces the need to generate prognostic models that include mostly younger CLL patients. Conversely, recent findings suggest that prognostic assessment has little value in patients aged ≥75 years.25 However, given the availability of newer and less toxic therapies, even for older patients,26 and because the median age of patients with CLL is 72 years, it also would be relevant to validate the Wierda et al3 model in a patient cohort that includes all age groups of patients with CLL. Second, the current validation analysis was performed on a relatively small patient series. Indeed, our cohort, which included 328 patients with CLL, was approximately half the size of the cohort analyzed by MDACC investigators, who evaluated 687 patients to generate the model.3 Although this is a limiting factor, some aspects of our patient cohort, such as the homogeneous disease stage of patients, the prospective nature of enrollment, and the community-based setting, support our current results. Finally, we need to emphasize that this represents the first reliable attempt to validate the MDACC model, because Wierda et al,3 in their original study, did not identify a patient test group to internally validate the nomogram.
Our study has several important strengths. All patients had early stage disease at the time of entry into the study and, thus, represent the group of patients for whom prognostic instruments are most needed. Accordingly, validation of the nomogram proposed by MDACC3 meets the need to separate Binet stage A patients into different prognostic groups to devise individualized and tailored follow-up policies during the treatment-free period. In addition, in Italy, patients undergoing clinical evaluation for absolute lymphocytosis commonly are referred to primary hematology centers. For that reason, clinical studies that include these patients allow for a horizontal, long-term, observational follow-up from early diagnosis representative of the natural course of CLL.5 The same does not always apply for studies dealing with patients who are followed at academic referral centers in the United States in view of the influence of both lead-time and time-length bias.23, 24
In recent years, statistical prediction models have been developed across the majority of cancer types.27-29 One such predictive tool is the nomogram, which creates a simple representation of a statistical predictive model that generates a numerical probability of a clinical event. The external validation of the MDACC nomogram we carried out in an independent, community-based cohort of patients with CLL supports the accuracy of the method used to generate the model. Furthermore, the actual and predicted time-to-first-treatment outcomes revealed good agreement, suggesting that time-to-first-treatment predictions from the nomogram are well calibrated. It should be pointed out, however, that the particularly good fit of the model at 2 years may be caused, at least in part, by the relatively short median follow-up of our patient cohort (ie, 30 months).
In conclusion, the nomogram proposed by Wierda et al3 can be considered a trade-off between a complex mathematical formula and a simple numerical estimate. External validation of this nomogram may contribute toward expanding its diffusion among clinicians. However, we expect that the availability of a web-based version may facilitate its use in clinical practice and among patients who are seeking reliable information about the clinical outcome of CLL, especially when it is diagnosed at an early phase of the disease.
This work was supported by grants from Associazione Italiana Ricerca sul Cancro (AIRC) (to Antonino Neri -IG4569, MF-IG10492 and Fortunato Morabito -RG6432) and AIRC-Special Program Molecular Clinical Oncology-“5 per mille”, grant 9980, 2010-15 to Antonino Neri, Manlio Ferrarini and Fortunato Morabito; Ricerca Finalizzata from Italian Ministry of Health 2006 (to Giovanna Cutrona, Fortunato Morabito and Manlio Ferrarini) and 2007 (to Giovanna Cutrona).