A new scoring system for predicting survival in patients with non-small cell lung cancer

This analysis was performed to create a scoring system to estimate the survival of patients with non-small cell lung cancer (NSCLC). Data from 1274 NSCLC patients were analyzed to create and validate a scoring system. Univariate (UV) and multivariate (MV) Cox models were used to evaluate the prognostic importance of each baseline factor. Prognostic factors that were significant on both UV and MV analyses were used to develop the score. These included quality of life, age, performance status, primary tumor diameter, nodal status, distant metastases, and smoking cessation. The score for each factor was determined by dividing the 5-year survival rate (%) by 10, and these scores were summed to form a total score. MV models and the score were validated using bootstrapping with 1000 iterations from the original samples. The score for each prognostic factor ranged from 1 to 7 points, with higher scores reflecting better survival. Total scores (sum of the scores from each independent prognostic factor) of 32–37 correlated with a 5-year survival of 8.3% (95% CI = 0–17.1%), 38–43 correlated with a 5-year survival of 20% (95% CI = 13–27%), 44–47 correlated with a 5-year survival of 48.3% (95% CI = 41.5–55.2%), 48–49 correlated with a 5-year survival of 72.1% (95% CI = 65.6–78.6%), and 50–52 correlated with a 5-year survival of 84.7% (95% CI = 79.6–89.8%). The bootstrap method confirmed the reliability of the score. Prognostic factors significantly associated with survival on both UV and MV analyses were used to construct a valid scoring system that can be used to predict survival of NSCLC patients. Optimally, this score could be used when counseling patients and designing future trials.


Introduction
In 2013, lung cancer caused an estimated 159,480 deaths in the US [1]. Approximately 85% of lung cancer patients were diagnosed with non-small cell lung cancer (NSCLC) with the majority of patients presenting with advanced disease [2]. Despite gradual improvements in prognosis over time, the majority of the estimated 228,190 Americans diagnosed in 2013 with lung cancer will succumb to it. More research in the prevention, screening, and treatment of lung cancer is required to alter this dismal situation. When writing trials for lung cancer patients, it is important to have a clear understanding of the effects of pretreatment prognostic factors on outcome. This is critical to proper trial design where one optimally stratifies patients for these factors evenly between the treatment arms. This is done to prevent the introduction of uncontrolled biases that can confound the results leading to incorrect conclusions. A valid scoring system could be used to potentially improve the quality of trials performed by allowing better balance of prognostic factors between the treatment arms and the selection of high-risk patients for specific trials. Additionally, the clear understanding of prognosis can help physicians counsel patients about outcome and choose appropriate treatment for individual patients.
In this study, we evaluated the outcome of a large patient cohort to identify their pretreatment prognostic factors and created a scoring system that can stratify patients into groups with distinctly different outcomes. We also carried out validation testing of this scoring system.

Materials and Methods
This retrospective analysis used a total of 1274 patients with NSCLC, selected from more than 10,000 patients enrolled in the Mayo Clinic Epidemiology and Genetics of Lung Cancer Research Program, to generate this scoring system. These patients were registered between 1 March 1997 and 29 April 2008 and were selected because they had complete data available regarding the prognostic factors used for this analysis. Details of the research program and the approach used for identifying and observing patients have been previously presented [3,4]. In this study, we aimed to produce a valid scoring system that could be used to segregate NSCLC patients into groups with differing survival. Baseline factors examined included overall quality of life (QOL), age, treatment, sex, tumor diameter (cm), regional nodal involvement, distant metastasis, Eastern Cooperative Oncology Group (ECOG) performance score, presence of other malignancy, smoking category and status at diagnosis, years since quitting smoking, and pack-years of smoking. These factors were identified as potential prognostic factors associated with survival in the previous study [2]. Weight loss of ≥5% in the past 6 months was also included as this is an established prognostic factor in NSCLC [5]. Patients with distant metastases included 16 patients with metastases within the other lung (M1a), seven patients with pleural nodules (M1a), four patients with pleural effusion (M1a), one patient with pericardial effusion (M1a), and 53 patients with distant metastases in extra-thoracic organs (M1b).
Stage was specifically not used as it is changed every few years and would negate the value of this score when the staging system is redefined. QOL was assessed with a single item from the Lung Cancer Symptom Scale. The overall QOL item was used by Sloan et al. [4] and in this study. Overall QOL was considered as a single continuous variable, taking integer values from 0 to 100 (ranging from "as bad as it can be" to "as good as it can be"). The patients judged their own QOL and answered this single question on a sliding scale. A score of 50 or lower was considered indicative of a deficit in QOL and was related to patient survival.
The Cox proportional hazards model was used to assess the prognostic significance of baseline factors in UV and MV analyses [6]. Those independent prognostic factors significant in both analyses were used to develop the scoring system. The 5-year overall survival (OS) rate (as a percentage) was first calculated for each level of the significant prognostic factors. The 5-year OS rate for each level was divided by 10 to obtain the corresponding score (as a whole number). For example, if patients with ECOG performance status of 0-1 had a 5-year OS rate of 62%, the corresponding score for performance status was calculated by dividing 62 by 10, resulting in a score of 6. In contrast, if patients with performance status of 2-4 had a 5-year OS rate of 24%, the corresponding score was 24/10, or 2. The sum of scores from all significant independent prognostic factors was calculated to form a total score for each patient. The median survival and 5-year OS rates for patients grouped within various ranges of total scores were calculated using Kaplan-Meier survival estimates. Categorization of the score was delineated first by clinician expert opinion and then by multiple statistically defined empirical cut points.
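The score arithmetic described above can be illustrated with a minimal sketch. The function names are hypothetical, and the conversion to a whole number is assumed here to be truncation; the worked examples in the text (62% giving 6, 24% giving 2) are consistent with either truncation or rounding, so this choice is an assumption rather than the authors' stated rule.

```python
from typing import Iterable


def factor_score(five_year_os_pct: float) -> int:
    """Score for one factor level: 5-year OS (%) divided by 10, as a whole number.

    Assumption: the whole number is obtained by truncation. The examples in
    the text (62 -> 6, 24 -> 2) do not distinguish truncation from rounding.
    """
    if not 0 <= five_year_os_pct <= 100:
        raise ValueError("5-year OS must be a percentage between 0 and 100")
    return int(five_year_os_pct // 10)


def total_score(five_year_os_pcts: Iterable[float]) -> int:
    """Sum the per-factor scores for the factor levels that apply to one patient."""
    return sum(factor_score(pct) for pct in five_year_os_pcts)


# The two worked examples from the text.
print(factor_score(62))  # ECOG performance status 0-1
print(factor_score(24))  # ECOG performance status 2-4
```

In practice, `total_score` would receive one 5-year OS rate per significant prognostic factor, namely the rate for the level of that factor into which the patient falls.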
Bootstrapping was employed to assess the relative robustness of the model and provide preliminary evidence of validity [6]. Multivariate Cox proportional hazards models were bootstrapped: we drew a random sample with replacement, of the same size as the original sample, and fitted an MV model to it using stepwise selection [7]. We created 1000 bootstrap samples and obtained 1000 estimates of the MV model. We then summarized the percentage of bootstrap samples in which each variable was selected in the bootstrapped model. A similar approach was also used to validate the score for each level of prognostic factors, where Kaplan-Meier survival estimates were used to calculate the 5-year survival rate, and the basic statistics from 1000 bootstrap samples were summarized. Survival rates observed were accurate to within 2% with 95% confidence.
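The score-validation step (bootstrapping the Kaplan-Meier 5-year survival rate) can be sketched as follows. This is an illustration only: the data layout, function names, and fixed seed are assumptions, the toy follow-up data are invented for demonstration, and the stepwise Cox model selection that was bootstrapped in the MV validation is not reproduced here.

```python
import random
import statistics
from typing import List, Sequence, Tuple

# One observation: (follow-up time in years, True if death was observed).
Obs = Tuple[float, bool]


def km_survival(data: Sequence[Obs], t: float) -> float:
    """Kaplan-Meier estimate of the probability of surviving beyond time t."""
    surv = 1.0
    at_risk = len(data)
    for time in sorted({obs[0] for obs in data}):
        if time > t:
            break
        deaths = sum(1 for obs in data if obs[0] == time and obs[1])
        leaving = sum(1 for obs in data if obs[0] == time)
        if deaths:
            surv *= 1 - deaths / at_risk
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return surv


def bootstrap_km(data: Sequence[Obs], t: float = 5.0,
                 n_boot: int = 1000, seed: int = 0) -> List[float]:
    """Resample with replacement (same size as the original) and re-estimate S(t)."""
    rng = random.Random(seed)
    n = len(data)
    return [km_survival(rng.choices(data, k=n), t) for _ in range(n_boot)]


# Toy data, invented for illustration only.
toy = [(1.0, True), (2.0, False), (3.0, True), (6.0, False)]
estimates = bootstrap_km(toy)
print(km_survival(toy, 5.0), statistics.mean(estimates), statistics.median(estimates))
```

Comparing the mean and median of the bootstrap estimates with the estimate from the original sample mirrors the comparison summarized for each factor level in the score validation.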

Results
The most common patient group represented was white married men who were former smokers with good performance status, and early disease stage that was resected [2]. Patient demographics are presented in Table 1.
In the UV analysis, age, tumor diameter, regional nodal involvement, distant metastasis, overall QOL, treatment, sex, ECOG performance score, smoking cessation, and pack-years smoked were significant prognostic factors of survival (Table 2). All factors significant on UV analysis were included in MV analysis, except treatment and pack-years of smoking. Treatment was not included as the goal was to develop a pretreatment score. The number of pack-years was excluded because it is a collinear confounding factor with smoking cessation.
The MV analysis revealed that all these factors were significant predictors of survival. Patients reporting a QOL deficit had significantly worse survival rates even after adjusting for other known prognostic variables (P < 0.0001, HR = 1.84 with a 95% CI 1.44-2.35). See Table 3 for MV Cox proportional hazard model results. The 5-year OS was reduced by greater than one half for patients reporting QOL deficits (29.9% vs. 62.8%); ECOG performance status of >1 (24.3% vs. 61.8%) and continued smoking (28.2% vs. 58.6%).
The score was calculated for each prognostic factor by dividing the 5-year survival rate in percent by 10. Individual scores ranged from 1 to 7 points. Higher 5-year survival rates corresponded to higher scores (Table 4). The total scores were calculated for each patient based on the sum of the scores for each prognostic factor and ranged from 32 to 52 points. Kaplan-Meier survival estimates by total score are shown in Table 5. Figure 1 shows the median survival for each corresponding total score. Figure 2 shows the total score and the corresponding 5-year survival rates. The 5-year OS rates by different total scores are categorized in Table 6. Within category 4, patients with a low total score of 32 to 37 had a significantly worse OS (P < 0.0001, HR = 29.06 with a 95% CI 18.49-45.66) compared to patients with a high total score (50-52). All categorization schemes demonstrated successful prognostic power (Table 6). Category 4 divided patients into groups with total scores of 32-37, 38-43, 44-47, 48-49, and 50-52 with 5-year OS rates of 8%, 20%, 48%, 72%, and 85%, respectively (P < 0.0001).
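The category 4 grouping reported above can be expressed as a simple lookup. This is a sketch for illustration: the table and function names are hypothetical, and the values are the rounded 5-year OS rates quoted in the text rather than a substitute for Table 6.

```python
from typing import List, Tuple

# (lowest total score, highest total score, 5-year OS in %) for category 4,
# using the rounded rates reported in the text.
CATEGORY_4_GROUPS: List[Tuple[int, int, int]] = [
    (32, 37, 8),
    (38, 43, 20),
    (44, 47, 48),
    (48, 49, 72),
    (50, 52, 85),
]


def five_year_os_for_total(total: int) -> int:
    """Return the reported 5-year OS (%) for the group containing a total score."""
    for low, high, os_pct in CATEGORY_4_GROUPS:
        if low <= total <= high:
            return os_pct
    raise ValueError("total score is outside the observed range of 32 to 52")


print(five_year_os_for_total(45))
```

Totals outside 32 to 52 raise an error because no patient in the cohort had such a score, so no survival estimate is available for them.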
Sensitivity analyses using the bootstrap approach provided results that were similar to the original analyses. In the MV model validation, the percentage of bootstrap samples in which each variable was included in the bootstrapped model was 100% for overall QOL, 100% for age, 100% for ECOG performance status, 100% for regional nodal involvement, 100% for distant metastasis, 97% for smoking cessation, 95% for tumor size, and 78% for sex. In score validation, the median and mean survival rates at 5 years from bootstrapped samples differed by only 0.1% to 3.2% from the 5-year survival rates in the original samples (Table 7).

Discussion
Lung cancer is a significant health care problem as the leading cause of cancer deaths [1]. A clear understanding of the various prognostic factors is important for a number of reasons. Physicians can use this information to give patients and their families realistic impressions of survival. Also, the ability to predict survival can help tailor therapy to individual patients. Proper trial design requires a clear understanding of critical prognostic factors. This is important as imbalances in the distribution of pretreatment prognostic factors can influence survival as much as treatment. Thus, imbalances in the distribution of various prognostic factors between treatment groups can bias the outcome and lead to incorrect conclusions. This can create situations where effective therapies appear useless and ineffective therapies appear useful. One important use of this scoring system is therefore the proper stratification of patients in future trials. This study was undertaken to use many significant prognostic factors to create a scoring system that can better predict survival than was previously possible for NSCLC patients.
This score can also be used to define eligibility criteria in trials designed for specific patient populations. For example, the criteria for defining high-risk populations in lung cancer generally only rely on stage, weight loss, and performance status [8]. This analysis allows investigators to use more prognostic factors and understand the influence of them individually and collaboratively on patient survival.
Many investigators have evaluated prognostic factors in patients with nonmetastatic (M0) NSCLC. Jeremic et al. [5] identified female sex, performance status, weight loss, stage, histology, inter-fraction interval, and treatment as prognostic factors in stage III NSCLC. Mosvas identified QOL as the sole independent prognostic factor in stage III NSCLC patients [9]. Additionally, other investigators have identified stage, radiotherapy technique, hoarseness, malaise, erythropoietin, and estrogen receptors in tumor cells as prognostic factors in patients without distant metastases [10][11][12]. The present study identified age, diameter of the primary tumor, regional nodal involvement, distant metastases, overall QOL, treatment, ECOG performance score and smoking cessation as independent prognostic factors for survival.
Wigren developed a prognostic index based on a patient cohort with inoperable stages I-IIIb NSCLC. The five factors identified included disease extent and clinical symptom score. These key prognostic variables of the index had equal impact on survival. Thus, based only on the number of adverse factors, each patient falls into one of the six possible prognostic groups. All five factors were significantly predictive of survival and the inclusion of the other known prognostic variables in the MV analyses did not result in any further improvement. Patients with three or more risk factors had a 2-year survival rate of less than 2%, whereas the 17 patients (8%) with no risk factors had a survival of 53%. Wigren concluded that this information could be used to guide management strategy, help to design new treatment strategies, and facilitate the comparison of different studies [13,14]. However, this prognostic index was based only on patients with inoperable stages I-IIIb NSCLC and is not applicable to other patient groups, unlike the scoring system developed in the present analysis. Hoang, Finklestein, Paesmans, and Albain examined patients with stage IV disease and found the following factors to be of prognostic importance: performance status, sex, weight loss, metastases to specific locations (skin, bone, liver), number of metastatic sites, advanced age, and certain laboratory findings (abnormal calcium, white blood counts, lactate dehydrogenase, and anemia) [15][16][17][18]. Mandrekar et al. [19] went further to develop a mathematical model to predict the survival of patients with stage IV NSCLC. This formula was based on various prognostic factors including performance status, body mass index (BMI), hemoglobin levels, and white blood count.
In a previous Mayo study, Sloan et al. [4] found survival was associated with QOL, performance status, age, smoking history, sex, treatment factors, and stage of disease in a large cohort of patients with all stages of disease. The emphasis of the Sloan et al. study was to define the importance of QOL as an independent prognostic factor in NSCLC. The prognostic factors identified in both of these Mayo studies were consistent with those previously reported in the literature. Additionally, the cohort identified by Sloan et al. was further updated and analyzed in this study to develop this Mayo Score for NSCLC, which could be used to predict 5-year survival based on an NSCLC patient's individual characteristics. While the prognostic factors identified in the current study have been previously reported, a scoring system for patients with all stages of NSCLC has not been reported or widely adopted. One weakness of this analysis is the retrospective methodology that may have introduced unforeseen biases. However, the bootstrapping analyses revealed high consistency, lending credence to the content validity of the scoring system. This study included a primarily white population who were robust enough to seek care at a large tertiary care facility, introducing potential bias. Another limitation of this study is that only 81 (6%) of the 1274 patients had metastatic disease, a proportion lower than that in the general population of US lung cancer patients [2]. The results for small subpopulations must be interpreted with care. For example, the confidence intervals estimated for very small subpopulations are statistically quite wide.
This study was undertaken to use independent pretreatment prognostic factors to create a single scoring system that can predict survival for all NSCLC patients. The score is based on data that is easily obtained during the evaluation of lung cancer patients. The only factor within this system that is not collected routinely during the evaluation of NSCLC patients is the QOL score that can be collected in a minute or so by having each patient judge the overall quality of their lives with a single 0-100 scale. This Mayo Score can provide accurate estimations of patient survival, aid in proper stratification in future trial design, help tailor therapy to individual patients, and identify patients for high-risk trials. Optimally, this scoring system should be further validated with other data sets to confirm its utility. Additionally, we expect this score will be refined over time as the molecular nature of NSCLC is more fully elucidated, better therapies are developed, and patient survival improves.