The authors investigated best response, complete response (CR), confirmed response, and progression-free survival (PFS) for their associations with overall survival (OS) and as possible surrogate endpoints for OS.
Individual patient data from 870 patients with untreated extensive stage small-cell lung cancer who participated in 6 single-arm trials (274 patients) and 3 randomized trials (596 patients) were pooled. Patient-level associations between putative surrogate endpoints and OS were assessed by Cox models using landmark analyses. Trial-level surrogacy of the putative surrogate endpoints was assessed by the association between treatment effects on OS and treatment effects on each putative surrogate endpoint. Trial-level surrogacy measures included the R2 from a weighted least squares regression model, the Spearman correlation coefficient, and the R2 from a bivariate survival model (Copula R2).
Median OS and PFS were 9.6 (95% confidence interval [CI], 9.1-10.0) and 5.5 (95% CI, 5.2-5.9) months, respectively; best response, CR, and confirmed response rates were 44%, 22%, and 34%, respectively. Patient-level associations showed that PFS status at 4 months was a strong predictor of subsequent survival (hazard ratio [HR], 0.42; 95% CI, 0.35-0.51; concordance index, 0.63; P < .01), with 6-month PFS being the strongest (HR, 0.41; 95% CI, 0.35-0.49; concordance index, 0.66; P < .01). At the trial level, PFS showed the highest level of surrogacy for OS (weighted least squares R2 = 0.79; Copula R2 = 0.80): treatment effects on PFS explained 79% of the variance in treatment effects on OS. Tumor response endpoints showed lower surrogacy levels (weighted least squares R2 ≤ 0.48).
In 2009, lung cancer was expected to cause 159,390 deaths within the United States.1 About 14% of lung cancer patients have small-cell lung cancer (SCLC).2 Patients with tumors that have metastasized beyond the ipsilateral supraclavicular lymph nodes have extensive stage disease.3 With currently available treatment, the median survival is around 9 to 11 months.4, 5
Overall survival (OS) is the gold standard endpoint for oncology clinical trials, including those in extensive stage SCLC, because 1) it is definitive with respect to the disease process; 2) it is well defined, leaving no room for subjectivity; and 3) most importantly, it reflects the ultimate goal of developing a new regimen. However, OS requires longer follow-up in extensive stage SCLC, and second-line therapies (eg, topotecan6) can make it difficult to ascertain the true survival effect of a drug given in the first-line setting. Therefore, it is important to investigate whether other clinical endpoints that are more direct indicators of a drug's effectiveness, such as tumor response or progression-free survival (PFS), can accurately and reliably predict OS and could potentially replace OS as the primary endpoint in phase 2 or 3 clinical trials. Because these endpoints are determined before second-line therapy begins, they are unaffected by subsequent therapy, and they can also be assessed earlier.
In the phase 2 setting in extensive stage SCLC, multiple alternatives to OS have been widely used as endpoints, although none has been formally validated as a surrogate endpoint for OS. The most commonly used primary endpoint in this setting has been tumor response. Tumor response has the advantage of being quick and easy to assess, but it has several limitations: 1) tumor shrinkage may not occur with targeted therapies, 2) it is a subjective measure of treatment efficacy,7 3) it may not be a good predictor of survival,8, 9 and 4) it excludes patients with stable disease who also achieve clinical benefit.10
PFS is another commonly used endpoint for assessing treatment efficacy in the phase 2 setting for many diseases. Because extensive stage SCLC has such a poor prognosis, it could be argued that the true endpoint of OS is primarily determined by whether the patient's disease has progressed. PFS counts patients who achieve stable disease for an extended period of time as successes, in addition to those who achieve a response. PFS, similar to response, provides a more direct assessment of whether the tested therapy is potentially worthy of further study in the phase 3 setting. PFS is typically defined as the time from study registration or randomization to the first of either disease progression or death from any cause. However, imbalance in tumor assessment dates across treatment arms, missing assessments, ascertainment bias in open-label trials, and progression occurring in the middle of a long tumor evaluation interval can all affect the accuracy and validity of PFS as an endpoint, and these issues need to be carefully considered.
In this initial surrogacy evaluation study, we formally investigated the relationships of PFS, best response, complete response (CR), and confirmed response with overall survival, using individual patient data from 9 phase 2-3 trials conducted by the North Central Cancer Treatment Group in patients receiving first-line therapy for extensive stage SCLC. Given that the median survival in the first-line setting is 9 to 11 months for patients with extensive stage SCLC, we believe it is important to identify a valid surrogate endpoint for OS that can be assessed earlier and used as the primary endpoint in phase 2 or 3 clinical trials.
MATERIALS AND METHODS
Individual patient data were pooled from 9 consecutive North Central Cancer Treatment Group first-line extensive stage SCLC therapy trials that included either a platinum- or taxol-based regimen and were opened between 1987 and 1999. Three trials were randomized: 2 were randomized phase 3 trials (862051, 892051), and 1 was a randomized phase 2 study (932053). The remaining 6 trials were single-arm phase 2 trials. Patients who received no study treatment or were ineligible for trial participation were excluded from these analyses, leaving a total of 870 eligible patients with extensive stage SCLC. All trials predated RECIST (Response Evaluation Criteria in Solid Tumors): a tumor response consisted of a complete response, a partial response, or a tumor regression, and tumor measurements were collected bidimensionally.11-15 Because the tumor measurement data were not available for analysis, RECIST criteria could not be applied. Institutional review boards at the study sites had previously approved these trials, and all participants provided written informed consent. See Tables 1 and 2 for a detailed listing of the individual trial and patient characteristics.
Table 1. Trial Characteristics in Extensive Stage Small-Cell Lung Cancer (N=870)
M indicates male; F, female; CI, confidence interval.
ECOG performance status, one line per table column (the last line is all patients combined):
PS 0, 24%; PS 1, 50%; PS 2, 26%
PS 0, 23%; PS 1, 45%; PS 2, 32%
PS 0, 17%; PS 1, 54%; PS 2, 29%
PS 0, 19%; PS 1, 57%; PS 2, 24%
PS 0, 35%; PS 1, 58%; PS 2, 8%
PS 0, 32%; PS 1, 42%; PS 2, 27%
PS 0, 19%; PS 1, 61%; PS 2, 20%
PS 0, 33%; PS 1, 49%; PS 2, 19%
PS 0, 34%; PS 1, 53%; PS 2, 13%
PS 0, 24%; PS 1, 52%; PS 2, 24%
Median age, y (range)
Sex, one line per table column (the last line is all patients combined):
M, 65%; F, 35%
M, 48%; F, 52%
M, 60%; F, 40%
M, 64%; F, 36%
M, 70%; F, 30%
M, 55%; F, 45%
M, 54%; F, 46%
M, 60%; F, 40%
M, 63%; F, 37%
M, 62%; F, 38%
Median number of treatment cycles (range)
Median overall survival (95% CI)
Median progression-free survival (95% CI)
Best response rate (95% CI)
Complete response rate (95% CI)
Confirmed response rate (95% CI)
The regimens in the trials were given on either a 3- or a 4-week cycle, with tumor assessments generally performed every cycle. See Table 1 for the mean and range of the actual number of assessments by trial at the 2-, 4-, and 6-month time points used in the landmark analyses. Given these assessment schedules, the putative surrogate endpoint values at 2, 4, and 6 months were expected to be based on 2, 4 or more, and 6 or more postbaseline assessments, respectively.
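The expected assessment counts above can be reproduced with simple arithmetic. The sketch below is illustrative only: it assumes one tumor assessment at the end of each completed cycle and an average month of 365.25/12 days, and `expected_assessments` is a hypothetical helper rather than part of the original analysis.

```python
import math

# Assumed average month length in days (not stated in the trial protocols).
DAYS_PER_MONTH = 365.25 / 12


def expected_assessments(months: float, cycle_weeks: int) -> int:
    """Completed cycles, and hence assessments, by a landmark time.

    Hypothetical helper: assumes exactly one tumor assessment at the end of
    every completed cycle of length `cycle_weeks`.
    """
    days = months * DAYS_PER_MONTH
    return math.floor(days / (cycle_weeks * 7))


for landmark in (2, 4, 6):
    counts = {weeks: expected_assessments(landmark, weeks) for weeks in (3, 4)}
    # The 4-week schedule gives the smallest count at each landmark.
    print(f"{landmark} months: {counts}, minimum {min(counts.values())}")
```

Under these assumptions, 4-week regimens yield 2, 4, and 6 assessments at the three landmarks and 3-week regimens yield 2, 5, and 8, which is consistent with the "2, 4 or more, and 6 or more" figures given in the text.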
This study assessed the association between putative surrogate endpoints and overall survival at both the patient and trial levels. The putative surrogate endpoints were PFS and the tumor response-based endpoints of best response, CR, and confirmed response. CR was defined as total disappearance of all tumor during treatment. Partial response was defined as at least a 50% reduction in the sum of the products of the 2 greatest perpendicular diameters of all indicator lesions. Best response was defined as any CR or partial response that occurred during treatment. Best response and CR did not require confirmation at a subsequent tumor evaluation. Confirmed response was defined as a CR or partial response on 2 consecutive evaluations at least 4 weeks apart. PFS was defined as the time from registration or randomization to the first of either disease progression or death from any cause. Finally, OS was defined as the time from registration or randomization to death from any cause.
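To illustrate these definitions, the sketch below derives each endpoint from a single patient's assessment history. The status codes ("CR", "PR", "SD", "PD"), the `Assessment` record, and the reading of "4 weeks" as 28 days are assumptions made for the example; they do not describe the trials' actual data format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

RESPONSE = {"CR", "PR"}  # hypothetical codes for complete and partial response


@dataclass
class Assessment:
    day: int     # days from registration or randomization
    status: str  # hypothetical codes: "CR", "PR", "SD" (stable disease), "PD" (progression)


def best_response(assessments: Sequence[Assessment]) -> bool:
    """Any CR or PR during treatment; no confirmation required."""
    return any(a.status in RESPONSE for a in assessments)


def complete_response(assessments: Sequence[Assessment]) -> bool:
    """Any CR during treatment; no confirmation required."""
    return any(a.status == "CR" for a in assessments)


def confirmed_response(assessments: Sequence[Assessment], min_gap_days: int = 28) -> bool:
    """CR or PR on 2 consecutive evaluations at least `min_gap_days` apart."""
    ordered = sorted(assessments, key=lambda a: a.day)
    return any(
        prev.status in RESPONSE and nxt.status in RESPONSE
        and nxt.day - prev.day >= min_gap_days
        for prev, nxt in zip(ordered, ordered[1:])
    )


def pfs(progression_day: Optional[int], death_day: Optional[int],
        last_contact_day: int) -> Tuple[int, bool]:
    """(time, event): first of progression or death from any cause,
    otherwise censored at the last contact."""
    observed = [d for d in (progression_day, death_day) if d is not None]
    if observed:
        return min(observed), True
    return last_contact_day, False
```

For example, partial responses on days 42 and 63 count toward best response but not confirmed response, because the two evaluations are only 21 days apart.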
Individual patient-level surrogacy was evaluated as follows. First, progression and response status were modeled as time-dependent variables in multivariate Cox proportional hazards models16 for OS, stratified by trial and adjusted for age, sex, number of metastatic sites, and Eastern Cooperative Oncology Group performance status (PS), based on previously published work.17 This assessed whether patients who remained progression-free, or who had achieved a response at any time during treatment, survived significantly longer than those who had progressed or had not responded. Subsequently, univariate and multivariate Cox proportional hazards models16 (adjusted for the same factors), stratified by trial, were used to assess the prognostic impact of PFS, best response, CR, and confirmed response on subsequent survival using landmark analyses18 at 2, 4, and 6 months. The hazard ratios (HRs), 95% confidence intervals (CIs), and P values are reported. In addition, model discrimination in the landmark analyses was evaluated using the concordance index,19 which is the recommended approach for comparing the predictive ability of different prognostic models.20 A completely random prediction would have a concordance index of 0.5, and a perfect prediction rule would have a concordance index of 1.0. The model with the highest concordance index was considered the best predictor in the landmark analyses, and P values were used to assess statistical significance.
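The concordance index is cited above without a formula. Assuming it is Harrell's C, the sketch below shows how it is computed from observed survival times and predicted risks. In the landmark models the risks would be the fitted Cox linear predictors; here they are passed in directly, and `harrell_c` is a hypothetical helper.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[bool],
              risk: Sequence[float]) -> float:
    """Harrell's concordance index.

    A pair (i, j) is usable when subject i died and subject j was followed
    longer (times[j] > times[i]). It is concordant when the subject who died
    first has the higher predicted risk; tied risks count one half.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[j] > times[i]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


# Risk 1 = progressed by the landmark, 0 = progression-free (days after the landmark).
print(harrell_c([80, 280, 380, 130], [True, True, False, True], [1, 0, 0, 1]))  # 5 of 6 usable pairs
```

For a model containing only a binary predictor, many pairs have tied risks and each contributes one half, which pulls the index toward 0.5.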
Trial-level surrogacy measures were calculated for the PFS, best response, CR, and confirmed response endpoints across the 3 randomized trials included in this study. These measures quantify the association between treatment effects on OS and treatment effects on the putative surrogate endpoints. Given the small number of trials, the treating membership (ie, participating center) within each trial was used as the unit of analysis, which is common practice in evaluating potential surrogate endpoints when fewer than 6 randomized trials are available.21, 22 The randomized phase 2 study 932053 was analyzed as a single unit because of its small size (N = 60). There were 38 participating centers across the 3 randomized trials; 6 of these were combined with other centers so that each unit included at least 2 patients in each of the experimental and control arms. Thus, 32 units were used in the analysis.
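The text does not say how the 6 small centers were combined with others. The sketch below shows one possible rule, purely as an assumption: centers are processed in a fixed order, consecutive centers are pooled until the combined unit has at least 2 patients in each arm, and any leftover centers are folded into the last unit. `pool_centers` and its inputs are hypothetical.

```python
from typing import List, Sequence, Tuple

Center = Tuple[str, int, int]            # (center id, n experimental, n control)
Unit = Tuple[Tuple[str, ...], int, int]  # (pooled center ids, n experimental, n control)


def pool_centers(centers: Sequence[Center], min_per_arm: int = 2) -> List[Unit]:
    """Greedily pool consecutive centers until each arm has >= min_per_arm patients."""
    units: List[Unit] = []
    names: List[str] = []
    n_exp = n_ctl = 0
    for name, e, c in centers:
        names.append(name)
        n_exp += e
        n_ctl += c
        if n_exp >= min_per_arm and n_ctl >= min_per_arm:
            units.append((tuple(names), n_exp, n_ctl))
            names, n_exp, n_ctl = [], 0, 0
    if names:
        # Leftover centers that never reached the threshold join the last unit.
        if units:
            last_names, last_exp, last_ctl = units.pop()
            units.append((last_names + tuple(names), last_exp + n_exp, last_ctl + n_ctl))
        else:
            units.append((tuple(names), n_exp, n_ctl))
    return units
```

Other pooling rules (for example, pooling by geographic region) would satisfy the same minimum and could produce different units.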
Trial-level surrogacy was measured in several ways, including the conventional methods recommended by Sargent et al.23, 24 The association between treatment effects on OS and on the putative surrogate endpoints of PFS, best response, CR, and confirmed response was evaluated by calculating the Spearman rank correlation coefficient and the R2 value from a weighted linear regression model, with weights equal to the sample size of each unit. Treatment effects within each unit were estimated as log HRs from Cox proportional hazards models16 for time-to-event endpoints and as log odds ratios from logistic regression models for response endpoints. In addition, surrogacy of the time-to-event putative surrogate endpoint PFS was quantified by a formal trial-level surrogacy measure, the Copula R2,25 which is estimated from a bivariate survival model that models the putative surrogate endpoint and the true endpoint jointly. Both the weighted least squares R2 and the Copula R2 range from 0 to 1, with values close to 0 suggesting poor surrogacy and values close to 1 indicating high surrogacy. Concordance between OS and each putative surrogate endpoint was measured as the percentage of units that reached the same conclusion for both endpoints. All tests were 2-sided, with P values <.05 denoting statistical significance. Statistical analyses were performed using SAS version 9.1.3 (SAS Institute, Cary, NC) and R version 2.7.1 (R Foundation for Statistical Computing).
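A minimal sketch of the two conventional trial-level measures follows, computed from unit-level treatment effects. It assumes the log HRs (or log odds ratios) for each unit have already been estimated; `weighted_r2`, `spearman`, and `log_odds_ratio` are hypothetical helpers. For an unadjusted unit-level logistic regression with treatment as the only covariate, the treatment coefficient equals the 2×2 log odds ratio computed here; the 0.5 continuity correction for empty cells is an added assumption. The Copula R2 requires fitting a bivariate copula survival model and is not sketched.

```python
import math
from typing import List, Sequence


def weighted_corr(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Weighted Pearson correlation of x and y."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / math.sqrt(sxx * syy)


def weighted_r2(surrogate_effects, os_effects, unit_sizes) -> float:
    """R2 of a weighted least squares regression (with intercept) of OS effects
    on surrogate effects, weighted by unit sample size. With one predictor and
    an intercept, this equals the squared weighted correlation."""
    return weighted_corr(surrogate_effects, os_effects, unit_sizes) ** 2


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return weighted_corr(average_ranks(x), average_ranks(y), [1.0] * len(x))


def log_odds_ratio(resp_exp: int, n_exp: int, resp_ctl: int, n_ctl: int) -> float:
    """Log odds ratio (experimental vs control) from a 2x2 response table."""
    a, b = resp_exp, n_exp - resp_exp
    c, d = resp_ctl, n_ctl - resp_ctl
    if 0 in (a, b, c, d):
        # Assumed 0.5 continuity correction so that empty cells stay finite.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return math.log((a * d) / (b * c))
```

Weighting by unit size means that large centers dominate the regression fit, whereas the Spearman coefficient treats every unit equally and is insensitive to outlying effect sizes.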
RESULTS
Patient Characteristics and Outcomes
Data included a total of 870 eligible patients with extensive stage SCLC who received first-line treatment. The median age was 64 years (range, 30-85 years). Sixty-two percent were male, 76% had a PS of 0 or 1, and 54% had <2 metastatic sites at study entry. The median follow-up for the 14 patients still alive was 4.4 years (range, 0.5-12.0 years). At the time of this analysis, 98% of patients had died and 87% had experienced disease progression (12% died without disease progression). The overall median OS and PFS were 9.6 months (95% CI, 9.1-10.0) and 5.5 months (95% CI, 5.2-5.9), respectively. In addition, 44% (95% CI, 41%-48%) of patients had a best response, 22% (95% CI, 19%-25%) had a CR, and 34% (95% CI, 31%-37%) had a confirmed response. See Table 2 for patient characteristics by trial and overall.
Approximately 13.1%, 32.8%, and 51.4% of patients experienced disease progression by 2, 4, and 6 months after study entry, respectively. Of the 758 patients who progressed, 752 died, with a median time from progression to death of 3.3 months (95% CI, 3.0-3.6). About 30% of patients who progressed were alive 6 months after progression, and only about 8% were alive 12 months after progression.
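The follow-up percentages reported in the two preceding paragraphs are mutually consistent. A quick check, assuming that every patient not reported as alive had died:

```python
N = 870                      # eligible patients
alive = 14                   # patients still alive at analysis
died = N - alive             # 856, assuming all other patients had died
progressed = 758             # patients with documented progression
died_after_progression = 752
died_without_progression = died - died_after_progression  # 104


def pct(count: int) -> int:
    """Percentage of all eligible patients, rounded to a whole number."""
    return round(100 * count / N)


print(pct(died), pct(progressed), pct(died_without_progression))  # 98 87 12
```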
Patient-Level Surrogacy Measures
Time-dependent models
As expected, patients who experienced disease progression at any time had a much worse prognosis than patients who had not progressed (HR, 17.4; 95% CI, 13.4-22.5; P < .0001). Patients who responded to treatment had significantly improved OS, but the effect was more modest than in the progression model (HR, 0.64; 95% CI, 0.55-0.75; P < .0001). In both models, worse PS (1 or 2 vs 0) and older age were associated with significantly worse OS. In the response model, sex and the number of metastatic sites at baseline were also significant predictors of OS, with male sex and 2 or more metastatic sites associated with a worse prognosis.
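The time-dependent models above require each patient's follow-up to be split at the time of progression, so that the progression covariate switches from 0 to 1 only once progression has occurred. A minimal sketch of that split into counting-process (start, stop] rows is shown below; `split_at_progression` and the `Interval` record are hypothetical, and rows in this form could then be passed to a counting-process Cox fit such as R's `coxph(Surv(start, stop, event) ~ ...)`. Response status as a time-dependent covariate would be split the same way at the time of first response.

```python
from typing import List, NamedTuple, Optional


class Interval(NamedTuple):
    start: float      # start of interval (months from registration)
    stop: float       # end of interval
    event: bool       # True if the patient died at `stop`
    progressed: int   # time-dependent covariate: 0 before progression, 1 after


def split_at_progression(os_time: float, died: bool,
                         prog_time: Optional[float]) -> List[Interval]:
    """Split one patient's OS follow-up into counting-process rows."""
    # No progression observed before the end of follow-up: a single row.
    # (Progression recorded at the time of death is treated the same way.)
    if prog_time is None or prog_time >= os_time:
        return [Interval(0.0, os_time, died, 0)]
    # Progression at baseline: the whole follow-up is post-progression.
    if prog_time <= 0:
        return [Interval(0.0, os_time, died, 1)]
    # Otherwise: progression-free until prog_time, progressed afterwards.
    return [Interval(0.0, prog_time, False, 0),
            Interval(prog_time, os_time, died, 1)]
```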
Patients who were alive at 2 (n = 796), 4 (n = 728), and 6 months (n = 651) after study entry were included in the respective univariate landmark analyses; because of missing values for the number of metastatic sites, 765, 698, and 623 patients were included in the multivariate landmark analyses at 2, 4, and 6 months, respectively.
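The landmark inclusion rule above can be sketched as follows: only patients known to be alive at the landmark enter the analysis, their PFS status at the landmark becomes a fixed covariate, and survival is measured from the landmark onward. `Patient` and `landmark_dataset` are hypothetical names, and times are assumed to be in months.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Patient(NamedTuple):
    os_time: float              # months from registration to death or last contact
    died: bool                  # True if death was observed at os_time
    prog_time: Optional[float]  # months to documented progression; None if none observed


def landmark_dataset(patients: Sequence[Patient],
                     landmark: float) -> List[Tuple[float, bool, bool]]:
    """Rows of (survival time after landmark, death observed, progression-free at landmark)."""
    rows = []
    for p in patients:
        # Only patients still alive and under follow-up at the landmark are included.
        if p.os_time <= landmark:
            continue
        progression_free = p.prog_time is None or p.prog_time > landmark
        rows.append((p.os_time - landmark, p.died, progression_free))
    return rows


cohort = [
    Patient(3.0, True, 2.0),     # died before 4 months: excluded
    Patient(6.5, True, 3.5),     # progressed before 4 months
    Patient(13.0, True, 10.0),   # progression-free at 4 months
    Patient(16.0, False, None),  # progression-free and still alive
]
print(landmark_dataset(cohort, landmark=4.0))
# [(2.5, True, False), (9.0, True, True), (12.0, False, True)]
```

Each row would then enter a Cox model, stratified by trial, with progression-free status at the landmark as a fixed binary covariate.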
Univariate model results
Although patients who had achieved a best response, CR, or confirmed response had significantly longer subsequent survival at all landmark time points (except CR and confirmed response at 2 months), the concordance indices for these models were low (0.50-0.56), indicating that these metrics discriminate poorly between patients with different survival times. In contrast, models using PFS status at the landmark time points had better predictive ability. Specifically, patients who were alive and progression-free had significantly longer subsequent survival than those who had progressed or died, with concordance indices (HRs) of 0.54 (0.41), 0.59 (0.41), and 0.62 (0.39) for PFS status at 2, 4, and 6 months, respectively.
Multivariate model results
At 4 and 6 months, the best response, CR, and confirmed response endpoints were significant (P < .01; concordance indices, 0.59), with similar HRs ranging from 0.65 to 0.75 (Models 1-3, Table 3). The results for PFS were consistent with the univariate analysis (Model 4, Table 3): PFS status at all landmark time points was significantly associated with OS (P < .0001), with HRs of 0.40 to 0.42 and concordance indices of 0.60 to 0.65 for landmark times of 2, 4, and 6 months from study entry. Although PFS status at 6 months was the strongest predictor of subsequent survival (concordance index, 0.65; 95% CI, 0.63-0.68; 14.0% improvement from the base model), PFS status at 4 months was also a strong predictor (concordance index, 0.63; 95% CI, 0.61-0.66; 10.5% improvement). In addition, PFS status at 4 and 6 months showed significantly better predictive ability (ie, higher concordance indices) than all response-based endpoints at 4 and 6 months, respectively (P < .001). See Table 3 for the detailed multivariate landmark analysis results.
Models adjusted for baseline age, sex, performance status, and number of metastatic sites.
At each time point, success is the percentage of patients who had 1) a best response, 2) a complete response, or 3) a confirmed response, or who were 4) alive and progression-free (Models 1-4, respectively).
Percentage improvement in the concordance index from the base multivariate model of baseline age, sex, performance status, and number of metastatic sites at baseline, where the base model had concordance index values of 0.58, 0.57, and 0.57 for 2, 4, and 6 months, respectively, across the 4 models. The higher the percent improvement, the better the predictor as compared to the base model.
Progression-free survival at 4 and 6 months showed significantly improved predictive abilities (ie, concordance indices) as compared to all response-based endpoints at 4 and 6 months, respectively (P < .001).
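The percentage improvements quoted in the text are consistent with a relative change in the concordance index over the base model. Using the rounded values reported (0.65 and 0.63 vs a base of 0.57), the sketch below reproduces the 14.0% and 10.5% figures; `pct_improvement` is a hypothetical helper.

```python
def pct_improvement(c_model: float, c_base: float) -> float:
    """Relative improvement (%) of a model's concordance index over the base model."""
    return 100 * (c_model - c_base) / c_base


print(round(pct_improvement(0.65, 0.57), 1))  # 14.0: PFS status at 6 months
print(round(pct_improvement(0.63, 0.57), 1))  # 10.5: PFS status at 4 months
```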
Figure 1 shows Kaplan-Meier curves of OS for the landmark analysis subgroups defined by PFS status at 4 and 6 months. Median subsequent survival was significantly longer for patients who were progression-free at 4 months than for those who had progressed by 4 months (7.7 vs 3.6 months; P < .0001). Similarly, median subsequent survival was significantly longer for patients who were progression-free at 6 months than for those who had progressed by 6 months (7.4 vs 3.0 months; P < .0001).
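Median subsequent survival in each landmark subgroup is read off a Kaplan-Meier curve. A self-contained sketch of the product-limit estimator and its median (the smallest event time at which estimated survival falls to 0.5 or below) follows; times are assumed to be measured from the landmark, and `kaplan_meier` and `km_median` are hypothetical helpers.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (event time, survival just after that time)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group subjects sharing this time; deaths and censorings leave the risk set afterwards.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def km_median(curve: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Smallest event time at which estimated survival is 0.5 or lower."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached
```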
Trial-Level Surrogacy Measures
The relationship between the unit-specific log HRs comparing the experimental and control arms for PFS versus OS is shown in Figure 2. For PFS, the weighted least squares R2 was 0.79, the Spearman rank correlation coefficient was 0.75, and the Copula R2 was 0.80. The weighted least squares R2 of 0.79 indicates that the treatment effects on PFS explained 79% of the variance in the treatment effects on OS, and the Copula R2 of 0.80 indicates moderate surrogacy. A sensitivity analysis excluding the outlying unit yielded similar results (weighted least squares R2 = 0.73; Spearman rank correlation = 0.73; Copula R2 = 0.75). For the experimental versus control comparisons within each unit, 29 of 32 units (91% concordance) reached the same conclusion for OS and PFS. Of these 29 concordant units, 3 showed a significant difference in both PFS and OS (with HRs in the same direction), and 26 showed no significant difference in either endpoint.
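The unit-level concordance figure can be sketched as follows, under stated assumptions: each unit's conclusion comes from a two-sided Wald test on its log HR at the 5% level, and HRs are expressed as experimental versus control. `conclusion` and `concordance_pct` are hypothetical helpers, and the labels below are illustrative only: they reproduce the reported 29-of-32 split, but the direction of the significant effects and which endpoint was significant in the 3 discordant units are not reported.

```python
from typing import Sequence


def conclusion(log_hr: float, se: float, z_crit: float = 1.96) -> str:
    """Classify one unit's experimental-vs-control comparison (assumed Wald test)."""
    z = log_hr / se
    if z <= -z_crit:
        return "favors experimental"
    if z >= z_crit:
        return "favors control"
    return "not significant"


def concordance_pct(os_conclusions: Sequence[str],
                    surrogate_conclusions: Sequence[str]) -> float:
    """Percentage of units reaching the same conclusion for both endpoints."""
    same = sum(a == b for a, b in zip(os_conclusions, surrogate_conclusions))
    return 100 * same / len(os_conclusions)


# Illustrative labels: 3 concordant significant, 26 concordant nonsignificant, 3 discordant.
os_c = ["favors experimental"] * 3 + ["not significant"] * 29
pfs_c = ["favors experimental"] * 3 + ["not significant"] * 26 + ["favors experimental"] * 3
print(concordance_pct(os_c, pfs_c))  # 90.625, reported as 91%
```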
Response-based endpoints (best response, CR, and confirmed response) had much weaker associations than PFS as measured by the weighted least squares R2 and Spearman rank correlation (see Fig. 3 for the best response and confirmed response plots). The weighted least squares R2 values (Spearman rank correlation coefficients) were 0.21 (0.52), 0.48 (0.50), and 0.40 (0.60) for best response, CR, and confirmed response, respectively. A sensitivity analysis excluding the outliers yielded weighted least squares R2 values (Spearman rank correlation coefficients) of 0.55 (0.47), 0.26 (0.15), and 0.46 (0.46) for best response, CR, and confirmed response, respectively. Concordance between OS and response-based unit-level conclusions was 84% for best response, 80% for CR, and 81% for confirmed response.
DISCUSSION
In this study of 870 patients with previously untreated extensive stage SCLC, multiple putative surrogate endpoints for OS were assessed at both the patient and trial levels. At the individual patient level, PFS status as early as 4 months was a strong predictor of subsequent survival, and PFS status at 6 months was the strongest predictor, outperforming all response-based endpoints. At the trial level, PFS showed higher surrogacy than the response-based endpoints: the treatment effects on PFS explained 79% of the variance in the treatment effects on OS (Copula R2 = 0.80), whereas response-based endpoints explained ≤48% of that variance. PFS showed the most promise as a surrogate endpoint for OS at both the patient and trial levels across all of the statistical methods assessed, and this consistency provides strong evidence that PFS is a more promising surrogate endpoint for OS than the response-based endpoints.
Given that the median PFS is around 4 to 6 months in this disease population, choosing a PFS-based endpoint around 4 to 6 months postregistration as the primary endpoint in extensive stage SCLC in the phase 2 setting is appropriate. This result is important, especially considering the current availability of novel agents26 to test in the phase 2 setting in extensive stage SCLC. It is important not only to assess these new treatments quickly in the phase 2 setting, but also to have greater confidence that the tested therapies may succeed at the phase 3 level.
Given the poor prognosis of this disease, with a median survival of <1 year, one may ask why it is important to identify a valid surrogate endpoint for OS. There are 2 main reasons: 1) OS requires longer follow-up than a valid surrogate such as PFS would, and 2) OS can be confounded by crossover and by subsequent therapies given after disease progression. A valid surrogate endpoint such as PFS would be unaffected by second-line therapy and could be assessed much sooner, leading to decreased cost and more timely approval of new regimens.
Although the overall tumor response rate has historically been the most common primary endpoint in the phase 2 setting for untreated extensive stage SCLC, the increasing availability of targeted therapies in SCLC research16 makes it a less appropriate endpoint. A large meta-analysis of 48 phase 3 first-line trials in extensive stage SCLC showed that only about 33% of the differences in median survival between treatment arms could be explained by differences in response rate between the arms.9 In addition, it has been shown that there is no clear relationship between response and survival in this disease.8 Finally, tumor response has previously been shown to have high levels of measurement error.7
PFS has also shown promise as a potential surrogate for OS in other settings. In a pooled analysis of 2838 patients randomized in 7 trials in advanced nonsmall cell lung cancer (NSCLC), PFS was shown to be a possibly acceptable surrogate for OS in future advanced NSCLC trials.27 Another study in advanced NSCLC showed that PFS was a better predictor of survival than tumor response endpoints.28
Despite the impressive results for PFS in our study, there are good reasons to temper our enthusiasm. First, only 870 patients were included. Second, only 2 randomized phase 3 trials were included, which limits the strength of our conclusions, especially for the trial-level surrogacy analysis, and calls for further validation in a larger number of randomized phase 3 trials. Third, only trials that used pre-RECIST criteria were included, which limits the generalizability of these results to studies that use RECIST criteria.
In conclusion, PFS was strongly associated with OS at both the patient and trial levels and should be routinely used as the primary endpoint in the phase 2 setting. Our results also demonstrated that PFS may be a potential surrogate for OS, but further validation is needed using data from a larger number of randomized phase 3 trials. This result, if validated, would ultimately allow faster evaluation of drugs for extensive stage SCLC.
CONFLICT OF INTEREST DISCLOSURES
Supported by National Institutes of Health grant CA-25,224.