Accuracy of models to prognosticate survival after surgery for pancreatic cancer in the era of neoadjuvant therapy

Outcomes for pancreatic adenocarcinoma (PDAC) remain difficult to prognosticate. Multiple models attempt to predict survival following the resection of PDAC, but their utility in the neoadjuvant population is unknown. We aimed to assess their accuracy among patients who received neoadjuvant chemotherapy (NAC).


| INTRODUCTION
Over 48 000 people in the United States die of pancreatic ductal adenocarcinoma (PDAC) each year. 1 By 2030, this disease is expected to overtake colon cancer as the second most common cause of cancer-related death. 2 Despite aggressive treatment, 5-year survival for patients undergoing surgery is less than 30%. 3,4 There has been substantial interest in improving prognostication for PDAC. In particular, the development of a prognostic system for risk stratification capable of discriminating patients with worse prognosis from those with better prognosis has been the focus of many research groups. [5][6][7] An accurate prognostic model has immense value in clinical decision-making by improving patient-physician counseling and potentially allowing for more individualized treatment approaches. Very few models have undergone external validation, rendering their generalizability unclear. In lieu of a reliable model, the American Joint Committee on Cancer (AJCC) staging system is commonly used for prognostication, despite its limited prognostic ability. 9,10 Recent data suggest that patients with resectable or borderline resectable PDAC have better overall survival (OS) after neoadjuvant treatment. [11][12][13][14] Since 2014, the proportion of patients who receive neoadjuvant chemotherapy (NAC) in the United States has increased by over 20%. 15,16 A neoadjuvant approach to PDAC is now the standard of care at many centers.
However, the accuracy of existing prognostic systems among patients who have undergone NAC is unknown. We aimed to determine the accuracy of current prognostic systems for patients with PDAC who had undergone NAC followed by surgical resection.

| Patient selection
We performed a retrospective analysis using data from all consecutive patients who received NAC and underwent surgical resection of PDAC at six high-volume academic medical centers across the United

| Model selection
To identify appropriate models for analysis, we began by assessing the models identified by Strijker et al. in their 2019 systematic review. 8 We then updated the list of eligible models and sought any overlooked models by searching the PubMed, Web of Science, and Google Scholar databases for the following terms: "pancreas," "pancreatic adenocarcinoma," "prognostic," "prediction," "survival," "model," "calculator," and "nomogram." We included only models that had undergone external validation. Assessing a model's predictive accuracy using an external dataset is considered a necessary step in model development to avoid overly skewed predictive tools. 25 We identified two candidate models that had undergone external validation: the [...] underwent pancreaticoduodenectomy. 26 The MSKCCPAN has been externally validated by three studies: one using data from a hospital in the United States, one using data from a hospital in the United Kingdom, and a third using data from a hospital in the Netherlands. [28][29][30] It incorporates the following variables: age, sex, weight loss at presentation, back pain at presentation, portal vein resection [...]
Additionally, we used the AJCC staging system (eighth edition) as a comparative system of survival prediction. The AJCC system is designed to use easily reproducible disease-specific variables to divide patients into stage groups that bear prognostic significance.
As a result, this system is often used by clinicians to counsel patients about their disease and make treatment decisions. The AJCC staging system for exocrine pancreas neoplasms was updated in 2016 to reflect evidence that both primary tumor size and a higher number of involved lymph nodes carry prognostic implications. 31,32

| Model assessment
All patients from the CPC database with complete information, including the date of surgery and updated vital status, were included.
As the MSKCCPAN does not account for patients with a complete pathologic response to neoadjuvant therapy (ypT0), these patients were treated as pT1 with a tumor size of 0 cm. As these tumors are ungradable, they were treated as well differentiated (G1). Other patients missing tumor grade were assigned the median grade, G2.
For patients missing tumor size, the size was imputed as the mean tumor size of other patients in the database with the same T stage. Missing counts of positive or negative lymph nodes were imputed using the median count for other patients in the database with the same N stage. Patients missing three or more data elements were excluded. Because death within 30 days of surgery was considered a postoperative complication rather than a disease-specific outcome, these patients were also excluded. The CPC database does not include "back pain" as a collected variable; therefore, this variable was not included in the main analysis. The overall contribution of "back pain" to the MSKCCPAN nomogram is low. However, to estimate the effect of excluding back pain from the model, we performed a manual chart review at one institution for the presence or absence of back pain at presentation and performed a sensitivity analysis including the complete set of variables.
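The exclusion and imputation rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study (which was written in R): the field names (`t_stage`, `n_stage`, `size_cm`, `grade`, `pos_nodes`, `neg_nodes`, `days_to_death`), the set of fields counted toward "three or more missing data elements," and the reading of "within 30 days" as 30 days or fewer are all assumptions.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical field names; the study's actual data dictionary is not given.
CHECKED = ("size_cm", "grade", "pos_nodes", "neg_nodes")

def prepare_cohort(patients):
    """Apply the exclusion and imputation rules; None marks a missing value."""
    cohort = []
    for p in patients:
        p = dict(p)  # work on a copy so the input is left untouched
        if p["t_stage"] == "ypT0":
            # Complete pathologic response: treated as pT1, 0 cm, grade G1.
            p.update(t_stage="pT1", size_cm=0.0, grade=1)
        n_missing = sum(p.get(f) is None for f in CHECKED)
        days = p.get("days_to_death")
        died_early = days is not None and days <= 30  # assumed cutoff
        if n_missing >= 3 or died_early:
            continue
        cohort.append(p)

    # Pool observed (non-missing) values by T stage and N stage.
    sizes = defaultdict(list)
    nodes = {f: defaultdict(list) for f in ("pos_nodes", "neg_nodes")}
    for p in cohort:
        if p.get("size_cm") is not None:
            sizes[p["t_stage"]].append(p["size_cm"])
        for f in nodes:
            if p.get(f) is not None:
                nodes[f][p["n_stage"]].append(p[f])

    # Impute; mean()/median() raise StatisticsError if a stage has no
    # observed values, which this sketch does not handle.
    for p in cohort:
        if p.get("grade") is None:
            p["grade"] = 2  # median grade, G2
        if p.get("size_cm") is None:
            p["size_cm"] = mean(sizes[p["t_stage"]])
        for f in nodes:
            if p.get(f) is None:
                p[f] = median(nodes[f][p["n_stage"]])
    return cohort
```

Note that in this sketch the ypT0 patients recoded to pT1 contribute their 0 cm size to the pT1 mean; whether the original analysis did the same is not stated.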

| Statistical methods
Descriptive statistics were performed using frequency for categorical variables and mean with standard deviation for continuous variables. [...] 33 Second, we divided the cohort into quartiles based on PI, thereby creating four risk groups: "low," "low-moderate," "moderate-high," and "high." We created a survival curve for each group using the Kaplan-Meier method and compared the curves using the log rank test.
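As a rough illustration of the quartile split and the Kaplan-Meier estimate, the following Python sketch may help. It is not the study's code: the function names are hypothetical, a higher PI is assumed to mean higher risk, and the quartile cut points use the default ("exclusive") method of Python's `statistics.quantiles`, which may differ slightly from the cut points produced in R.

```python
from statistics import quantiles

GROUPS = ("low", "low-moderate", "moderate-high", "high")

def risk_groups(pi_values):
    """Label each prognostic index (PI) by cohort quartile (higher PI = higher risk, assumed)."""
    q1, q2, q3 = quantiles(pi_values, n=4)
    # Each comparison is True (1) or False (0); their sum is the group index 0..3.
    return [GROUPS[(pi > q1) + (pi > q2) + (pi > q3)] for pi in pi_values]

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time; events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve
```

One curve per risk group is obtained by calling `kaplan_meier` on the follow-up times and event indicators of the patients carrying each label.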
In contrast, a model's calibration compares the predicted survival probability of the cohort at a given time point to actual survival.
We assessed the MSKCCPAN's calibration by dividing the cohort into the same four groups: "low," "low-moderate," "moderate-high," and "high." Calibration was numerically assessed using the Brier score, which is calculated by averaging the squared error between the probability of an event and the outcome for each patient. A Brier score ranges from 0 to 1, with lower scores indicating better performance.
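The Brier score described above can be sketched as follows. This is a simplified illustration with hypothetical function names: it drops patients censored before the time horizon rather than reweighting them, which is an assumption made here for brevity and not necessarily how the study handled censoring.

```python
def outcome_at(time_months, died, horizon):
    """1 if death was observed by the horizon, 0 if known alive at it, None if censored earlier."""
    if died and time_months <= horizon:
        return 1
    if time_months >= horizon:
        return 0
    return None  # censored before the horizon: status at the horizon is unknown

def brier_score(predicted_death_prob, observed):
    """Mean squared difference between predicted probability of death and the 0/1 outcome."""
    pairs = [(p, o) for p, o in zip(predicted_death_prob, observed) if o is not None]
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)
```

For example, predicted probabilities of death of 0.9, 0.2, and 0.5 against outcomes 1, 0, and 1 give squared errors of 0.01, 0.04, and 0.25, for a Brier score of 0.10. If the model reports survival probability instead, the predicted probability of death is one minus that value.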
For the AJCC system, we first calculated the Uno C-statistic to assess concordance at 12-, 24-, and 36-month time intervals. We subsequently created Kaplan-Meier curves of each AJCC stage and compared the curves using the log rank test. Because the AJCC stage does not provide survival probability at a given time point, we did not assess calibration for the staging system.
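To make the idea of concordance concrete, the sketch below computes Harrell's C-statistic, a simpler relative of the Uno C-statistic used in this study. It is a deliberate substitution: Uno's version additionally weights each comparable pair by the inverse probability of censoring, which this sketch omits. The function name and the optional `horizon` argument are hypothetical; `risk` can be any ordinal score, such as AJCC stage coded as an integer.

```python
def harrell_c(times, events, risk, horizon=None):
    """Unweighted concordance: fraction of comparable pairs ranked correctly.

    A pair is comparable when the patient with the shorter follow-up died.
    Ties in the risk score count as half-concordant.
    """
    concordant = comparable = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier time in a pair must be an observed death
        if horizon is not None and times[i] >= horizon:
            continue  # only deaths before the horizon define pairs
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking. Passing `horizon=12`, `24`, or `36` restricts the pairs to deaths before that time point, loosely mirroring the time intervals described above.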
Statistical significance was defined as p ≤ 0.05 for a two-tailed test. All analyses were performed using R, version 4.1.1 (www.r-project.org).
[...] We performed a Kaplan-Meier analysis of the cohort divided into risk groups according to PI. 34 The resulting Kaplan-Meier survival curves are shown in Figure 2. Notably, the model could not discriminate between the two moderate-risk groups (p = 0.43) but demonstrated sufficient discrimination between the "high"
For the AJCC staging system, a Kaplan-Meier analysis was performed with the cohort divided by stage. The resulting Kaplan-Meier curves are shown in Figure 3. Using the log rank test, there was insufficient discrimination between all neighboring stage groups (all p > 0.5; Table 3).
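A pairwise comparison of two neighboring groups with the log rank test can be sketched as below. This is a minimal two-group implementation for illustration, with a hypothetical function name; it is not the study's code and assumes both groups are non-empty and at least one death occurs while both groups still have patients at risk.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log rank test; returns (chi-square statistic, p value) on 1 degree of freedom."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp = variance = 0.0
    for t in sorted({u for u, e, _ in data if e}):
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in group B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected deaths in A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value
```

Comparing neighboring stages then amounts to calling this function once per adjacent pair of stage groups.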

| Calibration
The calibration plots for the four risk groups at 12-, 24-, and 36-month survival intervals are shown in [...]

TABLE 3 Discrimination of disease-specific survival between AJCC stages by log rank test.

[...] used as a means of selecting patients with favorable tumor biology to undergo surgery, this cohort excludes those patients whose prognosis was deemed poor. Therefore, the CPC cohort likely represents solely those patients with a more favorable prognosis, which could partially explain the discrimination and calibration results. However, as many patients now receive NAC, this study highlights the need for better prognostic models in this population.

| CONCLUSIONS
The present analysis demonstrates that the accuracy of current predictive models and staging systems in patients with PDAC undergoing surgical resection after NAC is severely limited. There is a clear need for improved risk stratification in patients with PDAC.
An accurate risk stratification model would produce reliable survival predictions for all treatment pathways (NAC plus surgery, surgery alone, medical therapy alone, or palliative care). A substantial advance in outcome prediction is essential to improve shared decision-making, which in turn could meaningfully affect survival in this highly lethal disease.