Closing the gap: Contribution of surgical best practices to outcome differences between high‐ and low‐volume centers for lung cancer resection

Abstract
Background: Clinical outcomes for resected early-stage non-small cell lung cancer (NSCLC) are superior at high-volume facilities, but reasons for these differences remain unclear. Understanding these differences and optimizing outcomes across institutions are critical to the management of the increasing incidence of these cases. We evaluated the extent to which surgical best practices account for resected early-stage NSCLC outcome differences between facilities according to case volume.
Methods: We performed a retrospective cohort study of clinical stage 1 or 2 NSCLC cases undergoing surgical resection from 2004 to 2013 using the National Cancer Database (NCDB). Surgical best practices (negative surgical margins, lobar or greater resection, lymph node (LN) dissection, and examination of > 10 LNs) were compared between facilities in the highest and lowest volume quartiles.
Results: A total of 150,179 patients were included in the cohort (89% white, 53% female, median age 68 years). In a multivariate model, superior overall survival (OS) was observed at highest volume centers compared to lowest volume centers (hazard ratio (HR) = 0.89; 95% CI, 0.82-0.96; P = .002). After matching for surgical best practices, there was no significant OS difference (HR = 0.95; 95% CI, 0.87-1.05; P = .32). Propensity score-adjusted HR estimates indicated that surgical best practices accounted for 54% of the numerical OS difference between low-volume and high-volume centers. Each surgical best practice was independently associated with improved OS (all P ≤ .001).
Conclusion: Quantifiable and potentially modifiable surgical best practices largely account for resected early-stage NSCLC outcome differences observed between low- and high-volume centers. Adherence to these guidelines may reduce and potentially eliminate these differences.


| INTRODUCTION
There has been much discussion in recent decades about the relationship between facility type and volume and outcomes for non-small cell lung cancer (NSCLC) and other malignancies, with many studies finding that institutional case volume is associated with improved surgical outcomes. [1][2][3][4][5][6][7] This observation has major health-care practice and policy implications. As the population ages and uptake of computed tomography (CT)-based lung cancer screening increases, the number of early-stage, potentially resectable NSCLC cases is expected to grow. If optimal care requires treatment at a limited number of high-volume clinical centers, patients and their families may be required to travel extensively or even temporarily relocate. Such arrangements could exacerbate the financial impact of diagnosis and treatment if individuals need to pay for travel and housing, or miss additional workdays.
Case volume may serve as a proxy for multiple factors associated with improved outcomes. These may include patient differences, clinician differences, and process differences. 7 Specifically, less sick individuals may be more likely to be referred to or travel to high-volume centers. Surgeons and other physicians achieve proficiency by performing a procedure many times. Medical centers that perform more lung cancer resections may have greater institutional memory and clinical experience, variables that are challenging to define, measure, and replicate. Additionally, high-volume centers may be more likely to employ certain surgical and medical techniques and protocols that directly produce better risk-adjusted outcomes.
Although many patient and clinician differences are difficult to ascertain, characterize, and control, process variables may be more readily addressed. Most widely accepted best practices for lung cancer surgery are readily defined, easily measured, and potentially feasible to benchmark across centers. Examples include the type of resection (lobar vs sublobar), 8 surgical margin status, 9 and the nature of lymph node (LN) examination. [10][11][12] To determine the extent to which these variables may account for improved overall survival (OS) at high-volume institutions, we examined surgical practices and clinical outcomes in a nationally representative sample, the National Cancer Database (NCDB).

| Data source and collection
Formed in 1989, the NCDB collects data from more than 1500 US hospitals that have been accredited by the American College of Surgeons Commission on Cancer (CoC) and the American Cancer Society, capturing an estimated 80% of newly diagnosed lung cancers in the United States. 13,14 We examined NCDB participant user files (PUF) from 2004 to 2013 for NSCLC cases. The PUF includes patients with a histological diagnosis of NSCLC (squamous cell, adenocarcinoma, sarcomatoid, adenosquamous, and other NSCLC). We identified cases with American Joint Committee on Cancer (AJCC) 8th edition 15 clinical stage 1 or 2 NSCLC that underwent surgical resection. Cases staged per previous AJCC editions were forward-staged as previously described 16; those that could not be forward-staged were excluded. Other histologic subtypes (carcinoid, other neuroendocrine histology, such as small cell lung cancer, and metastatic malignancies to the lung) were excluded.
We abstracted the following variables for each case: patient characteristics [age, sex, race, Hispanic origin, insurance status, income, education, Charlson-Deyo (CD) comorbidity score (0, 1, 2, ≥3) 17], disease characteristics [AJCC clinical stage, tumor-node-metastasis (TNM) edition number, primary site, laterality, histology, grade, size of tumor, and year of diagnosis], treatment characteristics [surgical margin status (positive/negative), surgical procedure of the primary site (wedge resection, segmental resection, lobectomy, and pneumonectomy), number of regional LNs examined, regional LN dissection performed (yes/no), administration of radiation therapy (yes/no), administration of chemotherapy (yes/no)], facility characteristics [location (geographic region) and total number of NSCLC stage 1-2 surgical cases during the study period], and clinical outcome measures [last contact or death, and PUF vital status]. We defined surgical best practices as achievement of negative surgical margins, performance of lobar or greater resection, examination of >10 LNs, and performance of regional LN dissection (yes or no), consistent with current clinical guidelines. 18,19
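The four surgical best practices defined above are simple per-case indicators, so they can be written down directly. The following is a minimal Python sketch of that definition, not the authors' code (the study was performed in R). The class name, field names, and procedure labels are hypothetical; the NCDB stores these items as its own coded values.

```python
from dataclasses import dataclass

# Hypothetical procedure labels; the NCDB uses its own surgical codes.
LOBAR_OR_GREATER = {"lobectomy", "pneumonectomy"}


@dataclass
class ResectionCase:
    margins_negative: bool      # surgical margin status
    procedure: str              # "wedge", "segmental", "lobectomy", "pneumonectomy"
    lymph_nodes_examined: int   # number of regional LNs examined
    ln_dissection: bool         # regional LN dissection performed


def best_practice_flags(case: ResectionCase) -> dict:
    """Return one boolean per surgical best practice as defined in the text."""
    return {
        "negative_margins": case.margins_negative,
        "lobar_or_greater": case.procedure in LOBAR_OR_GREATER,
        "more_than_10_ln": case.lymph_nodes_examined > 10,  # strictly >10 LNs
        "ln_dissection": case.ln_dissection,
    }


def meets_all_best_practices(case: ResectionCase) -> bool:
    """True only when all four best practices are satisfied."""
    return all(best_practice_flags(case).values())
```

Keeping the four flags separate, rather than collapsing them into one score, mirrors the way each practice is later evaluated as its own variable.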

| Statistical analysis
We abstracted the total number of lung cancer resections performed at each NCDB facility in the most recent year of analysis, as described previously, 4 and used this metric to define annual surgical volume for each facility. We then calculated summary statistics of annual surgical resection volumes across facilities and used quartile estimates to define low- and high-volume facilities. Centers in the lowest quartile were determined to be low-volume (<6 annual NSCLC resection cases) and those in the highest quartile were deemed high-volume (>34 annual NSCLC resection cases). For survival analyses, we defined OS as the time from definitive surgical procedure to death from any cause or last contact. Cases without a known date of death were censored at the last date of known follow-up. Kaplan-Meier OS curves were generated to visualize OS. Cox regression models and Wald tests were used to compare OS differences and estimate hazard ratios in both univariate and multivariate analyses. Because our sample size is sufficiently large, we excluded all records with missing data. We did not use any imputation methods in this study. To rule out the effect of potential confounders, propensity score matching was used to balance patient groups with different demographic and clinical characteristics. 20 All variables listed in Table 1 were considered in propensity score matching to minimize the effect of collinearity. To ensure the comparability between Model 1 (propensity score matching on clinical and demographic variables) and Model 2 (propensity score matching on clinical, demographic, and surgical best practice variables), we used fixed caliper = 0.0001 and ratio = 1 for both propensity score matching processes. We included in the analyses all demographic and clinical data variables available in the NCDB considered to have potential importance in lung cancer resection outcomes. Ratio = 1 was chosen to reflect the study design of 1-to-1 matching.
To select an appropriate caliper, we scanned a list of descending calipers to compare the stability of the matching results. We chose caliper = 0.0001 as there was no substantial difference observed when using smaller caliper values. All P values were two-sided; results were considered significant at P < .05. All analyses were performed with R software, version 3.4.2. 21 We used R packages "survival" (version 2.44-1.1), "survminer" (version 0.4.3), and "MatchIt" (version 3.0.2).
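The 1-to-1 matching with a fixed caliper, and the scan over descending calipers, can be illustrated with a small sketch. This is not the authors' MatchIt call. It is a plain-Python greedy nearest-neighbor matcher written under several assumptions: propensity scores are already estimated, the caliper is an absolute difference in propensity score (packages may scale the caliper differently), matching is without replacement, and cases are processed in descending score order. All names and the toy scores are hypothetical.

```python
def match_one_to_one(treated, control, caliper):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping case id -> propensity score.
    caliper: maximum allowed absolute score difference (an assumption;
             other tools may define the caliper on a different scale).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)  # controls not yet used
    pairs = []
    # Process treated cases from highest to lowest score (one common convention).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        # Closest remaining control by absolute score distance.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # ratio = 1: each control is used at most once
    return pairs


def scan_calipers(treated, control, calipers):
    """Count matched pairs for each caliper to judge stability of the match."""
    return {c: len(match_one_to_one(treated, control, c)) for c in calipers}


# Toy example with made-up scores.
high_volume = {"t1": 0.50, "t2": 0.30, "t3": 0.90}
low_volume = {"c1": 0.505, "c2": 0.31, "c3": 0.10}
pair_counts = scan_calipers(high_volume, low_volume, [0.1, 0.02, 0.001])
```

A tighter caliper yields closer pairs but discards more cases, which is the trade-off a descending caliper scan is meant to expose.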

| Demographics, clinical characteristics, facility characteristics, and surgical best practices
From NCDB PUF years 2004-2013, we identified an initial cohort of 1,163,465 NSCLC cases. We then limited our study sample to AJCC 8th edition clinical stage 1 or 2 NSCLC that underwent surgical resection, resulting in a study sample of 150,179 (12.9%) cases treated at 1,264 hospitals (Figure 1). Median age was 68 years, 89% were white, and 53% were female. Across institutions, median annual volume of stage 1 or 2 NSCLC surgical resections in 2013 was 16.
All demographic and clinical characteristics differed significantly according to facility surgical volume (Table 1). The highest and lowest volume institutions performed more sublobar resections compared to other centers. High-volume institutions were more likely to examine more than 10 LNs, perform LN dissection, and report negative surgical margins.
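The facility volume grouping (lowest quartile <6 and highest quartile >34 annual resections) and the censored OS definition from the statistical analysis can be sketched in a few lines. This is an illustrative Python sketch rather than the authors' R code. The function names are hypothetical, and the two middle quartiles are collapsed into a single "intermediate" label because their cut-offs are not stated in the text.

```python
def volume_group(annual_resections: int) -> str:
    """Classify a facility by annual NSCLC resection volume.

    Thresholds come from the text: lowest quartile <6, highest quartile >34.
    The middle quartiles are merged here because their bounds are not given.
    """
    if annual_resections < 6:
        return "low"
    if annual_resections > 34:
        return "high"
    return "intermediate"


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time from surgery to death or last contact.
    events: True if the patient died, False if censored at last contact.
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all records sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # deaths and censored cases both leave the risk set
    return curve
```

For example, `kaplan_meier([1, 2, 3, 4], [True, False, True, True])` steps down only at the three death times, while the case censored at time 2 simply leaves the risk set.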

| Clinical outcomes
In univariate analysis, using the lowest volume quartile as reference, we observed further improvements in outcomes with each increase in facility case volume (Table 2). These trends were observed in the overall study cohort as well as stage 1 and 2 subgroups. In the overall cohort and stage 1 subgroup, after base matching for 16 clinical and demographic confounders (Model 1), OS was statistically equivalent in the first, second, and third quartiles across stages, but remained superior in the fourth quartile. For stage 2 NSCLC, there was no significant difference in OS across quartiles after confounder matching (Model 1). After controlling for surgical best practices (Model 2), numerical differences in OS were further reduced. There was no significant difference in OS across quartiles in the overall cohort or stage 2 subgroup. In the stage 1 subgroup, only the highest quartile institutions had improved OS.
We compared HRs with and without best practice propensity matching to numerically estimate the influence these variables have on outcome differences, with a hazard ratio of 1 considered as equivalent outcome, determined as follows: $\text{surgical best practices influence} = \frac{(1 - \mathrm{HR}_1) - (1 - \mathrm{HR}_2)}{1 - \mathrm{HR}_1} \times 100$, where $\mathrm{HR}_1$ and $\mathrm{HR}_2$ are the hazard ratios without (Model 1) and with (Model 2) best practice matching. Using this approach, in the overall cohort after multivariate matching, surgical best practices accounted for 54% of the numerical OS difference between the lowest- and highest-volume centers. Figure 2 shows Kaplan-Meier plots of OS for the overall cohort between facility volume quartiles for univariate (Panel A), multivariate (Model 1, Panel B), and surgical best practice-matched (Model 2, Panel C) cohorts. P values and hazard ratios are propensity score-adjusted for the matched cohorts. Subgroup analyses for stages 1 and 2 are shown in Figure 3.
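As a worked check of the influence formula, the short Python sketch below evaluates it for a pair of hazard ratios. The function name is hypothetical. The example inputs are the rounded HRs reported in the abstract (0.89 before and 0.95 after best practice matching), under the assumption that these correspond to the two HRs in the formula. Because they are rounded, the result of about 54.5% only approximates the reported 54%.

```python
def best_practice_influence(hr_model1: float, hr_model2: float) -> float:
    """Percent of the numerical OS difference attributed to best practices.

    hr_model1: hazard ratio without best practice matching (Model 1).
    hr_model2: hazard ratio with best practice matching (Model 2).
    A hazard ratio of 1 is treated as an equivalent outcome.
    """
    gap_before = 1.0 - hr_model1  # distance from equivalence before matching
    gap_after = 1.0 - hr_model2   # distance from equivalence after matching
    if gap_before == 0:
        raise ValueError("Model 1 hazard ratio is 1; there is no difference to explain")
    return (gap_before - gap_after) / gap_before * 100.0


# Rounded HRs from the abstract, assumed to map to Model 1 and Model 2.
influence = best_practice_influence(0.89, 0.95)
```

The measure equals 0 when matching leaves the HR unchanged and 100 when the matched HR reaches 1.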

| DISCUSSION
For decades, it has been observed that high-volume centers have improved surgical outcomes for early-stage NSCLC. In the present study, we sought to identify specific and potentially modifiable factors accounting for these differences. In this national cohort of more than 150,000 patients with surgically resected clinical stage 1-2 NSCLC treated at more than 1200 facilities, we again noted that OS was superior at high-volume centers, even after adjusting for more than one dozen demographic and clinical factors. As noted in earlier landmark studies, 4 the greatest outcome differences occurred between the lowest- and highest-volume centers, and the current study demonstrated a comparable trend for volume-outcome association for OS. However, when we incorporated surgical best practices into the analysis, the magnitude of these outcome differences declined substantially and no longer had statistical significance. Indeed, OS curves were essentially overlapping. These findings suggest that greater dissemination of and adherence to practice guidelines may largely close the outcome gap between large- and small-volume facilities.
In this study, we selected surgical best practices that are widely endorsed, 22 readily recorded and assessed, and have the potential for widespread implementation and benchmarking for quality improvement: type of resection, LN examination, and surgical margin status. As previously shown, 9,22 each of these variables was associated with clinical outcomes in this study; a previous analysis also demonstrated that combining surgical quality measures improves OS. 23 Among them, surgical margin status had the strongest association, with a 50% reduction in the risk of death for cases with negative margins. Regional LN dissection was associated with a 30% reduction in the risk of death, while lobar or greater (anatomical) resection was associated with a 25% reduction in the risk of death. High-volume clinical centers were more likely to achieve negative surgical margins and to perform an adequate LN dissection. Each surgical best practice was independently statistically and clinically important across the overall cohort and both stage 1 and 2 subgroups. This is expected and consistent with previous studies, and reinforces the importance and appropriateness of guideline-directed care for NSCLC resections. Interestingly, performance of LN dissection was most strongly associated with improved OS for the stage 2 cohort, being associated with a 50% reduction in mortality. This finding could be related to the removal of occult metastasis with LN dissection or upstaging and appropriate treatment of LN disease when discovered.

TABLE 2 Hazard ratio estimates prior to propensity matching (original cohort), after matching for case characteristics (Model 1), and after further matching for surgical best practices (Model 2). Note: This is the cohort matched on clinical and demographic variables between surgery volume groups, using propensity score with caliper = 0.0001 and ratio = 1.
Importantly, it seems feasible to benchmark and export these metrics to improve surgical outcomes across centers. For example, both provision of a surgical LN specimen collection kit and a novel, more thorough pathologic gross dissection method have been shown to significantly improve rates of adequate LN examination independently and when performed together. 24,25 It has also been shown that multidisciplinary lung cancer care can be implemented in a community health-care setting. 26 Nevertheless, other recommendations may be more difficult to define and thus more challenging to transmit. The achievement of negative surgical margins could reflect tumor location and other attributes, and does not reflect an a priori decision such as LN dissection or resection type. Furthermore, although radiation therapy is generally recommended as a treatment option for cases with positive margins, it has not been validated in population analyses. 27
Even after adjusting for surgical best practices, a modest but significant OS benefit persisted for stage 1 cases at the highest-volume centers. The precise reasons for this observation are not clear. Stage 1 lung cancer represents a widely heterogeneous population, ranging from poorly differentiated, invasive cancers that likely have distant micrometastatic disease at diagnosis to incidentally detected, small, non- or minimally invasive tumors that might never impact patient quantity or quality of life even if left untreated. One possibility is that some clinical stage 2 cases derived OS benefit from removal of occult LN metastasis that was not present in clinical stage 1 cases, leading to increased effect of surgical quality measures. It is also possible that stage 1 cases at high-volume centers were more likely to represent particularly low-risk (based on size and/or histology) tumors.
These facilities performed a greater proportion of sublobar resections, which are recommended by numerous expert guidelines for cases such as pure ground-glass opacities or adenocarcinoma in situ under 2 cm. 19,[28][29][30] Notably, sublobar resection was also performed more frequently at the lowest volume centers. We are unable to determine whether sublobar resections were performed (a) following guidance for the lowest risk tumors, (b) because the patient was not a candidate for lobectomy, or (c) because the treatment team was unaware of surgical best practices.
FIGURE 2 Kaplan-Meier overall survival for overall cohort prior to propensity matching (original cohort), after matching for case characteristics (Model 1), and after further matching for surgical best practices (Model 2)

One previously proposed strategy to improve surgical outcomes is to limit who performs procedures. For NSCLC, it has been suggested that complex resections be performed only by individuals and facilities meeting minimal annual volume thresholds, specifically 20 per surgeon and 40 per facility. 31 The facility cut-off suggestion is consistent with our study finding that the highest quartile of facilities perform >34 annual NSCLC resections, a threshold comparable to that of high-volume centers in earlier studies. 4 Another recent analysis revealed that patients undergoing resection at top-ranked cancer centers have better postoperative outcomes compared to those receiving care at their affiliated centers. 32 In that study, there were large mean volume differences between affiliated (8 cases/year) and top-ranked (77 cases/year) facilities. The current analysis suggests that clinically meaningful outcome differences could potentially be minimized by promoting guideline-directed surgical care, rather than restricting access to only high-volume surgeons and facilities. A recent analysis of stage IIIA NSCLC found that patients treated at high-volume facilities were more likely to receive surgical resection and had improved OS compared with those treated at low-volume facilities. 33 Interestingly, another recent analysis revealed that the improved OS trend at high-volume facilities persisted in stage IV NSCLC, suggesting that factors other than surgical techniques are involved. 34
A limitation of this study is the nature of available clinical data. While the NCDB provides extensive data on NSCLC cases, several variables relevant to clinical outcomes are not collected.
These include the American Society of Anesthesiologists (ASA) physical status classification, smoking status, pulmonary function, weight, body mass index, performance status, and living arrangement. 5,13 Surgeon type and case volume were also not available, both of which may be associated with complications and mortality rates. 7

that may influence outcomes, such as preoperative positron-emission tomography (PET)/CT, brain imaging, and bronchoscopy. 37 Some patients may have had comorbidities that precluded lobectomy, and therefore sublobar resection may have represented best surgical practice in those cases. Charlson comorbidity score propensity matching and the finding that the rates of lobar or greater resection were comparable between the lowest and highest volume facilities make this limitation less important to study outcomes. It is also possible that surgical best practices could represent surrogate markers for one of the above variables that are known to influence outcomes or other unknown variables, for example other unmeasured surgical techniques or quality of pathological examination. Although the current analysis does not provide details of surgical technique, such as video-assisted or robotic, these approaches have been shown to yield comparable survival to thoracotomy and therefore may not alter our findings. 6,38,39
In conclusion, we have found that modifiable surgical best practices account for a meaningful proportion of outcome differences between high- and low-volume centers for resectable early-stage NSCLC. One response to these findings is to consolidate early-stage NSCLC surgical treatment at selected high-volume facilities, as has been suggested. 31 However, growing case numbers, geographic distribution, and patient preferences and circumstances may not permit such an approach in many cases. For instance, smoking rates and lung cancer diagnoses are generally higher in rural areas, 40,41 which are less likely to have high-volume clinical centers. Because it may not always be practical to consolidate treatment of the growing number of early-stage NSCLC at select sites nationwide, it seems reasonable to continue and expand efforts to promote best practices across centers to minimize outcome disparities.

ACKNOWLEDGMENTS
The authors thank Ms Dru Gray for assistance with manuscript preparation. The authors thank Helen Mayo, MLS, from the UT Southwestern Library, for assistance with literature searches. The authors thank the National Cancer Database (NCDB) project for collecting this invaluable information and making it publicly available. The NCDB is a joint project of the Commission on Cancer of the American College of Surgeons and the American Cancer Society. The data used in the study are derived from a de-identified NCDB file. The American College of Surgeons and the Commission on Cancer have not verified and are not responsible for the analytic or statistical methodology employed, or the conclusions drawn from these data by the investigator.