Assessment of variation in 30‐day mortality following cancer surgeries among older adults across US hospitals

Abstract
Background: While public reporting of surgical outcomes for noncancer conditions is common, cancer surgeries have generally been excluded. This is true despite numerous studies showing outcomes to differ between hospitals based on their characteristics. Our objective was to assess whether three prerequisites for quality assessment and reporting are present for 30-day mortality after cancer surgery: low burden for timely reporting, hospital variation, and potential for public health gains.
Study Design: We used Fee-for-Service (FFS) Medicare claims to examine the extent of variation in 30-day cancer surgical mortality between 3860 US hospitals. We included 340 489 surgeries for 12 cancer types for FFS Medicare beneficiaries aged ≥66 years, 2011-2013. Hierarchical mixed-effects logistic regression models adjusted for patient and hospital characteristics and with a random hospital effect were fit to obtain hospital-specific risk-standardized mortality rates (RSMRs) and 99% confidence intervals (CI). We calculated a hospital odds ratio to describe the difference in mortality risk for a hospital above vs below average quality and estimated the potential mortality reduction.
Results: The median number of cancer surgeries per hospital was 34. The median RSMR overall was 2.41% (99% CI 2.28%, 2.66%). In aggregate and for most cancers, variation between hospitals exceeded that due to differences in patient and hospital characteristics. For individual cancers, relative differences exceeded 20% in mortality risk between patients undergoing surgery at a hospital below vs above average quality, with the potential for an estimated 500 deaths prevented annually given hypothetical improvements.
Conclusion: Quality measurement and reporting of 30-day mortality for cancer surgery is worthy of consideration.


| INTRODUCTION
While decades of research have raised concerns about the inconsistent quality of cancer surgery in the United States, 1-8 measures of surgical outcomes, so prevalent for other conditions, have not been included specifically for cancer surgery in national public reporting efforts. Policymakers and payers have embraced quality measurement and reporting as potent means by which to improve patient outcomes, both through feedback and payment incentives that can spur institutional quality improvement efforts, and by patients using reported measures when they choose where to receive their care. [9][10][11][12] Numerous studies have suggested that outcomes are variable between hospitals. Strong correlations have been documented for both short- and long-term outcomes following specific cancer surgeries in relation to hospital factors such as surgical volume, teaching status, and geographic region. [1][2][3][4][5][6][7][8]13 Surgery is a primary modality of cancer treatment: over 87% of patients with breast and colorectal cancers, 52% with lung cancer, and 24% with prostate cancer undergo surgery. 14 Despite consistent evidence of outcome variation by hospital characteristics, the Centers for Medicare & Medicaid Services (CMS) Hospital Compare website, a national program that publicly reports hospital-specific measures of quality, 15 does not include surgical outcome measures for cancer. The CMS prospective payment system (PPS)-exempt cancer hospital reporting program also does not include outcomes from cancer surgery. By fiscal year 2021, the only planned surgical outcome measure for the PPS-exempt hospital reporting program is procedure-specific surgical site infection. 16,17 Even if outcomes were included for PPS-exempt cancer hospitals, there are only 11 of these facilities, and studies generally suggest their outcomes are above average, so measuring only their performance has limited potential to improve cancer outcomes. 17,18
Other programs such as the American College of Surgeons' National Surgical Quality Improvement Program collect surgical outcomes data for quality improvement, but hospital-level public reporting is limited. [19][20][21] At least three conditions should be met if comparative hospital performance measurement and reporting of cancer surgical outcomes is to be pursued: the outcome can be captured efficiently and rapidly; there are important variations in the outcome at baseline; and the number of people potentially affected if the outcome were to improve is sizable. 20,22,23 We hypothesized that cancer surgery would be a promising area for measurement, providing the motivation to examine these three conditions using the foundational measure of 30-day mortality. We examined these conditions using a national source, the Fee-for-Service (FFS) Medicare claims dataset, with 3860 US hospitals performing cancer surgery between 2011 and 2013 for 12 cancer types.

| Data source and cohort
The national FFS Medicare 100% Research Identifiable Files were used for this analysis. These include inpatient, outpatient, carrier, durable medical equipment, hospice, home health, skilled nursing facility, Part D claims, vital status, and master beneficiary summary files. Using previously published methods, beneficiaries aged 66 years or older undergoing a cancer surgery in 2011-2013 were identified. The analysis was limited to cancers where ≥80% of procedures in Medicare claims alone matched those in the gold standard SEER-Medicare dataset for condition and procedure, and for which false identification of a cancer-directed surgery occurred at a frequency less than 3%. 24 Twelve categories of cancer surgeries qualified: bones and joints; breast; colorectal; gastroesophageal; kidney; liver; lung; other gynecologic; ovary; pancreas; prostate; and sarcoma. Inpatient surgeries were identified using ICD-9-CM procedure codes; HCPCS codes were additionally used to identify outpatient surgeries for breast cancer. If a patient had multiple cancer surgeries, the first for a given cancer site was included, and only surgeries for a second cancer site occurring more than 30 days following the prior surgery were included. The analysis was limited to patients with at least 12 months of continuous enrollment in Parts A and B of FFS Medicare before admission, as required for comorbidity assessment, and 1 month of coverage after the discharge date or through death if the patient died within 30 days. Those discharged against medical advice or discharged after November 30, 2013 were excluded.
Surgeries were attributed to their hospital by their CMS Certification Number (CCN), as recorded at the time of surgery. We considered each CCN as a unique hospital, following CMS's reporting approach, 25,26 recognizing that hospital ownership and mergers might have occurred over the study period. 27 Hospital characteristics were obtained from the American Hospital Association (AHA) survey (2012). 28

| Statistical analyses
We assessed hospitals' cancer surgical volume, evaluated the extent of hospital variability in risk-adjusted 30-day all-cause mortality after cancer surgery, and estimated the number of lives saved were poor-performing hospitals to improve their quality. All analyses were performed by cancer site and in aggregate. A two-sided P < .01 was considered the threshold for significance.
We conducted four sets of analyses. To test whether there was underlying variability, our primary analysis adjusted for patient characteristics that could predict 30-day mortality. A second analysis also included hospital-level characteristics to determine if variation was explained by hospital descriptors that are already available. We also ran these two models restricted to nonemergent surgeries. Patient characteristics adjusted for included age, sex, race, and Charlson comorbidity score (0, 1, ≥2) 29,30 in the year prior to surgery, and cancer site (for aggregate analyses). Although FFS Medicare claims do not include clinical information regarding cancer stage, we previously demonstrated that risk adjustment was not sensitive to the inclusion or absence of this information. 31 Hospital characteristics came from the AHA database and included hospitals' location (rural/urban), organizational control (not-for-profit, private, government), and teaching status (defined as a member of the council of teaching hospitals of the American Medical Association). 28 These characteristics were not available for 84 hospitals (2.2% of the sample), which were excluded from analyses that depended on these characteristics. Hospital volume was calculated as the total number of inpatient and outpatient surgeries performed over the 3-year study period, dichotomized at the 75th percentile.
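As an illustration only, the two categorizations described above (Charlson score grouped as 0, 1, ≥2, and hospital volume dichotomized at the 75th percentile) might be sketched in Python as follows. The function names, the example volumes, the strict "greater than" comparison, and the choice of the inclusive quantile method are all assumptions made for this sketch; the text does not specify how the cut point was computed.

```python
from statistics import quantiles


def charlson_category(score):
    """Collapse a Charlson comorbidity score into the groups 0, 1, and >=2."""
    if score < 0:
        raise ValueError("Charlson score cannot be negative")
    if score == 0:
        return "0"
    if score == 1:
        return "1"
    return ">=2"


def high_volume_flags(volume_by_hospital):
    """Flag hospitals whose 3-year surgical volume exceeds the 75th percentile.

    Assumption: 'above the 75th percentile' is read as strictly greater than
    the cut point, computed with the 'inclusive' quantile method.
    """
    cutoff = quantiles(volume_by_hospital.values(), n=4, method="inclusive")[2]
    return {hospital: volume > cutoff
            for hospital, volume in volume_by_hospital.items()}


# Hypothetical 3-year volumes for five hospitals (not study data).
example_volumes = {"A": 4, "B": 10, "C": 34, "D": 108, "E": 500}
flags = high_volume_flags(example_volumes)
```

With these hypothetical volumes the cut point falls at hospital D's volume, so only hospital E is flagged as high volume under the strict comparison.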
Hierarchical mixed-effects logistic regression models with a random effect for hospital were fit to obtain adjusted mortality rates. This approach accommodates the hierarchical structure of the data, accounting for the correlation of outcomes among patients from the same hospital. 32 Hospital-specific risk-standardized mortality rates (RSMRs) were calculated as the predicted value, which was derived from the random effects model, divided by the expected value, which was derived from a logistic regression model without a random hospital effect. This ratio was then multiplied by the national 30-day mortality rate (y) to obtain a relative measure of performance: an RSMR greater than y indicates that the performance at a given hospital was poorer than expected, whereas one lower than y indicates that the performance at a given hospital was better than expected. Under this approach, a low-volume hospital with empirically poor performance will have an RSMR closer to the mean than a larger-volume hospital with equally poor empirical performance. As our intent was to determine the extent of variation, the use of this approach was conservative.
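The RSMR arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the study code: the function name `rsmr` and the predicted and expected death counts are hypothetical, and only the 2.4% national rate is taken from the Results.

```python
def rsmr(predicted, expected, national_rate):
    """Risk-standardized mortality rate: (predicted / expected) * national rate.

    predicted: hospital's predicted deaths from the model WITH the random
               hospital effect.
    expected:  hospital's expected deaths from the model WITHOUT the random
               hospital effect (same patients, average hospital).
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    return predicted / expected * national_rate


national_rate = 0.024  # overall 30-day mortality reported in the Results

# Hypothetical hospitals with the same case mix (expected deaths = 2.5).
worse_than_expected = rsmr(3.0, 2.5, national_rate)   # ratio above 1
better_than_expected = rsmr(2.0, 2.5, national_rate)  # ratio below 1
```

A ratio above 1 places the hospital's RSMR above the national rate, and a ratio below 1 places it below, which is the interpretation given in the text.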
We assessed between-hospital variation in 30-day mortality by examining the distribution of RSMRs at the hospital level. We tested for variation using a Wald test of the random effect. A two-sided P < .01 was considered the threshold for significance for a conclusion that there was underlying variation. We further quantified the variation between hospitals by computing hospital odds ratios (hORs) and 99% confidence intervals based on the standard deviation (SD) of the random effect. With a single covariate $X$ and a hospital random effect $\omega$, where $i$ indexes patients and $j$ indexes hospitals, the log-odds of 30-day mortality are modeled by:

$$\operatorname{logit} P(Y_{ij} = 1) = \beta_0 + \beta_1 X_{ij} + \omega_j$$

where $Y_{ij}$ indicates death within 30 days and SD denotes the standard deviation of $\omega_j$. To compare the risk of mortality for a patient treated at a hospital whose mortality is 1 SD above average (ie, $\omega_1 = +1 \times \mathrm{SD}$) to a patient with the same covariates treated at a hospital whose mortality is 1 SD below average (ie, $\omega_2 = -1 \times \mathrm{SD}$), the OR comparing mortality between these two patients is $e^{\omega_1 - \omega_2} = e^{2\,\mathrm{SD}}$. The hOR represents the odds of 30-day mortality given that a patient underwent surgery at a hospital of below average quality (+1 SD) vs above average quality (−1 SD). 4 Based on the distribution of RSMRs, if hospitals performing in the upper quartile were to improve performance to the median, the reduction in mortality represents the estimated number of lives saved. We calculated the estimated number of lives saved from the models of nonemergent surgeries overall and by cancer site.
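For readers who want to move between the random-effect SD and the hOR, the relation hOR = exp(2 × SD) can be sketched as below. The function names and the SD of 0.2 are illustrative assumptions; the value 1.44 is the aggregate hOR reported in the Results.

```python
import math


def hospital_odds_ratio(sd):
    """Odds ratio comparing a hospital at +1 SD with one at -1 SD on the
    log-odds scale: exp(sd - (-sd)) = exp(2 * sd)."""
    return math.exp(2.0 * sd)


def sd_from_hor(hor):
    """Invert the relation: random-effect SD implied by a given hOR."""
    if hor <= 0:
        raise ValueError("hOR must be positive")
    return math.log(hor) / 2.0


illustrative_hor = hospital_odds_ratio(0.2)  # hypothetical SD of 0.2
implied_sd = sd_from_hor(1.44)               # roughly 0.18 on the log-odds scale
```

Because the hOR depends only on the SD of the random effect, it summarizes between-hospital spread independently of the baseline mortality rate, which is why a low-mortality procedure can still show a large hOR.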

| Approvals
The Centers for Medicare & Medicaid Services approved the use of the FFS Medicare files for this analysis, which was deemed exempt research by the Memorial Sloan Kettering Cancer Center Institutional Review Board. Analyses were performed in SAS (Version 9.4, Cary, NC).

| RESULTS
Across all cancer sites, there were 340 489 surgeries performed for FFS Medicare beneficiaries at 3860 US hospitals. Most patients were female (66.8%), white (88.5%), and had one or more comorbidities (55.3%). For breast cancer, 78.9% of surgeries were outpatient. Emergent surgeries accounted for between 0.4% (prostate) and 20.6% (colorectal) of surgeries. The 30-day mortality rate overall was 2.4%; for breast and prostate cancer, it was less than 1%. The mortality rate was over 5% for colorectal and gastroesophageal cancer surgeries (Table 1).
Among the 3776 (97.8%) hospitals with available hospital characteristics, the median number of cancer surgeries overall was 34 [interquartile range (IQR): 9, 108]. For hospitals performing at least one surgery, the median number was fewer than 10 for each cancer site, except for breast (median 17) and colorectal (median 14). Most surgeries were performed at not-for-profit hospitals and in urban locations. Bones and joints, liver, and sarcoma had the highest proportions of surgeries performed at teaching hospitals (>30%; Table 2). When including hospitals with nonemergent surgeries only, 23 (0.6%) hospitals were excluded (between 0.3% and 8.3% of hospitals excluded by site; Table A1). The flag for emergent admissions is only available on inpatient claims; it was assumed that outpatient surgeries (breast cancer) were nonemergent. The median hospital RSMR (Table 3) across all cancer sites was 2.41% (99% CI 2.28%, 2.66%). Breast had the lowest RSMR (median 0.24%) and gastroesophageal the highest (median 5.72%). The median hospital RSMRs were robust to additional adjustment for hospital characteristics. Where estimable, the RSMRs were generally lower in the models that excluded emergent surgeries. Based on the Wald test of the random effect, there was statistically significant variation across hospitals for cancers in aggregate and for breast, colorectal, gastroesophageal, kidney, lung, ovary, and pancreas cancers. Models for bones and joints, prostate, and sarcoma were not estimable. Inclusion of hospital characteristics explained some variability; variation remained statistically significant for all cancers in aggregate and for breast, colorectal, lung, and ovarian cancer. The results from the model adjusting for patient characteristics for all surgeries were comparable to the results from the model adjusting for patient characteristics for nonemergent surgeries only, except for kidney cancer (Table 3).
The hOR for all cancer sites in aggregate was 1.44 (99% CI 1.42, 1.45), indicating that the odds of 30-day mortality for a patient undergoing surgery at a hospital whose performance is below average are 44% higher than at a hospital whose performance is above average. Breast cancer had the largest hOR for all models; the odds of mortality were three- to four-fold higher at a below average hospital as compared to an above average hospital, indicating that despite a low mortality rate (0.2%) and median hospital RSMR (0.24%) there are differences in patients' risk of 30-day mortality between hospitals. Considering all models, pancreas had the second largest hospital odds ratio (hOR 1.89, 99% CI 1.82, 1.98, adjusted for patient characteristics) and gastroesophageal and other gynecologic cancers had the smallest (both hORs: 1.21, 99% CI 1.20, 1.22, adjusted for patient and hospital characteristics; Table 3).
The model adjusted for both patient and hospital characteristics (Table 4) showed that older patients, male patients, and those with a higher number of comorbidities had higher odds of mortality. Patients treated at hospitals with surgical volumes above the 75th percentile and teaching hospitals had lower odds of mortality. Patients treated at government hospitals had higher odds of mortality compared to those treated at not-for-profit hospitals. Patients' odds of mortality did not differ by patient race or their hospitals' location. For several of these characteristics, statistical significance of the output differed by cancer site.
Under a scenario in which hospitals in the upper quartile of RSMR (≥1.91%) instead performed at the median RSMR (1.73%), an estimated 558 lives could be saved each year among FFS Medicare beneficiaries (Figure 1, Table A2).
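The exact computation behind this estimate is not spelled out in the text. One plausible reading, offered only as a sketch, is to sum each upper-quartile hospital's surgical volume multiplied by the gap between its RSMR and the median RSMR, then annualize over the study period. The function name, the inclusive quantile method, and the five example hospitals below are assumptions for illustration, not study data.

```python
from statistics import median, quantiles


def lives_saved_per_year(hospitals, years):
    """Estimate annual deaths averted if upper-quartile hospitals performed
    at the median RSMR.

    hospitals: list of (surgery_count, rsmr) pairs, rsmr as a proportion.
    years:     number of years the surgery counts cover.
    """
    rates = [rate for _, rate in hospitals]
    upper_quartile_cut = quantiles(rates, n=4, method="inclusive")[2]
    target = median(rates)
    # Excess deaths: volume times the gap between the hospital's RSMR and the
    # median, counted only for hospitals in the upper quartile of RSMR.
    excess = sum(count * (rate - target)
                 for count, rate in hospitals
                 if rate >= upper_quartile_cut)
    return excess / years


# Hypothetical hospitals: (surgeries over 3 years, RSMR).
example = [(100, 0.01), (200, 0.02), (300, 0.03), (400, 0.04), (500, 0.05)]
estimate = lives_saved_per_year(example, years=3)
```

In this toy example the two hospitals at or above the upper-quartile cut contribute 4 and 10 excess deaths over 3 years, or about 4.7 per year.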

| DISCUSSION
Quality measurement and reporting have the potential to improve outcomes through multiple mechanisms, including performance improvement within hospitals and shifting of patients to better performing ones. [33][34][35] We conducted a study of 30-day mortality following cancer surgery among FFS Medicare beneficiaries to determine if three necessary prerequisites for quality reporting are present: ease of measurement capture and reporting, statistical variation in performance, and important potential health gains through change in performance. 22,23 These prerequisites are in line with the goals of CMS' Meaningful Measures Initiative for prioritizing areas for measurement and improvement. 20 Our analysis provided mixed evidence regarding the promise of reporting quality measures for cancer surgery by cancer site. The outcome of 30-day mortality can be efficiently measured at scale for cancer surgeries using FFS Medicare claims. However, for all surgeries in aggregate, 25% of hospitals performed fewer than 10 cancer surgeries, and for 9 cancer sites, 25% of hospitals performed fewer than 3 site-specific cancer surgeries over 3 years. Low sample sizes for many hospitals pose an important problem for using quality measurement to improve cancer surgery performance at the population level. While incorporating additional years of data would increase the number of hospitals whose outcomes may be reliably reported, the tradeoff is that it reduces the timeliness of the measure. The challenge of estimating outcomes for hospitals with a low number of cases is not unique to cancer surgery but is central to measurement decisions across many health care disciplines. 32,[36][37][38] Across all cancers in this study, as well as for most of the individual cancer sites, we observed variation between hospitals in their cancer surgical outcomes that exceeds variation due to differences in observed patient characteristics.
This was supported by findings from three approaches that were mostly robust to additionally adjusting for features of the hospital and to the exclusion of emergent surgeries. Our findings are consistent with a recent study by Haneuse et al that examined variation in 30-day mortality following cancer surgery at 351 hospitals in California. 4 The authors studied a younger population with a lower observed mortality rate of 0.6% (postdischarge only). However, the hospital odds ratio in that study of 1.84 (95% CI 1.44, 2.34) from the model adjusted for patient and census-based characteristics is comparable to the hospital odds ratio of 1.44 (99% CI 1.42, 1.45) we found in our model adjusted for patient characteristics. Haneuse et al examined cancer surgeries in aggregate for the primary analysis. We found that variations in individual cancer surgeries might justify reporting at the individual cancer level, beyond solely in aggregate. Our findings are also consistent with a study by Chui and colleagues that assessed the potential impact of reporting surgical mortality for lung, esophagus, gastric, and colon cancer procedures. 39 The authors used the National Cancer Database, a clinical database that draws from hospital registry data from Commission on Cancer-accredited facilities, which are de-identified and intended for internal quality improvement. 40 Our study builds on this work by testing these assumptions in a dataset that identifies hospitals, which is required for public reporting. More generally, the hospital variation by cancer site we found is consistent with prior studies examining cancer-specific mortality outcomes by hospital characteristics including volume. 1,3,5,8 Adjusted 30-day mortality rates following esophagectomy, for instance, range from 20.3% at hospitals in the lowest-volume quintile to 8.4% at hospitals in the highest-volume quintile.
Following colectomy for colon cancer, this range is 5.6% to 4.5%, a smaller absolute difference but for a procedure affecting far more individuals. 5 Last, our estimates of the magnitude of underlying variability in quality across hospitals suggest that improvements in performance could have large effects on the health of the public.
We found that the relative difference in the odds of 30-day mortality between a patient undergoing surgery at a hospital of below average quality vs above average quality is at least 20% for each cancer site and exceeds 80% for breast and pancreas. If quality improvement efforts were implemented at below average hospitals or if patients were redirected for surgery at better performing hospitals, we estimate that these efforts could plausibly result in preventing more than 500 deaths among FFS Medicare beneficiaries undergoing cancer surgery each year.
Our analysis should be considered within the context of its limitations. Whether the findings would be consistent if all surgical patients, not only those with FFS Medicare coverage, were included is unknown, although there is no strong rationale why the results would not be generalizable at least directionally. The advantage of the FFS Medicare claims dataset is that it comprehensively covers the entire United States. A shortcoming is that it does not have the type of detailed information about cancer site that cancer registries contain, but prior analyses demonstrate that risk-adjusted surgical outcome assessment is robust to the exclusion of these SEER variables. 18,24,31 While we relied on the Charlson comorbidity index for our risk adjustment, which is a widely accepted method in the field, all risk adjustment can be criticized for being incomplete. 29,30,41 There is controversy over the inclusion of additional information in analyses of outcome variation and in the field of quality measurement more generally. [42][43][44][45] We did not adjust for patient socioeconomic status or other measures of social support. We consider additional adjustments not relevant to this outcome given the relatively short duration of follow-up, attributing primary responsibility to the hospital. While 30-day mortality does not capture all aspects of surgical quality, it is an objective outcome and can be a marker of technical skill and the quality of the hospital, pre- and peri-operative care, and postdischarge follow-up. It is also used as a metric by CMS in the Hospital Compare reporting program to report performance on the quality of other types of surgery. Conducting in-depth analyses along several dimensions of cancer surgical quality could inform future directions for hospital performance measurement. We have found that cancer surgical mortality varies more than can be explained by chance or differences in treated patient populations and that collectively this variation is responsible for excess and unnecessary mortality. Quality measurement could therefore be valuable for patient decision-making, policy evaluation, value-based reimbursement programs, and quality improvement initiatives, with the ultimate goal of improving patient outcomes. 10,[46][47][48][49] One limitation is that quality reporting for the many hospitals in the United States that provide very low volumes of surgical cancer care will face statistical challenges. While we incorporated only data from FFS Medicare, this shortcoming could be partially ameliorated using data from more payers including commercial insurers and Medicare Advantage.

TABLE 3 Between-hospital variation in 30-day cancer surgical mortality and hospital odds ratio, by model

LIPITZ-SNYDERMAN ET AL

| CONCLUSIONS
Our findings suggest that quality measurement and reporting of this outcome across cancers and by cancer site is worthy of serious consideration for practice and policy applications, while highlighting some of the limitations of the approach.