A comparison of relative survival and cause‐specific survival methods to measure net survival in cancer populations

Abstract

Background: Accurate cancer survival statistics are necessary for describing population‐level survival patterns and measuring advancements in cancer care. Net cancer survival is measured using two methods: cause‐specific survival (CSS) and relative survival (RS). Both are valid methodologies for estimating net survival and are used widely in medical research. In these analyses, we compare CSS to RS at selected cancer sites.

Methods: Using data from 18 SEER registries between 2000 and 2014, five‐year RS and CSS estimates were generated overall as well as by age group and by sex. To assess how closely the two survival methods corresponded, net survival percent difference was calculated with the following formula: ((RS − CSS)/RS) × 100.

Results: Discrepancies between estimates obtained from CSS and RS methods varied with cancer site and age, but not by sex. In most cases, CSS was greater than RS, but cancers with available early screening and a high survival rate had higher RS than CSS. Net survival percent differences were small in children and in adolescents and young adults, and large in adults over the age of 40.

Conclusions: While both CSS and RS aim to quantify net survival, the estimates tend to differ due to the biases present in both methodologies. Error when estimating CSS most frequently stems from misclassification of cause of death, whereas RS is subject to error when no suitable life tables are available. Appropriate use of CSS and RS requires a detailed understanding of the characteristics of the disease that may lead to differences in the estimates generated by these methods.

Net cancer survival is the probability of surviving a cancer diagnosis in the absence of competing causes of death. Net cancer survival is most frequently quantified using the following two methods: relative survival (RS) and cause-specific survival (CSS). 1 The majority of groups that report cancer survival statistics calculate these statistics using RS, including the National Cancer Institute's Surveillance, Epidemiology and End Results (SEER) program, the National Program of Cancer Registries United States Cancer Statistics, and the North American Association of Central Cancer Registries. RS estimates the percent of persons surviving using all deaths, adjusted for expected deaths based on life tables. RS calculations utilize life tables that estimate life expectancies for US populations based on current age. The most likely source of error when estimating RS comes from limitations of life tables. Expected survival from life tables is not always an accurate reflection of the expected survival of a population of patients with a cancer diagnosis. Life tables cannot be generalized to all populations, and use of unsuitable life tables will lead to error when calculating RS. 2 When life tables for a suitable reference population are not available, CSS can be used to estimate net cancer survival instead. CSS estimates the percent of persons surviving using individual cause of death information. Like RS, CSS aims to estimate net cancer survival, yet the differences in methodology lead to different measurements. While in some cases the discrepancy between the estimates from the two methods may be considered negligible, there are many instances in which this discrepancy may be substantial. In this study, we examine how factors such as site of cancer, sex, and age affect the correlation between the two estimates of net survival.

| METHODS
This study was approved by the University Hospitals Cleveland Medical Center Institutional Review Board. Using data from 18 SEER registries between 2000 and 2014, 3 estimates of five-year relative survival and cause-specific survival were calculated using the actuarial method. The actuarial method assumes that cases lost to follow-up during an interval were, on average, at risk for half of that interval, so only half of them are counted as at risk when deaths in the interval are tallied. This method also assumes that cases are lost to follow-up at random. SEER*Stat 8.3.4 4 was used to estimate cause-specific survival (CSS) and relative survival (RS) at the following sites: lung and bronchus, brain and other nervous system, breast, prostate, melanoma of the skin, and acute myeloid leukemia. These sites were selected to include common cancers and cancers affecting multiple age groups. The sites selected also allowed us to analyze cancers with diverse characteristics (benign and malignant cancers, solid and liquid tumors, hormonal cancers, and cancers with a strong genetic component as well as cancers with a strong link to lifestyle). Estimates generated at these sites were compared overall as well as by age, sex, and behavior (for brain and other nervous system tumors only). Sites were defined using the SEER Site Recode ICD-O-3/World Health Organization (WHO) 2008 recode. Age subgroups were also stratified by stage using the American Joint Committee on Cancer (AJCC) stage 6th edition 2004+ variable. The stage variable was only available for a few of the sites starting from the year 2004, so CSS and RS estimates were only compared by stage at lung and bronchus and breast from 2004 to 2014. RS is calculated as overall observed survival for patients with a given cancer diagnosis divided by expected survival of a similar population of patients without the cancer diagnosis. When calculating expected survival, the Ederer II method was used. 
With the Ederer II method, matched individuals are considered at risk until the corresponding patient with a cancer diagnosis is censored or dies. When estimating RS, the default survival table was selected in SEER*Stat. The expected survival in this table comes from the US annual life tables from the National Center for Health Statistics (NCHS) and is based on life tables from 1970 to 2012, where individuals are matched with the appropriate estimate for age and year of diagnosis. 5 These life tables include sex- and race-specific estimates of life expectancy. The NCHS constructs these life tables using vital statistics and census data, as well as data from Medicare for ages 66-99 years to calculate death rates. For life tables from 2000 to 2007, mortality rates were smoothed beginning at age 66. For life tables from 2008 and later, mortality rates were smoothed around the age of 85, but the age at which smoothing began varied with race. The methodology used to generate these life tables is continuously refined and varies slightly by year. 5 CSS is calculated as the number of persons with a cancer diagnosis still living after the cancer diagnosis of interest divided by the total number of persons with the cancer diagnosis of interest. Individuals who die from competing causes of death are censored from the population. At all sites, except nonmalignant brain and other nervous system, CSS estimates were generated using the SEER cause-specific death classification. 6 For patients with only one cancer, the SEER cause-specific death classification attributes the following causes of death as cancer-specific: cancer of the same site, cancer of the same organ system, all malignant cancers, and site-specific noncancer disease. At certain cancer sites, deaths coded as HIV related were also classified as cancer-specific. For nonmalignant brain and other nervous system tumors, CSS was calculated by categorizing deaths due to in situ, benign, or unknown behavior neoplasms as cancer-specific.
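The calculations described above can be illustrated with a minimal Python sketch. It is not the SEER*Stat implementation: the `Interval` class, the function names, and all counts are hypothetical, and the expected survival is passed in as a given number rather than derived from life tables with the Ederer II method. The sketch shows the actuarial adjustment (withdrawals count as half at risk), the censoring of other-cause deaths for CSS, and RS as the ratio of observed to expected survival.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    alive_at_start: int  # cases alive and under observation at interval start
    deaths: int          # deaths counted as events in this interval
    withdrawn: int       # cases censored during this interval


def actuarial_survival(intervals):
    """Cumulative survival by the actuarial (life-table) method.

    Withdrawn cases are assumed to be at risk for half of the interval,
    so the effective number at risk is alive_at_start - withdrawn / 2.
    """
    survival = 1.0
    for iv in intervals:
        effective_at_risk = iv.alive_at_start - iv.withdrawn / 2.0
        if effective_at_risk <= 0:
            break
        survival *= 1.0 - iv.deaths / effective_at_risk
    return survival


def build_intervals(cohort_size, rows, cause_specific):
    """Build intervals from (cancer_deaths, other_deaths, lost) rows.

    For CSS, deaths from other causes are treated as censored;
    for observed (all-cause) survival, they are treated as events.
    """
    intervals = []
    alive = cohort_size
    for cancer_deaths, other_deaths, lost in rows:
        if cause_specific:
            iv = Interval(alive, cancer_deaths, lost + other_deaths)
        else:
            iv = Interval(alive, cancer_deaths + other_deaths, lost)
        intervals.append(iv)
        alive -= cancer_deaths + other_deaths + lost
    return intervals


def relative_survival(observed, expected):
    """RS = observed survival / expected survival of a comparable population."""
    if not 0.0 < expected <= 1.0:
        raise ValueError("expected survival must be in (0, 1]")
    return observed / expected


# Hypothetical yearly counts: (cancer deaths, other-cause deaths, lost to follow-up)
rows = [(100, 20, 10), (60, 18, 12), (40, 15, 10), (30, 15, 8), (20, 14, 8)]
observed = actuarial_survival(build_intervals(1000, rows, cause_specific=False))
css = actuarial_survival(build_intervals(1000, rows, cause_specific=True))
rs = relative_survival(observed, expected=0.92)  # 0.92 is an assumed life-table value
print(f"observed={observed:.3f}  CSS={css:.3f}  RS={rs:.3f}")
```

In this sketch the only difference between the observed and cause-specific calculations is whether other-cause deaths are counted as events or as withdrawals, which is why misclassified causes of death feed directly into CSS.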
Selection criteria were adjusted to include only individuals with one cancer and individuals of known age. Cases in which cancer was reported only through a death certificate or autopsy were excluded when calculating survival. Cases with any values (including age, race, etc.) not found in expected survival life tables were also excluded. With these selection criteria, 2.13% (range: 0.83%-2.66%) and 2.78% (range: 1.24%-3.18%) of cases were excluded when estimating RS and CSS respectively.

MAKKAR et Al.
To assess how closely the two survival methods corresponded, the percent difference between the two net survival estimates was calculated with the following formula:

$$\text{percent difference} = \frac{RS - CSS}{RS} \times 100\%$$

RS was used as the referent in this formula as it is more commonly used for cancer statistics reporting.
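As a small illustration of the percent difference formula, the following Python sketch computes it for a pair of estimates. The function names and the input values are hypothetical and are not taken from the study's tables. The `net_difference` helper assumes that "net difference" means RS minus CSS in percentage points, which the text does not state explicitly.

```python
def percent_difference(rs, css):
    """Percent difference with RS as the referent: ((RS - CSS) / RS) * 100."""
    if rs == 0:
        raise ValueError("RS must be nonzero to serve as the referent")
    return (rs - css) / rs * 100.0


def net_difference(rs, css):
    """Assumed definition: absolute difference RS - CSS in percentage points."""
    return rs - css


# Hypothetical five-year estimates, in percent
rs_estimate, css_estimate = 50.0, 62.5
pd = percent_difference(rs_estimate, css_estimate)
nd = net_difference(rs_estimate, css_estimate)
print(f"percent difference = {pd:.2f}%, net difference = {nd:.2f} points")
```

A negative result means CSS exceeds RS, and a positive result means RS exceeds CSS.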

| Net survival estimates by cancer sites
Estimates for CSS and RS at all sites of cancer combined were closely related, with a percent difference of only −0.3% and a net difference of only −0.2%. The differences between CSS and RS estimates were greater when the cancers were separated by site (Figure 1A, Table 1). Both the magnitude and direction of percent difference between the two estimates varied with cancer site. A negative percent difference indicated that CSS estimates were greater than RS estimates and vice versa. Percent difference was negative for most cancer sites including lung and bronchus (−13.29%), brain and other nervous system (malignant: −4.52%; nonmalignant: −9.83%), and acute myeloid leukemia (−9.24%). For melanoma of the skin (3.33%), breast cancer (2.40%), and prostate cancer (5.74%), RS estimates were greater than CSS estimates.

| Net survival estimates by sex
To account for possible effects of sex on discrepancies between estimates from the two methods, results were also stratified by sex (Figure 1B, Table 2). At all cancer sites, net survival estimates were greater in females than in males. No strong pattern was noted between percent difference and sex. The percent differences between the estimates obtained from CSS and RS methods at all selected cancer sites were fairly consistent for both sexes.

| Net survival estimates by age groups
CSS and RS were estimated for children (age 0-14 years), adolescents and young adults (AYA) (age 15-39 years), and adults (age 40+ years) (Figure 2, Table 3). Adults were divided into two groups: younger adults (40-64 years) and older adults (65+ years). Due to small sample size, CSS and RS were not calculated in children for cancer at the following sites: lung and bronchus, breast, and prostate. In children and AYA, the differences between the estimates obtained from the two methods were quite small (percent differences less than 6% and net differences less than 4%) at the selected sites. Discrepancies between estimates obtained from the two methods were largest in older adults. When compared overall, RS was greater than CSS for melanoma of the skin, breast cancer, and prostate cancer, but when stratified by age this was only true in adults (aged 40 years and older). For cancers of breast and lung and bronchus, age groups were further stratified by stage to assess for a potential confounding effect (Figure 3). Results showed that the magnitude of the discrepancy between CSS and RS estimates was related more closely to age than to stage of cancer. Interestingly, for advanced stage breast cancer (stage III) the direction of the difference between CSS and RS was reversed (Figure 3A).

| DISCUSSION
In this study, CSS and RS estimates were compared under several conditions. In many cases, the difference between CSS and RS was small, indicating that in these situations CSS can be used as a reliable alternative to RS. Unlike RS estimates, CSS estimates do not rely on accurate life tables to measure net survival. Measuring CSS, however, requires all causes of death to be classified as either attributable or not attributable to the cancer diagnosis. The largest potential source of error for cause-specific survival is misclassification of cause of death. Misclassification can be divided into two types: genuine and conceptual. 7,8 Genuine misclassification is a problem with data collection that leads to inaccurate information on death certificates. Conceptual misclassification, on the other hand, occurs when the cause of death cannot be easily categorized as attributable or not attributable to the cancer diagnosis. For example, physicians may differ in how they categorize a death due to infection in a patient whose immune system has been suppressed by cancer treatment.
Misclassification of cause of death can lead to underattribution or overattribution of cancer as a cause of death, but studies show that misclassification is more likely to result in underestimation of cancer mortality. In an analysis of data from 1994 to 1998, Welch and Black found that 41% of deaths occurring within 1 month of cancer-directed surgery were coded as non-cancer-specific deaths. 8 Another study, using data from 1961 to 1987, reported an underestimation of cancer mortality by about 18%. 9 In the analyses we performed, CSS was greater than RS for cancers of lung and bronchus, brain and other nervous system, and acute myeloid leukemia. This suggests overestimation of CSS and/or underestimation of RS. Underattribution of cancer as a cause of death likely led to overestimation of CSS at these sites. Of note, CSS was calculated using the SEER cause-specific death classification, which attempts to compensate for misclassification of cause of death. If CSS were calculated using only the cancer of interest, the difference between CSS and RS would have been greater. Another possibility is that RS was underestimated at these sites. This is particularly likely in the case of lung cancer. Cohorts of lung cancer patients include more smokers than the general population. As a result, general life tables overestimate the expected lifespan of patients with lung cancer, and relative survival is underestimated. The life tables utilized by SEER*Stat to generate expected survival estimates are stratified by sex and race, but do not incorporate all variables that are shown to be significantly correlated with life expectancy. In particular, increased income is significantly associated with increased life expectancy, as well as with larger gains in life expectancy over time. 6 County of residence and comorbidities are also strongly associated with life expectancy. 7 These factors are not included in the standard life tables generated by the National Center for Health Statistics, and as a result may lead to errors in estimating the "true" expected survival for cancer patients.
Stratification by age revealed that the discrepancies between CSS and RS were most prominent in adults, especially adults older than 65 years. The effect of age on the difference between CSS and RS appeared to be independent of stage of cancer. The increasing disparity between CSS and RS in older populations can be explained by a greater amount of error when classifying cause of death in elderly patients. Literature suggests that physicians may be less precise when coding cause of death for individuals that have a higher probability of dying, including elderly patients. 10 This and the greater prevalence of competing mortality risks may lead to a greater degree of death certification misclassification in older patients.
When net survival was calculated for breast cancer, prostate cancer, and melanoma of the skin, RS estimates were greater than CSS estimates. This can be partially attributed to the "healthy participant" effect. The "healthy participant" effect 11 describes the phenomenon in which RS is overestimated when calculated for cancers diagnosed through screening. This is because populations who undergo regular screening have longer life spans than the general population. Stratification by age in this study demonstrated that at these sites RS was greater than CSS only in adults older than 40. As guidelines recommend screening for breast and prostate cancer starting after the age of 40, 12,13 this further supports the notion that RS was overestimated due to the "healthy participant" effect. Of note, the direction of the difference between RS and CSS was reversed in stage III breast cancer. This suggests that for high stage breast cancer the bias from misclassification of cause of death on CSS is stronger than the bias of the "healthy participant" effect on RS.
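The direction of the two life-table biases discussed above (lung cancer cohorts with many smokers, and screened cohorts subject to the "healthy participant" effect) follows from the fact that RS divides observed survival by expected survival. The Python sketch below works through that arithmetic. All survival values are hypothetical and chosen only to show the direction of the bias, not its size in the study data.

```python
def relative_survival(observed, expected):
    """RS = observed survival / expected survival."""
    if not 0.0 < expected <= 1.0:
        raise ValueError("expected survival must be in (0, 1]")
    return observed / expected


# Assumed five-year expected survival from a general-population life table
expected_general = 0.90

# Case 1: cohort with many smokers. A matched reference population would be
# expected to survive less well than the general population (assumed 0.80).
observed_lung = 0.18
rs_lung_general = relative_survival(observed_lung, expected_general)
rs_lung_matched = relative_survival(observed_lung, 0.80)

# Case 2: screened cohort. A matched reference population would be
# expected to survive better than the general population (assumed 0.95).
observed_screen = 0.88
rs_screen_general = relative_survival(observed_screen, expected_general)
rs_screen_matched = relative_survival(observed_screen, 0.95)

print(f"lung:     general table RS={rs_lung_general:.3f}, matched RS={rs_lung_matched:.3f}")
print(f"screened: general table RS={rs_screen_general:.3f}, matched RS={rs_screen_matched:.3f}")
```

When the general life table overstates expected survival for the cohort, the denominator is too large and RS is underestimated. When it understates expected survival, as for regularly screened populations, RS is overestimated.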
Our study is not the first to examine whether CSS may be an acceptable alternative to RS. Other published studies compared CSS and RS estimates and explored factors that influence these estimates. A study by Hu et al 14
The analyses in this study were performed using data from 18 SEER registries. One of the greatest strengths of this study is the large sample size provided by SEER. While our results demonstrated that CSS is a reliable method to estimate net survival in many situations, it is important to acknowledge the limitations of the study. In our analyses, individuals with multiple primaries were excluded so that CSS estimates would be more reliable. Including only individuals with one primary allowed us to account for misclassification that occurs due to metastasis. When cancer metastasizes, cause of death may be inaccurately attributed to cancer at the site of metastasis rather than cancer of the primary site. 6 For individuals with only one primary, it is reasonable to assume that deaths attributed to all malignant cancers are cancer-specific deaths. This same assumption cannot be made in individuals with multiple primaries, making it difficult to account for cancers that have been miscoded on death certificates due to metastasis. Additionally, as individuals with multiple primaries have more competing mortality risks, cause of death is more likely to be misclassified in these individuals. Also, CSS and RS were only estimated at selected, common sites of cancer, so our conclusions may not be generalizable to all cancer sites. Another study suggested that there is likely to be a greater degree of misclassification of cause of death on death certificates for less common cancer sites. 2 This suggests that, despite the conclusions of our study, CSS may not be a reliable estimate of net survival for rare cancers. The analyses in this study did not investigate the validity of CSS estimates in individuals with multiple primaries or individuals with cancers at less common sites.

| CONCLUSION
RS estimates are usually preferred over CSS estimates in order to avoid error resulting from misclassification of cause of death. With improvements in the quality of data on death certificates 8 and algorithms designed to compensate for misclassification of cause of death, 9,10 CSS estimates are now more reliable. RS is usually the default methodology to measure net survival, but CSS estimates may be a more accurate estimate of net survival than RS in situations where appropriate life tables are not available. Furthermore, life table survival estimates may not be an accurate representation of populations that undergo regular screening or of populations with a high prevalence of risk factors (eg, smoking) associated with other diseases that may cause death. CSS estimates in these situations are more accurate and should be used instead of RS estimates. CSS estimates can be used when reliable cause of death information is available. Misclassification of cause of death tends to be high in elderly patients; therefore, CSS estimates should be avoided in these populations. CSS and RS are both widely used in medical research, but neither methodology is a perfect net survival estimate. CSS and RS statistics should be interpreted with caution, keeping in mind the limitations of both methodologies. As accurate cancer survival statistics are necessary for describing population-level survival patterns and for measuring advancements in cancer care, it is important to be attentive to the strengths and limitations of both methodologies when using CSS and RS to report net survival. Neither RS nor CSS is strong enough to be used as a gold standard. Understanding the biases of both methodologies will enable us to make more informed decisions on which approach to use to estimate net survival depending on the situation.