On standardized relative survival

Summary
Cancer survival comparisons between cohorts are often assessed by estimates of relative or net survival. These measure the difference in mortality between those diagnosed with the disease and the general population. For such comparisons methods are needed to standardize cohort structure (including age at diagnosis) and all-cause mortality rates in the general population. Standardized non-parametric relative survival measures are evaluated by determining how well they (i) ensure the correct rank ordering, (ii) allow for differences in covariate distributions, and (iii) possess robustness and maximal estimation precision. Two relative survival families that subsume the Ederer-I, Ederer-II, and Pohar-Perme statistics are assessed. The aforementioned statistics do not meet our criteria, and are not invariant under a change of covariate distribution. Existing methods for standardization of these statistics are either not invariant to changes in the general population mortality or are not robust. Standardized statistics and estimators are developed to address the deficiencies. They use a reference distribution for covariates such as age, and a reference population mortality survival distribution that is recommended to approach zero with increasing age as fast as the cohort with the worst life expectancy. Estimators are compared using a breast-cancer survival example and computer simulation. The proposals are invariant and robust, and outperform current methods to standardize the Ederer-II and Pohar-Perme estimators in simulations, particularly for extended follow-up.


Introduction
When cause of death is unavailable or unreliable it is not possible to directly estimate disease-specific survival. For this reason, disease-specific survival is sometimes assessed by a measure of the relative survival between a group diagnosed with disease and the wider population. The main use of relative survival analysis is to compare survival between cohorts, such as from different countries or over periods in time. A complication is that cohort structures can differ. For example, relative survival in cancer is often lower for older patients than younger patients, and different countries may have different distributions of age-at-diagnosis. In this article, we compare the use of relative survival measures for making such comparisons by defining general criteria based on the following setup.
Let $S$ be a survival function and $\Lambda$ the corresponding cumulative hazard, with superscript $C$ denoting a cohort of interest (often patients diagnosed with cancer), and $P$ the general population from which the cohort was derived. We assume that survival may depend on covariates $x$, including in particular age and gender. Then $\Lambda^e(t \mid x) = \Lambda^C(t \mid x) - \Lambda^P(t \mid x)$ is defined to be the conditional excess cumulative hazard at time $t$, although $\Lambda^e$ need not be monotone or even positive. Typically $t$ is time from diagnosis and $\Lambda^P(t \mid x) = \Lambda^P_b(a + t \mid x)$, where $\Lambda^P_b$ is the cumulative hazard from birth and $a$ is the age at diagnosis. Corresponding to $\Lambda^e$, $S^e(t \mid x) = S^C(t \mid x)/S^P(t \mid x)$ is the conditional relative survival (which may not be a survival function).
The initial estimators developed by Ederer and co-workers focused on the relative survival
$$\frac{E_H\{S^C(t \mid X)\}}{E_H\{S^P(t \mid X)\}},$$
where $H$ is the marginal distribution of $X$, and $E_H$ denotes expectations with respect to $H$ (Ederer and Heise, 1959; Ederer et al., 1961). Estève et al. (1990) suggested that when $S^e$ depends on $x$, the target of estimation should instead be the marginal net survival
$$E_H\{S^e(t \mid X)\}. \tag{1}$$
When the relative survival is homogeneous, i.e., $S^e(t \mid x) = S^e(t)$, then the Ederer estimators are consistent for the marginal net survival. However, Estève et al. (1990) pointed out that when the relative survival is heterogeneous the limit of the classical estimators depends on the survival in the general population $P$, and so they are not universal. They suggested modeling the excess hazard parametrically. Sasieni (1996) showed how it could be modeled semi-parametrically, but it was not until Perme et al. (2012) that a non-parametric estimator of the net survival corresponding to the marginal excess hazard was developed. Unlike the classical methods, the Pohar-Perme estimator is consistent for the net survival (1) in the heterogeneous setting and, consequently, Roche et al. (2013) suggested that all classical methods should be abandoned. Lambert et al. (2015) noted a trade-off between consistency of the new estimator and its precision.
We are not convinced that the mean of the relative conditional survival is the only statistic of interest for the comparison of survival between countries, periods in time or types of disease. Indeed, it is clear that the net survival depends on the covariate distribution, and two populations with different such distributions may have different marginal net survivals, even when the conditional net survival functions are identical. We next take a step back from focus on the net survival, by considering what features one would like a covariate-free descriptor of the relative survival to hold.

Criteria
Consider a functional $R$ of two conditional survival functions and a covariate distribution that is a function of time $t$ only (i.e., $R$ is not a function of covariates $x$), which describes the ratio of survival functions. For example, $R$ might be the net survival, $R(S^C, S^P, H)(t) = E_H\{S^C(t \mid X)/S^P(t \mid X)\}$, or it could be the relative survival, $R(S^C, S^P, H)(t) = E_H\{S^C(t \mid X)\}/E_H\{S^P(t \mid X)\}$. If the purpose is to recreate the ratio of survival functions when they are independent of covariates then this should be a requirement: $R(S^C, S^P, H)(t) = S^C(t)/S^P(t)$ whenever $S^C(t \mid x) = S^C(t)$ and $S^P(t \mid x) = S^P(t)$ for all $x$. More generally, we might require this to hold provided only that the ratio of survival functions $S^e$ (or equivalently the excess cumulative hazard $\Lambda^e$) is independent of covariates. This is our first requirement:
A1 If $S^e(t \mid x) = S^e(t)$ for all $x$, then $R(S^C, S^P, H)(t) = S^e(t)$.
$S^e$ may be independent of covariates in real data. For example, relative survival from advanced breast cancer in Section 6 appears to be approximately independent of age at diagnosis until $t = 10$ years. When the ratio is not independent of covariates (so that A1 is vacuous), we would still like the statistic to reflect the ordering of the ratio.

A2a
If for some T , for all x and t ≤ T , then for all x and t ≤ T , then for all x and t ≤ T , then R(S C , S P , H)(t) < R(S C * , S P * , H)(t) for all t ≤ T .
Condition A2b is key for comparing relative survival between cohorts. It ensures that R does not depend on S P other than through S e . Ideally, we would like R to depend on S C and S P only through their ratio even if the covariate distribution is different. This leads to our third requirement that the statistic is independent of the covariate distribution When both A3 and A2b are satisfied, if One reason for considering different statistics other than the net survival (1) is that the net survival does not satisfy A3. Conditions A1 and A2 might be considered essential for a descriptive measure of relative survival, whereas A3 is necessary only for comparing relative survival between cohorts with different covariate distributions. By analogy, the crude rate is useful for describing a single cohort, but the age-standardized rate is more useful when comparing two cohorts.
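The two invariance ideas discussed above can be checked numerically on a toy example. The sketch below is illustrative only: it assumes two covariate groups with constant hazards, and all rates and covariate distributions are hypothetical values chosen for the illustration, not taken from the article. It evaluates the marginal net survival $E_H\{S^e(t \mid X)\}$ under two covariate distributions, and the ratio $E_H\{S^C\}/E_H\{S^P\}$ under two population mortality schedules with the same conditional excess hazards.

```python
import math

def surv(rate, t):
    """Survival at time t under a constant hazard `rate`."""
    return math.exp(-rate * t)

def net_survival(t, excess, probs):
    """Marginal net survival E_H{S^e(t | X)} for a discrete covariate."""
    return sum(p * surv(e, t) for e, p in zip(excess, probs))

def ratio_of_means(t, excess, pop, probs):
    """E_H{S^C(t | X)} / E_H{S^P(t | X)}, using S^C = S^e * S^P."""
    num = sum(p * surv(e + m, t) for e, m, p in zip(excess, pop, probs))
    den = sum(p * surv(m, t) for m, p in zip(pop, probs))
    return num / den

t = 10.0
excess = [0.05, 0.15]      # hypothetical excess hazards (younger, older)
pop_low = [0.01, 0.05]     # hypothetical population hazards
pop_high = [0.02, 0.10]    # same excess hazards, doubled background mortality
h_a = [0.7, 0.3]           # covariate distribution in cohort A (hypothetical)
h_b = [0.4, 0.6]           # covariate distribution in cohort B (hypothetical)

# Same conditional excess hazards, different covariate distributions.
net_a = net_survival(t, excess, h_a)
net_b = net_survival(t, excess, h_b)

# Same conditional excess hazards and covariates, different S^P.
rel_low = ratio_of_means(t, excess, pop_low, h_a)
rel_high = ratio_of_means(t, excess, pop_high, h_a)
```

In this toy setting `net_a` and `net_b` differ although the conditional excess hazards are identical, and `rel_low` and `rel_high` differ although only the background mortality changed.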
If a measure meets criteria A1-A3 then we might ask what additional properties would be desirable. We consider the following.
A4 Robustness. Small changes in $S^C$ for a fixed $S^P$ and $H$ do not cause large changes in $R$.
A5 Precision. We prefer measures with smaller $\mathrm{var}(\hat R) R^{-2}$, where $\hat R$ is an efficient estimator of $R$.

Some Relative Survival Families and Estimators
The observable data for individuals $i = 1, \ldots, n$ are $(T_i, X_i)$, where $T_i$ is the time of death and $X_i$ the covariate value; $P(T_i > t \mid x_i) = S^C(t \mid x_i)$, $X \sim H$; $S^P_i(\cdot)$ is assumed known. If $(\hat S^C, \hat H)$ denote empirical versions of $(S^C, H)$ (putting mass $1/n$ at each point $(T_i, X_i)$), then corresponding to a measure $R(S^C, S^P, H)$ we may have an estimator $R(\hat S^C, S^P, \hat H)$. To allow for right censoring we follow Andersen et al. (1996) and use notation $Y_i(t)$ for the at-risk process ($Y_i(t) = I(T_i \ge t)$ in the absence of censoring, where $I(\cdot)$ is the indicator function), and $N_i(t)$ for the counting process of observed deaths. Now, under independent censoring, consider a family of estimators of the cumulative excess hazard
$$\hat A_w(t) = \int_0^t \frac{\sum_{i=1}^n w_i(u)\{dN_i(u) - Y_i(u)\, d\Lambda^P_i(u)\}}{\sum_{i=1}^n w_i(u) Y_i(u)}, \tag{2}$$
where $\Lambda^P_i$ is the cumulative hazard for an individual with covariate $x_i$ in the general population, and $w_i(t)$ is a chosen weight given to the $i$th individual at time $t$, which may depend on $x_i$. Setting $w_i(t) = 1$ for all $i = 1, \ldots, n$ and $t > 0$ yields the Ederer-II estimator of the cumulative excess hazard ($\hat A_1$) (Ederer and Heise, 1959), and setting $w_i(t) = \{S^P_i(t)\}^{-1}$ provides the Pohar-Perme estimator ($\hat A_{1/S^P}$) (Perme et al., 2012). They may be put onto a survival scale through the transformation $\hat R^1_w(t) = \exp\{-\hat A_w(t)\}$. It follows that if the $X_i$ are independent and identically distributed then $\hat A_w(t)$ converges to
$$\int_0^t \frac{E_H\{w(u, X) S^C(u \mid X)\, d\Lambda^e(u \mid X)\}}{E_H\{w(u, X) S^C(u \mid X)\}}.$$
This leads to our first family of relative survival measures:
$$R^1_w(t) = \exp\left[-\int_0^t \frac{E_H\{w(u, X) S^C(u \mid X)\, d\Lambda^e(u \mid X)\}}{E_H\{w(u, X) S^C(u \mid X)\}}\right].$$
A second family is defined by
$$R^2_w(t) = \frac{E_H\{w(t, X) S^C(t \mid X)\}}{E_H\{w(t, X) S^P(t \mid X)\}}.$$
If $w(t, X) = 1$ then we have the limit of the Ederer-I estimator (Ederer et al., 1961). Relative weighted survival satisfies our criterion A1 because in this case $S^C(t \mid x) = S^e(t) S^P(t \mid x)$, so that $R^2_w(t) = S^e(t)$. In order for it to satisfy A2, and depend only on $S^e$ and not $S^P$, the weight $w(t, x)$ should be proportional to $1/S^P(t \mid x)$. There is a natural family of estimators corresponding to $R^2_w$ that, to our knowledge, has not been used previously.
When there is no censoring then $R^2_w$ may be estimated consistently by
$$\hat U_w(t) = \frac{\sum_{i=1}^n w_i(t) Y^+_i(t)}{\sum_{i=1}^n w_i(t) S^P_i(t)},$$
where $Y^+_i(t) = I(T_i > t)$. For the more general case with censoring $D$ that is independent of the covariate (so that $S^D_i = S^D$), we can define a family of estimators for $R^2_w$ as
$$\hat U_w(t) = \frac{\sum_{i=1}^n w_i(t) Y^+_i(t)/\hat S^D(t)}{\sum_{i=1}^n w_i(t) S^P_i(t)}, \tag{4}$$
where $Y^+_i(t)$ now indicates that individual $i$ is still under observation after $t$, and $\hat S^D(t)$ is a Kaplan-Meier estimate of the censoring survival distribution. When $w(t, x) = 1$ we have Ederer-I and when $w(t, x) = 1/S^P(t \mid x)$ we have an alternative to the Pohar-Perme estimator that is also consistent for the net survival. Further, when there is no competing mortality so that $S^P_i(t) = 1$ for all $i$ and $t$, then $\hat R^1_1 = \hat R^1_{1/S^P}$ and $\hat U_1 = \hat U_{1/S^P}$, and it can be shown that $\hat U_1(t) = \hat R^1_1(t)$, with both equal to the Kaplan-Meier estimate of $S^C(t)$ in the cohort (which is the non-parametric maximum likelihood estimate).
We end by imposing some restrictions on the weights $w(t, x)$ based on the criteria. By definition the first two arguments to each $R$ may be stated in terms of any two of $S^C$, $S^P$, and $S^e$. If we consider $R$ as a function of $(S^P, S^e, H)$, then A2a and A3 imply that it depends only on $S^e$. Suppose that $R_w$ satisfied A2a and A3 and that $w' = wv$; then for $R_{w'}$ to also satisfy A2a and A3, $v$ should depend on $(S^P, S^e, H)$ only via $S^e$.
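For the uncensored case, a weighted ratio estimator of the second family can be sketched in a few lines. The code below is a hedged illustration rather than the authors' implementation: it assumes the form $\sum_i w_i(t) I(T_i > t) / \sum_i w_i(t) S^P_i(t)$, assumes constant population hazards so that $S^P_i(t) = \exp(-r_i t)$, and uses made-up death times and rates. The function name `weighted_ratio_estimate` and its arguments are hypothetical.

```python
import math

def weighted_ratio_estimate(t, death_times, pop_rates, weight="unit"):
    """Sum_i w_i(t) I(T_i > t) / Sum_i w_i(t) S^P_i(t), with no censoring."""
    # S^P_i(t) under an assumed constant population hazard for each person.
    pop_surv = [math.exp(-r * t) for r in pop_rates]
    alive = [1.0 if d > t else 0.0 for d in death_times]
    if weight == "unit":            # w = 1 (Ederer-I type)
        w = [1.0] * len(pop_surv)
    elif weight == "inverse_pop":   # w = 1 / S^P (net-survival type)
        w = [1.0 / s for s in pop_surv]
    else:
        raise ValueError("weight must be 'unit' or 'inverse_pop'")
    num = sum(wi * a for wi, a in zip(w, alive))
    den = sum(wi * s for wi, s in zip(w, pop_surv))
    return num / den

# Hypothetical data: four deaths, two with low and two with high background rates.
death_times = [1.0, 3.0, 5.0, 7.0]
pop_rates = [0.01, 0.01, 0.05, 0.05]

ederer1_type = weighted_ratio_estimate(4.0, death_times, pop_rates, "unit")
net_type = weighted_ratio_estimate(4.0, death_times, pop_rates, "inverse_pop")
# With no competing mortality both weightings reduce to the empirical survival.
no_competing = weighted_ratio_estimate(4.0, death_times, [0.0] * 4, "inverse_pop")
```

Handling censoring would additionally require dividing the at-risk indicators by an estimate of the censoring survival function, which is not attempted here.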

Assessment of Criteria
We next consider whether the families $R^1_w$ and $R^2_w$ satisfy our fundamental requirements A1-A3.

A1
Both $R^1_w$ and $R^2_w$ satisfy A1. This is seen by taking the excess terms such as $S^e$ outside of the expectation.

A2
If $w(t, x) = v(t, x)/S^P(t \mid x)$ and $v(t, x)$ depends on $(S^C, S^P, H)$ only through $S^e$ (or not at all) then both $R^1_w$ and $R^2_w$ satisfy A2. It is for this reason that the limits of the Ederer-I and II estimators do not satisfy A2: they depend on $S^P$ even when $S^e$ is fixed.

A3
Neither $R^1_w$ nor $R^2_w$ is guaranteed to satisfy A3. In order for them to do so one needs to standardize so that the weights are proportional to the ratio of the standardized to the observed covariate density, i.e., $h^0(x)/h(x)$, using the superscript 0 to denote a standard population. This is the approach to age adjustment that was proposed by Brenner et al. (2004); see Section 5.2 for further discussion. If we wish to standardize two cohorts with covariate distributions $H$ and $H^*$ that do not have the same support, then to meet A3 the support of the standard distribution $H^0$ should be their intersection only (i.e., $h^0(X) = 0$ if either $h(X) = 0$ or $h^*(X) = 0$).
We have thus established that $R^1_w$ and $R^2_w$ satisfy our main [...]

A4
It is then clear that in order for $R^1_w$ and $R^2_w$ to be robust (against, for instance, a very large $|d\Lambda^e(u \mid X_i)|$, which might happen in a sample when $S^C$ is very small), one should require that $w(u, x)$ is bounded for all $u$ and $x$. When $w = h^0 v/(h S^P)$ this can be achieved either by setting $h^0(x) = 0$ when $S^P$ is very small (compared with other $x$ at the same $t$) or by ensuring that $v/S^P$ is bounded. Further, consider [...]. Then, assuming the hazards exist, [...]. Thus, for fixed [...], as $t$ gets large [...] so that [...], which affects the excess hazard (for fixed $S^P$). In other words, $R^1_w$ and $R^2_w$ are not robust unless $w$ is carefully chosen: for each $x$, the weights need to approach zero with $t$ as fast as or faster than $S^P(t \mid x)$. This argument is also relevant for comparisons between populations: to be robust, $w(t, x)/E_H\{w(t, x)\}$ should approach zero as fast as or faster than $S^P(t \mid x)$ in all populations compared. Recall that if $a$ is age at diagnosis, then $S^P(t \mid a) = S^P_b(t + a)$, where $S^P_b(t)$ is the probability of living until age $t$ in the general population.

A5
The asymptotic variance of $\hat R^1_w$ may be estimated by $\hat R^1_w(t)\hat\sigma^2(t)$ using the same arguments as Perme et al. (2012), where [...] with [...]. It is not straightforward to use this formula for estimation because of the difficulty in estimating $S^C_i$ without modeling its dependence on $X$. Although [...], we cannot in general simply replace $S^C_i$ in (6) by $Y^+_i$. One exception is when there are assumed to be $j = 1, \ldots, k$ homogeneous groups of size $n_j$. Then, with independent censoring within each stratum, one may estimate $R^2_w$ as [...] based on a stratum-specific Kaplan-Meier estimate $\hat S^C_j(t)$, and the variance may be estimated via [...], where Greenwood's formula might be used for $\mathrm{var}(\hat S^C_j)$. However, in practice a bootstrap estimate of the variance of $\hat U_w$ is recommended because one may avoid the assumption of homogeneous groups.
Precision of the estimators of $R^1_w$ and $R^2_w$ is clearly affected by the choice of weight function due to the $w_i^2$ term in the numerator of the variance. In both, functions that place more weight on the oldest patients, such as $w_i(t) = 1/S^P_i(t)$ (Pohar-Perme with $R^1_w$), are less precise than others with weights such as $w_i$ [...]

Standardization
The methods of standardization used in the numerical sections of this article are introduced next, and discussed in relation to the criteria A1-A5.

Stratification
The Ederer-II and Pohar-Perme estimators are often standardized by stratification, particularly by age group (Pokhrel and Hakulinen, 2008). The most common method is a weighted arithmetic mean of stratum-specific estimates of the relative survival $\hat R_j$ in stratum $j = 1, \ldots, k$. Let $g_j = P_{H^0}(x_i \in G_j)$ for groups $G_j$. Then denote the traditional standardized statistic by
$$D_g(\hat R)(t) = \sum_{j=1}^{k} g_j \hat R_j(t). \tag{7}$$
$D_g$ satisfies A1-A3 provided the statistic $R_j$ satisfies A1-A3 in each stratum. Note also that when the same level of stratification is used for the weights in $\hat R^1_w$ and for weights in the standardization (i.e., if $w_i(t) = w_{i'}(t)$ whenever the observations $i$ and $i'$ come from the same stratum $G_j$), then $D_g(\hat R^1_w)$ does not depend on the particular weights since the $w_i(u)$ terms in (2) cancel out. Thus, when the same factors are used to stratify the population mortality $S^P$ and for standardization by stratification, the standardized Ederer-II (corresponding to $w_i = 1$) and the standardized Pohar-Perme (corresponding to $w_i = 1/S^P_i$) estimators will be identical.
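The weighted arithmetic mean of stratum-specific estimates described above can be sketched as follows. This is a minimal illustration, not the relsurv implementation: the function name `standardize_by_strata` is hypothetical, the stratum estimates are made-up numbers, and the weights are the age-group percentages quoted in the Example section.

```python
def standardize_by_strata(stratum_estimates, standard_weights):
    """Weighted arithmetic mean of stratum-specific relative survival estimates."""
    if len(stratum_estimates) != len(standard_weights):
        raise ValueError("need one weight per stratum")
    # A stratum with no estimate at this time point (for example, nobody left
    # under follow-up) makes the stratified statistic unavailable.
    if any(r is None for r in stratum_estimates):
        raise ValueError("every stratum needs an estimate at this time point")
    total = sum(standard_weights)  # normalise so the weights sum to one
    return sum(g * r for g, r in zip(standard_weights, stratum_estimates)) / total

age_weights = [7, 12, 23, 29, 29]             # percentages for five age groups
stratum_rs = [0.30, 0.28, 0.27, 0.25, 0.15]   # hypothetical stratum estimates
d_g = standardize_by_strata(stratum_rs, age_weights)
```

Raising an error when a stratum has no estimate is a design choice for this sketch; it mirrors the practical limitation that the stratified statistic cannot be computed once any stratum runs out of follow-up.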

Baseline Weighting
A problem with stratification is that the number in each stratum needs to be sufficient to obtain an estimate of $S^e$ over the follow-up period of interest: with censored data it is not possible to estimate the excess survival beyond the smallest of the stratum-specific last follow-up times. A second approach to standardization is to use a weighted estimator. Each individual is weighted so that the weighted sample at baseline represents the standard population (Brenner et al., 2004). This approach corresponds to using time-constant weights within the estimator, rather than taking a weighted average of stratum-specific estimates. It is exactly what needs to be done to ensure that our condition A3 is satisfied. When used with Ederer-II one has weights $(n z_i)/n_i$, where $n_i = \sum_{j=1}^{n} I(x_j = x_i)$ is the number of individuals in the sample at baseline with the same covariate values, and $z_i$ is a standard probability mass function for the covariates ($\sum_{i=1}^{n} z_i = 1$). When this approach is applied to the Pohar-Perme estimator one has
$$w_i(t) = \frac{n z_i}{n_i S^P_i(t)}. \tag{8}$$
Unlike the usual Pohar-Perme estimator these weights satisfy A3 for both $\hat R^1_w$ and $\hat R^2_w$, but they are similarly not robust.
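The baseline weights $(n z_i)/n_i$ used with Ederer-II can be illustrated with a short sketch. It assumes a single discrete covariate (age at diagnosis) and reads $z$ as a probability mass function over the distinct covariate values; the function name `baseline_weights` and the sample are hypothetical.

```python
from collections import Counter

def baseline_weights(covariates, standard_pmf):
    """Time-constant weights n * z(x_i) / n_i for each individual."""
    n = len(covariates)
    counts = Counter(covariates)   # n_i: number sharing the same covariate value
    return [n * standard_pmf[x] / counts[x] for x in covariates]

# Hypothetical sample: three people aged 65 and two aged 75 at diagnosis.
ages = [65, 65, 65, 75, 75]
standard = {65: 0.5, 75: 0.5}     # hypothetical standard distribution
weights = baseline_weights(ages, standard)

# Share of the weighted sample aged 65 at baseline.
share_65 = sum(w for w, a in zip(weights, ages) if a == 65) / sum(weights)
```

In this example the weighted sample reproduces the standard distribution at baseline, which is the property the baseline weighting is designed to deliver.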

Standardized Relative Survival
Our proposal is to standardize the estimators of $R^1_w$ and $R^2_w$ by using weights
$$w_i(t) = \frac{n z_i S^0_i(t)}{n_i S^P_i(t)}. \tag{9}$$
If the covariate distribution at time 0 is $z_i$ and individuals are subject to survival $S^0_i(t)$, then $z_i S^0_i(t)$ will be the covariate distribution at time $t$ in the standard population. Thus, $z_i S^0_i(t)$ can be thought of as a standard prevalence of patients with the disease and covariates $x_i$ at time $t$ post diagnosis. The limit of the estimators with these weights corresponds to $R^1_w$ and $R^2_w$ with $w = h^0 S^0/(S^P h)$. With these weights $R^1_w$ and $R^2_w$ meet A3 as described above. The parameterization is still arbitrary, in that $S^0$ and $H^0$ (or $z_i$) may be chosen, but A4 helps to rule out certain choices of $S^0$. For example, if $H^0 = H$ and $S^0(t) = 1$ for all $t$ then $R^1_w$ is the Pohar-Perme estimator, which does not meet A4. Suppose that $X = (a, l)$ where $a$ is age at diagnosis and $l$ is a categorical variable with $l = 1, 2, \ldots, L$ levels, and $S^P(t \mid a, l) = S^P_{bl}(a + t)$, where $S^P_{bl}$ is the survival from birth in group $l$. Then to meet A4 we showed that the standard reference weights should be chosen so that $S^0_{bl}(t) \le S^P_{bl}(t)$ for all $l$ and $t$. For instance, a country with the poorest population survival could provide $S^0$.
Equations (5) and (6) show how the choice of $S^0$ in (9) relates to A5. The proposed weight $w_i(t) = h^0(x_i) S^0_i(t)/\{h(x_i) S^P_i(t)\}$ enables us to ensure that $S^0_i(t)/S^P_i(t)$ is stable through the choice of $S^0$. Equations (5) and (6) also show that there may be a trade-off between robustness and precision. If $w_i(t)$ is zero then the data from individual $i$ will not be used for estimation at time $t$; this will give the estimator robustness against outlying events at time $t$, but (5) will be larger and precision worse. Thus, one would not wish to set $S^0(t \mid x)$ to zero for $t \ge T$ unless there is no interest in estimating $R$ at or beyond time $T$.
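The proposed weight $w_i(t) = h^0(x_i) S^0_i(t)/\{h(x_i) S^P_i(t)\}$ can be sketched under simplifying assumptions. The code below is illustrative only: it assumes constant hazards within each covariate group so that both survival functions are exponential, estimates $h(x_i)$ by the sample proportion $n_i/n$, and uses hypothetical rates. The function name `proposed_weights` is not from the article or its software.

```python
import math
from collections import Counter

def proposed_weights(t, covariates, standard_pmf, ref_rates, pop_rates):
    """w_i(t) = h0(x_i) S0_i(t) / {h(x_i) S^P_i(t)} with exponential survival."""
    n = len(covariates)
    counts = Counter(covariates)
    weights = []
    for x in covariates:
        h_obs = counts[x] / n                # empirical covariate mass h(x_i)
        s_ref = math.exp(-ref_rates[x] * t)  # standard reference survival S0
        s_pop = math.exp(-pop_rates[x] * t)  # general population survival S^P
        weights.append(standard_pmf[x] * s_ref / (h_obs * s_pop))
    return weights

ages = [65, 65, 65, 75, 75]
observed_pmf = {65: 0.6, 75: 0.4}   # equals the sample distribution here
pop = {65: 0.02, 75: 0.05}          # hypothetical population hazards
ref = {65: 0.04, 75: 0.10}          # hypothetical reference with higher mortality

# Reference equal to the cohort's own population and covariate distribution.
w_identity = proposed_weights(10.0, ages, observed_pmf, pop, pop)
# Reference mortality higher than the population: weights decay with time.
w_5 = proposed_weights(5.0, ages, observed_pmf, ref, pop)
w_20 = proposed_weights(20.0, ages, observed_pmf, ref, pop)
```

When the reference mortality is at least as high as the population mortality, the ratio $S^0/S^P$ cannot exceed one, so the weights stay bounded as follow-up lengthens, which is the behaviour the robustness criterion asks for.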
In summary, we have two measures and corresponding estimators that satisfy criteria A1-A4, under an assumption of independent censoring. It is not clear whether there are circumstances when one might dominate the other in terms of estimation precision (A5). This will be explored later using a computer simulation.
To help interpretation, note that when $S^0 = S^P$ and $h^0 = h$, the weights in (9) equal one. Thus, $R^2_w$ and $R^1_w$ are, respectively, the Ederer-I and II estimation targets when the standard survival is taken to be the same as that in the reference population, and the standard covariate distribution is the same as in the observed cohort. Suppose instead that $S^P = 1$ (i.e., there is no competing hazard); then both Ederer-II and Pohar-Perme weights in $R^1_w$ are one, and $\hat R^1_w$ gives the Kaplan-Meier estimator (more precisely, $\hat A_w$ gives the Nelson-Aalen estimator). The use of $S^0$ in our weights (9) provides a stratum-weighted Kaplan-Meier estimator (Xie and Liu, 2005). Thus, $R^1_w$ with weights given by (9) can be interpreted as the marginal net survival that would be observed in population $H^0$ subject to censoring $S^0(t \mid x)$. It might be called the $S^0$-filtered net survival. At each time $t$, $R^1_w$ corresponds to a weighted average of the conditional excess hazard functions:
$$-\frac{d}{dt}\log R^1_w(t) = \frac{E_H\{w(t, X) S^C(t \mid X) \lambda^e(t \mid X)\}}{E_H\{w(t, X) S^C(t \mid X)\}},$$
where $\lambda^e(t \mid x)$ is the derivative of $\Lambda^e(t \mid x)$ with respect to $t$. If the excess hazard is independent of $X$ then the weights do not matter. More generally we want the weights to be reasonably homogeneous. In particular, we would like to give (approximately) equal weight to subsets of $X$ that have equal probability of being at risk at time $t$. Ederer-II does this exactly but at a price: it does not satisfy A2. Our weights (9) provide a good approximation to homogeneous weighting while ensuring A1-A3 hold.

Example
The R package relsurv (Perme, 2013), which implements the Pohar-Perme, Ederer-II, and some other relative survival estimators was extended to fit the standardized methods in this report (supplementary material). To demonstrate the methods, we obtained data on breast cancers diagnosed between 1973 and 2010 in the United States from SEER (2014). Death rates for the same period were obtained from National Center for Health Statistics (2015) by age and gender. The following reference data were used for standardization.
(1) The reference age distribution of cases was a standard taken from Corazziari et al. (2004). This weights age groups (15-44, 45-54, 55-64, 65-74, 75+) as (7, 12, 23, 29, 29)%.
(2) For exposition, the standard reference mortality rate was taken to be that estimated for the Russian Federation (Human Mortality Database, 2016), where mortality rates were approximately 70% higher than in the United States for women aged 60 between 1980 and 1989, rising to 300% by 2000-2010. The effect of a lower reference rate (not recommended) was considered by dividing the U.S. mortality rates by three.
We focus on the survival of 16,597 women younger than 85 who were diagnosed with invasive breast cancer with distant [...]. Figure 1 shows the Pohar-Perme relative survival estimates by age band, where there was little difference to 10 years between the younger (<55) and older groups. Thus, to 10 years, net survival appeared to be almost independent of age at diagnosis (cf. criterion A1). However, beyond 10 years the differences became more pronounced for the 75+ group, as competing mortality rates increased and precision decreased. This had an impact on the traditional age-standardization estimate, as this age group is weighted most heavily. Figure 2 compares un-standardized and standardized estimates. To 10 years, where there was very little difference in net survival by age, there was very little practical difference between the estimators; only a very small difference is visible between the stratified estimators and the others. Larger differences were seen after 10 years. Traditional age-standardization of Pohar-Perme or Ederer-II yielded very similar estimates, with substantial variability. The Brenner age-adjustment of Pohar-Perme was close to un-standardized Pohar-Perme. Our proposals (with a reference mortality that is higher than in the United States) were less variable and closer to un-standardized Ederer-II than the others. Reference rates lower than the United States are only shown for insight and are not recommended; as expected these weights yield an estimate with properties somewhere between those of the Pohar-Perme and Ederer-II estimates.

Figure 2. Estimated relative survival curves from example: (a) to 10 years; and beyond 10 years for (b) some existing methods and (c) proposed estimators with reference to Ederer-II. E2, Ederer-II estimate; PP, Pohar-Perme estimate; PPa, E2a, traditional age-standardization from (7); PPb, Brenner age standardization from (8); R1S, proposal $\hat R^1_w$ with (9) and standard reference mortality from the Russian Federation; R1S*, as R1S but with standard mortality rates three times lower than the United States; R2S, proposal estimated by $\hat U_w$ from (4) with standard rates from the Russian Federation; R2S*, similarly but with standard rates three times lower than the United States.

Figure 3. Estimated standardized relative survival from simulation example at: (a) 5, (b) 10, (c) 15, and (d) 20 years. The true net and standard survival statistics ($R^1_w$, $R^2_w$; both with weights (9)) in the reference population are given; samples are from two cohort populations [1] and [2]. Net survival estimates are from PPa, traditional standardization applied to Pohar-Perme estimation, and PPb, which is Brenner standardization from (8). The standardized survival estimates $R^1_w$ and $R^2_w$ with weights (9) are labeled, respectively, R1S and R2S.

Two-Group Simulation
A computer simulation with the following characteristics was used to further compare the estimators. Mortality rates in cohort 1 were based on women in the United States in 1980; in cohort 2 they were (i) 1.2 times higher for those younger than 70, (ii) two times higher for those aged 70-85, and (iii) four times higher when aged 86 or older. The standard reference population rates were (i) two times higher for those younger than 70, (ii) four times higher for those aged 70-85, and (iii) 100 times higher when aged 86 or older, to reflect a standard population where very few people lived into their 90s. The excess hazard was the same in both cohorts, being 3% greater per year from age 65. There were two groups in each population, aged 65 or 75 at diagnosis. The percentage aged 65 at diagnosis was 60% for cohort 1, 70% for cohort 2, and 50% in the standard reference population. There were two censoring scenarios: (i) no censoring, and (ii) uniform censoring between 1 and 25 years. In the first cohort, approximately 41% were censored before their event time, and 35% in the second cohort. The outcomes of interest were estimates of relative survival at 5, 10, 15, and 20 years. A group of 2000 individuals was simulated 5000 times from both cohort populations. Standardization is needed or methods will show a difference between the cohorts that only reflects their age distribution at baseline; we used the methods from Section 5.
We focus first on the simulations without censoring. Figure 3 shows boxplots of the simulation survival estimates, and summary statistics are given in Table 1. The plots highlight that the net and standardized survival in the reference population are different quantities. Our interest is not in how well the estimator for standardized survival recovers net survival, but in whether one would draw an appropriate conclusion when comparing the two cohorts. For this the plots show little difference to 10 years. All the estimators had only small bias, and the right conclusion would be drawn for all the estimators. However, at 5 and 10 years, standardized relative survival $R^1$ and $R^2$ were more precise than net survival in terms of $\mathrm{Var}(\hat R) R^{-2}$ (Table 1), so they would rule out larger differences.
Beyond 10 years the net survival estimators started to break down, showing differences between the cohorts even though the age-specific excess hazards are identical. This is reflected by substantial differences between estimates of net survival in the second cohort compared with the first. The same problem accounts for the lack of results for traditional standardization at 20-year survival, where it was not possible to estimate relative survival in the second cohort because everyone in the older group was dead. This robustness issue likewise affected the Brenner baseline standardization method. Our standardization methods performed robustly even at 20 years, as the standard reference population effectively excluded everyone once they were older than 85. Censoring decreased the precision of all estimators, but did not appear to affect $\hat R^1_w$ very much more than $\hat R^2_w$. To 5 years $\hat R^1_w$ had slightly better precision than $\hat R^2_w$. Beyond that it was very similar to $\hat R^1_w$ for no censoring, and slightly worse with censoring in cohort 2: there was not a clear winner between $\hat R^1_w$ and $\hat R^2_w$ in these simulations.
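The population mortality structure of the simulation can be sketched as below. This is a hedged illustration of the stated age-band multipliers only: the baseline hazard is a hypothetical Gompertz-type curve standing in for the 1980 U.S. female rates (which are not reproduced here), the age bands are read as below 70, 70 up to but not including 86, and 86 or older, and the excess hazard and censoring are not simulated.

```python
import math

# Multipliers applied to the baseline rates for ages <70, 70-85, and 86+.
MULTIPLIERS = {
    "cohort1": (1.0, 1.0, 1.0),
    "cohort2": (1.2, 2.0, 4.0),
    "standard": (2.0, 4.0, 100.0),
}

def baseline_rate(age):
    """Hypothetical Gompertz-type hazard; a stand-in, not the real life table."""
    return 0.00005 * math.exp(0.09 * age)

def multiplier(age, population):
    """Age-band multiplier for the named population."""
    low, mid, high = MULTIPLIERS[population]
    if age < 70:
        return low
    if age < 86:
        return mid
    return high

def population_survival(age_at_dx, t, population, step=0.01):
    """S^P(t | a): midpoint-rule integration of the scaled hazard from a to a + t."""
    cumulative = 0.0
    for k in range(int(round(t / step))):
        age = age_at_dx + (k + 0.5) * step
        cumulative += multiplier(age, population) * baseline_rate(age) * step
    return math.exp(-cumulative)

# Survival 20 years after diagnosis at age 75 in each population.
s1 = population_survival(75, 20.0, "cohort1")
s2 = population_survival(75, 20.0, "cohort2")
s0 = population_survival(75, 20.0, "standard")
```

Under these assumptions the standard reference survival lies below both cohort survival functions and is essentially zero once age passes the mid-80s, which matches the stated intent of a standard population where very few people live into their 90s.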

Discussion
In this article, we outlined some criteria for relative survival, and then assessed different families of measures. We developed two new measures and estimators that met our criteria. Standardized $R^1$ may be interpreted as the marginal net survival that would be "observed" in a standard population subjected to standard censoring. This is because it provides the survival transform of a weighted excess hazard: viewing the weights as the probability of being at risk in the standard population (at time $t$ given covariate $x$) gives the interpretation (provided that $S^C(t) \le S^P(t)$). Standardized $R^2$ targets a marginal probability of surviving the excess hazard from the disease if the person would survive as long with respect to the standard population. It has a similar interpretation to the stratified standardization approach of Brenner and Hakulinen (2003), who proposed time-dependent weights of the form $S^0_j(t)$ in the context of stratified estimation (Pokhrel and Hakulinen, 2008). Here, we applied similar weights but to the individual subject, which follows the ideas in Brenner et al. (2004), also incorporating the inverse probability of sampling weights introduced in this context by Perme et al. (2012). Interpretation of $R^1_w$ is arguably easier than $R^2_w$, because the excess (non-cumulative) hazard and relative density functions corresponding to $R^1_w$ do not depend on the derivative of the weights with respect to time, $dw/dt$, whereas for $R^2_w$ they do. However, both are statistical constructs. For a non-specialist audience we suggest describing both proposals as standardized relative survival indices designed to accurately and precisely determine the direction and ordering of survival differences between cohorts.
Standardized R 1 and R 2 may be applied for longer followup than traditional standardization, by placing more weight on those young enough to be expected to survive that long after diagnosis. But, they are not consistent estimators of the marginal net survival. In our view, this is much less important than our other criteria. Indeed, whenever one uses a Pohar-Perme estimator that is standardized by stratification, one is already foregoing having an estimator of the (unstandardized) net survival. More importantly, any measure of relative survival that is not the same when the excess hazard given covariates is the same in two populations, seems more deficient than one which is inconsistent for estimation of the marginal net survival. We do not accept the need to only estimate the marginal net survival, and would prefer to precisely estimate the mean net survival with respect to a standard covariate distribution. Our argument mirrors Bickel and Lehmann (1975), who showed that although a trimmed mean is not an unbiased estimate of the mean of an asymmetric distribution, it has a place as a measure of central location of a distribution, and may be better for this than the mean in many situations.
This article has considered properties of relative-survival measures and estimators, and from this some general guidance was provided on how to choose the standardization weights. More practically, it would be useful to provide investigators with recommended tables of standard weights. We will develop elsewhere recommended cancer-site specific standardization tables for our methods. Another limitation is that we have not considered dependent censoring patterns, such as those described by Hakulinen (1982) and Kodre and Perme (2013). Future work will address this, as well as testing for differences between standardized relative survival estimates.
In conclusion, we hope that the criteria developed to assess relative survival measures and estimators are useful for a theoretical understanding of their properties. We recommend that our proposed standardization methods be considered for non-parametric relative survival estimation when the aim is to make comparisons between cohorts, such as from different countries or periods in time, or even between disease types.

Supplementary Materials
An R package implementing the new methods is available with this article at the Biometrics website on Wiley Online Library.