An alternative approach to confidence interval estimation for the win ratio statistic
Summary
Pocock et al. (2012, European Heart Journal 33, 176–182) proposed a win ratio approach to analyzing composite endpoints composed of outcomes with different clinical priorities. In this article, we establish a statistical framework for this approach. We derive the null hypothesis and propose a closed‐form variance estimator for the win ratio statistic in the all‐pairwise matching situation. Our simulation study shows that the proposed variance estimator performs well regardless of the magnitude of the treatment effect size and the type of the joint distribution of the outcomes.
1 Introduction
In cardiovascular (CV) trials, the endpoint of interest is often a composite of two or more types of related clinical events (e.g., composite of CV death, myocardial infarction and stroke; composite of hospitalization and death) and the analysis typically focuses on the time to first event. This approach is not ideal as it reduces a multi‐dimensional problem into a one‐dimensional problem by analyzing only the occurrence of the first of several events and ignores the subsequent, sometimes more important events such as CV death.
Data consisting of a non‐fatal event and a fatal event are called semi‐competing risks data (Fine, Jiang, and Chappell, 2001). Data of this kind have been modeled by assuming that the joint survivor function of the two event times follows a copula, with observations restricted to the region where the fatal event time is greater than the non‐fatal event time (Fine et al., 2001; Peng and Fine, 2007). Xu, Kalbfleisch, and Tai (2010) noted that semi‐competing risks data are in fact illness‐death data (Fix and Neyman, 1951; Sverdrup, 1965). As such, techniques developed for illness‐death data can also be applied to semi‐competing risks data (Andersen et al., 1993; Kalbfleisch and Prentice, 2002; Xu et al., 2010; Allignol et al., 2013). For situations where the non‐fatal component includes recurrent events, Ghosh and Lin (2000, 2002) used the mean cumulative incidence function adjusted for death. Liu, Wolfe, and Huang (2004), Cowling, Hutton, and Shaw (2006) and Zeng and Lin (2009) jointly modeled the recurrent events and the fatal event based on shared frailty models.
Analysis of semi‐competing risks data is complicated by the fact that fatal events can censor non‐fatal events, potentially resulting in informative censoring. The traditional time‐to‐event analysis considers only the first event, and an observed non‐fatal event always occurs before death. Consequently, in the traditional analysis, the non‐fatal events are given higher priority than the far more serious event of death.
Pocock et al. (2012) proposed an alternative method to analyze composite endpoints to account for the order of clinical priorities. Subjects from the new treatment and standard treatment arms are first paired, and subsequently, within each pair, a “winner” or “loser” is determined by comparing “time to component events” sequentially according to the order of the clinical priorities. A win ratio comparing the new and standard treatments can then be computed. Pocock et al. (2012) discussed “matched” and “unmatched” versions of the win ratio statistic separately. In the “matched” version, each subject from the new treatment arm is matched with a unique subject from the standard treatment arm based on a metric, which is a function of baseline risk scores. In the “unmatched” version, all possible pairings between the subjects in the new treatment arm and the subjects in the standard treatment arm are considered. In practice, implementation of the matched pair approach is not trivial. It necessitates the calculation of a baseline risk score, which involves first identifying risk factors and then assigning them appropriate weights, both of which can be challenging and very subjective. Furthermore, with the “matched” approach, a given dataset might not result in a unique estimate of the win ratio. This can occur, for example, when the two treatment groups do not have an equal number of subjects or when a few subjects have the same baseline risk score. Finally, matching has a basic problem: it might not be possible when the risk factors differ for the components of a composite endpoint.
Therefore, in this article, we focus our discussion on the “unmatched” version of the win ratio statistic. It does not require matching based on baseline scores, scoring methods, or a subjective choice of weights, and it reduces the potential loss of information due to lack of matched pairs. It is therefore much more desirable as a test statistic than the “matched” version.
The win ratio approach seems intuitive and therefore appealing to researchers. However, in the original article, unlike classical methods for analyzing time‐to‐event data (e.g., log‐rank test, Cox proportional hazards model), the null hypothesis being tested using the win ratio statistic is unclear. Also, for the “unmatched” win ratio, the original article does not provide a closed‐form solution for the variance, but recommends a computationally intensive re‐sampling estimation method.
In this article, we have developed a statistical framework for using the win ratio approach by providing the null hypothesis to be tested. We believe that our effort will not only lead to a better understanding of the win ratio approach, but also provide a new methodology for the analysis of semi‐competing risks data. Using the U‐statistics technique, we provide a closed‐form variance estimator for the win ratio in the “unmatched” situation. A simulation experiment is used to evaluate the performance of the proposed variance estimator. Finally, we illustrate the application of our methodology to a real data example.
2 Method
2.1 Formulation of the Statistical Test
In this section, we formulate the statistical test for a composite endpoint consisting of death and hospitalization. Death is considered to be the event of higher priority. Each pair of subjects is compared first with respect to “time to death” and then with respect to “time to hospitalization.” Death censors the event of hospitalization but not vice versa (i.e., the potential hospitalization cannot be observed after death).
Let
and
be two random variables denoting the time to hospitalization and the time to death, respectively. These two variables are usually correlated to each other.
can right‐censor
but not vice versa. Let
denote the new treatment group and
the standard treatment group. In addition,
is the time to censoring, which is assumed to be independent of
given Z.
,
, and
, the observed data can be divided into the following four distinct categories and
is the category indicator ranging from 1 to 4.
- If
and
are observed, that is,
,
;
- If only
is observed, that is,
and
,
;
- If neither
nor
is observed, that is,
and
,
;
- If only
is observed, that is,
,
.
Let
and
. Let
be the corresponding event indicator for
where
if the time of hospitalization is observed; otherwise,
.
indicates the censoring status for
where
if the time of death is observed; otherwise,
. See Figure 1 for more details. The observed data
,
, are the independently identically distributed samples of
.
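The four observation categories listed above can be illustrated with a minimal Python sketch. The function and field names (`observe`, `y1`, `delta1`, `y2`, `delta2`) are hypothetical labels chosen for this illustration, and the sketch assumes continuous times so that ties are ignored.

```python
from typing import NamedTuple


class Observed(NamedTuple):
    y1: float      # follow-up time for hospitalization
    delta1: int    # 1 if the time of hospitalization is observed
    y2: float      # follow-up time for death
    delta2: int    # 1 if the time of death is observed
    category: int  # category indicator, 1 to 4


def observe(t_hosp: float, t_death: float, t_cens: float) -> Observed:
    """Map latent times to observed data (illustrative assumption:
    death and censoring both stop follow-up for hospitalization)."""
    y2 = min(t_death, t_cens)
    delta2 = int(t_death <= t_cens)
    y1 = min(t_hosp, y2)
    delta1 = int(t_hosp <= y2)
    if delta1 and delta2:
        category = 1   # both hospitalization and death observed
    elif delta1:
        category = 2   # only hospitalization observed
    elif not delta2:
        category = 3   # neither event observed
    else:
        category = 4   # only death observed
    return Observed(y1, delta1, y2, delta2, category)


print(observe(2.0, 5.0, 8.0))
print(observe(9.0, 5.0, 8.0))
```

In this sketch death right-censors hospitalization but hospitalization never censors death, which mirrors the one-directional censoring described in the text.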

,
, and
, where for any a and b,
.
- patient in new treatment arm has death first;
- patient in standard treatment arm has death first;
- patient in new treatment arm has hospitalization first;
- patient in standard treatment arm has hospitalization first;
- none of the above.

(1)
(2)
(3)
(4)
The total number of “wins” for the new treatment group is
, and the total number of “losses” for the new treatment group is
, where
are the total numbers of the comparisons in categories (a), (b), (c), and (d), respectively.
can be obtained by summing over the double index i and j for the index function in (2);
,
, and
can be similarly obtained from (1), (3), and (4), respectively.
The problem of interest then, is to compare the number of wins to the number of losses in the new treatment group. Pocock et al. (2012) defined the win ratio as
. For mathematical simplicity, we first work on the win difference
.
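The all-pairs comparison described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R code: the `Subject` fields and the functions `compare_pair` and `win_loss` are hypothetical names, times are assumed continuous (ties fall into "none of the above"), and a comparison is decided only when the earlier event is observed while the other subject is still under follow-up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Subject:
    y_hosp: float   # follow-up time for hospitalization
    d_hosp: int     # 1 if hospitalization observed
    y_death: float  # follow-up time for death
    d_death: int    # 1 if death observed


def compare_pair(new: Subject, std: Subject) -> str:
    """Classify one (new, standard) pair into categories a-e,
    checking death first and hospitalization second."""
    if new.d_death == 1 and new.y_death < std.y_death:
        return "a"  # patient in new treatment arm has death first
    if std.d_death == 1 and std.y_death < new.y_death:
        return "b"  # patient in standard treatment arm has death first
    if new.d_hosp == 1 and new.y_hosp < std.y_hosp:
        return "c"  # patient in new treatment arm has hospitalization first
    if std.d_hosp == 1 and std.y_hosp < new.y_hosp:
        return "d"  # patient in standard treatment arm has hospitalization first
    return "e"      # none of the above


def win_loss(new_arm, std_arm):
    """Count categories over all pairs; wins are (b) + (d), losses are (a) + (c)."""
    counts = Counter(compare_pair(i, j) for i in new_arm for j in std_arm)
    wins = counts["b"] + counts["d"]
    losses = counts["a"] + counts["c"]
    return counts, wins, losses


new_arm = [Subject(2, 1, 6, 1), Subject(10, 0, 10, 0)]
std_arm = [Subject(1, 1, 4, 1), Subject(3, 1, 8, 0)]
counts, wins, losses = win_loss(new_arm, std_arm)
print(dict(counts), "win ratio:", wins / losses, "win difference:", wins - losses)
```

The double loop visits every pair once, so the cost grows with the product of the two arm sizes.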
2.2 Null Hypothesis
through some straightforward but tedious calculations as shown in the Appendix. For ease of presentation, we assume that the joint distribution of
is continuous. Let
be the hazard function of
in group k,
be the conditional hazard of
in group k, and
be the hazard function of
in group k,
. Also, let
be the total hazard of censoring. We find
(5)
and
with

is used to test the null hypothesis
, which is the intersection of hypothesis
and hypothesis
:
:
;
:
.
is to test
and
is to test
. Similarly, we derive the expression for the true win ratio
as
(6)
is used to test the hypothesis
as well.
In summary, both
and
are used to jointly test whether there is any treatment effect on the hazard of death and/or on the conditional hazard of hospitalization given the observed information on death. If the new treatment arm demonstrates sufficient evidence of efficacy in delaying either death or hospitalization, the null hypothesis will be rejected. Both tests will have better power when both alternative hypotheses
and
are in the same direction, that is,
:
and
:
with at least one strict inequality.
2.3 Variance Estimation
The variance of
can be calculated by the technique of U‐statistics using Hájek's projection (Hájek, 1968). This variance expression involves the distributions of
and
; therefore, one may replace them with their consistent estimators to obtain a consistent variance estimator for
. This procedure can be quite cumbersome, as a good estimator for the bivariate distribution of
may not be easy to obtain. We therefore propose an alternative approach to variance estimation.
and
, define

,
. Let

be their corresponding estimates with “r” replaced by “R” in the above expressions. Using the exponential inequality for U‐statistics (Giné, Latała, and Zinn, 2000; Houdré and Reynaud‐Bouret, 2003), we can show that
almost surely, where
. The details can be obtained from the authors. With these approximations, we estimate the variance
of
as
, where, for
,
and
. Furthermore, we can show that
converges to
and
converges in distribution to a mean‐zero normal distribution with variance
, which can be consistently estimated by
, where

converges in distribution to a mean zero normal distribution with variance
, which can be consistently estimated by
; therefore, an approximate
confidence interval for
is given by

is the upper
percentile of the standard normal distribution. We provide R code in the Supplementary Materials to calculate
,
and their variance estimates.
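As an illustration of how a normal approximation for the log win ratio leads to a confidence interval, the sketch below builds the interval on the log scale and back-transforms it. The function name `log_scale_ci` and its arguments are hypothetical, the standard error of the log win ratio is taken as a given input rather than computed with the closed-form estimator of this section, and the input values in the usage line are made up for illustration.

```python
import math
from statistics import NormalDist


def log_scale_ci(win_ratio: float, se_log: float, level: float = 0.95):
    """Two-sided interval exp(log(R) -/+ z * se), where se_log is an
    estimated standard error of the log win ratio (assumed supplied)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # upper percentile
    log_wr = math.log(win_ratio)
    return math.exp(log_wr - z * se_log), math.exp(log_wr + z * se_log)


# Hypothetical inputs, for illustration only.
lower, upper = log_scale_ci(win_ratio=1.0, se_log=0.1)
print(lower, upper)
```

Working on the log scale keeps both limits positive and makes the interval symmetric around the estimate in ratio terms.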
3 Simulation
3.1 Simulation Setup
be the hazard rate for hospitalization, where
if a subject is on new treatment and
if on standard treatment. Similarly, let
be the hazard rate for death. We use three different joint distributions for
. The first distribution is a bivariate exponential distribution with Gumbel–Hougaard copula, which has the joint survival function:

is the parameter controlling the correlation between
and
(Kendall's concordance
equals
). The second distribution is a bivariate exponential distribution with bivariate normal copula, which has the joint distribution function:

is the distribution function of the (univariate) standard normal distribution with
being its inverse and
is the distribution function of the bivariate normal distribution with mean zero, variance one and correlation coefficient
. The last distribution is the Marshall–Olkin bivariate distribution, which has the joint survival function:

modulates the correlation between
and
.
Independent of
given
, the censoring variable
has an exponential distribution with rate
.
Throughout the simulation, we fixed parameters
,
,
,
,
,
and
. We then varied
and
in each distribution. We simulated a two‐arm parallel trial with 150 subjects per treatment group, so the sample size
. The number of replications was 1000 for each setting. The Gumbel‐Hougaard bivariate exponential distribution was generated using the R package “Gumbel” (Caillat et al., 2008; R Development Core Team, 2011).
3.2 Simulation Results
Table 1 summarizes the simulation results from different settings. Here we report the logarithm of the win ratio including its true value, mean of the estimates, sample variance, mean of the estimated variance and coverage probabilities. We also report the empirical powers of the log win ratio statistic and other commonly used statistics such as log‐rank and Gehan tests based on
and
. Note that different test statistics test different null hypotheses; however, they are often used interchangeably to evaluate whether the new treatment is efficacious. The empirical power is calculated as the proportion of
in the 1000 replicates, where T is the test statistic (e.g., log win ratio, log‐rank statistic, Gehan statistic) and
is the corresponding estimated standard deviation.
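The empirical power calculation can be sketched as the share of replicates in which the standardized statistic exceeds a normal critical value. The rejection rule below (two-sided, level 0.05) is an assumption made for this illustration, and `empirical_power` is a hypothetical helper name; the toy inputs are not simulation output.

```python
from statistics import NormalDist


def empirical_power(stats, sds, alpha: float = 0.05) -> float:
    """Proportion of replicates with |T| / sd above the two-sided
    normal critical value (assumed rejection rule)."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = sum(1 for t, s in zip(stats, sds) if abs(t) / s > crit)
    return rejections / len(stats)


# Toy replicates: test statistics T and their estimated standard deviations.
toy_stats = [0.5, 2.5, -3.0, 1.0]
toy_sds = [1.0, 1.0, 1.0, 1.0]
print(empirical_power(toy_stats, toy_sds))
```

Under the null hypothesis the same quantity estimates the type I error rate, which is how the rows with no treatment effect in Table 1 can be read.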
| Dist. | Varied param. 1 | Varied param. 2 | True log WR | Mean | SV | EV | CP 80 | CP 90 | CP 95 | Power WR | Power LRD | Power GD | Power LRM | Power GM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GH | 0.0 | 0.0 | 0.00 | 0.00 | 8.80 | 8.73 | 80 | 90 | 95 | 5 | 5 | 5 | 6 | 6 |
| GH | 0.2 | 0.5 | 0.29 | 0.29 | 9.45 | 9.54 | 80 | 90 | 95 | 36 | 22 | 17 | 65 | 54 |
| GH | 0.3 | 0.3 | 0.30 | 0.30 | 9.45 | 9.40 | 80 | 90 | 96 | 39 | 41 | 33 | 52 | 39 |
| GH | 0.5 | 0.2 | 0.38 | 0.39 | 9.42 | 9.47 | 81 | 90 | 95 | 57 | 79 | 69 | 51 | 39 |
| BN | 0.0 | 0.0 | 0.00 | 0.00 | 7.92 | 8.07 | 81 | 90 | 96 | 5 | 5 | 5 | 6 | 5 |
| BN | 0.2 | 0.5 | 0.28 | 0.28 | 8.77 | 8.80 | 81 | 91 | 95 | 38 | 22 | 19 | 63 | 52 |
| BN | 0.3 | 0.3 | 0.29 | 0.30 | 8.61 | 8.66 | 80 | 89 | 95 | 40 | 41 | 33 | 48 | 38 |
| BN | 0.5 | 0.2 | 0.38 | 0.38 | 8.69 | 8.75 | 81 | 90 | 94 | 61 | 81 | 67 | 53 | 40 |
| MO | 0.0 | 0.0 | 0.00 | 0.00 | 6.66 | 6.68 | 81 | 91 | 95 | 5 | 6 | 5 | 6 | 5 |
| MO | 0.2 | 0.5 | 0.13 | 0.13 | 6.90 | 6.91 | 81 | 91 | 95 | 13 | 12 | 8 | 38 | 29 |
| MO | 0.3 | 0.3 | 0.14 | 0.14 | 6.75 | 6.85 | 80 | 91 | 95 | 16 | 15 | 12 | 28 | 21 |
| MO | 0.5 | 0.2 | 0.19 | 0.20 | 6.82 | 6.87 | 81 | 91 | 95 | 26 | 29 | 22 | 32 | 24 |
GH, Gumbel–Hougaard bivariate exponential; BN, bivariate exponential with bivariate normal copula; MO, Marshall–Olkin bivariate exponential;
, the logarithm of the win ratio; Mean, mean of the estimates
; SV, simulation variance of
; EV, mean of the estimated variances of
; CP, coverage probability calculated based on the estimated variances; Power, empirical power; WR, test based on win ratio statistic; LRD and GD, log‐rank and Gehan tests based on death; LRM and GM, log‐rank and Gehan tests based on first occurrence of death and hospitalization.
The results show that, regardless of the magnitude of treatment effect size and type of joint distributions, the (log) win ratio estimate approximates the true value reasonably well, the proposed closed‐form variance estimate is very close to the sample variance, and the coverage probabilities are close to the nominal levels. We also observed that the order of empirical powers across different tests varies depending on the magnitude of treatment effect size and the type of joint distributions, which is not surprising as these tests are for different null hypotheses.
4 Real Data Analysis
ATLAS ACS 2 TIMI 51 was a double‐blind, placebo‐controlled, randomized trial to investigate the effect of Rivaroxaban in preventing cardiovascular outcomes in patients with acute coronary syndrome (Mega et al., 2012). For illustration purposes, we re‐analyzed the events of MI, stroke and death that occurred during the first 90 days after randomization among subjects in the Rivaroxaban 2.5 mg and placebo treatment arms with intention to use aspirin and thienopyridine at baseline.
Table 2 presents the results using the traditional analysis including Cox proportional hazards model, log‐rank test and Gehan test for the time to death and the time to the first occurrence of MI, stroke or death. The composite event occurred in 2.8% (132/4765) and 3.6% (170/4760) of Rivaroxaban and placebo subjects, respectively. The hazard ratio is 0.78 with confidence interval (0.62, 0.97). The hazard ratio can be interpreted as the hazard of experiencing the composite endpoint for an individual on the Rivaroxaban arm relative to an individual on the placebo arm. A hazard ratio of 0.78 with the upper confidence limit less than 1 demonstrates that Rivaroxaban reduced the risk of experiencing MI, stroke or death.
| | Riva | Placebo | HR and CI | Log‐rank | Gehan |
|---|---|---|---|---|---|
| No. of patients | 4765 | 4760 | | | |
| No. of patients having either MI, stroke, or death | 132 | 170 | 0.78 (0.62, 0.97) | | |
| No. of patients having death after MI or stroke | 7 | 14 | | | |
| No. of deaths | 44 | 64 | 0.69 (0.47, 1.01) | | |

p‐values are based on Wald tests.
Table 3 presents the win ratio results where death is considered of higher priority than MI/stroke. Every subject in the Rivaroxaban arm was compared to every subject in the placebo arm, which resulted in a total of 22,681,400 (4765 × 4760) patient pairs. Among these pairs, Rivaroxaban had 292,132 wins on death and then 480,373 wins on MI/stroke, and had 202,868 losses on death and then 392,886 losses on MI/stroke. The win ratio of Rivaroxaban is therefore 1.30, calculated as the total number of wins divided by the total number of losses: (292,132 + 480,373)/(202,868 + 392,886). A win ratio of 1.30 with the lower 95% confidence limit greater than 1 shows that Rivaroxaban was effective in delaying the occurrence of MI, stroke or death.
| (a) Death on Riva first | 202,868 |
| (b) Death on Placebo first | 292,132 |
| (c) MI or stroke on Riva first | 392,886 |
| (d) MI or stroke on Placebo first | 480,373 |
| Total No. of pairs | 22,681,400 |
| Win ratio | 1.30 |
| 95% CI | |
| p‐value | 0.025 |
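The counts in Table 3 can be cross-checked with a few lines of Python. The variable names are chosen for this illustration; the numbers are those reported in the table and in the surrounding text.

```python
# Arm sizes and pair counts as reported for the ATLAS ACS 2 TIMI 51 re-analysis.
n_riva, n_placebo = 4765, 4760
pair_counts = {
    "a": 202_868,  # death on Riva first (loss)
    "b": 292_132,  # death on placebo first (win)
    "c": 392_886,  # MI or stroke on Riva first (loss)
    "d": 480_373,  # MI or stroke on placebo first (win)
}

total_pairs = n_riva * n_placebo
wins = pair_counts["b"] + pair_counts["d"]
losses = pair_counts["a"] + pair_counts["c"]
undecided = total_pairs - wins - losses  # pairs in none of (a)-(d)
win_ratio = wins / losses

print(total_pairs, wins, losses, undecided, round(win_ratio, 2))
```

Most pairs are undecided because few subjects experienced an event within 90 days, so the win ratio rests on the minority of pairs in categories (a) to (d).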
Both approaches provide evidence that Rivaroxaban is efficacious in preventing MI, stroke or death within the first 90 days after randomization.
5 Discussion
In this article, we formulate the win ratio approach within a statistical framework accounting for the “time to event” nature of the data. We found that the win ratio statistic tests the intersection of two null hypotheses: (1) no effect on the hazard rate for death over time; (2) no effect on the hazard rate for hospitalization, conditional on death, over time. The statistical formulation of the win ratio approach helps us understand the potential performance of this novel approach and identify situations where it can perform better than the traditional time to first event analysis. Further work could improve the performance of the current win ratio method.
Furthermore, we derived an explicit variance estimator for the win ratio statistic, which is computationally efficient and has a relatively simple form. Results from our simulation experiment show that the proposed variance estimate is close to the sample variance; thus, the confidence interval based on the derived variance provides good coverage probability.
Methodology discussed here can be extended to any composite endpoint consisting of both fatal and non‐fatal outcomes, and can be extended to the case with more than one composite endpoint and multiple events in a similar fashion.
6 Supplementary Materials
R code implementing the variance estimation method is available with this article at the Biometrics website on Wiley Online Library.
Acknowledgements
The authors thank Dr. Ying Kuen Cheung for his helpful discussions on this article. Xiaodong Luo was partly supported by National Institutes of Health grant P50AG05138. Wei Yann Tsai was partially supported by National Center for Theoretical Sciences (South), Taiwan.
Appendix A
1 Proof of 5



’ is by integration‐by‐parts on
. Therefore,

References