Volume 71, Issue 1
BIOMETRIC METHODOLOGY

An alternative approach to confidence interval estimation for the win ratio statistic

Xiaodong Luo

Corresponding Author

Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York 10029, U.S.A.

email: xiaodong.luo@mssm.edu
Hong Tian

Janssen Research and Development, Raritan, New Jersey 08869, U.S.A.

Surya Mohanty

Janssen Research and Development, Raritan, New Jersey 08869, U.S.A.

Wei Yann Tsai

Department of Biostatistics, Columbia University, New York, New York 10032, U.S.A.

First published: 25 August 2014

Summary

Pocock et al. (2012, European Heart Journal 33, 176–182) proposed a win ratio approach to analyzing composite endpoints composed of outcomes with different clinical priorities. In this article, we establish a statistical framework for this approach. We derive the null hypothesis and propose a closed‐form variance estimator for the win ratio statistic in the all‐pairwise matching situation. Our simulation study shows that the proposed variance estimator performs well regardless of the magnitude of the treatment effect size and the type of the joint distribution of the outcomes.

1 Introduction

In cardiovascular (CV) trials, the endpoint of interest is often a composite of two or more types of related clinical events (e.g., composite of CV death, myocardial infarction and stroke; composite of hospitalization and death) and the analysis typically focuses on the time to first event. This approach is not ideal as it reduces a multi‐dimensional problem into a one‐dimensional problem by analyzing only the occurrence of the first of several events and ignores the subsequent, sometimes more important events such as CV death.

Data consisting of a non‐fatal event and a fatal event are called semi‐competing risks data (Fine, Jiang, and Chappell, 2001). Data of this kind have been modeled by assuming that the joint survivor function of the two event times follows a copula, with observations restricted to an area where fatal event time is greater than non‐fatal event time (Fine et al., 2001; Peng and Fine, 2007). Xu, Kalbfleisch, and Tai (2010) noted that semi‐competing risks data are in fact illness‐death data (Fix and Neyman, 1951; Sverdrup, 1965). As such, techniques developed for illness‐death data can also be applied to semi‐competing risks data (Andersen et al., 1993; Kalbfleisch and Prentice, 2002; Xu et al., 2010; Allignol et al., 2013). For situations where the non‐fatal component includes recurrent events, Ghosh and Lin (2000, 2002) used the mean cumulative incidence function adjusted for death. Liu, Wolfe, and Huang (2004), Cowling, Hutton, and Shaw (2006) and Zeng and Lin (2009) jointly modeled the recurrent events and the fatal event based on shared frailty models.

Analysis of semi‐competing risks data is complicated by the fact that fatal events can censor non‐fatal events, potentially resulting in informative censoring. The traditional time‐to‐event analysis considers only the first event; however, non‐fatal events always occur before death. Consequently, in the traditional analysis, the non‐fatal events are given higher priority than the far more serious event of death.

Pocock et al. (2012) proposed an alternative method to analyze composite endpoints to account for the order of clinical priorities. Subjects from the new treatment and standard treatment arms are first paired, and subsequently, within each pair, a “winner” or “loser” is determined by comparing “time to component events” sequentially according to the order of the clinical priorities. A win ratio comparing the new and standard treatments can then be computed. Pocock et al. (2012) discussed “matched” and “unmatched” versions of the win ratio statistic separately. In the “matched” version, each subject from the new treatment arm is matched with a unique subject from the standard treatment arm based on a metric, which is a function of baseline risk scores. In the “unmatched” version, all possible pairings between the subjects in the new treatment arm and the subjects in the standard treatment arm are considered. In practice, implementation of the matched pair approach is not trivial. It necessitates the calculation of a baseline risk score, which involves first identifying risk factors, and then assigning them appropriate weights, both of which can be challenging and very subjective. Furthermore, with the “matched” approach, a given dataset might not result in a unique estimate of the win ratio. This can occur, for example, when the two treatment groups do not have an equal number of subjects or when a few subjects have the same baseline risk score. Finally, we have the basic problem with matching, which is that it might not be possible when the risk factors differ for the components of a composite endpoint.
Therefore, in this article, we focus our discussion on the “unmatched” version of the win ratio statistic. It requires neither matching on baseline risk scores nor a subjective choice of scoring method and weights, and it reduces the potential loss of information due to a lack of matched pairs. These properties make it much more desirable as a test statistic than the “matched” version.

The win ratio approach seems intuitive and therefore appealing to researchers. However, in the original article, unlike classical methods for analyzing time‐to‐event data (e.g., log‐rank test, Cox proportional hazards model), the null hypothesis being tested using the win ratio statistic is unclear. Also, for the “unmatched” win ratio, the original article does not provide a closed‐form solution for the variance, but recommends a computationally intensive re‐sampling estimation method.

In this article, we have developed a statistical framework for using the win ratio approach by providing the null hypothesis to be tested. We believe that our effort will not only lead to a better understanding of the win ratio approach, but also provide a new methodology for the analysis of semi‐competing risks data. Using the U‐statistics technique, we provide a closed‐form variance estimator for the win ratio in the “unmatched” situation. A simulation experiment is used to evaluate the performance of the proposed variance estimator. Finally, we illustrate the application of our methodology to a real data example.

2 Method

2.1 Formulation of the Statistical Test

In this section, we formulate the statistical test for a composite endpoint consisting of death and hospitalization. Death is considered to be the event of higher priority. Each pair of subjects is compared first with respect to “time to death” and then with respect to “time to hospitalization.” Death censors the event of hospitalization but not vice versa (i.e., the potential hospitalization cannot be observed after death).

Let $T_H$ and $T_D$ be two random variables denoting the time to hospitalization and the time to death, respectively. These two variables are usually correlated with each other. $T_D$ can right‐censor $T_H$ but not vice versa. Let $Z = 1$ denote the new treatment group and $Z = 0$ the standard treatment group. In addition, $T_C$ is the time to censoring, which is assumed to be independent of $(T_H, T_D)$ given $Z$.

Based on the ranking of urn:x-wiley:15410420:media:biom12225:biom12225-math-0009, urn:x-wiley:15410420:media:biom12225:biom12225-math-0010, and urn:x-wiley:15410420:media:biom12225:biom12225-math-0011, the observed data can be divided into the following four distinct categories and urn:x-wiley:15410420:media:biom12225:biom12225-math-0012 is the category indicator ranging from 1 to 4.
  1. If urn:x-wiley:15410420:media:biom12225:biom12225-math-0013 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0014 are observed, that is, urn:x-wiley:15410420:media:biom12225:biom12225-math-0015, urn:x-wiley:15410420:media:biom12225:biom12225-math-0016;
  2. If only urn:x-wiley:15410420:media:biom12225:biom12225-math-0017 is observed, that is, urn:x-wiley:15410420:media:biom12225:biom12225-math-0018 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0019, urn:x-wiley:15410420:media:biom12225:biom12225-math-0020;
  3. If neither urn:x-wiley:15410420:media:biom12225:biom12225-math-0021 nor urn:x-wiley:15410420:media:biom12225:biom12225-math-0022 is observed, that is, urn:x-wiley:15410420:media:biom12225:biom12225-math-0023 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0024, urn:x-wiley:15410420:media:biom12225:biom12225-math-0025;
  4. If only urn:x-wiley:15410420:media:biom12225:biom12225-math-0026 is observed, that is, urn:x-wiley:15410420:media:biom12225:biom12225-math-0027, urn:x-wiley:15410420:media:biom12225:biom12225-math-0028.

Let urn:x-wiley:15410420:media:biom12225:biom12225-math-0029 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0030. Let urn:x-wiley:15410420:media:biom12225:biom12225-math-0031 be the corresponding event indicator for urn:x-wiley:15410420:media:biom12225:biom12225-math-0032 where urn:x-wiley:15410420:media:biom12225:biom12225-math-0033 if the time of hospitalization is observed; otherwise, urn:x-wiley:15410420:media:biom12225:biom12225-math-0034. urn:x-wiley:15410420:media:biom12225:biom12225-math-0035 indicates the censoring status for urn:x-wiley:15410420:media:biom12225:biom12225-math-0036 where urn:x-wiley:15410420:media:biom12225:biom12225-math-0037 if the time of death is observed; otherwise, urn:x-wiley:15410420:media:biom12225:biom12225-math-0038. See Figure 1 for more details. The observed data urn:x-wiley:15410420:media:biom12225:biom12225-math-0039, urn:x-wiley:15410420:media:biom12225:biom12225-math-0040, are the independently identically distributed samples of urn:x-wiley:15410420:media:biom12225:biom12225-math-0041.
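The mapping from latent times to observed data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two observed follow-up times are taken to be the minimum of the hospitalization, death and censoring times and the minimum of the death and censoring times, respectively; ties are resolved in favor of the event; and the names `Observed`, `observe`, `y1`, `y2`, `delta1`, `delta2` are hypothetical labels, not the article's notation.

```python
from typing import NamedTuple


class Observed(NamedTuple):
    y1: float      # follow-up time for hospitalization (assumed: min of all three times)
    y2: float      # follow-up time for death (assumed: min of death and censoring)
    delta1: int    # 1 if the time of hospitalization is observed, else 0
    delta2: int    # 1 if the time of death is observed, else 0
    category: int  # category 1-4 as listed in the text


def observe(t_h: float, t_d: float, t_c: float) -> Observed:
    """Turn latent times (hospitalization, death, censoring) into observed data."""
    y1 = min(t_h, t_d, t_c)
    y2 = min(t_d, t_c)
    # Hospitalization is seen only if it precedes both death and censoring.
    delta1 = int(t_h <= min(t_d, t_c))
    # Death is seen only if it precedes censoring.
    delta2 = int(t_d <= t_c)
    if delta1 and delta2:
        category = 1  # both hospitalization and death observed
    elif delta1:
        category = 2  # only hospitalization observed
    elif not delta2:
        category = 3  # neither observed (censored first)
    else:
        category = 4  # only death observed
    return Observed(y1, y2, delta1, delta2, category)
```

For example, a subject with latent times (4, 2, 3) dies before any hospitalization and before censoring, so only death is observed and both follow-up times equal 2.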

Figure 1. Subject classification based on the ranking of urn:x-wiley:15410420:media:biom12225:biom12225-math-0042, urn:x-wiley:15410420:media:biom12225:biom12225-math-0043, and urn:x-wiley:15410420:media:biom12225:biom12225-math-0044, where for any a and b, urn:x-wiley:15410420:media:biom12225:biom12225-math-0045.

Each patient pair consists of one subject from the new treatment arm and one from the standard treatment arm. The comparison is first based on information on the “time to death”; if a tie cannot be broken using the death information, the “time to hospitalization” is used for comparison. Figure 2 illustrates the steps involved in conducting this comparison within each patient pair. Pocock et al. (2012) classified the comparison within a patient pair into five possible scenarios:
  (a) patient in new treatment arm has death first;
  (b) patient in standard treatment arm has death first;
  (c) patient in new treatment arm has hospitalization first;
  (d) patient in standard treatment arm has hospitalization first;
  (e) none of the above.

Figure 2. Comparison of new and standard treatments within each pair.

For two different patients i and j, suppose patient i is from the standard treatment group and patient j is from the new treatment group. If the comparison between these two patients falls into scenario (b), where the patient in the standard treatment arm has death first regardless of the information on hospitalization, then the new treatment is a winner. A win on delaying death for the new treatment can be claimed only when the time to death for patient i is observed and the time to censoring or death for patient j is greater. These conditions are met if and only if the following indicator function is equal to 1.
urn:x-wiley:15410420:media:biom12225:biom12225-math-0046(1)
Similarly, the comparison falls into scenario (a) if and only if the following indicator function is equal to 1.
urn:x-wiley:15410420:media:biom12225:biom12225-math-0047(2)
When the comparison does not fall in either (a) or (b), that is, a tie cannot be broken using the “time to death” information, the “time to hospitalization” information will be used to determine a winner. The new treatment can claim a win on delaying hospitalization if patient i has hospitalization first. In this case, the comparison falls into scenario (d), which means the following indicator function is equal to 1.
urn:x-wiley:15410420:media:biom12225:biom12225-math-0048(3)
Similarly, the comparison falls into scenario (c) if and only if the following indicator function is equal to 1.
urn:x-wiley:15410420:media:biom12225:biom12225-math-0049(4)
All the other situations are gathered in scenario (e), which means no winner can be determined after the death and hospitalization information has been exploited.

The total number of “wins” for the new treatment group is urn:x-wiley:15410420:media:biom12225:biom12225-math-0050, and the total number of “losses” for the new treatment group is urn:x-wiley:15410420:media:biom12225:biom12225-math-0051, where urn:x-wiley:15410420:media:biom12225:biom12225-math-0052 are the total numbers of the comparisons in categories (a), (b), (c), and (d), respectively. urn:x-wiley:15410420:media:biom12225:biom12225-math-0053 can be obtained by summing over the double index i and j for the indicator function in (2); urn:x-wiley:15410420:media:biom12225:biom12225-math-0054, urn:x-wiley:15410420:media:biom12225:biom12225-math-0055, and urn:x-wiley:15410420:media:biom12225:biom12225-math-0056 can be similarly obtained from (1), (3), and (4), respectively.

The problem of interest then, is to compare the number of wins to the number of losses in the new treatment group. Pocock et al. (2012) defined the win ratio as urn:x-wiley:15410420:media:biom12225:biom12225-math-0057. For mathematical simplicity, we first work on the win difference urn:x-wiley:15410420:media:biom12225:biom12225-math-0058.
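The pairwise comparison and the resulting counts can be sketched as follows. The sketch follows the verbal description of scenarios (a) to (e) rather than the indicator functions (1) to (4) themselves, so the strict inequalities and the handling of ties are assumptions; each subject is represented by a hypothetical tuple `(y1, y2, delta1, delta2)` of follow-up times and event indicators for hospitalization and death, and the function names are illustrative only.

```python
from typing import Dict, Sequence, Tuple

# (y1, y2, delta1, delta2): follow-up time and event indicator for
# hospitalization (y1, delta1) and for death (y2, delta2). Hypothetical layout.
Subject = Tuple[float, float, int, int]


def compare(std: Subject, new: Subject) -> str:
    """Classify one pair (standard-arm subject i, new-arm subject j) into (a)-(e)."""
    y1_i, y2_i, d1_i, d2_i = std
    y1_j, y2_j, d1_j, d2_j = new
    if d2_i == 1 and y2_j > y2_i:
        return "b"  # standard-arm patient dies first: win on death
    if d2_j == 1 and y2_i > y2_j:
        return "a"  # new-arm patient dies first: loss on death
    if d1_i == 1 and y1_j > y1_i:
        return "d"  # standard-arm patient hospitalized first: win
    if d1_j == 1 and y1_i > y1_j:
        return "c"  # new-arm patient hospitalized first: loss
    return "e"      # no winner can be determined


def win_statistics(
    standard: Sequence[Subject], new: Sequence[Subject]
) -> Tuple[Dict[str, int], int, float]:
    """Count scenarios over all pairs; return (counts, win difference, win ratio)."""
    counts = dict.fromkeys("abcde", 0)
    for s in standard:
        for t in new:
            counts[compare(s, t)] += 1
    wins = counts["b"] + counts["d"]
    losses = counts["a"] + counts["c"]
    ratio = wins / losses if losses else float("inf")
    return counts, wins - losses, ratio
```

The double loop visits every (standard, new) pair once, which mirrors the “unmatched” version of the statistic; for large trials a vectorized or sorted implementation would be preferable.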

2.2 Null Hypothesis

To investigate which quantity the win ratio is truly estimating, we calculate the expected value of urn:x-wiley:15410420:media:biom12225:biom12225-math-0059 through some straightforward but tedious calculations as shown in the Appendix. For easy presentation, we assume that the joint distribution of urn:x-wiley:15410420:media:biom12225:biom12225-math-0060 is continuous. Let urn:x-wiley:15410420:media:biom12225:biom12225-math-0061 be the hazard function of urn:x-wiley:15410420:media:biom12225:biom12225-math-0062 in group k, urn:x-wiley:15410420:media:biom12225:biom12225-math-0063 be the conditional hazard of urn:x-wiley:15410420:media:biom12225:biom12225-math-0064 in group k, and urn:x-wiley:15410420:media:biom12225:biom12225-math-0065 be the hazard function of urn:x-wiley:15410420:media:biom12225:biom12225-math-0066 in group k, urn:x-wiley:15410420:media:biom12225:biom12225-math-0067. Also, let urn:x-wiley:15410420:media:biom12225:biom12225-math-0068 be the total hazard of censoring. We find
urn:x-wiley:15410420:media:biom12225:biom12225-math-0069(5)
where urn:x-wiley:15410420:media:biom12225:biom12225-math-0070 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0071 with
urn:x-wiley:15410420:media:biom12225:biom12225-math-0072
Hence, the test statistic urn:x-wiley:15410420:media:biom12225:biom12225-math-0073 is used to test the null hypothesis urn:x-wiley:15410420:media:biom12225:biom12225-math-0074, which is the intersection of hypothesis urn:x-wiley:15410420:media:biom12225:biom12225-math-0075 and hypothesis urn:x-wiley:15410420:media:biom12225:biom12225-math-0076:
  • urn:x-wiley:15410420:media:biom12225:biom12225-math-0077: urn:x-wiley:15410420:media:biom12225:biom12225-math-0078;
  • urn:x-wiley:15410420:media:biom12225:biom12225-math-0079: urn:x-wiley:15410420:media:biom12225:biom12225-math-0080.
Furthermore, urn:x-wiley:15410420:media:biom12225:biom12225-math-0081 is to test urn:x-wiley:15410420:media:biom12225:biom12225-math-0082 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0083 is to test urn:x-wiley:15410420:media:biom12225:biom12225-math-0084. Similarly, we derive the expression for the true win ratio urn:x-wiley:15410420:media:biom12225:biom12225-math-0085 as
urn:x-wiley:15410420:media:biom12225:biom12225-math-0086(6)
Thus, the win ratio urn:x-wiley:15410420:media:biom12225:biom12225-math-0087 is used to test the hypothesis urn:x-wiley:15410420:media:biom12225:biom12225-math-0088 as well.

In summary, both urn:x-wiley:15410420:media:biom12225:biom12225-math-0089 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0090 are used to jointly test if there is any treatment effect on the hazard of death and/or on the conditional hazard of hospitalization given the observed information on death. If the new treatment arm demonstrates sufficient evidence of efficacy in delaying either death or hospitalization, the null hypothesis will be rejected. Both tests will have better power when the alternative hypotheses urn:x-wiley:15410420:media:biom12225:biom12225-math-0091 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0092 are in the same direction, that is, urn:x-wiley:15410420:media:biom12225:biom12225-math-0093: urn:x-wiley:15410420:media:biom12225:biom12225-math-0094 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0095: urn:x-wiley:15410420:media:biom12225:biom12225-math-0096 with at least one strict inequality.

2.3 Variance Estimation

The variance of urn:x-wiley:15410420:media:biom12225:biom12225-math-0097 can be calculated by the technique of U‐statistics using Hájek's projection (Hájek, 1968). This variance expression involves the distributions of urn:x-wiley:15410420:media:biom12225:biom12225-math-0098 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0099; therefore, one may replace them with their consistent estimators to obtain a consistent variance estimator for urn:x-wiley:15410420:media:biom12225:biom12225-math-0100. This procedure can be quite cumbersome, as a good estimator for the bivariate distribution of urn:x-wiley:15410420:media:biom12225:biom12225-math-0101 may not be easy to obtain. We propose an alternative approach to derive the variance estimator.

For any urn:x-wiley:15410420:media:biom12225:biom12225-math-0102 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0103, define
urn:x-wiley:15410420:media:biom12225:biom12225-math-0104
and define urn:x-wiley:15410420:media:biom12225:biom12225-math-0105, urn:x-wiley:15410420:media:biom12225:biom12225-math-0106. Let
urn:x-wiley:15410420:media:biom12225:biom12225-math-0107
Let urn:x-wiley:15410420:media:biom12225:biom12225-math-0108 be their corresponding estimates with “r” replaced by “R” in the above expressions. Using the exponential inequality for U‐statistics (Giné, Latała, and Zinn, 2000; Houdré and Reynaud‐Bouret, 2003), we can show that urn:x-wiley:15410420:media:biom12225:biom12225-math-0109 almost surely, where urn:x-wiley:15410420:media:biom12225:biom12225-math-0110. The details can be obtained from the authors. With these approximations, we estimate the variance urn:x-wiley:15410420:media:biom12225:biom12225-math-0111 of urn:x-wiley:15410420:media:biom12225:biom12225-math-0112 as urn:x-wiley:15410420:media:biom12225:biom12225-math-0113, where, for urn:x-wiley:15410420:media:biom12225:biom12225-math-0114, urn:x-wiley:15410420:media:biom12225:biom12225-math-0115 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0116. Furthermore, we can show that urn:x-wiley:15410420:media:biom12225:biom12225-math-0117 converges to urn:x-wiley:15410420:media:biom12225:biom12225-math-0118 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0119 converges in distribution to a mean‐zero normal distribution with variance urn:x-wiley:15410420:media:biom12225:biom12225-math-0120, which can be consistently estimated by urn:x-wiley:15410420:media:biom12225:biom12225-math-0121, where
urn:x-wiley:15410420:media:biom12225:biom12225-math-0122
With the logarithmic transformation, urn:x-wiley:15410420:media:biom12225:biom12225-math-0123 converges in distribution to a mean‐zero normal distribution with variance urn:x-wiley:15410420:media:biom12225:biom12225-math-0124, which can be consistently estimated by urn:x-wiley:15410420:media:biom12225:biom12225-math-0125. Therefore, an approximate urn:x-wiley:15410420:media:biom12225:biom12225-math-0126 confidence interval for urn:x-wiley:15410420:media:biom12225:biom12225-math-0127 is given by
urn:x-wiley:15410420:media:biom12225:biom12225-math-0128
where urn:x-wiley:15410420:media:biom12225:biom12225-math-0129 is the upper urn:x-wiley:15410420:media:biom12225:biom12225-math-0130 percentile of the standard normal distribution. We provide R code in the Supplementary Materials to calculate urn:x-wiley:15410420:media:biom12225:biom12225-math-0131, urn:x-wiley:15410420:media:biom12225:biom12225-math-0132 and their variance estimates.
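The log‐scale interval described above can be illustrated with a short Python sketch. It assumes the standard error of the log win ratio has already been obtained (for instance from the closed‐form estimator, which is not reproduced here) and is supplied on the scale of the estimate itself, that is, already divided by the square root of the sample size where applicable. The function name `win_ratio_ci` and its arguments are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def win_ratio_ci(win_ratio: float, se_log: float, alpha: float = 0.05) -> Tuple[float, float]:
    """Approximate 100(1 - alpha)% confidence interval for the win ratio.

    win_ratio: estimated win ratio (must be positive).
    se_log:    estimated standard error of log(win_ratio); assumed to be given.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 normal quantile
    log_wr = math.log(win_ratio)
    # Build the interval on the log scale, then transform back.
    return math.exp(log_wr - z * se_log), math.exp(log_wr + z * se_log)
```

Working on the log scale keeps both limits positive and makes the interval symmetric in ratio terms: the product of the two limits equals the squared point estimate.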

3 Simulation

3.1 Simulation Setup

Let urn:x-wiley:15410420:media:biom12225:biom12225-math-0141 be the hazard rate for hospitalization, where urn:x-wiley:15410420:media:biom12225:biom12225-math-0142 if a subject is on new treatment and urn:x-wiley:15410420:media:biom12225:biom12225-math-0143 if on standard treatment. Similarly, let urn:x-wiley:15410420:media:biom12225:biom12225-math-0144 be the hazard rate for death. We use three different joint distributions for urn:x-wiley:15410420:media:biom12225:biom12225-math-0145. The first distribution is a bivariate exponential distribution with Gumbel–Hougaard copula, which has the joint survival function:
urn:x-wiley:15410420:media:biom12225:biom12225-math-0146
where urn:x-wiley:15410420:media:biom12225:biom12225-math-0147 is the parameter controlling the correlation between urn:x-wiley:15410420:media:biom12225:biom12225-math-0148 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0149 (Kendall's concordance urn:x-wiley:15410420:media:biom12225:biom12225-math-0150 equals urn:x-wiley:15410420:media:biom12225:biom12225-math-0151). The second distribution is a bivariate exponential distribution with bivariate normal copula, which has the joint distribution function:
urn:x-wiley:15410420:media:biom12225:biom12225-math-0152
where urn:x-wiley:15410420:media:biom12225:biom12225-math-0153 is the distribution function of the (univariate) standard normal distribution with urn:x-wiley:15410420:media:biom12225:biom12225-math-0154 being its inverse and urn:x-wiley:15410420:media:biom12225:biom12225-math-0155 is the distribution function of the bivariate normal distribution with mean zero, variance one and correlation coefficient urn:x-wiley:15410420:media:biom12225:biom12225-math-0156. The last distribution is the Marshall–Olkin bivariate distribution, which has the joint survival function:
urn:x-wiley:15410420:media:biom12225:biom12225-math-0157
where urn:x-wiley:15410420:media:biom12225:biom12225-math-0158 modulates the correlation between urn:x-wiley:15410420:media:biom12225:biom12225-math-0159 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0160.

Independent of urn:x-wiley:15410420:media:biom12225:biom12225-math-0161 given urn:x-wiley:15410420:media:biom12225:biom12225-math-0162, the censoring variable urn:x-wiley:15410420:media:biom12225:biom12225-math-0163 has an exponential distribution with rate urn:x-wiley:15410420:media:biom12225:biom12225-math-0164.
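As an illustration of how one such setting could be generated, the sketch below draws latent times from a Marshall–Olkin bivariate exponential distribution using its standard common‐shock construction, together with an independent exponential censoring time. The parametrization is an assumption: the rates `lam_h`, `lam_d`, `lam_common` and `lam_c` are hypothetical stand‐ins and do not reproduce the parameter values or the treatment‐effect parametrization used in the simulation study.

```python
import random
from typing import List, Tuple


def simulate_marshall_olkin(
    n: int,
    lam_h: float,
    lam_d: float,
    lam_common: float,
    lam_c: float,
    seed: int = 0,
) -> List[Tuple[float, float, float]]:
    """Draw n triples (t_h, t_d, t_c) of latent times.

    Common-shock construction: t_h = min(E_h, E_common), t_d = min(E_d, E_common),
    where E_h, E_d, E_common are independent exponentials. The shared shock
    induces positive dependence between hospitalization and death times.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        shock_h = rng.expovariate(lam_h)
        shock_d = rng.expovariate(lam_d)
        # A zero common rate means no shared shock, i.e. independent times.
        shock_common = rng.expovariate(lam_common) if lam_common > 0 else float("inf")
        t_h = min(shock_h, shock_common)
        t_d = min(shock_d, shock_common)
        t_c = rng.expovariate(lam_c)  # censoring, independent of (t_h, t_d)
        draws.append((t_h, t_d, t_c))
    return draws
```

Under this construction the marginal distribution of each event time is exponential with rate equal to its own rate plus the common rate, and the two times coincide with positive probability whenever the common shock arrives first.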

Throughout the simulation, we fixed parameters urn:x-wiley:15410420:media:biom12225:biom12225-math-0165, urn:x-wiley:15410420:media:biom12225:biom12225-math-0166, urn:x-wiley:15410420:media:biom12225:biom12225-math-0167, urn:x-wiley:15410420:media:biom12225:biom12225-math-0168, urn:x-wiley:15410420:media:biom12225:biom12225-math-0169, urn:x-wiley:15410420:media:biom12225:biom12225-math-0170 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0171. We then varied urn:x-wiley:15410420:media:biom12225:biom12225-math-0172 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0173 in each distribution. We simulated a two‐arm parallel trial with 150 subjects per treatment group, so the sample size urn:x-wiley:15410420:media:biom12225:biom12225-math-0174. The number of replications was 1000 for each setting. The Gumbel‐Hougaard bivariate exponential distribution was generated using the R package “Gumbel” (Caillat et al., 2008; R Development Core Team, 2011).

3.2 Simulation Results

Table 1 summarizes the simulation results from different settings. Here we report the logarithm of the win ratio including its true value, mean of the estimates, sample variance, mean of the estimated variance and coverage probabilities. We also report the empirical powers of the log win ratio statistic and other commonly used statistics such as log‐rank and Gehan tests based on urn:x-wiley:15410420:media:biom12225:biom12225-math-0175 and urn:x-wiley:15410420:media:biom12225:biom12225-math-0176. Note that different test statistics test different null hypotheses; however, they are often used interchangeably to evaluate whether the new treatment is efficacious. The empirical power is calculated as the proportion of urn:x-wiley:15410420:media:biom12225:biom12225-math-0177 in the 1000 replicates, where T is the test statistic (e.g., log win ratio, log‐rank statistic, Gehan statistic) and urn:x-wiley:15410420:media:biom12225:biom12225-math-0178 is the corresponding estimated standard deviation.

Table 1. Summary of the simulation results
CP Power
Dist. urn:x-wiley:15410420:media:biom12225:biom12225-math-0133 urn:x-wiley:15410420:media:biom12225:biom12225-math-0134 urn:x-wiley:15410420:media:biom12225:biom12225-math-0135 Mean SV EV 80 90 95 WR LRD GD LRM GM
GH 0.0 0.0 0.00 0.00 8.80 8.73 80 90 95 5 5 5 6 6
GH 0.2 0.5 0.29 0.29 9.45 9.54 80 90 95 36 22 17 65 54
GH 0.3 0.3 0.30 0.30 9.45 9.40 80 90 96 39 41 33 52 39
GH 0.5 0.2 0.38 0.39 9.42 9.47 81 90 95 57 79 69 51 39
BN 0.0 0.0 0.00 0.00 7.92 8.07 81 90 96 5 5 5 6 5
BN 0.2 0.5 0.28 0.28 8.77 8.80 81 91 95 38 22 19 63 52
BN 0.3 0.3 0.29 0.30 8.61 8.66 80 89 95 40 41 33 48 38
BN 0.5 0.2 0.38 0.38 8.69 8.75 81 90 94 61 81 67 53 40
MO 0.0 0.0 0.00 0.00 6.66 6.68 81 91 95 5 6 5 6 5
MO 0.2 0.5 0.13 0.13 6.90 6.91 81 91 95 13 12 8 38 29
MO 0.3 0.3 0.14 0.14 6.75 6.85 80 91 95 16 15 12 28 21
MO 0.5 0.2 0.19 0.20 6.82 6.87 81 91 95 26 29 22 32 24
  • urn:x-wiley:15410420:media:biom12225:biom12225-math-0136 GH, Gumbel–Hougaard bivariate exponential; BN, bivariate exponential with bivariate normal copula; MO, Marshall–Olkin bivariate exponential; urn:x-wiley:15410420:media:biom12225:biom12225-math-0137, the logarithm of the win ratio; Mean, mean of the estimates urn:x-wiley:15410420:media:biom12225:biom12225-math-0138; SV, simulation variance of urn:x-wiley:15410420:media:biom12225:biom12225-math-0139; EV, mean of the estimated variances of urn:x-wiley:15410420:media:biom12225:biom12225-math-0140; CP, coverage probability calculated based on the estimated variances; Power, empirical power; WR, test based on win ratio statistic; LRD and GD, log‐rank and Gehan tests based on death; LRM and GM, log‐rank and Gehan tests based on first occurrence of death and hospitalization.

The results show that, regardless of the magnitude of treatment effect size and type of joint distributions, the (log) win ratio estimate approximates the true value reasonably well, the proposed closed‐form variance estimate is very close to the sample variance, and the coverage probabilities are close to the nominal levels. We also observed that the order of empirical powers across different tests varies depending on the magnitude of treatment effect size and the type of joint distributions, which is not surprising as these tests are for different null hypotheses.

4 Real Data Analysis

ATLAS ACS 2 TIMI 51 was a double‐blind, placebo‐controlled, randomized trial to investigate the effect of Rivaroxaban in preventing cardiovascular outcomes in patients with acute coronary syndrome (Mega et al., 2012). For illustration purposes, we re‐analyzed the events of MI, stroke and death that occurred during the first 90 days after randomization among subjects in the Rivaroxaban 2.5 mg and placebo treatment arms with intention to use Aspirin and Thienopyridine at baseline.

Table 2 presents the results using the traditional analysis including Cox proportional hazards model, log‐rank test and Gehan test for the time to death and the time to the first occurrence of MI, stroke or death. The composite event occurred in 2.77% (132/4765) and 3.57% (170/4760) of Rivaroxaban and placebo subjects, respectively; the hazard ratio is 0.78 with 95% confidence interval (0.62, 0.97). The hazard ratio can be interpreted as the hazard of experiencing the composite endpoint for an individual on the Rivaroxaban arm relative to an individual on the placebo arm. A hazard ratio of 0.78 with the upper 95% confidence limit less than 1 demonstrates that Rivaroxaban reduced the risk of experiencing MI, stroke or death.

Table 2. Analysis of ATLAS first 90 days data using the traditional methods
Riva Placebo HR and urn:x-wiley:15410420:media:biom12225:biom12225-math-0179 CI Log‐rank Gehan
No. of patients 4765 4760
No. of patients having either 132 170 0.78 (0.62, 0.97)
  MI, stroke, or death urn:x-wiley:15410420:media:biom12225:biom12225-math-0180 urn:x-wiley:15410420:media:biom12225:biom12225-math-0181 urn:x-wiley:15410420:media:biom12225:biom12225-math-0182
No. of patients having death 7 14
  after MI or stroke
No. of death 44 64 0.69 (0.47, 1.01)
urn:x-wiley:15410420:media:biom12225:biom12225-math-0183 urn:x-wiley:15410420:media:biom12225:biom12225-math-0184 urn:x-wiley:15410420:media:biom12225:biom12225-math-0185
  • urn:x-wiley:15410420:media:biom12225:biom12225-math-0186: p‐values are based on Wald tests.

Table 3 presents the win ratio results where death is considered of higher priority than MI/stroke. Every subject in the Rivaroxaban arm was compared to every subject in the placebo arm, which resulted in a total of 22,681,400 (= 4765 × 4760) patient pairs. Among these pairs, Rivaroxaban had 292,132 wins on death and then 480,373 wins on MI/stroke, and had 202,868 losses on death and then 392,886 losses on MI/stroke. The win ratio of Rivaroxaban is therefore (292,132 + 480,373)/(202,868 + 392,886) = 1.30, which is calculated as the total number of wins divided by the total number of losses. A win ratio of 1.30 with the lower 95% confidence limit greater than 1 shows that Rivaroxaban was effective in delaying the occurrence of MI, stroke or death.
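The arithmetic behind these figures can be checked directly from the reported counts. The short Python sketch below uses only the numbers quoted in the text; the variable names are illustrative.

```python
# Arm sizes and pairwise counts as reported for the first 90 days of ATLAS.
n_riva, n_placebo = 4765, 4760
wins_death, wins_mi_stroke = 292_132, 480_373      # scenarios (b) and (d)
losses_death, losses_mi_stroke = 202_868, 392_886  # scenarios (a) and (c)

total_pairs = n_riva * n_placebo            # every Riva subject vs. every placebo subject
wins = wins_death + wins_mi_stroke
losses = losses_death + losses_mi_stroke
undecided = total_pairs - wins - losses     # pairs with no winner, scenario (e)
win_ratio = wins / losses

print(f"pairs={total_pairs:,} wins={wins:,} losses={losses:,} "
      f"undecided={undecided:,} win ratio={win_ratio:.2f}")
```

Because event rates over 90 days are low, the large majority of the pairs end without a winner; the win ratio is driven by the comparatively small number of decided pairs.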

Table 3. Analysis of ATLAS first 90 days data using the win ratio approach
(a) Death on Riva first 202,868
(b) Death on Placebo first 292,132
(c) MI or stroke on Riva first 392,886
(d) MI or stroke on Placebo first 480,373
Total No. of pairs 22,681,400
Win ratio 1.30
urn:x-wiley:15410420:media:biom12225:biom12225-math-0196 CI urn:x-wiley:15410420:media:biom12225:biom12225-math-0197
p‐value 0.025

Both approaches provide evidence that Rivaroxaban is efficacious in preventing MI, stroke or death within the first 90 days after randomization.

5 Discussion

In this article, we formulate the win ratio approach within a statistical framework accounting for the “time to event” nature of the data. We found that the win ratio statistic tests the intersection of two null hypotheses: (1) no effect on the hazard rate for death over time; (2) no effect on the hazard rate for hospitalization conditional on death over time. The statistical formulation of the win ratio approach helps us understand the potential performance of this novel approach and potentially identify situations where the win ratio approach can perform better than the traditional time to first event analysis. Further work might be done to improve the performance of the current win ratio method.

Furthermore, we derived an explicit variance estimator for the win ratio statistic, which is computationally efficient and has a relatively simple form. Results from our simulation experiment show that the proposed variance estimate is close to the sample variance; thus, the confidence interval based on the derived variance provides good coverage probability.
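To illustrate how a variance estimate turns into a confidence interval, the following sketch builds a generic Wald-type interval on the log scale. It is not the authors' estimator: the log-scale construction is an assumption, `se_log` is a placeholder for the standard error of the log win ratio from whichever variance estimator is used, and the numbers are made up rather than taken from ATLAS.

```python
import math
from statistics import NormalDist


def log_scale_ci(win_ratio: float, se_log: float, level: float = 0.95):
    """Wald-type interval for a ratio, symmetric on the log scale.

    se_log is the standard error of log(win_ratio); it is an input here,
    not something this sketch estimates.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    centre = math.log(win_ratio)
    return math.exp(centre - z * se_log), math.exp(centre + z * se_log)


# Illustrative values only.
lower, upper = log_scale_ci(1.5, 0.2)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Working on the log scale keeps the lower limit positive and makes the interval for the reciprocal ratio the reciprocal of the interval, which is convenient when the roles of the two arms are swapped.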

The methodology discussed here can be extended to any composite endpoint consisting of both fatal and non‐fatal outcomes, and, in a similar fashion, to settings with more than one composite endpoint and multiple events.

6 Supplementary Materials

R code implementing the variance estimation method is available with this article at the Biometrics website on Wiley Online Library.

Acknowledgements

The authors thank Dr. Ying Kuen Cheung for his helpful discussions on this article. Xiaodong Luo was partly supported by National Institutes of Health grant P50AG05138. Wei Yann Tsai was partially supported by National Center for Theoretical Sciences (South), Taiwan.

Appendix A

1 Proof of 5

We calculate
urn:x-wiley:15410420:media:biom12225:biom12225-math-0198
urn:x-wiley:15410420:media:biom12225:biom12225-math-0199
and
urn:x-wiley:15410420:media:biom12225:biom12225-math-0200
where ‘urn:x-wiley:15410420:media:biom12225:biom12225-math-0201’ is by integration‐by‐parts on urn:x-wiley:15410420:media:biom12225:biom12225-math-0202. Therefore,
urn:x-wiley:15410420:media:biom12225:biom12225-math-0203
