Combined multiple testing of multivariate survival times by censored empirical likelihood

Abstract In each study testing the survival experience of one or more populations, one must not only choose an appropriate class of tests, but also an appropriate weight function. As the optimal choice depends on the true shape of the hazard ratio, one can often not attain the best possible power for a specific dataset. For the univariate case, several methods have been proposed to overcome this problem. Nowadays, however, many datasets of interest contain multivariate observations. In this work we propose a multivariate version of a method based on multiply constrained censored empirical likelihood, where the constraints are formulated as linear functionals of the cumulative hazard functions. By considering the conditional hazards, we take the correlation between the components into account, with the goal of obtaining a test that exhibits high power irrespective of the shape of the hazard ratio under the alternative hypothesis.


INTRODUCTION
In each study testing the survival experience of one or more populations, one must choose an appropriate test. Many of these tests rely on the choice of a weight function. One can select an optimal weight function if the true shape of the hazard ratio under the alternative hypothesis is known, which is not the case in most applications. As a result, one can often not attain the best possible power for a specific dataset. The choice of the weight function influences the error rates; in certain situations, a wrong choice may lead to a considerable power reduction, as pointed out by Kosorok and Lin (1999) and Klein and Moeschberger (1997). In the univariate case, different methods have been proposed to solve this problem, for example, by Brendel, Janssen, Mayer, and Pauly (2014).
Another problem arises from the complexity of multivariate survival times. Even though, in current times of massive data ascertainment, data typically contain multivariate observations, an appropriate method to model the true structure of the survival times may be lacking.
For bivariate samples, many methods to analyze survival times have been proposed. The methods by Lin and Ying (1993) and Wang and Wells (1997) deal with nonparametric estimators of the bivariate survival function under univariate censoring. An earlier approach by Dabrowska (1988) can even deal with general bivariate censoring. Theoretical and computational performance comparisons of different estimators were conducted by Gill, van der Laan, and Wellner (1995) and Wang and Zafra (2009), among others. A recent paper by Huang and Zhao (2018) also deals with the bivariate survival function and uses empirical likelihood to obtain asymptotic confidence intervals.
While we mainly focus on bivariate time-to-event data, our method can easily be extended to the higher-dimensional case. For multivariate data with more than two response variables, methods have been proposed as well, although fewer than in the bivariate case, owing to the higher complexity and technical challenges. The most common model for multivariate time-to-event data is the Cox model; the accelerated failure time model is also often applied. A comparison of the Cox model and the accelerated failure time model, among others, was performed, for example, by Clark, Bradburn, Love, and Altman (2003b). But most multivariate methods have limitations. For example, many models make parametric assumptions which may not be met, as does the Cox model, which assumes proportional hazards. Other methods might not deal correctly with dependent or correlated samples to their full extent, as their design might not pick up the correlation structure. Independence between the components is sometimes implicitly assumed, even though it is likely to be violated when dealing with different times-to-event of a single subject. Tests assuming independence will fail, for example, when the marginal distributions are similar for two samples but the correlation differs. Most methods that are primarily based on univariate test statistics can address this problem by integrating the covariance matrices into the multivariate test statistics, as done by Wei and Lachin (1984). An overview of multivariate survival analysis and of what to consider when choosing a method for a specific dataset was given, for instance, in Clark, Bradburn, Love, and Altman (2003a, 2003c, 2003d) and Clark et al. (2003b).
The focus of this paper is motivated by the approach proposed by Bathke, Kim, and Zhou (2009). Their method is based on censored, multiply constrained empirical likelihood, where the constraints are formulated as linear functionals of the cumulative hazard functions. Their simulations suggested that the test performed well while being easy to compute compared to, for example, the function-indexed weighted log-rank tests, which require extensive Monte-Carlo simulations. The proposed method, however, can only deal with univariate observations. In this paper, following the approach of Bathke et al. (2009), we propose a multivariate test statistic. In the case of independent components, the approach introduced here reduces to that of Bathke et al. (2009). Test procedures designed in a univariate manner are naturally not capable of picking up correlations or dependencies. By considering the conditional hazards, we take the correlation between the components into account, with the goal of obtaining a test with high power irrespective of the shape of the hazard ratio under the alternative hypothesis. When considering real time-to-event data, it is unlikely that the observations within one subject are uncorrelated, and it is critical to have test methods capable of detecting such data structures. This is why an extension of the method proposed by Bathke et al. (2009) to the multivariate case is important. Combining the conditional hazards and the constraints yields a new and well-functioning test procedure that could be the foundation for further tests on multivariate survival times.
The paper is structured as follows. Section 2 deals with the theoretical aspects of the proposed method for the one-sample case. Section 3 describes the results for the two-sample case. In Section 4 we give examples on how to apply the proposed method. The simulations and an application to real data can be found in Section 5. The regularity conditions and the proofs of the main theorems are given in the Appendix.

ONE-SAMPLE CENSORED EMPIRICAL LOG-LIKELIHOOD WITH MULTIPLE CONSTRAINTS
We start by giving the results for the one-sample case. We propose two methods: one assumes independence between the components, while the other one takes possible dependencies into account.
Let X 1 , … , X n be independent, identically distributed d-dimensional observations with X i ∶= (X 1i , … , X di ) T , distribution function F X0 (t), marginal distribution functions F X01 (t), … , F X0d (t), cumulative hazard function Λ X0 (t), and marginal cumulative hazard functions Λ X01 (t), … , Λ X0d (t). Due to right censoring we do not observe the X i directly; let C X i = (C X 1i , … , C X di ) T denote the censoring times, which are assumed to be independent, identically distributed, and independent of the X i 's.
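To fix ideas, the right-censoring mechanism can be sketched as follows. This is an illustrative sketch, not the authors' code; the exponential scales and the sample size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 2  # sample size and number of components (illustrative choices)

# Latent event times X_i and censoring times C_i: i.i.d., and independent
# of each other. The exponential marginals are assumptions for illustration.
X = rng.exponential(scale=1.0, size=(n, d))
C = rng.exponential(scale=2.0, size=(n, d))

# Under right censoring we observe only T_ji = min(X_ji, C_ji) together
# with the censoring indicator delta_ji = 1{X_ji <= C_ji}.
T = np.minimum(X, C)
delta = (X <= C).astype(int)
```

With the scales above, roughly one-third of the observations in each component are censored, comparable to the censoring rates used later in the simulations.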

One-sample EL with independent components
We assume in this section that the components of one observation vector are independent of each other. Based on the censored observations, the empirical log-likelihood (EL) for component j pertaining to its distribution F Xj is given by with F̂ Xj the empirical marginal distribution function of component j = 1, … , d. The analogue of Equation (2) based on the cumulative hazard function is given by where with t ji denoting the ordered, distinct values of T ji . Here, 0 < v ji ≤ 1 are the discrete hazards of component j at time t ji . It is well known that the maximum of Equation (3) with respect to the v ji is attained at the jumps of the Nelson-Aalen estimator for each component, that is and we consider a hypothesis testing problem where g jr , j = 1, … , d, r = 1, … , k, are nonnegative functions, and θ j = (θ j1 , … , θ jk ) T and μ j = (μ j1 , … , μ jk ) T are two vectors of constants. The proposed constraints on the hazards v ji for given functions g j1 , … , g jk and constants μ j1 , … , μ jk are given by where the sum is taken over all i for which v ji ≠ 1. Denote the maximum empirical likelihood estimators of ΔΛ Xj (t ji ) under the constraints by v * ji , where Λ̂ Xj is the empirical cumulative hazard function. Application of the Lagrange multiplier method leads to where G j (t ji ) = {g j1 (t ji ), … , g jk (t ji )} T , and λ j ∈ R k is a vector of Lagrange multipliers obtained by solving the constraints for component j. Then the component-wise test statistic in terms of hazards is given by Proof. Bathke et al. (2009) showed that the component-wise test statistics W j are chi-squared distributed with k degrees of freedom. As we assumed independence between the components, and thus between the test statistics, we obtain the desired result. ▪ Remark 1. Although the notation suggests that the number of constraints is equal for all components, it may differ between the individual components, leading to different degrees of freedom.
More precisely, let k j be the number of constraints for component j and define K ∶= ∑ d j=1 k j . Then, the test statistic W is asymptotically chi-squared distributed with K degrees of freedom under the null hypothesis.
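The component-wise building blocks above rest on the jumps of the Nelson-Aalen estimator, which are simply events over at-risk counts at each distinct event time. The following sketch computes these increments for one component from hypothetical simulated data; it is an illustration under assumed exponential marginals, not the authors' implementation.

```python
import numpy as np

def nelson_aalen_increments(T, delta):
    """Jumps of the Nelson-Aalen estimator for one component:
    dLambda(t) = (# events at t) / (# at risk at t)."""
    times = np.unique(T[delta == 1])                 # ordered distinct event times
    d_t = np.array([np.sum((T == t) & (delta == 1)) for t in times])
    R_t = np.array([np.sum(T >= t) for t in times])  # at-risk counts
    return times, d_t / R_t

# Hypothetical right-censored sample for a single component.
rng = np.random.default_rng(1)
X = rng.exponential(1.0, 200)
C = rng.exponential(2.0, 200)
T, delta = np.minimum(X, C), (X <= C).astype(int)

times, dLam = nelson_aalen_increments(T, delta)
Lam = np.cumsum(dLam)  # cumulative hazard estimate; roughly t for exp(1) times
```

The constraints of the method are then weighted sums over exactly these jump points.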
Even though this yields a rather easy and intuitively interpretable test statistic, we encounter serious limitations. We assumed independence between the components, which may be questionable for many data sets. We tackle this problem using a more general approach in the next section.

General bivariate one-sample EL
Following the approach of Dabrowska (1988) and Vonta, Nikulin, Limnios, and Huber-Carol (2008) for bivariate survival models using conditional hazards, we derive the asymptotic distribution of the nonparametric maximum-likelihood estimator (NPMLE) for the hazards of the bivariate survival function without constraints for the two-dimensional case, as well as the asymptotic distribution of the actual test statistics based on the empirical log-likelihood function. We point out that the following sections are based on the assumption that the true underlying cumulative hazard function is continuous, even where we do not explicitly state so. Let ℒ be the class of continuous univariate cumulative hazard functions on R + . We define a joint bivariate survival function S on R + × R + as proposed in Dabrowska (1988) by where Λ cd ab ∈ ℒ, with a and b indicating whether components 1 and 2, respectively, are event free up to x, and c and d indicating whether an event occurred in components 1 and 2 at time point x. The notation is analogous to that of the hazard rates, which are given below. This survival function S(x, y) = P(X ≥ x, Y ≥ y) is indeed a bivariate survival function. A necessary condition for the proposed method is that Λ 11 11 ≡ 0 holds true, that is, the survival times of the components are not identical. The observed bivariate time is given by T i = (T 1i , T 2i ) and the censoring indicator is given by The empirical likelihood for n observations is the product
For simplicity, denote the hazards by a l ∶= λ 10 11 (t 1l )dt, b l ∶= λ 01 11 (t 2l )dt, c l ∶= λ 10 10 (t 1l )dt, and d l ∶= λ 01 01 (t 2l )dt, where t jl are the ordered distinct time points of jumps in component j. It follows that the empirical log-likelihood EL is given by where the sums over t jl are taken over those time points at which a l , respectively b l , c l , and d l , are strictly greater than zero, excluding the last time point, as there the hazards would be 1. Denote the sums over i by u 1l , … , u 8l , in the order they are given above, where l indicates that the sum was taken at the lth time point of the first component, respectively the second component.

Lemma 1.
Maximizing the empirical log-likelihood function with respect to a l , b l , c l , and d l , one obtains the NPMLE Proof. Taking the derivative of EL with respect to a l , we obtain The derivative is zero for â l = u 2l ∕(u 1l + u 2l ). For the other hazards, similar calculations lead to the given NPMLE. ▪ When fixing the constraints we do not consider the four hazards individually, but combine them such that only one hazard per component remains for the constraints. This leads to a reduction of the degrees of freedom, as each constraint adds one degree of freedom to the asymptotic sampling distribution of the test statistic. Define The constraints are then imposed directly on v 1l and v 2l and are given in the form of The function to maximize is given by where EL(λ) denotes the log-likelihood function, as given in Equation (7), without the constraints but with the modified hazards a l (λ 1 ), … , d l (λ 2 ) plugged in instead of the Nelson-Aalen estimators, and λ = (λ 1 , λ 2 ) T , λ j = (λ j1 , … , λ jk ) T , j = 1, 2. Again, the last sum ranges over the distinct time points excluding the last.
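The NPMLE of Lemma 1 has the form "events over events plus survivors" at each jump point. The sketch below evaluates all four conditional hazards from the count sums u 1l , … , u 8l ; only â l = u 2l ∕(u 1l + u 2l ) is stated explicitly in the lemma, so the pairing of the remaining counts is read off by analogy and should be treated as an assumption.

```python
def conditional_hazard_npmle(u):
    """NPMLE of the four conditional discrete hazards at one time point.
    u = (u1, ..., u8) are the eight indicator sums of the bivariate EL.
    a_hat = u2/(u1+u2) as in Lemma 1; the other three ratios are formed
    analogously (an assumption about the pairing of the counts)."""
    u1, u2, u3, u4, u5, u6, u7, u8 = u
    a_hat = u2 / (u1 + u2)
    b_hat = u4 / (u3 + u4)
    c_hat = u6 / (u5 + u6)
    d_hat = u8 / (u7 + u8)
    return a_hat, b_hat, c_hat, d_hat

# Hypothetical counts at one jump point: e.g. 5 events among 45 at risk.
a_hat, b_hat, c_hat, d_hat = conditional_hazard_npmle((40, 5, 30, 3, 20, 2, 10, 1))
```

Each hazard lies strictly between 0 and 1 as long as both counts in its ratio are positive, in line with Remark 2.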
Remark 2. For survival times with absolutely continuous distribution, there is one event per time point such that either a l or c l will be zero. This also implies that v 1l ≤ a l if c l is zero or v 1l ≤ c l if a l is zero. Taking additionally into account that we exclude the last time points such that u 2l ∕(u 1l + u 2l ) ≠ 1 and u 6l ∕(u 5l + u 6l ) ≠ 1, we can even go as far as stating a l , c l ≤ 0.5 and thus v 1l ≤ 0.5.
Remark 3. As the true NPMLE of G constraint is rather complex to calculate, we will approximate it. Using Remark 2, we approximate log(1 − v 1l ) by z 1l log(1 − a l ) + z 3l log(1 − c l ) with an error of order v 1l 2 , where one of the two terms is always equal to zero. In total, we approximate G constraint by where the sum is taken over those time points t 1l for which a l (λ 1 ) ≠ 1 and c l (λ 1 ) ≠ 1 and over those t 2l for which b l (λ 2 ) ≠ 1 and d l (λ 2 ) ≠ 1.
Remark 4. For noncontinuous distributions v jl can be close to zero just like in the continuous case. It may, however, occur in such cases that we obtain high hazard rates at some of the jump points. Thus the approximation as described in Remark 3 would not hold in general. Therefore, we will exclude noncontinuous distributions in the following.
Remark 5. The approximation is a true equation if the components are independent, that is, the conditional hazards can be replaced by the regular hazards for the individual components. In this case, the conditional hazard model simplifies to the sum of the univariate models and thus we obtain the same results as in Section 2.1.
Remark 6. Instead of approximating G constraint one can also simply define the parameter under the null hypothesis to be the weighted sum of the integrals with respect to the conditional component-wise cumulative hazard function instead of the component-wise cumulative hazard function. That is, define for the constraints on the first component, with Λ X01|2> and Λ X01|2< the conditional component-wise cumulative hazard functions. Note that this approach might be more precise. However, in order to test the hypothesis one must make assumptions on the conditional probabilities, which are usually unknown and difficult to estimate in advance in the absence of a pilot study.
Theorem 2. The NPMLE of G constraint , as given in Equation (8), are given by , where λ 1 and λ 2 are obtained as the solution of the following 2k equations: Proof. Differentiating the approximation of G constraint with respect to the λs simply recovers the constraints. For instance, the derivative of G constraint with respect to (w.r.t.) a l leads to Setting the previous expression to zero and solving for a l leads to the NPMLE. The remaining derivations follow analogously. ▪ Let the test statistic in terms of hazards be given by where EL(Λ̂ X ) is the empirical log-likelihood as given in Equation (7).
Theorem 3. Suppose that the null hypothesis H 0 , as defined in Equation (4), holds for nonnegative, random functions g jr (t) that are predictable w.r.t. the filtration ℱ t specified in the Appendix. Then, under the regularity conditions specified in the Appendix, the test statistic W has asymptotically a chi-squared distribution with 2k degrees of freedom, where k is the number of constraints per dimension.
As in the case of independent components we can vary the number of constraints per component, such that for every component a fitting number of constraints can be chosen.

General d-dimensional one-sample EL
As mentioned earlier, the presented approach can be extended to an arbitrary number of components; however, the notation beyond the bivariate model becomes rather complex. In general, one considers all conditional hazards that could possibly occur. The number of conditional hazards in the d-dimensional case is given by d ⋅ 2 (d−1) , that is, 12 in the three-dimensional case. The remaining derivations follow analogously to the bivariate case. Note that the complete hazard of a component is then summed over all conditions, of which there are four in the three-dimensional case. Explicitly, this means for the hazard of the first component that Denote the log-likelihood function, similar to Equation (7), by EL and consider constraints of the form Then the function to maximize is given by Theorem 4. Suppose that the null hypothesis as defined in Equation (4) holds for nonnegative, random functions g jr (t). Under the regularity conditions specified in the Appendix, the test statistic defined by has asymptotically a chi-squared distribution with ∑ d j=1 k j degrees of freedom, with k j the number of constraints in dimension j.
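The growth of the model with the dimension can be made explicit; this small helper simply evaluates the count d ⋅ 2 (d−1) stated above.

```python
def n_conditional_hazards(d):
    """Number of conditional hazards in the d-dimensional model:
    each of the d components is conditioned on the event/no-event
    status of the remaining d - 1 components, giving d * 2**(d-1)."""
    return d * 2 ** (d - 1)

counts = [n_conditional_hazards(d) for d in (2, 3, 4)]  # [4, 12, 32]
```

For d = 2 these are exactly the four hazards a l , b l , c l , and d l of the bivariate model; the exponential growth explains why the paper only sketches the general case.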
Remark 7. Although the multiple testing procedure is only applied to hypothesis testing here, it can be used for other forms of inference as well. For example, it could be used to obtain confidence regions for a collection of population quantiles. A straightforward way to construct confidence regions for the true θ 0 = (θ 011 , … , θ 01k 1 , … , θ 0d1 , … , θ 0dk d ) T works as follows. Define the confidence region to contain all θ = (θ 11 , … , θ dk d ) T for which the test statistic W(θ), as given in Theorem 4, is smaller than the (1 − α)-quantile of the chi-squared distribution with ∑ d j=1 k j degrees of freedom.
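The test-inversion idea of Remark 7 can be sketched generically. `W_of` below is a hypothetical stand-in for the statistic of Theorem 4, and the quadratic toy statistic is purely illustrative; only the inversion step itself reflects the remark.

```python
import numpy as np
from scipy.stats import chi2

def confidence_region(W_of, grid, K, alpha=0.05):
    """Keep all parameter values theta on the grid whose test statistic
    stays below the (1 - alpha)-quantile of the chi-squared distribution
    with K = sum_j k_j degrees of freedom."""
    crit = chi2.ppf(1 - alpha, df=K)
    return [theta for theta in grid if W_of(theta) <= crit]

# Toy stand-in for W(theta): a quadratic in a scalar theta around theta0 = 1.
W_toy = lambda theta: 200.0 * (theta - 1.0) ** 2
region = confidence_region(W_toy, np.linspace(0.5, 1.5, 101), K=1)
```

In practice the grid would range over vectors θ and `W_of` would re-solve the constrained maximization of Theorem 4 at each candidate value.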

TWO-SAMPLE CENSORED EL WITH MULTIPLE CONSTRAINTS
As it is often of interest not only to test certain aspects of one sample but also to compare it with a second sample, we extend the model to a two-sample censored EL. In addition to our partly censored sample of observations X 1 , … , X n , we now consider a second sample Y 1 , … , Y m . Assume that the Y i 's, i = 1, … , m, are independent of the X j 's, j = 1, … , n. Again, the sample may be censored, such that we only observe the censored times S ji ; let C Y i denote the censoring times for the second sample, which shall be independent of the Y i . Denote the ordered, distinct values of the S ji by s jl .
Note that we do not require independence between the individual components for this method.

Bivariate two-sample EL
The EL function based on the two samples is given by where EL X corresponds to the EL function of the observations X, as given in Equation (7), and EL Y corresponds to the EL function of the observations Y , defined analogously to EL X . More explicitly, EL Y is given by where the sums over s jl are only taken over those time points at which α l , respectively β l , γ l , and δ l , are strictly greater than zero, excluding the last time points, as there the hazards would be 1. Again, we denote the eight sums over i by ν 1l , … , ν 8l , in the same order as they appear above.

Lemma 2.
Maximizing the log-likelihood function w.r.t. a l , … , δ l , one obtains the NPMLE where l indicates the time point of the jump.
Let us now consider a hypothesis testing problem for a 2k-dimensional parameter θ = (θ 11 , … , θ 1k , θ 21 , … , θ 2k ) T w.r.t. the cumulative hazard functions Λ X and Λ Y such that where θ jr = ∫ g jr (t) log(1 − dΛ X0j (t)) − ∫ h jr (t) log(1 − dΛ Y0j (t)), r = 1, … , k, j = 1, 2, for some given functions g jr (t) and h jr (t). Then, the constraints imposed on v ji and w ji shall be given by with j = 1, 2, r = 1, … , k, and where the sum is taken over all distinct time points in each sample, excluding the last value, as the hazard would be 1 at that time point. The w ji 's are defined analogously to the v ji 's. Application of the Lagrange multiplier method on the likelihood with the constraints as given in Equation (12) shows that the NPMLE are given through , where G 1 (t 1l ), G 2 (t 2l ), H 1 (s 1l ), and H 2 (s 2l ) denote the vectors of constraint functions imposed on the hazards at the time points t jl and s jl , respectively, and n * is defined as min(n, m). Let G constraint be given by where EL X (λ) and EL Y (λ) denote the log-likelihood functions without constraints based on the modified hazards a l (λ), … , δ l (λ) instead of the Nelson-Aalen estimators.
The two-sample test statistic is given by Theorem 5. Suppose that the null hypothesis H 0 ∶ θ jr = μ jr holds for all j = 1, 2 and r = 1, … , k, that is, for nonnegative random functions g jr (t) and h jr (t) that are predictable w.r.t. the filtration ℱ t specified in the Appendix. Then, under the further conditions specified in the regularity specification for the two-sample case, as n * → ∞ and n∕m → c ∈ (0, ∞), W * , as given in Equation (14), has asymptotically a chi-squared distribution with 2k degrees of freedom.
Again, the number of constraints can be chosen individually for each component. Naturally, the number of constraints must be the same for the two samples.

General d-dimensional two-sample EL
As in the one-sample case, the proposed method works for an arbitrary number of components. Due to the complexity of the notation, we only sketch the idea of the general d-dimensional two-sample test. Let EL(λ) be the sum of the individual EL functions in each component, again with the modified hazards instead of the Nelson-Aalen estimators. Further, let EL be the sum of the individual EL functions in each component with the Nelson-Aalen estimators for the hazards. The function to maximize under the constraints, G constraint , is given by .
Theorem 6. Suppose that the null hypothesis H 0 ∶ θ jr = μ jr holds for all j = 1, … , d and r = 1, … , k j , that is, for nonnegative random functions g jr (t) and h jr (t) that are predictable w.r.t. the filtration ℱ t specified in the Appendix. Then, under the further conditions specified in the regularity specification for the two-sample case, as n * → ∞ and n∕m → c ∈ (0, ∞), the test statistic has asymptotically a chi-squared distribution with ∑ d j=1 k j degrees of freedom.

APPLICATIONS
In this section we discuss different applications of Theorem 5. The constraints proposed in the following are based on random yet predictable constraint functions g jr and h jr . One application of Theorem 5 is based on constraints obtained by combining the log-rank test with the Gehan test. We consider a two-sample problem with

Specifically, the test statistics of the individual tests can be given in the form of
as well as R j1 (u) = ∑ i 1{T ji ≥ u} and R j2 (u) = ∑ i 1{S ji ≥ u}.
For the proposed method, we used the corresponding weighted sums as constraints for the two combinations of (ρ, γ) mentioned above. More specifically, we applied the constraint functions of the log-rank statistic and of the Gehan-Wilcoxon statistic to each component, leading to four degrees of freedom. A comparison of the combined test based on these constraints with the two individual tests was performed in a simulation study; it corresponds to Simulation 4 in Section 5.
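As a sketch of the ingredients of this application, the risk-set processes R j1 , R j2 and the two weight choices can be written down directly. The weight forms (constant for log-rank, pooled at-risk size for Gehan-Wilcoxon) are the standard ones and are an assumption here, since the exact display is elided in this text.

```python
import numpy as np

def risk_sets(T, S, times):
    """Risk sets of the two samples: R1(u) = #{T_i >= u}, R2(u) = #{S_i >= u}."""
    R1 = np.array([np.sum(T >= u) for u in times])
    R2 = np.array([np.sum(S >= u) for u in times])
    return R1, R2

# Standard weight processes (assumed forms): the log-rank test weights all
# event times equally; the Gehan-Wilcoxon test weights by the pooled
# number at risk, emphasizing early time points where many are at risk.
def logrank_weight(R1, R2):
    return np.ones(len(R1))

def gehan_weight(R1, R2):
    return (R1 + R2).astype(float)

# Tiny illustrative samples.
R1, R2 = risk_sets(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0]), [1.0, 2.0])
```

These weights, evaluated at the pooled event times, are exactly what enters the constraint sums of the combined test.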
Further, it might be of interest to test whether two distributions with identical marginals are also identically distributed in the multivariate sense. In practice, it is possible that two datasets differ only in their correlation structure, which can easily go undetected. To this end, one may apply functions of the form with g(t) a nonnegative predictable function. Applying, for instance, five constraints, we can use four of them to test the two datasets for equal marginals: two for the first component, one putting weight on early events and the other on late events, and likewise for the second component. The last constraint can be used to check whether the correlation within each of the two datasets is similar, and it can be applied to either component; if it is of the form of Equation (17), it should be applied to the second component. As shown in Section 5, this leads to a high rejection rate if the correlation structure is not the same, even if the samples are drawn from identical marginal distributions. Instead of using random, predictable functions, one can also apply various deterministic functions as constraints. These can be chosen such that the test focuses on the part of the data where differences are to be expected or which is considered a critical time frame. For instance, if one is only interested in detecting differences at the beginning of the study, constraints of the form g 1 (t) = exp{−t} or g 2 (t) = 1{t ≤ t * }, t * ∈ R + , can be applied. While g 1 (t) assigns greater weight to earlier time points and lesser weight to later ones, g 2 (t) weights all time points up to a prespecified time point t * equally and does not consider later time points at all. If, on the other hand, one is interested in differences at later time points, one can apply constraints of the form g 1 (t) = 1{t ≥ t * }, t * ∈ R + , or g 2 (t) = log(t + 1), among others.
Obviously, it is possible to combine two constraints so that one detects differences at early time points and one at later time points. Constraints of this form were used in the first simulation as given in Section 5.
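The deterministic early- and late-weighting constraint functions described above can be sketched directly; the cutoff t * = 1.5 is an arbitrary illustrative choice.

```python
import numpy as np

t_star = 1.5  # prespecified cutoff t* (illustrative choice)

# Early differences: decaying weight, and uniform weight up to t*.
g_early_exp = lambda t: np.exp(-t)
g_early_ind = lambda t: (t <= t_star).astype(float)

# Late differences: weight only after t*, and slowly growing weight.
g_late_ind = lambda t: (t >= t_star).astype(float)
g_late_log = lambda t: np.log(t + 1.0)

t = np.array([0.1, 1.0, 2.0])
```

Combining one early-weighting and one late-weighting function per component, as in the first simulation, covers both ends of the study period.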

SIMULATIONS AND DATA EXAMPLE
We provide simulation results that confirm the chi-squared limiting distributions of W and W * and illustrate the small sample performance. Further, we ran some simulations to show how dependency and correlation structures influence the results. We then compared the performance of our test to various others. A real data example is provided to illustrate the proposed method. All computations were performed using R (R version 3.3.2, R Core Team, 2017).
The actual observed data was then created analogously to Equation (1). Censoring was roughly one-third in each of the components. The observations were uncensored in both components in 42% of the cases.
The Q-Q plot (Figure 1) is based on 10,000 runs and a sample size of 200. The resulting empirical distribution of W agrees well with the theoretically derived χ 2 -distribution with six degrees of freedom. Simulation 2 (two-sample case). In this simulation we illustrate the empirical distribution of W * for the two-sample case. The constraints are given by the functions The functions g 1 and g 2 were applied to the first component and the remaining two to the second component. The two samples were drawn from the following marginal distributions: The simulated actual observations after censoring were again created according to Equation (1). The amount of censoring was 36% for the first component and 73% for the second component. The four Q-Q plots in Figure 2 are based on 10,000 runs and show different combinations of sample sizes. The resulting empirical distribution of W * generally agreed well with the theoretically derived χ 2 -distribution with four degrees of freedom. However, in the case of m = n = 50 it only agreed well with the theoretical distribution up to the 90% quantile. Table 1 shows the Type I errors at various significance levels and different sample sizes. The proposed combined test attained the Type I error quite well at the nominal levels for sample sizes of n, m ≥ 70.
Simulation 3 (Dependency/correlation structure). The previous simulations were performed with random variables which had independent components. The following simulations show the influence of high dependency on the test statistic and the consequences of identically distributed components but with different correlation structures in the two datasets. First, we considered bivariate random variables which were identically distributed in both samples. For the first component we drew samples according to

The second components were obtained by X 2 = 0.65X 1 + 0.35X and Y 2 = 0.65Y 1 + 0.35Y, with X, Y ∼ exp(0.6). Censoring was still considered to be i.i.d., such that C X 2 , C Y 2 ∼ exp(0.3). The censoring accumulated to 25% in the first component and 30% in the second. Overall, 55% of the observations could be observed completely. For each dataset we simulated 200 observations. The constraints were given by the functions Here, g 1 and g 2 were applied to the first component, the remaining two to the second component.
The Q-Q plot (Figure 3) is based on 10,000 runs. The resulting empirical distribution of W * agreed with the theoretically derived χ 2 -distribution with four degrees of freedom up to the 90% quantile. For even stronger dependencies, higher sample sizes are needed to obtain similar empirical results. Yet we can conclude that, with a sufficient number of observations, the proposed method can deal with dependencies.
Correlation between the components may also be of interest in the case of multivariate data. An extreme case of testing two samples for equality is given by identical marginals but a different correlation structure; a test should reject the null hypothesis of identical joint survival distributions in such cases. Both samples were simulated with identical marginal distributions. To obtain a different correlation structure, the samples of X 1 and of X 2 were sorted. The correlation of the true observations of X was given by .99 and for Y by 0. The actual observed data of the X-sample, T X , was only correlated by .68. The correlation for the actual observed data of the Y -sample remained the same. The constraints were given by g 1 (t) = exp{−t}, g 2 (t) = 1{t ≤ 1.5}, g 3 (t) = 21{0.5 < t < 1.5}, and g 4 as defined in Equation (17) with g(t) = 1. Again, the first two constraints were imposed on component 1 and the other two on component 2. The simulated power was 81.79%, indicating that the proposed method is indeed capable of detecting differences in the correlation structure. As one can see in Simulation 4, other methods have limited capability to detect such differences.
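The sorting device used in this simulation, inducing near-perfect correlation while leaving both marginals unchanged, can be sketched as follows; the exponential marginals and the sample size are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Sorting both components of the X-sample couples their order statistics,
# driving the correlation close to 1 without changing either marginal
# distribution. The Y-sample is left with independent components.
X1 = np.sort(rng.exponential(1.0, n))
X2 = np.sort(rng.exponential(1.0, n))
Y1 = rng.exponential(1.0, n)
Y2 = rng.exponential(1.0, n)

corr_X = np.corrcoef(X1, X2)[0, 1]   # close to 1 after sorting
corr_Y = np.corrcoef(Y1, Y2)[0, 1]   # close to 0
```

Any test that only looks at the marginals cannot distinguish the two samples here; only a statistic sensitive to the joint structure, such as the constraint of Equation (17), can.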

Simulation 4 (Comparison with other methods).
We compare the small and moderate sample size behavior of the proposed combined tests, as described in the first application in Section 4, to two tests from the G ρ,γ family of Harrington and Fleming (1982). As the constraints correspond to the weight functions of the log-rank statistic and the Gehan statistic, we compared the combined constraints to the individual tests. For all three tests, we estimated the Type I errors for one null hypothesis and the power for five different alternatives. The size of the simulation was 10,000 for the null model and 5,000 for each of the alternative models A-E, as defined in Table 2. The sample size for the small sample size setting was given by 50 X-observations and 50 Y -observations; for the moderate sample size setting we had 150 X-observations and 150 Y -observations. The five survival configurations for the power analyses are described as follows: A corresponds to identical marginals but different correlation structure, B to one component identically distributed and one different, C to crossing hazards, D to ordered hazards with early differences, and E to ordered hazards with late differences. Samples were generated both without censoring and with exponentially distributed censoring with parameter λ 1 = 0.5 for the first component and λ 2 = 0.3 for the second component, such that censoring amounted to 66% and 64% in the two components in the case of the null model. Further information on the modeling can be found in Table 2. For alternative A, an additional constraint was added to the second component, given by the constraint function in Equation (17) with g(t) = 1. The other constraints correspond to Equations (15) and (16), which are random predictable functions w.r.t. the filtration specified in Appendix B1. Both of these constraints were applied to each component.
From Tables 3 and 4, the proposed method had comparable performance to the two individual test statistics. In configurations A and C, it performed significantly better than the two individual tests, even though the sample size of 50 was fairly small. In scenarios where only the marginals differed in one component and the correlation structure was similar, it performed poorly compared to the log-rank test and the Gehan-Wilcoxon test. In the scenarios with early differences in the hazard functions, it almost reached the performance level of the Gehan-Wilcoxon test, while the log-rank test lost considerable power; for late differences in the hazards, the opposite held. As an additional comparison of the number of constraints, we performed additional tests on alternative E, where we added a fifth constraint corresponding to the constraint function in Equation (17) with g(t) ≡ 1. The resulting Type II error dropped in that case from 73.52% to 60.72% in the censored case and from 27.14% to 21.24% in the uncensored case. In the latter case it slightly outperformed the log-rank test. Adding an additional constraint thus led to a lower error rate in this scenario. All tests performed better when no censoring occurred, yet one could observe patterns similar to the censored case in the power comparisons.
Tables 5 and 6 show that similar results were obtained for moderate sample sizes. All tests obtained lower error rates than in the small sample case, but the same properties as in the small sample simulations can be observed. The strengths of the newly proposed test can be seen here even more clearly in scenarios A and C, which correspond to correlation and crossing hazards, respectively.

Data example
In order to illustrate our results on a real dataset, we chose the dataset retinopathy from the R-package survival, which contains the core survival analysis routines. The dataset includes 394 observations from 197 patients of the Diabetic Retinopathy Study, which was conducted by the National Eye Institute. The observed data was part of a trial on laser coagulation as a treatment to delay diabetic retinopathy. Each patient received laser treatment on one of their eyes, while the other remained untreated to serve as a baseline; the eye for treatment was randomly chosen. The two observations per patient share some basic information and record the time to vision loss in the treated and the untreated eye, as well as the risk factor for each eye. The time to vision loss was measured from initiation of treatment to the time when visual acuity dropped below 5/200 for two visits in a row. As no event could occur within the first 6.5 months of the study, all survival times were reduced by this duration. The survival times were subject to univariate censoring. All patients were considered to be at "high risk" for diabetic retinopathy as defined by the Diabetic Retinopathy Study.

[TABLE 2: Hazard functions for the configurations used in the power studies (Simulation 4); columns: Configuration, Hazard functions, df.]
The patients can be split into two groups, juvenile and adult diabetes. As the two types of diabetes are assumed to progress differently, it is of interest to analyze the joint survival function of both eyes. The null hypothesis analyzed in the following was that juvenile and adult diabetes have the same joint survival function of both eyes. Denote the times to blindness in the treated eye by X_1 for the juvenile group and by Y_1 for the adult group. Further, let X_2 be the survival time of the untreated eyes in the juvenile group and let Y_2 be the corresponding time in the adult group. With 114 juvenile patients and 83 adult patients, we have enough observations to keep the error rates low. Censoring rates were 68% for X_1, 55% for X_2, 78% for Y_1, and 40% for Y_2. In the juvenile group, 21% of the patients could be completely observed; in the adult group, 17%.
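The per-component censoring rates and the fraction of completely observed patients reported above can be computed from a paired layout such as the following sketch; the rows and field names are hypothetical and not the actual retinopathy data:

```python
# Hypothetical paired layout mirroring the retinopathy data: one row per
# patient, with the event status of the treated and the untreated eye
# (1 = vision loss observed, 0 = censored). The entries are made up.
patients = [
    {"group": "juvenile", "status": (1, 0)},
    {"group": "juvenile", "status": (0, 0)},
    {"group": "juvenile", "status": (1, 1)},
    {"group": "adult", "status": (0, 1)},
    {"group": "adult", "status": (1, 1)},
]

def censoring_rate(rows, component):
    # Fraction of censored observations in one component (eye).
    return sum(1 - r["status"][component] for r in rows) / len(rows)

def fully_observed(rows):
    # Fraction of patients with an observed event in both eyes.
    return sum(all(r["status"]) for r in rows) / len(rows)

juvenile = [r for r in patients if r["group"] == "juvenile"]
print(censoring_rate(juvenile, 0), fully_observed(juvenile))
```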
For the newly proposed method, the constraints (15) and (16) were chosen. In a second calculation we added a fifth constraint, corresponding to Equation (17) with g(t) = 1 applied to the second component.
Neither test rejected the null hypothesis at the 5% level. The p values of the new test method were p_1 = .2297 and p_2 = .1733. Earlier analyses by Huang and Zhao (2018) and Huster, Brookmeyer, and Self (1989) concluded that the null hypothesis should be rejected. However, neither actually tested the similarity of the survival functions. Huster et al. (1989) conducted tests on the similarities of the covariates and rejected the null hypothesis of equal influence of the covariates, while Huang and Zhao (2018) based their conclusion on a comparison of the individual estimators of the survival function and their confidence intervals. In view of Figure 4, not rejecting the null hypothesis does not seem entirely unreasonable. The proposed test tends to be conservative for very small sample sizes and for small sample sizes with a high censoring rate; this could already be observed in the univariate case.

CONCLUSION
Building on a previous univariate approach based on empirical likelihood, we developed a multivariate testing procedure for hazards in the survival context with censored observations. Using constraints and conditional hazards, our method is capable of testing various aspects of a dataset containing time-to-event observations. It was shown that the proposed method is asymptotically consistent. The finite sample simulation studies further showed that the power of the proposed method depends on the underlying power associated with the individual constraints. For larger sample sizes, it approximately attains the power of the better underlying individual test. In scenarios in which the greatest difference between two datasets lies in the correlation structure, the proposed test outperformed the individual tests from the family of test statistics introduced by Harrington and Fleming (1982).
where C_{Xi} are i.i.d. censoring times, independent of X_i, and the minimum and the indicator function are taken component-wise. The cumulative distribution function of the C_{Xi} is F_{CX}(t). The distribution functions F_{X0}(t) and F_{CX}(t) do not have common discontinuities. Let g_{j1}(t), …, g_{jk}(t), j = 1, …, d, be nonnegative left-continuous functions such that none can be expressed as a linear combination of the remaining functions. Further, an integrability condition in the sense of Gill (1983, theorem 2.1) shall hold for all functions g_{jr}(t); this condition guarantees the asymptotic normality of the Nelson-Aalen estimators of the individual components. Note that the functions g_{jr}(t) may be random, yet they need to be predictable w.r.t. a filtration; for d = 2, this filtration is the one specified in Appendix B1. Through this we are capable of applying the martingale central limit theorem to the Nelson-Aalen estimators Λ̂_{NA,j} of the components j = 1, …, d. Furthermore, for all random functions g_{jr}(t) there must exist a nonrandom left-continuous function g_{jr0}(t) such that sup_{t ≤ max T_{ji}} |g_{jr}(t)/g_{jr0}(t) − 1| = o_p(1) for j = 1, …, d and r = 1, …, k as n → ∞.
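The component-wise Nelson-Aalen estimators Λ̂_{NA,j} appearing in these conditions can be computed per component as in the following sketch (the function name and data layout are ours; times carry an event indicator, 1 = event, 0 = censored):

```python
# Nelson-Aalen estimator of the cumulative hazard for one component
# under right censoring: Lambda_hat(t) = sum over event times t_i <= t
# of d_i / Y(t_i), where d_i is the number of events at t_i and Y(t_i)
# the number of subjects still at risk just before t_i.
def nelson_aalen(times, events):
    out, cum = [], 0.0
    for t in sorted({s for s, e in zip(times, events) if e}):
        at_risk = sum(1 for s in times if s >= t)
        deaths = sum(1 for s, e in zip(times, events) if s == t and e)
        cum += deaths / at_risk
        out.append((t, cum))
    return out
```

Applying it to each of the d components separately gives the estimators that the constraint functionals g_{jr}(t) are integrated against.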

A2. Preparation of the proof of Theorem 3
Let d = 2 and f(λ) be given by

f(λ) = Σ_l [ log(1 − a_l(λ_1)) u_{1l} + log(a_l(λ_1)) u_{2l} + log(1 − b_l(λ_2)) u_{3l} + log(b_l(λ_2)) u_{4l} + log(1 − c_l(λ_1)) u_{5l} + log(c_l(λ_1)) u_{6l} + log(1 − d_l(λ_2)) u_{7l} + log(d_l(λ_2)) u_{8l} ],

with λ = (λ_1^T, λ_2^T)^T, λ_j = (λ_{j1}, …, λ_{jk})^T, j = 1, 2, k the number of constraints per component, and l ranging over all distinct time points at which an event occurred, again excluding the last one, as the corresponding hazard would be 1. For cases where a hazard is zero we define the corresponding summand to be zero. Note that we will drop the index of λ if it is clear from the context whether λ_1 or λ_2 is considered.
In order to show that the NPMLEs of L are indeed the maxima, we compute the derivatives of f w.r.t. all λs and evaluate them at zero.
Lemma 3. Let Y_{1i} and Y_{3i} be as above. Under the regularity specifications it holds that the convergence is of order O_p(1).
Now we can rewrite this in terms of the Nelson-Aalen estimator as ∫_0^∞ (λ̃_1^T G_1(t))² Z(t)/n dΛ̂_{NA,1}, where Λ̂_{NA,1} denotes the Nelson-Aalen estimator of the first component. Analogously to lemma 4.1 of Dabrowska (1988), the supremum norm of the difference between the estimated and the true component-wise cumulative hazard function converges to zero; this follows from the Glivenko-Cantelli theorem and simple algebra under the regularity specifications. Applying Lenglart's inequality and using lemma 3.1.3.9 of Spreij (1990), we can replace the integrator by the true component-wise cumulative hazard function.
Proof. This follows from lemma A1 of Pan and Zhou (2002). ▪

Lemma 5. Assume the data are such that the NPMLEs of the EL are asymptotically normal and the variance-covariance matrix Σ defined in the proof is invertible. Then for the solution λ_X = (λ_1^T, λ_2^T)^T of the constrained problem (9), corresponding to the null hypothesis H_0 as given in Equation (4), it holds that n^{1/2} λ_X converges in distribution to N(0, Σ).