How to avoid concerns with the interpretation of two primary endpoints if significant superiority in one is sufficient for formal proof of efficacy

Formal proof of efficacy of a drug requires that in a prospective experiment, superiority over placebo, or either superiority or at least non‐inferiority to an established standard, is demonstrated. Traditionally one primary endpoint is specified, but various diseases exist where treatment success needs to be based on the assessment of two primary endpoints. With co‐primary endpoints, both need to be "significant" as a prerequisite to claim study success. Here, no adjustment of the study‐wise type‐1‐error is needed, but sample size is often increased to maintain the pre‐defined power. Alternatively, an at‐least‐one concept has been proposed, where study success is claimed if superiority is demonstrated for at least one of the endpoints. This is sometimes also called the dual primary endpoint concept, and an appropriate adjustment of the study‐wise type‐1‐error is required. This concept is not covered in the European Guideline on multiplicity because study success can be claimed if one endpoint shows significant superiority, despite a possible deterioration in the other. In line with Röhmel's strategy, we discuss an alternative approach including non‐inferiority hypothesis testing that avoids obvious contradictions to proper decision‐making. This approach leads back to the co‐primary endpoint assessment, and has the advantage that minimum requirements for endpoints can be modeled flexibly for several practical needs. Our simulations show that, if planning assumptions are correct, the proposed additional requirements improve interpretation with only a limited impact on power, that is, on sample size.


| INTRODUCTION
The assessment of the efficacy of drugs and interventions in complex diseases such as cancer, dementia, or rare diseases is a challenge. The ICH E9 guideline on the "Statistical Principles for Clinical Trials" 1 recommends study assessment based on one pre-specified primary endpoint. However, in complex diseases, often more than one primary endpoint is essential to appropriately assess the efficacy of a new drug or an intervention because treatment success is measured in different dimensions. Alzheimer's disease is an example where treatments should affect cognition and function. It is assumed that these dimensions are not strictly correlated and may not be influenced similarly. 2 Oncology trials are another instance because the study assessment is often based on progression-free survival (PFS) and overall survival (OS). However, even if a causal relationship is assumed, meaning improved PFS implies better OS, data for both are needed for the overall assessment.
Methodological standards for clinical trials with more than one primary endpoint are discussed in the "Guideline on multiplicity issues in clinical trials" provided by the European Medicines Agency (EMA). 3 In this guideline, two concepts for assessment are described: First, if decision-making is based on two (or more) primary endpoints, efficacy has to be demonstrated for each of them. This is the so-called "co-primary endpoint" or "all-or-none" concept, 4 where all respective elementary hypotheses have to be rejected. Alternatively, two (or more) primary endpoints can be ordered hierarchically regarding their importance. In this concept, confirmatory claims for a lower-ranked endpoint are only possible if the hypotheses of higher-ranked endpoints can be rejected. However, as the study is only considered successful if all primary hypotheses are rejected, there is essentially no difference between a co-primary and a hierarchical assessment of primary endpoints.
In a third concept, efficacy is claimed when superiority for one of several primary endpoints is demonstrated. This is known as the "at-least-one" concept proposed by Dmitrienko et al. 4 and Tamhane & Dmitrienko, 5 or as the "alternative endpoints" concept denoted by Huque & Röhmel 6 and Offen et al. 7 In the simplest case with two primary endpoints, this has sometimes been called the "dual primary endpoint concept," 8,9 and we will use this terminology later on. In this concept, the study-wise type-1-error is controlled in the strong sense, for example, by using the Bonferroni 10 or (Bonferroni-) Holm 11 procedures (or more assumption-dependent methods). Unfortunately, with this concept, trial success can be declared as soon as superiority for one primary endpoint is demonstrated, even if at the same time the other primary endpoint indicates a substantial deterioration. This is obviously not in line with practical decision-making. Therefore, this concept is not included as an option in the EMA multiplicity guideline, 3 reflecting that, to the extent possible, the statistical and the clinical assessment of trial success based on the evaluation of the primary endpoints should coincide. In this regard, it is also important to mention that statistically significant efficacy alone should never be enough for regulatory approval and that an extensive benefit-risk assessment is always required. This means that in practice, the overall assessment of several secondary efficacy and safety variables as well as relevant subgroups is necessary.
In contrast to the EMA, the U.S. Food and Drug Administration (FDA) includes the at-least-one concept in their recently published multiplicity guideline. 12 However, the FDA does not address the potential contradiction between rejecting the null hypothesis with appropriate control of the type-1-error and practical decision-making if the experimental treatment shows a positive effect on one endpoint and a negative effect on the other. One may argue that, for example, a significant improvement in PFS and a substantial deterioration in OS with an experimental treatment is easily detected (as a safety finding), but the contradiction between statistical and practical decision-making would remain. Even worse, moderate to minor deteriorations in one endpoint may challenge the overall assessment where "no effect of treatment" on one of the primary endpoints is an acceptable trial outcome. Untoward discussions at the end of the trial in case the treatment effect on mortality is having "the wrong sign" (i.e., indicating a deterioration) are a clear indicator that an important aspect of the evaluation has not been sufficiently addressed at the planning stage.
In this paper, we investigate the situation of two primary endpoints where it may be sufficient to demonstrate superiority in at least one of them. In line with Röhmel et al. 13 we argue that interpretational issues should be discussed upfront. Therefore, we examine several decision strategies to be added to the dual primary endpoint concept to assure an overall consistent strategy for decision-making.
Our principal idea is to first hierarchically exclude a substantial detriment for all primary endpoints before addressing the superiority hypothesis. Brannath et al., 14 Capizzi & Zhang, 15 and Huque & Röhmel 6 discussed similar concepts but did not investigate this approach in detail. Indeed, Röhmel et al. 13 proposed a hierarchical three-step procedure to simultaneously test non-inferiority for two primary endpoints and superiority in at least one. In the first step, non-inferiority has to be shown for both primary endpoints. Second, they proposed to apply a global bivariate test for superiority. Finally, in the third step, a matching univariate test has to demonstrate superiority in at least one outcome. They also briefly mentioned that Holm's multiple testing procedure 11 could be used to handle both step 2 and step 3. This eliminates the possibility that one gets significance in step 2 but not in step 3. Guilbaud 16 applied the motivating example of Röhmel et al. 13 in the illustrations of his work on simultaneous confidence regions corresponding to Holm's step-down procedure. 11 Logan and Tamhane 17 provided a closed-testing formulation that generalizes the three-step procedure of Röhmel et al. 13 for two primary endpoints to three or more primary endpoints. A property of the concept by Röhmel et al. 13 is that in rare instances, even though the second-step hypothesis is significant, none of the individual hypotheses on the third level is significant, leaving the research question open at an intermediate, non-interpretable step. For this reason, we prefer combining univariate hypotheses regarding the treatment effect in two steps.
In a simulation study, we examine different decision strategies to exclude detriment of a certain degree or to assure at least a positive trend. What exactly should be excluded or assured obviously depends on clinical considerations at the planning stage. Fortunately, our idea for assessing individual hypotheses for primary endpoints is quite flexible in modeling the different requirements regarding the treatment effect on each primary endpoint.
Interestingly, the discussed decision strategy leads back to the assessment of the co-primary endpoint concept and is fully in line with the respective European regulatory guidance document. 3 Admittedly and for obvious reasons, there is no free lunch in the current situation. Therefore, with our simulations, we try to address the price to be paid for a more straightforward interpretation of trial results.

| Motivating example: trials for Alzheimer's disease
The gold standard in efficacy trials on Alzheimer's disease is the assessment of endpoints that cover the change over time in the cognitive and functional dimension as a consequence of treatment. As the correlation between the patient's mental state and the ability to perform everyday tasks is low to moderate, the respective EMA guideline requires study designs including both outcomes as co-primary endpoints. 2 Counter-examples have been reported in the literature where the relationship between cognition and function was found to be strong. 18,19 In this instance, the difference between planning the study with one primary endpoint or two co-primary endpoints would be small. Therefore, if a drug could plausibly affect only one dimension, a concept that reliably can exclude a deterioration in the other should be an acceptable approach. For our simulation study, we used data observed in the randomized, placebo-controlled trial reported by Tariot et al., 20 who examined the effect of galantamine for the treatment of Alzheimer's disease.

| Preliminary methodological considerations
We performed a simulation study and investigated elementary hypotheses of the form

H_0i(δ_i): Δ_i ≤ δ_i versus H_1i(δ_i): Δ_i > δ_i, (1)

where i (i = 1, 2) denotes the endpoint under consideration, μ_i1 and μ_i2 are the treatment effects, Δ_i is the difference between μ_i1 and μ_i2, and δ_i defines the deterioration to be excluded or the trend to be assured. These elementary hypotheses are combined to model different requirements for trial success in relation to the expectation for two primary endpoints. The overall control of the study-wise type-1-error of the two hypotheses is achieved using (combinations of) the intersection-union (IU) 21 and union-intersection (UI) 22 principles. These result in different adjustments of the significance level for implementing the several decision strategies. In our simulations, the Bonferroni procedure 10 was used where necessary to control the study-wise type-1-error in the strong sense, as studies would be planned with this correction in mind. The Bonferroni-Holm procedure 11 is uniformly more powerful than the Bonferroni procedure and would be used at the analysis stage. We nevertheless decided to report the results from directly using the Bonferroni procedure, 10 which represents the maximal loss in power when adding constraints for interpretation, in the way this would be done at the planning stage. Several other multiple-test procedures were considered inappropriate because they require additional assumptions about the correlation structure of the endpoints. Röhmel et al. 13 discussed most of these methods when proposing their hierarchical three-step procedure to simultaneously test non-inferiority for two primary endpoints and superiority in at least one. They recommend using the Bonferroni-Holm procedure 11 or the more advanced Läuter's SS test. 23 While this is correct, both lead to challenges when used for discussing sample size requirements at the planning stage.
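As a concrete illustration (ours, not code from the paper, which used SAS), an elementary hypothesis H_0i(δ_i): Δ_i ≤ δ_i can be tested by shifting the treatment arm by −δ_i and applying a one-sided two-sample t-test. The sketch below assumes normally distributed endpoints with the standard deviation of 5.8 used later in the simulations; the sample size of 135 per arm is an illustrative assumption.

```python
import numpy as np
from scipy import stats

def elementary_test(treat, control, delta=0.0):
    """One-sided p-value for H_0i(delta): Delta_i <= delta versus
    H_1i(delta): Delta_i > delta, where Delta_i is the mean difference
    treatment minus control (larger values = better outcome).
    Shifting the treatment arm by -delta reduces the problem to an
    ordinary one-sided superiority t-test."""
    res = stats.ttest_ind(np.asarray(treat) - delta, np.asarray(control),
                          alternative="greater")
    return res.pvalue

# Illustrative data: true effect +2 points, SD 5.8, 135 patients per arm.
rng = np.random.default_rng(42)
treat = rng.normal(2.0, 5.8, 135)
control = rng.normal(0.0, 5.8, 135)

p_sup = elementary_test(treat, control, delta=0.0)   # superiority, delta = 0
p_ni = elementary_test(treat, control, delta=-2.0)   # non-inferiority, margin -2
```

Because the non-inferiority null (here δ_i = −2) is strictly easier to reject than the superiority null, p_ni is always smaller than p_sup on the same data.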

| Decision strategies
In this paper, we address the interpretational issues that may arise with the dual primary endpoint concept and examine several further constraints to be added upfront to arrive at an overall consistent decision strategy. All decision strategies are constructed with the UI 22 and IU 21 principles to control an overall one-sided study-wise type-1-error of 2.5% and are defined with the elementary hypotheses in (1). An overview of the decision strategies and the derived hypotheses is depicted in Table 1.
The traditional co-primary endpoint concept was included as a yardstick. Here, the trial is counted as a success if superiority is demonstrated for both primary endpoints, meaning H_C can be rejected (δ_1 = δ_2 = 0). As both elementary null hypotheses have to be rejected (H_C: H_01(0) ∪ H_02(0)), according to the IU principle, 21 no adjustment of the study-wise type-1-error is needed. Both H_01(0) and H_02(0) can be evaluated at the full one-sided type-1-error of 2.5%. The dual primary endpoint concept is essentially the application of the Bonferroni procedure. 10 Within this decision strategy, trial success can be claimed if superiority for at least one of the primary endpoints is demonstrated, that is, if at least one of the elementary null hypotheses can be rejected (H_D: H_01(0) ∩ H_02(0)). As only one elementary hypothesis needs to be rejected, according to the UI principle, 22 adjustment of the type-1-error is needed (see Table 1).

TABLE 1 Overview of the investigated decision concepts.

Concept | Description | Hypotheses | Evaluation
Co-primary endpoints | Both primary endpoints need to be superior to the control. | H_C: H_01(0) ∪ H_02(0) | Both superiority hypotheses have to be significant at α = 0.025 (one-sided).
Dual primary endpoints | At least one primary endpoint must be superior to the control. | H_D: H_01(0) ∩ H_02(0) | At least one of the superiority hypotheses has to be significant at α = 0.0125 (one-sided).
Dual primary endpoints (forte) | Non-inferiority for one primary endpoint and superiority in the other (or vice versa) must be shown. | H_F: [H_01(δ_1) ∪ H_02(0)] ∩ [H_01(0) ∪ H_02(δ_2)] | One combination of non-inferiority in one endpoint and superiority in the other has to be significant at α = 0.0125 (one-sided).
Non-inferiority | 1. Non-inferiority to a pre-specified margin must be shown for both primary endpoints. 2. At least one primary endpoint must be superior to the control. | H_N: H_01(δ_1) ∪ H_02(δ_2), then H_D: H_01(0) ∩ H_02(0) | Both non-inferiority hypotheses have to be significant at α = 0.025 (one-sided), and at least one of the superiority hypotheses has to be significant at α = 0.0125 (one-sided).
Look-at-the-estimate | 1. The treatment effect estimates of both primary endpoints must be in the "correct" direction. 2. At least one primary endpoint must be superior to the control. | No formal hypothesis for step 1; H_D: H_01(0) ∩ H_02(0) for step 2. | Both treatment effect estimates have to be positive, and at least one of the superiority hypotheses has to be significant at α = 0.0125 (one-sided).

Note: The elementary hypotheses (1), where i denotes the endpoint under consideration, μ_i1 and μ_i2 are the treatment effects, Δ_i is the difference between μ_i1 and μ_i2, and δ_i defines the deterioration to be excluded or the trend to be assured, are combined to model different requirements for trial success in relation to the expectation for the two primary endpoints.
Each individual H_0i(0) is tested with an evenly split (Bonferroni-adjusted) one-sided type-1-error of 1.25%. Asymmetric splits can be used to reflect different sample size requirements for different endpoints, for example, when considering OS and PFS in a dual primary endpoint concept. Here, the required sample size would be largely determined by what is needed to demonstrate differences in OS. Therefore, a larger part of the type-1-error could be allocated to evaluate the treatment effect on OS. In our simulation study, an evenly Bonferroni-adjusted type-1-error was investigated.
The other decision strategies can be understood as introducing additional constraints for trial success to the dual primary endpoint concept. Two different ways of defining the elementary hypotheses (1) were applied to implement formal testing of non-inferiority. In the dual primary endpoint (forte) strategy, non-inferiority to a pre-specified margin δ_i for one primary endpoint and superiority in the other (or vice versa) are simultaneously tested (H_F: [H_01(δ_1) ∪ H_02(0)] ∩ [H_01(0) ∪ H_02(δ_2)]). The UI principle 22 is applied to control the overall one-sided study-wise type-1-error at 2.5%. One combination of non-inferiority in one endpoint and superiority in the other has to be significant at an evenly Bonferroni-adjusted type-1-error of 1.25%. A second way of formally excluding harm by demanding non-inferiority for both endpoints is a two-step procedure depicted in the non-inferiority strategy. Here, in the first step, non-inferiority to δ_i needs to be shown for both primary endpoints (H_N: H_01(δ_1) ∪ H_02(δ_2)). According to the IU principle, 21 each elementary hypothesis H_0i(δ_i) can be evaluated at the full one-sided type-1-error of 2.5%. In the second step, at least one primary endpoint has to be superior to the control (H_D: H_01(0) ∩ H_02(0)). Here, the UI principle 22 is applied and H_01(0) and H_02(0) are tested at a one-sided type-1-error of 1.25%. This strategy is similar to a procedure based on simultaneous confidence regions corresponding to Holm's multiple-testing procedure described by Guilbaud. 16 At last, we examined the impact of the look-at-the-estimate decision strategy. Within this strategy, both treatment effect estimates must be positive, that is, favoring the experimental treatment (no formal hypothesis testing), and superiority has to be demonstrated for at least one primary endpoint (H_D: H_01(0) ∩ H_02(0)). Again, each H_0i(0) is tested with an adjusted type-1-error of 1.25%. This decision strategy reflects the standard medical assessment strategy.
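The decision strategies above reduce to simple Boolean combinations of the elementary p-values. The sketch below (our illustration, not from the paper) takes the superiority p-values p_sup1/p_sup2, the non-inferiority p-values p_ni1/p_ni2, and the effect estimates as inputs; all names are ours.

```python
def co_primary(p_sup1, p_sup2, alpha=0.025):
    # IU principle: both superiority hypotheses rejected at the full alpha.
    return p_sup1 <= alpha and p_sup2 <= alpha

def dual_primary(p_sup1, p_sup2, alpha=0.025):
    # UI principle with Bonferroni: at least one rejected at alpha / 2.
    return p_sup1 <= alpha / 2 or p_sup2 <= alpha / 2

def dual_primary_forte(p_sup1, p_sup2, p_ni1, p_ni2, alpha=0.025):
    # One combination of non-inferiority in one endpoint and superiority
    # in the other, each component significant at alpha / 2.
    return ((p_ni1 <= alpha / 2 and p_sup2 <= alpha / 2) or
            (p_ni2 <= alpha / 2 and p_sup1 <= alpha / 2))

def non_inferiority_strategy(p_sup1, p_sup2, p_ni1, p_ni2, alpha=0.025):
    # Step 1 (IU): both non-inferiority hypotheses at the full alpha;
    # step 2 (UI): at least one superiority hypothesis at alpha / 2.
    return (p_ni1 <= alpha and p_ni2 <= alpha and
            dual_primary(p_sup1, p_sup2, alpha))

def look_at_the_estimate(p_sup1, p_sup2, est1, est2, alpha=0.025):
    # Both estimates in the "correct" direction (no formal test),
    # plus at least one superiority hypothesis at alpha / 2.
    return est1 > 0 and est2 > 0 and dual_primary(p_sup1, p_sup2, alpha)
```

For example, with p_sup1 = 0.01 and a clearly non-significant second endpoint, the dual primary concept succeeds, while the non-inferiority strategy additionally demands both non-inferiority p-values below 0.025.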

| Simulation methods
We intended to perform a simulation study with easy-to-interpret scenarios, so we decided to use a simulation model with normally distributed endpoints. In each of the r simulation runs, we generated the outcome of a clinical trial with two primary endpoints. Two normally distributed endpoints were generated separately for the control and the treatment group (j = 0, 1) from a bivariate normal distribution based on means, standard deviations, and no or a small correlation (ρ) between the endpoints. The decision on study success is based on confidence intervals, treatment effect estimates, and p-values derived from the simple regression model

EP = β_0 + β_1 · group + ε, (2)

with the respective endpoint (EP) as the dependent variable and the treatment group as the independent variable, where β_0 is the intercept and β_1 is the regression coefficient for the treatment variable, estimating the treatment effect.
In the next step, we applied the different decision strategies with the respective significance levels to decide on study success (see Table 1). Depending on whether the data were simulated under the null hypothesis or the alternative hypothesis of the respective strategy, either the empirical type-1-error or the empirical power was calculated as the proportion of successes among the total number of simulation runs r. The study-wise type-1-error was expected to be controlled because, by construction, our decision strategies can only be conservative, not anti-conservative (within finite-sample limits). Thus, only the power is reported in detail below.
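A minimal version of this simulation loop might look as follows (our sketch; the paper used SAS, and the two-sample t-test used here is equivalent to the test of β_1 in the regression model (2) with a common within-group variance). The dual primary endpoint concept serves as the success criterion; the sample size of 135 per arm and the SD of 5.8 are illustrative values mirroring the standard setting.

```python
import numpy as np
from scipy import stats

def simulate_power(mu=(2.0, 2.0), sd=5.8, rho=0.2, n=135,
                   runs=1000, alpha=0.025, seed=1):
    """Empirical success probability of the dual primary endpoint
    concept (Bonferroni: each endpoint tested one-sided at alpha / 2)
    for bivariate normal endpoints with common SD and correlation rho."""
    rng = np.random.default_rng(seed)
    cov = sd ** 2 * np.array([[1.0, rho], [rho, 1.0]])
    successes = 0
    for _ in range(runs):
        treat = rng.multivariate_normal(mu, cov, size=n)
        control = rng.multivariate_normal((0.0, 0.0), cov, size=n)
        # Reject H_0i(0) if the one-sided p-value is at most alpha / 2.
        reject_any = any(
            stats.ttest_ind(treat[:, i], control[:, i],
                            alternative="greater").pvalue <= alpha / 2
            for i in (0, 1)
        )
        successes += reject_any
    return successes / runs
```

With both effects at 2 points, the empirical power comfortably exceeds 80%, while under the global null (mu = (0, 0)) the success rate stays near the nominal one-sided 2.5%.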

| Simulation settings
Simulation scenarios were motivated by a clinical trial in Alzheimer's disease reported by Tariot et al. 20 The authors performed a randomized, placebo-controlled trial to examine the efficacy and safety of galantamine. The primary endpoints were the mean change from baseline after 5 months in the standard 11-item ADAS-Cog subscale 24 (Alzheimer's Disease Assessment Scale) and the Clinician's Interview-Based Impression of Change plus Caregiver-Input. 25 Based on this clinical trial, the data for the two endpoints in our simulations (ADAS-Cog, 24 ADCS-ADL 26 ) were generated with a bivariate normal distribution. For both endpoints, we assumed that a positive treatment effect means improvement. For the first endpoint, a positive treatment effect of 2 points difference between treatment groups was kept fixed under the alternative hypothesis. For the treatment effect of the second endpoint, we considered different scenarios with the treatment effect ranging from positive (treatment effect of 2) to harmful (treatment effect of −2). We assumed the same standard deviation for both endpoints (5.8). Margins of −2, −1, and −0.5 were examined to address different unacceptable detrimental treatment effects or the requirement to at least assure a trend for the second endpoint. For example, a margin of −1 can be specified if an effect of at least +1 is required for the second endpoint. Thus, by setting the margin, our approach allows flexible modeling of the required treatment effect of the primary endpoints. To investigate the impact of no or a small correlation between the endpoints, we considered a correlation of ρ = 0 and ρ = 0.2.
The number of simulation runs r was set to 10,000 to assure accurate estimates. For an estimated one-sided type-1-error of 2.5%, the theoretical 95% confidence interval ranges from 2.19% to 2.81%. Under an alternative giving the study a power of 80%, the theoretical 95% confidence interval ranges from 79.21% to 80.79%. All simulations were performed with SAS Version 9.4.
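These interval limits follow from the normal approximation for a binomial proportion. A quick check (our computation, using the Wald interval with z = 1.96, which reproduces the quoted limits up to rounding):

```python
import math

def wald_ci(p, runs=10_000, z=1.96):
    """95% normal-approximation confidence limits for an empirical
    proportion p estimated from the given number of simulation runs."""
    half = z * math.sqrt(p * (1.0 - p) / runs)
    return p - half, p + half

ci_alpha = wald_ci(0.025)  # type-1-error setting: roughly 2.19% to 2.81%
ci_power = wald_ci(0.80)   # power setting: roughly 79.2% to 80.8%
```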

| RESULTS
Results are only presented for the standard sample size setting because results and conclusions for the other simulations with different sample sizes were similar (see Appendix, Supplementary Tables 2 and 3). We chose the standard sample size because the various decision strategies should only reduce the power from the pre-specified 80% by adding further requirements. Since, by construction, the study-wise type-1-error is controlled for all decision strategies (see Appendix, Supplementary Table 1), and the main interest was to balance costs in terms of power against a more straightforward interpretation, only the power is discussed in the following. The results of the simulation study (standard sample size setting) are depicted in Table 2.
We first investigated scenarios where claiming study success based on the first endpoint would be contradicted by the outcome in the second endpoint, that is, scenarios with simulated treatment effects of 2 | −2, 2 | −1, and 2 | −0.5. There is minimal to no chance of claiming study success in the co-primary endpoint concept or the strategies with an additional requirement for the second endpoint, but the dual primary endpoint concept is problematic. The respective power remains above 70% here, even if one primary endpoint shows negative results. While the 2 | −2 scenario is easy to interpret due to an unfavorable benefit-risk profile, and no one would claim the study successful, the detriment might be more challenging to identify in the 2 | −1 and 2 | −0.5 scenarios. Within these, it is a clear advantage that the power in the co-primary endpoint concept remains negligible, that is, smaller than 1%, and that, as intended, the chance of claiming study success is small as soon as there are additional requirements on the second endpoint.
As expected, the power increases with a rising treatment effect in the second endpoint for all strategies in which excluding a detriment is implemented (δ_i = −2). In the 2 | 0 and 2 | 0.5 scenarios, the non-inferiority strategy can assist in interpreting the results with a power of around 60% (2 | 0) and 69% (2 | 0.5). For the dual primary endpoint forte strategy, the power ranges from 51% to 66%. Therefore, this strategy is better suited as a stopping rule in situations where non-inferiority is at stake, and overall study success may be questioned. The 2 | 0 and 2 | 0.5 scenarios also show that the power is distinctly smaller for the two decision strategies if a positive trend is required for the second endpoint (δ_i = −1; −0.5). Then both the non-inferiority and the dual primary endpoint forte strategy have only low power.
In scenarios 2 | 1 and 2 | 1.5, where the basic assumption holds true that the treatment has a positive effect on both endpoints, the empirical power is quite similar for the dual primary endpoint concept, the dual primary forte and non-inferiority concepts (δ_i = −2), and the look-at-the-estimate strategy. For the traditional co-primary endpoint concept, the power is lower, that is, the reduction in power is larger. As anticipated, the (reduction in) power is influenced by the different conditions, that is, margins, we added. Either no negative effect (δ_i = −2), a small positive trend (δ_i = −1), or a distinct positive trend (δ_i = −0.5) was required for both primary endpoints. Naturally, applying the most liberal condition has a higher chance of being successful than applying stricter conditions with smaller margins. This is, for example, reflected in the 2 | 1 scenario, where the effect in one endpoint is only half of the initially required effect. Here, the power of the non-inferiority strategy (in which non-inferiority and superiority are assessed consecutively) is around 76% if no negative effect is demanded, around 63% if a small positive trend is required, and around 44% if a distinct trend is asked for as a requirement for licensing. The empirical power for the dual primary endpoint forte strategy (in which non-inferiority and superiority are tested simultaneously) is smaller, but the difference is only marginal, especially in the 2 | 1 scenario (around 75%). In this setting, the power for the look-at-the-estimate strategy is around 71%, thus between the strategies with margins of −2 and −1. However, compared to the dual primary endpoint concept, the reduction in power in scenario 2 | 1 is at most around 2% for both the dual primary forte and the non-inferiority strategy (with δ_i = −2) and around 5% for the look-at-the-estimate strategy. Therefore, nothing stands in the way of proper pre-specification.
In the full-powered 2 | 2 scenario, more than 80% power is reached for all decision strategies except the traditional co-primary endpoint concept, for which a power of around 64% is observed; this is exactly the value expected if the alternative holds true for both endpoints.
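The roughly 64% is what one would expect under independence: with each endpoint individually powered at 80% at the full one-sided level, the co-primary requirement multiplies the per-endpoint powers (a back-of-the-envelope check, assuming ρ = 0):

```python
# Each endpoint is individually powered at 80% when tested at the
# full one-sided alpha of 2.5% (standard sample size setting).
power_single = 0.80

# The co-primary concept requires rejecting both superiority
# hypotheses; under independence the probabilities multiply.
power_co_primary = power_single * power_single  # 0.64
```

With a positive correlation between the endpoints (ρ = 0.2), the joint rejection probability would be slightly higher than this product.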
In summary, among the decision strategies evaluated here, the non-inferiority strategy performs best in terms of remaining power, while the dual primary endpoint forte strategy is better at avoiding statistical conclusions of study success in situations with differential outcomes in the two primary endpoints.
Additionally, we observed that the correlation slightly influences the power. However, the difference in power between no correlation and a correlation of 0.2 is, at the most, around 3%. Thus, similar conclusions can be drawn for the different decision strategies in the scenarios investigated.

| DISCUSSION AND CONCLUSIONS
In situations where efficacy in two primary endpoints (in combination with a positive benefit-risk profile) is needed as a key requirement for regulatory approval, the dual primary endpoint concept of demonstrating superiority over control in only one of the primary endpoints and ignoring the outcome in the other does not seem to reflect an appropriate strategy for decision-making. Even though statistical control of the study-wise type-1-error holds true when applying a multiple test procedure, for example, the Bonferroni procedure, 10 this concept is especially problematic if one primary endpoint shows (slightly) negative results. Röhmel et al. 13 have proposed to exclude a detrimental treatment effect for both primary endpoints before investigating superiority in at least one of them and suggested a three-step procedure. We examined simple two-step procedures that test both primary endpoints for non-inferiority in the first step before testing superiority for at least one in the second step. Two slightly different approaches regarding hypothesis testing (the dual primary endpoint forte and the non-inferiority concept) were envisaged. A third strategy was examined where both treatment effect estimates must be positive in the first step, and superiority has to be demonstrated for at least one primary endpoint in the second step, to reflect what would usually be considered uncritical at the assessment stage.
(Note to Table 2: Scenario: treatment effect endpoint 1 | treatment effect endpoint 2; ρ: correlation coefficient; δ_i: detriment to be excluded or trend to be assured; DFA: dual primary forte concept (simultaneous testing of non-inferiority and superiority in two combinations); NI: non-inferiority concept (two-step procedure of non-inferiority and superiority); Estimate: look-at-the-estimate concept. Simulations are based on n = 270, standard deviation = 5.8; empirical power above 20% is highlighted in bold.)
The advantages of these strategies for the assessment and interpretation of trials are obvious. If, from a content perspective, a demonstration of superiority in one of the endpoints is sufficient to justify the use of the drug and the treatment does not affect the other endpoint (i.e., a treatment effect of zero), the treatment effect estimate will have the "wrong" sign (i.e., favoring the control over the treatment) in about 50% of cases. This would lead to post-hoc discussions on overall study success. The example where PFS is significantly superior to control but OS numerically favors the control illustrates the resulting uncertainty for all who want to interpret the trial outcome positively. Upfront discussion at the planning stage regarding the degree of inferiority to be excluded will avoid such post-hoc discussions because the benefits of the improvement in one endpoint can be openly balanced against what needs to be known for the other endpoint. In this situation, acceptance of demonstrating superiority in only one of the endpoints is often interlinked in a non-obvious way with the expectation to see "at least a trend" in the second endpoint. Here, the example of PFS and OS is again instructive: if PFS is prolonged, the usual assumption is that this will also lead to a survival benefit in the long run. As shown in our simulation study, the power remains constant as long as the "distance" between the margin and the true treatment effect remains constant. Thus, when shifting the non-inferiority margin toward zero, power can only be kept constant if the treatment effect is positive (and no longer neutral). With this, an expectation regarding a certain trend in the treatment effect can be quantified and translated into a margin (for a given sample size). Again, upfront discussion at the planning stage can pave the way for a successful interpretation of the trial outcome.
Obviously, there is rarely a free lunch: adding constraints through the assessment of an additional set of hypotheses, as compared to the simple dual endpoint concept, will lead to a decrease in power that requires compensation through an increase in sample size.
We have conducted a simulation study to quantify the impact of adding constraints to the dual endpoint concept on power. In those scenarios where interpretation is straightforward because we see the required effect in one primary endpoint and a similar or slightly smaller effect in the other primary endpoint, the decision strategies have approximately the same power as the dual primary endpoint concept. This remains true as long as the margin is specified so that it reflects (as explained above) the required trend in the second endpoint.
In the scenarios more worthy of discussion, where no or a minor detrimental effect for one of the endpoints is observed, a price has to be paid for more certainty regarding the interpretation. We felt that this was within an acceptable range.
The correlation slightly influences the observed power. However, in many practical situations, two primary endpoints are chosen precisely because they are uncorrelated or only slightly correlated and reflect different aspects of patient benefit that should be impacted positively by the experimental treatment.
Actually, our decision strategies lead back to the co-primary endpoint concept and are thus in line with the European regulatory guideline on multiplicity issues in clinical trials, 3 as elementary hypotheses for both primary endpoints are assessed for the decision on study success. The only difference is the combination of non-inferiority and superiority hypotheses instead of two superiority hypotheses. As superiority needs to be demonstrated in at least one primary endpoint, in contrast to classical non-inferiority trials, a slightly wider margin may be acceptable because the intention is not to assure similarity but to exclude a detriment. In other settings, for example, where superiority in one endpoint should imply at least a trend in the other endpoint, this can be flexibly modeled with a correspondingly smaller margin.
Decision strategies with the discussed additional constraints have the advantage that statistical and clinical success regarding the primary endpoints and hypotheses are not in contradiction. Applying such constraints should be preferred over planning a trial with a plain dual primary endpoint concept, as the impact on the study power is marginal if the treatment effects are as expected or required.