2.1 Setting and notation
The various confidence intervals that we describe in this paper are constructed using the duality between hypothesis testing and confidence intervals. Whitehead (1997), for example, has described how this may be done. We first describe this for the simple case where a single parameter θ is of interest. Suppose an ordering of the possible data sets is defined so that it is possible to recognise when one data set provides stronger evidence in favour of a large θ than another data set, and write x̃ ⪰ x when x̃ provides equivalent or stronger evidence when compared with x. We define a p-value function of the unknown parameter by

p(θ) = Pr_θ(X ⪰ x_obs),

where X denotes a possible data set and x_obs is the observed data from the trial. If the ordering is such that Pr_{θ'}(X ⪰ x) ≤ Pr_{θ''}(X ⪰ x) whenever θ' ≤ θ'', and p(θ) is a continuous function of θ, then the ordering is referred to as a stochastic ordering. For a stochastic ordering, Whitehead (1997) describes that the p-value evaluated at the true parameter value is uniformly distributed, that is, for all u ∈ (0, 1),

Pr_θ(p(θ) ≤ u) = u.
The various methods of constructing confidence intervals described in this paper use two techniques based on this property. The first technique is based on the fact that, for θ_L and θ_U calculated from the data such that p(θ_L) = α/2 and p(θ_U) = 1 − α/2, respectively, (θ_L, θ_U) forms a (1 − α)-level confidence interval for θ. The second technique is based on noting that the (1 − α)-level confidence interval for θ is the set {θ_0 : α/2 < p(θ_0) < 1 − α/2}, that is, the set of parameter values not rejected by the corresponding hypothesis tests. Like Stallard and Todd (2005), we will refer to this technique of obtaining a confidence interval as the p-value inversion method.
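As an illustration of both techniques, consider the simple case of a normal mean with known standard error, where the p-value function is p(θ) = 1 − Φ((x̄ − θ)/se) and the ordering is stochastic. The sketch below is our own illustration, not code from any of the papers discussed, and the function names are ours: it recovers the two confidence limits by solving p(θ_L) = α/2 and p(θ_U) = 1 − α/2 numerically.

```python
from scipy.stats import norm
from scipy.optimize import brentq

def p_value_function(theta, xbar_obs, se):
    """p(theta) = Pr_theta(Xbar >= xbar_obs); increasing in theta
    for a normal mean, so the ordering is stochastic."""
    return 1.0 - norm.cdf((xbar_obs - theta) / se)

def invert_ci(xbar_obs, se, alpha=0.05):
    """First technique: solve p(theta_L) = alpha/2 and
    p(theta_U) = 1 - alpha/2 for the two confidence limits."""
    lo, hi = xbar_obs - 20 * se, xbar_obs + 20 * se  # generous bracket
    theta_L = brentq(lambda t: p_value_function(t, xbar_obs, se) - alpha / 2,
                     lo, hi)
    theta_U = brentq(lambda t: p_value_function(t, xbar_obs, se) - (1 - alpha / 2),
                     lo, hi)
    return theta_L, theta_U

xbar, se = 1.3, 0.5
theta_L, theta_U = invert_ci(xbar, se)
# recovers the familiar interval xbar -/+ 1.96 * se
```

The second technique gives the same interval: the set of θ_0 with α/2 < p(θ_0) < 1 − α/2 is exactly (θ_L, θ_U) because p is monotone in θ.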
The p-value inversion method is attractive when adjusted p-values, such as those arising when testing multiple hypotheses, are used. Stallard and Todd (2005) extend the p-value inversion method to the case where several experimental treatments are tested, the trial includes at least one interim analysis with the most effective treatment selected at the first interim analysis, and the trial can stop early for efficacy or futility of the most effective treatment at any of the interim analyses. They produce a confidence region for the parameter vector of the treatment differences, from which they obtain the confidence interval for the treatment difference of the selected treatment. Based on our setting, we describe how they do this in Section 'Stallard and Todd (2005)'.
2.2 Sampson and Sill (2005)
2.3 Wu et al. (2010)
Wu et al. (2010) also use the first technique from Section 'Setting and notation', although not explicitly, to propose a method for calculating confidence intervals after a phase II/III clinical trial. To obtain the lower and upper bounds, they use the critical values for testing the null hypothesis, and so we first describe how, using the distribution of the final test statistic, which we denote by T, they obtain the critical values. Wu et al. (2010) assume the test statistic T is used to test the null hypothesis of no treatment difference, so that if d and d̃ are critical values chosen such that the required type I error rate is not inflated, the null hypothesis is rejected if

T ≥ d or T ≤ d̃.
For a symmetric level α test, to obtain d while controlling the type I error rate in the strong sense, we require

sup_θ Pr_θ(T ≥ d) = α/2.
Wu et al. (2010) note that Bischoff and Miller (2005) showed that the supremum is attained at θ_1 = ⋯ = θ_K = 0. Without loss of generality, suppose the treatments are labelled so that treatment 1 is the apparently most effective treatment. Therefore, if we let θ = (θ_1, …, θ_K) and let θ_0 denote the configuration (0, 0, …, 0), then d is obtained under θ_0. Using a similar argument, without loss of generality, the least favourable configuration for obtaining d̃ is θ_1 = 0 and θ_2, …, θ_K → −∞. Thus, if we denote this configuration by θ̃_0, d and d̃ are obtained such that they, respectively, satisfy

Pr_{θ_0}(T ≥ d) = α/2 and Pr_{θ̃_0}(T ≤ d̃) = α/2.
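Under the global null configuration, the upper critical value d can be approximated by simulation. The sketch below is our own illustration, not Wu et al.'s code: it assumes a two-stage design in which the best of K treatments is selected at stage 1 and its statistic is combined with an independent stage-2 statistic using sample-size weights; all names and the weighting are our assumptions. Under the degenerate least favourable configuration, selection of treatment 1 is certain, so the final statistic is standard normal and d̃ is simply a normal quantile.

```python
import numpy as np
from scipy.stats import norm

def upper_critical(K=3, n1=100, n2=100, alpha=0.05, nsim=200_000, seed=1):
    """Monte Carlo approximation of d under theta_0 = (0, ..., 0).

    Stage-1 Wald statistics share one control arm, so they have
    pairwise correlation 1/2; the best treatment is carried forward
    and combined with an independent stage-2 statistic using
    sample-size weights (our assumption)."""
    rng = np.random.default_rng(seed)
    eps0 = rng.standard_normal(nsim)               # control-arm noise
    eps = rng.standard_normal((nsim, K))           # treatment-arm noise
    z1 = (eps - eps0[:, None]) / np.sqrt(2.0)      # stage-1 statistics
    z1_sel = z1.max(axis=1)                        # select the best treatment
    z2 = rng.standard_normal(nsim)                 # stage-2 statistic under the null
    w1, w2 = np.sqrt(n1 / (n1 + n2)), np.sqrt(n2 / (n1 + n2))
    t = w1 * z1_sel + w2 * z2                      # final test statistic
    return float(np.quantile(t, 1 - alpha / 2))

d = upper_critical()
# Under the degenerate configuration the selection is certain and T is
# standard normal, so the lower critical value is a plain quantile:
d_tilde = norm.ppf(0.05 / 2)
```

Selection of the maximum inflates the upper tail, which is why d exceeds the unadjusted quantile 1.96 while d̃ does not need any adjustment.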
2.4 Stallard and Todd (2005)
2.5 Posch et al. (2005)
Like Stallard and Todd (2005), Posch et al. (2005) use the p-value inversion method. They describe how to obtain the confidence region for the parameter vector and how to obtain confidence intervals from the confidence region. Before we describe how they define the confidence region, we first need to describe how they propose to conduct hypothesis testing after a trial that uses an ASD, as this motivates how they obtain the confidence region. The primary null hypotheses of interest are H_i: θ_i ≤ 0, i = 1, …, K. We denote the set {1, …, K} by 𝒦. For I ⊆ 𝒦, we will write H_I = ∩_{i∈I} H_i. For example, for I = {1, 2, 3}, we will simply write H_123 for the intersection null hypothesis H_1 ∩ H_2 ∩ H_3. Posch et al. (2005) assume that testing will be conducted as described by Hommel (2001) and Bretz et al. (2006), among others, which involves constructing the closure set consisting of all hypotheses H_I, I ⊆ 𝒦. For a level α test, a primary hypothesis of interest H_i is rejected if and only if all hypotheses H_I with i ∈ I are rejected, each hypothesis tested at level α. For example, with K = 3, H_1 is tested using the hypotheses H_123, H_12, H_13 and H_1. This controls the type I familywise error rate in the strong sense by the closure principle (CP) of Marcus et al. (1976). Using the CP, testing is conducted in a stepwise manner starting with the global intersection hypothesis. For example, for K = 3, the global hypothesis H_123 is tested first. If H_123 is not rejected, testing stops and none of the null hypotheses of interest (H_1, H_2, H_3) is rejected. If H_123 is rejected, the hypotheses H_I with |I| = 2 are tested next. Hypothesis testing proceeds to H_1 if both H_12 and H_13 are rejected. Similarly, testing proceeds to H_2 if both H_12 and H_23 are rejected, and to H_3 if both H_23 and H_13 are rejected. We next describe how the p-values for testing the intersection hypotheses in the closure set based on both stages 1 and 2 data are evaluated.
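The closed testing step described above can be sketched in a few lines. The following is our own generic illustration (names ours): given a p-value for every intersection hypothesis in the closure set, it rejects an elementary hypothesis exactly when all intersections containing it are rejected.

```python
from itertools import chain

def closed_test(p_values, alpha=0.05):
    """Closure principle: reject the elementary hypothesis H_i at
    familywise level alpha iff every intersection hypothesis H_I with
    i in I is rejected at level alpha.  `p_values` maps frozensets of
    treatment indices to the p-value of H_I for every non-empty I."""
    indices = set(chain.from_iterable(p_values))
    rejected = {I for I, p in p_values.items() if p <= alpha}
    return {i: all(I in rejected for I in p_values if i in I)
            for i in indices}

# K = 3 example: H123 is tested first, then the pairwise intersections.
p = {
    frozenset({1, 2, 3}): 0.01,
    frozenset({1, 2}): 0.02, frozenset({1, 3}): 0.03, frozenset({2, 3}): 0.20,
    frozenset({1}): 0.01, frozenset({2}): 0.04, frozenset({3}): 0.15,
}
decisions = closed_test(p)
# H1 is rejected (H123, H12, H13 and H1 all have p <= 0.05);
# H2 and H3 are not, because H23 (p = 0.20) is not rejected.
```

Because every intersection is tested at full level α, no α-splitting is needed and the familywise error rate is controlled in the strong sense.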
Posch et al. (2005) describe how to obtain one-sided confidence intervals (lower bounds) that correspond to the one-sided hypothesis tests for superiority of the experimental treatments over the control. Therefore, for each intersection hypothesis H_I, the alternative hypothesis is that the difference between the effect of the apparently most effective experimental treatment in set I and the control treatment is greater than 0. We denote the one-sided stage 1 and stage 2 p-values used to test superiority of the apparently most effective treatment over the control treatment in the null hypothesis H_I by p_I^(1) and p_I^(2), respectively. When data are available, the p-values for the elementary hypotheses (|I| = 1) are calculated using the usual pairwise tests such as, for example, a t-test or a chi-squared test. For an intersection hypothesis with |I| > 1, we will use the Šidak adjusted p-values. The stage 1 Šidak adjusted p-value for hypothesis H_I is given by

p_I^(1) = 1 − (1 − min_{i∈I} p_i^(1))^{|I|},

where p_i^(1) denotes the stage 1 pairwise p-value for treatment i.
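In code, the Šidak adjustment for an intersection hypothesis is a one-liner; the sketch below (function name ours) applies it to the pairwise stage 1 p-values of the treatments in I.

```python
def sidak_intersection_p(pairwise_ps):
    """Sidak adjusted p-value for the intersection of the hypotheses
    with the given pairwise p-values: 1 - (1 - min p)^(number of p's)."""
    return 1.0 - (1.0 - min(pairwise_ps)) ** len(pairwise_ps)

# e.g. for I = {1, 2, 3} with pairwise p-values 0.01, 0.20 and 0.50
p_I = sidak_intersection_p([0.01, 0.20, 0.50])  # 1 - 0.99**3
```

For a singleton I the adjustment leaves the pairwise p-value unchanged, matching the elementary-hypothesis case above.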
We will also use the Dunnett adjusted p-values. For the Dunnett test, with equally sized treatment groups so that the standardised differences have pairwise correlation 1/2, p_I^(1) is given by

p_I^(1) = 1 − ∫_{−∞}^{∞} φ(z) [Φ(√2 t_{max,I} + z)]^{|I|} dz,

where φ and Φ, respectively, denote the standard normal density and distribution functions and t_{max,I} is the maximum standardised difference corresponding to the treatments in intersection hypothesis H_I. At stage 2, because only one experimental treatment is tested, p-value adjustment is not required, and so p_I^(2) = p_S^(2) if the selected treatment S ∈ I and p_I^(2) = 1 otherwise.
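The Dunnett p-value can be evaluated by one-dimensional numerical quadrature, conditioning on the shared control arm. The sketch below is our own implementation of that integral (names ours) and assumes equal allocation, so the standardised differences have pairwise correlation 1/2; for |I| = 1 it reduces to the usual one-sided p-value 1 − Φ(t).

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def dunnett_intersection_p(t_max, m):
    """One-sided Dunnett p-value Pr(max of m standardised differences
    >= t_max) for m treatment-control comparisons sharing one control
    arm.  Conditioning on the control-arm noise z gives the integrand
    phi(z) * Phi(sqrt(2) * t_max + z)^m."""
    integrand = lambda z: norm.pdf(z) * norm.cdf(np.sqrt(2.0) * t_max + z) ** m
    val, _err = quad(integrand, -8.0, 8.0)
    return 1.0 - val

# m = 1 reduces to 1 - Phi(t_max); larger m gives a larger p-value
```

The truncation of the integral at ±8 is numerically harmless because the standard normal density is negligible beyond that range.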
Posch et al. (2005) note that, in general, a confidence region will not be the cross product of confidence intervals for θ_i, i = 1, …, K. In order to obtain confidence intervals, they propose embedding the confidence region defined above in a hyper-rectangle. They are interested in one-sided confidence intervals and, to obtain the lower confidence bounds, they note that the confidence region is embedded in such a rectangle by setting adjusted p-values

p_i^(1),ad(δ_i) = sup_{δ_j ∈ ℝ, j ≠ i} p_𝒦^(1)(δ_1, …, δ_K)

for all i ∈ 𝒦, where p_𝒦^(1)(δ_1, …, δ_K) denotes the stage 1 p-value for the intersection of the shifted hypotheses H_i(δ_i): θ_i ≤ δ_i. Note that, for the pairwise hypotheses (6), the supremum over the real line is 1, so that, as Posch et al. (2005) observe, for tests that use the pairwise p-values to obtain p-values for intersection hypotheses, the adjusted p-values p_i^(1),ad take a simple form. For example, for the Šidak test,

p_i^(1),ad(δ_i) = 1 − (1 − p_i^(1)(δ_i))^K.
For stage 2, because in our case only one experimental treatment continues to stage 2, we define the stage 2 adjusted p-values in the same way. No p-value adjustment is required for an analysis using stage 2 data only, so that p_i^(2),ad(δ_i) = p_S^(2)(δ_i) for i = S and p_i^(2),ad(δ_i) = 1 for i ≠ S, where S denotes the selected treatment. Hence, based on the adjusted p-values, for data x_obs, the simultaneous one-sided confidence interval for θ_i, i = 1, …, K, is defined by

{δ_i : C(p_i^(1),ad(δ_i), p_i^(2),ad(δ_i)) > α},

where C(·, ·) denotes the combination function used to combine the stages 1 and 2 p-values.
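The lower bound for the selected treatment can be computed by inverting the combination test for the shifted hypotheses. The sketch below is our own illustration under simplifying assumptions, not Posch et al.'s code: normally distributed stage-wise estimates with known standard errors, an inverse-normal combination with pre-specified weights, and the full Šidak adjustment 1 − (1 − p)^K at stage 1; all function names are ours.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def lower_bound_selected(est1, se1, est2, se2, K, alpha=0.025,
                         w1=np.sqrt(0.5), w2=np.sqrt(0.5)):
    """Smallest delta for which the inverse-normal combination of the
    Sidak-adjusted stage-1 p-value and the unadjusted stage-2 p-value
    fails to reject the shifted hypothesis theta_S <= delta."""
    def criterion(delta):
        p1 = 1.0 - norm.cdf((est1 - delta) / se1)   # stage-1 pairwise p-value
        p1_adj = 1.0 - (1.0 - p1) ** K              # Sidak over all K arms
        p2 = 1.0 - norm.cdf((est2 - delta) / se2)   # stage-2 p-value, unadjusted
        p1_adj, p2 = (float(np.clip(p, 1e-12, 1 - 1e-12)) for p in (p1_adj, p2))
        z = w1 * norm.ppf(1 - p1_adj) + w2 * norm.ppf(1 - p2)
        return z - norm.ppf(1 - alpha)              # > 0 <=> hypothesis rejected
    span = 20 * max(se1, se2)
    return brentq(criterion, min(est1, est2) - span, max(est1, est2) + span)
```

Because the adjusted stage-1 p-value grows with K, the bound is lower (more conservative) the more treatments were compared at stage 1.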
2.6 Extending Posch et al. (2005) method to obtain two-sided confidence intervals
In this paper, we are interested in simultaneous two-sided confidence intervals, so in this section we describe how to extend the Posch et al. (2005) method to the setting of two-sided confidence intervals. The two-stage closed testing procedure is used, so that p-values are obtained for each intersection hypothesis H_I (I ⊆ 𝒦). Two-sided confidence intervals correspond to two-sided hypothesis tests, so that for each null intersection hypothesis in the closure set described in Section 'Posch et al. (2005)', the alternative hypothesis is that the difference between the effect of the apparently most effective experimental treatment in set I and the control treatment is not equal to 0. We define p-values for testing both superiority and inferiority of the experimental treatments. As in Section 'Posch et al. (2005)', we denote the one-sided stage 1 and stage 2 p-values used to test superiority of the apparently most effective treatment over the control treatment in the null hypothesis H_I by p_I^(1) and p_I^(2), respectively. We have described how to obtain these p-values using the Šidak and the Dunnett tests. For testing in the opposite direction, we respectively denote by q_I^(1) and q_I^(2) the stage 1 and stage 2 one-sided p-values that test inferiority of the apparently most effective experimental treatment relative to the control treatment in the null hypothesis H_I. After collecting stage 2 data, because only one experimental treatment is tested, by the same argument as in Section 'Posch et al. (2005)', q_I^(2) = q_S^(2) if the selected treatment S ∈ I and q_I^(2) = 1 otherwise. The stage 1 Šidak adjusted p-value for hypothesis H_I is given by

q_I^(1) = 1 − (1 − min_{i∈I} q_i^(1))^{|I|},

where q_i^(1) is the pairwise p-value testing the comparison of experimental treatment i to the control in favour of the alternative that θ_i < 0.
As in the case of one-sided confidence intervals, in order to obtain two-sided confidence intervals, the confidence region is embedded in a hyper-rectangle. This is done by defining adjusted p-values. The stages 1 and 2 adjusted p-values p_i^(1),ad(δ_i) and p_i^(2),ad(δ_i), respectively, for testing the superiority of the apparently most effective treatment in hypothesis H_I are described in Section 'Posch et al. (2005)', with expression (7) giving an example of an expression for p_i^(1),ad(δ_i). For tests of inferiority, in our case, because only one experimental treatment is selected, the stage 2 adjusted p-value is q_S^(2)(δ_i) for i = S and 1 for i ≠ S. Following Posch et al. (2005), for the Šidak test, we would define q_i^(1),ad(δ_i) = 1 − (1 − q_i^(1)(δ_i))^K. However, as described in Section 'Wu et al. (2010)', for the case where the apparently most effective treatment is selected to continue to stage 2, we can ignore the fact that we are comparing several experimental treatments to the control treatment, so that the adjusted p-value simplifies to

q_i^(1),ad(δ_i) = q_i^(1)(δ_i)    (10)

for all i ∈ 𝒦. This leads to less conservative confidence intervals. We emphasise that q_i^(1),ad is obtained using expression (10) only for the case where the apparently most effective treatment is selected. In general, when some other selection rule is used, q_i^(1),ad should be obtained using the expression described previously. Similarly, for all i ∈ 𝒦 we set

q_i^(2),ad(δ_i) = q_i^(2)(δ_i).
Hence, for data x_obs, the simultaneous two-sided confidence interval for θ_i, i = 1, …, K, is defined by

{δ_i : C(p_i^(1),ad(δ_i), p_i^(2),ad(δ_i)) > α/2 and C(q_i^(1),ad(δ_i), q_i^(2),ad(δ_i)) > α/2},

where C(·, ·) denotes the combination function used to combine the stages 1 and 2 p-values.
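Under simplifying assumptions of our own (normally distributed stage-wise estimates with known standard errors, an inverse-normal combination with pre-specified weights; names and weights ours, not Posch et al.'s notation), the two-sided interval can be sketched by inverting the superiority test for the lower bound and the inferiority test for the upper bound, each at level α/2, with the Šidak adjustment applied only in the superiority direction as described above.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def two_sided_ci_selected(est1, se1, est2, se2, K, alpha=0.05,
                          w1=np.sqrt(0.5), w2=np.sqrt(0.5)):
    """Two-sided (1 - alpha) interval for the selected treatment's
    effect.  direction = +1 uses the superiority p-values (Sidak
    adjusted at stage 1); direction = -1 uses the inferiority
    p-values, which need no adjustment when the apparently most
    effective treatment is the one selected."""
    z_crit = norm.ppf(1 - alpha / 2)

    def criterion(delta, direction):
        p1 = norm.cdf(direction * (delta - est1) / se1)  # stage-1 shifted p
        p2 = norm.cdf(direction * (delta - est2) / se2)  # stage-2 shifted p
        if direction > 0:
            p1 = 1.0 - (1.0 - p1) ** K      # adjust superiority direction only
        p1, p2 = (float(np.clip(p, 1e-12, 1 - 1e-12)) for p in (p1, p2))
        return w1 * norm.ppf(1 - p1) + w2 * norm.ppf(1 - p2) - z_crit

    span = 20 * max(se1, se2)
    lo, hi = min(est1, est2) - span, max(est1, est2) + span
    lower = brentq(lambda d: criterion(d, +1), lo, hi)
    upper = brentq(lambda d: criterion(d, -1), lo, hi)
    return lower, upper
```

Because only the superiority direction is adjusted, the interval is asymmetric about the estimates: the lower limit is pulled down by the multiplicity adjustment while the upper limit is not.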