Volume 71, Issue 1
BIOMETRIC METHODOLOGY

Combining one‐sample confidence procedures for inference in the two‐sample case

Michael P. Fay

Corresponding Author

E-mail address: mfay@niaid.nih.gov

National Institute of Allergy and Infectious Diseases, 6700B Rockledge Dr. MSC 7630, Bethesda, Maryland, 20892‐7630, U.S.A.

Michael A. Proschan

National Institute of Allergy and Infectious Diseases, 6700B Rockledge Dr. MSC 7630, Bethesda, Maryland, 20892‐7630, U.S.A.

Erica Brittain

National Institute of Allergy and Infectious Diseases, 6700B Rockledge Dr. MSC 7630, Bethesda, Maryland, 20892‐7630, U.S.A.

First published: 01 October 2014
[Correction added on July 17, 2017, after first online publication: Web Appendix G, authors corrected the median definition so that the definition holds for all medians for a discrete distribution.]

Summary

We present a simple general method for combining two one‐sample confidence procedures to obtain inferences in the two‐sample problem. Some applications give striking connections to established methods; for example, combining exact binomial confidence procedures gives new confidence intervals on the difference or ratio of proportions that match inferences using Fisher's exact test, and numeric studies show the associated confidence intervals bound the type I error rate. Combining exact one‐sample Poisson confidence procedures recreates standard confidence intervals on the ratio, and introduces new ones for the difference. Combining confidence procedures associated with one‐sample t‐tests recreates the Behrens–Fisher intervals. Other applications provide new confidence intervals with fewer assumptions than previously needed. For example, the method creates new confidence intervals on the difference in medians that do not require shift and continuity assumptions. We create a new confidence interval for the difference between two survival distributions at a fixed time point when there is independent censoring by combining the recently developed beta product confidence procedure for each single sample. The resulting interval is designed to guarantee coverage regardless of sample size or censoring distribution, and produces equivalent inferences to Fisher's exact test when there is no censoring. We show theoretically that when combining intervals asymptotically equivalent to normal intervals, our method has asymptotically accurate coverage. Importantly, all situations studied suggest guaranteed nominal coverage for our new interval whenever the original confidence procedures themselves guarantee coverage.

1 Introduction

We propose a simple procedure to create a confidence interval (CI) for certain functions (e.g., the difference or the ratio) of two scalar parameters, one from each of two independent samples. The procedure only requires nested confidence intervals from the independent samples and certain monotonicity constraints on the function, and can be applied quite generally. We call our new CIs “melded confidence intervals” since they meld together the CIs from each of the two samples. In this article we focus on melded CIs that are created from two one‐sample CIs with guaranteed nominal coverage, and we conjecture that the resulting CIs themselves guarantee coverage. This conjecture is supported by simulated, numerical, and mathematical results.

The melded CI method is closely related to methods that have expanded or modified fiducial inference yet focus on frequentist properties, such as confidence structures (Balch, 2012), generalized fiducial inference (Hannig, 2009), inferential models (Martin and Liu, 2013), or confidence distributions (Xie and Singh, 2013). The melded CIs are much simpler to describe than the first three methods mentioned, and, unlike confidence distributions, can be applied to small sample discrete problems.

Fiducial inference is no longer part of mainstream statistics (for more background see Pedersen, 1978; Zabell, 1992; Hannig, 2009); nevertheless, it will be helpful to briefly describe some examples of fiducial inference and some of its shortcomings to show how the melded CIs relate to it and avoid those shortcomings. Unlike frequentist inference where parameters are fixed, or Bayesian inference where parameters are random, fiducial inference is not clearly in either camp, and hence has been the source of much confusion. Fiducial inference is a way of conditioning on the data and getting a distribution on the parameter without using a prior distribution. For example, if x is an observation drawn from a normal distribution with mean $\mu$ and variance 1 (i.e., $X \sim N(\mu, 1)$), then the corresponding fiducial distribution for $\mu$ is $N(x, 1)$. The middle 95% of that fiducial distribution is the usual 95% confidence interval for $\mu$. The problem is that the fiducial distribution cannot be used to get confidence intervals on non‐monotonic transformations of the parameter. For example, using the fiducial distribution for $\mu$ of $N(x, 1)$ as above, the corresponding distribution for $\mu^2$ is a non‐central chi square. A fiducial approach to creating a one‐sided 95% lower confidence limit for $\mu^2$ is to take the 5th percentile of that non‐central chi square distribution, but this does not work well; when $\mu = 0.1$, the coverage is only about 66% (see Pedersen, 1978, pp. 153–155).
Another complication is that for discrete data such as a binomial observation, there are two fiducial distributions associated with the parameter, one can be used for obtaining the lower confidence limit and one for the upper limit (Stevens, 1950).
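The poor coverage of the fiducial limit for the squared mean can be checked numerically. The following is a minimal sketch, not taken from the article: it assumes a single observation X from a normal distribution with variance 1, writes the fiducial distribution of the squared mean as that of (x + Z)^2 with Z standard normal, and uses hypothetical helper names (`bisect`, `fiducial_lower_abs`, `coverage`).

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def bisect(f, lo, hi, iters=200):
    """Root of an increasing function f on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fiducial_lower_abs(x, level=0.95):
    """Return c >= 0 with P(|x + Z| <= c) = 1 - level for Z ~ N(0, 1).

    c**2 is then the (1 - level) quantile of the fiducial distribution
    of mu**2, that is, the fiducial lower limit for mu**2.
    """
    tail = 1.0 - level
    return bisect(lambda c: PHI(c - x) - PHI(-c - x) - tail, 0.0, abs(x) + 10.0)


def coverage(mu, level=0.95):
    """P(fiducial lower limit <= mu**2) when X ~ N(mu, 1)."""
    tail = 1.0 - level
    m = abs(mu)
    if 2.0 * PHI(m) - 1.0 < tail:
        return 0.0  # the smallest possible limit (at x = 0) already exceeds mu**2
    # The limit grows with |x|, so it covers mu**2 exactly when |X| <= x_star,
    # where x_star solves P(|x_star + Z| <= |mu|) = 1 - level.
    x_star = bisect(lambda x: tail - (PHI(m - x) - PHI(-m - x)), 0.0, m + 10.0)
    return PHI(x_star - mu) - PHI(-x_star - mu)


print(round(coverage(0.1), 3))  # roughly 0.66
print(round(coverage(3.0), 3))  # close to the nominal 0.95
```

For means far from zero the limit behaves well; the undercoverage is concentrated near zero, where squaring is not monotonic over the bulk of the fiducial distribution.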

The melded CI approach is similar to the fiducial approach in that without using priors we associate probability distributions with parameters after conditioning on the data. But those probability distributions are only tools used to obtain the melded CIs and need not be interpreted as fiducial probabilities; all statistical theory in this article is firmly frequentist. The melded CI approach avoids the problems of fiducial inference in two ways. First, we create distributions for parameters using only nested (defined in Section 2) one‐sample confidence intervals, whose theory is well developed and understood. This seamlessly creates either one distribution (e.g., in the normal case) or two distributions (e.g., in the binomial case) as needed. Second, we limit the application to functions of the parameters that meet some monotonicity constraints, so that when the one‐sample CIs have guaranteed coverage, the resulting melded CIs appear to also have guaranteed coverage.

Besides motivating some classical CIs and creating new CIs for these canonical examples, the melded CI method is a very general tool that can easily be used in essentially any complex two‐sample inference setting, as long as there is an established approach for computing confidence intervals for a single sample. As an example of a new CI consider the difference in medians. Existing methods require continuity or shift assumptions (the Hodges and Lehmann (1963) intervals) or large samples (the nonparametric bootstrap). A melded CI for this situation inverts the sign test, and requires none of those assumptions. Simulations show that, unlike the Hodges and Lehmann (1963) intervals or nonparametric bootstrap intervals, the melded confidence intervals retain nominal coverage in all cases studied including discrete cases, non‐shift cases, and small sample cases.

Here is an outline of the article. First, we define the procedure in Section 2. Then, we motivate the melded CIs for a simple example in Section 3, giving intuition for why it appears to retain the type I error rate. Section 4 gives more general mathematical results. The heart of the article shows the applications, with connections to well‐known tests for simple cases and new tests and confidence intervals for less simple cases. In Sections 5–7, we discuss the melded CIs applied to the normal, binomial, and Poisson problems, respectively. In Section 8 we study a nonparametric melded CI for the difference in medians. In Section 9 we explore the application to the difference in survival distributions. In Section 10 we discuss the relationship with the confidence distribution approach, and we end with a short discussion.

2 The Melded Confidence Interval Procedure

Suppose we have two independent samples, where for the ith sample, $\mathbf{x}_i$ is the data vector, and $\mathbf{X}_i$ is the associated random variable whose distribution depends on a scalar parameter $\theta_i$ and possibly other nuisance parameters. Let the nested 100q% one‐sided lower and upper confidence limits for $\theta_i$ be $L_{\theta_i}(\mathbf{x}_i, q)$ and $U_{\theta_i}(\mathbf{x}_i, q)$, respectively. By nested we mean that if $q_1 < q_2$ then $L_{\theta_i}(\mathbf{x}_i, q_2) \le L_{\theta_i}(\mathbf{x}_i, q_1)$ and $U_{\theta_i}(\mathbf{x}_i, q_1) \le U_{\theta_i}(\mathbf{x}_i, q_2)$. We limit our application to functions of the parameters, written $g(\theta_1, \theta_2)$, that are, loosely speaking, decreasing in $\theta_1$ among all allowable values of $\theta_2$ and increasing in $\theta_2$ among all allowable values of $\theta_1$ (see Supplementary Material Section A for a precise statement of the monotonicity constraints).
We consider three examples for $g$ in this article: the difference, $g(\theta_1, \theta_2) = \theta_2 - \theta_1$; the ratio, $g(\theta_1, \theta_2) = \theta_2 / \theta_1$, which can be used if the parameter space for $\theta_i$ is $\theta_i \ge 0$ for $i = 1, 2$; and the odds ratio, $g(\theta_1, \theta_2) = \{\theta_2 (1 - \theta_1)\} / \{\theta_1 (1 - \theta_2)\}$, which can be used if the parameter space for $\theta_i$ is $0 \le \theta_i \le 1$ for $i = 1, 2$. Note this is a crucial restriction because the coverage properties of the method may not hold if the monotonicity constraints on g are violated (see Section 10). Let $\beta = g(\theta_1, \theta_2)$. Then the $100(1 - \alpha)\%$ lower and upper one‐sided melded confidence limits for $\beta$ are
$$L_{\beta}(\mathbf{x}, 1 - \alpha) = \text{the } \alpha\text{th quantile of } g\{U_{\theta_1}(\mathbf{x}_1, A),\, L_{\theta_2}(\mathbf{x}_2, B)\} \qquad (1)$$
and
$$U_{\beta}(\mathbf{x}, 1 - \alpha) = \text{the } (1 - \alpha)\text{th quantile of } g\{L_{\theta_1}(\mathbf{x}_1, A),\, U_{\theta_2}(\mathbf{x}_2, B)\} \qquad (2)$$
where $\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2)$ and, here and throughout the article, A and B are independent uniform(0, 1) random variables. The melded CIs can be calculated by Monte Carlo simulation or numeric integration. For the examples in this article, we used numeric integration. See Section 6 for a worked example.
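The Monte Carlo route can be sketched generically. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes equations 1 and 2 take the lower limit as the αth quantile of g applied to (upper limit for the first parameter, lower limit for the second) at independent uniform levels, and the upper limit as the (1 − α)th quantile with the roles reversed. The function names and the numbers in the example are hypothetical. The known-variance normal case of Section 5 is used as a check because its limits have a closed form.

```python
import random
from statistics import NormalDist

Z = NormalDist()


def melded_limits(g, lower1, upper1, lower2, upper2, alpha=0.05, m=50_000, seed=1):
    """Monte Carlo one-sided melded limits for beta = g(theta1, theta2).

    lower_i(q) and upper_i(q) return the one-sided 100q% limits for theta_i.
    Assumes g is decreasing in theta1 and increasing in theta2.
    """
    rng = random.Random(seed)
    eps = 1e-12  # keep the random levels strictly inside (0, 1)
    low_draws, up_draws = [], []
    for _ in range(m):
        a = min(max(rng.random(), eps), 1.0 - eps)
        b = min(max(rng.random(), eps), 1.0 - eps)
        low_draws.append(g(upper1(a), lower2(b)))
        up_draws.append(g(lower1(a), upper2(b)))
    low_draws.sort()
    up_draws.sort()
    k = int(alpha * m)
    return low_draws[k], up_draws[m - 1 - k]


def normal_limits(ybar, sigma, n):
    """One-sided limits for a normal mean with known standard deviation sigma."""
    se = sigma / n ** 0.5
    return (lambda q: ybar - se * Z.inv_cdf(q)), (lambda q: ybar + se * Z.inv_cdf(q))


# Hypothetical summary statistics, for illustration only.
lo1, up1 = normal_limits(ybar=1.0, sigma=2.0, n=16)
lo2, up2 = normal_limits(ybar=2.5, sigma=3.0, n=25)
low, up = melded_limits(lambda t1, t2: t2 - t1, lo1, up1, lo2, up2, alpha=0.05)

# Closed-form one-sided 95% limits for the difference in means.
se = (2.0 ** 2 / 16 + 3.0 ** 2 / 25) ** 0.5
closed_low = (2.5 - 1.0) - Z.inv_cdf(0.95) * se
closed_up = (2.5 - 1.0) + Z.inv_cdf(0.95) * se
print(low, closed_low)
print(up, closed_up)
```

The simulated and closed-form limits should agree up to Monte Carlo error, which shrinks as `m` grows.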
We can invert the confidence intervals to give p‐values associated with the corresponding series of hypothesis tests. For example, urn:x-wiley:15410420:media:biom12231:biom12231-math-0040 is the corresponding p‐value for testing the null hypothesis urn:x-wiley:15410420:media:biom12231:biom12231-math-0041. The p‐values have a simple form when urn:x-wiley:15410420:media:biom12231:biom12231-math-0042 implies urn:x-wiley:15410420:media:biom12231:biom12231-math-0043 (see Web Appendix  B):
urn:x-wiley:15410420:media:biom12231:biom12231-math-0044(3)
and, for testing urn:x-wiley:15410420:media:biom12231:biom12231-math-0045,
urn:x-wiley:15410420:media:biom12231:biom12231-math-0046(4)

3 Motivation

We motivate melded confidence intervals using the example of calculating the upper 64% one‐sided confidence limit for the difference in two proportions, urn:x-wiley:15410420:media:biom12231:biom12231-math-0047. We use the urn:x-wiley:15410420:media:biom12231:biom12231-math-0048 confidence interval because the graphs will be easier to interpret, but the ideas are analogous for more standard levels. Suppose we observe urn:x-wiley:15410420:media:biom12231:biom12231-math-0049 out of urn:x-wiley:15410420:media:biom12231:biom12231-math-0050 positive responses in group 1 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0051 out of urn:x-wiley:15410420:media:biom12231:biom12231-math-0052 in group 2. Then the difference in sample proportions is urn:x-wiley:15410420:media:biom12231:biom12231-math-0053. A very simple urn:x-wiley:15410420:media:biom12231:biom12231-math-0054 confidence interval has upper limit, urn:x-wiley:15410420:media:biom12231:biom12231-math-0055, where urn:x-wiley:15410420:media:biom12231:biom12231-math-0056 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0057 are one‐sided Clopper–Pearson limits. This CI is at least level 0.64, because by the nestedness property of the CIs for the urn:x-wiley:15410420:media:biom12231:biom12231-math-0058 we can write,
urn:x-wiley:15410420:media:biom12231:biom12231-math-0059
This CI for urn:x-wiley:15410420:media:biom12231:biom12231-math-0060 is illustrated in the upper quadrants of Figure 1. The level curve urn:x-wiley:15410420:media:biom12231:biom12231-math-0061 is represented by the dotted lines, and the gray areas represent the set, urn:x-wiley:15410420:media:biom12231:biom12231-math-0062 urn:x-wiley:15410420:media:biom12231:biom12231-math-0063. The left graph is represented in the urn:x-wiley:15410420:media:biom12231:biom12231-math-0064‐space with the corresponding lower limit levels (a) provided on the top, and the upper limit levels (b) provided on the right. The right graph is represented in the urn:x-wiley:15410420:media:biom12231:biom12231-math-0065‐space with the corresponding urn:x-wiley:15410420:media:biom12231:biom12231-math-0066 values displayed on the left and bottom. The area of the gray rectangle in the right graph is urn:x-wiley:15410420:media:biom12231:biom12231-math-0067 representing the nominal level.
Figure 1.
Plots of simple 64% upper one‐sided confidence limits for urn:x-wiley:15410420:media:biom12231:biom12231-math-0068 with sample proportions urn:x-wiley:15410420:media:biom12231:biom12231-math-0069 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0070. Top graphs depict urn:x-wiley:15410420:media:biom12231:biom12231-math-0071. The bottom graphs depict the CI constructed by combining two rectangles. The left graphs are plotted in the urn:x-wiley:15410420:media:biom12231:biom12231-math-0072 versus urn:x-wiley:15410420:media:biom12231:biom12231-math-0073 space with the associated levels for the lower limit levels (a) given on the top and the upper limit levels (b) given on the right. The right graphs are plotted on the a versus b space with the urn:x-wiley:15410420:media:biom12231:biom12231-math-0074 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0075 axes adjusted accordingly. The dotted lines represent the level curve urn:x-wiley:15410420:media:biom12231:biom12231-math-0076 (top) or 0.396 (bottom), the upper one‐sided confidence limit for urn:x-wiley:15410420:media:biom12231:biom12231-math-0077, and the points represent the sample proportions. The right gray areas are 0.64, and pictorially represent the nominal level.

To obtain a lower urn:x-wiley:15410420:media:biom12231:biom12231-math-0078 value, we can combine two gray rectangles as depicted in the lower graphs of Figure 1. Let urn:x-wiley:15410420:media:biom12231:biom12231-math-0079, where urn:x-wiley:15410420:media:biom12231:biom12231-math-0080 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0081. For urn:x-wiley:15410420:media:biom12231:biom12231-math-0082 in this form, the coverage is at least urn:x-wiley:15410420:media:biom12231:biom12231-math-0083 (i.e., the area of the gray regions in the right bottom graph). A formal statement of this is given in Theorem 1 (Section 4). For the lower graphs of Figure 1 we use urn:x-wiley:15410420:media:biom12231:biom12231-math-0084, urn:x-wiley:15410420:media:biom12231:biom12231-math-0085, urn:x-wiley:15410420:media:biom12231:biom12231-math-0086 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0087, so that urn:x-wiley:15410420:media:biom12231:biom12231-math-0088. For this confidence limit, urn:x-wiley:15410420:media:biom12231:biom12231-math-0089, which is smaller than the value of urn:x-wiley:15410420:media:biom12231:biom12231-math-0090 of the upper graphs. Notice the lighter rectangles in the bottom quadrants of Figure 1 do not touch the dotted line at the corner, so there is room for improvement. That is, if we extend the lighter rectangle to the left, we can reduce the height of the darker rectangle; we can then shift the level curve urn:x-wiley:15410420:media:biom12231:biom12231-math-0091 to the “southeast” (i.e., urn:x-wiley:15410420:media:biom12231:biom12231-math-0092, where urn:x-wiley:15410420:media:biom12231:biom12231-math-0093), producing a narrower confidence interval.

We can continue adding more rectangles, but in a smarter way such that the corners of the rectangles touch the dotted line at the urn:x-wiley:15410420:media:biom12231:biom12231-math-0094 value. For example, suppose we posit a value for urn:x-wiley:15410420:media:biom12231:biom12231-math-0095 and values for urn:x-wiley:15410420:media:biom12231:biom12231-math-0096. Then as long as urn:x-wiley:15410420:media:biom12231:biom12231-math-0097 is not too small or the urn:x-wiley:15410420:media:biom12231:biom12231-math-0098 values are not too close to 0 or 1, we can solve for the urn:x-wiley:15410420:media:biom12231:biom12231-math-0099 values such that urn:x-wiley:15410420:media:biom12231:biom12231-math-0100. The nominal level, urn:x-wiley:15410420:media:biom12231:biom12231-math-0101, is the gray area in the right panels of Figure 2. Theorem 1 in Section 4 shows that urn:x-wiley:15410420:media:biom12231:biom12231-math-0102, and that the CIs achieve at least that nominal level of coverage. In Figure 2 we do this by positing urn:x-wiley:15410420:media:biom12231:biom12231-math-0103 values of urn:x-wiley:15410420:media:biom12231:biom12231-math-0104. For the top graphs we use 9 rectangles and get urn:x-wiley:15410420:media:biom12231:biom12231-math-0105, which is less than our target of urn:x-wiley:15410420:media:biom12231:biom12231-math-0106. But if we increase the number of rectangles to 98 (bottom graphs), we get urn:x-wiley:15410420:media:biom12231:biom12231-math-0107.

Figure 2.
Plots of 64% upper one‐sided confidence limits for urn:x-wiley:15410420:media:biom12231:biom12231-math-0108 with sample proportions urn:x-wiley:15410420:media:biom12231:biom12231-math-0109 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0110. The top graphs use 9 rectangles (urn:x-wiley:15410420:media:biom12231:biom12231-math-0111), while the bottom graphs use 98 rectangles (urn:x-wiley:15410420:media:biom12231:biom12231-math-0112). The associated b values are chosen so that urn:x-wiley:15410420:media:biom12231:biom12231-math-0113 equals 0.30. The right gray areas represent the nominal level and are urn:x-wiley:15410420:media:biom12231:biom12231-math-0114 (top) and urn:x-wiley:15410420:media:biom12231:biom12231-math-0115 (bottom). As with Figure 1, the left graphs are plotted in the urn:x-wiley:15410420:media:biom12231:biom12231-math-0116 versus urn:x-wiley:15410420:media:biom12231:biom12231-math-0117 space with the associated levels for the lower limit levels (a) given on the top and the upper limit levels (b) given on the right. The right graphs are plotted on the a versus b space with the urn:x-wiley:15410420:media:biom12231:biom12231-math-0118 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0119 axes adjusted accordingly. The dotted lines represent the upper one‐sided confidence limit for urn:x-wiley:15410420:media:biom12231:biom12231-math-0120 and the points represent the sample proportions.

The panel in the lower right of Figure 2 shows that there is now little room for improvement, since there is not much white space below and to the right of the dotted line, and urn:x-wiley:15410420:media:biom12231:biom12231-math-0121 is close to the nominal level of urn:x-wiley:15410420:media:biom12231:biom12231-math-0122. The melded CIs are equivalent to finding the dotted line, and its corresponding urn:x-wiley:15410420:media:biom12231:biom12231-math-0123, such that the area under the dotted curve on the a versus b plot is exactly urn:x-wiley:15410420:media:biom12231:biom12231-math-0124. For this example, this value is urn:x-wiley:15410420:media:biom12231:biom12231-math-0125, much improved over the original 0.427. We next provide these statements in a more general way (i.e., allowing other functions besides the difference, and not just the binomial case), but there are essentially no new conceptual ideas needed for applying the method more generally.
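The gain from melding over the single-rectangle construction can be illustrated numerically. This is a sketch under assumptions, not the article's calculation: the counts are hypothetical (they are not the data behind Figures 1 and 2), the single-rectangle limit pairs two one-sided 80% Clopper–Pearson limits so that 0.8 × 0.8 = 0.64, and the melded limit is taken as the 0.64 quantile of the difference between a Beta(x2 + 1, n2 − x2) draw and a Beta(x1, n1 − x1 + 1) draw, which is how Clopper–Pearson limits are distributed when evaluated at a uniform random level.

```python
import random
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))


def cp_lower(x, n, q):
    """One-sided 100q% Clopper-Pearson lower limit (bisection on the binomial tail)."""
    if x == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - binom_cdf(x - 1, n, mid) < 1.0 - q:  # P(X >= x) increases with p
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cp_upper(x, n, q):
    """One-sided 100q% Clopper-Pearson upper limit (bisection on the binomial CDF)."""
    if x == n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if binom_cdf(x, n, mid) > 1.0 - q:  # P(X <= x) decreases with p
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical counts with 0 < x < n in both groups.
x1, n1, x2, n2 = 3, 10, 8, 12

# Single rectangle: two 80% limits give a limit with level at least 0.64.
simple = cp_upper(x2, n2, 0.8) - cp_lower(x1, n1, 0.8)

# Melded: 0.64 quantile of U2 - L1 over independent uniform levels.
rng = random.Random(1)
m = 100_000
draws = sorted(
    rng.betavariate(x2 + 1, n2 - x2) - rng.betavariate(x1, n1 - x1 + 1)
    for _ in range(m)
)
melded = draws[int(0.64 * m)]
print(simple, melded)
```

Because the single rectangle lies inside the region bounded by its own level curve, the melded limit cannot exceed the single-rectangle limit apart from simulation noise.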

4 Some General Theorems

We now gather the motivating ideas into two general theorems and propose another about power. The theorems are for the one‐sided upper interval; the one‐sided lower is analogous and is not presented.

Theorem 1. Define urn:x-wiley:15410420:media:biom12231:biom12231-math-0126 with monotonicity constraints as in Section 2. Let urn:x-wiley:15410420:media:biom12231:biom12231-math-0127, and urn:x-wiley:15410420:media:biom12231:biom12231-math-0128, and urn:x-wiley:15410420:media:biom12231:biom12231-math-0129, and

urn:x-wiley:15410420:media:biom12231:biom12231-math-0130
Then
urn:x-wiley:15410420:media:biom12231:biom12231-math-0131(5)

The theorem is proven in Web Appendix C.

We relate this theorem to the melded CIs by the following.

Theorem 2. For each data vector, urn:x-wiley:15410420:media:biom12231:biom12231-math-0132, the value urn:x-wiley:15410420:media:biom12231:biom12231-math-0133 of equation 2 gives the infimum value of urn:x-wiley:15410420:media:biom12231:biom12231-math-0134 such that urn:x-wiley:15410420:media:biom12231:biom12231-math-0135 over all possible vectors urn:x-wiley:15410420:media:biom12231:biom12231-math-0136 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0137 as defined in Theorem 1.

The theorem is proven in Web Appendix D.

Theorems 1 and 2 suggest that the melded CIs guarantee coverage when each of the single sample CIs guarantee coverage. That conjecture has not been rigorously proven. Although Theorem 1 holds for any fixed urn:x-wiley:15410420:media:biom12231:biom12231-math-0138 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0139, the values of urn:x-wiley:15410420:media:biom12231:biom12231-math-0140 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0141 that give the infimum value of urn:x-wiley:15410420:media:biom12231:biom12231-math-0142 in Theorem 2 depend on urn:x-wiley:15410420:media:biom12231:biom12231-math-0143. So to rigorously show guaranteed coverage, we need to show an inequality analogous to expression 5, except allowing urn:x-wiley:15410420:media:biom12231:biom12231-math-0144 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0145 to depend on urn:x-wiley:15410420:media:biom12231:biom12231-math-0146. Despite this lack of rigor, in every example studied in the article, the evidence fully supports the conjecture.

For any confidence interval or series of hypothesis tests, we want not just guaranteed coverage and controlled type I error rates, but tight CIs and powerful tests. To show that the melded CIs are a good strategy in this respect, we turn to the case when each of the individual CIs that are melded together is asymptotically equivalent to a standard normal theory confidence interval.

Theorem 3. Let urn:x-wiley:15410420:media:biom12231:biom12231-math-0147 be asymptotically normal, that is, urn:x-wiley:15410420:media:biom12231:biom12231-math-0148, and assume that urn:x-wiley:15410420:media:biom12231:biom12231-math-0149, and urn:x-wiley:15410420:media:biom12231:biom12231-math-0150 converge almost surely to urn:x-wiley:15410420:media:biom12231:biom12231-math-0151, and urn:x-wiley:15410420:media:biom12231:biom12231-math-0152, respectively. Suppose that

urn:x-wiley:15410420:media:biom12231:biom12231-math-0153
urn:x-wiley:15410420:media:biom12231:biom12231-math-0154
If g has continuous partial derivatives, then the melded CIs using urn:x-wiley:15410420:media:biom12231:biom12231-math-0155 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0156 have asymptotically accurate coverage probabilities and are asymptotically equivalent to applying the delta method on the function urn:x-wiley:15410420:media:biom12231:biom12231-math-0157; that is, treating urn:x-wiley:15410420:media:biom12231:biom12231-math-0158 as asymptotically normal with mean urn:x-wiley:15410420:media:biom12231:biom12231-math-0159 and variance
urn:x-wiley:15410420:media:biom12231:biom12231-math-0160

We can apply Theorem 3 to situations for which the one‐sample intervals are asymptotically normal, such as the binomial case of Section 6. For a proof of the theorem and how it applies to the binomial case, see Web Appendix E.

5 Normal Case

Let urn:x-wiley:15410420:media:biom12231:biom12231-math-0161 be independently distributed urn:x-wiley:15410420:media:biom12231:biom12231-math-0162 for urn:x-wiley:15410420:media:biom12231:biom12231-math-0163. Let urn:x-wiley:15410420:media:biom12231:biom12231-math-0164 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0165 be the usual sample mean and unbiased variance estimate from the ith group. Consider first the case with known variances. Then urn:x-wiley:15410420:media:biom12231:biom12231-math-0166. Thus, urn:x-wiley:15410420:media:biom12231:biom12231-math-0167 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0168 as well. When urn:x-wiley:15410420:media:biom12231:biom12231-math-0169 then urn:x-wiley:15410420:media:biom12231:biom12231-math-0170 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0171, which are equivalent to the CIs that match the uniformly most powerful (UMP) one‐sided tests (see e.g., Lehmann and Romano, 2005, p. 90).

Now suppose the urn:x-wiley:15410420:media:biom12231:biom12231-math-0172 are unknown and not assumed equal. Then the usual one‐sample t‐based confidence intervals at levels a and b give,
urn:x-wiley:15410420:media:biom12231:biom12231-math-0173(6)
where urn:x-wiley:15410420:media:biom12231:biom12231-math-0174 is the qth quantile of a central t‐distribution with d degrees of freedom. By the probability integral transform and with A and B uniform, urn:x-wiley:15410420:media:biom12231:biom12231-math-0175 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0176 are t random variables. Then the urn:x-wiley:15410420:media:biom12231:biom12231-math-0177 melded CI for urn:x-wiley:15410420:media:biom12231:biom12231-math-0178 is the ath and urn:x-wiley:15410420:media:biom12231:biom12231-math-0179th quantiles of urn:x-wiley:15410420:media:biom12231:biom12231-math-0180. This is equivalent to the Behrens–Fisher interval (Fisher, 1935). Thus, a test based on that interval would give significance whenever the Behrens–Fisher solution declared the two means significantly different. The coverage of the Behrens–Fisher interval is generally not exactly equal to its nominal value, but is thought to be conservative. Robinson (1976) conjectured and supported through extensive calculations that the test based on the Behrens–Fisher solution retains the type I error rate. As far as we are aware (see also Lehmann and Romano, 2005, p. 415), the first proof of this retention of the type I error rate was Balch (2012), which used Dempster–Shafer evidence theory (see e.g., Yager and Liu, 2008) and his newly developed confidence structures. (Note: the rigor of Balch's proof may be similar to the relationship of Theorems 1 and 2 to our conjecture, because the A in Balch's Confidence‐Mapping Lemma would typically depend on the data, and this is not explicitly accounted for in Balch's proof.) Theorems 1 and 2 of this article provide additional support for this claim.
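The unknown-variance construction lends itself to a short simulation. The sketch below is an illustration, not the authors' code: it assumes the one-sample limits are the usual sample mean plus or minus a t quantile times the standard error, so that a limit evaluated at a uniform random level is the sample mean plus a scaled t random variable. It also assumes a two-sided 95% interval is formed from the 2.5% and 97.5% quantiles. The summary statistics and function names are hypothetical.

```python
import random


def t_draw(rng, df):
    """One draw from a central t distribution with df degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) = Gamma(shape df/2, scale 2)
    return z / (chi2 / df) ** 0.5


def melded_t_interval(ybar1, s1, n1, ybar2, s2, n2, alpha=0.05, m=100_000, seed=1):
    """Monte Carlo melded interval for mu2 - mu1 with unknown, unequal variances."""
    rng = random.Random(seed)
    se1, se2 = s1 / n1 ** 0.5, s2 / n2 ** 0.5
    # The t distribution is symmetric, so the draws needed for the lower limit
    # and for the upper limit have the same distribution; one sample serves both.
    draws = sorted(
        (ybar2 + se2 * t_draw(rng, n2 - 1)) - (ybar1 + se1 * t_draw(rng, n1 - 1))
        for _ in range(m)
    )
    k = int(alpha / 2.0 * m)
    return draws[k], draws[m - 1 - k]


# Hypothetical summary statistics, for illustration only.
lo, hi = melded_t_interval(ybar1=10.0, s1=2.0, n1=8, ybar2=12.0, s2=3.0, n2=10)
se = (2.0 ** 2 / 8 + 3.0 ** 2 / 10) ** 0.5
print(lo, hi, (hi - lo) / 2.0, 1.96 * se)
```

With small samples the half-width exceeds the normal-theory value of 1.96 standard errors, reflecting the heavier tails of the two t components.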

6 Binomial Case

Suppose urn:x-wiley:15410420:media:biom12231:biom12231-math-0181. Then using the usual exact (i.e., guaranteed coverage for all values of urn:x-wiley:15410420:media:biom12231:biom12231-math-0182, but possibly conservative for many values of urn:x-wiley:15410420:media:biom12231:biom12231-math-0183) one‐sided intervals for a binomial (Clopper and Pearson, 1934), we have urn:x-wiley:15410420:media:biom12231:biom12231-math-0184 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0185, where A and B are uniform, and for notational convenience we extend the definition of the beta distribution to include point mass distributions at the limits, so urn:x-wiley:15410420:media:biom12231:biom12231-math-0186 is a point mass at 0 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0187 is a point mass at 1 for urn:x-wiley:15410420:media:biom12231:biom12231-math-0188. We can obtain new exact confidence intervals for the difference: urn:x-wiley:15410420:media:biom12231:biom12231-math-0189, the ratio: urn:x-wiley:15410420:media:biom12231:biom12231-math-0190, or the odds ratio: urn:x-wiley:15410420:media:biom12231:biom12231-math-0191 by choosing the appropriate g.

We illustrate the calculations for the data in Section  3 and the difference, urn:x-wiley:15410420:media:biom12231:biom12231-math-0192, but using a more conventional confidence limit of 95%. Recall that urn:x-wiley:15410420:media:biom12231:biom12231-math-0193 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0194. First, we run the Monte Carlo calculation, with urn:x-wiley:15410420:media:biom12231:biom12231-math-0195 replications. For the lower limit, we use the kth largest (urn:x-wiley:15410420:media:biom12231:biom12231-math-0196 out of m pseudo‐random samples of urn:x-wiley:15410420:media:biom12231:biom12231-math-0197, where urn:x-wiley:15410420:media:biom12231:biom12231-math-0198 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0199, giving urn:x-wiley:15410420:media:biom12231:biom12231-math-0200. Similarly, for the upper limit we use the urn:x-wiley:15410420:media:biom12231:biom12231-math-0201th (see Efron and Tibshirani, 1993, p. 160) largest out of m pseudo‐random samples of urn:x-wiley:15410420:media:biom12231:biom12231-math-0202, where urn:x-wiley:15410420:media:biom12231:biom12231-math-0203 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0204, giving urn:x-wiley:15410420:media:biom12231:biom12231-math-0205. The one‐sided p‐values are the proportion of the urn:x-wiley:15410420:media:biom12231:biom12231-math-0206 that are less than urn:x-wiley:15410420:media:biom12231:biom12231-math-0207, giving urn:x-wiley:15410420:media:biom12231:biom12231-math-0208, and the proportion of the urn:x-wiley:15410420:media:biom12231:biom12231-math-0209 that are greater than urn:x-wiley:15410420:media:biom12231:biom12231-math-0210, giving urn:x-wiley:15410420:media:biom12231:biom12231-math-0211. Alternatively, we could use the numeric integration calculation. Using the relationship between one‐sided p‐values and confidence limits,
urn:x-wiley:15410420:media:biom12231:biom12231-math-0212
where urn:x-wiley:15410420:media:biom12231:biom12231-math-0213 is the cumulative distribution of urn:x-wiley:15410420:media:biom12231:biom12231-math-0214, and urn:x-wiley:15410420:media:biom12231:biom12231-math-0215 is the density function of urn:x-wiley:15410420:media:biom12231:biom12231-math-0216. Then using a root solving function, we find the value of urn:x-wiley:15410420:media:biom12231:biom12231-math-0217 such that urn:x-wiley:15410420:media:biom12231:biom12231-math-0218, giving urn:x-wiley:15410420:media:biom12231:biom12231-math-0219. Analogously, we solve urn:x-wiley:15410420:media:biom12231:biom12231-math-0220 for urn:x-wiley:15410420:media:biom12231:biom12231-math-0221 using numeric integration to get urn:x-wiley:15410420:media:biom12231:biom12231-math-0222. Using numeric integration, the one‐sided p‐values for testing urn:x-wiley:15410420:media:biom12231:biom12231-math-0223 are urn:x-wiley:15410420:media:biom12231:biom12231-math-0224 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0225, giving a two‐sided p‐value of urn:x-wiley:15410420:media:biom12231:biom12231-math-0226.
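
The Monte Carlo calculation described above can be sketched in Python as follows. This is an illustrative sketch only, not the authors' R implementation. The function names, the seed, and the example counts are hypothetical (the Section 3 data are not repeated here), and the order-statistic index k = floor((m + 1)α/2) is one common convention that we assume for the percentile limits.

```python
import random

def draw_lower(x, n, rng):
    """One draw of the lower variable, Beta(x, n - x + 1); point mass at 0 if x == 0."""
    return 0.0 if x == 0 else rng.betavariate(x, n - x + 1)

def draw_upper(x, n, rng):
    """One draw of the upper variable, Beta(x + 1, n - x); point mass at 1 if x == n."""
    return 1.0 if x == n else rng.betavariate(x + 1, n - x)

def melded_diff_mc(x1, n1, x2, n2, conf_level=0.95, m=100_000, seed=1):
    """Monte Carlo melded CI and one-sided p-values for beta = theta2 - theta1."""
    rng = random.Random(seed)
    alpha = 1.0 - conf_level
    # Lower limit: pair the lower variable for theta2 with the upper one for theta1.
    low = sorted(draw_lower(x2, n2, rng) - draw_upper(x1, n1, rng) for _ in range(m))
    # Upper limit: pair the upper variable for theta2 with the lower one for theta1.
    high = sorted(draw_upper(x2, n2, rng) - draw_lower(x1, n1, rng) for _ in range(m))
    k = max(1, int((m + 1) * alpha / 2))   # assumed percentile convention
    lower, upper = low[k - 1], high[m - k]
    # One-sided p-values for a null difference of 0 (ties occur only at point masses).
    p_greater = sum(d <= 0 for d in low) / m    # alternative: theta2 > theta1
    p_less = sum(d >= 0 for d in high) / m      # alternative: theta2 < theta1
    return lower, upper, p_greater, p_less

# Hypothetical counts: 4 of 12 responses in group 1 and 8 of 10 in group 2.
print(melded_diff_mc(4, 12, 8, 10))
```

Because the limits are Monte Carlo quantiles, they vary slightly with the seed and with m; the numeric integration route avoids that variability at the cost of a root search.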

Note that the associated p‐values for testing the one‐sided equality of the urn:x-wiley:15410420:media:biom12231:biom12231-math-0227 (i.e., urn:x-wiley:15410420:media:biom12231:biom12231-math-0228 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0229 for the difference, or generally equations 3 or 4) are equivalent to the one‐sided p‐values using Fisher's exact test. This equivalence has been shown in the context of the Bayesian analysis of a urn:x-wiley:15410420:media:biom12231:biom12231-math-0230 table by Altham (1969). Because of this equivalence the type I error rate is bounded at the nominal level when testing one‐sided tests that urn:x-wiley:15410420:media:biom12231:biom12231-math-0231 or urn:x-wiley:15410420:media:biom12231:biom12231-math-0232.
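
The equivalence with Fisher's exact test can be checked numerically. The sketch below is our own illustration, with hypothetical function names and a simple midpoint rule standing in for whatever integrator one prefers. It computes the melded one-sided p-value for the alternative that the second proportion is larger, namely the probability that a Beta(x2, n2 - x2 + 1) variable is no larger than an independent Beta(x1 + 1, n1 - x1) variable, and compares it with the hypergeometric tail used by Fisher's exact test.

```python
from math import comb

def binom_tail(n, k, p):
    """P[Bin(n, p) >= k]; equals the Beta(k, n - k + 1) CDF at p when k >= 1."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def beta_pdf(t, a, b):
    """Density of Beta(a, b) at t for positive integers a and b."""
    n = a + b - 1
    return n * comb(n - 1, a - 1) * t**(a - 1) * (1 - t)**(b - 1)

def melded_p_greater(x1, n1, x2, n2, grid=5000):
    """Melded one-sided p-value for the alternative theta2 > theta1."""
    if x2 == 0 or x1 == n1:
        return 1.0   # a point mass at 0 (or at 1) makes the event certain
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        t = (i + 0.5) * h   # midpoint of the i-th subinterval
        total += binom_tail(n2, x2, t) * beta_pdf(t, x1 + 1, n1 - x1)
    return total * h

def fisher_p_greater(x1, n1, x2, n2):
    """One-sided Fisher p-value: P[X2 >= x2 | X1 + X2 = x1 + x2]."""
    s = x1 + x2
    num = sum(comb(n2, j) * comb(n1, s - j) for j in range(x2, min(n2, s) + 1))
    return num / comb(n1 + n2, s)

# Hypothetical table: 4 of 12 in group 1 and 8 of 10 in group 2.
print(melded_p_greater(4, 12, 8, 10), fisher_p_greater(4, 12, 8, 10))
```

The two printed values should agree up to the integration error of the midpoint rule.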

In order to test the coverage, we performed extensive numerical calculations. For any fixed urn:x-wiley:15410420:media:biom12231:biom12231-math-0233 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0234 we calculated all the possible melded upper 95% confidence limits. Then, using those upper limits, we calculated the coverage for all urn:x-wiley:15410420:media:biom12231:biom12231-math-0235 values of urn:x-wiley:15410420:media:biom12231:biom12231-math-0236 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0237 in urn:x-wiley:15410420:media:biom12231:biom12231-math-0238. We repeated this calculation for all urn:x-wiley:15410420:media:biom12231:biom12231-math-0239. We repeated these steps for the differences, ratios, and odds ratios. We found that the coverage was always at least 95%. Because of the symmetrical nature of the problem, this implies 95% coverage for the lower limits as well. Thus, it appears that these melded confidence intervals guarantee coverage.
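
A small version of this coverage calculation for the difference can be sketched as follows. It is a hedged illustration rather than the authors' code: the function names, the grid of parameter values, and the midpoint integration are our assumptions. Instead of solving for each upper limit, the sketch uses the relationship between limits and p-values: the true difference lies at or below the melded upper limit exactly when the probability that the melded variable falls strictly below the true difference is no more than the confidence level.

```python
from functools import lru_cache
from math import comb

def beta_cdf(t, a, b):
    """P[Beta(a, b) <= t] for positive integers a, b (binomial-tail identity)."""
    t = min(max(t, 0.0), 1.0)
    n = a + b - 1
    return sum(comb(n, j) * t**j * (1 - t)**(n - j) for j in range(a, n + 1))

def beta_pdf(t, a, b):
    """Density of Beta(a, b) at t for positive integers a and b."""
    n = a + b - 1
    return n * comb(n - 1, a - 1) * t**(a - 1) * (1 - t)**(b - 1)

@lru_cache(maxsize=None)
def prob_below(beta, x1, n1, x2, n2, grid=400):
    """P[B_U2 - B_L1 < beta], B_U2 ~ Beta(x2+1, n2-x2), B_L1 ~ Beta(x1, n1-x1+1)."""
    if x1 == 0:                  # B_L1 is a point mass at 0
        if x2 == n2:             # B_U2 is a point mass at 1, and beta <= 1
            return 0.0
        return beta_cdf(beta, x2 + 1, n2 - x2)
    if x2 == n2:                 # B_U2 = 1, so the event is B_L1 > 1 - beta
        return 1.0 - beta_cdf(1.0 - beta, x1, n1 - x1 + 1)
    h = 1.0 / grid               # midpoint rule over the density of B_L1
    return h * sum(beta_cdf(beta + (i + 0.5) * h, x2 + 1, n2 - x2)
                   * beta_pdf((i + 0.5) * h, x1, n1 - x1 + 1) for i in range(grid))

def upper_limit_coverage(theta1, theta2, n1, n2, conf_level=0.95):
    """Exact coverage of the one-sided melded upper limit for theta2 - theta1."""
    beta = round(theta2 - theta1, 12)
    cover = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            prob = (comb(n1, x1) * theta1**x1 * (1 - theta1)**(n1 - x1)
                    * comb(n2, x2) * theta2**x2 * (1 - theta2)**(n2 - x2))
            # Small tolerance guards against floating-point ties at the boundary.
            if prob_below(beta, x1, n1, x2, n2) <= conf_level + 1e-9:
                cover += prob
    return cover

def min_coverage(n1, n2, steps=20):
    """Smallest coverage over a (steps + 1) by (steps + 1) grid of (theta1, theta2)."""
    thetas = [i / steps for i in range(steps + 1)]
    return min(upper_limit_coverage(t1, t2, n1, n2) for t1 in thetas for t2 in thetas)

print(min_coverage(3, 3))   # hypothetical small case with three subjects per group
```

The sketch only covers one small pair of sample sizes and a coarse grid; the calculations reported in the text are far more extensive and also include ratios and odds ratios.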

This problem is the widely studied urn:x-wiley:15410420:media:biom12231:biom12231-math-0240 table with one margin (namely, the sample sizes) fixed. There is no consensus on the best inferential method for this situation. Some argue that conditioning is merited (Yates, 1984, see discussion), but others argue that the unconditional test is preferred because it is generally more powerful in this case (Lydersen, Fagerland, and Laake, 2009). An issue is that if you fix the significance level, then the discreteness of the conditional distribution will typically make the conditional inferences less powerful than the unconditional ones. Some argue for conditioning by noting that fixing the significance level is not needed or scientific (Upton, 1992), or that we condition on the closely related Poisson problems without controversy (Little, 1989). If we remove the discreteness problem by the impractical use of randomization, then a conditional test, the randomized version of Fisher's exact test, is the uniformly most powerful unbiased test (see Lehmann and Romano, 2005, p.127).

Since our melded confidence limits match the one‐sided Fisher's exact test as mentioned above, the melded confidence limits allow conditional‐like inferences for the difference and ratio, whereas previously they have only been available in practice for the odds ratio. Additionally, the melded CIs are much faster to calculate than the unconditional intervals because the unconditional intervals require searching over the space of the nuisance parameter (see, e.g., Chan and Zhang, 1999).

7 Poisson Case

Suppose urn:x-wiley:15410420:media:biom12231:biom12231-math-0241 for urn:x-wiley:15410420:media:biom12231:biom12231-math-0242, where the mean urn:x-wiley:15410420:media:biom12231:biom12231-math-0243 is the rate, urn:x-wiley:15410420:media:biom12231:biom12231-math-0244, times the time at risk, urn:x-wiley:15410420:media:biom12231:biom12231-math-0245. Suppose we are interested in testing urn:x-wiley:15410420:media:biom12231:biom12231-math-0246. As with the binomial case, the UMPU test is a randomized one, and practically, we use a non‐randomized version of it. In this practical test, we condition on urn:x-wiley:15410420:media:biom12231:biom12231-math-0247; then when urn:x-wiley:15410420:media:biom12231:biom12231-math-0248 we have urn:x-wiley:15410420:media:biom12231:biom12231-math-0249 (see, e.g., Lehmann and Romano, 2005). We reject when urn:x-wiley:15410420:media:biom12231:biom12231-math-0250 is large and the p‐value is
urn:x-wiley:15410420:media:biom12231:biom12231-math-0251
We show in Web Appendix  F that urn:x-wiley:15410420:media:biom12231:biom12231-math-0252 is equivalent to the melded p‐value based on the standard one‐sample exact Poisson intervals (Garwood, 1936).
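
As a rough numerical illustration of this equivalence (not a substitute for the proof in Web Appendix F), the sketch below compares the conditional binomial p-value with a Monte Carlo melded p-value. It assumes the standard representation of the Garwood limits as quantiles of Gamma(x, 1) for the lower limit and Gamma(x + 1, 1) for the upper limit, and it takes the alternative to be that group 2 has the larger rate. The function names, the direction of the alternative, and the example counts are our own assumptions.

```python
import random
from math import comb

def conditional_poisson_p(x1, t1, x2, t2):
    """P[X2 >= x2 | X1 + X2 = x1 + x2] when the two rates are equal."""
    s = x1 + x2
    pi = t2 / (t1 + t2)   # conditional success probability under equal rates
    return sum(comb(s, j) * pi**j * (1 - pi)**(s - j) for j in range(x2, s + 1))

def melded_poisson_p_mc(x1, t1, x2, t2, m=100_000, seed=1):
    """Monte Carlo estimate of P[G_L2 / t2 <= G_U1 / t1].

    G_L2 ~ Gamma(x2, 1) is the lower variable for the mean of group 2 and
    G_U1 ~ Gamma(x1 + 1, 1) is the upper variable for the mean of group 1.
    """
    if x2 == 0:
        return 1.0   # Gamma with shape 0 is treated as a point mass at 0
    rng = random.Random(seed)
    hits = sum(rng.gammavariate(x2, 1.0) / t2 <= rng.gammavariate(x1 + 1, 1.0) / t1
               for _ in range(m))
    return hits / m

# Hypothetical data: 3 events in 100 person-years versus 10 events in 120.
print(conditional_poisson_p(3, 100.0, 10, 120.0))
print(melded_poisson_p_mc(3, 100.0, 10, 120.0))
```

The two values should agree up to Monte Carlo error, which shrinks as m grows.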

The advantage of the melded intervals is that we may get intervals for the difference in the urn:x-wiley:15410420:media:biom12231:biom12231-math-0253. The difference may be more important for measuring public health implications of interventions, since it can be translated into how many lives are affected (see, e.g., Chan and Wang, 2009). For example, halving the relative risk from a baseline disease rate of 2% is very different from a public health perspective than if the baseline rate is 20%. Conversely, changing the risk by decreasing the rate of a disease by 1% affects a similar number of people regardless of whether the baseline risk is 2% or 20%. As with the binomial case, the unconditional exact method is much more difficult to calculate because one needs to search over the nuisance parameter space. There are approximate and quasi‐exact methods available (Chan and Wang, 2009), but no conditional exact method. Because of Theorems 1 and 2, we suspect that the melded CIs retain nominal coverage, and they are easy to calculate. Full exploration of that option and the comparison with the best competitor is left to future work.
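
To indicate how easily the difference can be handled, a Monte Carlo melded interval for the difference in rates might be sketched as below. This is an assumption-laden illustration: it again uses the Gamma representation of the one-sample exact Poisson limits, the percentile index k = floor((m + 1)α/2) is an assumed convention, and the function names and example data are hypothetical. It is not an evaluated procedure, since the text leaves its full exploration to future work.

```python
import random

def gamma_draw(shape, rng):
    """Gamma(shape, 1) draw; shape 0 is treated as a point mass at 0."""
    return 0.0 if shape == 0 else rng.gammavariate(shape, 1.0)

def melded_rate_diff_mc(x1, t1, x2, t2, conf_level=0.95, m=100_000, seed=1):
    """Monte Carlo melded CI for rate2 - rate1 from counts x and times at risk t."""
    rng = random.Random(seed)
    alpha = 1.0 - conf_level
    # Lower limit: lower variable for group 2 minus upper variable for group 1.
    low = sorted(gamma_draw(x2, rng) / t2 - gamma_draw(x1 + 1, rng) / t1
                 for _ in range(m))
    # Upper limit: upper variable for group 2 minus lower variable for group 1.
    high = sorted(gamma_draw(x2 + 1, rng) / t2 - gamma_draw(x1, rng) / t1
                  for _ in range(m))
    k = max(1, int((m + 1) * alpha / 2))   # assumed percentile convention
    return low[k - 1], high[m - k]

# Hypothetical data: 3 events in 100 person-years versus 10 events in 120.
print(melded_rate_diff_mc(3, 100.0, 10, 120.0))
```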

8 Difference in Medians

Several methods have been proposed for CIs on the difference in medians from two samples for non‐censored responses. First, if the two distributions represent continuous responses and differ only by a location shift, the method of Hodges and Lehmann (1963) provides CIs on the difference in medians that guarantee coverage. However, the Hodges–Lehmann CIs can have far less than nominal coverage if either assumption does not hold, as will be shown. Second, the nonparametric bootstrap is valid asymptotically and does not require the shift assumption (Efron and Tibshirani, 1993). Other asymptotic methods, which require the continuity assumption and allow different types of censoring, will not be discussed further (Su and Wei, 1993; Kosorok, 1999).

Figure 3. Mixture of Normal distributions for median simulations. Sample 1 is black solid, sample 2 is gray dotted, vertical lines are medians.

We create melded CIs for this situation, using single sample CIs derived by inverting the sign test (see, e.g., Slud, Byar, and Green, 1984). In Web Appendix  G, we derive the lower (equation 15) and upper (equation 16) one‐sided confidence limit functions that guarantee coverage even for discrete distributions. These one‐sample CI functions may return either urn:x-wiley:15410420:media:biom12231:biom12231-math-0254 (for the lower limit) or urn:x-wiley:15410420:media:biom12231:biom12231-math-0255 (for the upper limit), and the melded CI may give urn:x-wiley:15410420:media:biom12231:biom12231-math-0256 if the sample size is too small. For example, in the continuous case with equal sample sizes in the two groups, we need at least 7 in each group to get finite 95% CIs. This is more restrictive than the Hodges–Lehmann procedure, which requires at least 4 in each group for that situation to get finite 95% CIs.
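
The following sketch shows one way such a melded interval could be computed exactly. It does not reproduce equations 15 and 16 of Web Appendix G, which are not shown here. Instead it assumes the textbook order-statistic limits from the sign test, under which the lower variable for a median is the Kth order statistic and the upper variable is the (K + 1)th, with K binomial with parameters n and 1/2 and with the 0th and (n + 1)th order statistics taken as minus and plus infinity. The function names and data are hypothetical.

```python
from math import comb, inf

def melded_lower_median_diff(sample1, sample2, alpha=0.025):
    """Lower limit for median2 - median1 with one-sided error alpha."""
    x1, x2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(x1), len(x2)
    upper1 = x1 + [inf]     # upper variable for median1: index K1, K1 ~ Bin(n1, 1/2)
    lower2 = [-inf] + x2    # lower variable for median2: index K2, K2 ~ Bin(n2, 1/2)
    atoms = []
    for k1 in range(n1 + 1):
        for k2 in range(n2 + 1):
            prob = comb(n1, k1) * comb(n2, k2) / 2 ** (n1 + n2)
            atoms.append((lower2[k2] - upper1[k1], prob))
    atoms.sort()
    total = 0.0
    for value, prob in atoms:   # alpha-quantile of the discrete melded variable
        total += prob
        if total >= alpha:
            return value
    return atoms[-1][0]

def melded_median_diff_ci(sample1, sample2, conf_level=0.95):
    """Two-sided melded CI for median2 - median1; the upper limit uses symmetry."""
    alpha = (1 - conf_level) / 2
    lower = melded_lower_median_diff(sample1, sample2, alpha)
    upper = -melded_lower_median_diff(sample2, sample1, alpha)
    return lower, upper

# Hypothetical samples with seven observations per group.
s1 = [1.2, 0.4, 2.2, 3.1, 0.9, 1.7, 2.8]
s2 = [2.0, 3.3, 1.5, 4.1, 2.9, 3.8, 5.0]
print(melded_median_diff_ci(s1, s2))
```

Under these assumed limits, the melded variable is infinite with probability 1 - (1 - 2^(-n))^2 when both groups have n observations, which first drops below 0.025 at n = 7, consistent with the sample size requirement stated above.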

We compare the melded CIs to the Hodges–Lehmann CIs and nonparametric percentile bootstrap CIs that use 2000 replications. We simulate five scenarios, with 10,000 data sets for each scenario with n = 20 or n = 100 in each sample. Let urn:x-wiley:15410420:media:biom12231:biom12231-math-0259 be the distribution for group i, urn:x-wiley:15410420:media:biom12231:biom12231-math-0260, and let urn:x-wiley:15410420:media:biom12231:biom12231-math-0261 denote a normal distribution with mean urn:x-wiley:15410420:media:biom12231:biom12231-math-0262 and variance urn:x-wiley:15410420:media:biom12231:biom12231-math-0263. The five scenarios are:
  • Normals:   null case, both groups standard normal;
  • Figure 3a:   shift case, urn:x-wiley:15410420:media:biom12231:biom12231-math-0265 (median=‐1.335) and urn:x-wiley:15410420:media:biom12231:biom12231-math-0266 (median=0.665);
  • Figure 3b:   asymmetric continuous case 1, urn:x-wiley:15410420:media:biom12231:biom12231-math-0267 (median=‐1.335) and urn:x-wiley:15410420:media:biom12231:biom12231-math-0268 (median=1.335);
  • Figure 3c:   asymmetric continuous case 2, urn:x-wiley:15410420:media:biom12231:biom12231-math-0269 (medianurn:x-wiley:15410420:media:biom12231:biom12231-math-0270) and urn:x-wiley:15410420:media:biom12231:biom12231-math-0271 (median=1.000);
  • Poissons:   discrete and asymmetric case, group 1 is Poisson with mean 2.6 (median=2), and group 2 is Poisson with mean 2.7 (median=3).

The simulation results are given in Table 1. When the continuity and shift assumptions are met (Normals and Figure 3a), the Hodges–Lehmann CIs have coverage near the nominal 95%, while in the other cases, where those assumptions are not met, the Hodges–Lehmann CIs have poor coverage that becomes worse as the sample size increases. In those latter cases, because the assumptions do not hold, the Hodges–Lehmann CIs on the shift are not measuring the difference in medians, and applying the method in those scenarios would lead to incorrect confidence intervals. The bootstrap does reasonably well in most situations, but does not appear to have proper coverage for Figure 3b (when n = 20 per group) and the Poissons case (even when n = 100). Generally, the melded CIs are wider than the Hodges–Lehmann and the bootstrap CIs, but this extra width ensures simulated coverage of at least the nominal level for all scenarios studied. Thus, if the priority is guaranteed coverage regardless of sample size or distributional assumptions, then the melded CIs are recommended.

Table 1. Simulated coverage for nominal 95% confidence intervals for difference in medians. The five scenarios are described in the text, but briefly: Normals is a null case with both groups standard normal, Figures 3a–c are mixtures of normals denoting a shift (Figure 3a) or asymmetric mixtures (Figures 3b and c), and Poissons are Poisson with means 2.6 and 2.7. Bolded values are significantly less than the nominal 95%

Description   n per    Percent coverage                Ratio of median CI lengths
              group    H–L     bootstrap   melded      melded/H–L   melded/bootstrap
Normals       20       95.1    96.7        99.2        1.51         1.27
              100      95.1    96.4        97.6        1.35         1.09
Figure 3a     20       94.9    96.4        99.0        2.44         1.20
              100      95.2    96.6        97.8        4.18         1.09
Figure 3b     20       47.4    92.4        97.5        2.31         1.20
              100      0.6     95.0        96.7        3.69         1.09
Figure 3c     20       74.7    95.7        98.8        1.42         1.26
              100      17.0    95.8        97.3        1.08         1.10
Poissons      20       87.5    90.3        100.0       2.50         2.00
              100      57.0    91.8        100.0       4.00         2.00

9 Inferences Between Survival Distributions at a Fixed Time for Right Censored Data

The logrank test or weighted logrank tests are popular for testing for differences in survival distributions because of their good power under proportional hazards models or accelerated failure time (AFT) models (see, e.g., Kalbfleisch and Prentice, 2002, Chapter 7). In some situations those models do not fit the data well. For example, an aggressive new treatment may lead to substantial mortality immediately after initiation, but can increase survival compared to the standard treatment if the patient survives the first few weeks of treatment (see, e.g., Figure 4c). In this case the AFT models do not fit, and a more useful and relevant test is a difference comparison in survival after a fixed amount of time, for example, 1 year after randomization to treatment. Another example is plotted in Figure 4a. Suppose 30% of individuals have a serious version of a disease and die within the first year, while the other 70% have a less serious version and survive longer. Suppose a new treatment prolongs the life of those with the serious version for a short period (less than a year) but does not change that of the others. The logrank test may show significance for the new treatment, but the new treatment is not really curing patients for the long term. A better test may be to test for significant differences at 1 year.

Figure 4. Survival distributions for simulations, control arm is dotted gray, treatment arm is solid black. The survival distributions are compared at time 1.0.

Klein et al. (2007) studied several two‐sample tests for comparing survival estimates at a fixed time. They concluded that the test based on the normal approximation on the complementary log–log (CLL) transformation of the Kaplan–Meier survival estimator, with the variance for each sample estimated by Greenwood's formula and the delta method (see equation 3 of that article), was generally the best at retaining the type I error rate. We call this the CLL test. However, it can fail to retain the type I error rate with small samples and/or heavy censoring. Further, the CLL test cannot be used if the Kaplan–Meier estimator for either one of the groups is equal to 0 or 1 at the fixed test time.
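
For concreteness, a CLL-type test statistic can be sketched as follows. This is our own minimal reading of the description above, not code from Klein et al. (2007): the function names are hypothetical, ties are handled in the simplest way, and the variance of the transformed estimate is taken from the delta method as the Greenwood sum divided by the squared log of the survival estimate.

```python
from math import log, sqrt
from statistics import NormalDist

def km_at(times, events, t0):
    """Kaplan-Meier estimate at t0 and the Greenwood sum of d / (Y * (Y - d))."""
    surv, gw = 1.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e and t <= t0}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        if deaths == at_risk:
            return 0.0, float("inf")   # estimate hits 0; Greenwood sum undefined
        surv *= 1.0 - deaths / at_risk
        gw += deaths / (at_risk * (at_risk - deaths))
    return surv, gw

def cll_test(times1, events1, times2, events2, t0):
    """CLL z statistic and two-sided p-value at t0; None when undefined."""
    s1, g1 = km_at(times1, events1, t0)
    s2, g2 = km_at(times2, events2, t0)
    if not (0.0 < s1 < 1.0 and 0.0 < s2 < 1.0):
        return None   # the transformation is undefined at 0 or 1
    # Larger survival gives a smaller log(-log(S)), so z < 0 favors group 1.
    z = (log(-log(s1)) - log(-log(s2))) / sqrt(g1 / log(s1) ** 2 + g2 / log(s2) ** 2)
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Hypothetical data: event indicator 1 means a failure, 0 means censored.
t1, e1 = [0.2, 0.5, 0.9, 1.4, 2.0, 2.5], [1, 1, 0, 1, 0, 0]
t2, e2 = [0.3, 1.1, 1.3, 1.8, 2.2, 2.6], [1, 0, 1, 0, 0, 0]
print(cll_test(t1, e1, t2, e2, t0=1.0))
```

Returning None when either estimate is 0 or 1 mirrors the limitation noted above that the CLL test cannot be used in those cases.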

Thus, if we are interested in comparing survival at a fixed time in a small sample case with heavy censoring, the CLL may not be a good test, especially if a conservative procedure is desired, such as in a regulatory setting. As an alternative, we can use the melded confidence intervals based on the beta product confidence procedure (BPCP) for each survival distribution (Fay, Brittain, and Proschan, 2013). The BPCP was designed to guarantee central coverage for survival at a fixed point, and bounds the type I error rate in situations where alternative CIs (including the bootstrap) fail to do so. Thus, the BPCP is a good choice when guaranteed coverage is important with small samples. Further, it has no requirement that the Kaplan–Meier estimators for each sample be between 0 and 1. Using the method of moments implementation of the BPCP, we can create the random variables associated with the BPCP limits using beta distributions. Because the BPCP reduces to the Clopper–Pearson intervals when there is no censoring, the melded confidence limits in this case reduce to the Fisher's exact test when testing the equality of the survival distributions (see Section 6).

For the simulations, we model the failure times of the two groups as mixture distributions. We consider 4 different pairs of mixture distributions, with survival distributions given by Figure 4, each with either moderate or heavy censoring. We let the number in each group (urn:x-wiley:15410420:media:biom12231:biom12231-math-0280) be 50 or 100. We simulate 10,000 data sets per condition. Details of the simulation are given in Web Appendix  H.

The results are given in Table 2. We divide up the missed coverage into low (test arm has lower survival than control arm) and high (test arm has higher survival). We use 95% confidence limits, so we expect 2.5% error for each side at the nominal level. Data sets with the CLL test undefined (due to the Kaplan–Meier estimate from either group being equal to 0 or 1 at time 1) were considered non‐rejections in the table. With heavy censoring for model b, there is 33.8% undefined. In most other cases with urn:x-wiley:15410420:media:biom12231:biom12231-math-0281 there is 2–4% undefined. This is a major practical disadvantage in a clinical trial, where the primary test should be specified in advance. For the CLL test, the error rate under the null hypotheses (models a and b) can be very inflated with heavy censoring, with upper error about 10% instead of the nominal 2.5%.

Table 2. Simulated Percent that Reject urn:x-wiley:15410420:media:biom12231:biom12231-math-0278 at the one‐sided 2.5% level. Survival models a and b are null models so simulated percent should be 2.5%, and bolded values for those models are significantly larger than 2.5% (at the two‐sided 0.05 level). Models described by Figure 4 and Web Appendix  H

n per                                                               CLL, %
group   Model   Censoring   Meld low   Meld high   CLL low   CLL high   undefined
50      a       Moderate    0.28       0.43        2.69      2.12       0.00
50      a       Heavy       0.01       0.00        2.07      9.97       3.34
50      b       Moderate    0.23       0.68        1.78      2.50       2.13
50      b       Heavy       0.00       0.19        0.00      9.64       34.92
50      c       Moderate    0.00       51.78       0.00      83.31      0.11
50      c       Heavy       0.00       0.68        0.00      58.56      6.71
50      d       Moderate    0.00       99.96       0.00      97.94      2.06
50      d       Heavy       0.00       41.07       0.00      92.15      4.67
100     a       Moderate    0.63       0.76        2.94      2.56       0.00
100     a       Heavy       0.01       0.04        3.01      7.04       0.51
100     b       Moderate    0.51       0.78        2.74      2.51       0.08
100     b       Heavy       0.00       0.29        0.05      5.20       19.81
100     c       Moderate    0.00       93.58       0.00      98.47      0.00
100     c       Heavy       0.00       6.85        0.00      86.08      3.19
100     d       Moderate    0.00       100.00      0.00      99.96      0.04
100     d       Heavy       0.00       84.47       0.00      97.97      0.87

The estimated type I error rate of the melded CI method is substantially smaller than the nominal type I error rate in all cases simulated. This leads to reduced power compared to the CLL test (see model c). This is because with heavy censoring there are very few observations at risk at time=1, so that in each arm the BPCP confidence limits can get very conservative, and conservativeness of the CI associated with each arm naturally propagates to conservativeness of the melded CIs. Conversely, the anti‐conservativeness of the CIs based on the asymptotic normal approximation of the complementary log–log transformation in this heavy censoring situation leads to the anti‐conservativeness of the CLL tests.

This is only a preliminary assessment of how the melded confidence interval performs. It appears to be a promising approach when a conservative test is required, although it clearly has low power for some parameters we considered. This may reflect the fact that inference at a fixed time point is fraught with difficulty when there is considerable censoring. Finally, note that because the beta product confidence procedure on the survival distribution may be inverted to get one‐sample confidence intervals on the median with right censoring (see Fay et al., 2013, Section 6.1), we can use this procedure to get melded CIs for the difference in medians with right censoring. This once again illustrates the flexibility of the melded confidence interval approach.

10 Connections to the Confidence Distribution Method

The confidence distribution (CD) is a frequentist distributional estimator of a parameter. For example, consider the case where we have an exact one‐sided CI for continuous data, where the coverage associated with the one‐sided upper confidence limit, urn:x-wiley:15410420:media:biom12231:biom12231-math-0282, is q for all urn:x-wiley:15410420:media:biom12231:biom12231-math-0283. In this case urn:x-wiley:15410420:media:biom12231:biom12231-math-0284 is a CD random variable, and its cumulative distribution function at t is urn:x-wiley:15410420:media:biom12231:biom12231-math-0285. We call urn:x-wiley:15410420:media:biom12231:biom12231-math-0286 the CD. It can be used similarly to other distribution estimators like the bootstrap or the posterior distribution. For example, the middle urn:x-wiley:15410420:media:biom12231:biom12231-math-0287 of the distribution is a urn:x-wiley:15410420:media:biom12231:biom12231-math-0288 central confidence interval for any q. This application is circular, since we derive the CD from the CI process, then use the CD to get back the CIs. The modern definition of a CD avoids the circular reasoning by defining a CD as a function having two properties: (i) for each urn:x-wiley:15410420:media:biom12231:biom12231-math-0289, urn:x-wiley:15410420:media:biom12231:biom12231-math-0290 is a cumulative distribution for urn:x-wiley:15410420:media:biom12231:biom12231-math-0291, and (ii) at the true value urn:x-wiley:15410420:media:biom12231:biom12231-math-0292, urn:x-wiley:15410420:media:biom12231:biom12231-math-0293 is uniform. Importantly, the CD has other uses besides estimating CIs, like estimating the parameter itself or combining information on a parameter from independent samples (see Xie and Singh, 2013; Yang et al., 2014). 
The latter application is similar to what the melded CI for the continuous case does: it takes the CD random variable for urn:x-wiley:15410420:media:biom12231:biom12231-math-0294 and melds it with the CD random variable for urn:x-wiley:15410420:media:biom12231:biom12231-math-0295 to create a CD‐like random variable for urn:x-wiley:15410420:media:biom12231:biom12231-math-0296 that is used to create a CI for urn:x-wiley:15410420:media:biom12231:biom12231-math-0297.
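
A simple continuous example may help fix ideas. The sketch below assumes a normal mean with known standard deviation, which is our simplifying assumption and not a case worked in this article. In that setting the upper limit with coverage q is the sample mean plus the q quantile of a standard normal times sigma over root n, so the CD is a normal distribution centered at the sample mean, and melding two such CDs for a difference gives another normal distribution. The function names and numbers are hypothetical.

```python
from statistics import NormalDist

def normal_mean_cd(xbar, sigma, n):
    """CD for a normal mean with known sigma: N(xbar, sigma^2 / n)."""
    return NormalDist(mu=xbar, sigma=sigma / n ** 0.5)

def central_interval(cd, q):
    """Middle 100q% of a confidence distribution, used as a central CI."""
    tail = (1.0 - q) / 2.0
    return cd.inv_cdf(tail), cd.inv_cdf(1.0 - tail)

def melded_difference_cd(cd1, cd2):
    """CD-like distribution for theta2 - theta1 from two independent normal CDs."""
    return NormalDist(mu=cd2.mean - cd1.mean,
                      sigma=(cd1.variance + cd2.variance) ** 0.5)

# Hypothetical summary data: two groups of 16 with known sigma = 2.
cd1 = normal_mean_cd(10.0, 2.0, 16)
cd2 = normal_mean_cd(12.0, 2.0, 16)
print(central_interval(cd1, 0.95))
print(central_interval(melded_difference_cd(cd1, cd2), 0.95))
```

In this known-variance setting the melded interval coincides with the usual two-sample z interval for the difference in means.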

For the continuous case with exact CIs, we could have equivalently defined the CD random variable as urn:x-wiley:15410420:media:biom12231:biom12231-math-0298 with distribution urn:x-wiley:15410420:media:biom12231:biom12231-math-0299, since urn:x-wiley:15410420:media:biom12231:biom12231-math-0300 so that urn:x-wiley:15410420:media:biom12231:biom12231-math-0301. Unfortunately, applying the CD method to discrete data is not straightforward because for CIs with guaranteed coverage we generally have urn:x-wiley:15410420:media:biom12231:biom12231-math-0302 and there is not one clear distribution to define as the CD. Further, for discrete data urn:x-wiley:15410420:media:biom12231:biom12231-math-0303 cannot be a uniform distribution.

The melded CIs are one way to generalize the CD approach to handle two‐sample discrete small sample situations (for another approximate way see Hannig and Xie, 2012). We can generalize by defining the upper and lower CDs as urn:x-wiley:15410420:media:biom12231:biom12231-math-0304 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0305, respectively, representing the cumulative distributions of urn:x-wiley:15410420:media:biom12231:biom12231-math-0306 and urn:x-wiley:15410420:media:biom12231:biom12231-math-0307 evaluated at t. For each urn:x-wiley:15410420:media:biom12231:biom12231-math-0308, the upper and lower CDs are each a cumulative distribution, and at the true value of urn:x-wiley:15410420:media:biom12231:biom12231-math-0309, urn:x-wiley:15410420:media:biom12231:biom12231-math-0310, where A is a uniform random variable, and urn:x-wiley:15410420:media:biom12231:biom12231-math-0311 implies urn:x-wiley:15410420:media:biom12231:biom12231-math-0312 for all t. When we create the upper melded confidence limit for urn:x-wiley:15410420:media:biom12231:biom12231-math-0313, we use the lower CD for urn:x-wiley:15410420:media:biom12231:biom12231-math-0314 and the upper CD for urn:x-wiley:15410420:media:biom12231:biom12231-math-0315 in urn:x-wiley:15410420:media:biom12231:biom12231-math-0316 to lead to more conservative coverage.

To ensure valid confidence intervals from CDs on functions of parameters in the continuous case, we require monotonicity (see Xie and Singh, 2013, p.14). In a similar way, for melded CIs we require urn:x-wiley:15410420:media:biom12231:biom12231-math-0317 to meet the monotonicity constraints (see Web Appendix A). If urn:x-wiley:15410420:media:biom12231:biom12231-math-0318 does not meet these monotonicity constraints, then the resulting interval may not guarantee coverage. For example, the ratio of two parameters is not monotonic in the denominator if the parameter in the numerator is non‐zero and the denominator crosses zero. For the ratio of normals, if one performs the melded confidence interval method despite the assumption violation on urn:x-wiley:15410420:media:biom12231:biom12231-math-0319, then an anonymous reviewer of this article has shown that the coverage can be either conservative or anti‐conservative (see also Xie and Singh, 2013, Example 6).

11 Discussion

We have proposed a simple confidence interval procedure for inferences in the two‐sample problem. Our melded CI can be interpreted as a generalization of the confidence distribution approach. We take frequentist distributional estimators of single‐sample parameters and combine them using a monotonic function to create two‐sample confidence intervals that appear to guarantee coverage.

Although we are unable to rigorously prove that the melded CI method controls error rates (see discussion after Theorem 2 in Section 4), several lines of additional argument suggest that it does. The first is the remarkable fact that it reproduces accepted tests and intervals in many settings examined (binomial, normal, Poisson, Behrens–Fisher problem). Second, extensive numerical calculations in the binomial case and further simulations in two other situations (difference in medians and difference in survival distributions) failed to find any situation where the melded CIs had less than nominal coverage.

In addition to reproducing some well‐accepted tests and confidence intervals, the melding method has yielded new intervals. For example, in the binomial case, it gives confidence intervals for the relative risk and risk difference that match the inferences from Fisher's exact test. Previously, such intervals were readily available only for the odds ratio. Thus, the new CI for the risk difference could be used as the primary analysis in the regulatory setting where risk difference is traditionally used, such as when a new antibiotic is being compared to an existing one in a non‐inferiority trial. The melded CI method is so general that it can easily generate a two‐sample procedure in any setting where there is a one‐sample procedure that guarantees coverage, as illustrated in the methods presented in Sections 8 and 9. We briefly mention two more possible applications. Consider a randomized clinical trial measuring the effect of treatment compared to placebo, and suppose there is an accepted confidence procedure for that treatment effect. If one wants to determine if there is a difference in treatment effects between two subgroups (say between men and women), then the melded CI approach can answer that question. Next consider two trials that measure vaccine efficacy for two different vaccines both designed to protect against the same disease. Each trial estimates vaccine efficacy by comparing the ratio of infection rates of the vaccinated to the unvaccinated. If the trials are done on similar populations and both use the same control vaccine, a comparison of the two vaccine efficacies can be done using melded CIs. Thus, the potential for developing useful new two‐sample tests and intervals from exact one‐sample procedures makes the melded confidence interval approach an appealing addition to the applied statistician's toolkit.

12 Supplementary Materials and Software

Web Appendices referenced in Sections 2, 4, 7, 8, 9, and 10 are available with this paper at the Biometrics website on Wiley Online Library. Additionally, the R scripts used in the simulations are available at the Biometrics website on Wiley Online Library. Two R packages, exact2x2 and bpcp, are available on CRAN (http://cran.r-project.org/) and have functions to calculate the binomial melded CIs (binomMeld.test in exact2x2), the difference in medians melded CIs (mdiffmedian.test in bpcp), and the difference in survival distribution melded CIs (bpcp2samp in bpcp).

Acknowledgements

We thank Dean Follmann and the anonymous reviewers of the article for helpful suggestions that improved the article. The numerical calculations and simulations were run using the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD (http://biowulf.nih.gov).
