
### Keywords:

• Box–Cox transformation;
• Mean squared error;
• Predicted values;
• Reference population

### Abstract

Consider the problem of making an adjusted comparison of the medians of two populations on an interval-type outcome variable. A common approach is to use a linear model that requires the residuals to be normally distributed. We describe here two methods based on a linear model after Box–Cox transformation of the outcome variable. The methods require a reference population, which could be either of the populations under study or their aggregate. Using simulation, we compare the new procedures with the comparison of normal means procedure and with other procedures proposed for this problem. We find that the procedure based on comparing predicted values obtained from the observed covariates of the reference population has higher power for testing and smaller mean squared error of estimation than the other methods, while maintaining reasonable control of the type I error rate. We illustrate the methods by analyzing the duration of the second stage of labor for women in two large observational studies (Collaborative Perinatal Project and Consortium on Safe Labor) separated by 50 years. We recommend the method based on comparison of the predicted values of the transformed outcomes, with careful attention to how close the resulting residual distribution is to normal.

### 1. Introduction

We study the problem of making an adjusted comparison of two populations on the basis of the values of a specific outcome variable. Although we will discuss analogous methods for other types of variables, our main focus is on interval-scaled quantitative variables (or variables that approximate an underlying interval-scaled quantity) that may follow a skewed distribution. There are many meaningful ways to compare the distributions of the outcome between the populations; we focus here on the median. We do this because we wish our results to be meaningful for a wide range of examples, including skewed data, for which other measures of location (such as the mean) are not as widely meaningful as the median. We would like our comparison to be based on the distribution of the outcome conditional on a set of other variables, or covariates. This removes any differences between the populations that merely reflect differences in the covariates rather than in the outcome of interest itself. The question is: what is a good way to obtain estimates and hypothesis tests of the adjusted median difference between the populations on the outcome?

Our motivation comes from a comparison of two populations on certain labor characteristics from labor and delivery records. The first population, the Collaborative Perinatal Project (CPP) described by Hardy (2003), consists of about 55,000 deliveries from 1959 to 1966. The second population, the Consortium on Safe Labor (CSL) described by Zhang et al. (2010), consists of 228,668 deliveries from 2002 to 2008. The characteristics of expecting mothers changed considerably in the 50 years between these studies, and we wish to compare certain characteristics of labor while accounting for some of the differences in the women, such as body mass index (BMI), maternal and gestational age, race, and the weight of the baby. One outcome variable of particular interest is the duration of the second stage of labor (from full dilation to delivery of the fetus). Its empirical distributions for unassisted and assisted deliveries, plotted in Fig. 1, appear highly skewed, suggesting that a comparison of medians could be more meaningful than a comparison of means. Although the mean may be preferred in specific applications, the median is generally regarded as the more appropriate measure of location for skewed data. Figure 1 also suggests that the second stage of labor tends to be longer in the CSL than in the CPP. It is not clear, however, whether this apparent change is entirely due to temporal changes in maternal and infant characteristics. Our research question is whether the longer median duration in the CSL would persist if the covariate pattern of the CPP were found in a contemporary study like the CSL.

Methods for adjusted comparisons of interval data typically use linear regression models (Draper and Smith, 1981), focus on the mean (Tsiatis et al., 2008), or take a nonparametric approach with a correspondingly more complicated formulation and interpretation (Akritas and Van Keilegom, 2001; Tsangari and Akritas, 2004). Moreover, as shown by Robins and Rotnitzky (1995), the curse of dimensionality makes nonparametric regression models difficult to fit with any large set of covariates. One simple linear model approach relies on an assumption of normal errors. With skewed data, however, it is questionable to compare adjusted medians using this standard linear regression approach. Our focus on the median sets our methods apart from previous adjustment procedures, such as those of Peters (1941) and Belson (1956), and from more recent extensions such as that of Graubard et al. (2005). Median regression is a common approach to inference about medians. However, the medians estimated from a median regression model are conditional on the covariates. If interest lies in the marginal median duration of the second stage of labor (not conditional on covariates), that quantity is neither equal to nor determined by the conditional medians.

The paper proceeds as follows. Our assumed model and notation are given in Section 'Notation and models'. We propose using a Box–Cox transformation (Box and Cox, 1964) so that estimates and tests based on the transformed outcome variable are appropriate. We chose the Box–Cox family because it is relatively simple yet flexible enough to handle realistic situations of high skewness. Several existing approaches, together with testing and estimation methods based on our proposed approach, are described in Section 'Methods'. Section 'Computing confidence intervals for the adjusted median difference' describes how to obtain confidence intervals for the estimates. Results of a simulation experiment comparing the properties of the testing and estimation procedures are given in Section 'Simulations'. The methods are illustrated in Section 'CPP compared to the CSL' with a comparison of second stage labor duration between the CPP and CSL. Finally, our recommendations are given in Section 'Discussion'.

### 2. Notation and models

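We are concerned with comparing a variable $Y$, of interval type, on a sample of subjects in each of two populations (labeled 0 and 1). Suppose also that we want our comparison to account for the influence of a set of $p$ covariates, $X = (X_1, \ldots, X_p)^\top$. A natural model for this situation is

$$Y_i = \alpha + \mu G_i + X_i^\top \beta + \epsilon_i, \tag{1}$$

where $i$ indexes the independent subjects observed ($i = 1, \ldots, n_0$ for population 0 and $i = n_0 + 1, \ldots, n_0 + n_1$ for population 1), $G_i$ equals 1 if subject $i$ belongs to population 1 and 0 otherwise, $\alpha$ is an intercept, and $\beta$ is a vector of $p$ parameters. The $\epsilon_i$ are independent with distribution denoted $H$. The model makes no assumption about $H$ other than that it has median zero. We are interested in estimating the adjusted median difference between the two populations, $\mu$, and in testing the null hypothesis $H_0\colon \mu = 0$.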

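Although model (1) is our primary concern, we also consider a slightly more general setting in which model (1) holds only after the $Y_i$ have been transformed by a Box–Cox transformation $T_\lambda$ (Box and Cox, 1964):

$$T_\lambda(Y_i) = \alpha + \mu G_i + X_i^\top \beta + \epsilon_i, \tag{2}$$

where, for $y > 0$,

$$T_\lambda(y) = \begin{cases} \dfrac{y^\lambda - 1}{\lambda}, & \lambda \neq 0, \\[4pt] \log y, & \lambda = 0, \end{cases}$$

and $\lambda$ is the transformation parameter. The $\epsilon_i$ are independent with distribution $H$. Model (2) allows us to consider a wider class of data, including misspecification of model (1). Model (2) will be used to generate data for the simulations in Section 'Simulations'.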
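It is straightforward to simulate data from model (2). The following minimal sketch uses Python with NumPy. The language and all function names (`box_cox`, `inv_box_cox`, and the hypothetical generator `simulate_model2`) are our own illustrative choices and are not the authors' SAS code. The generator assumes a single covariate with X ~ N(0,1) in group 0 and N(1,1) in group 1, plus standard normal errors, which is one of the configurations listed in the simulations. For $\lambda \neq 0$ the inverse exists only where $\lambda z + 1 > 0$, so the sketch raises an error for draws outside that range.

```python
import numpy as np


def box_cox(y, lam):
    """Box-Cox transform T_lambda(y); requires every y > 0."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox transform needs strictly positive y")
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam


def inv_box_cox(z, lam):
    """Inverse transform; for lam != 0 it exists only where lam * z + 1 > 0."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    base = lam * z + 1.0
    if np.any(base <= 0):
        raise ValueError("z lies outside the range of T_lambda")
    return base ** (1.0 / lam)


def simulate_model2(n0, n1, alpha, mu, beta, lam, rng):
    """Hypothetical generator for model (2): one covariate, X ~ N(0,1) in
    group 0 and N(1,1) in group 1, and standard normal errors."""
    x = np.concatenate([rng.normal(0.0, 1.0, n0), rng.normal(1.0, 1.0, n1)])
    g = np.concatenate([np.zeros(n0), np.ones(n1)])
    z = alpha + mu * g + beta * x + rng.normal(0.0, 1.0, n0 + n1)
    return inv_box_cox(z, lam), x, g
```

For example, `simulate_model2(100, 100, 10.0, 0.5, 1.0, 0.5, np.random.default_rng(1))` returns outcomes, covariates, and group labels for 100 subjects per group. The large intercept keeps the draws inside the admissible range of $T_\lambda$.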

### 3. Methods

#### 3.1. Comparison of normal means

The simplest approach is to assume that the $\epsilon_i$ in model (1) are normally distributed. Under this assumption it is natural to base the comparison of the populations on the least squares estimates $\hat\alpha$, $\hat\mu$, and $\hat\beta$ obtained from model (1). To compare the populations, test $H_0$ with the usual regression t-test and estimate the adjusted median difference by $\hat\mu$.

This method is expected to work well if the error distribution is truly normal. Of course, for most applications this is not exactly correct. If the errors are nonnormal, the regression t-test is no longer exact. Moreover, although the least squares estimates are still unbiased, if the error distribution is skewed the adjusted median will not be the same as the adjusted mean. Thus, the performance of this method is suspect for highly skewed distributions.
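The comparison of normal means amounts to an ordinary least squares fit of model (1). Below is a minimal sketch, assuming NumPy and SciPy are available. `normal_means_comparison` is a hypothetical name. `X` may hold one or several covariate columns, with categorical covariates already coded as dummies. The p-value uses the t distribution with $n - (p + 2)$ degrees of freedom.

```python
import numpy as np
from scipy import stats


def normal_means_comparison(y, g, X):
    """Least squares fit of model (1), y = alpha + mu*g + X beta + eps.

    Returns mu_hat (the adjusted difference estimate), its standard error,
    and the two-sided regression t-test p-value for H0: mu = 0.
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    D = np.column_stack([np.ones_like(y), g, X])  # intercept, group, covariates
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    df = len(y) - D.shape[1]
    sigma2 = resid @ resid / df  # unbiased residual variance
    cov = sigma2 * np.linalg.inv(D.T @ D)
    mu_hat, se = coef[1], np.sqrt(cov[1, 1])
    p_value = 2.0 * stats.t.sf(abs(mu_hat / se), df)
    return mu_hat, se, p_value
```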

#### 3.2. Comparison of residuals

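Ceyhan and Goad (2009) propose the following method. Fit the model without the group term,

$$Y_i = \alpha^* + X_i^\top \beta^* + e_i, \tag{3}$$

to obtain residuals $r_i = Y_i - \hat\alpha^* - X_i^\top \hat\beta^*$.

Then, use the Wilcoxon rank sum test (Wilcoxon, 1945) to compare the group residuals, $\{r_i : G_i = 0\}$ and $\{r_i : G_i = 1\}$. The adjusted median difference is then estimated by the difference in sample medians of those same residuals.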

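The problem with this method appears to be that $\beta^*$ in (3) is not, in general, the same as the true $\beta$ in (1): with the group term omitted, $\beta^*$ absorbs part of the group effect whenever the covariate distributions differ between the groups. Thus, this approach is not expected to be very accurate if the true $\mu$ is large.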
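Here is a sketch of the residual comparison under the same assumptions (NumPy/SciPy, hypothetical function name). SciPy's `mannwhitneyu` is used because the Mann–Whitney U test is equivalent to the Wilcoxon rank-sum test. The design matrix deliberately omits the group column, as in (3).

```python
import numpy as np
from scipy import stats


def residual_comparison(y, g, X):
    """Ceyhan-Goad style comparison: fit model (3) without the group term,
    then compare the group residuals with the Wilcoxon rank-sum test."""
    y = np.asarray(y, dtype=float)
    in1 = np.asarray(g).astype(bool)  # True for group 1
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    D = np.column_stack([np.ones_like(y), X])  # note: no group column
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    r = y - D @ coef
    # Mann-Whitney U is equivalent to the Wilcoxon rank-sum test
    test = stats.mannwhitneyu(r[in1], r[~in1], alternative="two-sided")
    estimate = np.median(r[in1]) - np.median(r[~in1])
    return estimate, test.pvalue
```

If the covariate is strongly associated with group membership (for instance, X equal to the group label plus small noise), the fitted $\hat\beta^*$ absorbs most of the group effect. The estimate then falls far below $\mu$ even in large samples, which illustrates the concern above.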

#### 3.3. Median regression

McGreevy et al. (2009) propose the following method. First, each covariate is averaged over the combined populations, yielding $\bar X$. Although McGreevy et al. consider only interval covariates, here any ordinal or categorical covariate is represented by dummy variables, and it is the dummy variables that are averaged. A median regression is then fit to each group, giving $\hat m_0 = \hat\alpha_0 + \bar X^\top \hat\beta_0$ and $\hat m_1 = \hat\alpha_1 + \bar X^\top \hat\beta_1$. Here $\hat m_0$ and $\hat m_1$ are the estimated medians of groups 0 and 1, respectively, and $\hat\alpha_0, \hat\beta_0$ and $\hat\alpha_1, \hat\beta_1$ are the least absolute deviation parameter estimates (we implemented this with PROC QUANTREG of SAS software (2010)). The adjusted median difference is estimated as $\hat m_1 - \hat m_0$, and a standard t-test can be used to test $H_0$.

This method can be criticized on two grounds. First, the average value of the covariates is used instead of actual covariate patterns. This does not seem appropriate. When covariates are ordinal or categorical, it does not even yield realistic levels at which to make the comparison; for example, averaging a 0/1 dummy gives a proportion that no individual subject has. Second, the median regressions are fit separately, allowing each population to have different covariate effects. With small or moderate sample sizes these effects could, by chance, be estimated as quite different, increasing the variance of the estimated adjusted median difference.
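The authors fit the median regressions with PROC QUANTREG. As a stand-in, the sketch below solves the least absolute deviation problem as a linear program with SciPy's `linprog` (HiGHS solver). The function names are hypothetical, and the sketch computes only the point estimate $\hat m_1 - \hat m_0$; the standard error needed for the t-test is not covered. Because the constraint matrix is dense, the approach suits only modest sample sizes.

```python
import numpy as np
from scipy.optimize import linprog


def lad_fit(y, D):
    """Least absolute deviation (median) regression of y on design D,
    solved as the LP: min sum(u+ + u-) s.t. D b + u+ - u- = y, u+, u- >= 0."""
    n, k = D.shape
    c = np.concatenate([np.zeros(k), np.ones(2 * n)])
    A_eq = np.hstack([D, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * k + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:k]


def median_regression_difference(y, g, X):
    """Point estimate m1_hat - m0_hat: separate median regressions per group,
    each evaluated at the covariate means of the combined sample."""
    y = np.asarray(y, dtype=float)
    in1 = np.asarray(g).astype(bool)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    x_bar = X.mean(axis=0)  # averaged over both populations (dummies included)
    medians = []
    for mask in (~in1, in1):
        D = np.column_stack([np.ones(mask.sum()), X[mask]])
        b = lad_fit(y[mask], D)
        medians.append(b[0] + x_bar @ b[1:])
    return medians[1] - medians[0]
```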

#### 3.4. Comparison of predicted values

We now assume that one of the groups can be considered a reference population. Alternatively, the combined groups could be considered the reference, for example when the groups themselves partition a population of interest. Under model (2), a reference covariate pattern (or population of patterns) is needed to define the adjusted median difference. Because the inverse transformation is nonlinear (unless λ = 1), a fixed shift $\mu$ on the transformed scale corresponds to a difference on the original scale that depends on the covariates. Thus, one must specify a reference population in order to obtain a reference set of covariate patterns. In what follows we assume the reference population is one of the groups. The empirical distribution of covariate patterns in that group is used to estimate the distribution of covariate patterns in the reference population. The same procedures can be followed when the combined groups are the reference, using the empirical distribution of the combined covariate patterns. For what follows, suppose group 0 is taken as the reference group.

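We suppose there is a Box–Cox transformation, $T_\lambda$, such that (2) holds,

$$T_\lambda(Y_i) = \alpha + \mu G_i + X_i^\top \beta + \epsilon_i,$$

where the $\epsilon_i$ are assumed Gaussian. We use maximum likelihood to estimate $\lambda$ and then fit the model to obtain $\hat\lambda$, $\hat\alpha$, $\hat\mu$, and $\hat\beta$. The standard t-test of $H_0$ is then used to test that the adjusted median difference is zero. To estimate this difference, we apply the reference covariate patterns to obtain, for each subject $i$ in the reference group ($G_i = 0$), the predicted values

$$\hat Y_{0i} = T_{\hat\lambda}^{-1}\bigl(\hat\alpha + X_i^\top \hat\beta\bigr), \qquad \hat Y_{1i} = T_{\hat\lambda}^{-1}\bigl(\hat\alpha + \hat\mu + X_i^\top \hat\beta\bigr).$$

The adjusted median difference is then estimated as the difference in sample medians of the above predicted values, $\operatorname{median}_i \hat Y_{1i} - \operatorname{median}_i \hat Y_{0i}$.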
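Below is a sketch of the comparison of predicted values, assuming NumPy and SciPy. Here $\lambda$ is estimated by maximizing the Box–Cox profile log-likelihood of the Gaussian linear model, $-\tfrac{n}{2}\log(\mathrm{RSS}_\lambda/n) + (\lambda - 1)\sum_i \log Y_i$, over the interval $[-2, 2]$. These bounds are our own assumption, since the search range is a tuning choice. Group 0 is the reference. The returned `mu` and `se_mu` feed the standard t-test of $H_0$, which, as in the text, treats $\hat\lambda$ as fixed. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def box_cox(y, lam):
    return np.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def inv_box_cox(z, lam):
    """Inverse of T_lambda; only meaningful where lam * z + 1 > 0."""
    return np.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)


def _ols(D, z):
    coef, *_ = np.linalg.lstsq(D, z, rcond=None)
    return coef, z - D @ coef


def neg_profile_loglik(lam, y, D):
    """Negative Box-Cox profile log-likelihood of model (2), constants dropped."""
    _, resid = _ols(D, box_cox(y, lam))
    n = len(y)
    return 0.5 * n * np.log(resid @ resid / n) - (lam - 1.0) * np.log(y).sum()


def predicted_values_method(y, g, X, lam_bounds=(-2.0, 2.0)):
    """Adjusted median difference by comparison of predicted values,
    with group 0 as the reference population."""
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    D = np.column_stack([np.ones_like(y), g, X])
    lam = minimize_scalar(neg_profile_loglik, bounds=lam_bounds,
                          args=(y, D), method="bounded").x
    coef, resid = _ols(D, box_cox(y, lam))
    df = len(y) - D.shape[1]
    cov = (resid @ resid / df) * np.linalg.inv(D.T @ D)
    alpha, mu, beta = coef[0], coef[1], coef[2:]
    eta0 = alpha + X[g == 0] @ beta  # reference covariate patterns
    y0_hat = inv_box_cox(eta0, lam)
    y1_hat = inv_box_cox(eta0 + mu, lam)
    return {
        "lambda": lam,
        "mu": mu,
        "se_mu": np.sqrt(cov[1, 1]),
        "estimate": np.median(y1_hat) - np.median(y0_hat),
    }
```

When data follow model (2) with $\lambda = 0$, $\hat\lambda$ should land near 0. The estimate should then be close to the difference between the back-transformed medians implied by the reference covariates.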

#### 3.5. Comparison of mixture distributions

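In this section, we derive another estimator based on the model assumed in the previous section: model (2) with Gaussian $\epsilon_i$. Instead of considering only the predicted value for each covariate pattern, we now consider the mixture of distributions formed by the fitted model over the covariate patterns of the reference population. As before, group 0 is the reference and the fitted parameters are $\hat\lambda$, $\hat\alpha$, $\hat\mu$, and $\hat\beta$:

$$\hat F_0 = \frac{1}{n_0} \sum_{i:\,G_i = 0} \mathcal{N}_{T_{\hat\lambda}}\bigl(\hat\alpha + X_i^\top \hat\beta,\ \hat\sigma^2\bigr), \qquad \hat F_1 = \frac{1}{n_0} \sum_{i:\,G_i = 0} \mathcal{N}_{T_{\hat\lambda}}\bigl(\hat\alpha + \hat\mu + X_i^\top \hat\beta,\ \hat\sigma^2\bigr),$$

where $n_0$ is the number of subjects in group 0 and $\hat\sigma^2$ is the estimated residual variance. The notation $\mathcal{N}_T(a, b)$ denotes the distribution of a normal variable with mean $a$ and variance $b$ after back-transformation, that is, the distribution of $T^{-1}(Z)$ for $Z \sim N(a, b)$. For each mixture distribution we then calculate the median. This does not require an explicit representation of the mixture distribution. Instead, the medians $\hat m_0$ and $\hat m_1$ are obtained by solving

$$\frac{1}{n_0} \sum_{i:\,G_i = 0} \Phi\!\left(\frac{T_{\hat\lambda}(\hat m_0) - \hat\alpha - X_i^\top \hat\beta}{\hat\sigma}\right) = 0.5, \qquad \frac{1}{n_0} \sum_{i:\,G_i = 0} \Phi\!\left(\frac{T_{\hat\lambda}(\hat m_1) - \hat\alpha - \hat\mu - X_i^\top \hat\beta}{\hat\sigma}\right) = 0.5,$$

where $\Phi$ is the standard normal distribution function. These equations can be solved by interval bisection after finding values that bracket the target value of 0.5, and the adjusted median difference is estimated as $\hat m_1 - \hat m_0$.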
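Since $T_{\hat\lambda}$ is increasing, we can solve each equation for $u = T_{\hat\lambda}(\hat m_g)$ on the transformed scale and then back-transform, which gives the same medians. Below is a minimal standard-library sketch of that bisection. The function names are hypothetical, and `etas0` is assumed to hold the fitted linear predictors $\hat\alpha + X_i^\top\hat\beta$ for the reference group.

```python
import math
from statistics import NormalDist


def inv_box_cox(u, lam):
    """Scalar inverse Box-Cox; assumes lam * u + 1 > 0 when lam != 0."""
    return math.exp(u) if lam == 0 else (lam * u + 1.0) ** (1.0 / lam)


def mixture_quantile_transformed(etas, sigma, q=0.5, tol=1e-10):
    """Solve mean_i Phi((u - eta_i) / sigma) = q for u by interval bisection."""
    cdf = NormalDist().cdf

    def mix_cdf(u):
        return sum(cdf((u - e) / sigma) for e in etas) / len(etas)

    lo, hi = min(etas) - sigma, max(etas) + sigma
    while mix_cdf(lo) > q:  # widen until the target q is bracketed
        lo -= 10.0 * sigma
    while mix_cdf(hi) < q:
        hi += 10.0 * sigma
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mix_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mixture_median_difference(etas0, mu, sigma, lam):
    """m1_hat - m0_hat, back-transforming the transformed-scale medians."""
    u0 = mixture_quantile_transformed(etas0, sigma)
    u1 = mixture_quantile_transformed([e + mu for e in etas0], sigma)
    return inv_box_cox(u1, lam) - inv_box_cox(u0, lam)
```

Passing a value other than 0.5 as `q` to `mixture_quantile_transformed` returns other quantiles of the fitted mixture.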

### 4. Computing confidence intervals for the adjusted median difference

Asymptotic confidence intervals for the adjusted median difference from the methods of Sections 'Comparison of normal means', 'Comparison of predicted values', and 'Comparison of mixture distributions' can be obtained from the estimated regression coefficient for group, $\hat\mu$, and its standard error (stderr). If the transformed variables are assumed normally distributed, the estimated regression coefficients are asymptotically normal. A 95% confidence interval for the coefficient can then be obtained from a normal approximation by adding $1.96 \times \text{stderr}$ to, and subtracting it from, the estimated coefficient. The endpoints of this interval can then be used to induce a confidence interval for the adjusted median difference via the inverse transformation and the reference population (for the comparison of normal means of Section 'Comparison of normal means' there is nothing further to do). Because $T_{\hat\lambda}^{-1}$ is increasing, a larger group coefficient yields a larger induced difference, so the coefficient endpoints map to the lower and upper limits in the same order. We demonstrate how this works with the comparison of predicted values method of Section 'Comparison of predicted values' (analogous changes are needed for the comparison of mixture distributions method of Section 'Comparison of mixture distributions'). The upper confidence limit for the comparison of predicted values method is the sample median of the modified predicted values

$$\hat Y_{1i}^{U} = T_{\hat\lambda}^{-1}\bigl(\hat\alpha + \hat\mu + 1.96 \times \text{stderr} + X_i^\top \hat\beta\bigr)$$

minus the sample median of the predicted values $\hat Y_{0i}$ of Section 'Comparison of predicted values', with $i$ ranging over the reference group.

The lower confidence limit for the comparison of predicted values method is obtained in the same way: the sample median of the modified predicted values

$$\hat Y_{1i}^{L} = T_{\hat\lambda}^{-1}\bigl(\hat\alpha + \hat\mu - 1.96 \times \text{stderr} + X_i^\top \hat\beta\bigr)$$

minus the sample median of the $\hat Y_{0i}$.
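Below is a sketch of the induced interval for the predicted values method. It assumes that the fitted $\hat\alpha$, $\hat\mu$, the standard error of $\hat\mu$, $\hat\beta$, $\hat\lambda$, and the reference covariates are already available, for example from a fit as described in Section 'Comparison of predicted values'. The function name and the default `z_crit = 1.96` are our own illustrative choices.

```python
import numpy as np


def inv_box_cox(z, lam):
    """Inverse of T_lambda; assumes lam * z + 1 > 0 when lam != 0."""
    z = np.asarray(z, dtype=float)
    return np.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)


def predicted_values_ci(alpha, mu, se, beta, X_ref, lam, z_crit=1.96):
    """(lower, estimate, upper) for the adjusted median difference, obtained by
    pushing mu_hat -/+ z_crit * se through the predicted-values construction."""
    X_ref = np.asarray(X_ref, dtype=float).reshape(len(X_ref), -1)
    eta0 = alpha + X_ref @ np.atleast_1d(np.asarray(beta, dtype=float))
    median0 = np.median(inv_box_cox(eta0, lam))  # Y0 predictions are shared

    def difference(m):
        return np.median(inv_box_cox(eta0 + m, lam)) - median0

    return (difference(mu - z_crit * se), difference(mu),
            difference(mu + z_crit * se))
```

With $\lambda = 1$ the inverse transformation is a pure shift, so the interval reduces to $\hat\mu \pm 1.96 \times \text{stderr}$. For other $\lambda$ the interval is generally asymmetric about the estimate.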

### 5. Simulations

For simulating data, we adopted model (2) with group 0 as the reference group. The primary simulation experiment considered a single covariate, X, with distributions F0 (group 0) and F1 (group 1). We considered several distributions for H within the families Normal, Gamma, and Uniform (each centered at 0 by a shift if necessary). For F0 and F1, we considered several distributions from the families Normal and Binomial. The Box–Cox transformation was indexed by the transformation parameter λ. All simulations reported used and 100 subjects per group.

Type I error results (target 5%) for the testing procedures described in Section 'Methods' are given in Table 1. Each of the distributions in Table 1 technically follows model (2), but the procedures of Sections 'Comparison of predicted values' and 'Comparison of mixture distributions', which use the Box–Cox transformation, search for λ only within a restricted range. Thus, the entry in the table with λ = 8 represents a case where the assumed model is substantially misspecified for all procedures. Table 1 shows that the comparison of normal means procedure of Section 'Comparison of normal means' and the methods based on the Box–Cox transformation (Sections 'Comparison of predicted values' and 'Comparison of mixture distributions') have type I errors fairly close to the target of 5%, although they can be conservative in some cases. The comparison of residuals method of Section 'Comparison of residuals' is very sensitive to skewed error distributions, resulting in inflated type I error rates (up to 84.8%). The median regression method of Section 'Median regression' is also sensitive to skewed error distributions, resulting in overly conservative type I error rates (as low as 0.1%). With a true level of 5% and 10,000 replications, the simulation standard error is 0.22%.
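The quoted simulation standard error is the binomial standard error $\sqrt{p(1-p)/R}$ with $p = 0.05$ and $R = 10{,}000$ replications. The short check below reproduces it; the function name is hypothetical.

```python
import math


def mc_standard_error(p, replications):
    """Binomial standard error of a rejection rate estimated from simulation runs."""
    return math.sqrt(p * (1.0 - p) / replications)


se = mc_standard_error(0.05, 10_000)
print(f"{100 * se:.2f}%")  # 0.22%
```

By the same arithmetic, a correctly sized test would show an empirical level within roughly 5 ± 1.96 × 0.22, or about 4.6% to 5.4%, in 95% of such simulation experiments.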

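Table 1. Type I errors (%) of the testing procedures with one covariate, based on 10,000 replications. Columns 3.1 to 3.5 identify the testing procedure by the section in which it is described.

| λ | F0 | F1 | H | 3.1 | 3.2 | 3.3 | 3.4 | 3.5 |
|---|---|---|---|---|---|---|---|---|
| 1 | N(0,1) | N(0,1) | N(0,1) | 4.8 | 4.9 | 4.9 | 4.9 | 4.9 |
| 1 | N(0,1) | N(1,1) | N(0,1) | 4.9 | 2.9 | 5.1 | 4.9 | 4.9 |
| 1 | N(0,1) | N(1,1) | Γ(1,1) | 4.8 | 4.0 | 4.6 | 4.6 | 4.6 |
| 1 | N(0,1) | N(1,1) | Γ(.1,1) | 4.6 | 84.8 | 0.1 | 4.5 | 4.5 |
| 1 | N(0,1) | N(1,1) | U(0,1) | 4.8 | 2.7 | 5.5 | 4.8 | 4.8 |
| 1 | B(1,.5) | B(1,.5) | N(0,1) | 4.6 | 4.7 | 4.2 | 4.7 | 4.7 |
| 1 | B(1,.1) | B(1,.5) | N(0,1) | 5.0 | 2.8 | 4.4 | 5.0 | 5.0 |
| 1 | B(1,.1) | B(1,.5) | Γ(1,1) | 4.6 | 4.6 | 4.1 | 5.8 | 5.8 |
| 1 | B(1,.1) | B(1,.5) | Γ(.1,1) | 4.7 | 71.2 | 0.1 | 6.3 | 6.3 |
| 1 | B(1,.1) | B(1,.5) | U(0,1) | 5.2 | 3.0 | 5.1 | 5.5 | 5.5 |
| 0 | N(0,1) | N(1,1) | N(0,1) | 4.9 | 3.2 | 5.2 | 5.0 | 5.0 |
| 0 | N(0,1) | N(1,1) | Γ(1,1) | 4.7 | 3.8 | 4.6 | 4.6 | 4.6 |
| 0 | N(0,1) | N(1,1) | Γ(.1,1) | 4.0 | 9.5 | 3.6 | 4.3 | 4.3 |
| 0 | B(1,.1) | B(1,.5) | Γ(1,1) | 5.5 | 4.3 | 5.8 | 5.8 | 5.8 |
| 2 | N(0,1) | N(1,1) | N(0,1) | 2.3 | 16.3 | 4.6 | 4.3 | 4.3 |
| 2 | N(0,1) | N(1,1) | Γ(1,1) | 3.1 | 37.2 | 4.3 | 4.7 | 4.7 |
| 2 | N(0,1) | N(1,1) | Γ(.1,1) | 1.8 | 6.9 | 3.3 | 5.0 | 5.0 |
| 2 | B(1,.1) | B(1,.5) | Γ(1,1) | 4.3 | 13.8 | 3.6 | 5.8 | 5.8 |
| −2 | N(0,1) | N(1,1) | N(0,1) | 4.0 | 10.0 | 5.0 | 4.0 | 4.0 |
| −2 | N(0,1) | N(1,1) | Γ(1,1) | 3.9 | 3.3 | 4.9 | 3.9 | 3.9 |
| −2 | N(0,1) | N(1,1) | Γ(.1,1) | 2.2 | 11.2 | 3.3 | 2.2 | 2.2 |
| −2 | B(1,.1) | B(1,.5) | Γ(1,1) | 6.3 | 7.9 | 7.8 | 6.4 | 6.4 |
| 8 | N(0,1) | N(1,1) | N(0,1) | 2.9 | 84.4 | 3.9 | 4.8 | 4.8 |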

Results of the power for the testing procedures described in Section 'Methods' are given in Table 2. For each case μ was chosen to yield approximately 80% power for the comparison of normal means method of Section 'Comparison of normal means'. The comparison of predicted values (Section 'Comparison of predicted values') and comparison of mixture distributions methods (Section 'Comparison of mixture distributions') are indistinguishable (in terms of power) and have the most consistently high power of the methods considered. Although the comparison of residuals (Section 'Comparison of residuals') and median regression (Section 'Median regression') procedures have higher power in certain cases, this appears to be largely a consequence of their inflated type I error rates.

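Table 2. Power (%) of the testing procedures with one covariate, based on 10,000 replications. Columns 3.1 to 3.5 identify the testing procedure by the section in which it is described.

| λ | μ | F0 | F1 | H | 3.1 | 3.2 | 3.3 | 3.4 | 3.5 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.38 | N(0,1) | N(0,1) | N(0,1) | 76.8 | 74.8 | 56.3 | 76.8 | 76.8 |
| 1 | 0.42 | N(0,1) | N(1,1) | N(0,1) | 75.6 | 65.2 | 55.6 | 75.4 | 75.4 |
| 1 | 0.48 | N(0,1) | N(1,1) | Γ(1,1) | 85.1 | 97.6 | 84.5 | 91.4 | 91.4 |
| 1 | 0.14 | N(0,1) | N(1,1) | Γ(.1,1) | 79.2 | 99.9 | 100.0 | 82.5 | 82.5 |
| 1 | 0.13 | N(0,1) | N(1,1) | U(0,1) | 80.2 | 67.8 | 38.8 | 80.4 | 80.4 |
| 1 | 0.42 | B(1,.5) | B(1,.5) | N(0,1) | 84.1 | 81.8 | 63.5 | 83.8 | 83.8 |
| 1 | 0.45 | B(1,.1) | B(1,.5) | N(0,1) | 80.8 | 71.3 | 60.4 | 81.0 | 81.0 |
| 1 | 0.42 | B(1,.1) | B(1,.5) | Γ(1,1) | 75.6 | 92.9 | 73.7 | 97.8 | 97.8 |
| 1 | 0.14 | B(1,.1) | B(1,.5) | Γ(.1,1) | 81.0 | 97.9 | 100.0 | 100.0 | 100.0 |
| 1 | 0.13 | B(1,.1) | B(1,.5) | U(0,1) | 81.2 | 69.7 | 39.1 | 83.0 | 83.0 |
| 0 | 0.46 | N(0,1) | N(1,1) | N(0,1) | 77.7 | 87.0 | 62.1 | 81.6 | 81.6 |
| 0 | 0.42 | N(0,1) | N(1,1) | Γ(1,1) | 85.7 | 92.2 | 77.2 | 85.0 | 85.0 |
| 0 | 0.15 | N(0,1) | N(1,1) | Γ(.1,1) | 79.1 | 100.0 | 100.0 | 86.6 | 86.6 |
| 0 | 0.31 | B(1,.1) | B(1,.5) | Γ(1,1) | 77.4 | 67.2 | 53.0 | 88.9 | 88.9 |
| 2 | 0.44 | N(0,1) | N(1,1) | N(0,1) | 76.0 | 76.2 | 57.6 | 79.0 | 79.0 |
| 2 | 0.59 | N(0,1) | N(1,1) | Γ(1,1) | 82.6 | 99.9 | 92.3 | 98.1 | 98.1 |
| 2 | 0.19 | N(0,1) | N(1,1) | Γ(.1,1) | 78.3 | 100.0 | 99.2 | 95.7 | 95.7 |
| 2 | 0.62 | B(1,.1) | B(1,.5) | Γ(1,1) | 76.7 | 99.5 | 95.7 | 99.9 | 99.9 |
| −2 | 0.94 | N(0,1) | N(1,1) | N(0,1) | 78.3 | 100.0 | 98.9 | 94.1 | 94.1 |
| −2 | 0.65 | N(0,1) | N(1,1) | Γ(1,1) | 83.5 | 100.0 | 98.5 | 96.8 | 96.8 |
| −2 | 0.44 | N(0,1) | N(1,1) | Γ(.1,1) | 75.8 | 100.0 | 100.0 | 98.1 | 98.1 |
| −2 | 0.21 | B(1,.1) | B(1,.5) | Γ(1,1) | 79.6 | 65.4 | 32.1 | 68.1 | 68.1 |
| 8 | 0.86 | N(0,1) | N(1,1) | N(0,1) | 77.5 | 100.0 | 85.9 | 99.9 | 99.9 |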

Mean squared error results for the estimation procedures described in Section 'Methods' are given in Table 3, expressed as a percentage of the mean squared error of the comparison of normal means procedure (Section 'Comparison of normal means'). The comparison of residuals and median regression methods (Sections 'Comparison of residuals' and 'Median regression') often have much higher error, although in certain cases their error is quite low. Based on this result, we do not consider the comparison of residuals or median regression methods further, as we do not regard them as well suited for general use. The largest differences between the Box–Cox methods (Sections 'Comparison of predicted values' and 'Comparison of mixture distributions') and the comparison of normal means method (Section 'Comparison of normal means') occur when the error distribution is skewed. In these cases the methods based on the Box–Cox transformation have a substantial advantage in accuracy, in contrast to the slight advantage of the comparison of normal means method in some other cases. The two Box–Cox methods have similar error in many cases, but in a number of cases the comparison of predicted values method (Section 'Comparison of predicted values') has lower estimation error.

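Table 3. Mean squared error of the estimation procedures with one covariate, expressed as a percentage (%) of that for the comparison of normal means procedure and based on 10,000 replications. Columns 3.1 to 3.5 identify the estimation procedure by the section in which it is described.

| λ | μ | F0 | F1 | H | 3.1 | 3.2 | 3.3 | 3.4 | 3.5 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.38 | N(0,1) | N(0,1) | N(0,1) | 100 | 156.9 | 157.1 | 103.0 | 103.1 |
| 1 | 0.42 | N(0,1) | N(1,1) | N(0,1) | 100 | 138.3 | 156.4 | 99.2 | 99.3 |
| 1 | 0.48 | N(0,1) | N(1,1) | Γ(1,1) | 100 | 111.1 | 99.4 | 77.9 | 77.6 |
| 1 | 0.14 | N(0,1) | N(1,1) | Γ(.1,1) | 100 | 50.0 | 0.1 | 87.2 | 87.9 |
| 1 | 0.13 | N(0,1) | N(1,1) | U(0,1) | 100 | 231.4 | 288.7 | 99.1 | 100.4 |
| 1 | 0.42 | B(1,.5) | B(1,.5) | N(0,1) | 100 | 154.1 | 155.4 | 103.9 | 103.1 |
| 1 | 0.45 | B(1,.1) | B(1,.5) | N(0,1) | 100 | 142.0 | 155.0 | 100.8 | 102.1 |
| 1 | 0.42 | B(1,.1) | B(1,.5) | Γ(1,1) | 100 | 106.2 | 98.6 | 63.1 | 81.6 |
| 1 | 0.14 | B(1,.1) | B(1,.5) | Γ(.1,1) | 100 | 247.8 | 0.1 | 15.6 | 25.6 |
| 1 | 0.13 | B(1,.1) | B(1,.5) | U(0,1) | 100 | 233.5 | 288.5 | 97.7 | 100.3 |
| 0 | 0.46 | N(0,1) | N(1,1) | N(0,1) | 100 | 75.0 | 107.8 | 86.1 | 87.0 |
| 0 | 0.42 | N(0,1) | N(1,1) | Γ(1,1) | 100 | 70.2 | 110.9 | 98.2 | 98.9 |
| 0 | 0.15 | N(0,1) | N(1,1) | Γ(.1,1) | 100 | 36.3 | 20.6 | 72.8 | 80.9 |
| 0 | 0.31 | B(1,.1) | B(1,.5) | Γ(1,1) | 100 | 113.3 | 165.5 | 123.2 | 135.5 |
| 2 | 0.44 | N(0,1) | N(1,1) | N(0,1) | 100 | 107.1 | 135.1 | 84.9 | 84.8 |
| 2 | 0.59 | N(0,1) | N(1,1) | Γ(1,1) | 100 | 65.1 | 72.4 | 70.4 | 69.6 |
| 2 | 0.19 | N(0,1) | N(1,1) | Γ(.1,1) | 100 | 41.8 | 35.5 | 58.6 | 58.4 |
| 2 | 0.62 | B(1,.1) | B(1,.5) | Γ(1,1) | 100 | 63.2 | 39.8 | 20.6 | 24.9 |
| −2 | 0.94 | N(0,1) | N(1,1) | N(0,1) | 100 | 42.7 | 6.2 | 54.8 | 75.8 |
| −2 | 0.65 | N(0,1) | N(1,1) | Γ(1,1) | 100 | 74.8 | 11.6 | 59.4 | 79.1 |
| −2 | 0.44 | N(0,1) | N(1,1) | Γ(.1,1) | 100 | 117.7 | 10.3 | 57.1 | 97.9 |
| −2 | 0.21 | B(1,.1) | B(1,.5) | Γ(1,1) | 100 | 29.5 | 38.8 | 56.1 | 58.0 |
| 8 | 0.86 | N(0,1) | N(1,1) | N(0,1) | 100 | 66.3 | 9.2 | 4.3 | 4.3 |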

A second simulation experiment also used model (2), but with two covariates, X1 and X2. Each covariate was generated independently with distribution F0 (group 0) and F1 (group 1), and once again group 0 was the reference group with and 100 subjects per group. Type I error results (target 5%) for the remaining testing procedures described in Section 'Methods' are given in Table 4. The comparison of normal means procedure (Section 'Comparison of normal means') controls the type I error in all cases, although control can be conservative in some cases. The methods based on the Box–Cox transformation (Sections 'Comparison of predicted values' and 'Comparison of mixture distributions') also control the type I error in most cases, with one notable exception: for λ = 1, F0 = B(1,.1), F1 = B(1,.5), and H = Γ(.1,1), their type I error is 14.7%.

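Table 4. Type I errors (%) of the testing procedures with two covariates, based on 10,000 replications. Columns 3.1, 3.4, and 3.5 identify the testing procedure by the section in which it is described.

| λ | F0 | F1 | H | 3.1 | 3.4 | 3.5 |
|---|---|---|---|---|---|---|
| 1 | N(0,1) | N(0,1) | N(0,1) | 5.1 | 5.3 | 5.3 |
| 1 | N(0,1) | N(1,1) | N(0,1) | 5.6 | 5.6 | 5.6 |
| 1 | N(0,1) | N(1,1) | Γ(1,1) | 5.1 | 4.7 | 4.7 |
| 1 | N(0,1) | N(1,1) | Γ(.1,1) | 4.9 | 5.1 | 5.1 |
| 1 | N(0,1) | N(1,1) | U(0,1) | 5.2 | 5.2 | 5.2 |
| 1 | B(1,.5) | B(1,.5) | N(0,1) | 4.8 | 4.9 | 4.9 |
| 1 | B(1,.1) | B(1,.5) | N(0,1) | 5.2 | 5.2 | 5.2 |
| 1 | B(1,.1) | B(1,.5) | Γ(1,1) | 4.9 | 6.3 | 6.3 |
| 1 | B(1,.1) | B(1,.5) | Γ(.1,1) | 5.0 | 14.7 | 14.7 |
| 1 | B(1,.1) | B(1,.5) | U(0,1) | 4.8 | 5.2 | 5.2 |
| 0 | N(0,1) | N(1,1) | N(0,1) | 3.6 | 5.3 | 5.3 |
| 2 | N(0,1) | N(1,1) | N(0,1) | 4.6 | 5.5 | 5.5 |
| −2 | N(0,1) | N(1,1) | N(0,1) | 2.8 | 2.8 | 2.8 |
| 8 | N(0,1) | N(1,1) | N(0,1) | 2.9 | 5.2 | 5.2 |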

Mean squared error results for the estimation procedures are given in Table 5, expressed as a percentage of the mean squared error of the comparison of normal means procedure (Section 'Comparison of normal means'). The Box–Cox procedures (Sections 'Comparison of predicted values' and 'Comparison of mixture distributions') have smaller mean squared error in most cases, sometimes substantially smaller. The one marked exception is the comparison of mixture distributions method at λ = −2 (130.2%). Again, the comparison of predicted values method (Section 'Comparison of predicted values') has the smallest mean squared error in many cases.

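Table 5. Mean squared error of the estimation procedures with two covariates, expressed as a percentage (%) of that for the comparison of normal means procedure and based on 10,000 replications. Columns 3.1, 3.4, and 3.5 identify the estimation procedure by the section in which it is described.

| λ | F0 | F1 | H | 3.1 | 3.4 | 3.5 |
|---|---|---|---|---|---|---|
| 1 | N(0,1) | N(0,1) | N(0,1) | 100 | 102.9 | 103.1 |
| 1 | N(0,1) | N(1,1) | N(0,1) | 100 | 97.4 | 97.7 |
| 1 | N(0,1) | N(1,1) | Γ(1,1) | 100 | 83.3 | 83.3 |
| 1 | N(0,1) | N(1,1) | Γ(.1,1) | 100 | 94.7 | 95.4 |
| 1 | N(0,1) | N(1,1) | U(0,1) | 100 | 99.2 | 100.7 |
| 1 | B(1,.5) | B(1,.5) | N(0,1) | 100 | 103.4 | 103.5 |
| 1 | B(1,.1) | B(1,.5) | N(0,1) | 100 | 98.8 | 100.5 |
| 1 | B(1,.1) | B(1,.5) | Γ(1,1) | 100 | 53.1 | 62.9 |
| 1 | B(1,.1) | B(1,.5) | Γ(.1,1) | 100 | 31.7 | 40.2 |
| 1 | B(1,.1) | B(1,.5) | U(0,1) | 100 | 97.3 | 99.8 |
| 0 | N(0,1) | N(1,1) | N(0,1) | 100 | 87.3 | 88.0 |
| 2 | N(0,1) | N(1,1) | N(0,1) | 100 | 65.2 | 65.3 |
| −2 | N(0,1) | N(1,1) | N(0,1) | 100 | 43.5 | 130.2 |
| 8 | N(0,1) | N(1,1) | N(0,1) | 100 | 0.4 | 0.4 |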

Taken together, the simulation results lead to the following conclusions about the three remaining procedures. The comparison of normal means (Section 'Comparison of normal means'), comparison of predicted values (Section 'Comparison of predicted values'), and comparison of mixture distributions (Section 'Comparison of mixture distributions') procedures all exhibit reasonable control of type I error over a wide class of distributions. The choice among these procedures should therefore come down to mean squared error. On that criterion, the comparison of predicted values and comparison of mixture distributions procedures are much preferred over the normal means procedure, offering substantial possible gains at little risk of loss. The comparison of predicted values procedure is in turn favored over the comparison of mixture distributions procedure. In terms of statistical performance, the simulations therefore point to the comparison of predicted values (Section 'Comparison of predicted values') as the most efficient across a broad class of distributions.

### 6. CPP compared to the CSL

As an application of the methods for comparing adjusted medians, we analyzed data from the CPP (1959–1966) and the CSL (2002–2008). The characteristics of expecting mothers changed substantially in the 50 years between these studies. We analyze here the duration of the second stage of labor, adjusting for maternal age, maternal race, gestational age, BMI at delivery, birth weight, and spontaneous rupture of membranes. Longer second stage durations may lead to higher cesarean section rates, as emergent concerns over the baby prompt obstetricians to intervene; thus, longer second stages may be one explanation for higher contemporary cesarean section rates. We restricted the analysis to women with spontaneous labor and singleton gestations who did not have cesarean sections. Because of the important effects of parity and of assistance during delivery, we restricted our analysis to nulliparous women and analyzed assisted ( for CPP and for the CSL) and unassisted ( for CPP and for the CSL) deliveries separately. BMI at delivery is missing in a substantial proportion of cases (CPP: 7% without assistance and 9% with assistance; CSL: 21% without assistance and 14% with assistance), so we imputed the median value (26.3 for CPP and 29.9 for CSL) where it was unknown. After removing those with missing birth weight (less than 2% in all groups), there remained ( for CPP and for the CSL) with assistance and ( for CPP and for the CSL) without assistance for analysis. All methods yielded in each analysis as the sample sizes were quite large.

We used the CPP as the reference population for comparison of second stage duration without assistance in delivery. Thus, the adjusted median differences are the differences expected between the populations if the characteristics (maternal age, gestational age, maternal BMI, spontaneous rupture of membranes, birth weight, and maternal race) were found in both populations in the proportions observed in the CPP. Residual differences presumably reflect changes in obstetrical practice over the past 50 years, although other factors not adjusted for could also account for some of the difference. Three procedures were used to estimate differences in median second stage duration for nulliparous women without assistance in delivery: the comparison of normal means (Section 'Comparison of normal means'; difference CSL–CPP 0.28 h, 95% confidence interval 0.24 to 0.31), the comparison of predicted values (Section 'Comparison of predicted values'; 0.25 h, 95% confidence interval 0.22 to 0.27), and the comparison of mixture distributions (Section 'Comparison of mixture distributions'; 0.24 h, 95% confidence interval 0.22 to 0.27). Note that the confidence interval from the comparison of normal means procedure (Section 'Comparison of normal means') is wider than those of the Box–Cox transformation procedures (Sections 'Comparison of predicted values' and 'Comparison of mixture distributions'). Without knowing the true adjusted median difference, we cannot say definitively that the narrower interval is better. However, the kurtosis of the second stage duration is 150 in the CPP and 20 in the CSL, and this rather large kurtosis presumably leads to an overly wide confidence interval for the normal method (Section 'Comparison of normal means'). The residuals after Box–Cox transformation have skewness of 0.1 and kurtosis of 0.3.

We again used the CPP as the reference population for comparison of second stage duration with assistance in delivery. Three procedures were used to estimate differences in median second stage duration for nulliparous women with assistance in delivery: the comparison of normal means (Section 'Comparison of normal means'; difference CSL–CPP 0.53 h, 95% confidence interval 0.47 to 0.59), the comparison of predicted values (Section 'Comparison of predicted values'; 0.45 h, 95% confidence interval 0.40 to 0.50), and the comparison of mixture distributions (Section 'Comparison of mixture distributions'; 0.44 h, 95% confidence interval 0.40 to 0.49). The Box–Cox transformation methods (Sections 'Comparison of predicted values' and 'Comparison of mixture distributions') give estimates somewhat smaller than the comparison of normal means method (Section 'Comparison of normal means') and slightly narrower confidence intervals. After Box–Cox transformation, the residuals have skewness 0.1 and kurtosis 0.5. Presumably, deliveries that receive assistance nowadays are quite different from (more highly selected than) those that received assistance 50 years ago: 71% of the eligible cohort received assistance in the CPP, versus 12% in the CSL. Other factors may also help explain this difference in second stage duration. SAS programs to perform the procedures are available upon request from the corresponding author.

### 7. Discussion

We have described two methods of using a Box–Cox transformation to make inference on the adjusted median difference between populations. One method compares the predicted values from the model of transformed variables obtained using a set of reference covariates. The reference covariates can easily be obtained empirically from either of the populations or from the combined population. The other method uses the estimated median from the fitted mixture of distributions, also obtained using the reference covariates. The methods are compared to other methods from the literature and to the usual (untransformed) linear model that assumes a normal distribution. The transformation methods fare quite well compared to the comparison of normal means method in terms of power and mean squared error of estimation, while exhibiting reasonable control of type I error except in extreme cases outside a rather large class of models.

We recommend the comparison of predicted values method for this problem because of its relative robustness, in terms of type I error and power, across a broad range of distribution types. Since the Box–Cox transformation is not guaranteed to yield normally distributed residuals, the usual regression diagnostics should still be checked. The search region for λ can, of course, be expanded if needed. In our experience, the method works well whenever the skewness and kurtosis of the residuals from the transformed model are both less than about 1.
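The rule of thumb above can be checked with a few lines of NumPy applied to the residuals of the transformed model. This sketch makes two assumptions not stated in the text: kurtosis is measured as excess kurtosis (0 for a normal distribution), and "less than about 1" is read as a bound on absolute values. It uses simple moment estimators, so its values will differ slightly from bias-adjusted statistics such as those reported by SAS PROC UNIVARIATE. The function names are hypothetical.

```python
import numpy as np


def residual_shape(resid):
    """Moment-based sample skewness and excess kurtosis (both 0 for a normal)."""
    r = np.asarray(resid, dtype=float)
    r = r - r.mean()
    m2 = np.mean(r ** 2)
    skewness = np.mean(r ** 3) / m2 ** 1.5
    excess_kurtosis = np.mean(r ** 4) / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


def residuals_close_to_normal(resid, limit=1.0):
    """Hypothetical check of the rule of thumb, read (as an assumption) as
    |skewness| < limit and |excess kurtosis| < limit."""
    skewness, excess_kurtosis = residual_shape(resid)
    return bool(abs(skewness) < limit and abs(excess_kurtosis) < limit)
```

If the check fails at the edge of the λ search region, widening that region is the first thing to try before abandoning the transformation approach.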

The new estimation methods described here are easily adapted to quantiles other than the median. For example, the only modification needed to estimate a general quantile with the predicted values method is to compute the corresponding sample quantile of each sample of predicted values and then take the difference of these two quantile estimates. The comparison of mixture distributions method extends just as easily to general quantiles by solving the equations given in Section 'Comparison of mixture distributions' with 0.5 replaced by the desired probability.
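A minimal sketch of both quantile extensions follows. It assumes that the transformed-scale fitted means at the reference covariates (`mu0`, `mu1`, one entry per reference covariate vector for each group), the residual standard deviation `sigma` and the Box–Cox parameter `lam` are already available from a fitted model, and that the mixture is an equal-weight mixture of normal distributions on the transformed scale. Because the Box–Cox transformation is increasing, a quantile found on the transformed scale back-transforms to the same quantile on the original scale, so the mixture equation is solved by bisection before back-transforming. The helper names are hypothetical.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(x):
    """Standard normal CDF via math.erf (avoids a SciPy dependency)."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def inv_boxcox(z, lam):
    """Inverse Box-Cox transform; assumes lam * z + 1 > 0 when lam != 0."""
    return np.exp(z) if abs(lam) < 1e-12 else (lam * z + 1.0) ** (1.0 / lam)


def mixture_quantile(mu, sigma, p, n_iter=100):
    """p-quantile of the equal-weight mixture of N(mu_i, sigma^2) on the
    transformed scale, found by bisection on the mixture CDF."""
    mu = np.asarray(mu, dtype=float)
    lo, hi = mu.min() - 10.0 * sigma, mu.max() + 10.0 * sigma
    for _ in range(n_iter):  # fixed count avoids stalling at float resolution
        mid = 0.5 * (lo + hi)
        if norm_cdf((mid - mu) / sigma).mean() < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def quantile_differences(mu0, mu1, sigma, lam, p):
    """Hypothetical helper: group-1 minus group-0 p-quantile on the original
    scale by (a) predicted values and (b) mixture distributions."""
    by_predicted = (np.quantile(inv_boxcox(np.asarray(mu1, dtype=float), lam), p)
                    - np.quantile(inv_boxcox(np.asarray(mu0, dtype=float), lam), p))
    by_mixture = (inv_boxcox(mixture_quantile(mu1, sigma, p), lam)
                  - inv_boxcox(mixture_quantile(mu0, sigma, p), lam))
    return by_predicted, by_mixture
```

Setting `p = 0.5` recovers the median comparisons; other values of `p` give the general-quantile versions described above.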

The methods can also be adapted easily to other data types. For example, with dichotomous data, a logistic or log-binomial model could be used (Skov et al., 1998). In this case, the exponential of the estimated coefficient of the group term is directly interpretable as the adjusted odds ratio (logistic model) or the adjusted prevalence proportion ratio (log-binomial model). Interval-censored data can be handled with an accelerated failure time model, for example, using PROC LIFEREG in SAS software (SAS software, 2010). In this case, the comparison of mixture distributions method can be applied by assuming a parametric form for the failure times and using the mixture of fitted distributions obtained over the reference set of covariates.
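For the dichotomous case, the adjusted odds ratio is the exponential of the group coefficient in a logistic model. The sketch below fits that model by Newton-Raphson in NumPy purely for illustration; in practice a standard routine (such as SAS PROC LOGISTIC) would be used. The log-binomial model and the accelerated failure time extension are not shown, and the helper names are hypothetical.

```python
import numpy as np


def logistic_fit(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))          # fitted probabilities
        score = X.T @ (y - p)                          # gradient of log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])    # observed information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def adjusted_odds_ratio(y, group, Z=None):
    """Hypothetical helper: exp(group coefficient) from a logistic model with
    an intercept, a 0/1 group indicator and optional covariates Z."""
    cols = [np.ones(len(y)), np.asarray(group, dtype=float)]
    if Z is not None:
        cols.append(np.asarray(Z, dtype=float))
    X = np.column_stack(cols)
    return float(np.exp(logistic_fit(X, y)[1]))
```

With no covariates, the result reduces to the cross-product ratio of the 2 × 2 table, which is a convenient sanity check.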

### Acknowledgments

This research was supported in part by the Intramural Research Program of the NICHD, National Institutes of Health.

This study utilized the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD (http://biowulf.nih.gov).

### Conflict of interest

The authors have declared no conflict of interest.

### References

• and (2001). ANCOVA methods for heteroscedastic nonparametric regression models. Journal of the American Statistical Association 96, 220–232.
• (1956). A technique for studying the effects of a television broadcast. Applied Statistics 5, 195–202.
• and (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B 26, 211–252.
• and (2009). A comparison of analysis of covariate-adjusted residuals and analysis of covariance. Communications in Statistics - Simulation and Computation 38, 2019–2038.
• and (1981). Applied Regression Analysis (2nd edn.). John Wiley & Sons, New York.
• , and (2005). Using the Peters-Belson method to measure health care disparities from complex survey data. Statistics in Medicine 24, 2659–2668.
• (2003). The collaborative perinatal project: lessons and legacy. Annals of Epidemiology 13, 303–311.
• , , , and (2009). Using median regression to obtain adjusted estimates of central tendency for skewed laboratory and epidemiologic data. Clinical Chemistry 55, 165–169.
• (1941). A method of matching groups for experiments with no loss of populations. Journal of Educational Research 34, 606–612.
• and (1995). Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association 90, 122–129.
• SAS software (2010). Data Analysis for This Paper Was Generated by SAS/STAT Software, Version 9.22 of the SAS System for Unix. Copyright 2010 SAS Institute Inc., Cary, NC.
• , , and (1998). Prevalence proportion ratios; estimation and hypothesis testing. International Journal of Epidemiology 27, 91–95.
• and (2004). Nonparametric ANCOVA with two and three covariates. Journal of Multivariate Analysis 88, 289–319.
• , , and (2008). Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: a principled yet flexible approach. Statistics in Medicine 27, 4658–4677.
• (1945). Comparisons by ranking methods. Biometrics Bulletin 1, 80–83.
• , , , , , , , , , , , , , , and (2010). Contemporary cesarean delivery practice in the United States. American Journal of Obstetrics and Gynecology 203, 326.e1–326.e10.

### Supporting Information


| Filename | Format | Size | Description |
| --- | --- | --- | --- |
| bimj1347-sup-0001-Figures1.pdf | PDF | 9K | Supporting Information |
