Equivalence of regression curves sharing common parameters

In clinical trials the comparison of two different populations is a frequently addressed problem. Non-linear (parametric) regression models are commonly used to describe the relationship between covariates, such as the dose, and a response variable in the two groups. In some situations it is reasonable to assume that some model parameters are the same, for instance the placebo effect or the maximum treatment effect. In this paper we develop a (parametric) bootstrap test to establish the similarity of two regression curves sharing some common parameters. We show by theoretical arguments and by means of a simulation study that the new test controls its level and achieves a reasonable power. Moreover, it is demonstrated that under the assumption of common parameters a considerably more powerful test can be constructed compared to the test which does not use this assumption. Finally, we illustrate potential applications of the new methodology by a clinical trial example.


Introduction
Regression models are commonly used to describe the relationship between multiple covariates and a response variable. In certain applications, more than one regression model is available, such as when assessing the relationship between the covariates and the response variables in more than one population (e.g. in males and females). It is then often of interest to demonstrate the equivalence of the regression curves: if equivalence can be claimed, conclusions can be drawn from the pooled sample and a single regression model is sufficient to describe the data. This can be achieved by testing, at a controlled Type I error rate, the null hypothesis that the distance between the regression curves (measured in an appropriate sense) is at least as large as a pre-specified equivalence margin against the alternative that it is smaller.
Note that the problem of equivalence testing, as considered in this paper, is conceptually different from the more frequent problem of testing for equality of curves and is much less studied in the literature due to methodological difficulties. The problem of testing for equality of regression models has been intensively discussed in the nonparametric context and we refer to the recent work of Feng et al. (2015), which contains a rather comprehensive list of references. In applied regression analysis, however, parametric models are usually preferred to a purely nonparametric approach as they admit a direct interpretation of the observed effects in terms of the model parameters. In addition, more efficient estimation and test procedures make better use of the information in the observations, provided that the assumed model is valid.
Despite its importance, the problem of establishing equivalence of two parametric regression models while controlling the Type I error rate has only recently found attention in the literature. Using the intersection-union test device from Berger (1982), Liu et al. (2009) investigated the assessment of non-superiority, non-inferiority and equivalence when comparing two regression models over a restricted covariate region. Building upon this work, Gsteiger et al. (2011) derived equivalence tests based on simultaneous confidence bands for nonlinear regression models, with application to population pharmacokinetic analyses. Likewise, Bretz et al. (2016) assessed the similarity of dose response curves in two non-overlapping subgroups of patients. Alternatively, Dette et al. (2018) suggested directly estimating the distance between the regression curves and using a non-standard bootstrap test to decide for equivalence of the two curves if the estimate is less than a certain threshold. Expanding this approach, Moellenhoff et al. (2018) assessed the comparability of drug dissolution profiles via maximum deviation, whereas Hoffelder (2018) demonstrated the equivalence of dissolution profiles using the Mahalanobis distance; see also Collignon et al. (2018).
In these papers, the authors assumed that the regression models have different parameters and can therefore be evaluated separately. In some applications, however, this assumption cannot be justified and it is more reasonable to assume that the regression models have some common parameters. The total number of parameters to estimate is then reduced to the common and remaining parameters of each model, affecting the asymptotic behavior of the estimators. Consider, for example, the Phase II dose finding trial for a weight loss drug described in Bretz et al. (2016). This trial aimed at comparing the dose response relationship for two regimens administered to patients suffering from overweight or obesity: three doses each for once daily (o.d.) and twice daily (b.i.d.) use of the medication, and placebo. It is reasonable to assume that the placebo response is the same under both the o.d. and the b.i.d. regimen. Since the regression models typically used for dose response modeling contain a parameter for the placebo response (Pinheiro et al. (2006)), they will thus share this common parameter for both the o.d. and the b.i.d. regimen. In some instances, it might even be reasonable to assume that the maximum efficacy for high doses is similar in both groups. Moreover, clinical trial sponsors may even decide to use the same placebo group for logistical reasons. The response of each patient on placebo is then used twice in the estimation of the o.d. and b.i.d. dose response models, further complicating the statistical problem.
In this paper, we investigate the equivalence of two parametric regression curves that share common parameters. In Section 2 we first introduce the regression models to be estimated under the assumption of common parameters. We then develop a non-standard bootstrap test which performs the resampling under the constraints of the interval hypotheses implied by the equivalence test problem. The new test improves on the procedure proposed in Dette et al. (2018) by using the additional information of common parameters in both groups. We also discuss testing the equivalence of model parameters to assess whether the assumption of common parameters is plausible. In Section 3 we investigate the finite sample properties of the proposed bootstrap test in terms of power and size. In Section 4 we illustrate the methods using a multi-regional clinical trial example where it is conceivable that the placebo and maximum treatment responses are the same across geographic regions but the onset of treatment differs due to intrinsic and extrinsic factors (Malinowski et al. (2008); ICH (2017)). Technical details and proofs are deferred to an appendix.

Models with common parameters
Let
$$Y_{\ell,i,j} = m_\ell(d_{\ell,i}, \beta_\ell) + \eta_{\ell,i,j}, \quad j = 1, \ldots, n_{\ell,i}, \; i = 1, \ldots, k_\ell, \tag{2.1}$$
denote the observed response of the $j$th subject at the $i$th dose level $d_{\ell,i}$ under the $\ell$th dose response model $m_\ell$, where $\ell = 1, 2$ denotes the index of the two groups under consideration. We assume that the (non-linear) regression model $m_\ell$ is parametrized through a $p_\ell$-dimensional vector $\beta_\ell$, $\ell = 1, 2$. Note that the regression models $m_1$ and $m_2$ may be different. Likewise, the parameters $\beta_1$ and $\beta_2$ may be different even if $m_1 = m_2$. We further assume that the error terms $\eta_{\ell,i,j}$ are independent and identically distributed with expectation 0 and variance $\sigma_\ell^2$. The dose levels $d_{\ell,i}$ may be different in both groups but they are attained on the same (restricted) covariate region $D$. In this paper $D$ is assumed to be the dose range, although the results can be generalized to include other covariates. Further, $n_\ell = \sum_{i=1}^{k_\ell} n_{\ell,i}$ denotes the sample size in group $\ell$, where we assume $n_{\ell,i}$ observations at the $i$th dose level ($i = 1, \ldots, k_\ell$, $\ell = 1, 2$). The sample sizes $n_\ell$ can be unequal and the total number of observations is denoted by $n = n_1 + n_2$.
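In this paper we consider the situation where the regression models have some common parameters. More precisely, we assume without loss of generality that these parameters are given by the first $p$ model parameters of the parameter $\beta_\ell$ in model (2.1), that is
$$\beta_\ell = (\beta_0, \tilde\beta_\ell), \quad \ell = 1, 2, \tag{2.2}$$
where $\beta_0 \in \mathbb{R}^p$ denotes the vector of common parameters in both regression models and $\tilde\beta_1$ and $\tilde\beta_2$ denote the remaining parameters in the models $m_1$ and $m_2$, respectively, which do not necessarily coincide. The case where the models $m_1$ and $m_2$ do not share any common parameters is included and corresponds to $\beta_\ell = \tilde\beta_\ell$ for $\ell = 1, 2$ (that is, $p = 0$). As a consequence the $(p_1 + p_2 - p)$-dimensional vector of all parameters of the regression functions in model (2.1) under the assumption (2.2) is given by $\beta = (\beta_0, \tilde\beta_1, \tilde\beta_2)$. Throughout this paper we assume that $\beta \in B$, where $B \subset \mathbb{R}^{p_1 + p_2 - p}$ is a compact set. These parameters are now estimated by least squares using the combined sample $\{Y_{\ell,i,j} : \ell = 1, 2;\ j = 1, \ldots, n_{\ell,i};\ i = 1, \ldots, k_\ell\}$, that is
$$\hat\beta = (\hat\beta_0, \hat{\tilde\beta}_1, \hat{\tilde\beta}_2) = \arg\min_{b \in B} \sum_{\ell=1}^{2} \sum_{i=1}^{k_\ell} \sum_{j=1}^{n_{\ell,i}} \big( Y_{\ell,i,j} - m_\ell(d_{\ell,i}, (b_0, \tilde b_\ell)) \big)^2 . \tag{2.3}$$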
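The structure of the pooled least squares criterion can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `pooled_sse`, the data layout (lists of `(dose, response)` pairs) and the calling convention `m(d, b0, b_tilde)` are hypothetical and not taken from the paper. A numerical optimizer would then minimize this criterion over the compact parameter set.

```python
from typing import Callable, Sequence, Tuple

Obs = Sequence[Tuple[float, float]]  # (dose, response) pairs of one group
Model = Callable[[float, Sequence[float], Sequence[float]], float]


def pooled_sse(b0, b1_tilde, b2_tilde, data1: Obs, data2: Obs,
               m1: Model, m2: Model) -> float:
    """Sum of squared residuals over both groups with shared parameters b0."""
    sse = 0.0
    for d, y in data1:  # group 1 is evaluated at (b0, b1_tilde)
        sse += (y - m1(d, b0, b1_tilde)) ** 2
    for d, y in data2:  # group 2 is evaluated at (b0, b2_tilde)
        sse += (y - m2(d, b0, b2_tilde)) ** 2
    return sse


# Hypothetical toy model: a line whose intercept b0[0] is shared by both groups.
def line(d, b0, b_tilde):
    return b0[0] + b_tilde[0] * d
```

The shared vector `b0` enters both inner sums, which is what couples the two model fits; with `p = 0` the criterion separates into two independent least squares problems.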

Testing equivalence of regression curves
Following Liu et al. (2009) and Gsteiger et al. (2011) we consider the regression curves $m_1$ and $m_2$ to be equivalent if the maximum distance between the two curves is smaller than a given pre-specified constant, say $\varepsilon > 0$, that is,
$$d_\infty(\beta_1, \beta_2) = \max_{d \in D} |m_1(d, \beta_1) - m_2(d, \beta_2)| < \varepsilon .$$
In clinical trial practice $\varepsilon$ is often referred to as a relevance threshold in the sense that if $d_\infty(\beta_1, \beta_2) < \varepsilon$ the difference between the two curves is believed not to be clinically relevant.
In order to establish equivalence of the two curves $m_1$ and $m_2$ at a controlled Type I error, we will develop a test for the hypotheses
$$H_0: d_\infty(\beta_1, \beta_2) \geq \varepsilon \quad \text{versus} \quad H_1: d_\infty(\beta_1, \beta_2) < \varepsilon . \tag{2.4}$$
In the following we extend the bootstrap approach from Dette et al. (2018) to test the hypotheses (2.4) in the situation of common parameters. Note that the test procedure proposed below could also be applied to alternative measures of equivalence, such as the integrated deviation $\int_D |m_1(t, \beta_1) - m_2(t, \beta_2)| \, dt$.
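Both distance measures are easy to approximate numerically on a one-dimensional dose range. The following Python sketch assumes a fine equidistant grid on an interval `[lo, hi]`; the function names and the grid size are illustrative choices, not part of the original method description.

```python
def max_distance(m1, m2, lo, hi, steps=1000):
    """Grid approximation of max |m1(d) - m2(d)| on [lo, hi].

    Returns the maximal absolute difference and the dose where it occurs.
    """
    best, arg = -1.0, lo
    for i in range(steps + 1):
        d = lo + (hi - lo) * i / steps
        diff = abs(m1(d) - m2(d))
        if diff > best:
            best, arg = diff, d
    return best, arg


def integrated_distance(m1, m2, lo, hi, steps=1000):
    """Trapezoidal approximation of the integral of |m1 - m2| over [lo, hi]."""
    h = (hi - lo) / steps
    vals = [abs(m1(lo + i * h) - m2(lo + i * h)) for i in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
```

For the two lines $m_1(d) = d$ and $m_2(d) = d/2$ on $[0, 2]$, both measures are equal to 1, and the maximum is attained at the right endpoint.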
Algorithm 2.1 (parametric bootstrap for testing equivalence under the assumption of common parameters).
(1) Calculate the ordinary least squares (OLS) parameter estimate (2.3) assuming a common parameter $\beta_0$, the corresponding variance estimates $\hat\sigma_\ell^2$, $\ell = 1, 2$, and the test statistic $\hat d_\infty = d_\infty(\hat\beta_1, \hat\beta_2)$ for the maximal deviation between the two regression curves.
The next two steps describe the (parametric) bootstrap procedure.
(4) Calculate the OLS estimate $\hat\beta^*$ as in Step (1) and the test statistic $\hat d_\infty^* = d_\infty(\hat\beta_1^*, \hat\beta_2^*)$, where $\hat\beta_\ell^* = (\hat\beta_0^*, \hat{\tilde\beta}_\ell^*)$, $\ell = 1, 2$. The $\alpha$-quantile of the distribution of the statistic $\hat d_\infty^*$ is denoted by $q_\alpha^*$, and the null hypothesis in (2.4) is rejected whenever
$$\hat d_\infty < q_\alpha^* . \tag{2.9}$$
In practice the quantile $q_\alpha^*$ can be calculated by repeating steps (3) and (4), say $B$ times, in order to obtain replicates $\hat d_{\infty,1}^*, \ldots, \hat d_{\infty,B}^*$ of $\hat d_\infty^*$. An estimate of $q_\alpha^*$ is then given by the empirical $\alpha$-quantile $\hat q_\alpha^{(B)}$ of these replicates, determined from the corresponding order statistics, and this estimate is used in (2.9).
The following theorem states that this algorithm yields a valid test procedure. The proof is deferred to the Appendix.
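The final decision step can be illustrated with a short Python sketch. It assumes that the bootstrap replicates of the test statistic are already available as a list, and it uses the $\lfloor B\alpha \rfloor$-th order statistic as the empirical quantile, which is a common convention adopted here as an assumption. The function names are hypothetical.

```python
import math


def bootstrap_quantile(replicates, alpha):
    """Empirical alpha-quantile: the floor(B * alpha)-th order statistic (1-based)."""
    ordered = sorted(replicates)
    k = max(1, math.floor(len(ordered) * alpha))  # guard against k = 0 for small B
    return ordered[k - 1]


def reject_null(d_hat, replicates, alpha=0.05):
    """Decision rule (2.9): claim equivalence if d_hat is below the bootstrap quantile."""
    return d_hat < bootstrap_quantile(replicates, alpha)


def bootstrap_p_value(d_hat, replicates):
    """Proportion of bootstrap replicates that do not exceed the observed statistic."""
    return sum(1 for d in replicates if d <= d_hat) / len(replicates)
```

Rejecting when the p-value is below $\alpha$ leads to essentially the same decision as comparing $\hat d_\infty$ with the empirical quantile, up to the handling of ties at the quantile itself.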
Theorem 2.1. The test defined by (2.9) is a consistent, asymptotic $\alpha$-level test. That is,
$$\lim_{n_1, n_2 \to \infty} P(\hat d_\infty < q_\alpha^*) = 1$$
whenever $d_\infty < \varepsilon$, and
$$\limsup_{n_1, n_2 \to \infty} P(\hat d_\infty < q_\alpha^*) \leq \alpha$$
whenever $d_\infty \geq \varepsilon$.
Remark 2.2. The results presented in this section remain correct in trials with a common placebo group, where $n_0$ observations are taken at dose level $d_1 = 0$ (corresponding to placebo), which are modelled by the random variables $Y_{0,1}, \ldots, Y_{0,n_0}$. For the sake of a simple presentation we consider location-scale type models, such that the common effect at placebo can easily be modelled, but we note that more general models can be considered as well by introducing additional constraints for the parameter.
To be precise, we assume that the models in (2.1) are given by
$$m_\ell(d, \beta_\ell) = \beta_{0,1} + \beta_{\ell,1} \cdot m_\ell^0(d, \beta_\ell^0), \quad \ell = 1, 2, \tag{2.12}$$
where $m_\ell^0(0, \beta_\ell^0) = 0$ ($\ell = 1, 2$), such that the condition $m_1(0, \beta_1) = m_2(0, \beta_2) = \beta_{0,1}$ reflects the fact that there is only one placebo group (and as a consequence a common placebo parameter). Models of this type cover the functional forms most frequently used in drug development and several examples can be found in Ting (2006). Besides the location parameter $\beta_{0,1}$ there may also be other shared parameters, which we do not reflect in our notation for better readability. The $\ell$th model is completely characterized by its parameter $\beta_\ell = (\beta_0, \tilde\beta_\ell) = (\beta_0, \beta_{\ell,1}, \beta_\ell^0)$, $\ell = 1, 2$, and we obtain estimates of the model parameters by minimizing the sum of squares in (2.13). Theorem 2.1 remains valid in this situation and a proof can be found in the Appendix (see Section 6.4).

Testing equivalence of model parameters
So far we assumed that the two regression models $m_1$ and $m_2$ share the common parameter $\beta_0$. In practice it may be necessary to assess whether this assumption is plausible using an appropriate equivalence test for the shared model parameters. To be more precise, we recall the definition of the parameters $\beta_\ell$ in model (2.1), i.e.
$$\beta_\ell = (\beta_{\ell,1}, \ldots, \beta_{\ell,p_\ell}), \quad \ell = 1, 2,$$
and note that assumption (2.2) of $p$ common parameters in the models $m_1$ and $m_2$ can be represented as $(\beta_{1,1}, \ldots, \beta_{1,p}) = (\beta_{2,1}, \ldots, \beta_{2,p})$. In order to investigate whether this assumption holds at least approximately, we construct a test for the hypotheses
$$K_0: \max_{i=1,\ldots,p} |\beta_{1,i} - \beta_{2,i}| \geq \delta \quad \text{versus} \quad K_1: \max_{i=1,\ldots,p} |\beta_{1,i} - \beta_{2,i}| < \delta, \tag{2.14}$$
where $\delta$ denotes the equivalence margin. To be precise, let $\hat\beta^{(\ell)}$ denote the least squares estimate in model $m_\ell$ for the sample $\{Y_{\ell,i,j} : j = 1, \ldots, n_{\ell,i},\ i = 1, \ldots, k_\ell\}$ ($\ell = 1, 2$), and assume that for large sample considerations the sample sizes $n_\ell$ and $n_{\ell,i}$ converge to infinity such that condition (2.16) holds. Under standard assumptions, which are listed in Section 6, it can be shown that the least squares estimate $\hat\beta^{(\ell)}$ of the parameter $\beta_\ell$ in model $m_\ell$ is approximately normally distributed, that is
$$\sqrt{n_\ell}\, (\hat\beta^{(\ell)} - \beta_\ell) \xrightarrow{D} N(0, \Sigma_\ell^{-1}), \quad \ell = 1, 2,$$
where the symbol $\xrightarrow{D}$ means convergence in distribution and the matrix $\Sigma_\ell$ is defined in (2.18). Here and throughout this paper we assume that the matrices $\Sigma_1$ and $\Sigma_2$ are non-singular. Consequently the difference $\sqrt{n}\, (\hat\beta^{(1)} - \hat\beta^{(2)})$ is also asymptotically normally distributed. In particular, the first $p$ components of the difference are asymptotically normal with covariance matrix $\Omega$, which is defined in (2.20) in terms of the upper-left $p \times p$ blocks of the matrices $\Sigma_\ell^{-1}$ ($\ell = 1, 2$) and the constant $\lambda$ from (2.16). This yields a normal approximation for the difference $(\hat\beta_{1,1}, \ldots, \hat\beta_{1,p}) - (\hat\beta_{2,1}, \ldots, \hat\beta_{2,p})$. We can now apply the test (2.2) proposed in Wang et al. (1999) by rejecting the null hypothesis $K_0$ in (2.14) whenever the inequality (2.21) holds, where $t_{1-\alpha,n-2}$ denotes the $1 - \alpha$ quantile of the $t$-distribution with $n - 2$ degrees of freedom and $\hat\Omega_{ii}$ the $i$th diagonal element of the matrix $\hat\Omega$, which is an estimate of the (unknown) covariance matrix $\Omega$ (obtained by replacing the unknown parameters $\beta_\ell$, $\sigma_\ell^2$ and weights $\zeta_{\ell,i}$ in (2.18) by their corresponding estimates and $n_{\ell,i}/n_\ell$, respectively).

Finite sample properties
We now investigate the finite sample properties of the bootstrap test proposed in Section 2.2 in terms of power and size using numerical simulations. The data are generated as follows:
(a) We choose the functional form of the models $m_1$, $m_2$ and specify their parameters $\beta_1$, $\beta_2$ (including a common parameter $\beta_0$), which determine the true underlying models. Further, we choose variances $\sigma_\ell^2$ and the actual dose levels $d_{\ell,i}$, $\ell = 1, 2$.
(b) For each dose $d_{\ell,i}$ we calculate $n_{\ell,i}$ values for the response given by $m_\ell(d_{\ell,i}, (\beta_0, \tilde\beta_\ell))$. By generating residual errors $\eta_{\ell,i,j} \sim N(0, \sigma_\ell^2)$ we obtain the final response data
$$Y_{\ell,i,j} = m_\ell(d_{\ell,i}, (\beta_0, \tilde\beta_\ell)) + \eta_{\ell,i,j}, \quad j = 1, \ldots, n_{\ell,i}, \; i = 1, \ldots, k_\ell, \; \ell = 1, 2 .$$
The simulation results below were obtained using 1 000 simulation runs, where $B = 500$ bootstrap replications were used to calculate quantiles of the bootstrap test.
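Steps (a) and (b) translate directly into code. The Python sketch below generates the responses of one group, assuming a user-supplied model function and an equal number of observations per dose level; the function name `simulate_group` and the use of a seeded `random.Random` instance are illustrative choices.

```python
import random


def simulate_group(model, doses, n_per_dose, sigma, rng):
    """Return (dose, response) pairs: model value plus N(0, sigma^2) noise."""
    data = []
    for d in doses:
        mean = model(d)  # true regression curve evaluated at this dose level
        for _ in range(n_per_dose):
            data.append((d, mean + rng.gauss(0.0, sigma)))
    return data
```

Calling the function once per group, with the group-specific model, dose levels and standard deviation, yields the combined sample that enters the estimation step.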
In the following, we report the simulation results for power and size under three different scenarios. We consider the four-parameter sigmoid Emax model
$$m(d, \beta) = \beta_1 + \frac{\beta_2\, d^{\beta_3}}{\beta_4^{\beta_3} + d^{\beta_3}}, \tag{3.2}$$
which is frequently used in practice when modeling dose response relationships (see for example Gabrielsson and Weiner (2007) or Thomas et al. (2014)). In model (3.2) the parameter $\beta = (\beta_1, \beta_2, \beta_3, \beta_4)$ corresponds (in this order) to the placebo effect $E_0$, the maximum effect $E_{\max}$, the Hill parameter $h$ determining the steepness of the dose-response curve and the dose $ED_{50}$ producing half of the maximum effect (Macdougall (2006)). In what follows we add an index $\ell = 0$ for a shared parameter or $\ell = 1, 2$ for the group under consideration.
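For reference, the sigmoid Emax curve can be written as a small Python function. The argument names follow the interpretation given above ($E_0$, $E_{\max}$, Hill parameter, $ED_{50}$); the explicit treatment of dose zero is an implementation choice made here to return the placebo response directly.

```python
def sigmoid_emax(d, e0, emax, hill, ed50):
    """Four-parameter sigmoid Emax curve: e0 + emax * d^h / (ed50^h + d^h)."""
    if d == 0:
        return e0  # placebo response; avoids evaluating 0 ** hill
    return e0 + emax * d ** hill / (ed50 ** hill + d ** hill)
```

At the dose `ed50` the function returns `e0 + emax / 2`, and for large doses it approaches `e0 + emax`, in line with the interpretation of the parameters.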
In Table 1 we summarize the simulated rejection probabilities of the bootstrap test (2.9) under the null hypothesis (2.4) with $d_\infty = 2, 1.5, 1$ and $\varepsilon = 1$. We conclude that the bootstrap test controls its level in all cases under consideration. At the margin of the null hypothesis (i.e. $d_\infty = 1$) the approximation of the level is very precise, even for sample sizes as small as $n_{\ell,i} = 6$. We also investigated the relative residual mean squared errors (RRMSE) of the parameter estimates. Table 2 summarizes the simulation results only for $d_\infty = 1$ (i.e. at the margin of the null hypothesis), as the results are similar for other choices of $d_\infty$. We conclude that the RRMSE for estimating the Hill parameter $\beta_{0,3}$ is (by far) the largest. This phenomenon has also been observed by Mielke (2016). We also observe that all estimation errors decrease with larger sample sizes and smaller variances. Table 2 also summarizes the RRMSE when fixing the Hill parameter at $\beta_{0,3} = 4$ (see the numbers in brackets). In this case, four parameters need to be estimated in total and the estimation errors become slightly smaller. We also repeated the Type I error rate simulations when fixing the Hill parameter at $\beta_{0,3} = 4$. The results are reported in Table 1 (numbers in brackets) and we conclude that the size is well controlled within the simulation error.
Table 1: Simulated Type I error of the bootstrap test (2.9) for the equivalence of two sigmoid Emax models defined in Scenario 1 with $\varepsilon = 1$. The numbers in brackets show the simulated Type I error when fixing the Hill parameter at $\beta_{0,3} = 4$.
In Table 3 we summarize the power of the bootstrap test when generating the data under the alternative $d_\infty = 0.5, 0.25, 0$ and $\varepsilon = 1$. As expected, the power increases with larger sample sizes and smaller variances and is reasonably high across all configurations. Fixing the Hill parameter considerably improves the power, which can be explained by the difficulty of estimating this parameter precisely, as discussed above.
Table 3: Simulated power of the bootstrap test (2.9) for the equivalence of two sigmoid Emax models defined in Scenario 1 with $\varepsilon = 1$. The numbers in brackets show the simulated power when fixing the Hill parameter at $\beta_{0,3} = 4$.
In Table 5 we summarize the simulated power of the bootstrap test under the alternative $d_\infty = 0.5, 0.25, 0$ and $\varepsilon = 1$. As expected, the power decreases for increasing values of $d_\infty$ and for higher variances or smaller sample sizes. One noticeable exception occurs at $d_\infty = 0$, where in some cases the power is smaller than for $d_\infty = 0.25$. This effect can be explained theoretically when considering the proofs for the bootstrap test. In case of $d_\infty = 0$ the set $E = E^+ \cup E^-$ containing all points where the maximum distance between the two curves is attained (see Appendix 6.3) consists of the entire dose range $D$. Therefore, the asymptotic distribution of the test statistic is not Gaussian but a maximum of Gaussian processes. This complex structure of the asymptotic distribution has an impact on the bootstrap procedure and explains the decrease in power for $d_\infty = 0$. This phenomenon can also be observed, although to a lesser degree, in Scenario 1. Finally, we observe higher power values when fixing the Hill parameter compared to the situation where it has to be estimated.
Table 5: Simulated power of the bootstrap test (2.9) for the equivalence of two sigmoid Emax models defined in Scenario 2 with $\varepsilon = 1$. The numbers in brackets show the simulated power when fixing the Hill parameters at their true underlying values.
In Figure 2(b) we plot the proportion of rejections in dependence of the true maximum absolute difference $d_\infty \in (0, 2]$. Under the null hypothesis $d_\infty \geq 1$ all four tests control their level, as the proportion of rejections is smaller than or equal to $\alpha = 0.05$ within simulation errors. Looking at the region $d_\infty < 1$, we observe that the test assuming three shared parameters has the highest power among all four tests, followed by the test assuming two shared parameters. The difference between the tests assuming one and no shared parameter is rather small. In conclusion, the more parameters can be assumed to be common to the two regression curves, the higher the power of the test. Note, however, that strictly speaking the hypotheses (2.4) are different when assuming three, two, one and no shared parameters and that the perceived power gain when assuming more shared parameters comes at the cost of making additional assumptions that need to be verified in practice, as illustrated with the clinical trial example in Section 4.

Clinical trial example
We now illustrate the proposed method with a multi-regional clinical trial example. The objective of this trial is to evaluate the dose response relationships in Caucasian and Japanese patients and assess their similarity. Based on data from previous clinical trials investigating a drug with a similar mode of action, it is reasonable to assume a similar response to placebo and a common maximum treatment effect in both populations, with the main difference expected to be in a different onset of treatment effect. Using the sigmoid Emax model (3.2), these considerations thus lead to different $ED_{50}$ and Hill parameters for the two dose response curves. Because the trial is still at its design stage, we simulate data based on the trial assumptions. To maintain confidentiality, we scale the actual doses to lie within the $[0, 15]$ interval. These limitations do not change the utility of the calculations below.
We assume 60 Japanese and 240 Caucasian patients, resulting in 300 patients overall.
Patients from both populations are randomized to receive either placebo (dose level 0) or one of three active dose levels, namely 1, 3, 15 for the Japanese and 0.5, 9 and 15 for the Caucasian patients.Assuming equal allocation of patients within each population, we thus have 75, 60, 15, 15, 60, and 75 patients randomized to the dose levels 0, 0.5, 1, 3, 9 and 15, respectively.The response variable is assumed to be normally distributed and larger values indicate a better outcome.Pharmacological and clinical considerations suggest the use of the (three-parameter) Emax model with the Hill parameter fixed at 1. Later on we relax this assumption as part of a sensitivity analysis.The R code for this example and all other calculations in this paper is available from the authors upon request.
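The patient numbers per dose level follow from simple arithmetic, which the following Python sketch reproduces. It assumes equal allocation across the four arms within each population, as stated above; the helper name `allocation` is hypothetical.

```python
def allocation(total, doses):
    """Split `total` patients equally across the given dose levels."""
    per_arm, rest = divmod(total, len(doses))
    assert rest == 0, "equal allocation requires divisibility"
    return {d: per_arm for d in doses}


japanese = allocation(60, [0, 1, 3, 15])
caucasian = allocation(240, [0, 0.5, 9, 15])

# Pool the two populations per dose level.
pooled = {}
for arm in (japanese, caucasian):
    for dose, n in arm.items():
        pooled[dose] = pooled.get(dose, 0) + n
```

Only placebo and the highest dose are used in both populations, which is why these two dose levels carry 15 + 60 = 75 patients each.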
In Figure 3 we display the fitted dose response models $m_1(d, \hat\beta_1)$ and $m_2(d, \hat\beta_2)$ for the Japanese and Caucasian patients, respectively, together with the individual observations, where $d \in [0, 15]$ and the y-axis is truncated to $[-1, 6]$ for better readability. The parameter estimates from the two separate model fits are given by $\hat\beta_1 = (-0.195, 4.751, 11.991)$ and $\hat\beta_2 = (-0.002, 5.676, 33.887)$. The observed differences for the placebo response and the maximum treatment effect are given by $|\hat\beta_{1,1} - \hat\beta_{2,1}| = 0.193$ and $|\hat\beta_{1,2} - \hat\beta_{2,2}| = 0.925$, respectively, and are thus relatively small, as also transpires from the plots in Figure 3. To corroborate this empirical observation, we formally test whether the assumption of shared parameters is plausible by applying the equivalence test described in Section 2.3 to the data set under consideration. We choose the threshold $\delta = 1.5$ and therefore test the null hypothesis $K_0$ in (2.14) with this margin. Applying the test (2.21) for $\alpha = 0.05$, we obtain $\hat\Omega_{11} = 3127.91$ and $\hat\Omega_{22} = 10748.27$ and the corresponding bounds in (2.21), the second of which equals 0.928.
We can thus reject $K_0$ at the relatively stringent 5% level and conclude equivalence of the two parameters, which justifies using the bootstrap test of Algorithm 2.1 with shared parameters. We now evaluate the similarity of the dose response curves for the Japanese and Caucasian patients, assuming the same placebo and maximum treatment effect. In order to compute the non-linear least squares estimates in model (2.1) with (2.2) we formulate the objective function of the minimization step as
$$\sum_{\ell=1}^{2} \sum_{i=1}^{k_\ell} \sum_{j=1}^{n_{\ell,i}} \Big( Y_{\ell,i,j} - \beta_{0,1} - \frac{\beta_{0,2}\, d_{\ell,i}}{\tilde\beta_{\ell,3} + d_{\ell,i}} \Big)^2 .$$
Here, $\beta_{0,1}$ denotes the (shared) placebo effect, $\beta_{0,2}$ the (shared) maximum treatment effect $E_{\max}$, and $\tilde\beta_{1,3}$ and $\tilde\beta_{2,3}$ the $ED_{50}$ parameters of the two models. Using the auglag function from the alabama package (Varadhan (2014)) to solve the above optimization problem, we obtain the parameter estimates $\hat\beta_{0,1} = -0.064$ (0.074), $\hat\beta_{0,2} = 5.366$ (0.137), $\hat{\tilde\beta}_{1,3} = 19.400$ (2.634) and $\hat{\tilde\beta}_{2,3} = 25.681$ (3.256). In brackets we report the associated standard errors, which have to be calculated manually based on (5.5). The estimates for the population variances are $\hat\sigma_1^2 = 0.508$ and $\hat\sigma_2^2 = 0.455$. The observed maximum difference between both curves over the investigated dose range $[0, 15]$ is $\hat d_\infty = 0.376$, attained at dose 2.23. We apply the bootstrap test (2.9) using $B = 1\,000$ bootstrap replications. Setting $\varepsilon = 0.7$ for the equivalence margin in (2.4), we obtain the quantile $\hat q_{0.05} = 0.438$ for $\alpha = 0.05$. Thus, we reject the null hypothesis (2.4) at the 5% significance level and conclude that the dose response curves for the Japanese and Caucasian populations are similar, under the shared parameter assumption. Alternatively, we can calculate the p-value of the bootstrap test, i.e. the proportion of bootstrap replicates $\hat d_{\infty,i}^*$ with $\hat d_{\infty,i}^* \leq \hat d_\infty$, which equals 0.023 and leads to the same test decision at level $\alpha = 0.05$.
Figure 3: Fitted Emax model $m_1$ ($m_2$) for the Japanese (Caucasian) patients given by the solid (dashed) line with observations marked by "o" ("x").
For illustration purposes we also apply the bootstrap test (2.9) without shared parameters (yet under the assumption of a fixed Hill parameter). In this case the observed maximum distance is $\hat d_\infty = 0.706$, attained at dose 1.42, and the quantile of the bootstrap distribution is $\hat q_{0.05} = 0.449$. Accordingly, we obtain a considerably larger p-value of 0.458, which supports our findings from Scenario 3 in Section 3 about the loss in power when no shared parameters are assumed.
Finally, we perform a sensitivity analysis to investigate the assumption of the Hill parameter being equal to 1. As part of this analysis we repeat the model fit and the bootstrap test using the sigmoid Emax model (3.2), where the Hill parameter is now part of the estimation. The parameter estimates (standard errors in brackets) are $\hat\beta_{0,1} = 0.037$ (0.082), $\hat\beta_{0,2} = 4.544$ (0.218), $\hat{\tilde\beta}_{1,3} = 1.05$ (0.229), $\hat{\tilde\beta}_{1,4} = 13.542$ (2.095), $\hat{\tilde\beta}_{2,3} = 1.650$ (0.331) and $\hat{\tilde\beta}_{2,4} = 16.558$ (4.521). Now, the maximum distance between the curves is $\hat d_\infty = 0.640$, attained at dose 0.6. It turns out that the standard errors of the estimates are slightly higher, which is in line with the results shown in the simulation studies in Section 3. Performing the bootstrap test with two shared parameters again results in the quantile $\hat q_{0.05} = 0.429$ and the p-value 0.285. Consequently, we cannot reject the null hypothesis in this case.
In conclusion, fixing the Hill parameter to 1 and assuming both the placebo effect and the maximum treatment effect to be the same in both populations clearly results in the most powerful procedure. We can demonstrate equivalence at the significance level of $\alpha = 0.05$, whereas in case of estimating both models separately (i.e. no shared parameters) or including the Hill parameter in the estimation we obtain considerably larger p-values.
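The three test decisions reported in this example can be retraced by applying the rejection rule (2.9) to the reported values of the observed maximum distance and the bootstrap quantile. The Python snippet below does only this bookkeeping; the labels are descriptive strings chosen here, and the numbers are copied from the text.

```python
# (label, observed maximum distance, bootstrap 5% quantile) as reported in the text
analyses = [
    ("shared E0 and Emax, Hill fixed at 1", 0.376, 0.438),
    ("no shared parameters, Hill fixed at 1", 0.706, 0.449),
    ("shared E0 and Emax, Hill estimated", 0.640, 0.429),
]

# Rule (2.9): reject the null hypothesis (claim equivalence) if d_hat < quantile.
decisions = {label: d_hat < q for label, d_hat, q in analyses}
```

Only the first analysis leads to a rejection of the null hypothesis, which matches the reported p-values of 0.023, 0.458 and 0.285 at $\alpha = 0.05$.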

Conclusion
In this paper we developed a new test for the equivalence of two regression curves when it is reasonable to assume that some model parameters are the same. Our approach is based on an estimate of the maximum deviation between the two curves, where critical values are obtained by a novel constrained bootstrap procedure. We demonstrated that the new test controls its level properly and is consistent. We investigated the finite sample properties of the proposed procedure using extensive simulations and observed that the Type I error rate is controlled in all scenarios under consideration, even for sample sizes as small as 6 patients per dose level. Further, we concluded that the test reaches a reasonable power that increases with larger sample sizes. In particular, we demonstrated that the power of tests for the equivalence of curves can be improved substantially by using the additional information of common parameters in the two regression curves. This effect could also be observed in the clinical trial example, which showed the power advantage of the bootstrap test (2.9) if the underlying assumptions are well justified. Relaxing those assumptions may lead to more robust conclusions, but only at the cost of a loss in power.
An interesting extension of the proposed methodology arises from the need to include covariates in clinical trial practice. Covariates can be continuous (e.g. age or body mass index), categorical (e.g. disease status or race), or binary (e.g. gender or smoking yes/no), possibly changing over time. These cases may have to be treated differently and we leave this problem for future research. Another area of research could be the assessment of similarity in two nested populations, thus relaxing the assumption of independence between the observations. In our multi-regional clinical trial example we compared the Japanese with the Caucasian patients. It will be interesting and relevant to clinical trials to explore the development of the proposed methods when comparing the Japanese with an overall population that includes Japanese and Caucasian patients. Again, we leave this topic for future research.

Appendix: Proof of Theorem 2.1 and Remark 2.2
The proof of the theoretical results of this paper proceeds in several steps. First we state the assumptions under which the statements hold (Section 6.1). Second, we derive the asymptotic distribution of the parameter estimates in models with common parameters (Section 6.2). In Section 6.3 we derive a result on the weak convergence of a stochastic process, from which the proof of Theorem 2.1 and Remark 2.2 can be derived (see Section 6.4).

Assumptions
For the theoretical results of this paper we make the same assumptions as in Dette et al. (2018).
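1. The errors $\eta_{\ell,i,j}$ are independent, have finite variance $\sigma_\ell^2$ and expectation zero.
2. The covariate region $D \subset \mathbb{R}^d$ is compact and the number and location of dose levels $k_\ell$ does not depend on $n_\ell$, $\ell = 1, 2$.
3. All estimates of the parameters $\beta_1$, $\beta_2$ are computed over compact sets $B_1 \subset \mathbb{R}^{p_1}$ and $B_2 \subset \mathbb{R}^{p_2}$.
4. The regression functions $m_1$ and $m_2$ are twice continuously differentiable with respect to the parameters for all $b_1$, $b_2$ in neighbourhoods of the true parameters $\beta_1$, $\beta_2$ and all $d \in D$. The functions $(d, b_\ell) \mapsto m_\ell(d, b_\ell)$ and their first two derivatives are continuous on $D \times B_\ell$.
5. The gradients of the regression functions with respect to the parameters are uniformly bounded.
6. For any $u > 0$ there exists a constant $v_{u,\ell} > 0$ that bounds the limit inferior of a sum of squared differences between regression functions from below; see Dette et al. (2018) for the precise formulation.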

Asymptotic properties of the OLS
In this section we derive the asymptotic normality of the parameter estimates in models with common parameters. By the definition of the OLS $\hat\beta = (\hat\beta_0, \hat{\tilde\beta}_1, \hat{\tilde\beta}_2)$ in (2.3), taking the partial derivatives shows that $\hat\beta$ satisfies the necessary first-order conditions (5.1), in which $0 = 0_{p_\ell - p}$ denotes the zero vector in $\mathbb{R}^{p_\ell - p}$. The equations in (5.1) can be summarized in a single system, from which $\hat\beta$ can be linearized, and the strong law of large numbers is then applied to the resulting sums.
Similarly, note that in the situation of a common placebo group as described in Remark 2.2 the estimates are obtained by minimizing the sum of squares in (2.13). As $m_\ell^0(0, \beta_\ell^0) = 0$, $\ell = 1, 2$, this function is the same as the one obtained by allocating the observations at placebo arbitrarily to the two groups. More precisely, the sum of squares in (2.13) can be rewritten as a sum over the two groups in which $Y_{1,1,j} = Y_{0,j}$ ($j = 1, \ldots, n_{1,1}$) and $Y_{2,1,j} = Y_{0,j}$ ($j = n_{1,1} + 1, \ldots, n_{1,1} + n_{2,1} = n_0$). This corresponds to the minimization of the sum of squares in (2.3) with a common intercept $b_{0,1}$, and consequently this situation can be treated in the same way as two different placebo groups with a common intercept $b_{0,1}$. By the same arguments as given in Section 6.2 the corresponding estimates are asymptotically normally distributed (if $n_{1,1}/n \to c_1$ and $n_{2,1}/n \to c_2$ for some constants $c_1, c_2 \in (0, 1)$), and the proof in Section 6.3 shows that Theorem 2.1 remains valid in the model with a common placebo group. As a consequence we obtain the claim in Remark 2.2.

Figure 1: Graphical illustration of Scenarios 1 and 2. Open dots indicate the doses and corresponding responses where the maximum distance to the reference curve $m_1$ (dashed line) is attained.

Figure 2: (a) Graphical illustration of the regression functions $m_1$ and $m_2$ with $\kappa = 1.5, 1.7, 2, 2.5, 3$ in Scenario 3. (b) Proportion of rejections in dependence of the true maximum absolute difference $d_\infty$ for four different tests assuming one, two, three and no shared parameters for $\varepsilon = 1$ (vertical dotted line) and $\alpha = 0.05$ (horizontal dashed line).

Table 2: RRMSE of the parameters obtained in the model estimation step of the bootstrap test (2.9) for the equivalence of two sigmoid Emax models defined in Scenario 1 with $d_\infty = 1$. The numbers in brackets show the values for the RRMSE when fixing the Hill parameter at $\beta_{0,3} = 4$.