Identifying alert concentrations using a model‐based bootstrap approach

The determination of alert concentrations, where a pre‐specified threshold of the response variable is exceeded, is an important goal of concentration–response studies. The traditional approach is based on investigating the measured concentrations and attaining statistical significance of the alert concentration by using a multiple t‐test procedure. In this paper, we propose a new model‐based method to identify alert concentrations, based on fitting a concentration–response curve and constructing a simultaneous confidence band for the difference of the response of a concentration compared to the control. In order to obtain these confidence bands, we use a bootstrap approach which can be applied to any functional form of the concentration–response curve. This particularly offers the possibility to investigate also those situations where the concentration–response relationship is not monotone and, moreover, to detect alerts at concentrations which were not measured during the study, providing a highly flexible framework for the problem at hand.


INTRODUCTION
When a response variable exceeds a pre-specified threshold, the corresponding concentration is called an 'alert' concentration. Such a threshold can, for instance, be given by 50% of the maximal effect, in which case the corresponding alert concentration is commonly defined as the EC 50 . Similarly, one can estimate the Absolute Lowest Effective Concentration (ALEC), that is, the absolute lowest effective concentration, which is given by the concentration where a fixed and pre-specified effect level is attained (Jiang, 2013).
In general, the determination of alert concentrations is a frequently addressed issue in many scientific areas as for instance in pharmacokinetics, toxicology, or in clinical research, just to name a few. A specific example is given by the concentration-gene expression dataset by Krug et al. (2013), who conducted a case study in order to assess the effect of the compound valproic acid (VPA) on the development of human embryonic stem cells (hESC). VPA was applied to cells in several concentrations, and the expression values for 54,675 probe sets were measured. Here, we are interested in finding the alert concentration corresponding to a pre-specified threshold for each probe set individually. Traditionally, alert concentrations such as the 'LOEC', the lowest observed effect concentration, are determined observation-based by identifying the lowest concentration, where the mean difference over all replicates between concentration and control significantly exceeds a given threshold (see, e.g., Delignette-Muller et al. 2011;Hothorn 2014). However, these observationbased approaches are criticized for their dependence on the chosen design (see, e.g., Murado and Prieto 2013). An alternative to this procedure is given by a model-based approach, where a concentration-response curve, chosen from a broad variety from different models, is fitted to the data (see, e.g., Ritz 2010). In contrast to the observationbased determination of alert concentrations, this method provides the possibility to detect an alert also at a concentration which was not explicitly measured in the study. In general, the importance of model-based statistical inference has considerably increased over the last years (see Kluxen and Jensen 2021 for a recent overview and comparison of methods) and can be particularly a huge advantage in case of incomplete data.
In this paper, we focus on determining the lowest concentration where the effect-level is significantly increased, which is denoted by the lowest effective concentration (LEC) and investigated in a recent publication by Kappenberg et al. (2021). Precisely, the authors propose a model-based approach for identifying statistically significant alert concentrations (LEC values) by fitting a loglogistic ('4pLL') model with four parameters to the data. Their procedure relies on the monotonic behavior of this type of model and the test statistic is based on the change of response compared to the control and the corresponding variance of this change. An estimate for this variance is obtained by using the delta method (Oehlert, 1992), which is a commonly used technique relying on asymptotic properties of the model estimates. Kappenberg et al. (2021) compared their method to a standard approach based on multiple t-tests and demonstrate that it can detect alert concentrations reasonably well with only few false positives and, in particular, performs better than the classical t-test approach.
However, the method is limited to just one specific functional form, the 4pLL model, requiring particularly a monotone concentration-response relationship. For many genes, this assumption is not realistic, as demonstrated by Duda et al. (2022), who used the multiple comparison procedure and modeling (MCP-Mod) methodology, introduced by Bretz et al. (2005), as a model selection approach for dose-response gene expression data. This procedure, originally aimed at analyzing Phase II clinical studies, is a multistage procedure which combines an MCP-Mod step. Duda et al. (2022) could identify a substantial number of genes showing non-monotone relationships, which can, for example, occur due to cytotoxic effects, overlaying the gene expression response values especially for high concentrations of the considered compound.
To overcome this issue, we propose a new model-based approach for the determination of alert concentrations, offering a much more flexible framework. In general, every functional form can be used to model the concentrationresponse relationship and the statistical test is constructed such that it particularly does not rely on any monotonicity assumption of the model. This is achieved by formulating the hypotheses in terms of the absolute difference between a response at a certain concentration compared to the control, which formally constitutes a setting of relevance testing, or, also called testing of precise hypotheses (see Berger and Delampady 1987). Here, rejecting the null hypothesis means detecting a relevant difference between responses. Interestingly, by interchanging the hypotheses one considers a test for equivalence, a frequently addressed issue in (bio)statistics, where a lot of model-based research has been done over the last years (see, e.g., Dette et al. (2018) for testing equivalence of regression curves).
In the present situation of relevance testing, the null hypothesis of no relevant difference is rejected for a large difference of a certain response compared to the control, given by the difference function evaluated at that point. More precisely, the test decision is based on the construction of a (lower) simultaneous confidence band of the difference function, which requires an estimate for the variance of the difference between responses at different concentrations. In order to avoid elaborate (model-specific) calculations, we do not use the commonly used delta method to obtain this estimate but propose a parametric bootstrap approach, yielding the advantage that it is independent from asymptotic inference and hence a promising tool also in case of low sample sizes or high variances.
In general, the use of bootstrap techniques for the derivation of confidence bands has become an important tool during the past. For instance, Mandel and Betensky (2008) presented an algorithm for the construction of simultaneous confidence bands based on the percentile bootstrap approach, while Hall and Horowitz (2013) proposed a bootstrap method to construct nonparametric confidence bands for functions. Further, Gsteiger et al. (2011) derived simultaneous confidence bands for nonlinear regression models both approximation-based and simulation-based and demonstrate the superiority of the latter. This paper is structured as follows. First, the new methodology including the model, the test procedure, and the calculation of simultaneous bootstrap confidence bands will be introduced. Second, the performance of the method will be investigated extensively by means of a simulation study. Finally, the method is applied to a real dataset investigating the effect of different concentrations of the compound VPA on the development of hESC to neuroectoderm (Krug et al., 2013).

The model
In a first step, we fit a concentration-response model to the data, which contains observations in total, measurements at different concentration levels , respectively ( = 1, … , ). The regression model is given by where describes the response as a function of the concentration. We assume that the error terms , are independently and identically normally distributed, that is , ∼  (0, 2 ). Note that the procedure works similarly with any other assumption on the distribution. Furthermore, it holds = ∑ =1 . We obtain an estimate for the parameter by ordinary least squares (OLS) estimation by minimizing the least-squares function and consequentlyˆis given byˆ= arg min ∈ ( ), where ⊂ ℝ is a compact set and denotes the number of parameters of the model. Under certain regularity conditions, the OLS estimateˆis known to be consistent and asymptotically normally distributed (see Jennrich 1969 for details).

2.2
The test procedure We want to detect an alert concentration, that is, the minimum concentration, where the difference of the response to the response for the control is significantly larger than a pre-specified threshold . Hence, we consider the absolute difference |Δ( , )| ∶= | ( , ) − (0, )| and test the hypotheses In this framework, rejecting 0 means detecting a significant difference between a response of a certain concentration and the response for the control, that is, identifying an alert concentration. For non-monotone functions, this can happen several times. Thus, we define that the smallest concentration , where 0 is rejected yields the LEC. Formally, the hypotheses in Equation (3) constitute a framework of relevance testing, or, also called testing of precise hypotheses (see Berger and Delampady 1987), where rejecting the null hypothesis means detecting a relevant difference.
The construction of the test is based on deriving simultaneous confidence bands. Precisely, let ( ,ˆ) denote a lower simultaneous confidence band of |Δ( , )| with confidence level (1 − ), that is We now reject 0 (i.e., we observe an alert concentration at a particular ∈ ) if meaning that the lower confidence band of the absolute difference exceeds the pre-specified effect level . As mentioned above, it can happen that there is more than one intersection of ( ,ˆ) and -this is the case if Δ( ,ˆ) is a non-monotone function and this, respectively, is the case if the concentration-response curve ( , ) is non-monotone. Consequently, the concentration where ( ,ˆ) exceeds for the first time yields the LEC, that is, LEC = min ∈ ( ,ˆ) ≥ . The test proposed in Equation (5) is an -level test as If the confidence band is too wide, or, in other words, conservative, the proportion of rejections under the null hypotheses will be close to zero and in particular much smaller than , meaning that the test procedure is conservative as well. On the other hand, too narrow confidence bands, not reaching the desired coverage probability of at least 1 − , result in an inflated type I error. Hence, we aim for finding simultaneous confidence bands as precise as possible to get a valid test procedure.

2.3
Simultaneous confidence bands based on a parametric bootstrap As described in Section 2.2 the performance of the test described in Equation (5) strongly depends on the deriva-tion of a simultaneous confidence band which approximates the desired coverage probability as closely as possible. Therefore, we search for a critical value such that is a simultaneous lower (1 − )-confidence band for Δ( , ). Precisely, should fulfill We now aim for estimating botĥ| Δ( ,ˆ)| and the distribution of the maximum in Equation (9) in order to obtain the critical value . In some situations, the standard error̂| Δ( ,ˆ)| could be derived by for example using the delta method. However, this is not always possible and comes along with the cost of intensive calculations, relying on the chosen functional form of the regression model. Furthermore, the performance would heavily rely on the asymptotic normality of the statistic which might not work out well in this situation as the test statistic is given by the absolute difference of responses and is hence possibly skewed. Thus, we propose an alternative based on a parametric bootstrap to keep the approach as flexible as possible with regard to the functional form. As the standard error of |Δ( ,ˆ)| is in general unknown and no formula is available, it has to be estimated as well and therefore two levels of bootstrap are required, resulting in a nested bootstrap approach. Algorithm 1 describes this procedure and yields both the critical value and an estimate for the standard error |Δ( ,ˆ)| such that the confidence band in Equation (7) can be constructed. Figure 1 displays the corresponding flowchart.
Repeating this for all 1 datasets yields replicates |Δ( ,ˆ * 1 )|, … , |Δ( ,ˆ * 1 )| which can be interpreted as the first level of bootstrap. The standard error of the sample |Δ( ,ˆ * 1 )|, … , |Δ( ,ˆ * 1 )| yields an estimate of̂| Δ( ,ˆ)| in Equation (7). • Step 4: In order to estimate the standard error of each bootstrap samplê| Δ( ,ˆ * )| , = 1, … , 1 , a second level of bootstrap sampling is required. Therefore, we generate new bootstrap data for each of the 1 bootstrap samples by usingˆ * and̂2 * , = 1, … , 1 , according to * * , = ( ,ˆ * ) +̂ * * * , = 1, … , = 1, … , , with independent and normally distributed errors * * ∼  (0, 1). From the data, we again estimate model parameters and the corresponding test statistic |Δ( ,ˆ * * )|, where we now omit the indices due to a better readability. Repeating this step 2 times yields 2 replicates of |Δ( ,ˆ * )| for each of the original 1 bootstrap samples. From these 2 values, we get an estimate of the standard error̂| Δ( ,ˆ * )| , = 1, … , 1 . • Step 5: For each of the 1 replicates from the first level of bootstrap ( Step 3) we calculate * ∶= max ∈ approximates the distribution of the maximum in Equation (9). Denoting by * (1) , … , * ( 1 ) the order statistic and taking the (1 − )-quantile of this distribution yields the (approximate) critical value c, that is, estimates is no longer guaranteed (see, e.g., Chakraborty et al. 2013 for a discussion). However, for the construction of simultaneous confidence bands this only becomes relevant if Δ( , ) = ( , ) − (0, ) = 0 for all ∈ , that is ( , ) is a constant function. As this situation is of no practical interest, we assume to be a non-constant function throughout this paper. Performing a nested bootstrap increases the computational effort as the procedure requires the generation of 1 ⋅ 2 bootstrap datasets. According to Efron and Tibshirani (1994), the number 1 for approximating the quantile should be large, while 2 to estimate the standard errors can be chosen rather small. Numerical details on that can F I G U R E 1 Flowchart of Algorithm 1 illustrating the procedure of the nested bootstrap consisting of a first-level bootstrap, where the estimators |Δ( ,ˆ * )| are calculated and of a second-level bootstrap, where the variances of these estimators are determined be found in Section 3. Note that the distribution of the bootstrap error terms in Steps 2 and 4 of Algorithm 1 corresponds to the assumed distribution of the error terms of model (1). Consequently, the algorithm can still be used in case of non-normally distributed error terms by changing the corresponding distributions within the bootstrap steps.

Settings and scenarios
For the controlled simulation study, we choose six scenarios for the underlying true concentration-gene expression relationship. The scenarios are based on two model functions: first, we consider the monotone sigmoidal Emax (sigEmax) model, which is defined as where 0 denotes the base effect, max the asymptotic maximal effect, ℎ is the slope parameter, and the ED 50 corresponds to the concentration, where half of the maximal effect is observed. The sigEmax model is an equivalent parameterization of the commonly used four-parameter log-logistic model (see, e.g., Ritz 2010). The second model is the non-monotone Beta model, which is defined as with ( 1 , 2 ) = ( 1 + 2 ) ( 1 + 2 ) ∕( 1 1 2 2 ). 0 is again the base effect, max denotes the maximum effect within the range of considered concentrations, and is a fixed scaling parameter, which, in the scenarios considered here, takes the fixed value = 1200 (see Bornkamp et al. 2021).
The chosen scenarios are visually shown in Figure 2. The first three scenarios exhibit monotone profiles based on the sigEmax model and are chosen as in Kappenberg et al. (2021). Scenarios IV-VI are chosen to exhibit nonmonotone profiles and are based on a Beta model. The motivation behind the general shape of the curve is to have generally monotone responses to increasing concentrations, but with potentially overlaying cytotoxic effects for higher concentrations, leading to a decrease in the response. We chose a critical effect level = log 2 (1.5) ≈ 0.585 to represent a foldchange (i.e., difference in mean gene expression) of 1.5. The specific parameter values for the six scenarios are chosen as follows: • In Scenario I, the true parameters of the underlying sigEmax model are 0 = 0, max = 0.58, ℎ = 6, ED 50 = 450. The curve never exceeds the threshold = 0.585, such that the true ALEC cannot be calculated. This scenario corresponds to the null situation. • In Scenario II, the true parameters of the underlying sigEmax model are 0 = 0, max = 4, ℎ = 3, ED 50 = 900. Here, the curve exceeds the threshold, such that the true ALEC attains the value 500. • In Scenario III, the true parameters of the underlying sigEmax model are 0 = 0, max = 1.16, ℎ = 3, ED 50 = 400. The parameter ED corresponds to the true ALEC here. • In Scenario IV, the true parameters of the underlying Beta model are 0 = 0, max = 0.58, 1 = 2.5, 2 = 1. The curve never exceeds the threshold = 0.585, such that the true ALEC cannot be calculated, and this scenario also corresponds to the null situation. • In Scenario V, the true parameters of the underlying Beta model are 0 = 0, max = 1.5, 1 = 4, 2 = 1.7. The curve clearly exceeds the threshold, and the true ALEC attains the value 500. • In Scenario VI, the true parameters of the underlying Beta model are 0 = 0, max = 1.2, 1 = 3.1, 2 = 2.2. Due to its strongly pronounced non-monotonic shape, the curve intersects the threshold twice. The minimum of both intersections is used as true ALEC, which thus attains the value 400.
The setup of the simulation study is based on the exemplary case study (see Section 4) and the simulation study in Kappenberg et al. (2021). In brief, we simulate three replicates for each of the concentrations from the case study (0,25,150,350,450,550,800, 1000 M) based on the true parameters from the six scenarios explained before. These are based on a normal distribution with the mean ( , ), where is the sigEmax model or the Beta model, with being the corresponding vector of true parameters. We chose the standard deviation in dependence of the range covered by the respective scenario. The 'medium' values are based on a linear model between the covered ranges and median standard value in the real-data case study, and take values ( We repeated the simulation procedure 1,000 times for each scenario and each variance and estimated LECs based on the newly proposed bootstrap approach. In order to keep the computational effort at a reasonable level, we follow Efron and Tibshirani (1994) and choose 1 = 500 F I G U R E 3 The true underlying curve (in black) and the calculated lower simultaneous (1− )-confidence bands (in grey) for three different scenarios with two choices of variances each (medium and large). The black dotted line indicates the pre-specified effect level = 0.585 repetitions for the first level of the bootstrap and 2 = 25 repetitions for the second level, respectively. Further, we constructed a grid dividing the concentration range from = 1 up to = 1000 into 1000 equal parts in order to find the intersection of a confidence band and the effect level . The estimation of the models is performed using the DoseFinding package (Bornkamp et al., 2021) in R (R Core Team, 2021). With these configurations, the computation time for one dataset using an Intel Core i7 CPU with 32 GB RAM is approximately 25 s. Of note, although these configurations already provide a very high level of accuracy, they can easily be adapted.

Results
Figure 3 visualizes the test principle described in Section 2.2 for the two scenarios under the null hypothesis (Scenarios I and IV) and one scenario under the alternative, that is, Scenario V. For each dataset, a lower simultaneous 95%-confidence band has been calculated according to Algorithm 1, which is then compared to the pre-specified effect level = 0.585. In general, the confidence bands become wider with increasing variance. to the margin of the null hypothesis in Equation (3) as the maximum value of the curve approaches, but never exceeds the threshold , such that no true ALEC value can be calculated. For both scenarios and all configurations of the variability, the total number of alerts is reasonably small and well below the significance level of = 0.05, meaning 50 alerts. The median values of the identified LECs are mostly above 800 as confidence bands and the threshold only intersect at very large concentrations and in only few cases (see also the first two columns of Figure 3). The standard deviation of the LECs increases for larger variances, as well as the width of the confidence bands, resulting in a higher number of alerts. For instance, in Scenario I there are 4 alerts in total for a small variance, but 21 for the large variance setting. Of note, the standard deviation for the small variance setting is with 112.14 larger than for the setting with medium variance (110.3), which results from the fact that it is calculated from only 4 values.
In Table 2, we display the power of the test, as Scenarios II, III, V, and VI correspond to the alternative in Equation (3). It turns out that for most of the configurations the total number of alerts is very large and close to 1, 000. However, the performance of the test is a bit less precise for Scenarios III and VI as the true underlying curves are closer to the threshold . In particular, this becomes visible for a high variability, where only 429 alerts are detected in Scenario III and 476 in Scenario VI, respectively. However, this effect quickly disappears when the variance decreases. Regarding the accuracy of the estimation of the LECs, both scenarios show a similar performance. Precisely, Figure 4 shows a good overview of the distribution of the LECs for all scenarios and medium variance. It turns out that the obtained values are similarly close to the true value for all scenarios under the alternative. For instance, in Scenario II the median is given by 613 (true value is 500), whereas in Scenario III by 538, respectively (true value is 400). In general, the median values are closer to the true values for small variances (see Table 2). Moreover, we investigated the performance of the test principle in the case of misspecification of the parametric curve in an additional simulation study. More precisely, we used the Sigmoid Emax model both for the model fit and the test principle, whereas the actual data were based on the Beta model. It turns out that the test procedure keeps its significance value even in this situation. Furthermore, the procedure shows a tendency to slightly overestimate the alert concentration under the alternative in case of misspecification. A more differentiated presentation of these results can be found in the Supporting Information of this paper.
Finally, we compare the method presented here to the test described in Kappenberg et al. (2021) considering the first three scenarios. Under the null hypothesis, that is, Scenario I, our new method detects fewer alerts, that is, it is slightly more conservative, but, on the other hand, obtains larger median values of the LECs. This means that alert concentrations are detected at a higher concentration level, for instance at 814 compared to 768.3. Summarizing, both tests control their nominal level well. Comparing the situations under alternative, that is, Scenarios II and III, it turns out that the results are very similar, except for Scenario III with large variance. There the method of Kappenberg et al. (2021) identifies 607 alert concentrations, whereas our proposed test obtains only 429. This means that there is a small benefit in power compared to our approach. Further, there is some small difference in the median of the LECs (543.6 compared to 596, standard deviation 147 to 91.6). For scenario II, the results of the two tests are very similar, albeit the median LEC obtained by Kappenberg et al. (2021) is again slightly closer to the true value. Summarizing we note that the procedure proposed by Kappenberg et al. (2021) shows a slight power benefit compared to our method, but comes at the cost of a limited flexibility due to the restriction to monotone concentration-response curves. Consequently, a direct application of this test to Scenarios IV-VI is not possible.

CASE STUDY
In order to assess the effect of the compound VPA on the development of human embryonic stem cells, a case study was conducted. In this study, gene expression values for seven increasing concentrations of VPA (25, 150, 350, 450, 550, 800, 100 M) were measured in an in vitro experiment (Krug et al., 2013). Each concentration was measured in three replicates, and additionally, six replicates for the untreated control are available, leading to 27 samples in total, with measurements for 54,675 probe sets in total. Data are pre-processed using the robust multi-array analysis algorithm (Irizarry et al., 2003), which includes the three steps background correction, normalization, and summarizing the data to one value. Parameter values for pre-processing are chosen as in the original study in Krug et al. (2013). For pre-selection of the probe sets, we applied analysis of variance to each probe set individually. Only those probe sets which resulted in an unadjusted p-values smaller than 0.001 were kept for further analyses, thus we retained 9,460 out of our initial 54,675 probe sets. For each remaining probe set, the Akaike information criterion (AIC) is calculated both for the sigEmax and the Beta models, which are introduced in Section 3.1, respectively. For 4,887 probe sets, the sigEmax model results in a smaller AIC and thus in a better model fit; for 4,573 probe sets, the Beta model results in a smaller AIC. For further analysis, we first applied the bootstrap methodology based on both models to the entire set of 9,460 probe sets. Analogously to the simulation study, we chose a value of = log 2 (1.5) ≈ 0.585 for the threshold that needs to be significantly exceeded. For the sigEmax model, an LEC can be calculated for 4,416 probe sets, and for the Beta model, calculation of the LEC is possible for 4,967 probe

F I G U R E 6
Example profiles of nine chosen probe sets from the case study dataset. For each probe set, the individual data points (grey) and the concentration-wise mean (black) are indicated. The fitted sigEmax and the fitted Beta model are shown (solid line) together with the lower confidence limit based on the bootstrap procedure (dashed line). The value of the left-sided asymptote and the threshold that needs to be significantly exceeded are indicated by thin horizontal lines. If present, the lowest effective concentration (LEC) is indicated by a thin vertical line. The model with the lower value of the Akaike information criterion (AIC) is highlighted by a grey background, respectively the respective model assumptions in terms of the resulting LEC, since the underlying truth is not known. Instead, we focus on the analysis of some example probe sets, whose profiles and respective model fits are shown in more detail in Figure 6. Note that, with the exception of the last shown probe set, all profiles exhibit a generally increasing trend. We chose the probe sets from the case study to underline the resemblance to the scenarios from our simulation study, however, the procedure works analogously for decreasing profiles, which are also present in high numbers in the case study dataset.
The concentration-gene expression profiles of nine chosen probe sets, together with the corresponding fitted sigEmax and Beta model are shown in Figure 6 and the respective limits of the bootstrap-based confidence intervals. The value of the left-sided asymptote of the fitted model and the threshold that needs to be significantly exceeded are indicated by thin horizontal lines. If present, the LEC is indicated by a thin vertical line and the model selected by the AIC is highlighted by a grey background, respectively.
We first consider the probe sets where the sigEmax model leads to a better model fit according to the AIC. This situation occurs for four probe sets under consideration, which are now analyzed in detail. For the first probe set displayed in the upper row of Figure 6, no LEC can be detected if the sigEmax model is used. If the Beta model is used instead (regardless of the model selection step), the LEC can be calculated to be 813. Here, the lower confidence limit, due to the non-monotone shape, clearly exceeds the threshold for the Beta model, whereas it remains slightly below the threshold for the sigEmax model. For the second probe set in the upper row, the situation is reversed: Here, the sigEmax model does lead to a valid LEC, which takes the relatively high value of 943. No LEC can be detected for the Beta model, although no notable difference in the fit-ted models can be seen. The first and the second displayed probe sets in the middle row are chosen as representative examples for probe sets with a clearly pronounced sigmoidal shape, thus resembling Scenario III of our simulation study. Consequently, the SigEmax model leads to a lower value of the AIC in these cases. Focusing on the LECs, we observe LECs of 511 and 607 for these probe sets if the sigEmax model is used. If the bootstrap methodology is based on the Beta model instead, we obtain similar LECs, given by 498 and 644, respectively. For the remaining five probe sets, the Beta model is selected due to its smaller AIC value. In the situation of the right probe set displayed in the upper row of Figure 6 the threshold is clearly exceeded by the fitted Beta model and the lower confidence limit, resulting in an LEC of 628. Note that the SigEmax model results in a similar fit of the data and therefore in a similar LEC of 628 in this case. The third probe set in the middle row shows a profile that is clearly non-monotone, resembling Scenario V of our simulation study. Here, the selected Beta model results in an LEC of 564, if the bootstrap procedure is applied. If the sigEmax model is fitted instead, a similar LEC of 579 is obtained. Generally, the fitted Beta model fits the actual datapoints more accurately than the sigEmax model, which is by definition not able to capture non-monotone shapes. The difference between the two models in a non-monotone case becomes even more clear for the first probe set in the bottom row: this probe set exhibits a strongly non-monotone profile, where the fitted Beta model crosses the threshold twice, as it is the case in Scenario VI from our simulation study. The LEC for the Beta model is 559. However, if the sigEmax model is used despite the non-monotone shape of the data, no LEC can be calculated, as the lower confidence limit is clearly below the threshold. The second probe set in the bottom row displays a situation in which both functional forms lead to an LEC, but the difference between these LECs is large. In particular, the Beta model leads to an LEC of 625, while the sigEmax model leads to a far greater value of 780 and the lower confidence limit only barely crosses the threshold. Finally, the third probe set in the bottom row is chosen to represent an example of the set of probe sets that are in the lower right corner of the display in Figure 5. For these probe sets, the non-monotonicity especially for lower concentrations is so strongly pronounced, that for the Beta model selected by AIC, a strong increase is followed by a decrease, while the sigEmax model results in a generally decreasing profile. This leads to far smaller estimates of the LEC for the Beta model (128 for the shown probe set) compared to the sigEmax model (833). Summarizing, the LECs of the sigEmx model and of the Beta model are similar in the cases, where the sigEmax model is selected by AIC. If the data are of non-monotone shape, the Beta model fits the data more accurately than the sigEmax model, a fact that is also detected by the AIC. In these situations, the proposed bootstrap methodology can be used for the identification of alert concentrations, while the procedure proposed by Kappenberg et al. (2021) is not applicable due to its limitation to monotone curves.

DISCUSSION AND CONCLUSION
The determination of an alert concentration where a prespecified effect level of the response variable is exceeded is a frequently addressed issue in concentration-response studies. Traditionally, this problem is assessed observationbased by identifying the concentration directly from the measured concentration and corresponding responses from the study. In order to demonstrate that this threshold is even statistically significantly exceeded a method based on t-tests, for exampl, Dunnett's procedure (Dunnett, 1955), is used. A big disadvantage of this approach is that concentrations which have not directly been observed in the study cannot be detected. Therefore, we propose to fit a concentration-response model to the data and perform a test procedure based on the difference of a response at a certain concentration to the control. An alert concentration, that is, the LEC, is identified by the smallest concentration, where the lower simultaneous confidence bound exceeds the pre-specified effect level. We calculate this lower confidence bound by a parametric bootstrap approach which enables us to estimate the standard error of the difference function and their quantiles. This procedure works irrespectively of the chosen functional form of the model. Further, by using this model-based approach all concentrations are considered as possible alert concentrations resulting in a much more precise estimation.
In contrast to classical bootstrap theory, where new data are generated by resampling from the original dataset, the approach presented in this paper is based on the estimation of the model parameters, which are then used to generate bootstrap data. Usually this parametric bootstrap is preferred over a non-parametric if the problem at hand is a parametric one, as the approximation tends to be more precise for a correctly specified model (see Chapter 21 of Efron and Tibshirani 1994). Consequently, as the framework in this paper is defined by estimating concentration curves, the parametric bootstrap fits exactly into this context.
A model-based method has already been proposed by Kappenberg et al. (2021), who fit a 4pLL model to the data and calculate a test statistic which is asymptotically normally distributed. In our simulation study, we observed that this procedure performs in some scenarios slightly better, that is, determines the LEC more precisely. However, the method has been applied only for the 4pLL model so far and relies on the assumption of monotonicity of the chosen model. Consequently, it cannot be applied to, for example, the Beta model or any other non-monotone concentration-response curve occurring, for example, in situations of overlaying cytotoxic effects. Further, it relies on asymptotic inference due to the derivation of the variance using the delta method, making it more difficult to adopt the method to other models. Therefore, we think that the small loss of power is worth the highly increased flexibility of our new approach.
We demonstrated in our simulation study that the test proposed in this paper controls its level and that the estimated LEC values are reasonably close to the underlying true values. In the real-data example, we could particularly demonstrate the need of a flexible method by presenting gene expression values of numerous genes showing very different concentration-response relationships. Finally, we could identify a large number of genes where a 4pLL model/sigEmax model is not feasible based on the AIC and, moreover, not able to detect an alert concentration.
By fitting the concentration-response curve to the data the method proposed in this paper relies on a parametric assumption and, in particular, on the chosen functional form. While we assumed the parametric form of the concentration-response curve to be known in this paper, we will consider the performance of our method, if the step of model selection and model averaging is included, in future. More precisely, an interesting aspect of further research will be the extension of our approach to model averaging estimators, as Schorning et al. (2016) showed that the usage of model averaging methods in the context of concentration-response curves results in more accurate inference than the usage of one single parametric model based on model selection.
A further assumption made here is the homogeneity of the response values per probe set. This follows the suggested modeling approach from Lin et al. (2012), who also make this assumption. Typically, pre-processing of gene expression data, for example, via the robust multi-array analysis algorithm (Irizarry et al., 2003), comprises a logtransformation of the response values, thus avoiding any dependency between the variability and the mean of the response values.
Another aspect not addressed in this paper is the influence of the design, that is, the set of considered concentrations, on the performance of the bootstrap test. In the context of equivalence testing  could improve the test's performance substantially by using designs that minimize the width of the corresponding confidence bands and we expect to achieve a similar improvement of the test proposed in this paper, if the design is chosen carefully.

A C K N O W L E D G M E N T S
This work has been supported by the Research Training Group "Biostatistical Methods for High-Dimensional Data in Toxicology" (RTG 2624, P7) funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, Project Number 427806116). The authors would like to thank two referees and the Associate Editor for their constructive comments on an earlier version of this paper.

O P E N R E S E A R C H B A D G E S
This article has earned an Open Materials badge for making publicly available the components of the research methodology needed to reproduce the reported procedure and analysis. All materials are available at http://re3data. org/.

D ATA AVA I L A B I L I T Y S TAT E M E N T
The data sets and the code used for the simulation studies are available at https://github.com/kathrinmoellenhoff/ alert_concentrations.