Model selection characteristics when using MCP‐Mod for dose–response gene expression data

We extend the scope of application of MCP‐Mod (Multiple Comparison Procedure and Modeling) to in vitro gene expression data and assess its model selection characteristics for concentration gene expression curves. Specifically, we apply MCP‐Mod to single genes of a high‐dimensional gene expression data set in which human embryonic stem cells were exposed to eight concentration levels of the compound valproic acid (VPA). As candidate models we consider the sigmoid Emax (four‐parameter log‐logistic), linear, quadratic, Emax, exponential, and beta model. Through simulations we investigate the impact of omitting one or more models from the candidate model set to uncover possibly superfluous models and to evaluate the precision and recall rates of selected models. Each model is selected according to the Akaike information criterion (AIC) for a considerable number of genes. For less noisy cases the popular sigmoid Emax model is frequently selected. For noisier data, simpler models such as the linear model are often selected, but mostly without a relevant performance advantage compared to the second‐best model. Also, the commonly used standard Emax model shows an unexpectedly low performance.

Valproic acid (VPA) is used for treating epilepsy, but it is known to be embryotoxic when taken in the first trimester of pregnancy (Genton et al., 2006). The MCP-Mod framework can help to gain insights into the concentration-response relationship between the concentration of VPA and gene activity.
Specifically, we are interested in two aspects of MCP-Mod when applied to concentration-response data: the detection of genes where VPA has an effect on the dose-response curve (power) and model selection. We investigate these properties in analyses of real and of simulated data. Further, the model performance or goodness-of-fit of the selected models is evaluated to identify which models are suitable for gene expression dose-response data.
Model selection and model performance differ substantially in the underlying theory. In model selection, a statistical model has to be selected from a set of candidate models, given a data set. The aim is to select the model that best represents the true, unknown model function (Chap. 1 of Claeskens & Hjort, 2008; Schorning et al., 2016). In addition to selecting the best model among the candidates, we also aim at identifying necessary or dispensable models. Therefore, we use the goodness-of-fit measure R²_adj. We combine the three aspects power, model selection, and goodness-of-fit in a newly proposed score that summarizes the suitability of a model set. This approach is applied to the VPA data set and to simulated data.
In the context of clinical Phase II trials, model uncertainty for dose-response modeling is considered to increase precision in target dose estimation; see Ting (2006), Wheeler and Bailer (2009), and Bornkamp et al. (2011), among many others. In Phase II trials, decisions on the model set can be based on expert knowledge and concentrate on a single compound and dose-response relationship. For gene expression data, model selection must be considered for thousands of genes simultaneously, and it is not straightforward to find or use prior knowledge on the dose-response profile of each gene. House et al. (2017) and Filer et al. (2016) propose experimental pipelines that include concentration-response modeling and model selection for toxicological gene expression data. They consider a flat model, the sigmoid Emax model with all four parameters or with the lower asymptote fixed to zero, and a gain-loss model that is similar to the beta model considered here. However, detailed investigations on the necessity of model selection and on the appropriateness of candidate model sets for gene expression concentration-response data are lacking, which motivates our work. This paper is structured as follows. The VPA data set is introduced in Section 2. The statistical methodology, including MCP-Mod and both established performance measures and a newly proposed one, is presented in Section 3. Our analysis procedures and results based on the VPA data set are presented in Section 4. Different controlled simulation setups and corresponding results follow in Section 5. Final conclusions are summarized in Section 6. Source code to reproduce the results is available as Supporting Information on the journal's web page (http://onlinelibrary.wiley.com/doi/xxx/suppinfo).

GENE EXPRESSION DATA SET
The data set was first presented in the study of Krug et al. (2013), where VPA is applied, among other compounds, to human embryonic stem cells (hESC). VPA is widely used to treat different forms of epilepsy. However, it is linked to an increased incidence of congenital abnormalities (Cotariu & Zaidman, 1991). Krug et al. (2013) state that identifying changes in the transcriptome induced by toxic substances provides interesting mechanistic insights. Gene expression levels of the hESCs are measured repeatedly for different concentrations, using the GeneChip® Human Genome U133 Plus 2.0 array. The data are preprocessed with the Robust Multi-Chip Average algorithm by Irizarry et al. (2003), such that the expression data are on the logarithmic scale with base 2.
The data set contains G = 54,675 probe sets, which will be referred to as genes in the following, for simplicity. For every gene, expression values corresponding to the concentrations c_1 = 0, c_2 = 25, c_3 = 150, c_4 = 350, c_5 = 450, c_6 = 550, c_7 = 800, and c_8 = 1000 μM VPA are available. For the control level c_1, n_1 = 6 replicates were measured. For all other concentrations there are n_2 = ⋯ = n_8 = 3 replicates. In total, there are N = 27 measurements per gene. The replicates are biological replicates, since different cells were used for each experiment. Due to functional relationships between genes, we cannot assume independence between the measurements from different genes. Further, six or three replicates per concentration is small for statistical modeling approaches. These problems are addressed in Section 4.

MCP-MOD METHODOLOGY AND PERFORMANCE MEASURES
In this section, the methodology is presented. First, the MCP-Mod approach is outlined. Then, the performance measures precision and recall for evaluating the model selection in MCP-Mod are explained. Additionally, the newly proposed suitability score S_M is presented.

MCP-Mod
The MCP-Mod approach was originally developed by Bretz et al. (2005) to model dose-response relationships in Phase II clinical trials under model uncertainty. For details see also Xun and Bretz (2017) and Bornkamp et al. (2009). The MCP-Mod methodology comprises two analysis steps. First, in the MCP-step, a statistically significant signal in a gene is determined by an optimal-contrast test for a prespecified set of candidate models. If such a signal is found for at least one model, a significant result of the multiple comparison procedure (signifMCP) is present for the gene. This means that an effect of VPA on the gene activity is established. The second step, Mod, refers to the modeling. From the set of models, for which a signifMCP has been established, one model fit is chosen and used as final fit for the data. Alternatively, model averaging can be performed.
Denote c_1 as the placebo concentration and c_2 < ⋯ < c_k as the active concentrations with n_i replicates each. For concentration i = 1, …, k and replicate j = 1, …, n_i, with N = n_1 + ⋯ + n_k, the (preprocessed) expression values are modeled as y_ij = f(c_i, θ) + ε_ij with independent errors ε_ij and homogeneous variance σ² > 0. The mean response E(y_ij) = μ_i = f(c_i, θ) at concentration c_i is assumed to follow a concentration-response model with parameter vector θ. For the MCP step, a set M of candidate models needs to be prespecified. Models commonly used for dose-response relationships are summarized in Table 1.
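To make the candidate model formulation concrete, the following sketch evaluates the four-parameter sigmoid Emax model at the eight VPA concentration levels. The function signature and the parameter values are purely illustrative (not fitted to the data and not part of the DoseFinding package):

```python
import numpy as np

def sigmoid_emax(c, e0, emax, ec50, h):
    """Four-parameter sigmoid Emax (log-logistic) model:
    f(c) = E0 + Emax * c^h / (c^h + EC50^h)."""
    c = np.asarray(c, dtype=float)
    return e0 + emax * c**h / (c**h + ec50**h)

# Illustrative (not fitted) parameters at the eight VPA concentrations:
conc = np.array([0.0, 25.0, 150.0, 350.0, 450.0, 550.0, 800.0, 1000.0])
resp = sigmoid_emax(conc, e0=7.0, emax=1.0, ec50=450.0, h=5.0)
```

At c = EC50 the response is exactly halfway between the lower asymptote E0 and E0 + Emax, which is the defining property of the EC50 parameter.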
All models summarized in the first column of Table 1 can be reformulated as f(c, θ) = θ_0 + θ_1 f⁰(c, θ⁰) (see second column of Table 1), where f⁰(c, θ⁰) is the standardized version of a model. The introduction of the standardized model shape concept is crucial for choosing optimal contrasts in the MCP step, as their choice is scale invariant. It remains to determine initial guesses for the parameter θ⁰. In practice, for a Phase II study, careful considerations and prior knowledge on expected percentages of maximal effects at certain doses are translated into guesstimates for θ⁰.
Here, the large number of genes makes individual, gene-dependent decisions on θ⁰ difficult. We therefore use the same guesstimates for all genes. Figure 1 displays the (rescaled) model shapes f⁰(c, θ⁰) used for the analysis. The guesstimates are listed in Table 1.
To the best of our knowledge, there is little preliminary work on dose-response model selection in the context of gene expression data (Filer et al., 2016; House et al., 2017). In toxicology, monotone dose-response relationships are often assumed. In particular, the Emax model, a special case of the sigmoid Emax model with Hill parameter h = 1, was found to be appropriate for the majority of dose-response relationships in a large meta-analysis of clinical dose-response studies (Thomas et al., 2014). The inclusion of these two monotonic models in the candidate model set is therefore obligatory. The linear model is added as a reference or baseline model. For genes where the true underlying model might be a sigmoid Emax model whose turning point has not yet been reached at the maximal considered dose, the exponential model might be more suitable. The quadratic and the beta model are included as nonmonotone shapes. They are similar to the gain-loss model used by Filer et al. (2016). There might be a nonmonotone relationship between concentration and gene activity, for example, for metabolic genes. Such genes might be activated at lower VPA concentrations but successively deactivated at increasing, highly toxic concentrations. The specific guesstimates for θ⁰ for each model are chosen such that a wide range of dose-response shapes is covered.
Further, we consider that during the experimental design stage, the concentrations were chosen with the expectation that the dose with the half-maximal effect (EC50) is close to 450 and a plateau is reached at concentration 1000, which translates into the second guess that 95% of the maximal effect is reached at concentration 800. With these two assumptions (EC50 ≈ 450 and EC95 ≈ 800), the guesstimate θ⁰ for the sigmoid Emax model can be calculated analytically. For the Emax model, a guess of an EC50 of 300 is used, and for the exponential model, an EC50 of 700 is assumed. Each candidate shape m = 1, …, M defines a respective mean response vector μ_m = (μ_m1, …, μ_mk). For the MCP step, a contrast t-test as first described by Abelson et al. (1963) is calculated. The test is constructed based on the linear contrast w_m^⊤ μ, where w_m = (w_m1, …, w_mk)^⊤ is chosen to maximize the power of the test for the assumed mean response (Bornkamp et al., 2009). This yields the hypotheses H_0: w_m^⊤ μ = 0 and H_1: w_m^⊤ μ ≠ 0. The test statistic for contrast m is T_m = w_m^⊤ ȳ / (S √(∑_i w_mi²/n_i)), where ȳ_i is the observed mean at dose c_i and S² is the pooled variance estimate. The critical value q_{1−α} is the equicoordinate (1−α)-quantile of the null distribution of the maximum contrast statistic. This approach leads to a multiple testing adjustment over the M contrasts with strict control of the familywise error rate at level α. The models with T_m > q_{1−α} form the set M* = {M_1, …, M_L} of significant models with established signifMCP. The modeling step is only executed if M* ≠ ∅, that is, a dose-response signal is established for at least one model.
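The analytic derivation of the sigmoid Emax guesstimate can be sketched as follows: for the standardized shape f⁰(c) = c^h / (c^h + EC50^h), the two assumptions EC50 ≈ 450 and 95% of the maximal effect at concentration 800 determine the Hill parameter h. The helper below is a hypothetical illustration, not the DoseFinding implementation:

```python
import math

def hill_from_two_points(ec50, c_q, q):
    """Solve q = c^h / (c^h + EC50^h) for the Hill parameter h,
    given that fraction q of the maximal effect is reached at c_q.
    Rearranging gives (c_q / EC50)^h = q / (1 - q)."""
    return math.log(q / (1 - q)) / math.log(c_q / ec50)

# EC50 ~ 450 and 95% of the maximal effect at concentration 800:
h = hill_from_two_points(ec50=450.0, c_q=800.0, q=0.95)
```

With these two design assumptions the implied Hill parameter is roughly h ≈ 5, i.e., a rather steep transition between the asymptotes.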
During the Mod step, either one fitted model of those that passed the MCP step is chosen as the final fit, or all fitted models that passed the MCP step are averaged. If a single model is selected, criteria such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), or the largest test statistic (maxT) can be used to pick a model. For model averaging, standardized weights based on the AIC or BIC can be calculated for the models in M*, and the final model is the resulting weighted average of the fitted models.
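The AIC-based selection and the standardized averaging weights can be sketched as follows, assuming the usual Akaike weights w_m ∝ exp(−ΔAIC_m/2); the function and the model names are illustrative, not part of the actual analysis code:

```python
import math

def aic_select_and_weights(aic_by_model):
    """Pick the AIC-best model and compute standardized Akaike weights
    w_m = exp(-0.5 * (AIC_m - AIC_min)) / sum of the same over all models."""
    best = min(aic_by_model, key=aic_by_model.get)
    aic_min = aic_by_model[best]
    raw = {m: math.exp(-0.5 * (a - aic_min)) for m, a in aic_by_model.items()}
    total = sum(raw.values())
    return best, {m: r / total for m, r in raw.items()}

# Hypothetical AIC values for three models that passed the MCP step:
best, weights = aic_select_and_weights({"linear": 102.0, "sigEmax": 100.0, "beta": 104.0})
```

The weights sum to one, so an averaged fit is a convex combination of the individual fitted curves; subtracting AIC_min before exponentiating avoids numerical underflow.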
Calculations are done with the DoseFinding R package, version 0.9-17, and the statistical software R, version 4.0.2 (R Core Team, 2020). For the numerical estimation of the nonlinear parameters, we use the default boundary setting of the DoseFinding package. As the maximum concentration is 1000, this leads to boundaries of [1, 1500] for the EC50 parameter and [1/2, 10] for the Hill parameter h of the sigmoid Emax model.

Measures
In this section, we briefly present for our context the definitions of the evaluation measures precision, recall, and R²_adj. Further, a new measure, S_M, is proposed specifically for the context of applying MCP-Mod with a fixed candidate set M to many dose-response data sets.
For a set of genes and a specific model M_l ∈ M* = {M_1, …, M_L}, the precision is defined as the conditional probability that a model is correct, given that it has been selected. Accordingly, the recall is defined as the conditional probability that a model is selected, given that it is correct (Buckland & Gey, 1994). Formally, precision = TP/(TP + FP) and recall = TP/(TP + FN), where TP, FP, and FN are the numbers of true positives, false positives, and false negatives. Precision and recall values lie in the interval [0, 1], and a larger value corresponds to a better performance. They can only be evaluated in simulations where the correct model is known.
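A minimal sketch of how precision and recall are computed for one model over many simulated genes, with hypothetical model labels:

```python
def precision_recall(true_models, selected_models, model):
    """Precision and recall for one model over many genes:
    precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    pairs = list(zip(true_models, selected_models))
    tp = sum(t == model and s == model for t, s in pairs)
    fp = sum(t != model and s == model for t, s in pairs)
    fn = sum(t == model and s != model for t, s in pairs)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall

# Five hypothetical genes: true generating model vs. AIC winner.
truth    = ["linear", "emax", "linear", "beta", "linear"]
selected = ["linear", "linear", "linear", "beta", "emax"]
p, r = precision_recall(truth, selected, "linear")
```

In this toy example the linear model is selected three times but is correct for only two of them (precision 2/3), and it is the true model three times but recovered only twice (recall 2/3).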
For a model fit f(·, θ̂) and data y_ij of a specific gene, i = 1, …, k, j = 1, …, n_i, we use R²_adj defined as R²_adj = 1 − (SSE/(N − p)) / (SST/(N − 1)), where SSE and SST are the sum of squared errors and the total sum of squares, respectively. The number of parameters is p and the total number of measurements is N = ∑_{i=1}^k n_i. We further propose a new measure, the suitability of model set score S_M. It can be used in a descriptive manner when MCP-Mod is applied to dose-response or concentration-response data of many genes. The score balances two desired properties. First, the number of detected signals (signifMCPs) should be large. Additionally, the detected signals should be clear, that is, have a large R²_adj value. The score thus balances the number of detected signals and the model performance, that is, power and goodness-of-fit. It is defined as S_M = (1/G) ∑_{g=1}^G 1(signifMCP for gene g after adjustment) · R²_adj,g, where G denotes the total number of genes and M the considered set of candidate models. For a given set M, the score summarizes the proportion of genes with significant MCP after adjustment, weighted by the goodness-of-fit of the respective genes. Adjustment means that the false discovery rate (FDR) is controlled using the Benjamini-Hochberg (BH) procedure (Benjamini & Hochberg, 1995).
In the context of MCP-Mod, this means that for each gene, the smallest p-value from the MCP tests of all candidate models is chosen. Consequently, each gene is represented by a single p-value. These p-values are adjusted with the BH procedure. If a BH-adjusted p-value is below 0.05, this results in a multiplicity-adjusted significant concentration-response signal. As performance measure, the R²_adj value of the winner model w.r.t. AIC for the corresponding gene is used. In general, when comparing two values S_M1 and S_M2, the larger value indicates a favorable choice of the candidate set, since both the number of detected signals and the model performance are taken into account. For improved clarity, both the proportion of genes with a detected signal and the mean R²_adj of the fits of the winner models corresponding to those genes will additionally be reported.
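A compact sketch of the BH adjustment and the resulting score, assuming the smallest MCP p-value per gene and the R²_adj of the AIC winner are already available; the variable names are our own:

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity from the largest p-value downwards
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adj = np.empty(n)
    adj[order] = np.minimum(ranked, 1.0)
    return adj

def suitability_score(min_pvals, r2_adj_winner, alpha=0.05):
    """Score of a candidate set: mean over all genes of
    1{BH-significant} * R^2_adj of the AIC winner; nonsignificant
    genes contribute zero."""
    signif = bh_adjust(min_pvals) < alpha
    return float(np.mean(np.where(signif, r2_adj_winner, 0.0)))

# Four hypothetical genes: minimal MCP p-value and winner-model fit quality.
score = suitability_score([0.001, 0.8, 0.001, 0.9], [0.8, 0.5, 0.6, 0.2])
```

In this toy example only the first and third genes survive the BH adjustment, so the score averages their R²_adj values over all four genes: (0.8 + 0.6)/4 = 0.35.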

DATA-BASED ANALYSIS
In this section, the setups and results of the data-based analyses are presented. In the following, they are referred to as Analysis I and Analysis II. In Analysis I, MCP-Mod is applied to the real VPA data set and an overview of the model selection results is given. Analysis II comprises further analyses: the entire analysis of Analysis I is repeated several times, and each time one of the candidate models of M is omitted. For an overview of the different analyses and the simulation, see Table 2.

Setup for Analysis I
In Analysis I, MCP-Mod is applied independently to each gene of the VPA data set. As we cannot assume only increasing or decreasing effects, each gene is tested with two-sided contrast tests at significance level α = 0.05. We apply multiplicity adjustment between genes by controlling the FDR using the BH procedure as described in Section 3.2. For each gene, if a dose-response signal is detected and hence at least one model passes the MCP step, the AIC is used as model selection criterion. For small sample sizes, Schorning et al. (2016) show that the AIC outperforms the BIC, especially if the true underlying model is a complex one among the considered models. There are N = 27 observations per gene in the VPA data set. Thus, we use the AIC to avoid too low selection counts of possibly correct, more complex models. In our analysis we will see that even with the AIC the simple linear model is often selected.

Results for Analysis I
Of the 54,675 genes, when controlling the FDR, 20,193 (36.9%) genes pass the MCP step, that is, their concentration-response profile differs significantly from a flat profile. VPA has a significant effect on the activation (or deactivation) of these genes. For each gene, one winner model is selected by AIC as the final fit (Figure 2). The linear model is selected most often (34.4%), because the AIC penalizes more complex models. However, Figure 3 clearly shows that the linear model performs comparatively poorly with respect to R²_adj. The popular Emax model (cf. Thomas et al., 2014, among many others) wins the fewest times, and when it does win, its fit has low R²_adj values (Figures 2, 3). Figure 4 shows the distribution of model winner counts w.r.t. AIC stratified by R²_adj. Less noisy genes are represented by the rightmost plot. Assuming (1), we refer to more (less) noisy genes as genes whose underlying model has larger (smaller) standard deviations in relation to the response range, which leads to smaller (larger) R²_adj values. While the sigmoid Emax and the beta model win most often for the least noisy stratum, the Emax model is chosen rarely, regardless of the stratum. The nonmonotone beta and quadratic models are chosen considerably often. For noisier genes the linear model is preferred. For these genes, none of the models explains much variance, which favors the linear model in terms of AIC. Hence, if the linear model is selected by AIC, one should hesitate to assume a true linear concentration-response relationship. Some example fits are visualized in Figure 5.
To ensure that the low number of Emax winners is not only due to too strict parameter constraints in the numerical optimization, we visualize the EC50 parameter estimates for genes where the Emax model won (Appendix, Figure A1). There is no evidence that the poor performance of the Emax model is due to optimization constraints; rather, it is due to the often low EC50 estimates. For an Emax model, a low EC50 translates to an early plateau, which can lead to a fit that is nearly flat over most of the concentration range and therefore to a small R²_adj. It is also of interest whether any of the models in the candidate set is redundant such that it can be substituted by another model. Removing such a model from the candidate set would increase power, as the number of hypotheses in the MCP step of each gene would be decreased. If for many genes the R²_adj for the winner model and the second-best model differ substantially, the winning model should be considered for future analyses. If the quadratic model is the winning model with a good fit, many genes cannot be modeled well by the second-best model (Figure 6).
The sigmoid Emax and the beta model performances also differ by a considerable amount from the second-best model's performance across the whole range of explained variance. The Emax and the exponential model can mostly be replaced by other models without substantial loss in R²_adj. This especially applies to genes with larger explained variance. The linear model can always be replaced with minimal loss in explained variance, as it is a special case of both the Emax model and the quadratic model.

Setup for Analysis II
Analysis II offers further insights into possibly expendable models in the candidate set. The analysis setup is similar to that of Analysis I. Analysis I is redone several times, but each time one model from the candidate model set is omitted. We refer to these as leave-one-model-out (LOMO) analyses.

Results for Analysis II
The number of FDR-adjusted significant concentration-response relationships is similar to the main analysis where no model is left out (Table 3). This finding is consistent with the results of Pinheiro et al. (2006). If the quadratic model is omitted from the candidate model set, the total number of signifMCPs increases at the cost of a reduced mean R²_adj for the remaining genes. This is due to the different, rarely appropriate shape of the quadratic model compared to all other models. Measured by the S_M score, it is proposed to drop the Emax model from the candidate model set (indicated in bold). The sigmoid Emax model, which contains the Emax model as a special case, decreases the score the most when removed from the candidate set. We are further interested in which model typically substitutes an originally selected model after its omission in the modeling step (Figure 7). The beta model is selected more often if the sigmoid Emax model is removed, and vice versa. If the often selected linear model is omitted, the exponential model most often replaces it.
Two additional evaluations regarding the validity of the S_M score were conducted. First, a copy of the VPA data set was generated, all 3067 genes where the exponential model won by AIC were removed, and the LOMO analyses were repeated. As expected, in this artificial scenario the S_M score suggests dropping the exponential model (Table 4, second column from the left).
Second, Analysis I was repeated with a single model as candidate model (Table 4). When a single candidate model is used, the S_M score is always smaller than when only one or no model is omitted from the candidate model set and the original VPA data are used. The lowest score of 0.0825 is obtained if the quadratic model is the only candidate model. This is because the quadratic model shape passes the MCP step for only 12.27% of the genes. When using only one candidate model, the sigmoid Emax model has the largest score of 0.2036.
The absolute differences in S_M scores might appear small but must not be misinterpreted as irrelevant. For the artificial scenario where genes with the exponential model as winner model are removed, omitting the exponential model from the candidate model set is considered reasonable by construction. Therefore, the corresponding difference in the S_M score of 0.0024 can be interpreted as relevant. Only using the sigmoid Emax model compared to using the full candidate model set differs by 0.0098, which can hence be viewed as a relevant difference, such that it would not suffice to use a single model. The interpretation of the S_M score is not straightforward, which is discussed in Section 6.

TABLE 4: S_M score, rate of significant genes, and mean R²_adj among significant genes for two added analyses: the LOMO analyses on the modified VPA data set where genes with exponential winner model are removed, and Analysis I repeated with only one model in the candidate model set. The largest S_M score is indicated in bold.

SIMULATION-BASED ANALYSIS
The simulation gives insights into the effect of the number of replicates per concentration level while the standard deviation of the noise is fixed (Table 2). As opposed to the data-based analysis, the correct model is known, such that precision, recall, and goodness-of-fit can be evaluated.

Setup
Concentration-response data sets are generated for 10,000 genes for each of the six considered models and for the null case, as well as for three different numbers of replicates n and a fixed standard deviation to range ratio (see Table 2). Details on how the range and the standard deviation are chosen are explained in the following. The null case means that a constant model is used to generate the data. In order to have a realistic data generation process, it is based on the real VPA data set. For each considered n, a data set structurally similar to the VPA data set, but with 70,000 genes (10,000 for each of the six nonflat models and 10,000 for the flat null model), is generated as follows. Consider a model f = f(c, θ) ∈ M, where |M| = 7, and, for the null case, f(c, θ) = θ_0 > 0. The assumed ratio of standard deviation to range, denoted by σ(q), is taken as the empirical q-quantile of the estimated standard deviation to range ratios of the genes of Analysis I. Hence, for q = 0.5, we obtain σ(0.5) = 0.3222, which is used for all nonflat models and all genes to calculate the standard deviation range(f)·σ(0.5) (Appendix, Figure A2). If f is the null case, then range(f) = 0. In this case, we use the standard deviation σ_null(q), the empirical q-quantile of the estimated standard deviations of the nonsignificant genes of Analysis I. We obtain σ_null(0.5) = 0.1909 (Appendix, Figure A3). Using an adapted standard deviation per gene and model is preferred over using a fixed standard deviation, because it allows for comparability between different models and ranges with respect to goodness-of-fit (Kappenberg et al., 2021).

TABLE 5: Summary of the simulation results, stratified by correct model and chosen model, respectively. Signif. genes (%) is the rate of FDR-adjusted, detected dose-response signals among the 10,000 generated dose-response data sets for each model, with n replicates at each dose level. For the recall rate, the model is the correct model; for the precision rate, the model is the selected model.
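The per-gene data generation step can be sketched as follows, assuming a sigmoid Emax mean response and the ratios σ(0.5) = 0.3222 and σ_null(0.5) = 0.1909 reported above; the helper is an illustrative sketch, not the simulation code of the paper:

```python
import numpy as np

rng = np.random.default_rng(2021)

def simulate_gene(mean_response, n_reps, sd_to_range=0.3222, sd_null=0.1909):
    """Simulate replicated expression values for one gene. The standard
    deviation is tied to the response range; a flat (null-case) gene with
    range zero gets the fixed null-case standard deviation instead."""
    mu = np.asarray(mean_response, dtype=float)
    resp_range = mu.max() - mu.min()
    sd = sd_to_range * resp_range if resp_range > 0 else sd_null
    reps = np.repeat(mu, n_reps)  # n_reps replicates per concentration
    return reps + rng.normal(0.0, sd, size=reps.size)

# Sigmoid Emax mean response at the eight VPA concentrations (illustrative):
conc = np.array([0.0, 25.0, 150.0, 350.0, 450.0, 550.0, 800.0, 1000.0])
mu = 7.0 + conc**5 / (conc**5 + 450.0**5)
y = simulate_gene(mu, n_reps=3)
```

With n_reps = 3 this yields 8 · 3 = 24 observations per gene; repeating the call per gene and model shape reproduces the overall structure of the simulated data sets.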
Finally, the generated data set is analyzed in the same way as the original VPA data set in Analysis I. Table 5 summarizes the results of the simulation w.r.t. signal detection (power), recall, and precision. For n = 3, which mimics the conditions in the real VPA data set, a signal is almost always detected if it is present. However, the recall and precision rates for nonlinear and nonflat models in this scenario are below 0.69. If n = 10, the rates of these model selection errors are still large, even though the sample size N = 8 · 10 = 80 is rather large in the context of toxicology. For example, if the sigmoid Emax model is correct, another model is incorrectly selected for 31.1% of the generated dose-response data. Due to the penalty term of the AIC used in model selection, complex correct models such as the sigmoid Emax or the beta model show a comparatively large increase in recall rates when n is increased. In comparatively noisy scenarios, these models are rarely selected. The opposite holds for the least complex model, the linear model. It has a comparatively very low precision rate (41.6%) and a very high recall rate (70.5%) for n = 3, but not for n = 10. Precision values naturally have more practical value, as they give insight into how confident one can be in the model selection. The precision rate for the flat model increases from 92.5% to 99.9% if n increases from 3 to 5. For nonflat models, the precision rate does not exceed 82.2% at any n. In practice, one is often not mainly interested in selecting the true underlying model but in obtaining a sufficiently good fit. Figure 8 summarizes the relative loss in model fit by considering the log-ratio of the root-mean-square error (RMSE) between the winner and the true model, that is, between the actually selected model and the fitted model if the correct model had been selected. In general, the relative loss in RMSE decreases with increasing n but is still present for n = 10, although N = 8 · 10 = 80 is already a large sample size in toxicology.
For n = 3, the median of the log-ratio is 0.0000 for all winner models except the linear model, for which it is 0.0543. This demonstrates the low precision of the linear model for small n. For small n = 3, the penalty of the AIC is comparatively large, yielding too many selections of the simpler linear model in cases where a more complex model might be required. If a more complex model such as the beta or the sigmoid Emax model is selected, the upper quartiles of the ratio are largest, corresponding to factors of 1.480 and 2^0.594 = 1.509, respectively. Hence, for 25% of the generated genes where the sigmoid Emax model is selected, the selection is not correct and the RMSE is at least 50.9% larger than the RMSE of the correct model. For the Emax and the exponential model, the upper quartiles of the ratio are closest to zero for each n. With increasing n, the penalty term of the AIC becomes comparatively weak. For n = 10, this heavily affects the linear model: it is selected less often and has log-ratios closely concentrated around zero. For the beta, quadratic, and sigmoid Emax models, the upper quartiles of the log-ratios remain comparatively far from 0. If the beta model is selected, for 25% of the generated genes the selection is incorrect and the RMSE is at least 2^0.235 = 1.177 times the RMSE that would result if the correct model had been selected. Thus, not selecting the correct model results in a noteworthy relative loss in goodness-of-fit, even when larger sample sizes are used in toxicology.
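Converting the reported log2 RMSE ratios back to multiplicative loss factors is a one-line calculation; for instance, an upper-quartile log-ratio of 0.235 corresponds to a factor of roughly 1.18:

```python
def rmse_factor(log2_ratio):
    """Multiplicative RMSE loss factor implied by a log2 RMSE ratio
    between the selected (winner) model and the true model."""
    return 2.0 ** log2_ratio

# Upper-quartile log2 ratio reported for the beta model:
factor = rmse_factor(0.235)
```

A log-ratio of zero therefore corresponds to no loss at all (factor 1), which is why log-ratios concentrated around zero indicate harmless selection mistakes.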

CONCLUSION
In this work, MCP-Mod was used as a model selection approach for gene expression concentration-response data. For the data set at hand, human embryonic stem cells were exposed to varying concentrations of VPA, and the expression of 54,675 probe sets or genes was measured. The data set indicated that modeling gene expression concentration-response data requires the consideration of several models, that is, a candidate model set. Only considering the popular Emax or sigmoid Emax model might not be sufficient. Especially nonmonotone models like the quadratic model should also be taken into account. When using MCP-Mod, frequent selections of a linear model should not be misinterpreted as evidence for a true, linear concentration-response relationship. A large noise-to-signal ratio or, more precisely, a large standard deviation to true response range ratio favors the selection of the linear model. Also, there is typically no notable loss in goodness-of-fit when the second-best model is used instead of the linear model.
Using the newly proposed score S_M, it was observed that the Emax model can be omitted from the candidate set despite its popularity, as long as the more general sigmoid Emax model is included in the candidate set. Further, the score discourages omitting the linear model, even though it can easily be substituted with respect to goodness-of-fit. Including the linear model in the candidate set aims at detecting unclear concentration-response signals rather than at modeling detected signals well. If the linear model is omitted, one might fail to identify potentially interesting genes. Simulation studies based on the data set indicate that the confidence in the correctness of the selected model, measured by the precision, is not very high.
Even when increasing the sample size per concentration from 3 to 10, which is very large for this type of toxicological experiment, the precision for nonflat models does not exceed 0.83. Thus, increasing the number of experiments does not increase the precision in model selection proportionally. The relative loss in goodness-of-fit due to model selection mistakes decreases with increasing sample size, but remains notable even for 10 replicates per concentration.
The newly proposed S_M score served as a tool to summarize the analysis and simulation results. For a given candidate set, it simultaneously considers the power, that is, the number of detected genes, and the goodness-of-fit of the genes with a detected signal. Despite its simple form, its interpretability is not straightforward, which leaves room for improvement. If one does not want to consider both power and goodness-of-fit at the same time but, for example, focuses on optimizing power, the score is not an adequate tool.
The data basis of this work is a single data set, which, despite its size and quality, is an obvious limitation. Similar analyses on other gene expression concentration-response data would be valuable to confirm our results. Another promising approach, which is not considered in this work, is model averaging. It would be interesting to analyze the influence of the different approaches and parameters on target dose estimation.

SUPPORTING INFORMATION
Additional supporting information may be found in the online version of the article at the publisher's website.