A comparative study of in vitro dose–response estimation under extreme observations

Quantifying drug potency, which requires an accurate estimation of dose–response relationship, is essential for drug development in biomedical research and life sciences. However, the standard estimation procedure of the median–effect equation to describe the dose–response curve is vulnerable to extreme observations in common experimental data. To facilitate appropriate statistical inference, many powerful estimation tools have been developed in R, including various dose–response packages based on the nonlinear least squares method with different optimization strategies. Recently, beta regression‐based methods have also been introduced in estimation of the median–effect equation. In theory, they can overcome nonnormality, heteroscedasticity, and asymmetry and accommodate flexible robust frameworks and coefficients penalization. To identify a reliable estimation method(s) to estimate dose–response curves even with extreme observations, we conducted a comparative study to review 14 different tools in R and examine their robustness and efficiency via Monte Carlo simulation under a list of comprehensive scenarios. The simulation results demonstrate that penalized beta regression using the mgcv package outperforms other methods in terms of stable, accurate estimation, and reliable uncertainty quantification.

The median-effect equation, proposed by Chou and Talalay (1984), is one of the most impactful and widely used approaches to modeling the dose-response relationship.It is a fundamental method that generalizes different enzyme kinetic relations and provides a uniform framework with solid physicochemical and biochemical validity for dose response in a percentage of cell viability or inhibition bounded in the (0,100%) interval (Chou, 2006).In practice, it is common to observe extreme responses of (un)affected cell fractions that are close to 0% or 100% because of the unknown exact doseresponse curve in experimental design and data collection.In the simulation study, we consider extreme responses as cell fractions of more than 95% or less than 5%.The standard approach to estimate the median-effect equation, which is based on the linear regression with a logit transformation, easily suffers from these extreme observations with a tiny measurement error and yields sensitive and biased estimation (Gadagkar & Call, 2015;Roell et al., 2017).Such variation in analysis poses great challenges for lab scientists and preclinical researchers to present and interpret the experimental results.Additionally, the modeling strategy of deleting the extreme observations will lead to a loss of information associated with fewer data points for estimation and may not be feasible in many cases, especially when they are biologically related to the potency estimation (Solzin et al., 2020).Therefore, an accurate and reliable description of the dose-response curve, as well as robust and efficient modeling strategies with the valid inference of uncertainty quantification, are critical, which motivated this comparative study.
Various methods and tools have been developed in the literature for estimating the median-effect equation and making appropriate statistical inferences.For example, R packages such as drc (Ritz et al., 2015), drda (Malyutina et al., 2021), and MCPMod (Bornkamp et al., 2009(Bornkamp et al., , 2017) ) are developed to facilitate the estimation via the nonlinear parametric function.The fitting of the nonlinear curve is often achieved by the least squares method to minimize the difference between the fitted curve and the observations.However, the development of the least squares method does not pay special attention to extreme cases, especially when the sample sizes are small.Recently, the beta regression model was introduced and incorporated into different statistical methods to yield robust estimation (Ghosh, 2019;Mayr et al., 2018;Niekerk et al., 2019;Ribeiro & Ferrari, 2020;Rigby & Stasinopoulos, 2005;Wood, 2017;Zhou et al., 2022).The construction of the beta regression itself is flexible enough to take on various shapes to account for nonnormality, asymmetry, and heteroskedasticity.Those methods based on the beta regression are established either in robust frameworks or with LASSO/ridge-type penalization (Hoerl & Kennard, 1970;Tibshirani, 1996) to compensate for the lack of robustness induced by outliers in standard beta regression.However, meaningful comparisons of these competing approaches, especially given different levels of extreme observations, are rarely available in the literature, resulting in the use of suboptimal methods in practice.
In this work, we conduct a comparative study through Monte Carlo simulation on 14 different tools for dose-response estimation, including methods based on nonlinear modeling and methods based on beta regression.We attempt to review the difference between various estimation methods and empirically evaluate the computational tools via simulation in different situations to identify a robust and reliable method(s) in R to estimate dose-response curves with extreme observations.Our simulation study not only comprehensively exploits varied methods in extensive settings of extreme responses that are common in experimental studies but also pays special attention to uncertainty quantification of various methods, which is often ignored in previous simulation studies but essential for statistical inference and hypothesis testing to compare drug potency and determine the synergism in drug combinations.
The rest of the article is organized as follows: Section 2 gives a brief overview of the median-effect equation.Section 3 reviews various dose-response R packages and methods based on the beta regression.The simulation study and results are described in Sections 4 and 5.The paper concludes with a discussion in Section 6.

MEDIAN EFFECT EQUATION
The median-effect equation is a unified method based on the mass-action law principle to describe a wide range of complicated enzyme kinetic processes (Greco et al., 1995).The median-effect equation is termed as where   and   are the cell fractions of the system affected and unaffected by the dose level .  is the dose level corresponding to the median effect, for example,  50 ,  50 , or  50 values., also called the Hill coefficient, is the sigmoidicity of the median-effect curve.
Given the fact that   +   = 1, the equation can be simplified to (2) The dose-response curve becomes steeper when the magnitude of  increases.When  is negative/positive, the estimated drug effects decrease/increase as the drug concentration increases.Since the identity strategy would not affect the model, we use the effect on cell fraction  to represent   and   .Then, the equation can be rewritten as for statistical modeling, where  0 and  1 are the intercept and slope that determine the median-effect equation in a doseresponse relationship.To model the median-effect equation, we treat the predetermined logarithm of dose, log , as the independent variable, and the dose-response variable  (with its expectation of the effect ) as the dependent variable.Each data point entails the observed response  =  at a certain dose level .
The median-effect equation is equivalent to the two-parameter log-logistic (2pLL) model, which is a special case of the four-parameter log-logistic (4pLL) model, the sigmoidal E max model and the Hill equation (Greco et al., 1995).The 4pLL model is formulated as where , , , and  are four unknown parameters.When we consider () as the cell response  and set  =   ,  −  =   ,  = , and  =   , the 4pLL model can be reparameterized to the sigmoidal E max model: where   is the minimal effect of the associated drug, which could be the basal effect when  = 0;   is the difference between the maximal and minimal effects; the maximal effect could be obtained from general knowledge of drug or from experiments at a high-dose concentration.Hence, both   and   often can be estimated with high precision.A common practice is to first rescale the data with a linear transformation to the range of (0,1), when the dose-response relationship in the sigmoidal E max model is reduced to the median-effect equation.

THE ESTIMATION METHODS AND TOOLS IN R
Estimation of the median effect equation can be carried out in R using either manual programming or devoted packages.
In this section, we will give a brief review of the available packages and estimation methods.
For observed dose-response outcome  =  with its expectation of the effect , the parameter estimation is conducted by minimizing the squared error loss  = ∑  (  − (  )) 2 or a loss function specified under packages below.
drc: The drc package (Ritz et al., 2015) is one of the most popular packages developed to provide nonlinear model fitting for analysis of dose-response data.It contains several frequently used model functions such as the generalized log-logistic model, the log-normal model, and the Weibull model to describe the dose-response curves.The parameter estimation is based on the nonlinear least squares approach with the maximum likelihood principle under the normality assumption.The nonlinear least square estimates are obtained through the minimization of the weighted sum of squared errors, which is achieved by the optim() (R Core Team, 2013) function in R, implementing the quasi-Newton method for optimization.To facilitate the estimating procedure, users can specify the starting values from previous experimental results or obtain information from the data itself, for example, for the log-logistic model, starting parameter values are calculated from a simple linear regression model.
grofit: The R package grofit (Kahm & Kschischo, 2014) provides two strategies, model-based fits and model-free spline fits, to fit the growth curve in dose-response experiments.The model-based fits include logistic growth, Gompertz growth, modified Gompertz growth, and Richards growth, characterized by at least three parameters: the length of the lag phase, the growth rate, and the maximum cell growth.Similar to drc, grofit applies the nonlinear least squares method for parameter estimation and the optimization is achieved with the nls() function (Bates & Chambers, 1992;Bates & Watts, 1988) and the Gauss-Newton algorithm.The initial values are estimated with the locally weighted regression fit lowess (Cleveland, 1979).Because the built-in models in grofit cannot be directly applied in the restricted two-parameter model, the simulation study for grofit was assessed through the model fitting using the nls() function equivalently.
MCPMod: By incorporating a systematic model testing and selection procedure, MCPMod (Bornkamp et al., 2009(Bornkamp et al., , 2017) estimates a nonlinear functional relationship in the analysis of dose-response studies (Bretz et al., 2005).Their model assumes the response follows a normal distribution with equal variance.The least squares method is used to obtain the parameter estimates, achieved by the iterative optimization using the nls() function (R Core Team, 2013) with the Golub-Pereyra algorithm (Golub & Pereyra, 2003).The initial values are generated with a data-based method in case they are not specified by users.Since the median-effect equation is not included in MCPMod, it is substituted with the nls() function with the Golub-Pereyra algorithm in the comparative simulation study.
DoseFinding: The R package DoseFinding (Bornkamp et al., 2021;Pinheiro et al., 2014) is an extension of MCPMod (Bornkamp et al., 2009(Bornkamp et al., , 2017)).It implements a two-stage, generalized least squares approach to fit nonlinear dose-response models.Under the same parametric model as MCPMod, DoseFinding first uses the analysis of variance parametrization to estimate the standardized dose-response function in stage I.Then, the nonlinear dose-response model is fit by minimizing the generalized least squares (GLS) criterion (Pinheiro et al., 2014).The optimization is obtained with the nlminb()function (R Core Team, 2013), using the quasi-Newton algorithm.The initial values are selected through grid searching.The median-effect equation is not available in DoseFinding, so we consider including the nlminb()function.The nlminb()function with the quasi-Newton algorithm is similar to the L-BFGS-B option in optim(), which is built inside drc.Therefore, for simplicity, we do not include DoseFinding in the comparison.
nlstools: The R package nlstools (Baty et al., 2015) works as a supplementary toolbox for nls() (R Core Team, 2013).Based on the dose-response model fit by the nls() function, it provides additional functionality to assist the selection of initial values by comparing observed and predicted responses in plots, as well as additional model diagnostics to detect suboptimal estimates.Moreover, to free the normality assumption in nonlinear model fitting, it utilizes nonparametric resampling methods, Jackknife (nlstools_jknife), and bootstrap (nlstools_bstrap), for confidence interval estimations.
drda: To mitigate optimization issues such as local optimum convergence, drda (Malyutina et al., 2021) implements the Newton method with a trust region (Steihaug, 1983).Targeting at the five-parameter generalized logistic function, as known as Richards' curve (Richards, 1959), drda follows the standard Newton's method and accepts the new estimate when it is within the trust region at each iteration.Otherwise, the step is replaced by a linear combination of Newton's method step and the steepest descent step.The initial point for the optimization is established by a smart initialization procedure to increase the chances of converging to the global solution.To improve the estimation accuracy, it also uses analytical gradient and Hessian formulas rather than a numerical approximation.
nplr: Focusing only on the generalized logistic function, the nplr package (Commo & Bot, 2016) provides flexible options for weighted least squares regression.It optimizes all the parameters together with a Newton-Raphson method, implemented by the nlm() function (R Core Team, 2013).The objective function is the weighted sum of squared errors with three weighting strategies: residual weights, standard weights (Giraldo et al., 2002), and general weights (Motulsky & Brown, 2006).Because nplr does not provide the option of the 2pLL model, it was replaced by the nlm() function to assess the median-effect equation in the simulation study.

Beta regression and its variation
Apart from dose-response R packages, we are also interested in methods based on the beta regression, which was reported to provide robustness and efficiency against asymmetry, heteroscedasticity, and nonnormality in dose-response data (Zhou et al., 2022).However, given the recent rapid development of robust estimation approaches in the beta regression framework, most simulation studies were conducted only in a moderate sample (Ghosh, 2019;Niekerk et al., 2019;Ribeiro & Ferrari, 2020).It is unclear how these methods perform in a small sample size, especially with extreme values such as commonly seen in a dose-response estimation, compared with the nonlinear regression methods.Thus, we included the following beta regression related methods, mostly developed in recent years, in the comparative study.
Beta regression: The beta regression was proposed for the situation where the variable of interest is continuous and within the open interval (0,1) (Ferrari & Cribari-Neto, 2004).The beta regression can take on a variety of shapes to account for nonnormality and skewness (Johnson et al., 1994).Kieschnick and McCullough (2003) compared the performance of a beta regression model for proportions with several alternatives and conclude that it is often the best option.The classical beta regression framework uses a mean ()-precision () parameterization of the beta law (Ferrari & Cribari-Neto, 2004) and the density function is written as where Γ() denotes the gamma function, 0 <  < 1 and  > 0. The regression parameters are easy to interpret in terms of the mean  and precision  of the response variable after proper parameterization.To enhance Markov chain Monte Carlo simulation, the mean of the response variable  can be conveniently transformed with a linear structure of covariates by a link function g() that maps (0,1) to R. The precision parameter  may be assumed constant over observations or modeled as a regression structure of () transformed by the log function or the square-root function, etc. (Smithson & Verkuilen, 2006).In our setting, we assume the dose-response variable  follows the beta distribution.Model parameters are estimated through maximum likelihood estimation, implemented by the betareg() function (Cribari-Neto & Zeileis, 2010).The optimization is achieved using the optim() function (R Core Team, 2013) with the Nelder-Mead algorithm (Nelder & Mead, 1965).

Penalized beta regression:
The standard maximum likelihood estimator for the beta regression is sensitive to extreme observations.LASSO/ridge-type of penalization (Hoerl & Kennard, 1970;Tibshirani, 1996) was developed and proved to be an ideal solution to overcome the instability (Fu, 1998;Gauraha, 2016;Khan et al., 2019;Meinshausen & Bühlmann, 2010).The implementation was conducted with two independent R packages: GAMLSS (Rigby & Stasinopoulos, 2005) and mgcv (Wood, 2017).They share the same idea of maximizing the penalized likelihood of the model but use different algorithms.
The package mgcv was developed to estimate penalized generalized linear models including generalized additive models and generalized additive mixed models.Either the L 1 LASSO penalty or the L 2 ridge penalty is added to the log-likelihood function with a tuning parameter to control the extent of penalization.The penalized likelihood is then maximized by the penalized iteratively reweighted least squares.
Different from mgcv, GAMLSS models each distribution parameter as a linear function of the explanatory variables.The parameterization of the beta distribution is slightly different from what is defined in betareg() (Cribari-Neto & Zeileis, 2010).Rather than using the mean-precision parameterization of the beta law, GAMLSS parameterizes the beta distribution with location parameter  and scale parameter .Thus, the formulation of the GAMLSS beta regression model is identical to the standard beta regression parameterization only if the scale parameter is a constant.The GAMLSS penalized likelihood (Stasinopoulos & Rigby, 2007) is maximized by the RS algorithm, which is a generalization of Rigby andStasinopoulos' (1996a, 1996b) technique for fitting mean and dispersion additive models.The RS algorithm is stable and efficient with simple starting values (e.g., constants) of  and , so we used the automatic initial values in the comparison.
Robust beta regression: To overcome the drawbacks brought by the standard beta regression against outliers, Ghosh (2019) proposed a robust minimum density power divergence estimator (MDPDE) in beta regression by minimizing the average density power divergence due to its high asymptotic efficiency along with strong robustness.Similar to MDPDE, Ribeiro and Ferrari (2020) developed the surrogate maximum likelihood estimator (SMLE) by maximizing a reparameterized L q -likelihood.The L q -likelihood density is constructed first with a Box-Cox transformation   (Box & Cox, 1964).Both robust methods use the generalized estimating equations for the beta regression parameter estimation.For the tuning parameter, based on simulations and real data analyses, MDPDE recommends the tuning parameter in the range of (0.3, 0.4).Ribeiro and Ferrari (2020) further improve the tuning parameter selection with a data-driven algorithm by conducting a grid search to find the optimal tuning parameter that achieves sufficient stability.
Additional tools of beta regression in R: The R package betaboost (Mayr et al., 2018) wraps packages mboost (Hofner et al., 2014) and gamboostLSS (Hofner et al., 2016) together to implement gradient boosting and boosting distributional regression on the beta regression.It optimizes the log-likelihood of the beta regression through component-wise gradient ascent, where it starts from a covariate-free model and iteratively adds explanatory variables to increase the log-likelihood.In each boosting iteration, one explanatory variable that performs best will be added to the model.betaboost utilizes cross-validation techniques to achieve the optimal boosting iteration numbers.Within the component-wise gradient ascent, betaboost replaces the traditional Fisher scoring algorithm with a gradient descent algorithm for the maximum likelihood estimation.

SIMULATION DESIGN
In this section, we specify the design settings for various simulation scenarios to mimic real-world dose-response relationships and performance measurements of point estimation and uncertainty quantification on both effect estimations and coefficients to evaluate and compare the performances of 14 different dose-response estimation methods.

Data generation process
To compare the above-mentioned dose-response estimation methods, we designed a simulation study to evaluate their performances under different dose-response scenarios.To create an extreme situation, we generated a dataset containing 10 dose levels without replication.Responses were produced from three types of dose-response curves: default dose-response curve, steeper dose-response curve with more frequent extreme responses, and flatter doseresponse curve with less frequent extreme responses.The simulated responses follow a beta distribution with mean  = logit −1 ( 0 +  1 * log ), and either constant precision ( = 10 or 100) or varying precision dependent on logarithm of the dose levels ( = exp( 0 +  1 log )).
In our misspecification case, we simulated the response with simplex error terms (Zhang et al., 2016) with dispersion equals 1. Simplex distributions are a member of the generalized inverse Gaussian distribution (Barndorff-Nielsen & Jørgensen, 1991).With mean  ∈ (0, 1) and dispersion parameter  2 > 0, the simplex distribution for proportion data is defined as where the unit deviance function is Different from beta regression, the simplex distribution relates to a special type of dispersion model and shares many common analytic properties with the exponential dispersion models (Jørgensen, 1997).Since dose-response R packages assume a normal distribution for drug responses and beta regression related methods assume beta distribution, while the response in simplex error scenarios follows simplex distribution, the simplex error scenarios are considered as misspecification cases.
For extreme outcomes, the simulated data are considered in three types (Figure 1): (i) extreme responses in lower dose levels only, (ii) extreme responses in higher dose levels only, and (iii) extreme responses in both higher and lower dose levels.Heuristically, the flatter dose-response curve contains less than 50% extreme responses that are ≤5% or ≥95%; the default dose-response curve contains 50−70% extreme responses; the steeper dose-response curve contains more than 70% extreme responses.Therefore, to maintain the amounts of extreme responses in different scenarios, the 10 sets of dose levels and drug responses are simulated based on different gaps between log  and different dose-response curves (Supporting Information S.1.1).In total, we examined various methods in 30 scenarios (Supporting Information S.1.1).

Methods for comparison
The candidate dose-response R packages use the nonlinear least squares method with different optimization functions to obtain the estimates of parameters.The application of these packages on the median-effect equation is constrained by their model settings.For packages drc and drda, we transformed the median-effect equation to a two-parameter log-logistic model and directly used these two packages for estimation.However, other packages do not provide a nonlinear version of the median-effect equation.We then substituted them with their optimization functions.In detail, we used nls with the Gauss-Newton algorithm, the Golub-Pereyra (gb) algorithm, and the "nl2sol" algorithm from the Port library to represent the analysis of grofit and MCPMod.We used nlm to represent nplr.Therefore, the final candidate methods considered in the simulation study are drc, drda, nls with three algorithms, nlstools with jackknife and bootstrap algorithms, nlm, and beta regression related methods, including standard beta regression (brm), gamlss, mgcv, mdpde, smle, and betaboost.

Performance evaluation
The estimands we considered to evaluate for the simulation study include  0 ,  1 ,  5 ,  50 ,  95 . 5 ,  50 , and  95 measure the drug concentrations needed to inhibit the cell viability by 5%, 50%, and 95%, respectively.They are obtained by transforming Equation (3) with estimated  0 and  1 : where  equals 5, 50, and 95.% corresponds to the drug response .The variance of   is obtained by applying the multivariate delta method with point estimations and variance-covariance estimations of  0 and  1 .
We measured the feasibility, distribution of bias, squared error, 95% confidence interval (CI) length, and the coverage probability of the CI for each estimand.The feasibility measures the proportion of the total 10,000 simulation trials that each method can stably estimate the dose-response curve without giving error.When the coverage probabilities are similar, the estimation method with a narrower CI length suggests higher estimation efficiency.We ran all the results for 10,000 Monte Carlo replications, and the computations were carried out in the R v4.0.5 environment on the x86_64-conda-linux-gnu platform.

SIMULATION RESULTS
The simulation design considers different combinations of dose-response curves and extreme values.To summarize the results, we first outline the overall performance of each method, then compare the simulation results on the feasibility, coverage probability, bias, and squared error, and finally conclude with some practical considerations.

Overall performance
Among all candidate methods, betaboost, mgcv, drc, drda, and nlm are the most stable with 100% feasibility (Figure 2A) across all 30 scenarios specified in Table S.1.1 (see Supporting Information).Feasibilities of other methods are easily affected by more strict situations, for example, higher variation in data ( = 10) and steeper true dose-response curve.For example, the feasibilities for brm, mdpde, and smle vary with the dose-response curve type and the change of precision.
Their feasibilities are as low as 0% under a steeper dose-response curve (Scenarios 19 and 21 in Supporting Information S.1.2,hereinafter noted as S.1.2.19 and S.1.2.21) but keeps 100% under a flatter dose-response curve.Under the default curve, when precision is low and the simulated responses have higher variances, the feasibilities for those methods are lower than those when precision is high, for example, the feasibilities for brm, mdpde, and smle are as low as 13% when  = 10 and extreme responses are located in both sides (S.1.2.1).When  increases to 100, their feasibilities increase to more than 90%.A similar pattern also happens to nonlinear least squares methods: nls with three algorithms and nlstools with bootstrap and jackknife algorithms.Their feasibilities are vulnerable under a steeper dose-response curve and higher variation (S. 1.2.11 and S.1.2.19).Besides, under dose-dependent precision, although gamlss provides the option to regress variance on log dose, it cannot maintain high feasibility (e.g., S.1.2.16-S.1.2.18). Figure 2 also summarizes the point estimation results over the 30 scenarios for each method.Beta regression related methods (gamlss and mgcv) have higher accuracy with a lower absolute bias of  5 ,  50 ,  0 , and  1 , compared to nonlinear least squares methods.The overall performances of  95 for all methods are more variational, especially in scenarios S.1.2.12, S.1.2.14, and S.1.2.29 since the simulated responses are sparser at large dose levels (Figure 1, panels B2, B3) in the original scale.
Figure 3 provides average coverage probabilities across 30 scenarios along with 95% CI length for each candidate method.As shown in panel A, mgcv has the highest average coverage compared to others for all estimands except  1 .nlstools_jknife has the highest average coverage probability for  1 .However, nlstools_jknife only supplies CI estimations on  0 and  1 and it has much wider CI lengths compared to mgcv.Thus, the higher coverage probability of nlstools_jknife is not persuasive enough for it to outperform mgcv.
Overall speaking, combining both Figures 2 and 3, mgcv has the best performance across all methods with respect to stability, estimation accuracy, and uncertainty quantification of the five estimands.mdpde, nlm, nls_gb, and nlstools_jknife often have the poorest performance and cannot provide decent estimations.

Bias and squared error
Based on Figure 2, mgcv, drc, and drda are the three most accurate and stable methods in point estimation, while other methods cannot provide consistent accuracy and high feasibility.Thus, for assessment in terms of bias and squared error, we mainly focus on mgcv, drc, and drda as representatives for brm-based methods and nonlinear least squares methods and touch upon other methods with regard to their advantages and disadvantages in the practical consideration subsection.Figure 2B,C shows that mgcv outperforms drc and drda in estimations of  0 and  1 in almost all 30 scenarios.S.1.2also demonstrate similar results that considering bias and squared error together, mgcv is more precise in estimating  0 and  1 than drc and drda.Under the default dose-response curve (S.1.2.1-S.1.2.8), steeper dose-response curve (S.1.2.20, S.1.2.23, S.1.2.24, and S.1.2.25), and misspecification cases (S.1.2.28 and S.1.2.30), mgcv has smaller bias and squared error along with lower standard deviations of those two measurements.In all scenarios with a flatter dose-response curve S.1.2.18 and S.1.2.29) and several other scenarios, estimations of  0 and  1 are quite similar between mgcv, and drc and drda, where mgcv may have a slightly higher bias or squared error but lower standard deviation.Also, considering the relatively easy fitting scenario when we have fewer extreme values under a flatter dose-response curve, performances of mgcv, drc, and drda in  0 and  1 are close to each other.Meanwhile, there is one outlier, Scenario 9 (S.1.2.9),where we have extreme responses on the left side, dose-dependent precision, and default dose-response curve, drc, and drda have lower bias and squared error compared to mgcv.In general, despite Scenario 9, mgcv surpasses drc and drda as the most robust method in  0 and  1 with small bias and squared error coupled with a reasonable standard deviation.
F I G U R E 3 Summary of CP and length of nominal 95% CI for each estimation method over 30 scenarios.(A) Average coverage probability across 30 scenarios; (B)-(F) Manhattan plots of 95% confidence interval (CI) length for estimands  0 ,  1 ,  5 ,  50 , and  95 of all 30 scenarios within 14 candidate methods.Percentage values above the Manhattan plots indicate percentages of data points (≥0.01%) that exceed the maximum of y-axis limits.The CP is calculated based on the total 10,000 simulation trials.The CP of those infeasible replicates is considered as 0. CP: coverage probability; CI: confidence interval.
For  5 and  50 , performances of mgcv are slightly better than drc and drda.When the model is properly specified, the three methods are equivalent with similar mean and standard deviation of bias and squared error, except lower bias, squared error, and coverage probabilities of  5 for mgcv than drc and drda in a couple of strictest scenarios (S.1.2.19 and S.1.2.21) under the steeper dose-response curve containing 70−80% extreme responses in the dataset.These two scenarios are usually considered as poor data quality and not common in real cases (often, with such a high percentage of extreme observations, the data will be discarded and the experiment will be rerun with more condensed dose levels), so it is reasonable to disregard the inferiority of mgcv.Moreover, in misspecification scenarios (S.1.2.28-S.1.2.30), mgcv demonstrates its robustness with higher accuracy and efficiency with a lower variation.Therefore, in estimating  5 and  50 , mgcv is superior to drc and drda.One thing to note is that under the right-side extreme values (Figure 1 A2, B2), few simulated responses are around 5% after adding variation to the datasets.mgcv, drc, and drda can still extrapolate to accurate  5 estimations.

Coverage probability
Restricted by the design of implementation tools of dose-response estimation methods, some of the candidate methods do not provide variance estimation for all estimands, so we cannot compare their coverage probability and 95% CI length.
Comparing the coverage probabilities of effect estimations ( 5 ,  50 , and  95 ) (Figure 3A, D-F), mgcv has the highest average coverage probability among all the candidate methods and often a narrow 95% CI length.Considering individual coverage probabilities under each scenario in S.1.2,mgcv has the highest coverage probabilities across 30 scenarios, except in some scenarios under a flatter dose-response curve (S.1.2.9-S.1.2.12, S.1.2.15, S.1.2.17, and S.1.2.18) or a steeper doseresponse curve (S. 1.2.19 and S.1.2.21).In these scenarios, either  5 or  95 of mgcv are slightly lower than drc and drda.For easier simulation scenarios with fewer extreme dose levels and flatter dose-response curves, nonlinear least squares methods (drc and drda) can catch mgcv with similar performances, but under some strict cases it is recommended to try some nonlinear least squares methods like drc and drda in addition to mgcv.In a general sense, mgcv has the highest coverage in effect estimations.
We include nlstools_bstrap and nlstools_jknife to compare the coverage probabilities of coefficients ( 0 and  1 ).Apparently, nlstools_jknife has the highest average coverage probabilities for  1 among all methods (Figure 3A).However, taking the 95% CI length into consideration, nlstools_jknife tends to have much wider lengths than other methods, suggesting low estimation efficiency.Thus, the high coverage probability cannot demonstrate the superiority of nlstools_jknife in estimating  1 .Among the remaining methods, mgcv stands out with the highest coverage probabilities of  0 and  1 and reasonable 95% CI lengths (Figure 3B, C), maintaining all lengths within (0, 2.5).Individual scenario tables (Supporting Information) demonstrate the same pattern.For all 30 scenarios, the coverage probabilities of  0 of mgcv are consistently better than other methods.A similar dominating advantage also exists in  1 , but in some scenarios (S.1.2.1, S.1.2.3, S.1.2.10-S.1.2.12, S.1.2.18-S.1.2.21), the coverage probabilities of  1 of mgcv are lower than drc and drda.The differences are usually small, but in scenarios under a steeper dose-response curve (S.1.2.18-S.1.2.21), the coverage probabilities of  1 are worse than drc and drda, indicating the possible deficiency of mgcv in strict environment compared to nonlinear least squares methods drc and drda.brm shares similar performances as mgcv, but brm is quite unstable under a steeper dose-response curve (e.g., S.1.2.19) with low feasibilities.Therefore, generally, mgcv has the best performance in the coverage probabilities of  0 and  1 .

Practical consideration
In summary, mgcv outperforms drc, drda, and other candidate methods owing to its perfect feasibility, more accurate estimations of  5 ,  50 ,  0 , and  1 , and higher coverage probabilities of the five estimands.drc and drda are slightly better than mgcv in  95 when there are sparse dose levels and a high proportion of extreme values, but the data generated from these settings are often labeled as "poor data from poor design" for more than 70% extreme values.In practice, these data could be replaced by newly generated data from another run of a better-designed experiment to avoid most of the extreme value doses, but the "make-up" experiments are only customized on a very small scale for the most severe cases.Other methods can also have outstanding performances in certain scenarios, but they cannot maintain consistently satisfying performances across all situations.For example, the performance of brm is similar to mgcv and sometimes even better than mgcv (e.g., S.1.2.12).However, brm is vulnerable under extreme scenarios.Its feasibilities can maintain 100% under a flatter dose-response curve but fall under 50% under a steeper dose-response curve.betaboost gives outstanding performance in bias and squared error of estimands when we have symmetric extreme values, but its performance is unstable when the extreme pattern is asymmetric (e.g., S.1.2.1 vs. S.1.2.2). gamlss shares a similar penalization strategy with mgcv, so the results of gamlss are close to mgcv's.However, under dose-dependent precision, when the precision is modeled with log dose, gamlss cannot give stable estimations such that its feasibility decreases to 89%, 53%, and 21% in S.1.2.7-S.1.2.9, for example.In Scenario 14 (S.1.2.14) and Scenario 28 (S.1.2.28), nlm outperforms all other methods, but in other scenarios (e.g., S.1.2.9, S.1.2.10, and S.1.2.19) it renders undesirable estimations in  0 ,  1 , and effect estimations.Therefore, mgcv, drc, and drda consistently provide accurate estimations and decent uncertainty quantifications.Among the three methods, mgcv generally is better than the other two and is recommended as the first choice to be applied to different dose-response datasets.Researchers can choose mgcv, drc, or drda based on different estimation goals.Specifically, mgcv is better than drc and drda in terms of accurate estimations and better uncertainty quantifications of  5 ,  50 ,  0 , and  1 , regardless of the locations of extreme values and types of dose-response curves.Therefore, if researchers are interested in the estimations of  5 ,  50 ,  0 , and  1 , they should prioritize mgcv over drc and drda.For  95 , mgcv, drc, and drda may not always yield desirable estimations in scenarios with few data around the 95% responses.Thus, we suggest trying drc or drda first for point estimation if the target estimand is  95 ; for interval estimation including  95 , mgcv is always favored.

DISCUSSION
In the era of data science, selecting the appropriate statistical approach is a key component in the process to best characterize the dose-response relationship.In real-world data analysis, the popular Chou-Talalay method (a.k.a.median-effect equation) is vulnerable to extreme observations in experiments, but deleting these observations impairs the estimation efficiency and accuracy (Zhou et al., 2022).To identify the robust method for extreme values with implementations in R, we conducted a comparative study using a Monte Carlo simulation.Among different combinations of extreme doseresponse settings, we analyzed 30 dose-response scenarios including different settings for extreme values with regard to the feasibility of each method, coverage probability, bias, and squared error of the estimands of common interest.Of the 14 candidate approaches, penalized beta regression using the mgcv package, and nonlinear regression using the drc and drda packages outperform others in terms of stability and accuracy in estimating the five estimands  0 ,  1 ,  5 ,  50 ,  95 , and higher coverage probability with relatively narrower 95% confidence interval.If only one approach can be picked, the study results support that mgcv is the most robust and stable in estimating the median-effect equation especially when the model is misspecified, which is quite common in real-world data analysis.As discussed in the practical considerations, researchers can select the suitable approach among mgcv, drc, and drda based on the estimation goals and specific data patterns.Other methods like standard beta regression, gamlss, and nls are reasonable alternative choices, but each has its limitations in estimation accuracy or uncertainty quantification.
The comparative study has limitations.We considered the estimation of the median-effect equation with the logit transformation, but other transformations may be more suitable in certain biomedical systems.Nevertheless, the current logit transformation is the most popular with the median-effect equation, and mgcv and other beta regression related methods can be easily adapted for different transformations, like probit and cloglog.Also, the simulation study is restricted to R, which is the most comprehensive statistical software and contains the most recent development for the comparison of the cutting-edge methods.Other platforms such as Python or MATLAB may not be as convenient to include all the methods we had for comparison, and the computational algorithms are restricted in other popular software based on SAS, STATA, or even GraphPad Prism with various limitations in optimization algorithms.
Robust estimation of the dose-response curve can play a critical role in the investigation of adequate drug combinations (García-Fuente et al., 2018;Lee et al., 2007;Zhao et al., 2014).In a wide range of diseases, multiagent synergistic combination therapies have demonstrated great advantages in improving therapeutic efficacy, reducing treatment toxicity, and revolutionizing patient outcomes (García-Fuente et al., 2018;Gauraha, 2016;Yadav et al., 2015).To determine the synergism of drug combinations, the statistical inference will rely heavily on the estimation of  0 and  1 .For example, the interaction index (Greco et al., 1995), a popular metric of drug synergy based on Loewe additivity (Fitzgerald et al., 2006;Greco et al., 1995), can be written as a function of s.The study results support that mgcv is the most robust and stable in estimating s, and it can further be employed to improve the estimation of the interaction index in drug combinations and facilitate the associated statistical testing.The work is in progress, and the results will be communicated in a separate report.

D ATA AVA I L A B I L I T Y S TAT E M E N T
The data that supports the findings of this study are available in the Supporting Information of this article.

O P E N R E S E A R C H B A D G E S
This article has earned an Open Data badge for making publicly available the digitally-shareable data necessary to reproduce the reported results.The data is available in the Supporting Information section.This article has earned an open data badge "Reproducible Research" for making publicly available the code necessary to reproduce the reported results.The results reported in this article were reproduced partially due to insufficient code quality.

F
I G U R E 1 Three types of extreme responses for data simulation.Each set of (A1, B1) for type (i), (A2, B2) for type (ii), and (A3, B3) for type (iii) in Section 4.1 are under the same true dose-response curve based on the median effect equation.The x-axis of A1-A3 is log scaled.

F
Summary of feasibility and point estimation bias for each estimation method over 30 scenarios.(A) average feasibilities across 30 scenarios; (B)-(E) Manhattan plots of absolute bias over 30 scenarios for estimands  0 ,  1 ,  5 ,  50 , and  95 of 14 candidate methods.Percentage values above the Manhattan plots indicate percentages of data points (≥0.01%) that exceed the maximum of y-axis limits.
C K N O W L E D G M E N T This work was supported in part by the Penn State College of Medicine Junior Faculty Development Award.C O N F L I C T O F I N T E R E S T S TAT E M E N TThe authors have declared no conflict of interest.