1. Nonlinear, parametric curve-fitting provides a framework for understanding diverse ecological and evolutionary trends (e.g. growth patterns and seasonal cycles). Currently, parametric curve-fitting requires a priori assumptions of curve trajectories, restricting its use for exploratory analyses. Furthermore, use of analytical techniques [nonlinear least-squares (NLS) and nonlinear mixed-effects models] for complex parametric curves requires efficient choice of starting parameters.
2. We illustrate the new R package FlexParamCurve that automates curve selection and provides tools to analyse nonmonotonic curve data in NLS and nonlinear mixed-effects models. Examples include empirical and simulated data sets for the growth of seabird chicks.
3. By automating curve selection and parameterization during curve-fitting, FlexParamCurve extends current possibilities for parametric analysis in ecological and evolutionary studies.
Fitting nonlinear curves has widespread applications in ecological and evolutionary studies, as it can incorporate complex relationships between variables with greater accuracy than linear regression and greater parsimony than polynomial regression (Pinheiro & Bates 2000; p. 273–277). Although nonparametric methods are more flexible for large sample sizes, performance can be unreliable with smaller samples. Parametric methods are preferred for curve-fitting because of computational efficiency, low estimator variance and theoretical support (Gedeon, Wong & Harris 1995; p. 552). Choice of parameters as determined by theory or proposed by hypothesis facilitates experimental testing and prevents over-fitting (Kohn, Smith & Chan 2001), as opposed to nonparametric methods that use purely statistical (not biological) parameters. Hence, variance in parameter estimation is lower for parametric methods as they are constrained a priori to a specific function (Gedeon, Wong & Harris 1995, p. 552). Parametric curve-fitting, therefore, can analyse relatively small data sets and estimate parameters of biological significance.
Where small data sets exhibit nonmonotonic functional relationships, monotonic parametric curves (e.g. logistic, Richards) and nonparametric methods may give inaccurate results. This often arises in ecological studies with few measurements of each subject (e.g. growth analyses). These studies have traditionally ignored individual differences and have not exploited the analytical power of mixed-effects models (Huin & Prince 2000). Although some ecological studies now employ more complex parametric models (e.g. Bunnefeld et al. 2011), selecting a parametric function of appropriate complexity to accommodate observed trajectories in the data and choosing starting values for its parameters are usually difficult, limiting the general use of such models. Until now, no automated approach has existed to perform model selection for nonmonotonic curves.
Here, we describe a new R (R Development Core Team 2011) package, FlexParamCurve (Oswald 2011), that includes functions to estimate parameters for nonmonotonic curves and select the models that best fit data sets. We consider generalized forms of double-logistic, biphasic and bi-logistic curves, providing examples using simulated and empirical data sets.
Biphasic or nonmonotonic relationships arise in several ecological contexts, including growth in birds, fish, mammals and plants; seasonal migrations or changes in vegetation cover; precipitation-use efficiencies of grasslands and flowering in plants (published examples cited in Appendix S1 Table S1·3).
Self-Starting (selfStart) Functions
Nonlinear equations generally lack analytical solutions, as they include products or divisions among their parameters, and thus are generally solved numerically. Such iterative estimation approaches require a priori specification of starting values for all parameters that are refined during the fitting process until estimation accuracy meets specified convergence criteria. Selection of starting values is often burdensome and, consequently, self-starting [selfStart()] functions are available for many curves (Pinheiro & Bates 2000; p. 342–346; Ritz & Streibig 2005; Paine et al. 2012).
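To make the role of starting values concrete, the following minimal sketch uses only base R, not FlexParamCurve. It assumes the self-starting logistic model SSlogis() and the built-in ChickWeight data set from the standard R distribution; both are stand-ins chosen for illustration and are not part of the analyses described here.

```r
# Growth records for a single chick from the built-in ChickWeight data set
chick1 <- ChickWeight[ChickWeight$Chick == 1, ]

# Starting values that the self-starting model derives from the data itself
init <- getInitial(weight ~ SSlogis(Time, Asym, xmid, scal), data = chick1)

# NLS fit: no 'start' argument is needed because SSlogis() supplies one
fit_self <- nls(weight ~ SSlogis(Time, Asym, xmid, scal), data = chick1)

print(init)
print(coef(fit_self))
```

A model written as a plain formula would instead need a user-supplied `start` list, which is the step that self-starting functions remove.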
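FlexParamCurve includes a selfStart() function, SSposnegRichards(), for many nonmonotonic curves. SSposnegRichards() combines two Richards curves (Nelder 1962):

$$y = \frac{A}{\left[1 + m\,e^{-k(t - i)}\right]^{1/m}} + \frac{A'}{\left[1 + m'\,e^{-k'(t - i')}\right]^{1/m'}}$$
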
where A, k, i and m are, respectively, the asymptote, rate parameter, inflection point and shape parameter of the first Richards curve and A′, k′, i′ and m′ are the corresponding parameters for the second curve (parameters described in Appendix S1 Table S1·1). In FlexParamCurve, these parameters are designated Asym, K, Infl, M, RAsym, Rk, Ri and RM, respectively. Individual Richards curves can model many sigmoidal forms (e.g. logistic, Gompertz and von Bertalanffy); the double-Richards curve is equally flexible for nonmonotonic relationships (Fig. 1).
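The double-Richards form can be sketched in a few lines of base R. This is an illustrative re-implementation, not the package's posnegRichards.eqn(); it assumes each Richards component is written as A / (1 + m exp(-k(t - i)))^(1/m), and the function names, argument names and parameter values below are hypothetical.

```r
# Single Richards curve: A / (1 + m * exp(-k * (t - i)))^(1 / m)
richards <- function(t, A, k, i, m) {
  A / (1 + m * exp(-k * (t - i)))^(1 / m)
}

# Double-Richards curve: sum of a first and a second Richards component
double_richards <- function(t, A, k, i, m, A2, k2, i2, m2) {
  richards(t, A, k, i, m) + richards(t, A2, k2, i2, m2)
}

# Hypothetical chick growth: mass rises towards A, then recedes by about |A2|
age  <- 0:100
mass <- double_richards(age, A = 4500, k = 0.15, i = 25, m = 1,
                        A2 = -1000, k2 = 0.3, i2 = 80, m2 = 1)

peak_age <- age[which.max(mass)]
print(peak_age)
print(range(mass))
```

A negative second asymptote produces the rise-then-decline trajectory typical of nonmonotonic growth; a positive one would instead give a two-phase increase.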
Parameter redundancy often arises when the equation fitted is too complex for the data and can lead to estimation problems. Therefore, FlexParamCurve allows fitting [SSposnegRichards()] and plotting [posnegRichards.eqn()] reduced versions of the double-Richards curve by fixing ≤5 parameters to user-specified values (or means by default). FlexParamCurve uses this approach because parameters of the Richards curve have empirical biological meaning for many data sets (e.g. Brisbin et al. 1987). Fixing a parameter achieves the same numerical advantage as dropping it (fewer estimable parameters) but avoids the compensatory changes to the remaining estimable parameters that occur when a parameter is dropped. In this way, by default FlexParamCurve allows the data to suggest the most parsimonious curve, but also permits users to select appropriate parameterizations. Default parameter bounds (tested across diverse data sets) are provided, but users can specify their own.
SSposnegRichards() and posnegRichards.eqn() use argument modno to specify one of 32 versions of the double-Richards curve [all 16 possible reductions in the second curve (fixing A′, k′, i′ or m′), both when m is fixed and when it is estimated; see Appendix S1 Table S1·2 and the SSposnegRichards() help file]. This allows fitting of monotonic curves such as logistic (model 32, m = 1), Gompertz (model 32, m ≈ 0) and von Bertalanffy (model 32, m = −0·3), as well as many nonmonotonic forms, for example, double-logistic (model 22, m = m′ = 1), double-Gompertz (model 22, m = m′ ≈ 0), double-von Bertalanffy (model 22, m = m′ = −0·3) and biphasic growth models (Fig. 1, Appendix S1 Table S1·2).
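The correspondence between the shape parameter and the named curves can be checked numerically. The sketch below is plain base R and assumes the Richards form A / (1 + m exp(-k(t - i)))^(1/m); the function `richards()` and all parameter values are hypothetical and used for illustration only.

```r
# Richards curve in the form assumed for this sketch
richards <- function(t, A, k, i, m) {
  A / (1 + m * exp(-k * (t - i)))^(1 / m)
}

t <- seq(0, 40, by = 0.5)
A <- 100; k <- 0.3; i <- 10

# Reference curves written out directly
logistic <- A / (1 + exp(-k * (t - i)))
gompertz <- A * exp(-exp(-k * (t - i)))

# m = 1 reproduces the logistic; m close to 0 approaches the Gompertz
diff_logistic <- max(abs(richards(t, A, k, i, m = 1) - logistic))
diff_gompertz <- max(abs(richards(t, A, k, i, m = 1e-6) - gompertz))

print(c(logistic = diff_logistic, gompertz = diff_gompertz))
```

In this naive form, a negative shape parameter such as m = −0·3 makes the bracketed term negative for values of t far below the inflection point, so the expression is undefined there; the sketch does not attempt to handle that case.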
The output from SSposnegRichards() feeds directly into functions such as nls(), nlsList() and nlme() (Pinheiro et al. 2007) and is thus compatible with all methods for these functions [e.g. anova()].
FlexParamCurve includes functions [pn.mod.compare() and pn.modselect.step()] to determine the most suitable reduction in the double-Richards curve for a data set. These fit models in nlsList() (Pinheiro et al. 2007), yielding nonlinear least-squares (NLS) fits for each group (e.g. each individual in a growth analysis). This represents the suitability of a particular curve more robustly than a simple NLS across all groups (which ignores individual contributions; Fig. 2).
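The difference between per-group fits and a single pooled fit can be seen with a small sketch. It assumes the nlme package (a recommended package distributed with R) and uses the built-in Orange data set with the self-starting logistic SSlogis() as stand-ins; it does not call FlexParamCurve.

```r
library(nlme)  # provides nlsList(); the Orange data set ships with R

# One logistic NLS fit per tree (the grouping factor follows "|")
fits_by_tree <- nlsList(circumference ~ SSlogis(age, Asym, xmid, scal) | Tree,
                        data = Orange)

# A single NLS fit across all trees, ignoring which tree each point came from
fit_pooled <- nls(circumference ~ SSlogis(age, Asym, xmid, scal),
                  data = Orange)

print(coef(fits_by_tree))  # one row of estimates per tree
print(coef(fit_pooled))    # one set of estimates for the whole data set
```

The per-tree table shows how much the parameters vary among individuals, which the pooled fit hides by averaging over them.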
pn.mod.compare() ranks candidate nlsList() models according to penalized root-mean-square error (pRSE′):

$$\mathrm{pRSE}' = \frac{\sqrt{\sum \sigma^{2} / \beta}}{\sqrt{n}}$$

where σ² is the estimated variance (square of residual standard error) for each of the β fitted grouping levels and n is the number of data points fitted. √(Σσ²/β) is root-mean-square error (RSE) and by default this is divided by √n (thus, pRSE′ represents per-level measurement error discounted by the square root of sample size). This penalizes models that fit only a few groups and consequently have low RSE (because there is likely less variation in fit among fewer levels) and allows comparison of nonnested models (as it uses residual squared error rather than maximum likelihood). Users can also edit the formulation of pRSE′ to match their desired balance between sensitivity and specificity.
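As a worked illustration of the default penalty, the sketch below computes √(Σσ²/β)/√n in base R. The function name `prse()` and the residual standard errors are hypothetical; this is not the package's internal code.

```r
# Penalized root-mean-square error: sqrt(sum(sigma^2) / beta) / sqrt(n)
#   sigma: residual standard error of each successfully fitted group
#   n:     total number of data points fitted
prse <- function(sigma, n) {
  beta <- length(sigma)
  rse  <- sqrt(sum(sigma^2) / beta)
  rse / sqrt(n)
}

# Hypothetical example: model A fits 4 groups, model B fits only 2
sigma_a <- c(12, 15, 10, 14)
sigma_b <- c(9, 10)

rse_a <- sqrt(mean(sigma_a^2))
rse_b <- sqrt(mean(sigma_b^2))
prse_a <- prse(sigma_a, n = 40)
prse_b <- prse(sigma_b, n = 20)

print(c(rse_a = rse_a, rse_b = rse_b, prse_a = prse_a, prse_b = prse_b))
```

With these made-up values, model B has the lower RSE but the higher pRSE′, because it was fitted to fewer data points; the penalty therefore favours model A.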
Processing time can be considerable for multiple nlsList() models with many groups, so pn.mod.compare() and pn.modselect.step() first evaluate the parsimony of a fixed shape parameter m. Initially, an extra sum-of-squares F-test compares the full 8-parameter model (model 1) with a 7-parameter model (model 21) in which m is fixed to the mean across the data set. If the 8-parameter model provides a significantly better fit, subsequent reductions explore models in which m is estimated (modno = 2–16). Otherwise, subsequent evaluations use the same reductions but with m fixed to the mean across the data set (modno = 22–36) (see Fig. 1 and Appendix S1 Table S1·2).
After assessing the need to estimate m, pn.modselect.step() uses backwards, step-wise selection of subsequent nlsList() models. At the next step, four candidate models (each with one of the four second-curve parameters, A′, k′, i′, m′, fixed at its mean value) are ranked by pRSE′ (as they are not mutually nested) and the highest-ranked reduction is compared with the general model (1 or 21) using extra sum-of-squares F-tests (Ritz & Streibig 2009). This rank-then-test procedure is used at all subsequent steps.
For additional flexibility, functions extraF() and extraF.nls() allow users to undertake extra sum-of-squares F-tests for any two nested nlsList() or nls() models, respectively.
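The extra sum-of-squares F-test used in these comparisons can be sketched in base R for two nested nls() fits. The helper `extra_f()` below is hypothetical and is not the package's extraF.nls(); the Orange data set, the logistic formula, the fixed asymptote of 200 and the starting values are assumptions chosen only to give a nested pair of models.

```r
# Extra sum-of-squares F-test for two nested nls() fits
extra_f <- function(reduced, general) {
  rss_r <- deviance(reduced)      # residual sum of squares, reduced model
  rss_g <- deviance(general)      # residual sum of squares, general model
  df_r  <- df.residual(reduced)
  df_g  <- df.residual(general)
  f <- ((rss_r - rss_g) / (df_r - df_g)) / (rss_g / df_g)
  p <- pf(f, df_r - df_g, df_g, lower.tail = FALSE)
  c(F = f, df1 = df_r - df_g, df2 = df_g, p = p)
}

# General model: three-parameter logistic with all parameters estimated
general <- nls(circumference ~ Asym / (1 + exp((xmid - age) / scal)),
               data = Orange,
               start = list(Asym = 190, xmid = 730, scal = 350))

# Reduced model: asymptote fixed at an illustrative value of 200
reduced <- nls(circumference ~ 200 / (1 + exp((xmid - age) / scal)),
               data = Orange,
               start = list(xmid = 730, scal = 350))

res <- extra_f(reduced, general)
print(res)
```

A large p-value indicates that estimating the extra parameter does not significantly improve the fit, so the reduced model would be retained. For nls() fits, base R's `anova(reduced, general)` reports the same kind of comparison.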
Using FlexParamCurve: Examples from Avian Growth Analyses
The help files for FlexParamCurve provide illustrative examples; see also Figs 1–3 and Appendix S2. Here, we demonstrate the general approach for using FlexParamCurve to determine the most suitable parametric curve and then fit NLS models or nonlinear mixed-effects models. We use published data on growth of common terns (Sterna hirundo Linnaeus) (Nisbet 1975; Nisbet, Wilson & Broad 1978; tern.data) and little penguins (Eudyptula minor Forster) (Chiaradia & Nisbet 2006; penguin.data) and a simulated data set for black-browed albatrosses (Thalassarche melanophrys Temminck; posneg.data, see help file). Appendix S3 provides code for these examples.
1 Run function modpar() to generate a list of initial parameter estimates, fitting options and parameter bounds. This provides information needed to fit [using SSposnegRichards()] and predict [using posnegRichards.eqn()] and can subsequently be modified manually or with change.pnparameters(). Calling either model selection routine automatically calls modpar() if a suitable list is not supplied.
2 Perform model selection using pn.mod.compare() and pn.modselect.step(). These functions may suggest different reductions in the double-Richards curve because pn.mod.compare() is more sensitive to curves with low pRSE′ and pn.modselect.step() relies on sequential model reduction. For example, both routines selected a double-Gompertz curve (modno = 22, Table 1, Appendix S1 Table S1·2) as the best fit to the posneg.data data set. In contrast, they each suggested different final models for both penguin and tern data sets (Table 1). For penguins, pn.mod.compare() selected model (modno) 31, a 4-parameter model including one second-curve parameter that fitted 90% (122/150) of the individuals in the data set (Table 1, Fig. 2), rather than the anticipated (Chiaradia & Nisbet 2006) double-Gompertz curve that required two second-curve parameters (modno = 34). For terns, pn.mod.compare() selected model (modno) 32, a 3-parameter model with the shape parameter m fixed at 0·72 (mean across the data set); this fitted 89% (67/75) of the individuals in the data set (Table 1, Fig. 2) and was similar in shape to a logistic curve (m = 1·0).
Table 1. Top-ranked models by pn.mod.compare() (first subtable) and stepwise selection by pn.modselect.step() (second subtable) for (a) posneg.data (100 levels), (b) little penguin (150 levels) and (c) common tern (75 levels) data sets. For pn.mod.compare(), models are ranked according to minimized, penalized root-mean-square error (pRSE′) (lowest value in bold), No. of levels fit is the number of groups (individual chicks) parameterized in nlsList() and No. of params is the number of parameters. For pn.modselect.step(), only the most general and most reduced models are shown (see Appendix S1 Tables S1·4–6 for full output)
‘Selected’ indicates the model (modno) preferred from extra F comparisons between the reduced (‘Reduced’) and more general (‘General’) models tested at this step; extra F statistics are given. Preferred model at Step 6 (bold) is deemed the most suitable. For additional detail, see Appendix S1.
3 Fit NLS [nls()] or nonlinear mixed-effects models [nlme()] using the most suitable curve in SSposnegRichards(). Model selection in nls() or nlme() can then investigate effects of factors, variates or covariates (fixed or random) on the parameters selected (Pinheiro & Bates 2000; p. 377–409). For example, the penguin data set contains data from two contrasting years (Chiaradia & Nisbet 2006). When analysed within a single NLME model (Appendix S3) both yearly and seasonal differences are evident (Fig. 3).
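The final step, fitting a nonlinear mixed-effects model with a self-starting curve, can be sketched as follows. The sketch assumes the nlme package and uses the built-in Loblolly data set with the self-starting asymptotic model SSasymp() as stand-ins for the selected double-Richards reduction; it does not reproduce the penguin analysis, and the starting values are taken as reasonable guesses for this data set.

```r
library(nlme)  # provides nlme(); the Loblolly data set ships with R

# Mixed-effects fit: fixed effects for all three parameters and a random
# effect on the asymptote; the grouping factor (Seed) is taken from the
# grouped structure of the Loblolly data
fit_nlme <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
                 data   = Loblolly,
                 fixed  = Asym + R0 + lrc ~ 1,
                 random = Asym ~ 1,
                 start  = c(Asym = 103, R0 = -8.5, lrc = -3.3))

print(fixef(fit_nlme))        # population-level parameter estimates
print(head(ranef(fit_nlme)))  # per-group deviations in the asymptote
```

Covariates such as year could then be added through the `fixed` argument (for example, a formula of the form `Asym ~ covariate`), and competing models compared with `anova()`.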
FlexParamCurve provides ways to fit, plot and compare a multiplicity of monotonic or nonmonotonic parametric curves in R, using NLS and mixed-effects models. This permits modelling of nonmonotonic relationships with relatively small data sets, including studies of growth, migration and seasonal vegetation dynamics, both when data are expected to follow a particular nonmonotonic relationship and when the relationship is as yet unexplored.
We thank Sinéad English for data and comments during testing, Dieter Menne for advice on optimization routines and Timothy Paine and one anonymous reviewer for helpful comments on the manuscript.