rTPC and nls.multstart: A new pipeline to fit thermal performance curves in R

Quantifying thermal performance curves (TPCs) for biological rates has many applications to important problems such as predicting responses of biological systems, from individuals to communities, to directional climate change or climatic fluctuations. Current software tools for fitting TPC models to data are not adequate for dealing with the immense size of new datasets that are increasingly becoming available. We require tools capable of tackling this issue in a simple, reproducible and accessible way. We present a new, reproducible pipeline in R that allows for relatively simple fitting of 24 different TPC models using nonlinear least squares (NLLS) regression. The pipeline consists of two packages, rTPC and nls.multstart, that provide functions which conveniently address common problems with NLLS fitting such as the NLLS parameter starting values problem. rTPC also includes functions to set starting values, estimate key TPC parameters and calculate uncertainty around parameter estimates as well as the fitted model as a whole. We demonstrate how this pipeline can be combined with other packages in R to robustly and reproducibly fit multiple mathematical models to multiple TPC datasets at once. In addition, we show how model selection or averaging, weighted model fitting and bootstrapping can be easily implemented within the pipeline. This new pipeline provides a flexible and reproducible approach that makes the challenging task of fitting multiple TPC models to data accessible to a wide range of users across ecology and evolution.


| INTRODUCTION
Thermal performance curves (TPCs) describe how biological rates such as growth, photosynthesis and respiration change with temperature. TPCs and the parameters that underpin them have been used widely in biology, from studying thermal adaptation (Kontopoulos, Van Sebille, et al., 2020; Schaum et al., 2017), to predicting ectotherm range shifts (Sinclair et al., 2016; Sunday et al., 2012) and changes in disease dynamics (Molnár et al., 2013; Mordecai et al., 2019) under expected climate change. Studies looking across large spatial scales, or using a comparative approach, need to fit mathematical models to large datasets of hundreds or even thousands of TPCs across multiple taxonomic groups. Increasingly large datasets require reproducible and user-friendly computational pipelines for fitting multiple, competing TPC models to data. A few R packages (e.g. temperatureresponse (Low-Décarie et al., 2017) and devRate (Rebaudo et al., 2018)) provide methods for fitting TPC models to data using nonlinear least squares regression (NLLS). While these packages are a significant advance to TPC model fitting, no single pipeline addresses three key requirements: (a) implementation of a representative number of mathematical models, (b) methods to overcome the well-known sensitivity of NLLS algorithms to parameter starting values and (c) calculation of uncertainty in parameter estimates and the model fit as a whole.
The first requirement, fitting a sufficient number of mathematical models, is important because a large number of mathematical models have been proposed to quantify TPC data (DeLong et al., 2017; Krenek et al., 2011). This makes the challenge of determining the 'best' model for any given dataset particularly difficult. A few papers have evaluated the performance of TPC models (Angilletta Jr., 2006; Krenek et al., 2011; Shi & Ge, 2010; Shi et al., 2016, 2017). The most comprehensive analysis to date compared 12 models, and demonstrated how model choice alters the predicted species-level response to temperature (Low-Décarie et al., 2017). However, despite the wide uptake of model selection across ecology and evolution (Johnson & Omland, 2004), fitting multiple mathematical models to TPCs remains rare in practice. Instead, a single model is typically used, either because of its mechanistic underpinnings or simply because it is 'well known' and provides 'adequate' estimates of desired parameters (e.g. optimum temperature). Indeed, there is likely no 'best' model to use for fitting TPCs, with different models proving the most appropriate for different biological processes, taxa and levels of data quality.
Allowing users with different research questions and modelling requirements to fit TPCs in a reproducible manner requires a pipeline that is flexible and modular. The second requirement, finding starting values, is a well-known challenge with NLLS model fitting (Burnham & Anderson, 2002), and generally requires the development of bespoke methods that vary with the mathematical model. This issue is particularly challenging when it comes to mathematical TPC models because of the number and diversity of models that are available. Finally, the third requirement, calculating uncertainty, is especially important for TPCs when fitting models to multiple datasets from diverse taxonomic groups or traits, as the data can vary widely in sampling replication, measurement accuracy and coverage of temperature range.
Here, we present rTPC and nls.multstart, two open-source R packages that provide the basis for a pipeline to robustly and reproducibly fit TPCs by addressing these three key requirements. The pipeline allows the fitting of 24 different TPC model formulations, and we demonstrate how multiple models can be fitted to the same curve, as well as how multiple datasets can be fitted. We also describe new helper functions within rTPC for the estimation of start parameters, upper and lower parameter limits and the calculation of commonly used parameters (e.g. optimum temperature, activation energy or Q10). Finally, we illustrate how this pipeline can be used for model selection and model averaging, as well as how weighted model fitting and bootstrapping implemented using rTPC can be used to account for parameter and model uncertainty.

| PIPELINE OVERVIEW
The goal of rTPC and the associated pipeline is to make fitting TPCs easier, repeatable and transparent. Tutorials can be found at https://padpadpadpad.github.io/rTPC where all vignettes are available.
When developing rTPC, we made a conscious decision not to repeat code and methods that are already optimised and available in the R ecosystem. Instead, they are utilised and incorporated into the pipeline (see Table S1 for a list of R packages used). This modularity of design improves flexibility, allowing users to incorporate rTPC and nls.multstart into their own pipelines, but still benefit from the helper functions.

| Pre-processing of data before using rTPC
rTPC can fit TPCs to any biological rate or fitness proxy that shows a unimodal response to temperature. Data need to be stored in long format, where each row is one rate measurement per curve.
This means that each TPC will have multiple rows in the data frame, with extra treatment columns added to distinguish between curves (Figure 1a). Pre-processing to reformat data to long format can easily be done using, for example, tidyr::pivot_longer().
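The long-format layout described above can be illustrated with a minimal sketch in plain Python (used here only for a self-contained illustration; it is not part of the pipeline). The column names `curve_id`, `temp` and `rate`, and the data values, are hypothetical.

```python
# Hypothetical wide-format data: one row per curve, one column per temperature.
wide_rows = [
    {"curve_id": "A", "16": 0.21, "20": 0.45, "24": 0.80},
    {"curve_id": "B", "16": 0.18, "20": 0.39, "24": 0.71},
]


def to_long(rows, id_col="curve_id"):
    """Return one record per (curve, temperature) rate measurement."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue  # the identifier is copied onto every long row instead
            long_rows.append(
                {id_col: row[id_col], "temp": float(key), "rate": value}
            )
    return long_rows


long_rows = to_long(wide_rows)
for record in long_rows:
    print(record)
```

Each curve now spans several rows, and the identifier column plays the role of the treatment columns that distinguish curves.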
Models were chosen through an extensive search of the primary literature and review papers of TPC model performance. Most models are named after the author who first formulated the model and the year of its first use (e.g. thomas_2012()). A list of all models in rTPC can be accessed using get_model_names(). Models can be characterised by whether they appropriately model negative rates before and after the optimum temperature (Table S2). As the search was not exhaustive, some models (for example, the Logan model; Logan et al., 1976) are not currently implemented. However, requests to add new models can be made on rTPC's GitHub repository: https://github.com/padpadpadpad/rTPC/issues.

| Reliable NLLS fitting using nls.multstart
The Gauss-Newton (implemented in nls) and the Levenberg-Marquardt [...] (Elzhov et al., 2016). Starting values can be generated in two ways. The first is a simple, constrained random search, where starting parameter values are sampled from a uniform distribution between pre-defined lower and upper start values for each parameter. The second is a grid search, where the start parameters are generated such that the space between the given parameter bounds is evenly sampled. The best model is then picked and returned using Akaike's information criterion corrected for small sample size (AICc) (Padfield & Matheson, 2018).
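The two ways of generating starting values can be sketched as follows. This is a plain-Python illustration of the idea only, not the code used by nls.multstart; the function names, parameter names and bounds are all hypothetical. In a full multi-start fit, each candidate would be passed to a local NLLS optimiser and the converged fits compared.

```python
import itertools
import random


def random_starts(lower, upper, n_starts, seed=1):
    """Sample n_starts parameter sets uniformly between lower and upper bounds."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lower[name], upper[name]) for name in lower}
        for _ in range(n_starts)
    ]


def grid_starts(lower, upper, points_per_param):
    """Build every combination of evenly spaced values between the bounds.

    points_per_param must be at least 2 so that both bounds are included.
    """
    axes = []
    for name in lower:
        step = (upper[name] - lower[name]) / (points_per_param - 1)
        axes.append([lower[name] + i * step for i in range(points_per_param)])
    return [dict(zip(lower, combo)) for combo in itertools.product(*axes)]


# Hypothetical start bounds for a three-parameter model.
lower = {"rmax": 0.5, "topt": 15.0, "a": 2.0}
upper = {"rmax": 2.0, "topt": 35.0, "a": 10.0}

random_candidates = random_starts(lower, upper, n_starts=100)
grid_candidates = grid_starts(lower, upper, points_per_param=5)
print(len(random_candidates), len(grid_candidates))
```

The grid grows as the number of points raised to the number of parameters, which is one reason a random search can be preferable for models with many parameters.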
nls.multstart can also be used for fitting other nonlinear models used in biology, such as logistic growth curves (Padfield et al., 2020) and photosynthesis-irradiance curves.

| Calculating derived TPC parameters
One common motivation for fitting TPCs to mathematical models is to extract key TPC parameters, such as optimum temperature or [...] (Figure 1e). calc_params() does not return estimates of uncertainty in these derived parameters.
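One generic way to obtain derived parameters from a fitted curve is to evaluate it over a fine temperature grid. The sketch below, in plain Python, illustrates that idea and is not a description of how calc_params() works internally. The Gaussian curve, its parameter values and the 80% threshold used for thermal breadth are assumptions chosen for illustration.

```python
import math


def gaussian_tpc(temp, rmax, topt, a):
    """Illustrative symmetric Gaussian curve; rate peaks at rmax when temp == topt."""
    return rmax * math.exp(-0.5 * ((temp - topt) / a) ** 2)


def derived_params(curve, t_min, t_max, step=0.01):
    """Estimate optimum temperature, maximum rate and breadth from a fine grid."""
    n_steps = int(round((t_max - t_min) / step))
    temps = [t_min + i * step for i in range(n_steps + 1)]
    rates = [curve(t) for t in temps]
    i_best = max(range(len(rates)), key=rates.__getitem__)
    rmax = rates[i_best]
    # Breadth here: width of the temperature range where rate >= 80% of rmax.
    above = [t for t, r in zip(temps, rates) if r >= 0.8 * rmax]
    return {"topt": temps[i_best], "rmax": rmax, "breadth": above[-1] - above[0]}


def fitted(temp):
    """Curve with hypothetical fitted parameter values."""
    return gaussian_tpc(temp, rmax=1.5, topt=27.0, a=6.0)


params = derived_params(fitted, t_min=10.0, t_max=45.0)
print(params)
```

Because these values come from a single set of parameter estimates, they carry no uncertainty on their own; resampling approaches such as bootstrapping are one way to attach intervals to them.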

Below we give examples of potential applications and extensions to the pipeline, why they are important, and guidance as to how they can be incorporated.

| Model selection and model averaging
The 'best' model for one dataset is not necessarily the best across other datasets. Fitting multiple models to TPCs allows the user to select the most suitable model for their research question.
Our pipeline provides a flexible approach to help with model selection. For example, after fitting a number of potential models, AICc scores can be used to rank the models for each individual curve fit and pick the best overall model across all curves in a dataset. Alternatively, one may choose the best model specific to each TPC, or use model averaging to obtain an overall TPC curve and parameter estimates by weighting each model's fit by its AICc (Figure 2). vignette("model_selection_averaging") provides an example of how to implement model selection and model averaging.
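The ranking and averaging steps can be sketched numerically. This plain-Python example uses the least-squares form of AICc and Akaike weights; it is an illustration, not the vignette's code. The model names, residual sums of squares and predictions are hypothetical, and conventions differ on whether the residual variance is counted in the number of parameters k.

```python
import math


def aicc(rss, n, k):
    """AICc for a least-squares fit with n points and k estimated parameters."""
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert a dict of AICc scores into weights that sum to one."""
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# Hypothetical fits of three models to one curve with n = 12 points:
# (residual sum of squares, number of parameters, predicted rate at 25 degrees C)
fits = {
    "model_a": (0.040, 3, 1.42),
    "model_b": (0.035, 4, 1.47),
    "model_c": (0.090, 3, 1.30),
}
n = 12

scores = {m: aicc(rss, n, k) for m, (rss, k, _) in fits.items()}
weights = akaike_weights(scores)
best_model = min(scores, key=scores.get)  # lowest AICc is preferred
averaged_prediction = sum(weights[m] * fits[m][2] for m in fits)
print(best_model, weights, averaged_prediction)
```

Repeating the weighted sum at every temperature on a grid gives a model-averaged curve, and the same weights can be applied to derived parameters.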

| KEY CONSIDERATIONS WHEN FITTING TPCS
Effective fitting of TPCs depends on decisions made during experimental design, data collection and model choice.

| Data considerations
For effective fitting of TPCs, the number of unique temperature values used, the level of replication at each temperature and the temperature range all need to be considered. In the (common) scenario where all three cannot be maximised, the objective of the TPC fitting, and the parameters of particular interest, need to be considered. For example, in thermodynamic models, if the objective is to quantify the activation energy accurately, thermal range can be traded off for a finer degree of temperature resolution in the operational temperature range of the study organism (Pawar et al., 2016). It is particularly important to consider the level of replication at each temperature. Sampling multiple individuals at each temperature can give multiple individual TPCs of a population, which could be used to evaluate intraspecific variation in traits.

| Which models to fit
The decision on which TPC models to fit largely depends on the type and quality of data, and the questions being asked. In terms of data requirements, there must be at least k + 1 points to fit a given model, where k is the number of model parameters. However, in NLLS fitting, the minimum number of data points needed to reliably fit a model to data can vary with the mathematical structure of the model (Burnham & Anderson, 2002), so in general, 'the more the [...] Kontopoulos et al., 2018 in the case of the Sharpe-Schoolfield model).
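The k + 1 requirement can be applied as a simple screen before fitting. The plain-Python sketch below uses hypothetical model names and parameter counts. It checks only the stated minimum, which, as noted above, may not be enough for a reliable fit.

```python
def models_with_enough_data(temps, n_params_by_model):
    """Keep models whose parameter count k satisfies n_points >= k + 1."""
    n_points = len(temps)
    return [
        name for name, k in n_params_by_model.items() if n_points >= k + 1
    ]


# Hypothetical parameter counts; check each model's definition before relying on them.
n_params_by_model = {
    "three_param_model": 3,
    "four_param_model": 4,
    "six_param_model": 6,
}
temps = [16, 19, 22, 25, 28]  # five measurements on one curve

eligible = models_with_enough_data(temps, n_params_by_model)
print(eligible)
```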

| CONCLUDING REMARKS
The pipeline presented here allows TPC data to be fitted to mathematical models in a simple, reproducible and flexible way. rTPC includes 24 mathematical models that represent the wide diversity of nonlinear TPC models available in the literature, and nls.multstart allows this set of models to be reliably fitted to data using NLLS by addressing the starting values problem. However, this pipeline does not accommodate non-independent (related) replicates, and clustered or stratified sampling (possibly with missing values). In such situations, nonlinear mixed effects model fitting (e.g. using the nlme R package; Oddi et al., 2019) or Bayesian approaches (e.g. using the brms R package; Bürkner, 2017) would be more appropriate.
Nevertheless, for fitting massive TPC datasets to multiple mathematical models, rTPC offers a simple, reliable and reproducible computational pipeline with robust methods for calculation of model uncertainty, requiring minimal statistical and computational expertise, and suitable for a wide range of applications.

AUTHORS' CONTRIBUTIONS
D.P. conceived the ideas and designed the pipeline; D.P. authored the R package and wrote the initial draft. All the authors contributed to developing the manuscript and gave final approval for publication.

PEER REVIEW
The peer review history for this article is available at https://publo ns.

DATA AVAILABILITY STATEMENT
All data and code used in rTPC are archived at https://doi.