Sample size assessments for thermal physiology studies: An R package and R Shiny application

Required sample sizes for a study need to be carefully assessed to account for logistics, cost, ethics and statistical rigour. For example, many studies have shown that methodological variations can affect the critical thermal limits (CTLs) recorded for a species, although studies on the impact of sample size on these measures are lacking. Here, we present ThermalSampleR, an R package (available on CRAN) and Shiny application that can assist researchers in determining when adequate sample sizes have been reached for their data. The method is particularly useful because it is not taxon specific. The Shiny application offers a user-friendly interface equivalent to the package for users not familiar with R programming. ThermalSampleR is accompanied by an in-built example dataset, which we use to guide the user through the workflow with a fully worked tutorial.


INTRODUCTION
Insufficient sample sizes in a study represent a waste of resources by not having the power to reliably detect patterns in the data, which can lead to incorrect inferences and inappropriate management interventions (Duffy et al., 2021). Oversized studies consume more resources than is necessary, which imposes unnecessary costs and provides little improvement in the ability to answer particular ecological research questions (Forcino et al., 2015). For studies that involve animals, and particularly threatened species, sample size determination is important for ethical reasons too (Duffy et al., 2021). Indeed, many journals, institutions and ethics committees require that researchers justify the number of samples used during the study (Hampton et al., 2019), which should be determined as the minimum sample size necessary to achieve the goals of the study (Fitts, 2011).
Recently, several studies have shown that the results and inferences obtained from thermal tolerance studies can be significantly affected by methodological choices when designing and performing the experiments, such as the use of a pre-experimental acclimation period, the temperature ramping rate and ramping intervals (e.g., Chown et al., 2009; Nyamukondiwa & Terblanche, 2009; Rezende et al., 2014). Similarly, Duffy et al. (2021) demonstrated that the number of individuals tested during thermal tolerance studies (sample size) can significantly bias the results obtained and any inferences drawn from these studies. Determining the sample size requirements for a study is an essential component of study design, which can have serious consequences for the logistics, cost, ethics and statistical rigour of the study (Arnold et al., 2011; Gerrodette, 1987).
Insect thermal limit studies are plentiful and therefore offer an ideal source of data for exploring sample size requirements. Most insects are poikilothermic ectotherms (Sinclair et al., 2015), and so their bodily functions and life history characteristics are strongly correlated to the ambient microclimate (Neven, 2000; Nguyen et al., 2014; Sinclair et al., 2015). To survive and reproduce, insect body temperatures need to be maintained within the limits of their thermal tolerance range (Koštál et al., 2011; Nguyen et al., 2014). As such, thermal tolerances can be used to explain the geographical distributions (Rezende et al., 2014; Sinclair et al., 2015) and the performance of insects under different environmental conditions (Nguyen et al., 2014; Nyamukondiwa & Terblanche, 2009; Sinclair et al., 2015). In this vein, thermal tolerance investigations have been used to determine the establishment and spread of insect pests (Wang et al., 2019) and biological control agents (Coetzee et al., 2007).
The applicability of these studies has increased recently as researchers aim to forecast changes in faunal and floral assemblages under current and future climate change scenarios (Bennett et al., 2018; Duffy et al., 2015; Rezende et al., 2014).
In this paper, we present ThermalSampleR, an R package and R Shiny graphical user interface (GUI) application that allows users to easily assess the sample sizes required to obtain reliable and accurate thermal physiology parameters (e.g., critical thermal limits [CTmin/CTmax]). ThermalSampleR is designed to make analysing sample size requirements simple and provide easily interpretable summary statistics. The Shiny GUI provides the functionality of the full R package to researchers with little to no experience in R.

PACKAGE BACKGROUND
Several tools and analyses have been developed to aid in sample size planning for biological studies, primarily focusing on the use of power calculations (Peterman, 1990; Toft & Shea, 1983). The power of a statistical test refers to the probability that the test correctly rejects a false null hypothesis. However, power calculations are centred on assessing whether sample sizes are large enough to detect a statistically significant difference between groups (i.e., correctly rejecting the null hypothesis using a p-value). They are, therefore, of little use for estimating the critical thermal limit (CTL) of a single population or assessing the accuracy and precision of between-group differences in thermal tolerance parameters. To remedy this, many researchers have adopted the practice of calculating effect sizes (e.g., difference in means/medians) and 95% confidence intervals (CIs) as a more rigorous and intuitive way to make comparisons amongst groups, rather than simply relying on a p-value (Gardner & Altman, 1986; Halsey, 2019; Nakagawa & Cuthill, 2007). Practitioners need to consider sample size planning for both power and accuracy in parameter estimation (AIPE), which require different statistical approaches (Maxwell et al., 2008).
To account for sample size planning for both power and AIPE, the ThermalSampleR package uses simulation and bootstrap resampling procedures to calculate population parameters and CIs (Maxwell et al., 2008). The CI approach to power planning has the added benefit (as compared to obtaining a p-value) of indicating a direction of effect. Moreover, CIs can be used to assess sample size planning for AIPE by computing and controlling the CI of the parameter of interest (Maxwell et al., 2008). This contains two distinct components: (1) planning for accuracy, whereby researchers assess the probability that the CI contains the true population parameter of interest (e.g., CTmin/CTmax), and (2) planning for precision, where precision is measured by the width of the CI (i.e., a smaller CI width indicates a more precise estimate of the population parameter; Maxwell et al., 2008).
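The precision component can be illustrated with a minimal base-R sketch. It assumes simulated (hypothetical) CTmin values and a t-based 95% CI for the mean; the helper ci_width_at_n() is introduced here for illustration only, and the package's internal calculations may differ.

```r
# Minimal sketch: mean width of a 95% CI for the mean across sample sizes.
# The data are simulated (hypothetical CTmin values), not taken from the package.
set.seed(1)
ctmin <- rnorm(30, mean = 4, sd = 1.5)

# For a given sample size n, draw `iter` bootstrap resamples (with replacement)
# and return the mean width of the t-based 95% CI across those resamples.
ci_width_at_n <- function(x, n, iter = 200) {
  widths <- replicate(iter, {
    resample <- sample(x, size = n, replace = TRUE)
    half_width <- qt(0.975, df = n - 1) * sd(resample) / sqrt(n)
    2 * half_width
  })
  mean(widths)
}

sample_sizes <- 3:30
precision <- data.frame(
  n = sample_sizes,
  mean_ci_width = sapply(sample_sizes, function(n) ci_width_at_n(ctmin, n))
)
head(precision)
```

Plotting mean_ci_width against n gives the kind of curve that is inspected for a plateau: once the width stops shrinking appreciably, additional individuals add little precision.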
CTL studies can be divided into two broad categories: single-sample studies and multiple-group comparison studies. Single-sample studies use an estimate of a population parameter of interest, such as the CTmin/CTmax of a single population of a species. These kinds of studies are usually descriptive, or may be used to predict the best release sites in the country of introduction for a new biocontrol agent, or how insects could be expected to respond to climate change (e.g., Coetzee et al., 2007). Two- or multiple-group comparison studies use an estimate of the possible difference in CTLs between different groups. Examples of these kinds of studies include, amongst others, those where multiple species or populations of a biological control agent need to be compared to determine which would be better suited for release at a specific site, or where the CTLs of groups exposed to different environmental conditions are compared to determine whether acclimation is possible (e.g., Porter et al., 2019). The functions provided within the ThermalSampleR package are distinguished by whether the experimental data originate from a single-sample study (boot_one() and plot_one_group()) or a multiple-group comparison study (boot_two() and plot_two_groups()).
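For the two-group case, the quantity of interest is the difference in mean CTL between groups. The sketch below is a base-R illustration under stated assumptions: the two groups are simulated (hypothetical) data, the CI comes from a Welch t-test, and diff_ci_at_n() is a helper named here for illustration rather than a package function.

```r
# Minimal sketch: 95% CI of the difference in means between two groups
# across sample sizes. Both groups are simulated (hypothetical) CTmin values.
set.seed(2)
adults <- rnorm(30, mean = 3.5, sd = 1.2)
nymphs <- rnorm(30, mean = 5.0, sd = 1.4)

# Resample n individuals per group (with replacement) `iter` times and
# average the Welch 95% CI limits of the difference (x minus y).
diff_ci_at_n <- function(x, y, n, iter = 200) {
  res <- replicate(iter, {
    xs <- sample(x, size = n, replace = TRUE)
    ys <- sample(y, size = n, replace = TRUE)
    tt <- t.test(xs, ys)  # Welch two-sample t-test
    c(lower = tt$conf.int[1], upper = tt$conf.int[2])
  })
  c(n = n,
    mean_lower = mean(res["lower", ]),
    mean_upper = mean(res["upper", ]),
    mean_width = mean(res["upper", ] - res["lower", ]))
}

sample_sizes <- 5:30
diff_precision <- as.data.frame(
  t(sapply(sample_sizes, function(n) diff_ci_at_n(adults, nymphs, n)))
)
head(diff_precision)
```

If the averaged interval excludes zero at a given n, the direction of the difference is supported at that sample size; the width column indicates how precisely the difference is estimated.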

TUTORIAL
The following tutorial illustrates the core functions available within the ThermalSampleR package. Our goal is to provide an easy-to-follow and fully reproducible analysis of both a sample size assessment for (a) a single-sample study and (b) a multiple-groups comparison study.

Package installation
ThermalSampleR can be accessed by running one of the options below in R:
1. Via the CRAN repository
install.packages("ThermalSampleR")

In this example, the accuracy of our CTmin estimate was high once n > 10 individuals were tested. As noted above, because the true population parameter is estimated from the raw data, this analysis of parameter accuracy may be biased and should thus be interpreted with caution.
Take-home message: As long as the researchers were content with obtaining a CTmin estimate for adult C. schaffneri with a precision of approximately 1.2-1.5 °C, the experiment could be concluded at n = 15 individuals tested. Testing additional individuals beyond n = 15 would likely improve the precision of the CTmin estimate; however, the gain in precision must be considered in light of the logistics, costs and ethics of testing additional specimens.

Sample size assessment: multiple-group comparisons
ThermalSampleR also allows the user to estimate sample size adequacy for studies comparing the CTLs across multiple groups (e.g., testing for differences in CTmin between different taxa, populations, treatments applied and sexes). For example, the built-in example data (coreid_data) in ThermalSampleR contains CTmin data for 30 adults and 30 nymphs of C. schaffneri. Researchers may be interested in determining whether releasing adults or nymphs would lead to better establishment rates in the field. As such, the researchers could assess the CTmin of each life stage and use these data to release the life stage with the lower CTmin value, as it would be assumed to better tolerate low temperatures. To do this, we apply a similar workflow as per the 'single sample' assessments in the previous section. We use a bootstrap resampling procedure to estimate the width of the 95% CI of the difference in CTmin estimates between our two groups of interest (C. schaffneri adults vs. nymphs) across a range of sample sizes:

# Set a seed to make the results reproducible, for illustrative purposes.
set.seed(2012)
# Perform simulations
bt_two = ThermalSampleR::boot_two(
  # Which dataframe does the data come from?
  data = coreid_data,
  # Provide the column name containing the taxon ID
  groups_col = col,
  # Provide the name of the column containing the response variable (e.g., CTmin data)
  response = response,
  # Provide the name of the first taxon to be compared
  group1 = "Catorhintha_schaffneri_APM",
  # Provide the name of the second taxon to be compared
  group2 = "Catorhintha_schaffneri_NPM",
  # Maximum sample size to extrapolate to
  n_max = 49,
  # How many bootstrap resamples should be drawn?
  iter = 299)

testing approach. Their approach differed from ours by randomly resampling simulated datasets with varying skewness characteristics rather than resampling the raw data. Thereafter, the authors compare the mean and variance of smaller subsets of the full dataset to the full dataset using a 'two one-sided t-test' approach (Duffy et al., 2021). Tests applied were either standard one-sided t-tests (for normally distributed datasets) or Chen's modified one-sided t-test (Chen, 1995). The user can specify an equivalence margin indicating the acceptable degree of error between the data subsets and the full dataset (e.g., an equivalence margin of 1 °C indicates whether the mean or variance of the thermal limit for each subsample was within 1 °C of the full dataset). The value of the approach adopted by Duffy et al. (2021) is that it accounts for the often-skewed distribution of thermal limits datasets (Janion-Scheepers et al., 2018).
ThermalSampleR allows users to calculate sample size

2. GitHub
devtools::install_github("clarkevansteenderen/ThermalSampleR")
3. The R Shiny GUI can be accessed directly on the R console by running
library(shiny)
shiny::runUrl("https://github.com/clarkevansteenderen/ThermalSampleR_Shiny/archive/main.tar.gz")
or via the link to the R Shiny application server: https://clarkevansteenderen.shinyapps.io/ThermalSampleR_Shiny/

Data structure
This tutorial uses the coreid_data dataset as an example, which is a data frame/tibble included in the package. This dataset represents the CTmin data for the twig-wilting bug Catorhintha schaffneri (Hemiptera: Coreidae), a biological control agent introduced into South Africa from Brazil to control the invasive cactus Pereskia aculeata Miller (Cactaceae; Muskett et al., 2020). The dataset contains two columns, the first being col, which contains a unique identifier label (e.g., a species/taxon/population name), distinguishing data obtained from adults (Catorhintha_schaffneri_APM) or nymphs (Catorhintha_schaffneri_NPM). The second column, response, contains a numeric vector containing our response variable, the CTmin value (in °C). Each row represents a unique individual that was tested during the experiment. Before starting any analyses, we can inspect the raw data.

Sample size assessment: single sample
The simplest application of ThermalSampleR is to evaluate whether a study has used a sufficient sample size to accurately estimate a parameter of interest for a single taxon. Below, we illustrate this by performing these calculations to estimate sample sizes required to accurately estimate the CTmin of adults of C. schaffneri (denoted by Catorhintha_schaffneri_APM in coreid_data; Muskett et al., 2020). This simulation uses a bootstrap resampling procedure to estimate the width of the 95% CI of the parameter estimate across a range of sample sizes, which defaults to starting at n = 3 individuals tested, and which can be extrapolated to sample sizes greater than the sample size of the existing data by specifying a value to n_max:

# Set a seed to make the results reproducible, for illustrative purposes.
set.seed(2012)
# Perform simulations
bt_one = ThermalSampleR::boot_one(
  # Which dataframe does the data come from?
  data = coreid_data,
  # Provide the column name containing the taxon ID
  groups_col = col,
  # Provide the name of the taxon to be tested
  groups_which = "Catorhintha_schaffneri_APM",
  # Provide the name of the column containing the response variable (e.g., CTmin data)
  response = response,
  # Maximum sample size to extrapolate to
  n_max = 49,
  # How many bootstrap resamples should be drawn?
  iter = 299)

The variable containing the bootstrap resamples should then be passed to the plot_one_group() function to visualise the simulation results. A number of optional parameters can be passed to the function to alter the aesthetics of the graphs.

Inspecting Figure 1a, we visualise the precision of our CTmin estimate for adult C. schaffneri, whereby precision is measured as the width of a 95% CI. For example, in the context of CTLs, a CI width of 1 indicates that practitioners can be 95% confident that their CTL estimate is within 1 °C of the true CTmin value. The smaller the CI width, the greater the precision of the CTL estimate. In this example, the precision of our CTmin estimate was high and was not predicted to improve substantially by increasing sample size once approximately n = 20 individuals were tested, as the 95% CI reached a plateau at n = 20. The plateau is in the extrapolation section of the graph, indicating that more individuals would need to be tested for the 95% CI to become approximately stable. However, at the existing sample size of n = 15, the researchers could be relatively confident that the CTmin estimate they have obtained is precise to within approximately 1.2-1.5 °C. Researchers will need to decide for themselves what an acceptable degree of precision is for their own datasets.

Inspecting Figure 1b, we visualise the sampling distribution (i.e., the range of plausible CTmin values) for the taxon under study. This assessment can produce biased results at small sample sizes because the population parameter (e.g., the taxon's CTmin) is unknown and must therefore be estimated from the experimental data. Figure 1b gives an indication of parameter estimation accuracy by plotting the proportion of bootstrap resamples across each sample size for which the 95% CI included the estimated population parameter. An accurate parameter estimate should produce CIs that, on 95% of occasions, contain the estimated population parameter.
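The accuracy assessment described above (the proportion of resampled CIs that contain the estimated population parameter) can be sketched in base R. This is an illustration under stated assumptions: the data are simulated (hypothetical), the population parameter is estimated as the mean of the full dataset, and coverage_at_n() is a helper named here for illustration, not a package function.

```r
# Minimal sketch: proportion of bootstrap 95% CIs that contain the
# estimated population parameter, across sample sizes.
# The data are simulated (hypothetical CTmin values).
set.seed(3)
ctmin <- rnorm(30, mean = 4, sd = 1.5)
est_param <- mean(ctmin)  # parameter estimated from the full dataset

# For a given n, draw `iter` resamples and record whether each
# t-based 95% CI contains the target value.
coverage_at_n <- function(x, n, target, iter = 500) {
  covered <- replicate(iter, {
    resample <- sample(x, size = n, replace = TRUE)
    half_width <- qt(0.975, df = n - 1) * sd(resample) / sqrt(n)
    (mean(resample) - half_width <= target) &&
      (target <= mean(resample) + half_width)
  })
  mean(covered)
}

sample_sizes <- 3:30
coverage <- data.frame(
  n = sample_sizes,
  prop_covered = sapply(sample_sizes,
                        function(n) coverage_at_n(ctmin, n, est_param))
)
head(coverage)
```

Because est_param is itself computed from the same data being resampled, the coverage values inherit the bias noted in the text and are best read as indicative rather than exact.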

Figure 2a can be interpreted analogously to Figure 1a produced during the 'single sample' assessments in the previous section. Here, we are visualising the precision of our estimate for the difference in CTmin of C. schaffneri adults versus nymphs across sample sizes. In this example, where n = 30 individuals were tested for both adults and nymphs of C. schaffneri, the precision of our estimated difference between the

requirements using this Test of Total Equivalency (TOTE) as developed by Duffy et al. (2021), using the equiv_tost() function. Using the same coreid dataset from the previous sections, we illustrate below how to assess sample size requirements to precisely estimate the CTmin parameter for adult C. schaffneri across a range of sample sizes (i.e., in a single-sample study design):

tte = ThermalSampleR::equiv_tost(
  # Which dataframe does the data come from?
  data = coreid_data,
  # Provide the column name containing the taxon ID
  groups_col = col,
  # Provide the name of the taxon to be tested

FIGURE 3 Test of Total Equivalency output for Catorhintha schaffneri. Panel (a) shows the equivalence of means, and panel (b) shows the equivalence of variances. Both graphs are simulated for low (1) and high (10) skewness in the data and show a plateau in the curves.

FIGURE 4 Test of Total Equivalency using only six Catorhintha schaffneri individuals. Panel (a) shows the equivalence of means, and panel (b) shows the equivalence of variances. Both graphs are simulated for low (1) and high (10) skewness. Neither panel has reached a plateau.
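The 'two one-sided t-test' idea behind the equivalence of means can be sketched in base R. This is a simplified illustration and not the equiv_tost() implementation: it assumes simulated (hypothetical) data, treats the full-dataset mean as a fixed reference, uses standard one-sided t-tests only (no Chen's modified test, no variance equivalence), and the helpers tost_p() and prop_equiv_at_n() are named here for illustration.

```r
# Minimal sketch: two one-sided t-tests (TOST) of whether a subsample mean
# is within an equivalence margin of the full-dataset mean.
# The data are simulated (hypothetical CTmin values).
set.seed(4)
ctmin <- rnorm(30, mean = 4, sd = 1.5)
ref_mean <- mean(ctmin)  # full-dataset mean, treated as a fixed reference
margin <- 1              # equivalence margin (degrees C)

# TOST p-value: the larger of the two one-sided p-values.
tost_p <- function(sub, ref, margin) {
  p_lower <- t.test(sub, mu = ref - margin, alternative = "greater")$p.value
  p_upper <- t.test(sub, mu = ref + margin, alternative = "less")$p.value
  max(p_lower, p_upper)
}

# Proportion of random subsamples of size n (without replacement)
# declared equivalent to the full dataset at level alpha.
prop_equiv_at_n <- function(x, n, ref, margin, iter = 200, alpha = 0.05) {
  p_values <- replicate(iter, tost_p(sample(x, size = n), ref, margin))
  mean(p_values < alpha)
}

sample_sizes <- 3:30
equivalence <- data.frame(
  n = sample_sizes,
  prop_equivalent = sapply(sample_sizes,
                           function(n) prop_equiv_at_n(ctmin, n, ref_mean, margin))
)
head(equivalence)
```

A curve of prop_equivalent against n that levels off suggests that subsamples of that size reproduce the full-dataset mean to within the chosen margin, which is the sense in which the plateaus in Figures 3 and 4 are read.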