## Introduction

Repeatability is the fraction of total variation in a set of measurements that is due to variance among individuals. Estimates of repeatability are useful in a variety of ecological and evolutionary fields and can be obtained from repeated measurements on a number of individuals (Lessells & Boag 1987). In physiological and behavioural ecology, repeatability is often used to measure the extent to which an individual’s performance or behaviour is canalized (Bennett 1987; Lessells & Boag 1987; Boake 1989). Researchers interested in the evolution of traits often use estimates of repeatability as a rough upper limit to heritability (Boake 1989; Falconer & Mackay 1996; but see Dohm 2002). This is particularly useful in areas such as behavioural ecology, where the inability to obtain accurate heritability estimates under natural conditions has impeded studies of behavioural trait evolution (Boake 1989). In many areas of ecology and evolutionary biology, researchers seek to increase the power of their experiments and the accuracy of their conclusions by minimizing measurement error (Bailey & Byrnes 1990; Yezerinac, Lougheed & Handford 1992). For static or fixed specimens, the proportion of total variation due to measurement error is simply one minus the repeatability. In these cases, repeatability provides a standard means of quantifying measurement error for comparisons among traits and treatment groups.
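In symbols, writing $s^2_A$ for the among-individual variance and $s^2_W$ for the within-individual variance, this standard definition (following Lessells & Boag 1987) and, for static specimens, its complement for measurement error are:

```latex
r = \frac{s^2_A}{s^2_A + s^2_W},
\qquad
1 - r = \frac{s^2_W}{s^2_A + s^2_W}
\quad \text{(proportion of variation due to measurement error).}
```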

Estimating repeatability begins with *k* measurements of a trait on each of *n* individuals. The data can then be analysed with a one-way analysis of variance to obtain the variance components used to estimate the repeatability: the among-individual variance and the within-individual variance, the latter comprising both measurement error and real trait variation within each individual among its *k* measurements (Lessells & Boag 1987). Like other statistics (e.g. the mean and standard deviation), the estimated repeatability is merely a sample statistic that estimates a population parameter, in this case the true repeatability (Sokal & Rohlf 1995). This raises the question: how many individuals (*n*) must be measured, and how many times must each individual be measured (*k*), to obtain a precise estimate of repeatability?
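As a concrete illustration of this calculation, the following Python sketch estimates repeatability from the one-way ANOVA mean squares for a balanced design. The function and variable names are our own, and the code assumes a complete *n* × *k* table of measurements (no missing values).

```python
def repeatability(data):
    """Repeatability (intraclass correlation) from a one-way ANOVA
    on a balanced design: n individuals, each measured k times
    (following Lessells & Boag 1987).
    `data` is a list of n lists, each of length k."""
    n = len(data)
    k = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    ind_means = [sum(row) / k for row in data]
    # Mean square among individuals
    ms_among = k * sum((m - grand_mean) ** 2 for m in ind_means) / (n - 1)
    # Mean square within individuals (measurement error + real
    # within-individual variation)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(data, ind_means)
                    for x in row) / (n * (k - 1))
    # Among-individual variance component
    s2_among = (ms_among - ms_within) / k
    # r = s2_A / (s2_A + s2_W)
    return s2_among / (s2_among + ms_within)
```

Note that the among-individual variance component is derived from the difference of the two mean squares, so the point estimate can be negative when among-individual variation is weak relative to sampling noise.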

In the existing ecological, behavioural and evolutionary literature, there is no established guideline for determining the sample sizes (*n* and *k*) required to estimate repeatability. This is apparent from the inconsistency of sample sizes used in repeatability estimates across papers, coupled with a lack of justification for the particular sample sizes chosen. For example, in a review of repeatability estimates for behavioural traits, Bell, Hankison & Laskowski (2009) found repeatabilities based on measurements of as few as five and as many as 1318 individuals, with most individuals measured twice but some up to 60 separate times (mean *k* = 4·41). Although most papers using repeatability cite Lessells & Boag (1987) to indicate how the repeatability calculation was carried out, that instructive paper does not discuss the sampling error of the statistic and hence provides no guidelines for determining the sample size required to obtain precise estimates. Of course, the appropriate level of precision depends upon the question(s) one is trying to answer with the statistic. When assessing measurement error, researchers seek to quantify the extraneous variance introduced by the act of measurement, and so strive for an estimate with as narrow a confidence interval as possible. In other applications, such as estimating the repeatability of a particular behavioural response (e.g. thermal preference of *Drosophila subobscura*; Rego *et al.* 2010), the aim is instead to distinguish the repeatability estimate from some expected value (e.g. zero, when demonstrating significant among-individual variation). In the latter case, the precision with which repeatability must be estimated is contextualized by the distance between the two estimates being compared.
It is possible to test whether a particular repeatability estimate differs from some *a priori* chosen value (Donner & Eliasziw 1987; Walter, Eliasziw & Donner 1998), and researchers often use these methods to test the hypothesis that their repeatability estimates are significantly greater than zero. In some contexts this may be sufficient, but to convey the precision of the estimate, it is more informative and appropriate to report the magnitude of the repeatability with an associated confidence interval (Nakagawa & Cuthill 2007). Little attention has been paid to the precision of repeatability estimates in the ecological, behavioural and evolutionary literature, and confidence intervals are rarely reported. This issue has, however, received considerable attention in the medical literature (Donner & Eliasziw 1987; Walter, Eliasziw & Donner 1998; Bonett 2002), and one of our goals in this study is to introduce the methods derived therein to the behavioural, ecological and evolutionary research community.
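To illustrate the confidence-interval approach, here is a sketch of the exact F-based interval for the one-way ICC that is standard in the medical literature cited above. The function name is our own, and it takes the two required F-distribution quantiles as arguments (these could be obtained from, e.g., `scipy.stats.f.ppf`), so the algebra itself has no dependencies; check the formulas against the cited sources before relying on them.

```python
def icc_confidence_interval(f_obs, k, f_q_upper, f_q_lower):
    """Exact F-based 100(1 - alpha)% confidence interval for the
    one-way intraclass correlation from a balanced design.
    f_obs:     observed F ratio, MS_among / MS_within
    k:         number of measurements per individual
    f_q_upper: F quantile at 1 - alpha/2 with df (n - 1, n*(k - 1))
    f_q_lower: F quantile at 1 - alpha/2 with df (n*(k - 1), n - 1)
    Returns (lower, upper) bounds for the ICC."""
    fl = f_obs / f_q_upper          # shrunken F ratio -> lower bound
    fu = f_obs * f_q_lower          # inflated F ratio -> upper bound
    lower = (fl - 1) / (fl + k - 1)
    upper = (fu - 1) / (fu + k - 1)
    return lower, upper
```

An interval whose lower bound exceeds zero corresponds to rejecting the null hypothesis of no among-individual variation at the matching significance level.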

We begin by sampling recently published papers in the behavioural, ecological and evolutionary literature to assess the precision of existing estimates of repeatability and the frequency with which confidence intervals are reported. This survey makes it apparent that guidelines are needed to help researchers design protocols for assessing repeatability with precision. To that end, we introduce an equation for sample size estimation from Bonett (2002) and use simulations to test the validity of this method for data with diverse variance structures and over a wide range of intraclass correlation coefficients (ICCs). In general, we find that the prescriptive equation from Bonett (2002) agrees well with results from our simulated data. Finally, we conclude by discussing strategies for maximizing the precision of estimates while minimizing the total sample size, and by making recommendations for the future reporting of repeatability estimates. In addition, we have created an R software package, icc, based upon the results and methods discussed in this paper. The package provides a set of tools that researchers can use to design an experiment that maximizes the precision of repeatability estimation while minimizing effort.
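As a sketch of how the Bonett (2002) approximation can be used at the design stage, the following hypothetical function returns the approximate number of individuals *n* needed for a confidence interval of a desired total width, given a planning value of the ICC and *k* measurements per individual. This is our transcription of the published formula and should be verified against the original before use.

```python
import math

def bonett_n(rho, k, width, z=1.96):
    """Approximate n so that a 100(1 - alpha)% confidence interval
    for the ICC has the desired total width (our transcription of
    Bonett 2002; verify against the original before use).
    rho:   planning value of the ICC
    k:     measurements per individual
    width: desired confidence interval width
    z:     standard normal quantile z_{1 - alpha/2} (1.96 for 95%)"""
    n = (8 * z**2 * (1 - rho)**2 * (1 + (k - 1) * rho)**2
         / (k * (k - 1) * width**2)) + 1
    return math.ceil(n)
```

For a fixed total effort *n* × *k*, approximations of this kind make the trade-off between measuring more individuals and measuring each individual more often explicit, which is the design question the icc package is intended to address.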