Horn growth appears to decline under intense trophy hunting, but biases in hunt data challenge the interpretation of the evolutionary basis of trends

Abstract A recent article in Evolutionary Applications by LaSharr et al. reports on trends in the size of horns of bighorn sheep (Ovis canadensis) throughout much of the species’ range. The article concludes that there are “... stable or increasing trends in horn growth over nearly 3 decades in the majority of hunt areas throughout the western U.S. and Canada.” However, the article equates nonsignificance of predominantly negative trends in the areas with the most selective harvest as evidence for the null hypothesis of no trends and also fails to consider well‐known and serious biases in the use of data collected in size‐regulated hunts. By applying meta‐analysis to the estimates reported by LaSharr et al., we show that there has been a pervasive overall trend of declining horn sizes in Alberta, where the combination of horn size‐based legality, combined with unrestricted hunter numbers are understood to generate the greatest selective pressures. Given the nature of the biases in the underlying data, the magnitudes of the trends resulting from our re‐analysis of LaSharr et al.'s (Evolutionary Applications, 2019, 12, 1823) trend estimates are probably underestimated.


| INTRODUC TI ON
Human actions have the potential to generate artificial selection and evolution in wild populations (Hendry et al., 2017;Stockwell et al., 2003), and one of the most important agents of anthropogenic selection can be harvest (Fugère & Hendry, 2018). In a recent article in Evolutionary Applications, LaSharr et al., (2019) argue that horn size in bighorn sheep (Ovis canadensis) throughout their range is generally stable or increasing, largely irrespective of the degree of selectivity of trophy hunting in different regions. Harvest of bighorn sheep rams is limited throughout the species' range by varying combinations of limited entry hunt systems and a definition of minimum horn size of harvestable ("legal") rams.
The degree to which such regulations cause artificial selection against large horn size will be determined by both the nature of the size-dependent harvest and by the overall harvest rate. In most US jurisdictions, harvest rates are relatively low, and licence numbers are typically controlled by limited entry hunts. As such, even though some jurisdictions impose minimum thresholds on the horn size, below which harvest is illegal, low overall harvest rates probably mean that selection in most of the US part of the range is low. In contrast, in the Canadian province of Alberta, where approximately 15% of bighorn sheep occur (Larkins 2012), selection is likely to be much stronger, and much more directly dictated by minimum size requirements. In most of Alberta, rams may not be harvested until a legislated degree of horn curl is reached, and harvest rates of legal individuals may be very high as unlimited numbers of licences are available for resident hunters in most management areas. As such, LaSharr et al. (2019) surmise that harvest pressure on "legal" rams is likely higher in Alberta than elsewhere, an interpretation with which we agree. However, despite the expectation of strong selection in Alberta, LaSharr et al.
(2019) conclude that the data show "... stable or increasing trends in horn growth over nearly 3 decades in the majority of hunt areas throughout the western U.S. and Canada." LaSharr et al. (2019) note that in Alberta "Age-specific horn size declined in 44% of hunt areas where harvest was regulated solely by morphological criteria", yet conclude that "phenotypic consequences are not a foregone conclusion in the face of selective harvest; over half of the hunt areas with highly selective and intensive harvest did not exhibit age-specific declines in horn size".
While basic theory of the evolution of quantitative traits (Walsh & Lynch, 2018) recognizes that responses to selection are not a foregone conclusion, the tempering of significant declines in 44% of areas with a notion that more than half of areas in Alberta show stable horn size is problematic. Sample size is greatly diminished in any one area, reducing statistical power. Nearly all areas have negative trend estimates.
We present alternative analyses of LaSharr et al.'s results. First, we conduct a re-analysis of estimates of trends in horn size, using established meta-analytic techniques, to avoid equating nonsignificance with evidence for the null hypothesis. We then discuss statistical artefacts that likely dampen trends of declining horn size in harvest data.  (Tables S1 and S2 of LaSharr et al., 2019) included measurements from 344 rams over 24 years. The estimated slope of the regression of horn size on year was 0.01 cm/year, and the 95% confidence interval was −0.12 to 0.14 cm/year. This nonsignificant result is associated with a range of rates of change that encompases increases or decreases of approximately one cm per decade, or more than 2 cm over the available time series. Is this range of changes small enough that the population can be characterized as stable? Particularly if many males are harvested the year they become legal, 2 cm could be the difference between legally harvestable, or not.

| HIER ARCHIC AL MODEL-BA S ED ANALYS IS OF HORN TRENDS
Statistical noise arising from small sample size will (a) make small effects likely nonsignificant, and (b) inflate variability from one estimate to the next. While estimated regressions of horn size on time in each area are highly uncertain, it is possible to estimate the average trends, and the variability of trends among areas, with a useful amount of precision.
We applied a mixed effects model to LaSharr et al.'s (2019) trend estimates as reported in their supplemental tables following where ̂ i is the estimated slope for area i and SE i its standard error.
i is the (unknown) true regression slopes, and m i is the estimation errors of those slopes (also unknown). The model integrates over the uncertainty in the measurement errors m i , whose distribution is given by their corresponding standard errors (SE i ). We can estimate the average of true slopes ( ) and their variances ( 2 ). Equation 1 is a random effect meta-analysis (Koricheva et al., 2013).
We applied the mixed effects meta-analytic model (Equation 1) to data from management areas in Alberta, implemented as a Bayesian mixed model using the R package MCMCglmm (Hadfield, 2010), applying diffuse Gaussian and inverse gamma priors (Gelman & Hill, 2007) for the model intercept (the mean slope, ), and the residual variance (the variance of slopes, 2 ), respectively. We estimated the mean and variance of slopes of (a) horn size, (b) horn size standardized to age seven and (c) horn size standardized to age seven accounting for environmental effects. The raw data are the trend estimates in Table S1 of LaSharr et al. (2019) for uncorrected horn size and in Table S2 for the two measures standardized to age seven.
LaSharr et al.'s supplemental tables give 95% confidence intervals for all slopes; we calculated standard errors for each estimate as one quarter of the difference between the upper and lower limits of each confidence interval. In most hunt areas in Alberta, size-based harvest regulations have been consistent over the study period. In the Westcastle-Yarrow area, the legal harvest criterion was changed from 4/5 curl to full curl in 1996, forcing a legislated strong increase in mean horn size among harvested individuals over time. We therefore conducted all analyses with and without the Westcastle-Yarrow area.
The meta-analytic model estimates a decline for all three types of horn size trend, across all areas within Alberta with consistent regulations (Table 1). We focus on predicted horn sizes for age seven accounting for environmental variables, which is most relevant to the potential contribution of evolution to changing horn size. These analyses are the basis of LaSharr et al.'s key second table and third figure. The average change in predicted horn size is = −0.08 cm per year (95% CI: −0.12 to −0.05 cm/year). Slopes vary among areas with 2 = 0.0026 (95% CI: 2 × 10 −4 to 0.0072).
From this mean and variance of slopes, we estimate the proportion of areas in Alberta where the temporal trend in horn size is negative, as where Φ 0, , 2 is a cumulative normal distribution function with mean and variance 2 , evaluated at 0. Applying Equation (2)  The proportion of areas experiencing a decline does not measure the magnitudes of trends. The mean and among-area variability in trends, inferred from the hierarchical model analyses, are depicted in Figure 1. While jurisdictions in the USA do not show consistent declines, variability among management areas is greater than in Alberta. When declines occur, they can be of similar magnitude to trends in Alberta ( Figure 1). Within all jurisdictions that have declined, those declines are about 5% of mean horn size.
In addition the main question of whether or not there are consistent trends in horn size in Alberta and in the United States, the difference in the average slope between these two groups of areas is also of interest. We fitted a model to directly estimate this difference, as Notation is as for Equation 1, except that the model intercept US represents the average slope in US areas, and the contrast represents the difference in average slope between Albertan and US areas, with AB representing an indicator variable ( AB = 0 for US areas, AB = 1 for Albertan areas); 2 loc is residual variances, (2) Note: Analyses are conducted for three subsets of the data: first, for all of Alberta, second, for management areas in Alberta that had consistent size-based harvest regulations throughout the study, and third, for US jurisdictions. Hierarchical model summaries include − , the average slope across management areas, , the standard deviation of slopes across management areas, and P ( < 0 ), the proportion of management areas experiencing a declining trend. All estimates are posterior means with 95% credible intervals in parentheses. a One management area in this subset is expected to have a substantially positively biased trend as a result of a change to the size-based harvest regulations during the course of the study. This bias is likely to be manifested primarily in the raw size data (part a); standardizations of size measurements to age seven should at least eliminate the bias associated with the management change for parts (b) and (c) because these analyses attempt to disambiguate effects of age and size at age.

| B IA S A SSO CIATED WITH A FIXED THRE S HOLD FOR HARVE S T
The magnitude of trends in horn size is downwardly biased in data from animals harvested under a strict phenotype-based threshold.
This phenomenon has been demonstrated thoroughly by Pelletier et al. (2012) and Festa-Bianchet et al. (2015). More generally, it is well established that size-dependent mortality and harvest require careful consideration when estimating growth (Ricker, 1969). In this section, we attempt to make the most important considerations as clear as possible.
Consider the phenotypic distributions depicted by red and blue dotted lines in Figure 2a. These distributions differ in mean by 10 cm. Imagine that the phenotypic distributions represent a population where the mean phenotype declined from 90 cm (red) to 80 cm (blue). Now, imagine that the only data available are from individuals with phenotypes greater than 100 cm (the threshold in Figure 2a).
Would the available data from the two time periods estimate the magnitude of the change?
The mean of a truncated normal distribution, t , is given by where and 2 are the mean and variance of the (nontruncated) normal distribution, and ( ) and Φ ( ) are Gaussian probability density and cumulative density functions, respectively. Our qualitative conclusion is not dependent on the trait having a normal distribution, but analytical results for the normal distribution illustrate it.
The mean of the truncated red distribution is 105.25 cm, and the , 2 ) , (f) mean of the blue is 103.73 cm. A true decline of 10 cm is underestimated by −6.6. This phenomenon is not restricted to thresholdbased selection. Figure 2b shows a highly selective logistic function.
If this function describes the probability of being harvested, then the corresponding distributions of available data are given by the solid red and blue lines. In the numerical example in Figure 2b, the true change in phenotype of 10 cm is reflected by a change of about −3.3 cm in the measured sample.
While t is entirely determined by the difference between the truncation point and the underlying mean, that is, by t − , for any given variance, it is useful to visually examine Equation (4), and the partial derivatives of t with respect to both and t. Figure 3 shows how, when t − is large, t is almost entirely influenced by  Figure 3a), t is much less than one, while t is larger, and closer to one, indicating that any trends (or stability) is likely to be more reflective of changes (or stability) in the truncation point.  (4) by noting that a truncation point that is defined relative to the mean can be related to a fixed truncation point t, as defined above, as t = + . The mean of a truncated distribution in terms of this truncation point that is defined relative to the mean is then So, the mean of the truncated distribution is given by the underlying mean, plus a quantity 2 ( ,0, 2 ) 1 − Φ ( ,0, 2 ) . This additional quantity does not depend on the mean, and so = 1. As such, a change in is perfectly reflective of a change in , when a "moving" truncation point is defined relative to the mean itself. This section considered the mean of a distribution truncated at a single time point. In a long-lived animal, the consequences of truncation will be more complicated. As a cohort matures, and its horns grow, selection of horn size by hunters, and thus the accumulation of data from trophies, will occur over several years.
Phenotypic changes will affect the age-specific probability of exceeding the truncation threshold. The mathematical study of truncation in this section is aimed at understanding, in isolation, how truncation contributes to a more complex data generating mechanism.

| ADD ITI ONAL B IA S A SSO CIATED WITH CHANG ING AG E S TRUC TURE
The interaction of threshold-based harvest and age structure increases the scope for biases that would dampen detection of temporal trends in horn size. Two consequences of age structure are relevant. Within a cohort, faster growing individuals will be harvested at younger ages than slower growing individuals (Douhard et al., 2016;Hengeveld & Festa-Bianchet, 2011). The fastest growing individuals will be harvested at relatively young ages, and the mean size at age of individuals harvested at older ages will be downwardly biased. As a result, growth rates estimated from harvest data will be downwardly biased. The underlying principles are well established (Ricker, 1969), and we illustrate them here with the help of a numerical toy example aimed qualitatively at the application to horn growth.
When the downward bias in estimated growth rate is combined with the expectation that mean age at harvest will increase if growth rates decline, as reported by Festa-Bianchet et al. (2014), then an additional bias will arise. LaSharr et al. (2019) predict mean size at age seven from growth functions estimated from harvest data. Since these growth functions are too shallow, they underestimate the mean size at age seven of individuals that are harvested relatively young, and overestimate the mean size at age seven of individuals shot at relatively old ages. Where horn growth rates have declined, such that harvested individuals show an increasing age trend, the change detected from trophy harvests will be further dampened because older individuals will have upwardly biased estimates of mean size at age seven.
There are too many unknowns to quantify the degree to which bias generated by the combination of underestimated growth rates and shifting mean age at harvest influence results in LaSharr et al.
(2019). We will therefore only illustrate the qualitative bias. We simulated three cohorts with growth mean rates of g = [ 0.8, 1.0, 1.2 ] .
We simulated 1000 individuals in each cohort and assigned each individual i a growth rate effect described by a standard random deviate i . We then assigned each individual a horn size trajectory, relative to a harvest threshold with an arbitrary value of zero, using the formula y ia = −6 + ( g + i ) a for integer values of age, (5a) = + 2 ( + , , 2 ) 1 − Φ ( + , , 2 ) , (5b) = + 2 ( , 0, 2 ) 1 − Φ ( , 0, 2 ) a, between four and 10. In the growth expression, −6 is an initial size, relative to the threshold for harvest (and therefore measurement) that we arbitrarily set to zero; these only have relevance with respect to each other, and their exact values are determined by convenience for plotting. We simulated harvest by selecting individual records of size at age, if size at age exceeded the threshold, with probability 0.5, simulating a 50% annual harvest rate of harvestable males.
We recorded the trajectory of size at age in each cohort, and the trajectory inferred from size at age of harvest (top row of Figure 4). Mean growth rates were underestimated by harvested individuals across all cohorts. The slowest-growing cohort had a 73% downward bias, relative to approximately 42% bias in the fastest growing cohort. Age at harvest increased from the fastest to the slowest-growing cohort (middle row of Figure 4). Truncation combined with the changing age structure yielded the most severe upward bias of predicted sizes at age seven in the cohort with the lowest growth rate (bottom right panel of Figure 4), demonstrating a further bias in inference of trends in growth rate or size at age from harvest data.

F I G U R E 4
Bias arising from the interaction of phenotype-dependent harvest and shifting age structure of harvested individuals. Columns from left to right simulate cohorts with decreasing growth rates. The top row shows true growth rates (solid lines) and growth rates estimated from harvest data (red lines). The middle row shows the shifting age structure of harvested individuals, as growth rate declines but the threshold for harvest remains the same. The bottom row shows the prediction of mean size from the biased estimated growth functions. Predicted values at the standard age (seven) are most upwardly biased when the growth rate is lowest, generating a bias that will dampen any trend for decreasing horn size. In the bottom row, red points, connected with lines, show mean size at harvest for each age, connected to the mean predicted size at age 7 based on a fixed threshold to define legally harvestable rams, combined with no limit on harvest, is expected to generate the strongest selection on horn size.
Because of the biases inherent to phenotype-based data collection, it would be unwise to attempt to interpret the magnitude of the declines in horn size throughout Alberta. However, it seems inescapable that those magnitudes are underestimated. We have elucidated two mechanisms -truncation (Figures 2 and 3), and changes in age structure ( Figure 4) -that likely bias estimates of the magnitude of changes in mean horn size, as inferred from trophy data, towards smaller values.
Documentation of temporal trends in mean phenotype is a weak form of inference of evolutionary change (Endler, 1986). LaSharr et al.'s (2019) analysis also attempts to control for changes in environmental variables. Insofar as such an analysis can account for changes in the environment to which phenotypes may respond plastically, any remaining temporal trend is much more reasonably interpreted as an evolutionary change. However, any environmental changes beyond those captured by the available environmental data will contribute to temporal trends in phenotype, concomitant with any evolutionary change. We therefore caution against interpretations of changes in mean phenotype, even accounting for environmental covariates at the enormous spatial scale of LaSharr et al.'s (2019) analysis, as strong evidence for evolutionary change. Associations between trends in phenotypes and selective pressures are an additional type of evidence for an evolutionary response to natural selection (Endler, 1986)

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
All data are publicly available in the Supporting information of LaSharr et al. (2019). TA B L E 2 Differences in temporal trends of mean horn size of harvested bighorn sheep between Albertan and US areas based on the model given in Equation 3, wherein the term ̂ represents the difference between trends in Alberta vs the United States Note: Estimates of ̂ are reported with 95% credible interals, and also P ( US − AB < 0 ), the proportion of posterior distribution of the ̂ that is negative (indicating more negative trends in Alberta than elsewhere), and P ( H 0 : US = AB ) the two-sided quasi p value associated with the null hypothesis of equal mean slopes in Alberta and in the United States. a One management area in Alberta is expected to have a substantially positively biased trend as a result of a change to the size-based harvest regulations during the course of the study and is excluded from analyses reported in part (b). This bias is likely to be manifested primarily in the raw size data.