Pollinator size and its consequences: Predictive allometry for pollinating insects

Body size is an integral functional trait that underlies pollination-related ecological processes, yet it is often impractical to measure directly. Allometric scaling laws have been used to overcome this problem. However, most existing models rely upon small sample sizes, geographically restricted sampling and have limited applicability for non-bee taxa. Predictive allometric models that consider biogeography, phylogenetic relatedness and intraspecific variation are urgently required to ensure greater accuracy. Here, we measured body size, as dry weight, and intertegular distance (ITD) of 391 bee species (4035 specimens) and 103 hoverfly species (399 specimens) across four biogeographic regions: Australia, Europe, North America and South America. We updated existing models within a Bayesian mixed-model framework to test the power of ITD to predict interspecific variation in pollinator dry weight in interaction with different co-variates: phylogeny or taxonomy, sexual dimorphism and biogeographic region. In addition, we used ordinary least squares (OLS) regression to assess intraspecific dry weight – ITD relationships for 10 bee and five hoverfly species. Including co-variates led to more robust interspecific body size predictions for both bees (Bayesian R2: 0.946; ΔR2 0.047) and hoverflies (Bayesian R2: 0.821; ΔR2 0.058) relative to models with ITD alone. In contrast, at the intraspecific level, our results demonstrate that ITD is an inconsistent predictor of body size for bees (R2: 0.02 – 0.66) and hoverflies (R2: −0.11 – 0.44). Therefore, predictive allometry is more suitable for interspecific comparative analyses than assessing intraspecific variation. Collectively, these models form the basis of the dynamic R package, ‘pollimetry’, which provides a comprehensive resource for allometric research concerning insect pollinators worldwide.


5
Therefore, allometric traits central to pollination-related ecological processes both appear and 102 interact at the intra-and interspecific levels. Despite their ubiquity, few predictive models for 103 body size exist for pollinating insects below the ordinal level, with one notable exception. Cane 104 (1987) pioneered a predictive model for bee body size as a function of the intertegular distance 105 (ITD) (the distance between the wing-attachment points on either side of the thorax (See technologies. Addressing these key deficiencies will increase model accuracy and applicability 128 of predictive allometry for pollinating insects. 129 130 Here, we catalogue pre-existing models for key pollinating insect taxa (Diptera, Hymenoptera 131 and Lepidoptera) and develop new predictive allometric models within an iterative framework 132 for bees and hoverflies that incorporate species evolutionary histories, intraspecific variation 133 and biogeography. These form the basis of a new R package, entitled "pollimetry". Specifically, 134 we address the following research questions : 135 i.
Is ITD a robust predictor of inter-specific body size variation for two dominant 136 pollinator taxa, bees and hoverflies? 137 ii.
Does incorporating sexual dimorphism and phylogenetic/taxonomic relatedness when 138 We obtained specimens collected in recent field research projects on insect pollinator diversity. 148 We included studies across four continents. In Australia, collections were made in New South 149 Wales, Victoria, Queensland, South Australia and the Northern Territory. In Europe, we 150 amassed specimens from Belgium, UK, Germany, Ireland, Spain and Switzerland. In the 7 Americas, we included collections from Minnesota, USA and Ceará, Brazil. In addition, Cane's 152 (1987) original data from Alabama, USA was obtained using Engauge Digitizer version 10.6 153 (Mitchell et al. 2018). 154 155 The majority of specimens were dehydrated and weighed within three to six months of 156 collection, although some, in particular, those from Victoria, Australia, Belgium, Switzerland 157 and Cane's original samples were of variable ages: ranging from one to five years since 158 collection. We excluded damaged specimens. For every specimen, we obtained sample location 159 Body size was measured as the dry weight in milligrams of each specimen. We therefore refer 165 to body size as dry weight herein for continuity. Dry weight was measured by first dehydrating 166 specimens at 70 °C for at least 24hrs prior to weighing to remove residual humidity and then 167 weighed on an analytical balance to an accuracy of 0.001g. All North American bees as well 168 as small-bodied Australian bees were dehydrated and weighed prior to pinning. For all other 169 specimens, pins were not removed prior to weighing. Instead, we identified the pin type and 170 weighed a sample of 10 -50 pins per type. The mean weight was then subtracted off the total 171 weight. Pin weight variance was minimal (range of standard errors: 6.3*10 -4 to 2mg). 172 Intertegular distance was measured in millimetres using a stereo-microscope, either mounted 173 with a calibrated scale or microscope camera. Body length was measured along the lateral side 174 of each specimen with a calibrated scale or microscope camera for Australian, British, German, 175 Irish and Spanish specimens (see Supporting Information for visual representation of ITD and  176 body length measurements). 177 178 Data analysis: Model structures 179 All analyses were undertaken in R (version 3.5.1) (R Core Team 2018). We first assessed the 180 Pearson's correlation coefficient between ITD and body length. ITD and body length (BL) 181 were highly correlated in both bees (ρ = 0.932), and hoverflies (ρ = 0.853). We then compared 182 both ITD and body length independently in predicting body size using ordinary least squares 183 (OLS) regression to select the best body weight predictor. ITD was marginally more predictive 184 than BL in estimating dry weight in bees: ITD R 2 : 0.896; BL R 2 : 0.877, and considerably better 185 than BL for hoverflies: ITD R 2 : 0.854; BL R 2 : 0.796. Hence, we used ITD in the following 186 analyses. where Y = dry weight, α = intercept,  = allometric co-efficient and = dry weight or body 193 length . 194 195 OLS does not allow for the incorporation of random effects or phylogenetic co-variance 196 matrices. Therefore, to incorporate these more complex model structures with the best predictor 197 (i.e. ITD) of dry weight, we specified Bayesian generalised linear mixed models (GLMM) 198 using the brms package (version 2.4.0) (Bürkner 2017). Log-transformed dry weight was 199 predicted as a function of the log-transformed ITD in interaction with sex and taxonomic 200 grouping: bee families following Michener (2000) and hoverfly subfamilies following 201 Thompson and Rotheray (1998 As such, we made the explicit assumption that phylogenetic patterns in body size were 225 assessed at and above the genus level. We estimated relative node ages using the mean path 226 lengths method of Britton et al. (2002). We assessed the significance of phylogenetic signal 227 using Pagel's λ (Pagel 1999) with the phytools package (version 0.6-44; Revell et al. 2012). 228 Phylogenetic signal was highly significant for bee ln body size (λ: 0.793, p < 0.001) (Fig. 1). 229 Therefore, we implemented a nested phylogenetic generalised linear mixed model (PGLMM) 230 which considered ITD in interaction with intraspecific sexual dimorphism whilst accounting 231 for phylogenetic dependencies through a nested random term: species nested within region (i.e. 232 the nested species term was constrained by the constructed phylogeny). We refer to these 233 models as phylogenetic GLMMs. 234 235

Data analysis: Model selection: Bayesian R 2 and K-fold cross-validation 236
We first fitted the two full models described above; a taxonomic GLMM and a phylogenetic 237 GLMM. As we were interested in their predictive power, these models were then compared 238 against reduced models (i.e. without sex as either intercepts/slopes) including random effects 239 along with two ITD-only models, one with and one without random terms (Table 1)  We assessed the utility of ITD in predicting intraspecific dry weight variation. For the 10 most 261 abundant bee species of a given sex (nine using females, one using males) and five most 262 abundant hoverfly species (all using females) we tested the utility of ITD in predicting 263 Stenotritidae) and two hoverfly subfamilies (Syrphinae and Eristalinae) were represented. The 283 mean specimen number per bee species was nine (♀) and five (♂) and ranged from one -201. 284 In hoverflies, the mean specimen number per species was three for both sexes and ranged from 285 one -50. In bees, when dry weight variation was visualised across the phylogeny (Fig. 1), large 286 dry weight was most evident within the Apidae, the largest bee in our dataset being the South 287 American Xylocopa frontalis (♀ mean weight: 760.75mg). In contrast, Halictid (i.e. Halictus, 288 Homalictus and Lasioglossum species) and Colletid bees, in particular, the Australian Euhesma 289 sp. (♀ mean weight: 0.71mg, ♂ mean weight: 0.66mg) and the European Hylaeus communis 290 (♀ mean weight: 6.15mg, ♂ mean weight: 2.76mg) were considerably small. 291 292

Interspecific model selection and performance 293
All three tested co-variables exhibited significant influences on the allometric scaling of ITD 294 Table 1). For bees, both GLMM and PGLMM analyses indicated that models including 295 family or phylogeny and sex in interaction or in addition with ITD, along with our nested 296 random term better predicted dry weight relative to the baseline model (ITD-only model 297 without random term) on the basis of K-fold CV and Bayesian R 2 (Table 2; ΔR 2 : 0.046, ΔK-13 fold CV: 2226.6). However, differences in K-fold CV and Bayesian R 2 between the best-fitting 299 taxonomic and phylogenetic models were minimal (ΔR 2 <0.001, ΔK-fold CV: 7.92). In 300 hoverflies, incorporating taxonomy and/ sex increased body size predictions relative to the 301 baseline ITD-only models considerably (ΔR 2 : 0.058, ΔK-fold CV: 73.3). 302 303 Increases in model performance as a result of incorporating co-variates were most pronounced 304 in bees in terms of root mean square error (RMSE) (Fig. 3). All formulated models 305 outperformed ITD-only models in their predictive precision. RMSE ranged between 10.804 -306 12.462mg for both taxonomic and phylogenetic GLMMs. The RMSE for the baseline ITD-307 only model was 15.565mg, which was near-identical the RMSE for Cane's (1987) original 308 model: 15.553mg. The RMSE for taxonomic GLMMs for hoverflies ranged from 4.619mg to 309 4.849mg and all were slightly lower than the RMSE of the baseline ITD-only model (6.179mg). 310 The range of prediction error for ITD was also considerably lower than any pre-existing and 311 applicable model using body length: 36.36mg  8.29 for bees and 7.99mg  0.69 for hoverflies. 312  The developed R package, 'pollimetry', integrates models for estimating body size (i.e. dry 329 weight) in bees and hoverflies using the ITD and co-variates (see Table 2), which were 330 parameterized with the enclosed dataset, into a wrapper function that returns body size 331 estimates, along with standard error and 95% credible intervals. In addition, pollimetry includes 332 functions for estimating pollinator dry weight using pre-existing models which utilise the 333 following co-varying traits: body length, head width and body length * body width; see 334 Supporting Information). The R package also includes functions for estimating bee foraging Interestingly, taxonomic and phylogenetic GLMM models were near-identical in all model 354 rankings (Bayesian R 2 , K-fold CV and RMSE), demonstrating that differential allometric 355 scaling is present at/or below the familial level. These results suggest that predictive inferences 356 of body size above the family level lack accuracy and generalisability. 357

358
Where the aim is prediction, GLMMs incorporating taxonomic groupings without considering 359 phylogeny are more practical given well-resolved phylogenies are lacking for most groups (e.g. 360 one can predict allometric relationships for non-represented species). A further advantage of 361 using taxonomic groupings over phylogeny is that they provide easy-to-interpret regression 362 intercepts and/or slopes as opposed to a phylogenetic co-variance matrix. Therefore, for bees, 363 we confirm that incorporating taxonomy is predictively equivalent in predicting allometric 364 scaling relationships where phylogenetic information is unavailable. Importantly, this 365 uniformity between taxonomic and phylogenetic models may not exist for other taxa with 366 either high paraphyly or low correspondence between taxonomy and phylogeny. In hoverflies, 367 including subfamily was less informative, yet still retained, in describing body size variation, 368 potentially due to their lower taxonomic ranking. In essence, our results suggest that where Observed biogeographical differences within this study likely arise from differing species 414 diversification patterns as well as from sampling biases, such as variation in commonality 415 among species. Therefore, discerning hypotheses that explain biogeographic variation in the 416 allometric scaling of ITD is problematic. However, it is clear that the influence of biogeography 417 appears alongside species' evolutionary histories and intraspecific variation. 418 419 By incorporating phylogeny or taxonomy, sexual dimorphism and biogeographic random 420 effects we improved model predictions and reduced the limitations of traditional predictive 421 allometry. These three predictors represent fundamentally-related causes of body size variation in pollinating insects. In consideration of the multiple metrics (i.e. Bayesian R 2 , K-fold CV, 423 and RMSE) used in model selection and performance, we provide multiple, near-equally 424 accurate predictive models. This is important as research questions may not garner 425 investigation of sex-related allometric differences and may occur outside the included 426 biogeographic regions. Therefore, disseminating the most appropriate allometric model 427 becomes a hypothesis-driven formula that should consider and then discount each examined 428 factor. Importantly, given the high resolution across our described models and large sample 429 size of specimens within our study, our models will improve body size predictions relative to 430 pre-existing models even when considering only ITD. After accounting for biogeographical 431 and species-level effects, failing to incorporate sex or phylogeny/taxonomy will not result in 432 considerable error (see Fig. 3) although we endorse their use as it enables more meaningful 433 analyses. Lastly, we caution the use of ordinal-level predictive models as allometric constraints 434 are ubiquitous at the familial level (See Fig. 1).    (Table S1A). Eleven models 12 were reported for the entire order, including nine without any taxonomic breakdown of samples 13 used. Twelve models were collated for the three main suborders Nematocera (6), Brachycera 14 (4) and Cycllorapha (2) and two for specific families; Asilidae and Bombyliidae. 15 16 Hymenoptera: 38 predictive allometric models for Hymenoptera were collated (Table S1B). 17 These included eight models for the entire order, ten for Formicidae and seven for all 18    Models are present in the form of = ln( ) + ln( ) * unless Type noted with *. ** = Included body width as well as length. 44