Demographic and environmental drivers of metagenomic viral diversity in vampire bats

Abstract Viruses infect all forms of life and play critical roles as agents of disease, drivers of biochemical cycles and sources of genetic diversity for their hosts. Our understanding of viral diversity derives primarily from comparisons among host species, precluding insight into how intraspecific variation in host ecology affects viral communities or how predictable viral communities are across populations. Here we test spatial, demographic and environmental hypotheses explaining viral richness and community composition across populations of common vampire bats, which occur in diverse habitats of North, Central and South America. We demonstrate marked variation in viral communities that was not consistently predicted by a null model of declining community similarity with increasing spatial or genetic distances separating populations. We also find no evidence that larger bat colonies host greater viral diversity. Instead, viral diversity follows an elevational gradient, is enriched by juvenile‐biased age structure, and declines with local anthropogenic food resources as measured by livestock density. Our results establish the value of linking the modern influx of metagenomic sequence data with comparative ecology, reveal that snapshot views of viral diversity are unlikely to be representative at the species level, and affirm existing ecological theories that link host ecology not only to single pathogen dynamics but also to viral communities.


Section 1: Population size
Census population size (Nc) was estimated from mark-recapture data for each colony using one of three methods. For sites sampled over multiple years, Cormack-Jolly-Seber models (Cormack 1989) were implemented in the package Rcapture (Baillargeon & Rivest 2007). For sites sampled in only one year, the Petersen estimator was calculated for sites with two capture occasions and the Schnabel estimator was calculated for sites with more than two capture occasions, both with Chapman correction (Chapman 1951) in the R package FSA (Ogle 2017) ( Table S2). Sampling intervals were not consistent between years and across sites, so the same estimator could not be used. Three sites (AMA7, LR2 and LR3) were excluded because roost locations were inaccessible, so sites were sampled around livestock and there were few recaptures. Two other sites (API17 and AYA15) were only sampled using hand nets during the day, making it difficult to account for recapture probability, so colony size was not estimated for those sites. Two datasets were created using the most recent estimates of Nc, as colonies can undergo major changes in Nc over a short number of years (Streicker et al. 2012). One dataset included all sites where Nc was estimated and another more conservative dataset included only Petersen or Schnabel estimates from 2016-2017 (Table S2).
The estimate of colony size (Nc) was missing from five sites, so the more and less conservative datasets described above were each analyzed to test for a univariate effect on richness. No relationship was found with either Nc dataset for all combinations of saliva/feces and all viruses/vertebrate-infecting viruses (data not shown) so Nc was excluded from GLMs as an explanatory variable in order to include the five sites missing Nc in models.
Section 2: Host population structure Nine previously developed microsatellite loci (Piaggio et al. 2008) were amplified in two multiplex reactions. For some individuals, loci were amplified according to previously optimized conditions (Piaggio et al. 2008), and fragment analysis was performed at the University of Georgia Genomics Facility on an Applied Biosystems 3730xl instrument (Streicker et al. 2016). For other individuals, amplification was carried out using a Multiplex PCR Kit (Qiagen) in 15 μL reactions containing a final concentration of 3 mM MgCl2 and 0.2 μM each primer. PCR conditions were 15 minutes at 95°C, 40 cycles (Panel A) or 35 cycles (Panel B) consisting of 30 seconds at 94°C, 90 seconds at 52°C, and 60 seconds at 72°C, followed by 30 minutes at 60°C. Fragment analysis for these individuals was performed at the University of Dundee DNA Sequencing and Services on an Applied Biosystems 3730xl instrument. Microsatellite scoring was done using either Genemarker v.2.4.0 or the microsatellite plug-in for Geneious v. 7.17 (Kearse et al. 2012).
To account for potential scoring discrepancies between labs and microsatellite genotyping errors, 21 individuals were genotyped using both protocols, eight of which were included in the final dataset (the other 13 were from colonies not included in the final dataset). Scores differed in a consistent manner between the two amplification and genotyping methods, as expected given differences between labs and protocols (Ellis et al. 2011) and scores from samples genotyped in Glasgow were converted to allow comparison across labs (Table S3). Nineteen individuals were genotyped twice in Glasgow to ensure consistent results across replicates within the same lab.
The program PEDANT (Johnson & Haydon 2008) was used to calculate maximum likelihood estimates of genotyping error rate, and to estimate what proportion of errors were due to allelic dropout and false alleles.
The program MicroChecker (Van Oosterhout et al. 2004) was used to check for evidence of null alleles within loci or populations. The program FreeNA (Chapuis & Estoup 2007) was also used to calculate null allele frequencies. The inbreeding coefficient FIS (Weir & Cockerham 1984) was estimated for each locus using FSTAT v.2.9.3.2 (Goudet 1995). FSTAT was used to test for significant departures from Hardy-Weinberg equilibrium for within population FIS using 1000 permutations of a randomization test and Bonferroni correction.
Error rates per marker were relatively high for samples re-genotyped across different labs, particularly at loci DeroC12 and DeroD06 (Table S3). In contrast, replicates within Glasgow showed low error rates that were comparable to previous studies (Ellis et al. 2011). One locus (DeroH02) showed evidence of null alleles according to both methods, with 13 populations exhibiting evidence of null alleles in MicroChecker. FIS values also deviated significantly from Hardy-Weinberg equilibrium (HWE) at DeroH02 and DeroD06 (Table S3). Overall there was evidence of genotyping error and null alleles at the three loci DeroC12, DeroD06 and DeroH02.
Although the effects of genotyping errors may be less severe in population level analyses compared to analyses reliant on individual identification (Taberlet et al. 1999;Pompanon et al. 2005), we repeated analyses using a six loci subset, excluding potentially problematic loci to ensure that results were consistent using the two different datasets.
Per locus microsatellite diversity indices were calculated using the program FSTAT, including number of alleles (NA) and allelic richness (AR) ( Table S4). Observed heterozygosity (HO) and expected heterozygosity (HE) were calculated using adegenet (Jombart 2008;Jombart & Ahmed 2011) in R and ENA-corrected FST was calculated using FreeNA. Each pair of loci was tested for evidence of linkage disequilibrium in FSTAT, with no pairs of loci showing evidence of linkage disequilibrium using 1000 permutations of a randomization test and Bonferroni correction.
Per population statistics (Table S5) were calculated using adegenet (NA, AR, percent missing data per site, HE, and HO) and FSTAT (FIS). Two sites (AMA2 and CAJ4) were initially genotyped for larger numbers of individuals than other sites (61 and 88 respectively) but were randomly subsampled to 30 individuals to ensure that unequal sample sizes did not affect other analyses (Puechmaille 2016). Following subsampling, AMA2 deviated significantly from HWE in the 9 loci dataset while no populations deviated significantly in the 6 loci dataset (Table S5). Figure S1. Principal component analysis (PCA) of sites by environmental variables. The PCA was performed with centering and scaling on the variables annual mean temperature, annual precipitation, and annual temperature range. Sites are colored by ecoregion (blue, Coast; green, Andes; purple, Amazon) and circles show the 95% normal probability ellipse for each group. Pairwise correlations between all continuous explanatory variables were examined for potential multi-collinearity. Variables with a correlation coefficient of r>0.5 were excluded from the same model in model averaging.   Figure S4. Vertebrate-infecting viral richness and community composition compared across ecoregions. Plots show comparisons of vertebrate-infecting viruses across ecoregions in saliva (top panels) and feces (lower panels). In boxplots, bold lines show the median, and upper and lower hinges show the first and third quartiles. Whiskers extend from the hinge to 1.5 x the inter-quartile range. In PCoA plots, circles show the 95% normal probability ellipse for each group and stars indicate communities are significantly different as assessed by PERMANOVA. Colors correspond to different ecoregions within Peru (blue, Coast; green, Andes; purple, Amazon).  Figure S5. Correlations between viral community distance and geographic or genetic distance. Plots show saliva virus community Jaccard distances compared with colony geographic distance (km) (Panel A; Mantel r=-0.05; p=0.67) and genetic distance FST calculated using 9 microsatellite loci (Panel B; Mantel r=-0.007; p=0.53), and fecal virus community Jaccard distances correlated with colony geographic distance (km) (Panel C; Mantel r=0.25; p=0.003) and genetic distance FST calculated using 9 microsatellite loci (Panel D; Mantel r=0.13; p=0.03).