Use of the nucleotide diversity in COI mitochondrial gene as an early diagnostic of conservation status of animal species

Species assessed as threatened by the International Union for Conservation of Nature (IUCN) show evidence of declining population sizes. This decline erodes genetic diversity, which reduces the adaptive potential of the species and increases its extinction risk in a changing environment. In this study, we collected an extensive dataset of nucleotide diversities in the COI (Cytochrome C Oxidase subunit I) mitochondrial gene for 4,363 animal species assessed by IUCN and found significantly reduced levels of diversity in threatened species of long-lived animal classes. Then, we built a comparative frame from the 95% confidence interval (CI) of mean values of COI nucleotide diversity in bootstrapped samples of nonthreatened species. Finally, we tested the comparative frame with data from the endangered bivalve species Pinna nobilis. We conclude that nucleotide diversity in COI is a good proxy for a first evaluation of the conservation status of species populations where previous knowledge is lacking and a census is difficult to perform.


INTRODUCTION
The rate of species extinction and biodiversity loss has greatly increased in the past decades; obtaining information about the conservation status of species populations has therefore become a priority (Ceballos, Ehrlich, & Dirzo, 2017). Declining population size can be taken as the common factor in the criteria used by the IUCN (International Union for Conservation of Nature) to assess a species as threatened (Mace et al., 2008). Although direct estimates of population size by census are difficult to obtain, changes in population size can be inferred using molecular markers (Bonebrake, Christensen, Boggs, & Ehrlich, 2010). Estimates of nucleotide diversity of molecular markers could be especially useful because a decrease in population size produces a genomewide decrease of neutral genetic variation (Charlesworth, 2009), which can be investigated by contrasting the levels of nucleotide diversity against neutral expectations (π = Ne*μ, where π is the nucleotide diversity, Ne is the effective population size, and μ is the mutation rate; Kimura, 1983). Thus, the expectation is that genetic diversity is decreased in threatened species, which has been shown in a large number of published studies (among others: Casas-Marce et al., 2017; Hauser, Adcock, Smith, Ramirez, & Carvalho, 2002; Spielman, Brook, & Frankham, 2004; van der Valk et al., 2018; van der Valk, Diez-Del-Molino, Marques-Bonet, Guschanski, & Dalen, 2019). As genetic diversity is the raw material on which natural selection acts, loss of genetic diversity negatively affects the ability of the species to cope with environmental changes (e.g., global climate change, new or changed diseases, new predators, etc.) or, in other words, restricts their adaptive potential (Spielman et al., 2004; Willi, Van Buskirk, & Hoffmann, 2006). Genetic diversity levels and population size inferences of threatened species have been studied using molecular markers from both nuclear and mitochondrial genomes, though the use of the latter has been controversial since the publication of Bazin et al. in 2006 (Bazin, Glemin, & Galtier, 2006; but see James, Castellano, & Eyre-Walker, 2017; Mulligan, Kitchen, & Miyamoto, 2006; Piganeau & Eyre-Walker, 2009). Bazin et al. (2006) did not find evidence of a correlation between population size and mitochondrial genetic diversity and explained their result by the effect of recurrent positive selection (genetic draft): if positive selection acts in a nonrecombinant genome, it will sweep away all linked neutral variation, lowering the estimates of genetic diversity. However, changes in population size, especially declines, can have a disproportionate effect on nucleotide diversity levels through genetic drift (Charlesworth, 2009) and will affect both the nuclear and the mitochondrial genomes. Two recent examples showing this in threatened species are the Iberian lynx (Casas-Marce et al., 2017) and eastern gorillas (van der Valk et al., 2018, 2019).
The integration of knowledge on population genetics into decisions concerning conservation has been rather poor (Cook & Sgrò, 2019). One explanation could be the lack of practical tools to help conservation practitioners include genetic information in their decision-making process (Cook & Sgrò, 2019). For example, without a comparative frame, an estimate of species genetic diversity says nothing about its conservation status. A temporal comparative frame would be ideal, that is, one in which genetic diversity estimates from before the decline of species populations are available (e.g., Casas-Marce et al., 2017; van der Valk et al., 2018, 2019). Nevertheless, both long-term population censuses and nucleotide diversity estimates are scarce or nonexistent for most species (Bonebrake et al., 2010). Another comparative frame can be built by surveying genetic diversity estimates in a group of taxonomically related species with different conservation status. Ideally, estimates of genetic diversity should come from the same neutral markers surveyed throughout the genomes of the species to be compared, but this would require a large investment in species without genomic knowledge. Therefore, given the great concern about biodiversity loss and the urgent need to take action, in this study we evaluated the potential use of nucleotide diversity estimates in a popular molecular marker, the mitochondrial cytochrome C oxidase subunit I gene (cox1 or COI), as a proxy of the conservation status of species populations. Although estimates of genetic diversity in COI could be influenced by species-specific traits and histories other than changes in population size (see Ballard & Whitlock, 2004, for review), COI has the advantage of being widely used in DNA barcoding (Hebert, Ratnasingham, & de Waard, 2003) and in population genetics studies of a large number of species, providing millions of sequences from thousands of species through public repositories. This exceptional amount of data for the same genetic marker across a wide range of species gives the opportunity to construct a strong comparative frame that can be used for an early diagnosis of the conservation status of local populations of species.
Although the risk of extinction of a species is assessed at a global scale, genetic diversity is affected by local processes such as overexploitation and habitat loss. Under these scenarios, genetic drift will become the main process impacting the levels of nucleotide diversity in the local populations, which in turn will erode the adaptive potential of the whole species, increasing its extinction risk. Thus, local assessment of species conservation status is becoming of great importance to improve the global assessments and to implement conservation measures within countries and regions (Mace et al., 2008). As a case study, we used data from a critically endangered bivalve species, Pinna nobilis, which has been driven to the edge of extinction by infection with a previously unknown haplosporidium that appeared in 2016 and rapidly spread, impacting all populations across the species' distribution in the Mediterranean Sea (Cabanellas-Reboredo et al., 2019). Before the outbreak, the species had been locally protected across a large part of its distribution for more than 20 years, and population censuses seemed to indicate recovery, especially in marine-protected areas (see a species review in Kersting et al., 2019). Thus, P. nobilis represents a good example to test the feasibility of using nucleotide diversity in the COI mitochondrial gene as an indicator of the conservation status of species populations in terms of their adaptive potential.

METHODS
Data on species conservation status and distribution were downloaded from the IUCN (IUCN, 2018) for all vertebrate classes and for the invertebrate classes Bivalvia and Insecta, and species were categorized as threatened (CR+EN+VU), near threatened (NT), or nonthreatened (LC). Well-annotated COI (cytochrome C oxidase subunit I mitochondrial gene) sequences of individuals of each species were obtained from NCBI by searching for population genetic and phylogenetic studies in Popset (620 species, PopDS) and then in the whole nucleotide database (3,743 species, NdtDS). First, the sequences were filtered by removing subspecies, hybrids, unverified organisms, and haplotype consensus sequences. Then, sequences were filtered by the percentage of identity and the length of the alignments in BLAST analyses of each sequence against its species dataset (SpsDS; see Supporting Information for details of filtering). Finally, the sequences were aligned by species using Muscle (Edgar, 2004), and Popset datasets were checked visually. Nucleotide diversity (COI-π) and other population genetics statistics were estimated using DnaSP v6 (Rozas et al., 2017). Tajima's D statistic was used to filter out species with evidence of nonrandom sampling (positive Tajima's D, P < .1, N = 165; Tajima, 1995). Nucleotide diversity data from P. nobilis populations were obtained from studies performed before the outbreak across nearly the whole range of the species distribution (see the Supporting Information, Methods, for details). Statistical analyses were performed using R v3.5. To test differences between threatened and nonthreatened species, we used a one-sided asymptotic test of normal quantile ranks (two-sample van der Waerden test) as implemented in the R package coin within the framework of permutation tests (Hothorn, van de Wiel, & Zeileis, 2008). The 95% confidence intervals (CI) were obtained by the BCa method (bias-corrected and accelerated bootstrap interval) with the R package boot (Hesterberg, 2011). Methods used for additional analyses are presented in the Supporting Information. N_i refers to the number of sequenced individuals, N_s is the number of analyzed species, and N_as is the number of analyzed sites in the sequence alignments.
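As an illustration of the statistic at the center of the method, the following minimal Python sketch computes nucleotide diversity as the average number of pairwise differences per site in an aligned set of sequences. It is not the DnaSP implementation used in the study; the function name, the toy alignment, and the choice to drop every column containing a gap or ambiguity code are assumptions made for this example.

```python
from itertools import combinations


def nucleotide_diversity(alignment):
    """Average pairwise differences per site (pi) for aligned sequences.

    Assumption of this sketch: columns containing anything other than
    A, C, G or T in any sequence are excluded from all comparisons.
    """
    if len(alignment) < 2:
        raise ValueError("need at least two sequences")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")

    valid = set("ACGT")
    # Indices of columns that are unambiguous in every sequence.
    sites = [i for i in range(length)
             if all(seq[i].upper() in valid for seq in alignment)]
    if not sites:
        raise ValueError("no comparable sites")

    total_diffs = 0
    n_pairs = 0
    for a, b in combinations(alignment, 2):
        total_diffs += sum(1 for i in sites if a[i].upper() != b[i].upper())
        n_pairs += 1
    return total_diffs / (n_pairs * len(sites))


# Hypothetical toy alignment: four individuals, ten sites.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGAAC", "ACGAACGAAC"]
pi = nucleotide_diversity(toy)
```

In this toy alignment the six pairwise comparisons differ at 7 positions in total over 10 sites, so the estimate is 7/60. Real COI datasets would be read from aligned FASTA files rather than typed in.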

RESULTS
After the collection and filtering of COI sequences, we obtained nucleotide diversity estimates for 4,363 animal species based on alignments with an average length (N_as) of 587 bp and an average of 19 individuals (N_i). Table 1 shows the number of species evaluated in this study (N_s) by class and the representation of the IUCN database. Basic statistics by class and IUCN category are presented in Supplementary Table S1 and raw data in Supplementary Dataset 1. Differences in nucleotide diversity levels (π) were evaluated by permutation tests within each class, showing significantly lower COI nucleotide diversity (COI-π) in the threatened group of species in every class except Insecta (Figure 1). We investigated different biases that could be affecting these results: (1) unbalanced sample sizes in numbers of species (N_s), (2) the effect of low sample sizes (N_i) on the estimated COI-π, (3) the effect of differences in the length of the alignments (N_as), and (4) species distribution. Details of the analyses are presented in the Supporting Information. Overall, the results show that the significant differences presented in Figure 1 hold across all analyses, except in Amphibia, where we found effects of N_s and N_i (Supplementary Table S2).
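The within-class comparison can be sketched in Python as follows. This is not the asymptotic test from the R package coin used in the study: it is a Monte Carlo permutation version of a two-sample van der Waerden (normal quantile score) test, written with the standard library only. The function names and the COI-π values in the example are hypothetical.

```python
import random
from statistics import NormalDist


def van_der_waerden_scores(values):
    """Normal quantile scores inv_cdf(rank / (n + 1)), with mid-ranks for ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    inv = NormalDist().inv_cdf
    return [inv(r / (n + 1)) for r in ranks]


def one_sided_permutation_test(threatened, nonthreatened, n_perm=10000, seed=0):
    """Monte Carlo p-value for the alternative: threatened values are lower."""
    pooled = list(threatened) + list(nonthreatened)
    scores = van_der_waerden_scores(pooled)
    k = len(threatened)
    observed = sum(scores[:k])
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(scores)  # random relabeling of the two groups
        if sum(scores[:k]) <= observed + 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical COI-pi values for one class (not data from this study).
threatened_pi = [0.0010, 0.0020, 0.0015, 0.0030, 0.0005]
nonthreatened_pi = [0.0060, 0.0080, 0.0040, 0.0100, 0.0070, 0.0090]
p_value = one_sided_permutation_test(threatened_pi, nonthreatened_pi)
```

With large groups the Monte Carlo p-value should approach the asymptotic result, but the two are not guaranteed to match exactly.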
To build a comparative frame that is useful for researchers interested in checking the conservation status of species populations that lack an IUCN assessment, we performed a resampling of 10,000 replications, bootstrapping the mean values of COI-π for the threatened and nonthreatened groups of species within each class. The results showed no overlap between the 95% confidence intervals (CI) of mean values of COI-π for the two groups of species (Table 2). We corrected the lower boundary of the 95% CIs of the nonthreatened species by the estimated effect of low sample sizes (N_i ≤ 10) on COI-π (Supplementary Table S4 and Supplementary Figure 1), and still no overlap is observed between the 95% CIs, except in Aves, where the upper boundary of the threatened group overlaps with the lower boundary of the nonthreatened group by 6e-4 (Table 2).
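The resampling step can be sketched as a bias-corrected and accelerated (BCa) bootstrap interval for the mean. The study used the R package boot; the Python function below is an independent standard-library sketch of the same general technique, and its name, defaults, and example values are assumptions.

```python
import random
from statistics import NormalDist


def bca_interval_of_mean(values, n_boot=10000, level=0.95, seed=0):
    """BCa bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    theta_hat = sum(values) / n
    # Bootstrap distribution of the mean (resampling with replacement).
    boot = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    norm = NormalDist()

    # Bias correction: share of bootstrap means below the observed mean,
    # clamped away from 0 and 1 so the normal quantile is defined.
    below = sum(1 for b in boot if b < theta_hat)
    prop = min(max(below / n_boot, 1 / n_boot), 1 - 1 / n_boot)
    z0 = norm.inv_cdf(prop)

    # Acceleration from leave-one-out (jackknife) means.
    total = sum(values)
    jack = [(total - v) / (n - 1) for v in values]
    jack_mean = sum(jack) / n
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    def adjusted_quantile(alpha):
        z = norm.inv_cdf(alpha)
        adj = norm.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))
        index = min(max(int(adj * n_boot), 0), n_boot - 1)
        return boot[index]

    tail = (1 - level) / 2
    return adjusted_quantile(tail), adjusted_quantile(1 - tail)


# Hypothetical COI-pi values for a nonthreatened group (not study data).
group_pi = [0.004, 0.006, 0.009, 0.012, 0.005, 0.007,
            0.010, 0.003, 0.008, 0.011, 0.006, 0.009]
lower, upper = bca_interval_of_mean(group_pi)
```

The low-sample-size correction of the lower boundary described above is not part of this sketch; it would be applied to `lower` afterwards.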
Finally, we used the Bivalvia comparative frame (95% CI, Table 2) to evaluate the conservation status of Pinna nobilis before the outbreak that started in 2016 and drove the species nearly to extinction. The COI-π estimated for the species was 0.0055, which falls outside the 95% CI of the nonthreatened group of bivalves (Table 2). The COI-π values estimated for local populations show the same result (Figure 2).
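Using the frame then reduces to a simple comparison, sketched below in Python. The interval bounds and population names in the example are placeholders chosen for illustration only. They are not the values of Table 2, which must be supplied by the user for the relevant class.

```python
def compare_to_frame(pi_estimate, nonthreatened_ci):
    """Locate a COI-pi estimate relative to a nonthreatened 95% CI."""
    lower, upper = nonthreatened_ci
    if lower > upper:
        raise ValueError("interval bounds are in the wrong order")
    if pi_estimate < lower:
        return "below"   # candidate for closer conservation attention
    if pi_estimate > upper:
        return "above"
    return "within"


# Placeholder interval and estimates (hypothetical, not from Table 2).
placeholder_ci = (0.010, 0.014)
estimates = {"population_A": 0.004, "population_B": 0.012}
flags = {name: compare_to_frame(value, placeholder_ci)
         for name, value in estimates.items()}
```

A "below" flag is only a first warning under this scheme and would still need confirmation from census or additional genetic data.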
The Amphibia class shows some inconsistent results across the analyses and should be evaluated with a more curated dataset. We did not find evidence of significantly lower COI-π values in the Insecta class, though a reduction was observed for critically endangered (CR) species (Supplementary Dataset 1). Because insects have on average a greater number of generations per year than the other animals studied here, and genetic diversity depends on the mutation rate per generation, we hypothesize that insects recover the mutation-drift equilibrium in shorter periods of time than long-lived animals when population reductions are not too drastic.
Since population decline is a common criterion to assess a species as threatened (Mace et al., 2008), our results are compatible with a model in which population declines affect genetic variation in the mitochondrial COI gene, producing low values in threatened species. Given that we are evaluating the same gene in all species, we assumed that selective constraints and mutation rates of COI are not significantly different between the threatened and nonthreatened groups of species within classes. However, selection and variation in mutation rates have been suggested as the reason for the unclear relationship between mitochondrial DNA diversity and population size (Bazin et al., 2006; James et al., 2017; Nabholz, Glemin, & Galtier, 2009; Piganeau & Eyre-Walker, 2009). On the one hand, Bazin et al. (2006) explained the pattern of mitochondrial genetic diversity by genetic draft, while Nabholz et al. (2009) found evidence of variation in the mutation rate across species. These explanations seem unlikely to account for the differences between threatened and nonthreatened species, as they would require systematically more selective sweeps or lower mutation rates in all the unrelated threatened species analyzed here. On the other hand, Piganeau and Eyre-Walker (2009) and James et al. (2017) found that differences in selection efficiency could explain the differences in the patterns of mitochondrial variation, which is expected from differences in effective population sizes (Lynch & Conery, 2003; Lynch, Koskella, & Schaack, 2006; Petit & Barbadilla, 2009). In contrast to the findings of Bazin et al. (2006) and Nabholz et al. (2009), in this study we do find evidence of a positive correlation between nucleotide variation in the COI mitochondrial gene and population size, because threatened species (CR, EN, and VU) have proven evidence of population declines, while nonthreatened species (LC) have not (criteria A-D of IUCN; Mace et al., 2008).
The evaluation of the endangered bivalve P. nobilis shows that, before the haplosporidium outbreak that is driving the species toward extinction (Kersting et al., 2019), COI-π levels were already significantly lower than expected from nonthreatened bivalve species (Figure 2). Wesselmann et al. (2018) found significant inbreeding coefficients (FIS) in local Spanish populations of P. nobilis, and their estimates of COI-π and allelic richness in microsatellites are well correlated (Supplementary Figure S3), suggesting that both mitochondrial and nuclear genomes were affected. We conclude that P. nobilis populations experienced a strong decline within the last century, which reduced their capacity to cope with the new pathogen that appeared in 2016. However, the population census was not alarming before the outbreak (Kersting et al., 2019), raising the question of how representative of Ne population censuses are in marine species with a high variance in reproductive success (Charlesworth, 2009; Hauser et al., 2002).
The use of the COI comparative frame to check species conservation status can also be exemplified by the fish Pagrus auratus. Hauser et al. (2002) found a significant decline in nuclear genetic diversity in a population of P. auratus during its exploitation history. Our estimated COI-π for this species is 0.0048 (N = 41), which falls outside the 95% CI of the mean COI-π estimated for nonthreatened species of Actinopterygii (Table 2). Other well-demonstrated examples of loss of genetic diversity in both mitochondrial and nuclear markers through population declines can be found in the genomewide analyses of two endangered mammal species: the eastern gorilla (van der Valk et al., 2018, 2019) and the Iberian lynx (Casas-Marce et al., 2017). However, it is possible that the method will not be very powerful for Aves, given the low levels of COI-π found in this class, most likely due to strong selective constraints on the COI gene.
Here, we propose an easy method that can be used as an early warning system of the conservation status of a species' population: estimate nucleotide diversity in the COI mitochondrial gene and contrast the value against the 95% CI of nonthreatened species from its taxonomic class.
Although further research will be necessary to confirm population declines and threats, this could help in making fast and appropriate investment decisions to conserve biodiversity. We think that the method could be especially useful in local populations of marine species where there is a lack of information and many technical and financial issues to cope with. Obtaining COI-π estimates as a first inquiry into the conservation status of populations has several advantages:
• Lower cost compared to surveying genomewide neutral nuclear markers.
• Universal primers are available for its use as DNA barcode.
• For marine species with logistic problems of access to populations, populations of many species could be surveyed with the same sampling effort.

ACKNOWLEDGMENTS
We thank J. Gonzalez, B. Martinez-Cruz, two anonymous reviewers, and the editor for their helpful feedback. This study was financially supported by the PADI Foundation (Grant number 32981). MVL acknowledges support from MICINN (Grant number JCI-2016-29329).

DATA AVAILABILITY STATEMENT
Research data are not shared.

FIGURE 1 Notched box-plots of the differences in the levels of COI nucleotide diversity (COI-π) among nonthreatened (gray) and threatened (white) groups of species by class. The box shows the interquartile range (IQR), the line shows the median, the notch displays the CI around the median, and whiskers include IQR ± 1.5. Species with IUCN's NT status showed variable COI-π values across classes (Supplementary Table S1). Numbers on the bars: number of analyzed species (N_s). p-values of permutation tests are presented at the bottom of the boxes

TABLE 2 Confidence Intervals (95% CI) obtained by bootstrapping COI-π mean values with 10,000 replications in nonthreatened and threatened groups, for animal classes where all analyses were consistent with significant differences between the species groups

Corrected by subtracting the average variation in COI-π produced by effect of low sample size (N_i ≤ 10, 1.21e-3, Supplementary Table

FIGURE 2 Distribution of mean values of the 10,000 samples generated by bootstrapping for bivalves' threatened and nonthreatened groups of species. Arrows show the values of P. nobilis nucleotide diversities in COI from the different studied populations (Greece = 0.0044, N = 25; Tunisia = 0.0014, N = 49; Italy = 0.0075, N = 228; France = 0.0061, N = 34; and Spain = 0.0027, N = 48)

TABLE 1 Number of species analyzed by class (N_s) and percentage of representation of the Red List IUCN Database (%Analyzed). PopDS: N_s with data from NCBI-Popset Database. Avg. N_i and Avg. N_as: average number of sequenced individuals and analyzed sites, respectively, on which nucleotide diversity (COI-π) estimates are based