Conservation implications of genetic structure in the narrowest endemic quillwort from the Eastern Amazon

Abstract The quillwort Isoëtes cangae is a critically endangered species occurring in a single lake in Serra dos Carajás, Eastern Amazon. Low genetic diversity and small effective population sizes (Ne) are expected for narrow endemic species (NES). Conservation biology studies centered on a single species have some limitations, but they remain useful given the limited time and resources available to protect species at risk of extinction. Here, we evaluated the genetic diversity, population structure, Ne, and minimum viable population (MVP) of I. cangae to provide information for effective conservation programs. Our analyses were based on 55 individuals collected from the Amendoim Lake and 35,638 neutral SNPs. Our results indicated a single panmictic population, moderate levels of genetic diversity, and Ne in the order of thousands, contrary to what is expected for NES. We also found negative FIS values, suggesting that I. cangae is not at risk of inbreeding depression. Our findings imply that I. cangae contains enough genetic diversity to ensure evolutionary potential and that all individuals should be treated as one demographic unit. These results provide essential information to optimize ex situ conservation efforts and genetic diversity monitoring, which are currently applied to guide I. cangae conservation plans.


| INTRODUCTION
Information on genetic diversity and population structure can support monitoring and conservation programs for threatened species, for example by identifying priority populations for conservation (Fallon, 2007). Genetic information is even more valuable for endemic species with restricted distributions (Hamrick et al., 1991), known as "narrow endemic species" (NES), which comprise one or a few populations limited to a specific habitat and confined to a small geographic area (Kruckeberg & Rabinowitz, 1985). A classical assumption about NES is that they have lower genetic diversity than widespread species because of their small effective population sizes (Gibson et al., 2008; Leimu et al., 2006; Smith & Pham, 1996).
However, recent studies have shown that most Mediterranean plants considered NES show moderate to high levels of genetic diversity (Fernández-Mazuecos, 2014; Forrest et al., 2017; Jiménez-Mejías, 2015; López-Pujol et al., 2013). Comparable genetic data for NES from tropical areas such as the Eastern Amazon are still scarce.
Although the Amazon basin is usually represented as predominantly forested, this biome contains several restricted and sparsely distributed open habitats, such as savannas, campinaranas, cangas, and campos rupestres (Devecchi et al., 2020; Pires & Prance, 1985). Among them, the cangas show some of the highest levels of diversity and endemism; they occur, among other areas, on the elevated plateaus of the Serra dos Carajás mountain range, in the southeast of the state of Pará, Brazil, in the Eastern Amazon (Figure 1a-b). This mountainous complex has high iron ore concentrations under industrial exploitation, which makes species conservation plans necessary (Freitas, 1986; Santos, 1986; STCP, 2016). Mining activities in the Serra dos Carajás have been accompanied by scientific expeditions and botanical surveys (Viana et al., 2016), and studies of the plant diversity of this mountain range have since recorded many endemic genera and species (Giulietti et al., 2019).
South America is one of the centers of diversity of the genus Isoëtes, and Brazil comprises 26 species, most of them considered NES (Flora do Brasil, 2020). Even so, I. cangae requires specific conservation precautions because it occurs in a single locality, the Amendoim Lake, where it grows submerged in the permanent, ultraoligotrophic lake (Figure 1c) within a ferruginous altitude field in the Serra Sul dos Carajás (Figure 1e-f; Viana et al., 2016). Neither modern nor historical palynological records of I. cangae have been found in other lakes in the region (Absy et al., 2014; Guimarães et al., 2014; E. F. Silva, Lopes et al., 2020), indicating that this species has historically been restricted to the Amendoim Lake. Its reduced area of occupancy (AOO) and extent of occurrence (EOO) led to the inclusion of I. cangae in the IUCN Red List as a critically endangered (CR) species (Lansdown, 2019). In addition, habitat deterioration caused by intensive landscape alteration in the surrounding area, such as mining and the conversion of forest into pasture (Souza-Filho et al., 2016), may affect the species in the future.
In the only population genetics study of I. cangae, Santos et al. (2020) used ISSR markers and found high genetic diversity and high gene flow among the sampling areas within the Amendoim Lake.
However, the authors neither tested genetic structure with assignment tests nor estimated the effective population size of the species, both of which are essential for outlining future conservation actions (Hoban et al., 2020). Furthermore, comparing genetic diversity levels with different markers gives access to polymorphisms in other portions of the genome, which can change the inferred relative effects of gene flow and genetic drift on the observed patterns of genetic structure (Freville et al., 2001). In addition, thousands of markers such as SNPs can provide more information about evolutionary processes and more accurate estimates of demographic parameters, which are fundamental to optimizing conservation efforts (Morin et al., 2009; Helyar et al., 2011; Torkamaneh et al., 2018). Few studies of Isoëtes have applied high-throughput sequencing to acquire genomic data, and most of them focused on phylogenomics (Wood et al., 2020), phylogeography (Wood et al., 2018), local adaptation (Yang & Liu, 2016), or species delimitation (Nunes et al., 2018).
Conservation biology studies centered on a single species have some limitations (Simberloff, 1998), but they remain useful given the limited time and resources available (Olden, 2003), and they are important either as models for management guidelines (Bichet et al., 2016) or when a single species serves as a focal, umbrella, or charismatic species (Watson et al., 2001; Pease et al., 2021; Politni et al., 2021). Here, we used SNPs from genomic data to estimate the genetic diversity and population structure of I. cangae, the most endangered quillwort species of the Serra dos Carajás. This information will enhance genetic management capability both in situ and ex situ, aiming at the conservation of this narrowly endemic species. We investigated how genetic diversity is structured in the Amendoim Lake, testing whether there are subpopulations that would require different management units in conservation strategies.
We also estimated the effective population size (Ne) from genomic data to provide information about the minimum viable population (MVP), a key parameter for the long-term survival of the species. Furthermore, with thousands of SNPs, we evaluated whether I. cangae is a Neotropical NES with low genetic diversity, as classical studies assume, or, as Santos et al. (2020) suggested using ISSR markers, a NES with high genetic diversity.

| Sampling effort
The Amendoim Lake is located in the Carajás National Forest (Floresta Nacional de Carajás; FLONA Carajás). The FLONA Carajás was created as a protected area for sustainable use, reconciling conservation and mining activities on the ferruginous outcrops of the Serra dos Carajás. The lake lies 720 m above sea level and has an area of 1.23 km². It is recharged essentially by rainfall over a small catchment basin, and its water varies seasonally from ultraoligotrophic to oligotrophic. The Amendoim Lake is located in the canga (Figure 1e-f), which comprises several phytophysiognomies, including grasslands, scrublands, wetlands, and forest formations (Mota et al., 2015) that differ in the plant communities they support and in their soil chemistry (Mitre et al., 2018). Severe environmental conditions such as high temperature, UV radiation, high water loss, and poorly developed, metal-rich soils are also characteristic of the canga (Jacobi et al., 2007), which harbors endemic and rare species (Giulietti et al., 2019).
We collected leaf samples from 55 specimens of I. cangae distributed over the whole area of the Amendoim Lake (Table S1) in five sampling areas, four of them similar to those used by Santos et al. (2020), north (n = 13), south (n = 10), east (n = 11), and west (n = 8), plus a fifth group covering additional individuals located more centrally in the lake (center; n = 13), as shown in Figure 1c. Plants were collected at least 2 m apart to reduce relatedness among sampled individuals.
The collected plants were shipped in water to the laboratory.

| Library preparation
The genetic diversity analysis of I. cangae individuals was performed using the genome skimming approach ( TapeStation (Agilent Technologies®) using a ScreenTape DNA 1,000 kit (Agilent Technologies). The libraries were adjusted to 4 nM, pooled, denatured, and diluted to a loading concentration of 1.8 pM. Sequencing was performed on an Illumina NextSeq 500 platform using a NextSeq 500 High Output v2 kit (300 cycles).
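The concentration adjustment above relies on standard library arithmetic. The minimal sketch below, assuming an average mass of 660 g/mol per double-stranded base pair and using hypothetical concentrations and fragment sizes (not values from this study), converts a fragment-analyzer reading to nM and computes the stock volume for a C1V1 = C2V2 dilution.

```python
"""Library molarity and dilution arithmetic (illustrative values only)."""

# Assumption: average molar mass of one double-stranded DNA base pair.
BP_MOLAR_MASS = 660.0  # g/mol


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (BP_MOLAR_MASS * mean_fragment_bp)


def stock_volume(c_stock: float, c_target: float, v_final: float) -> float:
    """Stock volume V1 from C1 * V1 = C2 * V2 (both concentrations in the same unit)."""
    if c_target > c_stock:
        raise ValueError("target concentration is higher than the stock")
    return c_target * v_final / c_stock


if __name__ == "__main__":
    stock_nm = library_molarity_nm(2.0, 400)  # hypothetical: 2 ng/uL, 400 bp mean size
    v1 = stock_volume(stock_nm, 4.0, 20.0)    # bring the library to 4 nM in 20 uL
    print(f"library: {stock_nm:.2f} nM; {v1:.2f} uL library + {20.0 - v1:.2f} uL buffer")
    # overall fold dilution from 4 nM (4000 pM) to the 1.8 pM loading concentration
    print(f"overall dilution 4 nM -> 1.8 pM: {4000.0 / 1.8:.0f}-fold")
```

In practice the pooled library passes through intermediate denaturation and dilution steps defined by the sequencing protocol; the last line only shows the overall fold change between the two concentrations quoted in the text.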

| Variant calling and SNP filtering
We used the 30,000 contigs as a reference to align the reads and build three files for each sample: (i) reference sequence files (.fasta) in SAMtools; … (Poplin et al., 2017). Then, we combined the resulting files into a single .gvcf file with the CombineGVCFs function. Next, joint genotyping was performed with the GenotypeGVCFs function to improve the accuracy of the genotype calls. Finally, we used BCFtools v1.10.2 (http://www.htslib.org/) to convert the final .gvcf into a .vcf file.
FIGURE 2 Methodological approach for sequencing and data analyses. (a) Genomic DNA sequencing was performed using one genomic library with 55 samples of Isoëtes cangae from Amendoim Lake in the Serra dos Carajás, Pará, Brazil. The library was sequenced using the genome skimming method, and the assembly step was carried out using a reference genome. (b) We recovered 881,782 contigs and selected 30,000 of them for the variant calling step (c). After filtering SNPs by quality (d), we kept only neutral SNPs for the statistical analyses (e).
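The per-sample GVCF workflow described above can be sketched as the command lines it would involve. This sketch only assembles the commands and does not run them; the per-sample step (GATK HaplotypeCaller in GVCF mode), the file names, and the output directory are assumptions, because part of the original description is missing.

```python
from pathlib import Path


def gatk_commands(reference: str, bams: list[str], out_dir: str = "calls") -> list[list[str]]:
    """Assemble (but do not run) the variant-calling command lines."""
    out = Path(out_dir)
    cmds: list[list[str]] = []
    gvcfs: list[str] = []
    for bam in bams:
        gvcf = str(out / f"{Path(bam).stem}.g.vcf.gz")
        gvcfs.append(gvcf)
        # Assumed per-sample step: HaplotypeCaller emitting a GVCF
        cmds.append(["gatk", "HaplotypeCaller", "-R", reference, "-I", bam,
                     "-O", gvcf, "-ERC", "GVCF"])
    combined = str(out / "combined.g.vcf.gz")
    combine = ["gatk", "CombineGVCFs", "-R", reference, "-O", combined]
    for gvcf in gvcfs:
        combine += ["--variant", gvcf]  # one --variant per sample GVCF
    cmds.append(combine)
    # Joint genotyping of the combined GVCF
    genotyped = str(out / "genotyped.vcf.gz")
    cmds.append(["gatk", "GenotypeGVCFs", "-R", reference, "-V", combined, "-O", genotyped])
    # Write an uncompressed VCF with BCFtools (-O v selects uncompressed VCF output)
    cmds.append(["bcftools", "view", "-O", "v", "-o", str(out / "final.vcf"), genotyped])
    return cmds


if __name__ == "__main__":
    for cmd in gatk_commands("contigs.fasta", ["sample01.bam", "sample02.bam"]):
        print(" ".join(cmd))
```

Building the commands as lists keeps them ready for `subprocess.run` without shell quoting issues, while leaving the actual execution (and resource settings) to the user.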
We filtered SNPs from the final .vcf file (Figure 2d) with VCFtools v0.1.16 (Danecek et al., 2011) and the "r2vcftools" package (Pope, 2020) for R v3.6.3 (R Core Team, 2020). We excluded indels and kept only biallelic SNPs, as I. cangae is a diploid species (data not shown). We then filtered SNPs by quality, keeping only SNPs with (i) at most 10% missing data, because large amounts of missing data may affect some demographic parameters (Marandel et al., 2020) and analyses (Novembre & Stephens, 2008; Helyar et al., 2011)
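The filters above (biallelic SNPs only, no indels, at most 10% missing genotypes per locus) can be expressed as a small function. This is a minimal sketch over a hypothetical in-memory representation of VCF records; the `Site` class is illustrative and not part of VCFtools or r2vcftools. With 55 individuals, a 10% threshold allows at most five missing genotypes per locus (5/55 ≈ 9.09%).

```python
from dataclasses import dataclass

NUCLEOTIDES = {"A", "C", "G", "T"}


@dataclass
class Site:
    """Hypothetical minimal VCF record: REF, ALT alleles, and per-sample GT strings."""
    ref: str
    alts: list[str]
    genotypes: list[str]  # e.g. "0/1"; any "." marks a missing call


def is_biallelic_snp(site: Site) -> bool:
    # one ALT allele, and REF/ALT both single nucleotides (drops indels and multiallelic sites)
    return len(site.alts) == 1 and site.ref in NUCLEOTIDES and site.alts[0] in NUCLEOTIDES


def missing_fraction(site: Site) -> float:
    missing = sum(1 for gt in site.genotypes if "." in gt)
    return missing / len(site.genotypes)


def filter_sites(sites: list[Site], max_missing: float = 0.10) -> list[Site]:
    """Keep biallelic SNPs with at most `max_missing` missing genotypes."""
    return [s for s in sites if is_biallelic_snp(s) and missing_fraction(s) <= max_missing]


if __name__ == "__main__":
    n = 55
    sites = [
        Site("A", ["G"], ["0/1"] * (n - 5) + ["./."] * 5),  # 9.09% missing: kept
        Site("A", ["G"], ["0/1"] * (n - 6) + ["./."] * 6),  # 10.9% missing: removed
        Site("A", ["AT"], ["0/0"] * n),                     # indel: removed
        Site("C", ["G", "T"], ["0/2"] * n),                 # multiallelic: removed
    ]
    print(len(filter_sites(sites)))  # 1
```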

| Genetic structure
Neutral SNPs were used to estimate population structure with two approaches: sparse Nonnegative Matrix Factorization (sNMF) and Discriminant Analysis of Principal Components (DAPC). Unlike other clustering methods, both are model-free and make no assumptions about the population model (Fenderson et al., 2020). sNMF was performed with the "LEA" R package to estimate individual ancestry coefficients, allowing inference of the number of ancestral populations (K), which correspond to groups or genetic clusters.
We used the "adegenet" R package (Jombart & Ahmed, 2011) to carry out the DAPC and to investigate the probability of each individual belonging to each observed genetic cluster, applying a Discriminant Function Analysis (DFA) to Principal Components (PCs) (Jombart et al., 2010).
This approach reduces data dimensionality while retaining genetic information and returns the number of genetic clusters that best explains the current genetic structure (Jombart et al., 2010).
For sNMF and DAPC, we tested K values between 1 and 10. In sNMF, we tested different values of the regularization parameter (α = 10, 100, 500, 1,000, 2,000, 4,000) to evaluate possible changes in the best K value, with 10 replicates for each value. Plots were constructed to visualize the lowest cross-entropy value for each K under each α, and the K with the lowest cross-entropy was taken as the number of ancestral populations. In DAPC, all PCs (100% of the variance) were used to select the best K using the Bayesian information criterion (BIC) in the find.clusters function (Jombart et al., 2010).
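The sNMF model-selection rule described above (lowest cross-entropy across replicate runs, checked separately for each α) can be sketched as follows. The cross-entropy values are assumed to have already been produced by the LEA runs, and this sketch takes the best replicate per K; averaging replicates would be an equally reasonable variant. The numbers in the example are hypothetical.

```python
def best_k(cross_entropy: dict[int, list[float]]) -> int:
    """K whose best (lowest) replicate cross-entropy is smallest."""
    return min(cross_entropy, key=lambda k: min(cross_entropy[k]))


def best_k_per_alpha(runs: dict[float, dict[int, list[float]]]) -> dict[float, int]:
    """Apply best_k separately for each regularization value alpha."""
    return {alpha: best_k(by_k) for alpha, by_k in runs.items()}


if __name__ == "__main__":
    # hypothetical cross-entropy values (two replicates per K, K = 1..3)
    runs = {
        10: {1: [0.552, 0.553], 2: [0.556, 0.559], 3: [0.561, 0.566]},
        1000: {1: [0.549, 0.550], 2: [0.554, 0.557], 3: [0.560, 0.563]},
    }
    print(best_k_per_alpha(runs))  # {10: 1, 1000: 1}
```

Checking the selected K under every α, as done in the study, guards against a result that depends on one particular regularization strength.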
We also ran find.clusters with numbers of PCs representing 95%, 75%, and 50% of the variance to verify changes in the estimated K (Miller et al., 2020). BIC was interpreted like cross-entropy: the lower its value, the more likely that K represents the number of genetic clusters. Finally, we ran a principal component analysis (PCA) with "adegenet" to plot sample genotypes in multivariate space according to the five sampled areas and thus visually assess any potential clustering following a spatial pattern (Jombart et al., 2009).
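To see how the 50%, 75%, 95%, and 100% variance thresholds translate into numbers of retained PCs, the sketch below runs a PCA through the singular value decomposition of a centered genotype matrix (0/1/2 allele counts, no missing data). It is an illustrative NumPy computation, not the adegenet implementation, and the random matrix in the example is hypothetical.

```python
import numpy as np


def n_pcs_for_variance(genotypes: np.ndarray,
                       thresholds=(0.50, 0.75, 0.95, 1.0)) -> dict[float, int]:
    """Leading PCs needed for the cumulative explained variance to reach each threshold.

    genotypes: individuals x SNPs matrix of allele counts (0, 1, 2) with no missing data.
    """
    centered = genotypes - genotypes.mean(axis=0)
    # squared singular values of the centered matrix are proportional to PC variances
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    if variances.sum() == 0:
        raise ValueError("no variable SNPs")
    cumulative = np.cumsum(variances) / variances.sum()
    # first PC index whose cumulative share reaches t (small tolerance for rounding)
    return {t: int(np.searchsorted(cumulative, t - 1e-9)) + 1 for t in thresholds}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.integers(0, 3, size=(55, 500)).astype(float)  # hypothetical genotypes
    print(n_pcs_for_variance(g))
```

With 55 individuals, centering leaves at most 54 non-zero components, so "100% of the variance" corresponds to at most 54 PCs regardless of the number of SNPs.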

| Genetic diversity, genetic distance, and effective population size
Genetic diversity indices and their 95% confidence intervals (95% CI) were also calculated for I. cangae, employing the neutral SNPs and

| Neutral SNPs
The draft genome adopted as a reference, namely "ITV2008_illu-  (Table S2). Each sample yielded approximately seven million reads, with a total of 810,000,051 bp, and most contigs had fragment sizes between 250 and 500 bp (Table S2).
The variant calling step identified 2,349,431 variants in the .vcf file for 55 individuals, which were reduced to 71,621 SNPs after quality filtering (Table S3). We then selected 35,638 neutral SNPs (Figure S1) as the final dataset for subsequent analyses. The final dataset had a mean coverage depth of 83.8 reads per SNP (median = 71.49), with a maximum of 9.09% missing data per locus and 0.40% per individual (Table S1).

| Genetic structure
Both approaches for estimating genetic structure, sNMF and DAPC, recovered a single cluster in I. cangae (Figure 3). In sNMF (Figure 3a), the lowest cross-entropy value was 0.549 for α = 1,000, and all α values supported one ancestral population (K = 1; Figure S2). In DAPC, the best BIC value (617.6) was found for K = 1 (Figure 3b), and the number of PCs did not affect the results (Figure S3). PCA also did not indicate multiple clusters (Figure 4), corroborating the sNMF and DAPC results.

| DISCUSSION
Here, we addressed the population genomics of an endangered Neotropical NES of Isoëtes, applying SNPs to provide information for management and conservation programs. We showed that I. cangae is composed of a single panmictic population with moderate genetic diversity and no inbreeding signal, contradicting the classical assumptions for NES. Population genetics studies of Isoëtes species around the world, using other genetic markers, have reported different levels of genetic diversity (Caplen & Werth, 2000; Chen et al., 2005; Kang et al., 2005; Kim et al., 2008; Gentili et al., 2010; Li et al., 2013; Stelt et al., 2017; Ma et al., 2019; Zheng et al., 2020). Santos et al. (2020) also found high genetic diversity using ISSR markers, but our results did not corroborate their finding of a genetically differentiated group in the north of the Amendoim Lake. Genetic diversity metrics at the individual level showed that expected heterozygosity differs significantly among areas, whereas observed heterozygosity is higher, and significantly different, in the west sampling area, indicating that this area holds more genetic diversity. However, these differences among areas are very small (see the axis scales in Figure 5 and Figure S4) and were not sufficient to recover any subpopulation in the population assignment tests.
Individual inbreeding coefficients did not differ among areas, suggesting no risk of inbreeding among I. cangae individuals.
The differences between our results and those of Santos et al. (2020) can be attributed to our larger sampling effort and to the thousands of markers (SNPs) used in the analyses, which gave us greater explanatory power for neutral genetic structure. SNPs are codominant markers that can represent different portions of the genome. They contain enough information for population genetics analyses while providing a refined and accurate genomic data source to assess genetic structure and diversity at low cost (Allendorf et al., 2010; Helyar et al., 2011; Angeloni et al., 2012). Previous studies have successfully used this approach to study other endemic and endangered angiosperms from the cangas of the Serra dos Carajás (e.g., Lanes et al., 2018; Carvalho et al., 2019), as well as several other groups of vascular plants from different regions of the world (e.g., Wickell et al., 2017; Wolf et al., 2019; Wang et al., 2020).
We followed Forrest et al. (2017) … Rather, there is an indication of outbreeding, given the negative FIS values. Small and Hickey (1997) also found a small, negative FIS for I. karstenii Braun, a Neotropical species from the high-altitude paramos of Merida, Venezuela. The authors interpreted these results as evidence of near-random mating within subpopulations, without inbreeding or outbreeding. FIS values provide a coefficient of "correlation between uniting gametes" (Wright, 1922, 1965). A high positive correlation generates inbred offspring, whereas a negative correlation generates more heterozygous offspring (Johnson & Shaw, 2015). Negative FIS values, which indicate more heterozygous individuals than expected, are usually explained by (i) reproductive mechanisms that prevent inbreeding or favor breeding among unrelated individuals or (ii) hybridization between different species or distant populations (Dobzhansky, 1950).
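The link between observed heterozygosity (Ho), expected heterozygosity (He), and FIS can be made concrete with the simple estimator FIS = 1 - Ho/He, where He = 2p(1 - p) under Hardy-Weinberg proportions. This minimal sketch averages uncorrected per-locus values over loci; it is not necessarily the estimator used in this study, and the genotype coding (alternate-allele counts 0, 1, 2, with None for missing calls) is an assumption.

```python
from typing import Optional


def heterozygosities(genotypes: list[list[Optional[int]]]) -> tuple[float, float]:
    """Mean observed (Ho) and expected (He) heterozygosity over loci.

    genotypes[locus][individual] = count of the alternate allele (0, 1, 2); None = missing.
    """
    ho_values, he_values = [], []
    for locus in genotypes:
        calls = [g for g in locus if g is not None]
        if not calls:
            continue  # skip loci with no genotyped individuals
        p = sum(calls) / (2 * len(calls))  # alternate-allele frequency
        ho_values.append(sum(1 for g in calls if g == 1) / len(calls))
        he_values.append(2 * p * (1 - p))  # Hardy-Weinberg expectation, no sample-size correction
    return sum(ho_values) / len(ho_values), sum(he_values) / len(he_values)


def fis(ho: float, he: float) -> float:
    """Negative when there are more heterozygotes than expected, positive when fewer."""
    if he == 0:
        raise ValueError("He is zero: no polymorphic loci")
    return 1 - ho / he


if __name__ == "__main__":
    # hypothetical genotypes for 2 loci x 4 individuals
    ho, he = heterozygosities([[1, 1, 1, 1], [0, 1, 1, 2]])
    print(ho, he, fis(ho, he))  # 0.75 0.5 -0.5
```

The example shows how an excess of heterozygotes (Ho > He) yields a negative FIS, the pattern discussed above for I. cangae.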
Regarding reproductive mechanisms, it is unclear whether I. cangae reproduces predominantly sexually in its natural habitat. … an ephemeral wetland species from North America, due to genetic structure and low dispersal ability. Therefore, dispersal and connectivity among individuals may be greater in a submerged species such as I. cangae, whose spores are transported throughout the year (Santos et al., 2020), leading to a genetic pattern opposite to that of other NES. Outbreeding can also arise within a single population after generations of local adaptation in selfing plant species (Fischer & Matthies, 1997; Johnson & Shaw, 2015), even over distances as short as 30 m (Waser & Price, 1994). Subsequent crosses between inbred lines with different local adaptations can generate negative FIS (Edmands, 2007; Johnson & Shaw, 2015).
Hybridization is unlikely to be the process responsible for this FIS pattern. Although hybridization is common in other Isoëtes species (Kim et al., 2010; Pereira et al., 2019), the only other species of the genus reported for the region (I. serracarajensis) does not occur in the same habitat, being found in temporary lakes, and intensive fieldwork in the area has never recorded hybrids between them (Nunes et al., 2018; Caldeira et al., 2019; Santos et al., 2020). In addition, I. cangae has historically been reported only from the Amendoim Lake (Absy et al., 2014; Guimarães et al., 2014; E. F. Silva, Lopes et al., 2020). Gene flow between distant populations, creating contact zones and leading to outbreeding in I. cangae, is also unlikely because the only known population occurs in the Amendoim Lake and our analyses did not detect genetic structure.
Conservation programs based on genetic data usually aim to reduce the loss of genetic diversity through genetic drift by increasing the number of individuals, either by translocating adult individuals (Johnson et al., 2010) or by transplanting spores between populations (Hufford & Mazer, 2003). Our results indicate that I. cangae individuals should be treated as one demographic unit for conservation and management purposes. Furthermore, our LD-based estimates of the effective population size of I. cangae indicated a large population even at the lower bound of the confidence interval (Ne > 500), which would probably allow the species to adapt to environmental changes (Jamieson & Allendorf, 2012; Hoban et al., 2020). Population census estimates indicated the presence of 200,000 individuals in the Amendoim Lake, which is consistent with our Ne estimate.
MVP estimates can be interpreted as the minimum number of individuals that need to be rescued in an ex situ conservation strategy to maintain the current genetic diversity of I. cangae in future generations.
Even considering the caveats, our MVP estimate is a good starting point for a viable ex situ conservation strategy aiming at a minimum population that will not suffer from inbreeding in the short and medium term (Allendorf et al., 2010; Jamieson & Allendorf, 2012).
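One standard way to relate Ne to the short- and medium-term retention of genetic diversity is the expectation Ht = H0(1 - 1/(2Ne))^t under drift alone. The sketch below evaluates this expression and inverts it to find the smallest Ne that retains a target fraction of heterozygosity over t generations. It is not the MVP method used in this study, the target of 90% over 100 generations in the example is a hypothetical choice, and Ne is usually smaller than the number of individuals that would have to be maintained.

```python
def heterozygosity_retained(ne: float, generations: int) -> float:
    """Expected fraction of initial heterozygosity left after t generations of drift."""
    return (1.0 - 1.0 / (2.0 * ne)) ** generations


def min_ne_for_retention(fraction: float, generations: int) -> float:
    """Smallest Ne keeping at least `fraction` of H0 after `generations` generations."""
    per_generation = fraction ** (1.0 / generations)  # retention needed each generation
    return 1.0 / (2.0 * (1.0 - per_generation))


if __name__ == "__main__":
    print(f"Ne = 500, 100 generations: {heterozygosity_retained(500, 100):.3f} of H0 kept")
    print(f"Ne needed to keep 90% of H0 for 100 generations: {min_ne_for_retention(0.9, 100):.1f}")
```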
Inbreeding is a common concern in endangered species (Frankham et al., 2017). However, our results suggest outbreeding in I. cangae, which also needs to be considered in conservation programs for this species.
The vulnerability of I. cangae goes beyond being a NES restricted to a single lake in the Eastern Amazon. Habitat loss due to mining activities and climate change may affect this species in the future (Santos, 1986; Souza-Filho et al., 2016; STCP, 2016), and considerable effort has been made to obtain information for its management. Caldeira et al. (2019)  the minimal number of individuals (MVP) for an ex situ conservation approach, and the possibility of selecting individuals throughout the Amendoim Lake, since I. cangae comprises a single population. Random selection of individuals for propagation and other studies is possible despite the significant differences in genetic diversity among areas, because these differences are small and we found no differences in FIS values among areas.
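Because I. cangae behaves as a single panmictic population, a simple random draw of individuals for propagation is defensible. The hypothetical sketch below additionally spreads the draw across the five sampling areas in proportion to the number of individuals available in each (largest-remainder allocation), so that material is collected throughout the lake. The individual IDs and the target of 20 plants are illustrative only; the area sizes mirror the sampling design (north 13, south 10, east 11, west 8, center 13).

```python
import random
from collections import defaultdict
from typing import Optional


def proportional_sample(area_of: dict[str, str], n: int, seed: Optional[int] = None) -> list[str]:
    """Randomly pick n individual IDs, allocating draws to areas in proportion to their size.

    area_of maps a (hypothetical) individual ID to its sampling-area label.
    """
    if not 0 <= n <= len(area_of):
        raise ValueError("n must be between 0 and the number of individuals")
    rng = random.Random(seed)
    by_area: dict[str, list[str]] = defaultdict(list)
    for ind, area in area_of.items():
        by_area[area].append(ind)
    total = len(area_of)
    quota = {a: n * len(ids) / total for a, ids in by_area.items()}
    alloc = {a: int(q) for a, q in quota.items()}
    # largest-remainder rule: give the remaining draws to the largest fractional parts
    leftover = n - sum(alloc.values())
    for a in sorted(quota, key=lambda x: quota[x] - alloc[x], reverse=True)[:leftover]:
        alloc[a] += 1
    chosen: list[str] = []
    for a, ids in by_area.items():
        chosen.extend(rng.sample(sorted(ids), alloc[a]))  # random draw within each area
    return chosen


if __name__ == "__main__":
    sizes = {"north": 13, "south": 10, "east": 11, "west": 8, "center": 13}
    area_of = {f"{a}_{i}": a for a, k in sizes.items() for i in range(k)}
    print(sorted(proportional_sample(area_of, 20, seed=42)))
```

Passing a fixed seed makes the selection reproducible, which helps document which plants entered an ex situ collection.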
In short, our results showed that the only known population of

CONFLICT OF INTEREST
None declared.