Challenges of next‐generation sequencing in conservation management: Insights from long‐term monitoring of corridor effects on the genetic diversity of mouse lemurs in a fragmented landscape

Abstract Long‐term genetic monitoring of populations is essential for efforts aimed at preserving genetic diversity of endangered species. Here, we employ a framework of long‐term genetic monitoring to evaluate the effects of fragmentation and the effectiveness of the establishment of corridors in restoring population connectivity and genetic diversity of mouse lemurs Microcebus ganzhorni. To this end, we supplement estimates of neutral genetic diversity with the assessment of adaptive genetic variability of the major histocompatibility complex (MHC). In addition, we address the challenges of long‐term genetic monitoring of functional diversity by comparing the genotyping performance and estimates of MHC variability generated by single‐stranded conformation polymorphism (SSCP)/Sanger sequencing with those obtained by high‐throughput sequencing (next‐generation sequencing [NGS], Illumina), an issue that is particularly relevant when previous work serves as a baseline for planning management strategies that aim to ensure the viability of a population. We report that SSCP greatly underestimates individual diversity and that discrepancies in estimates of MHC diversity attributable to the comparisons of traditional and NGS genotyping techniques can influence the conclusions drawn from conservation management scenarios. Evidence of migration among fragments in Mandena suggests that mouse lemurs are robust to the process of fragmentation and that the effect of corridors is masked by ongoing gene flow. Nonetheless, results based on a larger number of shared private alleles at neutral loci between fragment pairs found after the establishment of corridors in Mandena suggest that gene flow is augmented as a result of enhanced connectivity. Our data point out that despite low effective population size, M. ganzhorni maintains high individual heterozygosity at neutral loci and at MHC II DRB gene and that selection plays a predominant role in maintaining MHC diversity. These findings highlight the importance of long‐term genetic monitoring in order to disentangle between the processes of drift and selection maintaining adaptive genetic diversity in small populations.


Supplementary Laboratory Procedures
MHC genotyping by SSCP/Sanger sequencing: amplification of the MHC class II DRB exon 2 was carried on with target-specific primers (Schad, Sommer, & Ganzhorn, 2004) JS1: 5´ -GAGTGTCATTTCTACAACGGGACG-3´ and JS2: 5´-TCCCGTAGTTGTGTCTGCA-3´). PCR products were denatured, loaded onto 15% polyacrylamide gels (CleanGel DNA-HP, ETC. Elektrophoresetechnik, Kirchintellinsfurt, Germany) and run on a horizontal cooling electrophoresis system (Amersham Pharmacia Biotech, Freiburg, Germany). After DNA separation, gels were fixed and silver-stained for DNA visualization (DNA Plus One Silver Staining Kit, Amersham Pharmacia Biotech, Freiburg, Germany). Samples sharing similar banding patterns were placed next to each other and re-run on a gel. All known alleles were included on each gel and used as a reference. All identified alleles were sequenced bi-directionally. At least three samples of each allele were excised from the gel, dissolved in 1xTBE buffer and re-amplified under the same PCR conditions mentioned above. Cycle sequencing of the PCR products was performed by using a dye terminator sequencing kit (Applied Biosystems, Foster City, CA, USA) and then analysed by gel electrophoresis with an Applied Biosystems automated sequencer (model 377), following the manufacturer´s instructions (Schad, Ganzhorn, & Sommer, 2005;Schad et al., 2004).
Amplicon-based NGS library preparation: amplicon libraries were generated using a two-step PCR approach. The first PCR round was carried out in a 10-μl reaction volume, including 1 μl DNA template, 0.5 μM primers, 1 μl GC enhancer and 1 unit AmpliTaq Gold 360 Master Mix. PCR cycling included an activation step at 95°C for 10 min followed by 30 cycles consisting of a denaturation step at 95 °C for 30 s, annealing at 57 °C for 30 s and elongation at 72 °C for 60 s. A final elongation step was omitted to reduce artefact formation (Smyth et al., 2010). The second PCR contained 2 μl of the product generated by the initial PCR, 1 μl GC enhancer, 80 nM per barcode primer and 0.5 units AmpliTaq Gold 360 Master Mix in a final volume of 20 μl. Cycling conditions were the same as those outlined above but the number of cycles was reduced to six. PCR products were purified by using an Agilent AMPure XP (Beckman Coulter) bead cleanup. The size and concentration of the cleaned PCR products were estimated with the QIAxcel Advanced System (Qiagen). Cleaned samples were pooled at equimolar ratios into a single tube. The amplicon library was prepared for sequencing according to the MiSeq Reagent Kit Preparation Guide (Illumina, San Diego, CA, USA) and spiked with a 5% Phix library.
Microsatellite genotyping: the microsatellite loci were fluorescently labelled by using the M13 method (Schuelke, 2000) and amplified under optimized multiplex PCR combinations based on allele size distribution. PCR was performed in a total volume of 7.

Bioinformatic Workflow for Amplicon-Based NGS Data
Raw sequence data was processed following the pipeline fully described in Santos et al., (2016). Paired-end reads were merged with FLASH ( (Magoč & Salzberg, 2011)) by using optimized minimum and maximum overlap parameters (-m= 150, -M= 230) and an offset phredscore of 30. Adapters and consecutive stretches of Ns were trimmed by using a customized Python script. Here, only reads with a perfect match to the forward and reverse target primers (JS1 and JS2) were kept for further analyses. Clear artefacts (repeats of A, C, T, G, GC or GT motifs longer than 15 bp) and reads that had more than 90% of their bases below a 30 phred quality score were removed by using the software FASTX-Toolkit. Chimaera detection and removal were performed by using the default settings of UCHIME (Edgar et al., 2011). Reads were then submitted to a local BLAST against a database built with publicly available MHC class II sequences fetched from GenBank. A dedicated python script was used to parse the BLAST results and to filter out reads that fell below an e-value threshold of 10 -13 . This e-value minimizes the inclusion of sequences that don't correspond to the MHC (e.g. Phix) in the following steps. Individual fasta files were pooled into a single file and an alignment of all reads was used for allele calling by using Oligotyping (Eren et al., 2015). This approach allowed the discrimination of similar alleles based on the identification of subtle single nucleotide variants resulting from a Shannon entropy analysis. Oligotyping (v. 2.2) was performed by using those positions (components) with higher-than-background entropy suggesting polymorphic sites. These components were identified based on visual inspection of the distribution of entropy in the alignment. Alleles were subsequently mapped to individuals by using a customized Python script (Santos et al., 2016).

Methodological aspects influencing genotyping performance of traditional SSCP/Sanger compared with amplicon-based NGS
Our comparison between a traditional method (SSCP/ Sanger) with amplicon based NGS identified type I and type II errors (i.e. allelic dropout as well as false positives). Additional alleles detected with SSCP can be attributed to the shortcomings of stabilizing the denatured DNA and the establishment of optimal conditions that allow maximum gel resolution (Glavač & Dean, 1993;Hayashi & Yandell, 1993;Ortí, Hare, & Avise, 1997). Variations in the run temperature, the PCR volume loaded onto the polyacrylamide gel and the amount of glycerol added to the gel mix are some of the factors that influence the banding pattern and might result in extra or faint bands (Sunnucks et al., 2000). The absence of sequences of these rare alleles among the putative variants obtained by high coverage NGS suggests that they are probably PCR-based artefacts such as chimeric products (Brakenhoff, Schoenmakers, & Lubsen, 1991;Meyerhans, Vartanian, & Wain-Hobson, 1990;Smyth et al., 2010) and Taq DNA polymerase errors (Qiu et al., 2001).
Alelic dropout, defined here as any mechanism that underestimates heterozygosity or the presence of an allele, was the main driver of the discrepancies in the individual genotypes reported in our study. Allelic dropout can be caused by PCR competition which can often be resolved by high sequencing depth using NGS. However, an additional mechanism of allelic dropout that can occur using methods like SSCP is band sharing (i.e. very similar conformation of an allele that results in overlapping bands). Suboptimal resolution in the banding pattern of SSCP gels might also explain underestimates of individual heterozygosity.
The ability to generate more sequencing data per individual (i.e. high sequencing depth) represents a clear advantage of NGS over traditional methods employed for genotyping MHC genes, although the processing of larger datasets is associated with numerous challenges (Babik, 2010;Lighten, van Oosterhout, & Bentzen, 2014). Limitations of NGS include platform-dependent sequencing errors; for instance, Illumina reports up to 0.1% error rates for 75-85% of bases (Glenn, 2011) and, as in SSCP/Sanger sequencing, PCR biases are a pervasive concern in NGS analysis (Burri, Promerová, Goebel, & Fumagalli, 2014;Lenz & Becker, 2008). The need to distinguish true allelic variation from artefacts has pushed forward the development of many bioinformatic pipelines that aim at increasing the accuracy of MHC genotyping (e.g. Huchard et al., 2012;Lighten et al., 2014;Radwan et al., 2012;Sebastian, Herdegen, Migalska, & Radwan, 2016;Sommer, Courtiol, & Mazzoni, 2013). Our estimates of MHC II allelic diversity in M. ganzhorni by using the ampliconbased NGS approach offer increased confidence over traditional methods because of highsequencing depth (2642 ± 1120 reads per individual) relative to the number of MHC II DRB loci (Galan et al., 2010), repeatability (97%) and a stringent bioinformatic workflow (Santos et al., 2016;Sommer et al., 2013). Table S1. Summary of sample sizes used for microsatellites and MHC class II DRB exon 2 genotyping by SSCP / Sanger sequencing and amplicon-based NGS. Table S2. Details of microsatellite loci and PCR multiplex conditions. Locus identity, fluorescent dye, initial and final annealing temperatures for stepdown cycling conditions, forwardM13 and reverse primer concentrations in the PCR, allele size ranges and number of alleles Table S3. Log-likelihood values and parameter estimates of models testing for selection processes acting on MHC DRB exon 2 of Microcebus ganzhorni revealed by SSCP / Sanger sequencing and amplicon-based NGS. The locus Mm43b was excluded from subsequent analyses because of evidence for the presence of null alleles. Individual genotypes with missing data on more than 4 microsatellite loci (2%) or with low amplification success (4%) were excluded from the overall dataset. A total of 417 samples yielded genotype data at 11-14 microsatellite loci.   (Puechmaille, 2016); ii) natural logarithm of the probability of the data (Ln Pr(X|K), Pritchard, Stephens, & Donnelly, 2000); and iii) ΔK ( (Evanno, Regnaut, & Goudet, 2005)).     Forest fragment M4-5 was cleared by 2012 and therefore was excluded from the analyses.