Mitochondrial genomes and genetic structure of the Kemp's ridley sea turtle (Lepidochelys kempii)

Abstract The Kemp's ridley (Lepidochelys kempii) is the world's most endangered sea turtle species and is primarily distributed in the Gulf of Mexico. In the United States, South Padre Island, Texas serves as a key nesting ground for the species. Genetic studies of the Kemp's ridley have been used to aid in conservation and management practices, with the mitochondrial control region as the most commonly used marker due to its perceived hypervariability and ease of sequencing. However, with the advent of next generation sequencing technology, targeting complete mitochondrial genomes is now feasible. Here, we describe a more complete mitochondrial genome for the Kemp's ridley than has been previously published in literature and demonstrate a cost‐effective and efficient method for obtaining complete mitochondrial genomes from sea turtles. We compare the genetic diversity and taxonomic resolution obtained from whole mitochondrial genomes to that obtained from the mitochondrial control region alone. We compare current genetic diversity with previous records. Furthermore, we evaluate the genetic structure between the breeding stock in South Padre Island and that of deceased Kemp's ridleys recovered on the Northern coast of the Gulf of Mexico after the 2010 BP Deepwater Horizon oil spill, and of Kemp's ridleys stranded on the East Coast of the United States. Our results show that complete mitochondrial genomes provide greater resolution than the control region alone. They also show that the genetic diversity of the Kemp's ridley has remained stable, despite large population declines, and that the genetic makeup of deceased turtles stranded after the Deepwater Horizon oil spill is indistinguishable from the breeding stock in South Padre Island, Texas. Open Data Badge This article has earned an Open Data Badge for making publicly available the digitally‐shareable data necessary to reproduce the reported results. The data is available at https://www.ncbi.nlm.nih.gov/genbank/.


| INTRODUC TI ON
The Kemp's ridley (Lepidochelys kempii), the world's most endangered sea turtle species (Burchfield, 2005), is restricted primarily to the Gulf of Mexico (GoM) and parts of the Northern Atlantic (Bowen et al., 1998). By the mid-1980s, the Kemp's ridley had suffered a dramatic population decline, and it is estimated that fewer than 300 females nested in 1985 (National Marine Fisheries Service [NMFS] et al., 2011). The species recovered through a combination of domestic and international actions including establishing a bi-national working agreement between the U.S. and Mexico to increase the protection of nesting females, hatchlings, and eggs (Márquez Millan, Olmeda, Sánchez, & Díaz, 1989;Woody, 1989), prohibition of trawling in GoM waters offshore of Rancho Nuevo, the primary nesting beach in Mexico, during the nesting season (Márquez Millan et al., 1989), a reintroduction program, with a head-starting component, designed to form a secondary nesting colony at Padre Island National Seashore, Texas (Caillouet, Shaver, & Landry, 2015;Fletcher, 1989;, and gradual implementation of turtle excluder devices (TEDS) on U.S. shrimping vessels in the GoM (Turtle Expert Working Group [TEWG], 2000). Positive population growth was observed from the 1980s until 2010 (Caillouet, 2011;Crowder & Heppell, 2011;Gallaway et al., 2013), when annual nesting numbers dropped by 35.4% (Caillouet, Gallaway, & Putman, 2016). A decline in nesting numbers was evident in 2013 and 2014 (Caillouet, 2014;Caillouet et al., 2015), and the overall population was predicted to be decreasing by 5% per year (Heppell, 2014). The causes of the post-2010 nesting setback are still debated (Caillouet et al., 2018), but recently a record high number of nests were documented in Mexico and Texas in 2017 (Caillouet et al., 2018).
One way to predict the ability of a species to adapt to environmental change is to quantify the genetic variability within the population (Frankham, 1996). Variation within sea turtle DNA is commonly studied using microsatellite markers (Aggarwal et al., 2004), nuclear markers (Bowen et al., 1998;Bowen, Meylan, & Avise, 1991), single nucleotide polymorphisms (Hurtado et al., 2016), or the mitochondrial control region (Gaos et al., 2016;Matsuzawa et al., 2016). Previous studies of Kemp's ridleys have utilized various nuclear markers to determine divergence from Olive ridley (Lepidochelys olivacea) sea turtles (Bowen et al., 1998(Bowen et al., , 1991, determine genetic diversity between nesting colonies (Kichler, 1996), document nesting (Johnson, Bass, Libert, Marshall, & Fulk, 1999), and detect multiple paternity in clutches (Kichler, Holder, Davis, Márquez-M, & Owens, 1999). Through analysis of heterozygosity at microsatellite loci, the original decline of the Kemp's ridley population was determined to not have had a significant effect on their genetic diversity by Kichler (1996). However, a later study conducted by Stephens (2003) using microsatellites indicated that the demographic bottleneck led to a measurable loss of genetic variation in the species. The apparent contradictions are potentially resolved if the bottleneck occurred too quickly to be detected by Kichler's (1996) markers. Dutton, Pease, and Shaver (2006) used mitochondrial DNA control region sequences to compare haplotype frequencies of nesting females in Texas to haplotype frequencies from females at Rancho Nuevo, Mexico. The study found six distinct haplotypes; however, the results indicated genetic homogeneity between the two populations. Studies after the 2010 halt in population growth have focused on determining genetic diversity between nesting colonies (Rivera, 2012) and distinguishing individual nesters (Frey, Dutton, Shaver, Shelby Walker, & Rubio, 2014). Microsatellites showed no genotype segregation among rookeries in Tamaulipas, Mexico (Rivera, 2012). Recent work using mitochondrial DNA concluded that there are at least two lineages of females nesting along the Texas coast and discovered eight haplotype sequences for Kemp's ridleys (Frey et al., 2014). Presently, only two partial mitochondrial genomes have been published for Kemp's ridley sea turtles, neither of which could sequence a distinct 117 bp region (Duchene et al., 2012).
Despite the many discrete haplotypes discovered in past studies, the samples taken from Kemp's ridleys in Texas and Mexico indicate there is one homogenous population in the GoM, and though there is infrequent nesting in other areas (Johnson et al., 1999;Marquez-M, 1994;Rafferty, Shaver, Frandsen, & Montello, 2019), individuals nesting outside of the historic nesting range likely originate from nesting beaches in the western Gulf. the genetic diversity within the species, but past studies are conflicting (Kichler, 1996;Stephens, 2003).
One method that can be used to determine whether there has been a bottleneck in the Kemp's ridley population is to determine which haplotypes are present within current individuals by analyzing the mitochondrial DNA, and then comparing observed haplotype frequencies to historical data. The control region is thought to be the most variable region within the mitochondrial genome, and targeting this region by using Sanger sequencing has traditionally been more cost effective than sequencing full mitochondrial genomes. However, with the advent of sequencing-by-synthesis, now commonly applied in what is known as next generation sequencing, the financial advantage of targeting short markers is quickly diminishing., In sea turtles, complete mitochondrial genomes have been recovered primarily by targeting overlapping fragments of the mitochondrial genome using various sets of primers and sequencing products by the Sanger method (i.e., Drosopoulou et al., 2012;Hernández-Fernández, Beltrán-Torres, & Mariño-Ramírez, 2017;Shamblin et al., 2012), and by using long range PCR and amplifying a few long, overlapping, regions followed by next generation sequencing (i.e., Duchene et al., 2012).
The need for amplification through PCR can be bypassed, and genomic DNA extractions can be sequenced directly by using sequencing-by-synthesis with the complete mitochondrial genome recovered through assembly of the resulting reads. This method is possible due to the high copy number of mitochondrial DNA contained within genomic DNA, and it has been applied successfully in a wide range of taxa including invertebrates, birds, mammals, amphibians, and reptiles (Cao, Wang, Ge, & Gong, 2019;Caparroz et al., 2018;Chen, 2018;Cho et al., 2018;Cooke, King, Johnson, Huang et al., 2014). We follow a similar procedure as these studies by preparing DNA libraries from genomic DNA extractions followed by next generation sequencing to obtain complete mitochondrial genomes of L. kempii.
Here, we describe the complete mitochondrial genomes for several individuals of Kemp's ridley and demonstrate a cost-effective and efficient method for obtaining complete mitochondrial genomes from sea turtles using next generation sequencing technology. We compare the genetic diversity and taxonomic resolution obtained from whole mitochondrial genomes to that obtained from the mitochondrial control region alone, by evaluating a sampling of Kemp's ridleys in South Padre Island, Texas. Furthermore, using the control region, we evaluate the genetic structure between the breeding stock in South Padre Island and that of deceased Kemp's ridleys re-   and Western GoM (n = 42). Samples collected on the Northern GoM coast (n = 11) were labeled as DWH samples due to uncertainty whether the sampled individuals were transients or residents of the area where they stranded. Samples collected from Kemp's ridleys in rehabilitation facilities (n = 2) were labeled as captive, due to their nonreleasable status ( Figure 2). DNA (200 ng) was obtained from all samples after standard extraction with Thermo Fisher Scientific's Purelink Genomic DNA extraction kit (model #K1820-01, Thermo Fisher Scientific), following the manufacture's protocol for mouse tissue. Samples were fully digested before extraction for 2-8 hr with Proteinase K. Once the DNA was obtained using the Genomic DNA extraction kit, the concentration of DNA was measured using a Life Technologies Qubit fluorometer (Life Technologies Inc). Gel electrophoresis in a 2% agarose gel stained with ethidium bromide ensured the extracted F I G U R E 1 Kemp's ridley (Lepidochelys kempii) sea turtle nesting on the south Texas coast. Photographed by Hilary R. Frandsen DNA of all samples was of high quality and high molecular weight.

| DNA extraction
Extracted DNA was stored at −20°C.

| Control region
Two sea turtle-specific primers for the control region sequence were used as follows: LCM15382 (5′ GCTTAACCCTAAAGCATTGG 3′) and  (Dutton et al., 2008). Gel electrophoresis was used to verify target size and single-band amplification. PCR products were purified using Sigma-Aldrich's GenElute PCR Clean-Up kit or Invitrogen's Purelink Quick PCR Purification Kit. Each extracted PCR product was sequenced in the forward and reverse direction using the LCM15382 (forward) and H950g (reverse) primers by Eurofins MWG Operon, LLC. A consensus sequence for the control region was created using the LCM15382 (forward) and H950g (reverse) primers with Qiagen CLC Genomics Workbench software.
During the alignment of the forward and reverse sequences for each sample, a manual check was conducted to ensure the quality of the chromatogram reading of nucleotides. When there was a conflict between forward and reverse sequences, the strand with the clearest chromatogram trace was given priority, and that nucleotide was assigned as the consensus nucleotide. For those samples that only had one readable strand, that reading was used as the consensus sequence, as long as the chromatogram trace was of excellent quality (no double peaks) and with a minimum Phred score of 20.

| Mitochondrial genome
The genomic DNA extraction of ten individuals was used to prepare an indexed library following standard procedures with the Nextera

| Phylogenetic and population analyses
Two separate datasets were analyzed, one only using complete mitochondrial genomes and the other only using the mitochondrial control region. All available sequences from GenBank were added to these two datasets (Table 1). Haplotypes were defined with DnaSP software (Rozas, 2009). Minimum-spanning haplotype networks were created using Population Analysis with Reticulate Trees (PopArt) software (Leigh & Bryant, 2015). Arlequin v3.5.1.2 (Excoffier & Lischer, 2010) was used to make pairwise fixation index (ΦST) comparisons among all sampling groups using default settings. The statistical significance of the fixation indices was assessed under the null hypothesis of panmixia by performing 10,000 permutations of the original dataset by random reallocation of individuals to each population.
Phylogenetic analyses of the haplotypes identified from the control region were performed with MEGA7 (Kumar, Stecher, & F I G U R E 2 Sources of Lepidochelys kempii samples: Nesting females from South Padre Island in the Western Gulf of Mexico, Texas; deceased turtles recovered from the Deep Water Horizon oil spill; and stranded juvenile turtles recovered from the East Coast of the United States Tamura, 2016) using maximum-likelihood (ML) methods with bootstrap values from 10,000 replicates. The Tamura 3-parameter model (Tamura, 1992) with uniform rates was selected by MEGA7 as the best fitting model of molecular evolution based on the Akaike information criterion (AIC). The tree was rooted using two Olive ridley genomes: GenBank accession numbers AM258984 and DQ486893.
The ten mitochondrial genomes were used in a partitioned maximum-likelihood phylogenetic analysis using PartitionFinder v1.1.1 (Lanfear, Calcott, Kainer, Mayer, & Stamatakis, 2014) and RAxML v8.0.0 (Stamatakis, 2014). Each gene, RNA, and control region were aligned separately using MUSCLE. The resulting alignments were then concatenated. Data blocks were defined by codon positions of the 12 protein-coding genes, the 2 RNAs, and the control region (Table 2). PartitionFinder divided the data into 8 partitions and selected General Time Reversible plus Gamma (GTR + G) as the best evolutionary model (Table 3). Within the RAxML program, 20 independent searches of 10,000 bootstrap replicates delivered the best maximum-likelihood (ML) tree. The tree was rooted using two Olive ridley genomes: GenBank accession numbers AM258984 and DQ486893.

| RE SULTS
The control region was sequenced for 113 samples, resulting in ten unique haplotypes within the dataset. Eight of these haplotypes    (Table 6).
The maximum number of nucleotide differences between any two sequences was 4 (Table 6). For all geographical regions combined, h = 10, Hd = 0.658 ± 0.033, π = 0.00133 ± 0.00013, and the average number of nucleotide differences between pairwise sequences was 1.013 (Tables 5 and 6).
The maximum-likelihood phylogenetic reconstruction based on the ten haplotypes of the control region does not resolve the TA B L E 2 Data block arrangement for partitioned phylogenetic analyses of 12 protein-coding genes, 2 RNAs, and control region in the Lepidochelys kempii mitochondrial genome

TA B L E 3 Partition scheme identified by PartitionFinder for Lepidochelys kempii mitochondrial genome data
relationship between these haplotypes, with most branches collapsed due to weak support ( Figure 5). Only Haplotypes 1, 5, and 10 are grouped in a strongly supported clade with Haplotype 1 basal to Haplotypes 5 and 10.
Ten mitochondrial genomes were successfully sequenced and compared to two partial genomes present on GenBank (Accession JX454981, JX454982) ( containing the control region ( Figure 6). Eight out of twelve protein-coding genes are separated by a tRNA sequence ( Figure 6). This is the same gene arrangement in the two partial genomes downloaded from GenBank.
The mitochondrial genome displayed the greatest number of haplotypes (h = 12) and the highest level of haplotypic diversity (Hd = 1) ( Table 7). Gene 16s was the longest gene, had the greatest number of haplotypes among all genes (h = 4), and the greatest number of variable sites (n = 31), but a lower haplotype diversity (Hd = 0.4545) compared to the control region (Hd = 0.5909) and COX3 (Hd = 0.5909) ( Table 7).
The ten mitochondrial genomes recovered match three previ- The partitioned maximum-likelihood phylogenetic reconstruction using complete mitochondrial genomes results in a better resolved tree than that based on the control region alone ( Figure 5).
GenBank JX454982 (control region Haplotype 2) forms the basal branch. This is followed by four unresolved branches (3 belong to control region Haplotype 2 and 1 to control region Haplotype 6).
These unresolved branches form a polytomy with a strongly supported clade of specimens belonging to control region Haplotype 1. This control region Haplotype 1 clade is made up of a polytomy of three unresolved branches, and two supported clades ( Figure 5).

| D ISCUSS I ON
All but one of the six haplotypes documented by Frey et al. (2014Frey et al. ( ), present in 2003Frey et al. ( and 2006 (Table 5).
This is consistent with the fact that the majority of the population nests in northern Mexico, with a growing number documented along the Texas coast . The significant Tajima's D and Fu Fs statistics indicate the Western Gulf population may be undergoing population expansion (Table 6). This is further supported by our evidence of additional haplotypes and higher haplotype diversity compared to earlier studies (Frey et al., 2014). Population analyses of the U.S. East Coast samples are not warranted because they were taken from juveniles and therefore cannot be considered a separate population as these individuals may have originated from nesting beaches in Mexico or Texas (Pritchard & Márquez, 1973;Putman, Mansfield, He, Shaver, & Verley, 2013;Putman, Shay, & Lohmann, 2010).
Gulf and the BP DWH oil spill samples are similar. Haplotype 2 is the dominant of the three haplotypes present in the DWH samples, while Additionally, when comparing the U.S. East Coast samples to the Western Gulf samples, the resulting fixation index is also close to zero. This demonstrates that these two groups are also genetically similar and cannot be considered separate populations.  (Table 7). Ribosomal RNA 16s contained the greatest number of haplotypes of all markers, but the greatest haplotype diversity was found in the control region (Table 7).
Gene COX3 had equivalent haplotype number and haplotype diversity values when compared to the control region, but within a longer sequence. When looking at the complete mitochondrial genome, it is also evident that some genes are more variable than others. Genes ATP6, ATP8, ND2, ND3, ND4L, ND6, and COX2 and the ribosomal RNA 12s TA B L E 6 Summary statistics of haplotypes found within the three geographical sampling regions, continued Targeting solely the control region limits the detection of variation between individuals. This is apparent from the phylogenetic reconstruction where the tree based on complete mitochondrial genomes shows greater bootstrap support and more resolved branches. Though patterns can be seen within the control region, this study indicates that full genomes convey a more robust analysis.
Targeting solely the control region is adequate for assigning individuals into haplotype groups; however, specimens still contain differences from each other within the rest of the mitochondrial genome.
Analysis of full mitochondrial genomes results in more haplotypes and provides greater resolution for phylogenetic reconstruction.
Though it must also be considered that the poor resolution in the phylogenetic reconstruction based solely in the control region is partly due to the multifurcation shown by the star-shaped haplotype network which cannot be represented well by bifurcation as required in phylogenetic reconstruction (Posada & Crandall, 2001  for her assistance in preparing and processing genomic DNA for next generation sequencing analyses.

CO N FLI C T O F I NTE R E S T
The authors declare they have no conflicts of interest. Sea Turtle, Inc. is a nonprofit organization focused on sea turtle rehabilitation, conservation, and public education and is not a competing interest.

AUTH O R CO NTR I B UTI O N S
DF conceptualized the study; DF, HRF, and JAG designed the methodology; DF, JAG, and HRF acquired funding; DF, JAG, and HRF provided resources and acquired the data; DF and HRF analyzed and interpreted the data; DF and HRF drafted the manuscript; and DF, HRF, and JAG revised and approved the manuscript to be submitted.

DATA AVA I L A B I L I T Y S TAT E M E N T
Mitochondrial genome and haplotype DNA sequences can be accessed online through GenBank.