The first complete mitochondrial genome of the Indian Tent Turtle, Pangshura tentoria (Testudines: Geoemydidae): Characterization and comparative analysis

Abstract The characterization of a complete mitogenome is widely used in genomics studies for systematics and evolutionary research. However, the sequences and structural motifs contained within the mitogenome of Testudines taxa have rarely been examined. The present study decodes the first complete mitochondrial genome of the Indian Tent Turtle, Pangshura tentoria (16,657 bp) by using next‐generation sequencing. This denovo assembly encodes 37 genes: 13 protein‐coding genes (PCGs), 22 transfer RNA (tRNAs), two ribosomal RNA, and one control region (CR). Most of the genes were encoded on majority strand, except for one PCG (NADH dehydrogenase subunit 6) and eight tRNAs. Most of the PCGs were started with an ATG initiation codon, except for Cytochrome oxidase subunit 1 with “GTG” and NADH dehydrogenase subunit 5 with “ATA.” The termination codons, “TAA” and “AGA” were observed in two subunits of NADH dehydrogenase gene. The relative synonymous codon usage analysis revealed the maximum abundance of alanine, isoleucine, leucine, and threonine. The nonsynonymous/synonymous ratios were <1 in all PCGs, which indicates strong negative selection among all Geoemydid species. The study also found the typical cloverleaf secondary structure in most of the tRNA genes, except for serine with the lack of the conventional DHU arm. The comparative study of Geoemydid mitogenomes revealed the occurrence of tandem repeats was frequent in the 3′ end of CR. Further, two copies of a unique tandem repeat “TTCTCTTT” were identified in P. tentoria. The Bayesian and maximum‐likelihood phylogenetic trees using concatenation of 13 PCGs revealed the close relationships of P. tentoria with Batagur trivittata in the studied dataset. All the Geoemydid species showed distinct clustering with high bootstrap support congruent with previous evolutionary hypotheses. We suggest that the generations of more mitogenomes of Geoemydid species are required, to improve our understanding of their in‐depth phylogenetic and evolutionary relationships.


| Ethics statement
Prior permission was acquired from the wildlife authority,

| Sample collection, and DNA extraction
The fieldwork was conducted in the northeastern region of India, and a P. tentoria sample was collected from Arunachal Pradesh state (latitude 27°30′N and longitude 95°59′E; Figure S2). The blood sample was collected aseptically from the limbs by using a micro-syringe and subsequently stored in EDTA containing vial at 4°C. The specimen was released back in the same environment after collecting the biological sample. About 10 µl of blood sample was centrifuged at 700 × g for 5 min at 4°C in 1 ml buffer (0.32 M Sucrose, 1 mM EDTA, 10 mM TrisHCl) to remove nuclei and cell debris. The supernatant was collected in 1.5 ml eppendorf tubes and centrifuged at 12,000 × g for 10 min at 4°C to precipitate the mitochondria. The mitochondrial pellet was resuspended in 200 µl of buffer (50 mM TrisHCl, 25 mM of EDTA, 150 mM NaCl), with the addition of 20 µl of proteinase K (20 mg/ml) followed by incubation at 56°C for 1 hr.
Lastly, the mitochondrial DNA was extracted by Qiagen DNeasy Blood & Tissue Kit (QIAGEN Inc.). The DNA quality was checked in 1% agarose gel electrophoresis, and the concentration of mitochondrial DNA was quantified by NANODROP 2000 spectrophotometer (Thermo Scientific).

| Mitogenome sequencing, assembly, and annotation
Complete mitochondrial genome sequencing and denovo assembly was carried out at Genotypic Technology Pvt. Ltd. Bangalore, India (http://www.genot ypic.co.in/). First, 200 ng of DNA was used in Illumina TruSeq Nano DNA HT library preparation kit for library assembly (Illumina, Inc). The purified A-tailed fragments were ligated with the sequencing indexed adapters after the fragmentation of mitochondrial DNA by ultrasonication (Covaris M220, Covaris Inc.).
Then, fragments of 450 bp were selected using sample purification beads and amplified by polymerase chain reaction (PCR) to enrich it. The amplified PCR library was analyzed using a Bioanalyzer 2200 (Agilent Technologies, Inc.) with high sensitivity DNA chips. After obtaining the required concentration (632.77 pg/µl) and mean peak size (466 bp) opted in NEXTflex Rapid DNA protocol (BIOO Scientific), total >4 million raw reads were generated through Illumina NextSeq 500 (150 × 2 chemistry; Illumina Inc). The raw reads were processed using the cutadapt tool (http://code.google. com/p/cutad apt/) for adapters and low-quality base trimming with a cutoff of Phred quality scores of Q20. Total sequencing depth was >71,000×. The high-quality reads were down sampled to 2 million reads using Seqtk (https ://github.com/lh3/seqtk ) and down sampled high-quality reads were denovo assembled using SPAdes-3.7.1 using default parameters (Bankevich et al., 2012). The generated sequence annotation was also checked in MITOS online server (http://mitos. bioinf.uni-leipz ig.de). The DNA sequences of PCGs were initially translated into the putative amino acid sequences on the basis of the genetic code of vertebrate mitochondrial genome. The mitogenome (accession no. MH795989) was submitted to the GenBank database through the Sequin submission tool ( Figure S3).
Based on a homology search in the Refseq database (https ://www. ncbi.nlm.nih.gov/refse q/), 31 Geoemydidae species mitogenomes were downloaded from GenBank and incorporated in the dataset for comparative analysis (Table S1). The genome size and comparative analysis of nucleotide composition were calculated using MEGA6.0 (Tamura, Stecher, Peterson, Filipski, & Kumar, 2013). The direction and arrangements of each gene were also checked through MITOS online server. The overlapping regions and intergenic spacers between genes were counted manually in Microsoft Excel. The start and stop codons of PCGs were checked through the Open Reading Frame Finder (https ://www.ncbi.nlm.nih.gov/orffi nder/) web tool. The comparative analysis of relative synonymous codon usage (RSCU), relative abundance of amino acids, and codons distribution were calculated using MEGA6.0. The pairwise test of the synonymous (Ks) and nonsynonymous (Ka) substitutions were calculated between Pangshura and other Geoemydids using DnaSPv5.0 (Librado & Rozas, 2009). The tRNA genes were verified in MITOS online server, tRNAscan-SE Search Server 2.0 (http://lowel ab.ucsc. edu/tRNAs can-SE/) and ARWEN 1.2 with the default settings (Laslett & Canbäck, 2008;Lowe & Chan, 2016). The base composition of all stems (DHU, acceptor, TѱC, anticodon) were examined manually to distinguish the Watson-Crick, wobble, and mismatched base pairing. The tandem repeats in the CR were predicted by the online Tandem Repeats Finder web tool (https ://tandem.bu.edu/trf/ trf.html; Benson, 1999). The base composition skew was calculated

| Phylogenetic analysis
To assess the phylogenetic relationship, the 13 PCGs of 37 mitogenomes (32 Geoemydidae species and five species from other taxonomic lineages) were aligned individually by codons using MAFFT algorithm in TranslatorX with L-INS-i strategy with GBlocks parameters and default settings (Abascal, Zardoya, & Telford, 2010). The database sequence of Chelus fimbriata (accession no. HQ172156) under family Chelidae (suborder: Pleurodira) was used as an outgroup in both ML and BA phylogenetic analysis. The dataset of all PCGs was concatenated (10,647 bp) using SequenceMatrix v1.7.84537 (Vaidya, Lohman, & Meier, 2010). The aligned dataset was further submitted to the web service, TreeBASE version 2 (Piel et al., 2009) and made publicly available (http://purl.org/phylo/ treeb ase/phylo ws/study/ TB2:S24607). The model test and phylogenetic analysis were performed at the CIPRES Science Gateway V. 3.3 (Miller et al., 2015). Six models were estimated and tested separately through PartitionFinder 2 (Lanfear, Frandsen, Wright, Senfeld, & Calcott, 2016; Table S2). The maximum-likelihood (ML) tree was constructed using IQ-Tree web server (Trifinopoulos, Nguyen, von Haeseler, & Minh, 2016) with the bootstrap support for each branch nodes were fixed with 1,000 replicates. The Bayesian analysis (BA) was performed through Mr. Bayes 3.1.2 (Ronquist & Huelsenbeck, 2003). The metropolis-coupled Markov Chain Monte Carlo (MCMC) was run for 100,000,000 generations with sampling at every 100th generation and 25% of samples were discarded as burn-in. Both ML and BA tree were further processed in iTOL v4. Interactive Tree of Life online tool for better representation (Letunic & Bork, 2007).

| Mitogenome structure and organization
In this study, the complete mitogenome (16,657 bp) of Indian Tent Turtle, P. tentoria was determined (GenBank accession no. MH795989). The mitogenome was encoded by 37 genes, including 13 PCGs, 22 tRNAs, two rRNAs, and a major noncoding CR.
Among these, 28 genes (12 PCGs, 14 tRNAs, and two rRNAs) were located on the majority strand and the remaining genes (NADH dehydrogenase subunit 6 and eight tRNAs) were located on the minority strand (Table 1, Figure 1). In other Geoemydid species, the locations of 37 genes are similar to P. tentoria in both majority and minority strands (Table S3). The study depicted the gene arrangements of P. tentoria were the same as in the typical vertebrate gene arrangement (Anderson et al., 1982

| Overlapping and intergenic spacer regions
Six overlapping regions with a total length of 12 bp were identified in P. tentoria mitogenome. These regions varied in length from 1 to 4 bp with the longest overlapping region presented between NADH dehydrogenase subunit 4 L (nad4l) and NADH dehydrogenase subunit 4 (nad4) as well as in between ATP synthase F0 subunit 8 (atp8) and ATP synthase F0 subunit 6 (atp6). In other Geoemydid species, the number of overlapping regions varied from four to six with a length variation 18 bp (C. dentata) to 94 bp (M. leprosa) with the longest overlapping region (67 bp) located between tRNA-Proline (trnP) and CR of M. leprosa. The intergenic spacers in P. tentoria mitogenome were spread over 19 regions and ranged from 1 to 27 bp with a total length of 115 bp. The longest spacer (27 bp) was observed between tRNA-Asparagine (trnN) and tRNA-Cysteine (trnC; Table 1). In other Geoemydid species, the longest intergenic spacer of 29 bp was present between tRNA-Asparagine (trnN) and tRNA-Cysteine (trnC) of N. platynota (Table S4).

| Protein-coding genes
The total length of PCGs was 11,295 bp in P. tentoria, which represents 67.8% of complete mitogenome. The nucleotide composition, AT-skew and GC-skew of PCGs in comparable Geoemydid species, is outlined in by Cytochrome oxidase subunit 3 (cox3), nad5, cytb, and nad1. The incomplete termination codon "T" was also observed in nad2, cox3, nad6, and cytb (Table S5).

| Relative synonymous codon usage
The RSCU analysis revealed a maximum abundance of alanine, isoleucine, leucine, and threonine in the PCGs of P. tentoria, whereas Arginine, Aspartic Acid, Cysteine, and Lysine were less abundant ( Figure S4). In other Geoemydid species, maximum abundance of alanine, Asparagine, isoleucine, leucine, serine, and threonine was  Figure S6). Note: The A + T biases of whole mitogenome, protein-coding genes, tRNA, rRNA, and control regions were calculated by AT-skew = (A-T)/(A + T) and GC-skew = (G-C)/(G + C), respectively.
TA B L E 2 (Continued)

| Synonymous and nonsynonymous substitutions
Darwinian selection plays an important role behind species divergence (Mikkelsen et al., 2005). The utility of mitogenomes for detecting positive selection that acts on PCGs can shed light on natural selection which may affect protein function (Bloom, Labthavikul, Otey, & Arnold, 2006;Hirsh & Fraser, 2001). These pairwise tests of the Ks and Ka substitutions were evidence of the adaptive evolution in vertebrates and other species (Montoya-Burgos, 2011;Yang & Nielsen, 2000). It was stated that, the Ka/Ks > 1 evidenced for positive selection, Ka/Ks = 1 for neutrality, and Ka/Ks < 1 for negative selection . ) as compared to other species pairs, implying a closer phylogenetic relationship between these two species. The Ka/Ks ratio of all the PCGs follows the order: cox1 < cox3 < cox2 < cytb < atp6 < n ad3 < nad5 < atp8 < nad4 < nad4l < nad2 < nad1 < nad6 ( Figure S7).
Thus, comparative analysis of Ka/Ks in Geoemydid species mitogenomes will help to understand the natural selection and evolution of species.

| Transfer RNAs and ribosomal RNAs
The wobble base pairing is a unique characteristic of RNA secondary structure and often replaces the GC or AT base pairs due to the thermodynamic stability (Yang & Bielawski, 2000). RNA-binding proteins bind to G-U sites and differ from Watson-Crick base pairs (Crick, 1966). Hence, the characteristics of tRNAs secondary structures are essential for understanding the functional role of the mitogenomes (Varani & McClain, 2000). The total length of 22 tRNAs of P. tentoria  Table 2).
Among all 22 tRNA genes, 14 were on majority strand and remaining eight (trnQ, trnA, trnN, trnC, trnY, trnS2, trnE, and trnP) on the minority strand (Table S3). The anticodons of all tRNAs genes were similar in all Geoemydid species including P. tentoria (Table S6). The tRNAs were folded into classic clover-leaf secondary structures, except for trnS1, which lacked the conventional DHU stem. It has been evidenced that, the unique arrangements "WANCY" in vertebrates, have an important role in the replacement function of the minority strand in mitogenomes (Satoh, Miya, Mabuchi, & Nishida, 2016).
The similar tRNAs arrangements were also observed in P. tentoria mitogenome and other Geoemydid species. In the tRNA secondary structures of P. tentoria, the Watson-Crick base pairing were found in most of the positions ( Figure S8). The highest changes of base pairing were observed in trnQ, while no changes were observed in trnR and trnT. Further, the wobble base pairing was observed in 11 tRNAs: in the acceptor stem of trnA, trnQ, trnN, trnC, trnY, trnE, trnP; in the TѱC stem of trnQ, trnN, trnC, trnG; in the anticodon stem of trnL2,trnQ,trnN,trnS2,trnG,trnL1,trnE,trnP;and in the DHU stem of trnA,trnQ,trnN,trnY,trnS2,trnG,trnP (Figure S8
These patterns of conserved sequences alter within different vertebrate groups, including turtles (Ruokonen & Kvist, 2002;Wang, Zhou, & Nie, 2011). The CR is also known for the initiation of replication in vertebrates, including Geoemydid species, and is located between trnP and trnF with a varying size (Bing, Fei, Yi, & Qing-Wei, 2006;Zheng et al., 2013). The length of the CR of P. tentoria was 949 bp with 66.06% A + T composition (  with two copy numbers was observed in P. tentoria (Figure 2). The numbers of tandem repeats are higher at the 3′ end of the CR in most of the studied Geoemydid species (Figure 2 Overall, the CR of Geoemydid species showed a specific sequence and structural feature, which was species-specific and can be used as a molecular marker.

| Phylogeny of Geoemydid mitogenomes
Both mitochondrial and nuclear genes have been widely used for effective species identification and delimitation in Testudines (Fritz et al., 2008;Ihlow et al., 2016). However, to better understand evolutionary relationships and phylogeny within Testudines, datasets representing more taxa and loci are needed (Le et al., 2006). Comparative mitogenomic data have demonstrated utility in elucidating the phylogenetic relationships of turtles Li et al., 2017). The genus Pangshura was erected from Batagur and elevated as a distinct genus through morphological characteristics (Günther, 1864;Moll, 1986). Indeed, molecular studies with limited sampling supported Pangshura as a discrete monophyletic genus with four species (Spinks, Shaffer, Iverson, & McCord, 2004 tween Pangshura and other Geoemydid species. In addition, based on the jaw morphology, the family Geoemydidae was divided into two subfamilies Geoemydinae and Batagurinae (Gaffney & Meylan, 1988). Nevertheless, a single complete mitochondrial genome of Batagurinae taxa (B. trivittata) is available so far and the present study contributes the denovo assembly of P. tentoria mitogenome in the global database. We propose more taxon sampling of Batagurinae from different geographical locations, in the expectation that their mitogenomes will be useful to reconcile indepth phylogeny and evolutionary relationship.

ACK N OWLED G M ENTS
We The funders had no role in study design, data collection and analysis or preparation of the manuscript.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no competing interests.

DATA AVA I L A B I L I T Y S TAT E M E N T
The following information was supplied regarding the availability of