An insight into molecular taxonomy of bufonids, microhylids, and dicroglossid frogs: First genetic records from Pakistan

Abstract The current study was focused on documentation of amphibian assemblage in North Punjab and Islamabad Capital Territory, Pakistan, by using mitochondrial gene sequences of 16S rRNA. Our study entailed 37% of the known amphibian species of the country. We provided a phylogenetic analysis based on 74 newly generated mitochondrial 16S rRNAs from nine species of genus Microlyla, Duttaphrynus, Allopaa, Nanorana, Sphaerotheca, Minervarya, Hoplobatrachus, and Euphlyctis. We employed the maximum‐likelihood inference and Bayesian analysis to assess the taxonomic status of the samples obtained from Pakistan, with respect to other congeneric species from surrounding regions. Our findings confirmed the taxonomic status of South Asian anuran species Duttaphrynus stomaticus, Duttaphrynus melanostictus, Microhyla nilphamariensis, Allopaa hazarensis, Nanorana vicina, Sphaerotheca maskeyi (synonym: S. pashchima), Minervarya pierrei, Hoplobatrachus tigerinus, and Euphlyctis kalasgramensis in Pakistan. We have reported new country records of genus Minervarya ( M. pierrei). Minervarya pierrei was previously misidentified as Fejervarya limnocharis, due to dearth of genetic information. We provided the first genetic records of our endemic species N. vicina. The results revealed the taxonomic placement of N. vicina with respect to its congeners and validated the taxonomic status of N. vicina from its type locality (Murree) for the first time. The findings of the present study also indicated the paraphyletic relationship of A.‐ hazarensis with Nanorana species. So, based on our phylogenetic inferences, morphological characters, and habitat preferences, validity of generic status of A. hazarensis is undecided. As our data were not enough to resolve this issue, we suggest sequencing of additional mitochondrial and nuclear genes in the future studies to get a better resolution. We recommend carrying out extensive surveys throughout the country for proper scientific documentation of amphibians of Pakistan. Many new species, some of them might be endemic to Pakistan, are expected to be discovered, and taxonomic status of other species would be resolved.


| INTRODUC TI ON
Identification of species by examining only morphological characters is difficult and may result in misidentifications (Stuart et al., 2006).
Modern amphibian taxonomy relies heavily on molecular taxonomy and phylogeny (Frost et al., 2006). In the recent past, integrated taxonomic approaches have been applied successfully to resolve the complications associated with the species identification; see Phuge et al. (2020). In amphibian's taxonomy, many new species are being described worldwide (Frost, 2019) and the discovery of new anuran species is an ongoing activity (Ohler et al., 2009). Despite these new findings and descriptions, the amphibian species in South and South-East Asia remain underestimated generally due to the presence of homoplasy in morphology of amphibians (Stuart et al., 2006). Boulenger (1890) provided a detailed description of amphibians of British India (Now Pakistan, India, Myanmar, and Sri Lanka). A total of 348 amphibians have been described so far from the eight countries of South Asia, with significant contribution from India and Sri Lanka (Molur, 2008;Pratihar et al., 2014). The territory of Pakistan, which is influenced by fauna from different geographic directions, is divided zoo geographically into the Palearctic and Oriental regions (Khan, 2006). Pakistan is one of the important territories in Eurasia in respect of past biodiversity dynamics (Jablonski et al., 2020). The amphibians of Pakistan are represented by a heterogeneous assemblage of 24 anuran species belonging to nine genera (Duttaphrynus, Scutiger, Microhyla, Uperodon, Euphlyctis, Fejervarya, Hoplobatrachus, Allopaa, and Sphaerotheca) distributed over four families Bufonidae, Microhylidae, Megophryidae, and Dicroglossidae (Khan, 2006).

| Sampling area
The selected sampling area was Rawalpindi District (North Punjab) and Islamabad Capital Territory of Pakistan (Figures 1 and 2). The Rawalpindi District (33.4095°N, 72.9933°E) is located in the northwest of Punjab Province and covers an area of 5,286 km 2 . The area is rocky and has mostly scrub vegetation. Administratively, the district has seven tehsils (Rawalpindi, Gujar Khan, Kallar Syedan, Kahuta, Murree, Kotli Sattian, and Taxila). The areas experience a humid subtropical climate with long and hot summers, a short monsoon period, and mild wet winters (Chaudhry & Rasul, 2004). The average temperature ranges from 2°C in January to 38.6°C in the June. Tehsil

| Molecular analysis
The anuran specimens were euthanized by using chloroform, and toe clips were removed and stored in 95% ethanol in sample tubes for genetic analysis. The voucher specimens were fixed and later preserved in 10% formalin solution. The preserved specimens were then deposited in the museum of Herpetology Laboratory,  (Palumbi, 1996), and for genus Nanorana and Allopaa, we used the primer pair 16SC, 16SD (Cannatella et al., 1998). The PCR protocol of previous publication (Palumbi, 1996) was followed with few modifications for this study. Two mitochondrial fragments were amplified in 25 µl volume reaction. The recipe for the master mix is 2 µl of genomic DNA template, 2.5 µl of 10× PCR buffer, 0.5 µl of dNTP, 0.125 µl of Taq DNA polymerase, 0.5µl of forward primer, 0.5µl of reverse primer, and 18.875 µl of distilled water. The annealing temperature of the primers was 50°C. The thermocycler settings were as follows: initial denaturation step with 4 min at 94°C, 40 cycles of denaturation 30 s at 94°C, annealing for 30 s at 50°C, and extension for 90 s at 72°C. Final extension at 72°C was conducted for 7 min. DNA amplification was confirmed by agarose gel electrophoresis. The resulting PCR products were then cleaned using the Promega DNA purification kit (Wizard ® SV Gel and PCR Clean-Up System). The concentration of DNA was checked through NanoDrop Spectrophotometer (Invitrogen). The resulting PCR products were then sequenced in both directions using the same primers. Sanger sequencing was performed at the Institute for Cellular and Molecular Biology Core Facility, University of Texas Austin, Texas, USA. The sequences generated from the present study were deposited in the GenBank, and accession numbers are provided in Appendix 1.

| Data analysis
For reading, editing, and making consensus of forward and reverse sequences, the program Geneious (ver. R 7.1.9) (Kearse et al., 2012) was used. The new 16S sequences obtained were blasted on NCBI Nucleotide Blast Tool to identify and collect reference sequences of the same and closely related species. Sequences with more than 95% percentage similarity were retrieved and included in the analysis in order to find a good match. To analyze the taxonomic placements of our samples/species, we also included the representative samples of other species and genera of geographically linked species distrib- Nucleotide sequences were aligned using MAFFT multiple sequence alignment program v. 7 with the option --auto that chooses the most appropriate algorithm for the data type (Katoh & Standley, 2013). The ambiguities, insertion, and deletion of single nucleotides from the sequences were manually edited using program Geneious (ver. R 7.1.9) (Kearse et al., 2012). Phylogenetic analysis of sequences was performed with maximum-likelihood analysis (Bootstrap value 100) on the CIPRES Science Gateway V. 3.3 (Miller et al., 2010) using the software IQ-TREE (Nguyen et al., 2015) with 1,000 ultrafast bootstraps approximation (Hoang et al., 2018). The model selection for each analysis tree was done as part of the run in IQ-TREE (Kalyaanamoorthy et al., 2017) using the option -TESTONLY. The best-fit model for Bufonidae family was TIM2+F+I+G4, for Dicroglossidae GTR+F+I+G4, and for Microhylidae TIM2e+I+G4. The consensus tree was calculated using SumTrees v.5.4.1 (Sukumaran & Holder, 2010). The tree with the highest maximum likelihood was selected, and the support from the bootstrap was mapped into that topology. The best value of maximum likelihood for Bufonidae was −3001.2908, for Dicroglossidae −9362.9580, and for Microhylidae −2110.5332.
To evaluate different strategy, an alignment that takes into account the secondary structure was made, with the option Q-INS-i of MAFFT online service: multiple sequence alignment, interactive sequence choice, and visualization (https://mafft.cbrc.jp/align ment/ serve r/, Katoh et al., 2019). Based on secondary structure alignment, Bayesian phylogenetic inference (BI) (posterior probability 1) was performed in MrBayes ver. 3.2.6, (Ronquist et al., 2011). We ran the BI tree in CIPRES using MrBayes on XSEDE. We used the  (Schwarz, 1978). We ran the BI analysis using a set parameters of two runs with eight chains with length of 40 million generations, sampling a tree every 1,000 generations. The convergence and effective sample sizes (ESSs) >200 of the runs were seen in TRACER v.1.6.0 (Rambaut et al., 2014). A burn-in was defined at 10%, by using SUMTREES v.4.2.0 (Sukumaran & Holder, 2010), and we discarded the burn-in and calculated a maximum clade credibility tree (MCCT).
The pairwise genetic distances between species groups were estimated. Within the group mean distance, between-group mean distance was calculated by using uncorrected p-distances in the software MEGA. 7.0 (Kumar et al., 2016).

Duttaphrynus)
We estimated phylogenies using the alternative alignments (primary and secondary structure), and found that the tree topologies are almost similar to each other. We conducted a maximum-likelihood and Bayesian analyses for taxonomic identification of bufonid toads

Microhyla)
The maximum-likelihood and Bayesian analyses of microhylid species were performed on data matrix comprised of 48 samples of 10 species including two out-groups (Uperodon systoma and  (Table S4).

The phylogenetic trees inferred from both maximum-likelihood and
Bayesian analyses were similar, and the tree topologies were well re-  Tables S5 and   S6). The uncorrected p-distance between N. vicina and A. hazarensis was 2.1% (Table S5). pashchima and S. breviceps, it was 6.4% (Tables S5 and S6).   Appendix 8). However, the uncorrected p-distance between two sister clades of E. kalasgramensis was 2.1% (Table S5). Furthermore, two samples of Iran shared the same clade with our samples with 0% genetic divergence (Table S6). The species complex of D. melanostictus also entails taxonomic revision. It exhibits a wide geographical range (Wogan et al., 2016).

| D ISCUSS I ON
The ancestral range for D. melanostictus is estimated to be the Myanmar-China border (Wogan et al., 2016). We validated the taxonomic status of D. melanostictus based on our maximum-likelihood and Bayesian analyses, which inferred that newly reported samples from the study area are closely related to the Indian samples with uncorrected p-distance within group of 0.6% (Table S2). The genetic divergence of 1.7% was observed between the two clades: (China, Vietnam) and (Pakistan, India) (Table S1). This genetic variation within species was previously reported by Khan (2001), who differentiated this species of D. melanostictus from South-East Asian congeners based on morphological parameters and ranked Pakistani samples as a subspecies named D. melanostictus hazarensis. Our genetic data are also in agreement with this existing variation within this species across its range. Another study by Mulcahy et al. (2018) also indicated this genetic variation but referred all the samples as D.
melanostictus until this species complex is revised.
Microhylid species are believed to be one of the most challenging taxonomical group of microhylid frogs due to their small size, conserved morphology, and widespread distribution of its members across Asia (Garg et al., 2018); Matsui et al., 2005Matsui et al., , 2011. Molecular data have doubled the number of recognized Microhyla species found in South Asia Howlader et al., 2015a;Khatiwada et al., 2017;Seshadri et al., 2016;Vineeth et al., 2018;Wijayathilaka et al., 2016) and delineation of already known taxa (Garg et al., 2018;Matsui, 2011;Matsui et al., 2005Matsui et al., , 2011Yuan et al., 2016), therefore elevating the importance of this region for Microhyla diversity.
We provided the first genetic records of N. vicina from Pakistan by sampling from its type locality (Murree). Nanorana vicina (Stolickza, 1872) is a least studied anuran species from Pakistan. After its initial reports from Pakistan by Dubois (1976)  Allopaa hazarensis (Dubois & Khan, 1979), which is endemic to Pakistan, was first placed in the supergroup of Paa liebigii (Dubois & Khan, 1979). Latterly, based on its unique combination of morphological characters, Ohler and Dubois (2006) proposed a separate genus for this species. In a recent study, Hofmann et al. (2021) conducted the first phylogenetic analysis based on genetic sampling of A. hazarensis from foothills of Kashmir Himalaya, which also includes its type locality (Dutta, District Mansehra). In this study, we included samples from same geographical range, that is, Murree (North Punjab), and our results showed genetic resemblance of A.
hazarensis with all congeners of Nanorana and recovered as paraphyletic with respect to all sampled Nanorana species (Figures 6 and 7; Appendices 5 and 6), with genetic distance of 2.1% between N. vicina and A. hazarensis (Table S5). Hofmann et al. (2021) also reported this paraphyletic relationship of Allopaa with respect to genus Nanorana.
The separate genus of Allopaa which was described by Ohler and Dubois (2006) on morphological basis has appeared as nested within genus Nanorana by genetic analysis. Allopaa hazarensis also share its morphological characters with N. vicina, except having grayish dorsum with a superimposed network of dark olive green color, with horny spinules on dorsal and lateral sides and well-developed male secondary sex characters (nuptial spines) (Dubois & Khan, 1979) (see Figure 3). Both A. hazarensis and N. vicina share their habitat (freshwater streams) at higher elevation (>1,000 m) (Ahmed et al., 2020).
So, based on our phylogenetic inferences, morphological characters, and habitat preferences, we doubt on the validity of generic status of A. hazarensis. As our data were not enough to resolve this taxonomic issue, we suggest sequencing of additional mitochondrial and nuclear genes in the future studies to get a better resolution. As a least studied genus, which is exclusive to Pakistan, with no other documented species till date, the genus is particularly important for Pakistan. However, to prevent taxonomic instability, we are hesitant to propose any taxonomic changes until further evidence is available.
All the sampled species in our dataset of genus Sphaerotheca (S. pluvialis, S. dobsonii, S. magadha, S. rolandae, S. breviceps, S. pashchima) recovered to have an independent taxonomic species rank albeit with low branch support (Figure 8; Appendix 7). Sphaerotheca breviceps, which is endemic to South Asia, was considered as a species complex (Dubois, 1983;Dutta, 1986), but Padhye et al. (2017) restricted its distribution range to the eastern coastal plains of  (Frost, 2019). Fejervarya Limnocharis was reported by earlier workers (such as by Akram et al., 2015;Khan, 2006;Pratihar et al., 2014;Rais et al., 2012) from Pakistan, merely on basis of morphological characters. In the present study, we did first ever phylogenetic analysis based on genetic sampling from Pakistan. Minervarya species from the North Punjab which was misidentified previously as F. limnocharis is actually M. pierrei with 7.9% uncorrected p-distance between groups (Table S5)  (between groups uncorrected p-distance ranged from 0.9% to 3%) (Table S5). These lineages correspond to the separate clades, clade 1 corresponds to southern India, which include E. mudigere, and second clade is the south Indian clade that we consider to be nominal E. cyanophlyctis because of its proximity to the type locality. Third clade constitutes samples of E. kalasgramensis from Bangladesh, India (Assam), and Pakistan and the last clade with newly generated samples of Pakistan (Northern Punjab) and Iran (Figure 9; Appendix 8).
Two samples (KF992800 and KF992815) from Iran in study of Khajeh et al. (2014) are genetically identical to our samples (0% uncorrected p-distance with our samples) (Table S6) from Pakistan, India, and Bangladesh. The genetic divergence of 2.1% (uncorrected p-distance) was observed between these two sister clades (Table S5). However, we suggest detailed taxonomic study exclusively for this genus by including more mitochondrial and nuclear genes to clarify the presence of possible cryptic species under the taxonomic rank of Euphlyctis.

| CON CLUS IONS
This study was aimed -to document the species taxonomic status vicina from its type locality (Murree), which will provide a baseline data for this understudied species.

| Recommendations
Systematic surveys have never been conducted in Pakistan to document amphibian diversity of the country. A deep review is still necessary to resolve the taxonomy morphologically undistinguishable putative species complexes in the future studies.

DATA AVA I L A B I L I T Y S TAT E M E N T
The sequence data generated in this study are readily available

A PPE N D I X 3
Bayesian inference analysis based on the 16S rRNA, of genus Duttaphrynus, Family Bufonidae. Posterior probability values are indicated near each node. Sequences generated in the present study are given in red.

A PPE N D I X 4
Bayesian inference analysis based on the 16S rRNA, of genus Microhyla, Family Microhylidae. Posterior probability values are indicated near each node. Sequences generated in the present study are given in red.

A PPE N D I X 5
Bayesian inference analysis based on the 16S rRNA, of genus Nanorana and Allopaa. Posterior probability values are indicated near each node.
Sequences generated in the present study are given in red.

A PPE N D I X 6
Bayesian inference analysis based on the 16S rRNA, of genus Nanorana. Posterior probability values are indicated near each node. Sequences generated in the present study are given in red.

A PPE N D I X 7
Bayesian inference analysis based on the 16S rRNA, of genus Sphaerotheca, Fejervarya, and Minervarya. Posterior probability values are indicated near each node. Sequences generated in the present study are given in red.

A PPE N D I X 8
Bayesian inference analysis based on the 16S rRNA, of genus Hoplobatrachus and Euphlyctis. Posterior probability values are indicated near each node. Sequences generated in the present study are given in red.