Molecular characterization of marine and coastal fishes of Bangladesh through DNA barcodes

Abstract This study describes the molecular characterization of marine and coastal fishes of Bangladesh based on the mitochondrial cytochrome c oxidase subunit I (COI) gene as a marker. A total of 376 mitochondrial COI barcode sequences were obtained from 185 species belonging to 146 genera, 74 families, 21 orders, and two classes of fishes. The mean length of the sequences was 652 base pairs. In Elasmobranchii (Sharks and rays), the average Kimura two parameter (K2P) distances within species, genera, families, and orders were 1.20%, 6.07%, 11.08%, and 14.68%, respectively, and for Actinopterygii, the average K2P distances within species, genera, families, and orders were 0.40%, 6.36%, 14.10%, and 24.07%, respectively. The mean interspecies distance was 16‐fold higher than the mean intraspecies distance. The K2P neighbor‐joining (NJ) trees based on the sequences generally clustered species in accordance with their taxonomic position. A total of 21 species were newly recorded in Bangladesh. High efficiency and fidelity in species identification and discrimination were demonstrated in the present study by DNA barcoding, and we conclude that COI sequencing can be used as an authentic identification marker for Bangladesh marine fish species.

It is evident from the published articles, books, and review papers on Bangladeshi marine fishes (Ahmed et al., 2019; Hussain, 1971; IUCN, 2000; Rahman et al., 2009; Shafi & Quddus, 1982) that the ichthyofaunal diversity statistics are incomplete. Moreover, to date, no molecular taxonomic study has been undertaken on the marine and coastal ichthyofaunal diversity of Bangladesh.
The accurate identification of fish species is pivotal for protecting the extant ichthyofaunal biodiversity and for performing regular assessments of local fish faunas for conservation planning (Ahmed et al., 2019). Currently, partial cytochrome c oxidase subunit I (COI) sequences (DNA barcodes) are applied extensively as a complement to traditional morpho-taxonomy for standardized and regular species identification (Chin et al., 2016; Filonzi et al., 2010; Hebert et al., 2003; Shehata et al., 2019). The marked divergence and lack of overlap between intraspecific and interspecific genetic distances is the primary reason for the selection of COI as the standard barcode gene (Hebert et al., 2003). More importantly, COI evolution is sufficiently rapid to allow the discrimination of very closely related species in most groups, as well as taxonomically significant intraspecific variation associated with geographic structure (Bucklin et al., 2011).
A molecular approach becomes a necessary tool when morphologically similar groups or damaged samples make identification difficult, which is a challenge even when experts are available. In Bangladesh, a highly overpopulated country, anthropogenic activities, overfishing, habitat destruction, and natural disasters have had significant impacts on the biodiversity and structure of the fish community. Unfortunately, the marine ichthyofauna of Bangladesh remains unexplored due to the lack of taxonomists. Hence, adopting an authentic and quick identification method is essential to assist fishery managers, scientists, and policymakers in the sustainable management of these invaluable marine resources. This study aims to build a DNA-based barcode library of the morphologically identified marine and coastal fish species of Bangladesh using partial COI gene sequences.

| Study area and specimen collection
Fish samples were collected from marine and coastal habitats, fish landing centers, fish markets, or local fishermen from July 2015 to June 2019 (Figure 1). Most of the specimens were collected from the Cox's Bazar and Patuakhali regions. At least three specimens were collected for each species; for rare species, only a single specimen was analyzed. Personal fishing was also conducted to collect some rare and noncommercial fish species whenever necessary. Digital photographs of all the fishes were taken immediately, and taxonomic identification of specimens followed previous reports (Talwar & Jhingran, 1991; Carpenter & De Angelis, 2002; Carpenter & Niem, 2002; Last et al., 2016; Nakabo, 2002; Rahman et al., 2009; Siddiqui et al., 2007). Immediately after the specimens were collected, tissue samples were excised and stored in 90% ethanol. Voucher specimens were fixed with 10% formalin and then transferred to 70% ethanol solution for preservation. Voucher specimens were transported to Dhaka and deposited in the Dhaka University Zoology Museum (DUZM).
(Kimura, 1980). To estimate the K2P distances of species represented by a single sequence, we used at least 2-3 sequences of the same species available in GenBank that were submitted from Bangladesh or neighboring countries. Using this distance matrix, a neighbor-joining (NJ) algorithm was implemented to generate the phylogenetic tree topology in MEGA X with 1,000 bootstrap replications (Felsenstein, 1985; Saitou & Nei, 1987). Basic statistical analyses were done in Excel 2013, and hypothesis testing (e.g., t test and F test) was done in RStudio 1.2.5001.
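For readers unfamiliar with the K2P model (Kimura, 1980), the distance between two aligned sequences can be sketched as follows. This is a minimal illustration and not the MEGA X implementation used in the study; the function name `k2p_distance` and the choice to skip gaps and ambiguous bases pairwise are assumptions made here. With P the proportion of transitional differences and Q the proportion of transversional differences, the distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)].

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Assumption (not from the source): sites where either sequence has a
    gap or an ambiguous base are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if x == y:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

The function returns a proportion, so it would be multiplied by 100 to compare with the percentage values reported in this study.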

| General inference
A total of 748 tissue samples were collected from the Bangladesh coast, among which 376 COI sequences were obtained (  (Zhang & Hewitt, 1996).

| Elasmobranchii (Sharks and rays)
A total of 52 samples were sequenced, belonging to five orders, eight families, 13 genera, and 21 species (Table 1). The K2P genetic distances within each taxonomic level are summarized in Table 2. The average genetic distances within species, genus, family, and order were 1.20 ± 0.0007%, 6.07 ± 0.014%, 11.08 ± 0.018%, and 14.68 ± 0.04%, respectively. The NJ tree clearly distinguished all the species, and the species belonging to the eight families were represented by eight distinct clades (Figure 2).
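The text does not spell out how pairwise distances are pooled into the taxonomic levels of Table 2. The sketch below assumes a convention common in fish barcoding studies, in which each pair of sequences is assigned to the lowest rank the two specimens share (the same species, otherwise the same genus, and so on). The record layout, the function name `mean_distance_by_level`, and the `dist` callback are hypothetical and serve only to illustrate the idea.

```python
from itertools import combinations
from statistics import mean


def mean_distance_by_level(records, dist):
    """Average pairwise distance at each taxonomic level.

    records: list of (seq_id, species, genus, family, order) tuples.
    dist:    function (seq_id_a, seq_id_b) -> pairwise distance.

    Assumption: a pair counts toward the lowest rank that both
    specimens share, and pairs from different orders are ignored.
    """
    pooled = {"species": [], "genus": [], "family": [], "order": []}
    for a, b in combinations(records, 2):
        d = dist(a[0], b[0])
        if a[1] == b[1]:
            pooled["species"].append(d)   # conspecific pair
        elif a[2] == b[2]:
            pooled["genus"].append(d)     # congeneric, different species
        elif a[3] == b[3]:
            pooled["family"].append(d)    # confamilial, different genera
        elif a[4] == b[4]:
            pooled["order"].append(d)     # same order, different families
    # Report only the levels that received at least one comparison.
    return {level: mean(vals) for level, vals in pooled.items() if vals}
```

Under this convention the "within genus" value is really a mean distance between congeneric species, which is why it can be read as an interspecific distance.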

| Actinopterygii (Ray-finned fishes)
A total of 324 sequences were generated, belonging to 16 orders, 66 families, 133 genera, and 164 species (Table 1). Among the 164 species of ray-finned fishes studied, two are listed as endangered (EN), two as vulnerable (VU), four as near threatened (NT), 108 as least concern (LC), 12 as data deficient (DD), and the remaining 36 are not evaluated (NE). We assume that some of these commercially important species are being indiscriminately overexploited and are threatened with extinction in this region. The country therefore needs to assess the national red list status of its marine fishes in order to enforce conservation and management strategies.
The sequence analysis revealed average nucleotide frequencies of A: 23.9%, T: 29.6%, G: 18.3%, and C: 28.2%. The base composition analysis for the COI sequences showed that the average T content was the highest and the average G content was the lowest; the AT content (53.50%) was higher than the GC content (46.50%). The GC contents at the first, second, and third codon positions were 54.62%, 43.66%, and 38.04%, respectively, following the pattern 1st > 2nd > 3rd (p-value < 0.005). The K2P genetic distances within each taxonomic level are summarized in Table 3. The average genetic distances within species, genus, family, and order were 0.40 ± 0.002%, 6.36 ± 0.008%, 14.10 ± 0.01%, and 24.07 ± 0.02%, respectively (Table 3). The NJ tree of all generated sequences, covering 164 species, is provided in Figure 3. Most of the specimens of the same species clustered together, reflecting the prior taxonomic assignment based on morphology. No taxonomic deviation was detected at the species level, indicating that the majority of the examined species could be authenticated by the barcode approach.
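The base composition and codon-position GC figures above can be reproduced from aligned barcode sequences with a few lines of code. The sketch below is illustrative only; the function names and the `frame` parameter are assumptions introduced here, and the reading frame of the barcode fragment would need to be set so that position 0 is a first codon position.

```python
from collections import Counter


def base_composition(seqs):
    """Percentage of A, C, G, and T pooled over all sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(b for b in s.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def gc_by_codon_position(seqs, frame=0):
    """Percentage GC at the 1st, 2nd, and 3rd codon positions.

    frame: hypothetical offset (0, 1, or 2) of the first complete codon.
    """
    gc = [0, 0, 0]
    n = [0, 0, 0]
    for s in seqs:
        for i, b in enumerate(s.upper()[frame:]):
            if b not in "ACGT":
                continue  # ignore gaps and ambiguity codes
            pos = i % 3
            n[pos] += 1
            if b in "GC":
                gc[pos] += 1
    return [100.0 * g / m if m else float("nan") for g, m in zip(gc, n)]


# Toy example with three codons (GCA, GCT, GCA), not data from this study.
comp = base_composition(["GCAGCTGCA"])
at_content = comp["A"] + comp["T"]
gc_content = comp["G"] + comp["C"]
gc_positions = gc_by_codon_position(["GCAGCTGCA"])
```

As a consistency check on the reported values, A + T = 23.9% + 29.6% = 53.5% and G + C = 18.3% + 28.2% = 46.5%, which matches the AT and GC contents stated in the text.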

| Order Gobiiformes
Gobiiformes has recently been segregated from the Perciformes as a new order (Nelson et al., 2016). The average base composition was T: 30.1%, C: 27.4%, A: 24.4%, and G: 18.1% for the 16 species of Gobiiformes in the two families. The AT content (54.50%) was higher than the GC content (45.50%). The average GC contents at the 1st, 2nd, and 3rd codon positions were 55.2%, 44.8%, and 37.7%, respectively, following the pattern 1st > 2nd > 3rd. The K2P distances of the COI sequences within species and within families were 0.59% and 13.22%, respectively.

| Order Siluriformes
A total of 14 samples were sequenced, belonging to four families, six genera, and six species. The overall mean nucleotide base frequencies observed for these sequences were T: 29.30%, C: 28.10%, A: 24.80%, and G: 17.70%. The AT content (54.10%) was higher than the GC content (45.90%). The GC contents at the first, second, and third codon positions were 56.70%, 42.70%, and 38.10%, respectively. The K2P distances of the COI sequences within species and between species were 0.14% and 26.62%, respectively.

| Order Pleuronectiformes
Fourteen samples were sequenced, belonging to three families, five genera, and six species. The overall mean nucleotide base frequencies observed for these sequences were T: 30.80%, C: 26.70%, A: 24.10%, and G: 18.30%. The AT content (54.90%) was higher than the GC content (45.10%). The GC contents at the first, second, and third codon positions were 55.50%, 42.10%, and 37.60%, respectively. The K2P distances of the COI sequences within species and within families were 0.36% and 17.65%, respectively.

| Order Beloniformes
Fifteen samples were sequenced, belonging to four families, six genera, and seven species. The overall mean nucleotide base frequencies observed for these sequences were T: 32.40%, C: 25.90%, A: 24.90%, and G: 16.80%. The AT content (57.30%) was higher than the GC content (42.70%). The GC contents at the first, second, and third codon positions were 55.10%, 42.60%, and 30.60%, respectively. The K2P distances of the COI sequences within species, genera, and families were 0.24%, 4.77%, and 9.67%, respectively.

| DISCUSSION
DNA barcoding has been adopted as a global bio-scanner to provide an efficient molecular technique for species-specific identification using the partial sequence of the mitochondrial COI gene. It is evident from more than a decade of studies (Chang et al., 2017; Hubert et al., 2008; Lakra et al., 2011; Thu et al., 2019; Ward et al., 2005; Zhang, 2011)  was larger than 650 bp, the limit typically observed for nuclear DNA sequences originating from mtDNA (NUMTs; Gunbin et al., 2017).
All of these species were differentiable based on the individual COI barcodes. Hence, this study has strongly validated the efficiency of COI barcodes for identifying fish species.
Within the Elasmobranchii, a total of 12 ray and nine shark species were confirmed through barcoding. The overall AT and GC contents were 58.60% and 41.40%, respectively. However, the mean GC content of the 11 barcoded ray species was higher than that of the 8 shark species (43.67% vs. 38.59%). This was largely due to GC variation at the 3rd codon position (33.59% vs. 20.38%).
In this study, the COI barcode sequences for 164 ray-finned fish species were successfully amplified (Table 1; Figure 3). At least 21 new records have been confirmed in this study based on morphometric and molecular approaches (Table 1). The base composition analysis of the COI sequences revealed the AT content (53.50%) to be higher than the GC content (46.50%), similar to the pattern observed in Australian (Ward et al., 2005), Canadian (Steinke et al., 2009), and Cuban fish species (Lara et al., 2010 Figure 4).
Consistent with previously published fish barcoding data, pairwise genetic distance values increased at higher taxonomic levels. This increase in genetic distance through the higher taxonomic levels supports the significant change in genetic divergence at the species boundaries (Hubert et al., 2008; Lakra et al., 2011).
In this study, the average within-species K2P distance was 0.40%, compared with 6.36% within genera. The mean interspecific distance was found to be 16-fold higher than the mean intraspecific distance. A difference of more than 13.9-fold has been observed in marine fishes commonly encountered in the Canadian Atlantic (Steinke et al., 2009), India (Lakra et al., 2011), and Australia (Ward et al., 2005). This result corresponds to the DNA barcoding principle that interspecific divergence sufficiently exceeds intraspecific divergence.
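The fold difference quoted above is simple arithmetic on the reported means. The short sketch below assumes that the 16-fold figure is the ratio of the mean within-genus (congeneric) distance to the mean within-species distance; the function name is introduced here for illustration. The Elasmobranchii ratio is computed the same way from the values reported in the Results.

```python
def fold_difference(mean_between, mean_within):
    """Ratio of mean interspecific to mean intraspecific distance."""
    if mean_within <= 0:
        raise ValueError("mean intraspecific distance must be positive")
    return mean_between / mean_within


# Actinopterygii: 6.36% within genera vs. 0.40% within species.
actinopterygii_fold = fold_difference(6.36, 0.40)

# Elasmobranchii: 6.07% within genera vs. 1.20% within species.
elasmobranchii_fold = fold_difference(6.07, 1.20)
```

For Actinopterygii the ratio is 6.36 / 0.40 = 15.9, which rounds to the 16-fold difference stated in the text.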
The accuracy of species identification through DNA barcoding mostly depends on both interspecific and intraspecific divergence.
In our study, the average genetic distance within species was found  (Table 4).

The phylogenetic relationships of the barcoded species of Elasmobranchii and Actinopterygii are shown in separate NJ trees (Figures 2 and 3). Each species was associated with a specific DNA barcode cluster, and the relationships among these species were clearly revealed. Species that were closer in terms of genetic divergence clustered at the same nodes, and the distance between the terminal branches of the NJ tree widened as species became more distinct. No cryptic diversity was recorded, even in this biodiversity hotspot area. The reason may be that the number of replicates for each species was relatively small in this study. Our extended research on the rest of the marine species, with large sample sizes from different locations, might reveal potential cryptic biodiversity among the marine fishes of Bangladesh in the future.
The present study suggests that DNA barcoding has been successful in identifying and discriminating the vast majority of the marine ichthyofauna. The DNA barcoding method has proven to be an effective tool for species identification, particularly for specimens that are damaged, incomplete, or consisting of several morphologically distinct stages (Bingpeng et al., 2018; Pečnikar & Buzan, 2014).
TABLE 4 Summary of genetic divergences (% K2P) of marine fishes within various taxonomic levels based on COI sequences from different geographic regions
in the Bay of Bengal, Bangladesh has been established, which could be used to assign fish species by screening sequences against it in the future. We hope this will contribute appreciably to better monitoring, conservation, and management of fisheries in this overexploited region.

ACKNOWLEDGMENTS
We acknowledge the financial support from the Ministry of Education, Government of the People's Republic of Bangladesh as a grant for advanced research in Education (Grant No. MRS2017448).
We are also thankful to Dr. Mohammad Abdul Baki, Nusrat Jahan Sanzida, JBM Aysha Akter, Ayesha Akhter Zhilik, Nishat Jahan Chowdhury, Sumaiya Salam, Tamim Afrin, Noor Aida Arfin, Mysha Mahjabin, Sajid Ahmed, Kawsar Ahmed, Afroza Kaonine Haque, and Md. Anwar Hossain for their cooperation during this research and manuscript preparation. We thank the two anonymous reviewers for their critical review, insightful comments, and suggestions, which helped to improve the earlier version of the manuscript.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

AUTHORS' CONTRIBUTIONS
Ahmed MS initiated the project, acquired funding, managed project administration, specimen curation, and internal reports, performed the taxonomic analyses, and prepared the whole manuscript. Datta SK and Saha T performed field collection, generated and analyzed molecular sequences, curated tissue samples, and participated in the writing of the final manuscript. Hossain Z contributed laboratory analyses, participated in the generation of sequences, and contributed to the manuscript. All authors reviewed the manuscript.

ETHICAL APPROVAL
Collection of fish samples was conducted under a permit to the University of Dhaka and with the approval of the Ethical Review Committee, Faculty of Biological Sciences, University of Dhaka.