Phylogenetic relations and mitogenome‐wide similarity metrics reveal monophyly of Penaeus sensu lato

Abstract Splitting of the genus Penaeus sensu lato into six new genera based on morphological features alone has been controversial in penaeid shrimp taxonomy. Several studies have focused on building phylogenetic relations among the genera of Penaeus sensu lato. However, none has used complete mitochondrial genomes of shrimp representing all six controversial genera. The present study is the first to test the phylogenetic relations of all six genera of Penaeus sensu lato using complete mitochondrial genome sequences. In addition, the study reports for the first time the complete mitochondrial genome sequence of Fenneropenaeus indicus, an important candidate species in aquaculture and fisheries, and uses it for phylogenomics. Maximum likelihood and Bayesian approaches were used to infer the phylogenetic relationships among the shrimp in the suborder Dendrobranchiata. The phylogenetic relations established with the limited taxon sampling considered in the study pointed to the monophyly of Penaeus sensu lato and suggested collapsing the new genera into a single genus. Further, trends in mitogenome‐wide estimates of average amino acid identity in the order Decapoda and the genus Penaeus sensu lato supported restoration of the old genus, Penaeus, rather than promoting the creation of new genera.

few morphological traits. The Division 1 includes 12 shrimps exhibiting diagnostic characteristics such as adrostral carinae not extending to near the posterior margin of carapace, lacking postocular crest, rostrum bearing more than one ventral tooth, laterally unarmed telson, and ischium of the first pair of chelate legs armed with a spine whereas Division 2 includes nine species which are characterized by adrostral carinae extending almost to the posterior margin of carapace and the presence of postocular crest. He has also suggested that characters such as open thelycum, short adrostral carina, and ischial spine on the first cheliped are probably primitive characters for the genus Penaeus.
A notable observation from his publication is that the word "subgenus" was also used in place of "Division" in some places. Subsequently, Kubo (1949) modified the classification of Burkenroad (1934) based on the presence or absence of the hepatic carina and split Division 1 into two subdivisions. Later, Perez-Farfante (1969) proposed four subgenera and added another character, thelycum type, to those used by earlier workers to distinguish subgenera. The four subgenera carina prominent, petasma with long ventral costa and closed thelycum]. In due course, two more subgenera were proposed: Marsupenaeus, for shrimp with tube-like thelycum (Tirmizi, 1971), and Farfantepenaeus, for American shrimp (Burukovsky, 1972).
There has been some debate regarding the creation of subgenera. Dall et al. (1990) expressed concern about creating six subgenera in a genus containing only 29 species. To a large extent, the creation of subgenera had little effect on practitioners of shrimp farming or researchers, as subgenus names are rarely used in scientific literature or in commerce (Flegel, 2007). However, in a relatively recent monograph, Pérez-Farfante and Kensley (1997) elevated these subgenera to generic status, and this unilateral elevation has been controversial for the last two decades. Several publications (Baldwin et al., 1998; Dall, 2007; Flegel, 2008; Hurzaid et al., 2020; Lavery et al., 2004; Ma et al., 2009, 2011; Maggioni et al., 2001; McLaughlin et al., 2008) have appeared either in support of or against the proposal of Pérez-Farfante and Kensley (1997).
Looking at this sequence of events, it is evident that the term "Division," which was introduced for the sake of convenience, has become a formal taxonomic rank (initially subgenus and then genus) that has been debated for the last two decades. The authors who proposed a subgenus or genus did not consider phylogenetic evidence. To date, the "Species fact sheets" published by the Food and Agriculture Organization (FAO), Rome, use the old genus nomenclature "Penaeus." The new genus names have not been used in FAO webpages and fact sheets. This clearly indicates that the global research community is divided over adopting the one-genus versus six-genera nomenclature. At this juncture, it has become very important to establish the true phylogenetic relations among the penaeid shrimp based on proper datasets and robust analytical methods.
Selecting morphological traits alone in taxonomic delineation has been extremely difficult or often misleading as symmetrical and simple diagnostic features are harder to achieve (Burkenroad, 1983).
Genetic differences have been observed between morphologically and ecologically similar species (Palumbi & Benzie, 1991). As morphological characters are susceptible to convergent evolution that would obscure phylogenetic inferences (Hedges & Maxson, 1996), molecular approaches are widely used to resolve taxonomic disagreements (Ma et al., 2011). To resolve the taxonomic confusion, several authors have reconstructed the phylogeny of species in the genus Penaeus s.l. using mitochondrial and nuclear DNA markers (Baldwin et al., 1998; Chan et al., 2008; Gusmão et al., 2000; Lavery et al., 2004; Ma et al., 2009, 2011; Quan et al., 2004; Voloch et al., 2005; Wang et al., 2004). Although the science of taxonomy was revolutionized by the DNA barcoding system, it has been argued that a sequence may be, at least in some cases, uninformative if only one portion of the genome is used (Galtier et al., 2009). In traditional mitochondrial taxonomy, only a partial sequence of one gene has been used, and this is often insufficient to resolve relationships when radiation is very rapid (Morin et al., 2010). The resolution of a phylogeny can be increased and refined by increasing the amount of sequence data (DeFilippis & Moore, 2000; Rokas & Carroll, 2005).
Therefore, the complete mitochondrial genome is considered an ideal marker for studies of phylogeny, population genetic diversity, and maternal inheritance (Ma et al., 2013, 2015). Even with complete mitogenome sequences, insufficient taxon sampling can hinder resolution of certain interrelationships, as shown for infraorders in the order Decapoda (Tan et al., 2019). However, in all previous analyses based on complete mitogenomes of Penaeus s.l., shrimp in the subgenus Melicertus were missing because sequence data were not available. Our study includes a representative shrimp from the subgenus Melicertus (Zhong et al., 2018) and also the Fenneropenaeus indicus accession reported in this study.
Setting aside the controversies of one genus versus six genera taxonomic nomenclature for penaeid shrimp, the phylogenetic relations among them based on complete mitochondrial genomes are of paramount interest to researchers in the field. The main objective of this paper is to determine whether the controversial assignment of genera in Penaeus s.l. based on the morphology alone

| Sequence datasets
The complete mitochondrial DNA genomes in the suborder Dendrobranchiata, including ten of the shrimp in Penaeus sensu lato, were used for phylogenetic analysis (Table 1). The suborder Dendrobranchiata has two superfamilies, Penaeoidea and Sergestoidea.
The penaeid shrimp, whose taxonomy is addressed in this paper, belong to the superfamily Penaeoidea. Therefore, the sequence of Sergia lucens from the superfamily Sergestoidea was used as the outgroup for phylogenetic analysis in this study. The materials and methods used for assembling the complete mitochondrial DNA of F. indicus are given in Appendix A. Two datasets were used for phylogenetic reconstruction of Penaeus s.l.: the combined sequences of all 13 protein-coding genes and the combined sequences of the 2 rRNA genes. Initially, individual gene sequences were aligned in MAFFT v7.305b (Katoh & Standley, 2013) following the L-INS-i strategy with the maximum number of iterations set to 1,000. The individual gene alignments were examined using the Guidance2 tool (Sela et al., 2015) to identify positions in the alignment with poor alignment confidence scores (<0.93). Positions with poor confidence scores, positions present in fewer than 25% of accessions, and, for protein-coding genes, some additional positions needed to preserve the triplet codon structure were removed. Then, the individual gene alignments were concatenated to obtain the final alignment for phylogenetic analyses.
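The column-filtering and concatenation steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline (that is described in Appendix B). The function names `keep_columns`, `select_columns`, and `concatenate` are hypothetical, the per-column confidence scores are assumed to be available as a plain list, and the codon rule (drop the whole codon when any of its three columns fails, with the alignment in frame from the first column) is an assumption about how the triplet structure was preserved.

```python
def keep_columns(seqs, col_scores, min_score=0.93, min_occupancy=0.25, codon=True):
    """Return indices of alignment columns that pass both filters.

    seqs        : list of equal-length aligned sequences ('-' marks a gap)
    col_scores  : one confidence score per column (assumed input format)
    codon=True  : assumption: drop an entire codon if any of its columns fails;
                  trailing columns beyond a multiple of three are also dropped
    """
    n_seq, n_col = len(seqs), len(seqs[0])
    passed = []
    for j in range(n_col):
        occupancy = sum(s[j] != "-" for s in seqs) / n_seq
        passed.append(col_scores[j] >= min_score and occupancy >= min_occupancy)
    if not codon:
        return [j for j in range(n_col) if passed[j]]
    kept = []
    for start in range(0, n_col - n_col % 3, 3):
        if all(passed[start:start + 3]):
            kept.extend(range(start, start + 3))
    return kept


def select_columns(seqs, columns):
    """Rebuild each sequence from the retained column indices."""
    return ["".join(s[j] for j in columns) for s in seqs]


def concatenate(gene_alignments):
    """Join per-gene alignments (dicts of taxon -> sequence) in the given gene order."""
    taxa = sorted(gene_alignments[0])
    return {t: "".join(g[t] for g in gene_alignments) for t in taxa}
```

For protein-coding genes the codon-aware mode would be used, and `codon=False` for the rRNA genes. The alignment step itself corresponds to MAFFT's L-INS-i mode, which is normally invoked as `mafft --localpair --maxiterate 1000`.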
Further details on the preparation of the sequence alignments are given in Appendix B.
In addition, two more datasets containing all protein-coding and rRNA gene sequences of the 10 complete mitochondrial genomes of Penaeus s.l. were prepared and used to study phylogenetic relations. Metapenaeus ensis, a species in the family Penaeidae that does not belong to Penaeus s.l., was used as the outgroup for building these phylogenetic trees. The detailed alignment procedures are also given in Appendix B.

| Maximum likelihood tree
The protein-coding and rRNA gene datasets were used to build Maximum Likelihood (ML) trees in RAxML version 8.2.9 (Stamatakis, 2014) with the GTRGAMMAI model (GTR substitution model with gamma-distributed rates and a proportion of invariant sites). For protein-coding genes, the best-fit partitioning schemes obtained using PartitionFinder v2.1.1 (Guindon et al., 2010; Lanfear et al., 2017) were also specified when building the ML tree. One hundred independent searches were run with random seed 12345 to find the best-scoring tree. Then, nodal support for the tree with the best likelihood was obtained from 1,000 bootstrap replicates.
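The three stages described above (multiple ML searches, bootstrapping, and mapping support onto the best tree) can be sketched as RAxML command lines assembled in Python. The exact commands used in the study are not given in the text, so this is only one plausible arrangement of standard RAxML 8 options. The helper name `raxml_commands`, the executable name `raxmlHPC`, the input file names, and the run names are all hypothetical.

```python
def raxml_commands(alignment, partitions=None, seed=12345,
                   n_search=100, n_boot=1000, exe="raxmlHPC"):
    """Build (but do not run) three RAxML 8 command lines as argument lists."""
    base = [exe, "-s", alignment, "-m", "GTRGAMMAI"]
    if partitions:
        # -q supplies a partition file, e.g. one derived from PartitionFinder output
        base += ["-q", partitions]
    # 1) multiple ML searches from different starting trees
    search = base + ["-p", str(seed), "-N", str(n_search), "-n", "best"]
    # 2) non-parametric bootstrap replicates (-b sets the bootstrap seed)
    boot = base + ["-p", str(seed), "-b", str(seed), "-N", str(n_boot), "-n", "boot"]
    # 3) draw bootstrap support values on the best-scoring ML tree
    support = [exe, "-f", "b", "-m", "GTRGAMMAI",
               "-t", "RAxML_bestTree.best",
               "-z", "RAxML_bootstrap.boot",
               "-n", "support"]
    return [search, boot, support]
```

Each list can be passed to `subprocess.run(cmd, check=True)` once RAxML is installed. Keeping the commands as data makes the seed and replicate counts easy to record alongside the results.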

| Bayesian tree
The datasets were also subjected to Bayesian inference analysis in MrBayes 3.2.6 (Huelsenbeck & Ronquist, 2001). For protein-coding genes, the best-fit models and partitioning schemes were defined according to the results from PartitionFinder v2.1.1. For the rRNA gene dataset, the GTRGAMMAI model was used to build the Bayesian tree, treating each rRNA gene sequence as a partition. The underlying evolutionary process was assumed to differ among genes, and hence a separate set of parameters was estimated for each gene partition.
Two simultaneous but completely independent runs starting from different random trees were executed with four chains for 10 million generations, sampling trees every 100 generations and calculating convergence statistics every 1,000 generations. The first 25% of samples were discarded as burn-in and excluded from the summary statistics. The maximum standard deviation of split frequencies and the potential scale reduction factor, PSRF (Gelman & Rubin, 1992), were used to check the convergence of runs. Convergence was also checked by examining the trace files in Tracer v1.6 (Rambaut et al., 2014).
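As an illustration of the convergence checks above, the sketch below computes the approximate number of retained samples per run and a simplified PSRF for one scalar parameter. It assumes post-burn-in chains of equal length and omits the degrees-of-freedom correction of the original Gelman and Rubin (1992) estimator, so it will not reproduce MrBayes output exactly. The function names are hypothetical.

```python
from statistics import mean, variance


def posterior_sample_counts(generations=10_000_000, sample_every=100,
                            burnin_fraction=0.25):
    """Return (total, discarded, kept) samples per run, ignoring the initial state."""
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded


def psrf(chains):
    """Simplified potential scale reduction factor for one parameter.

    chains: list of equal-length lists of sampled values, one list per run.
    Values close to 1.0 suggest that the runs sample the same distribution.
    """
    n = len(chains[0])
    within = mean([variance(c) for c in chains])        # W: mean within-chain variance
    between_over_n = variance([mean(c) for c in chains])  # B/n: variance of chain means
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5
```

With the settings reported here, `posterior_sample_counts()` gives 100,000 samples per run, of which 25,000 are discarded and 75,000 retained.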

| Genome-based similarity metrics
A genome-wide similarity metric, the Average Aminoacid Identity (AAI) was estimated among the species in each genus of the order "Decapoda" where complete mitochondrial genome is available for at least two species. The AAI values were estimated using the

| Mitochondrial genome of F. indicus
The specimen used for generating mitogenome sequence of F. indicus was confirmed for the species based on the partial sequence

| Maximum likelihood trees
The detailed results of analyses indicating the best partitioning schemes and the evolutionary models for protein-coding sequence datasets have been given in Appendix B. The Maximum Likelihood (ML) trees built for shrimp in the superfamily, Penaeoidea with protein-coding and rRNA genes datasets have been depicted in

| Bayesian trees
The maximum standard deviation of split frequencies and potential scale reduction factor, PSRF (Gelman & Rubin, 1992)

| Genome-wide similarity metrics
Wherever mitochondrial genome data were available for a minimum of two species under a genus, the average amino acid identity (AAI) was estimated between all the species in each genus of order, Decapoda (Appendix C and D). For each genus, the minimum and maximum AAI estimate obtained are given as ranges in Appendix D.
Where the complete mitochondrial DNA genome is available for only two species in a genus, the single estimate of AAI between them is given. The ranges of genome-wide estimates obtained for 45 different genera establish the general trend of estimates for the order Decapoda. The aim was to compare the within-genus trend in the order Decapoda with the estimates in the genus Penaeus. As suggested by a reviewer, we also estimated the genome-wide similarity metrics between genera (within each family) in the order Decapoda. Here, for each genus, one representative species was chosen at random to obtain between-genus similarity metrics in each family of Penaeoidea. The objective was to examine the general trend of similarity metrics at the between-genus level in relation to the between-species level.
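The per-genus summary described above can be sketched as follows. The tool actually used to estimate AAI is not named in this section, so the identity calculation here is a deliberately simple stand-in: it assumes that the proteins of all species have already been aligned gene by gene to equal length, whereas dedicated AAI tools typically work from reciprocal best hits. The names `percent_identity`, `aai`, and `genus_range` are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def aai(proteins_a, proteins_b):
    """Average amino acid identity over the genes shared by two species.

    proteins_a, proteins_b: dicts mapping gene name -> aligned protein sequence.
    """
    shared = sorted(set(proteins_a) & set(proteins_b))
    return sum(percent_identity(proteins_a[g], proteins_b[g]) for g in shared) / len(shared)


def genus_range(species_proteins):
    """Minimum and maximum pairwise AAI among all species of one genus."""
    values = [aai(species_proteins[s], species_proteins[t])
              for s, t in combinations(sorted(species_proteins), 2)]
    return min(values), max(values)
```

For a genus with only two species, `genus_range` returns the same value twice, which matches the single estimate reported in that case.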
Among species in genus Penaeus s.l., the highest AAI was es-   (Ojala et al., 1981). The overlap observed between reading frames of ATP8 and ATP6 genes and that of ND4 and ND4L genes is a common phenomenon in crustacean species.

| DISCUSSION
All previous attempts to construct phylogenetic relations among species of genus Penaeus s.l. considered only one/few of the mito-   Figure 4 is that the between-species estimates tend to be higher than the between-genus estimates. The between-species AAI estimates for 45 genera in the order Decapoda have a median of 88.57. The between-genera AAI estimates for 11 families in the order Decapoda have a median of 86.32. In Penaeus sensu lato, 45 AAI estimates were obtained among 10 species. The median of these 45 estimates is 93.37, which is higher than the median in the order Decapoda. All the AAI estimates obtained among species of Penaeus s.l. are higher than the median estimate obtained between species of all genera in the order Decapoda. Though there are no set standard values of AAI for assigning different species to a genus, species of the same genus are expected to have high AAI estimates. The between-species AAI estimates across all the genera in the order Decapoda were high (>80) in most cases. Only 18 out of 405 estimates were less than 80. Therefore, the general trend is one of high AAI estimates between species of the same genus. For the shrimp of Penaeus s.l., the AAI estimates are all above 90 (Figure 4 and Appendix C). The high estimates of AAI within Penaeus s.l. are consistent with comparisons of congeneric species in other genera of the order Decapoda.
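The count of 45 estimates among 10 species follows from the number of unordered species pairs. A short check, using only figures quoted in the paragraph above (the function name `n_pairwise` is hypothetical):

```python
from math import comb


def n_pairwise(n_species):
    """Number of unordered species pairs, i.e. pairwise AAI estimates."""
    return comb(n_species, 2)


# Medians quoted in the text above
medians = {
    "between genera within a family (Decapoda)": 86.32,
    "between species within a genus (Decapoda)": 88.57,
    "between species within Penaeus s.l.": 93.37,
}

print(n_pairwise(10))                # number of estimates among 10 species
print(max(medians, key=medians.get))  # grouping with the highest median
```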
While splitting the genus Penaeus s.l. into subgenera and later into genera (Perez-Farfante, 1969;Pérez-Farfante & Kensley, 1997) the key biological evidence considered had been the shape of external genitalia of females, the thelycum (open vs. closed), a structure which is used to store sperm prior to spawning (Bauer, 1998).

FIGURE 4
The box and whisker plot of AAI estimates obtained for between-genera within a family; between-species within a genus and between-species in the genus Penaeus sensu lato in the order Decapoda. The values plotted are minimum, 25th percentile, median, 75th percentile, and maximum. The numerical value at each box is the median AAI estimate.

It was argued (McLaughlin et al., 2008) that these differences provide ample evidence to separate Penaeus s.l. into genera in the proposed taxonomy of Pérez-Farfante and Kensley (1997). Further, the open thelycum was considered to be an ancient character, and it allowed them to suggest that the genus Penaeus s.l. originated first in the western hemisphere (Burkenroad, 1934). However, other researchers (Ma et al., 2011) opined that this hypothesis does not reflect the true evolutionary trend. The divergence in reproductive structure and behavior may cause strong selection pressure as difference in the reproduction related structure can facilitate prezygotic isolation between species, and therefore, it may evolve faster than postzygotic isolation (Mendelson, 2003).
According to the molecular data, sister clade of Farfantepenaeus and Litopenaeus that inhabit the American waters may have evolved sympatrically owing to the strong sexual selection that drives rapid development of prezygotic isolation (Ma et al., 2011).
They further concluded that the external morphology of the reproductive organ is a derived trait that does not reflect the true phylogenetic relationship. Our molecular phylogenetic data reveal that L. vannamei, L. stylirostris, and F. californiensis form a monophyletic group, and this observation is in agreement with other findings (Lavery et al., 2004; Ma et al., 2011; Maggioni et al., 2001).
The second important morphological trait used for phylogeny and classification of genus Penaeus s.l. has been the presence or absence of the adrostral groove. The grooved shrimp have a long and distinct adrostral groove almost along the entire dorsal carapace (Burkenroad, 1934; Burukovsky, 1972; Kubo, 1949; Perez-Farfante, 1969; Tirmizi, 1971).
Although it is hard to deny the importance of morphological data in phylogeny reconstruction (Wiens, 2004), it is inherently problematic to use morphological data (Scotland et al., 2003). This is particularly true among the species of genus Penaeus s.l., as most of the diagnostic characters of the species are subtle and have less resolution (Flegel, 2007). As the morphological diagnostic traits of this genus do not reliably reflect the true evolutionary trend, it becomes imperative to use molecular data. Our phylogenomic analysis strongly supports the monophyly of Penaeus s.l., in accordance with previous studies that used mitochondrial as well as nuclear genes (Baldwin et al., 1998; Lavery et al., 2004; Ma et al., 2009, 2011). Further, our phylogenomic studies are in concordance with the biogeographic provinces defined for marine fauna (Baldwin et al., 1998).
The ultimate goal of taxonomic science is to have a taxonomy in which morphological and molecular data are in agreement (Dall, 2007) and to develop a natural system that reflects phylogenetic relationships (Schram & Ng, 2012). Morphological differences separating the subgenera/genera in the genus Penaeus s.l. have also been extremely minor (Dall, 2007), and most lay people would not be able to differentiate them even at species level (Flegel, 2008). Perez-Farfante (1969) also opined that the subgenera based on morphological characters were arranged solely for convenience of recognition and that no phylogenetic inferences should be drawn from them. Later, when Pérez-Farfante and Kensley (1997) proposed the new binomial revision, they relied on the same morphological/anatomical differences used for the erection of subgenera (Flegel, 2007). Recently, Ma et al. (2011) opined that the generic revision proposed by Pérez-Farfante and Kensley (1997) was not cladistic in a strict sense. Taxonomic revision based on sequence analysis of a single gene has also raised concern among previous workers (Baldwin et al., 1998; McLaughlin et al., 2008

| CONCLUSIONS
The main purpose of this study was to determine whether the con-

ACKNOWLEDGMENTS
The authors acknowledge funding from ICAR-CRP on Genomics project entitled "Whole genome sequencing of Indian white shrimp Penaeus indicus".

CONFLICT OF INTEREST
The authors declare that there is no conflict of interest.

DATA AVAILABILITY STATEMENT
The mitochondrial genome sequence of F. indicus generated in this study is deposited at GenBank with accession number KX462904.