Complete mitochondrial genomes of three skippers in the tribe Aeromachini (Lepidoptera: Hesperiidae: Hesperiinae) and their phylogenetic implications

Abstract The mitochondrial genome is now widely used in the study of phylogenetics and molecular evolution due to its maternal inheritance, fast evolutionary rate, and highly conserved gene content. To explore the phylogenetic relationships of the tribe Aeromachini within the subfamily Hesperiinae at the mitochondrial genomic level, we sequenced and annotated the complete mitogenomes of 3 skippers: Ampittia virgata, Halpe nephele, and Onryza maga (new mitogenomes for 2 genera) with a total length of 15,333 bp, 15,291 bp, and 15,381 bp, respectively. The mitogenomes all contain 13 protein‐coding genes (PCGs), 22 transfer RNAs (tRNAs), 2 ribosomal RNAs (rRNAs), and a noncoding A + T‐rich region and are consistent with other lepidopterans in gene order and type. In addition, we reconstructed the phylogenetic trees of Hesperiinae using maximum likelihood (ML) and Bayesian inference (BI) methods based on mitogenomic data. Results show that the tribe Aeromachini in this study robustly constitute a monophyletic group in the subfamily Hesperiinae, with the relationships Coeliadinae + (Euschemoninae + (Pyrginae + ((Eudaminae + Tagiadinae) + (Heteropterinae + ((Trapezitinae + Barcinae) + Hesperiinae))))). Moreover, our study supports the view that Apostictopterus fuliginosus and Barca bicolor should be placed out of the subfamily Hesperiinae.

+ (Heteropterinae + (Barcinae +Trapezitinae) + Hesperiinae)))))), but this higher classification and the phylogeny has not been approved generally. Hesperiinae, the largest subfamily, has been acknowledged as a distinctly monophyletic group as monophyletic in the hypothesis of Li, Cong, et al. (2019), Sahoo et al. (2016), Toussaint et al. (2018, Warren et al. (2009), and Zhang et al. (2019). There are 11 mitochondrial genomes of Hesperiinae directly available on GenBank, and the rest can be accessed in the form of raw data. We have provided more comprehensive data support for the phylogenetic research of the groups.
Aeromachini is a large and diverse tribe of the subfamily Hesperiinae and currently contains approximately 130 species in 12 genera, distributed in the Oriental Region, the Palearctic Region, and the Afrotropical Region (Cock & Congdon, 2012;Devyatkin, 1996;Evans, 1949;Huang et al., 2019;Warren et al., 2009;Yuan et al., 2015). Most of the genera of Aeromachini are distributed in the Sino-Himalayan Subregion. In the previous phylogenetic studies, the tribe is always retrieved as a clade sister to the rest of the Hesperiinae (Li, Cong, et al., 2019). Two molecular studies within the tribe are known Li, Zhu, et al., 2019).
The mitogenome is the most extensively studied genomic system in insects, which is a double strand molecule about 15~16 kb in size, typically containing 13 protein-coding genes (PCGs), 22 transfer RNAs (tRNAs), 2 ribosomal RNAs (rRNAs), and a noncoding A + Trich region. In the past few decades, due to its maternal inheritance, fast evolutionary rate and highly conserved gene content compared to nuclear genes, it has been widely utilized to investigate insect taxonomy, phylogenetic relationships, evolution, and biogeography, as a source of sequence data for phylogenetic analysis. (Cameron, 2014;Galtier et al., 2009). In this study, we determined the complete mitochondrial genome sequences of 3 skipper species of the tribe Aeromachini and reconstructed the phylogenetic relationships of the family Hesperiidae, combined with other available sequence data in GenBank, and using maximum likelihood and Bayesian inference methods, aiming to provide new horizons and genomics data for the phylogenetic research of the Aeromachini.  Yuan et al. (2015) and the identity was confirmed via cox1 barcoding using the BOLD database (Ratnasingham & Hebert, 2013). The genomic DNA was isolated from the thoracic tissue using the EasyPure R Genomic DNA Kit (TransGen Biotech, Beijing).

| Sequencing, assembly, annotation, and bioinformatic analyses
Three complete mitogenomes were sequenced using nextgeneration sequencing (NGS) on an Illumina HiSeq 2000 platform (Biomarker Technologies, Beijing). The correct recognition rate of bases reaches 99.9%. Each Illumina HiSeq read was 150 bp, and about 1.2 Gb raw data were trimmed with default parameters, then the raw paired reads were retrieved and quality-trimmed using CLC Genomics Workbench v10.0.1 (CLC Bio, Aarhus, Denmark) with default parameters. The clean paired reads were then used for mitogenome reconstruction using MITObim v1.7 software (Hahn et al., 2013) with default parameters and the mitogenome of Ampittia dioscorides (KM102732) (Qin et al., 2017) as the reference. Annotation of the mitogenomes and comparative analyses were conducted following the methodology outlined above. The various genomic features were annotated using Geneious 8.1.3 (Biomatters, Auckland, New Zealand) and referenced to the complete mitogenome sequence of A. dioscorides. Protein-coding genes (PCGs) were determined by finding the ORFs based on the invertebrate codon table (codon Table 5), and RNAs (tRNAs and rRNAs) were identified using MITOS Web Server (Bernt et al., 2013).
Transfer RNAs were manually plotted according to the secondary structure predicted by MITOS, using Adobe Illustrator CS5.
Finally, all genes were visually inspected against the reference mitogenome in Geneious. Nucleotide composition, codon usage, comparative mitogenomic architecture tables for the three mitogenomes, and data used to plot RSCU (relative synonymous codon usage) figures were all calculated and created using PhyloSuite (Zhang et al., 2020). The AT-skew and GC-skew were computed according to the following formulas: and GC-skew = [G − C]/[G + C] (Perna & Kocher, 1995

| Sequence read archive (SRA) data extraction
We referred to and used the data of six genomes from over 300 hesperiid species determined in previous study to extract the mitochondrial genomes because of the lack of directly available mitogenomes on GenBank. The SRA data of the subfamily Trapezitinae (Hewitsoniella migonitis, Anisynta dominula, Toxidia parvulus, and Signeta flammeata) and two species of the tribe Aeromachini (Aeromachus stigmata and Ampittia dioscorides) were obtained from GenBank with the DNA Voucher NVG-17108D07, NVG-17069D05, NVG-7813, NVG-7760, NVG-7915, and NVG-7291, respectively (Li, Cong, et al., 2019;Zhang et al., 2019). The raw data of 4 species of Trapezitinae were assembled into mitogenomes referred to Barca bicolor (Han et al., 2018), and the 2 species of Aeromachini were referred to Isoteinon lamprospilus (Ma et al., 2020) using Geneious.

| Phylogenetic analysis
A total of 41 species (3 newly determined in this study, 38 available from GenBank) representing 9 subfamilies of Hesperiidae sens Li, Cong, et al. (2019) and Zhang et al. (2019)  Nucleotide sequences were aligned using the G-INS-i (accurate) strategy and codon alignment mode. All rRNAs were aligned in the MAFFT with the Q-INS-i strategy (Katoh & Standley, 2013). Poorly matched sites in the alignments were removed using Gblocks v0.91b (Castresana, 2000). Individual genes were also concatenated using PhyloSuite v1.2.2.
We used 3 datasets to reconstruct the phylogenetic relationship: (1) PCG matrix, containing all codon positions of the 13 protein-coding genes; (2) PRT matrix, concatenating all codon positions of the 13 protein-coding genes, 22 tRNAs and 2 rRNAs; and (3) 12PRT matrix, including the first and second codon positions of 13 protein-coding genes plus 22 tRNAs and 2 rRNAs. Based on 3 datasets, the maximum likelihood (ML) and Bayesian inference (BI) methods were used to reconstruct the phylogeny. The optimal partitioning scheme and nucleotide substitution model for ML and BI phylogenetic analyses were selected using PartitionFinder 2.1.1 (Lanfear et al., 2017) with the greedy algorithm and BIC (Bayesian information criterion) criteria (Tables S1 and S2). Maximum likelihood analysis was inferred using IQ-TREE (Nguyen et al., 2015) with the standard bootstrap approximation approach, as well as the Shimodaira-Hasegawa-like approximate likelihood ratio test (Guindon et al., 2010), and the bootstrap value (BS) of each node of the ML tree was evaluated via the bootstrap test with 10,000 replicates. Bayesian inference was carried out using MrBayes 3.2.6 (Ronquist et al., 2012) with the following requirements: 2 independent runs of 1 × 10 7 generations were conducted with four independent Markov Chain Monte Carlo (MCMC) runs, including 3 heated chains and a cold chain, by sampling every 1,000 generations. A consensus tree was obtained from all the trees after the initial 25% of trees from each MCMC run was discarded as burn-in, with the chain convergence assumed after the average standard deviation of split frequencies fell below 0.01. The confidence value of each node of the BI tree was presented as the Bayesian posterior probability (BP).

| Mitogenome organization and base composition
The  (Table 3). Compared with the whole-genome, the noncoding A + T-rich region (NCR) has the highest A + T content, up to 89.7%, 89.3%, and 91.9%, respectively. On the contrary, PCGs are the regions with the lowest A + T content, which is 79.1%, 78.4%, and 78.2% respectively. In addition, the T content of these mitogenomes on the major strand is higher than that of A, with the exception of H. nephele (Table 3).

| Protein-coding genes and codon usage
The total lengths of the 13 PCGs of Ampittia virgata, Halpe nephele,and Onryza maga are 11,190 bp,11,202 bp,and 11,187 bp,respectively (Table 3). In these 3 sequenced species, 9 of 13 PCGs (nad2, cox1, cox2, atp8, atp6, cox3, nad3, nad6, and cytb) are encoded in the J-strand, and the other 4 (nad5, nad4, nad4L, and nad1) are located on the N-strand. The size of the 13 PCGs with the smallest gene for the 13 PCGs is the atp8, and the largest gene is the nad5 ranging in size from 162 bp to 1,744 bp. The AT-skew and GC-skew indicate that the T content of PCGs is obviously higher than that of A among these 3 species, while the content of G and C is not much different. The AT bias of the bases is more significant in the third codon, and the AT content of the third codon (90.5%-92.3%) is remarkably higher than that of the first codon (73.7%-74.6%) and the second codon (70.1%-70.5%), which is consistent with the higher mutation rate of the third codon site compared with the second and first codon sites (Table 3).
All PCGs of these 3 mitogenomes start with typical ATN (ATG, ATT, and ATA) codons except cox1 using CGA, and all of them use TAA or TAG as the stop codons, with the exception for cox1, cox2, nad4, and nad5, which use a single T as stop codons ( relative synonymous codon usage (RSCU) of the 3 skippers shows that the codon UUA (Leu2), UCU (Ser2), and CGA (Arg) are the 3 used most frequently, and the codons terminating with A and T also have a relatively higher frequency (Figure 2).   (Table 3). Most tRNA genes of these 3 mitogenomes could be folded into a cloverleaf secondary structure, except for trnS (AGN), which lacks the DHU arm ( Figure 3).

| Transfer and ribosomal RNA genes
The total number of unmatched base pairs found in the tRNAs of the  (Table 2). In addition, both tRNA and rRNA of the three mitogenomes show a strong AT bias, which is higher than that of the whole mitogenomes (Table 3). namely the nad2-trnW (2 bp), trnW-trnC (8 bp), atp8-atp6 (7 bp), and atp6-cox3 (1 bp) are all present in these 3 mitogenomes (Table 2).

| A + T-rich region
The A + T-rich region is the longest noncoding region with a relatively high A + T content, deemed to be related to the origin of replication and transcription (Boore, 1999;Cameron, 2014)

| Phylogenetic relationships
In this study, we used the mitogenomes of a total of 45 hesperiid species containing 3 newly determined species ( The phylogenetic tree consists of 9 clades corresponding to 9 major hesperiid subfamilies sens Li, Cong, et al. (2019) andZhang et al. (2019), and their relationships are Coeliadinae + (Euschemoninae + (Pyrginae + ((Eudaminae +Tagiadinae) + (Heteropterinae + ((Tr apezitinae +Barcinae) + Hesperiinae))))) ( Figure 5). The position of Eudaminae does not agree with that of previous studies where the subfamily is sister to the Pyrginae sensu lato, that is, Tagiadinae, Pyrrhopiginae, and Pyrginae sens Zhang et al. (2019). The inconsistency of phylogenetic relationships may be mainly caused by incomplete lineage sorting and inadequate taxon sampling (Pollard et al., 2006;Sahoo et al., 2016). In addition, Sahoo et al. (2016) thought the data from entire genomes may result in a betterresolved phylogeny. On the contrary, Zhang et al. (2021) pointed out that phylogenetic trees based on whole-genome sequence data may not always represent the true evolutionary history mainly due to the gene flow, which is also one possibility with appropriate references.
However, it is difficult to effectively circumvent its influence firstly due to incomplete pedigree selection that results in long-branch attraction and other phylogenetic errors and secondly due to the gene flow that causes the network evolution rather than the branched evolution. There are two taxonomic alternatives: (1) recognize three distinct subfamilies, Pyrginae, Eudaminae, and Tagiadinae as suggested by Zhang et al. (2019) of the subfamily Eudaminae is retained, or (2) combine them all as Pyrginae as in the conventional treatment.
Here we tentatively adopt the former, pending further taxonomic investigation.
For the five Aeromachini species in this study, all results indicate that the Aeromachini form an independent clade in the subfamily Hesperiinae, which is the basal lineage among them (PP = 1), that Hesperiinae (Parsons, 1999;Warren et al., 2009). Here we refrain from drawing conclusion, pending further research.

| CON CLUS IONS
In this study, we newly determined the mitogenomes of three hesperiid species in the tribe Aeromachini (Ampittia virgata, Halpe nephele, and Onryza maga) and provide more comprehensive molecular data for hesperiid phylogenetic study, meanwhile reconstructed the robust phylogenetic trees of hesperiid F I G U R E 5 Phylogenetic tree inferred by BI method based on 12PRT dataset. Numbers on nodes are the posterior probabilities (PP) F I G U R E 4 Structural element found in the AT-rich region of 3 skippers (The presented nucleotides indicate the conserved sequences, Dots between sequences indicate omitted sequences) butterflies using relatively sufficient taxa sampling based on multiple mitogenomic datasets. The size and structure of mitochondria, gene order, and AT content of these three species are highly consistent with other Lepidoptera species. The phylogenetic analysis results show that Aeromachini is a monophyletic group and sister to the rest of Hesperiinae and that the relationships among hesperiid subfamilies are Coeliadinae + (Euschem oninae + (Pyrginae + ((Eudaminae + Tagiadinae) + (Heteropter inae + ((Trapezitinae + Barcinae) + Hesperiinae))))). Moreover, our analysis supports the viewpoint of previous study that A.
fuliginosus and B. bicolor should be placed out of the subfamily Hesperiinae.

ACK N OWLED G M ENTS
We thank Prof. John Richard Schrock (Emporia State University, Emporia, USA) for revising the manuscript, Miss. Fangfang Liu and Mrs. Luyao Ma for help with sample collection, Mr. Nan Zhou for assistance of biological software.

CO N FLI C T S O F I NTE R E S T
All authors report no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The following information was supplied regarding the availability