Evolution of mitogenomic gene order in Orthoptera

Mitochondrial gene order has contributed to the elucidation of evolutionary relationships in several animal groups. It generally has found its application as a phylogenetic marker for deep nodes. Yet, in Orthoptera limited research has been performed on the gene order, although the group represents one of the oldest insect orders. We performed a comprehensive study on mitochondrial genome rearrangements (MTRs) within Orthoptera in the context of mitogenomic sequence‐based phylogeny. We used 280 published mitogenome sequences from 256 species, including three outgroup species, to reconstruct a molecular phylogeny. Using a heuristic approach, we assigned MTR scenarios to the edges of the phylogenetic tree and reconstructed ancestral gene orders to identify possible synapomorphies in Orthoptera. We found all types of MTRs in our dataset: inversions, transpositions, inverse transpositions, and tandem‐duplication/random loss events (TDRL). Most of the suggested MTRs were in single and unrelated species. Out of five MTRs which were unique in subgroups of Orthoptera, we suggest four of them to be synapomorphies; those were in the infraorder Acrididea, in the tribe Holochlorini, in the subfamily Pseudophyllinae, and in the two families Phalangopsidae and Gryllidae or their common ancestor (leading to the relationship ((Phalangopsidae + Gryllidae) + Trigonidiidae)). However, similar MTRs have been found in distant insect lineages. Our findings suggest convergent evolution of specific mitochondrial gene orders in several species, deviant from the evolution of the mitogenome DNA sequence. As most MTRs were detected at terminal nodes, a phylogenetic inference of deeper nodes based on MTRs is not supported. Hence, the marker does not seem to aid resolving the phylogeny of Orthoptera, but adds further evidence for the complex evolution of the whole group, especially at the genetic and genomic levels. The results indicate a high demand for more research on patterns and underlying mechanisms of MTR events in Orthoptera.


INTRODUCTION
Orthoptera has been suggested to be one of the oldest extant lineages of insects with the earliest fossils dating back to the upper Carboniferous (290 million years ago; Grimaldi & Engel, 2005) and has an estimated age of 300-355 million years (Song et al., 2015;Song et al., 2020). Its 29,365 species (Cigliano et al., 2022) are divided into two suborders-the Ensifera and Caelifera (Flook et al., 1999;Song et al., 2015). Its old age, but also its large and complex genomes (e.g. Husemann et al., 2020;Husemann et al., 2022) have made assessing the phylogeny difficult, but also make Orthoptera an interesting target to study the evolution of the mitochondrial genome (in the following mitogenome) and to test its potential as a reliable marker for phylogenetic reconstructions (mitogenome; Fenn et al., 2008).
Furthermore, there is a large non-coding region, also called the adenine and thymine-rich region or control region (CR; Boore, 1999), which contains the origin of replication and transcription. The CR is highly variable among insects (Lewis et al., 1995). Because of their high variability, mitochondrial sequences and whole mitogenomes are frequently used in phylogeny and phylogeography. However, this high evolutionary rates also make mitochondrial sequences more prone to saturation effects and hence make them less suited for resolving deeper phylogenetic relationships (Allio et al., 2017;Ballard & Whitlock, 2004).
However, not only the sequence of the mitochondrial DNA, but also the order of genes can be used as a phylogenetic marker (Boore, 1999).
The mitochondrial gene order generally evolves at slow rates and is much more conserved than amino acid or nucleotide sequences in metazoans . Hence, the frequency and type of changes in the gene order, also called mitochondrial genome rearrangements (MTRs), may be a target for the investigation of deeper phylogenetic relationships (Rokas & Holland, 2000). As a result of selective constraints, MTRs of tRNA genes have been suggested to occur at a higher rate than MTRs involving protein coding genes (Dowton et al., 2009). In studies across all insects and in some specific orders, for example, Hymenoptera, mitogenome gene order proved to be informative and provided new insights into the evolution of these groups (e.g. Cameron, 2014;Gotzek et al., 2010;Silvestre et al., 2008).
For Orthoptera, an unambiguous phylogeny based on DNA sequences still remains a challenge (Song et al., 2015(Song et al., , 2018(Song et al., , 2020. This is partially because of their large genome sizes and complex genomic architecture, leading to inconsistencies between mitochondrial and nuclear genomic data, as demonstrated, for example, for the subfamily Gomphocerinae . Gene order as a conservative marker may provide some additional information improving phylogenetic resolution, especially at deeper nodes. However, limited research has been performed to investigate mitochondrial gene order information content in Orthoptera, making it crucial to examine the information content of gene order for phylogenetic research in this insect order. So far, research on MTRs in Orthoptera has only addressed specific subgroups with novel MTRs representing relatively shallow branches, for example, two families of Grylloidea (Ma & Li, 2018) and Holochlorini (Liu et al., 2013). One noteworthy exception is a major tRNA gene rearrangement supporting a closer relationship between Tetrigoidea and Acridomorpha (Song et al., 2015). Apart from that, evolutionary patterns of MTRs are largely unknown in Orthoptera. Therefore, we here reconstructed a molecular phylogeny of 277 published mitogenomes of Orthoptera (and three additional outgroup sequences) as a base for estimating the ancestral gene order and to identify MTRs. In this work, we provide the first comprehensive analysis of MTRs of all accessible orthopteran mitogenomes (up to July 2021) with the aim of (1) detecting patterns of MTRs in Orthoptera and (2) understanding the evolution of these MTRs. Our goal was to obtain information on the scale and direction of the divergence of mitochondrial gene order within Orthoptera. We hypothesize that (a) MTR events occur at higher rates for tRNA genes than for protein-coding genes and (b) MTRs are infrequent genomic changes which may help to elucidate deep branches of Orthoptera.

Molecular phylogeny
The final dataset contained 280 mitogenome sequences (277 ingroup sequences and three outgroups) of 256 species. The final alignment used for phylogenetic inference had a cropped length of 15,808 base pairs. The overall guanine and cytosine content was 26.6% with no significant difference between Ensifera (N = 84; 28.5%) and Caelifera (N = 193; 25.8%). The results of Bayesian inference (BI) and Maximum Likelihood (ML) were largely consistent within Ensifera with one exception. The ML topology grouped Phaneropterinae with Mecopodinae and Pseudophyllinae, albeit with low support (bootstrap = 77%). The BI analysis instead grouped Mecopodinae as sister group of Pseudophyllinae together with Phaneropterinae (posterior probability value = 0.86, Figure 1). More inconsistencies between the BI and ML trees were observed in Caelifera (see Figure 1): according to BI Calliptaminae, Catantopinae, Cyrtacanthacridini, and Eyprepocnemidini form a highly supported (pp >0.95) branch, distinct and divergent from the rest of the Acrididae. ML placed this clade within the Acrididae. Most other branches that differed between the BI and ML were poorly supported (a phylogeny based on ML is given as Figure S1).

Gene order inference
Eleven taxa that contained deletions or duplications of genes (see Table S3) were not considered for TreeREx analysis, but still included in the phylogenetic analyses. We used the BI tree as input for reconstructing MTR events and assigning them to the nodes of the tree in TreeREx. We decided for the BI tree because it received overall better support values compared with the ML tree (in the following, all numerals and letters marked by brackets represent MTR events as shown in Figures 2 and 3; all tRNA gene abbreviations are according  to the IUPAC IUB nomenclature for amino acids (see also Table S2; Cornishbowden, 1984). The hyphens in '(À)' indicate genes that are encoded on the light strand).
According to our analysis, all four types of MTR events were found in Orthoptera: inversions, transpositions, inverse transpositions, and tandem-duplication/random loss events (TDRL). We found six F I G U R E 1 Phylogenetic tree based on mitogenomes constructed with MrBayes. Numbers close to the nodes represent posterior probability values (pp). Dots on nodes indicate a pp >0.95. The taxon labels are highlighted in different colours assigned to major groups based on the currently accepted taxonomy (following the Orthoptera Species File; Cigliano et al., 2022). The '*' sign indicates paraphyly or polyphyly based on the present results. No assigned tribe/subfamiliy implies that the taxon in question is currently not classified in any of the groups within a family. Tribes in blue letters: Oedipodinae, tribes in red letters: Gomphocerinae. Rectangles with red lines highlight inconsistencies with the ML topology. Editing was performed with Inkscape and Figtree. Gene names follow IUPAC IUB (Cornishbowden, 1984).  The only species with a TDRL event was Lipotactes tripyrga, in which the most parsimonious MTR scenario was an inverse transposi- trnA-trnG-nad3, followed by an inversion of the trnA-trnR-trnN-trnS1block to the original strand ( Figure S2; MTR event '(7)'). A transposition was assigned to Cyphoderris monstrosa (nad3-trnA-trnR! nad3-trnR-trnA; see Figure S2; MTR event '(11)').
All other MTR events were inversions that appear exclusively in single or in unrelated species, based on the mitogenome-based phylogeny (see Figure 3). In Ensifera, these were Trigonidium sjostedti       Table S2 for terminology. Gene names are according to IUPAC IUB (Cornishbowden, 1984).
As 11 species did not fulfil the requirements of TreeREx in terms of a consistent gene set, they were excluded from the analysis.
However, they are shown in Figure S2 and evaluated for complementation. The noncoding CR was excluded from the gene order inference for all taxa, but the differences are nevertheless observable: Polionemobius taprobanensis was predicted to have, in addition to the initial CR, another CR between trnK and trnD ( Figure S2; event '(a)'). In

Gene order inference
This study represents the first comprehensive attempt to reconstruct mitochondrial gene order evolution for the order of Orthoptera based on all published mitogenomes (up to the end of data collection in July 2021). We aimed to test if MTR events occur at higher rates in tRNA genes than in protein-coding genes and if MTRs represent potential alternative markers to elucidate deep branches of Orthoptera. While we found support for the first hypothesis, we had to reject the second one, as our results showed a low overall occurrence in deep branches, but not in shallow branches. In the following, we discuss these results in more detail.

MTR bias towards tRNAs
Generally, most of our MTRs were detected for tRNAs, rather than for protein-coding genes, a pattern commonly observed. A study of hymenopteran gene order (Dowton et al., 2009) suggested selective constraints for the position of protein-coding genes because of size differences. According to this study, MTRs are more likely lethal or selectively disadvantageous, for example, through incomplete replication, if they concern larger genes. As the function of the transcription machinery can be affected, MTRs can lead to a loss of function of the genes. In addition, the movement of protein-coding genes between strands can severely be restricted by strand compositional skews (Foster et al., 1997;Min & Hickey, 2007). Thus, MTRs including only tRNA genes are generally expected to be more frequent. Our results support this, because tRNA gene inversions are most abundant in the context of our dataset. Only in three species, protein-coding genes were involved in MTR events: the species Holochlora fruhstorferi, Sinochlora longifissa, and Sinochlora szechwanensis shared the transposition trnI-trnQ-trnM-nad2 ! trnI-trnM-nad2-trnQ, which involves the movement of the nad2 gene (Figure 2; MTR event '(8)'). All other MTRs are associated only with the movement of tRNA and rRNA genes, which is consistent with our hypothesis.

MTRs as potential phylogenetic marker in Orthoptera
Mitochondrial gene order has been used successfully as phylogenetic marker in other insect lineages (Cameron, 2014;Cao et al., 2012;Timmermans & Vogler, 2012). Hence, we determined possible synapomorphies in MTRs for deep branches in Orthoptera to assess the phylogenetic signal. Because MTRs are rather infrequent events in many lineages, the saturation effect and the evolutionary rate are expected to be low, which makes gene order a valuable phylogenetic marker for old branches; accordingly, homoplasy levels will be low as well (Rokas & Holland, 2000). Regarding the major clades, we found can be seen as synapomorphy for Acrididea.
Furthermore, for the Gryllidae, Phalangopsidae, and Trigonidiidae, we also found an inversion of the trnN-trnS1-trnE block (Figure 2; MTR event '(2)'). As it was exclusively found in these groups, this MTR is considered synapomorphic. Likewise, Song et al. (2015) found similar gene orders in the three families, but based on a smaller sample size. Consequently, a larger taxonomic sampling is needed to assess the phylogenetic signal for this feature. Two other studies concluded that the inversion of trnN-trnS1-trnE block appeared in the common ancestor of Gryllidae, Phalangopsidae, and Trigonidiidae, consistent with their phylogenetic reconstructions (Ma & Li, 2018;Sanno et al., 2021). Yet, there is no formal higher taxon unifying the clade "((Phalangopsidae + Gryllidae) + Trigonidiidae)" at the moment.
Furthermore, Holochlora fruhstorferi and Sinochlora spp. share the transposition trnI-trnQ-trnM-nad2 ! trnI-trnM-nad2-trnQ (Figure 2; MTR event '(8)'). This leads to the hypothesis that this MTR is synapomorphic for Holochlorini based on the dataset. Support for a possible synapomorphy is given by a mitogenome analysis of Liu et al. (2013), in which the gene orders of S. longifissa and S. retrolateralis were compared with other Orthopterans. Both species shared the MTR event trnI-trnQ-trnM-nad2 ! trnI-trnM-nad2-trnQ and a novel position of the CR. We found the same MTR in S. szechwanensis and in H. fruhstorferi, but the novel control region was absent in H. fruhstorferi (Figure 2). An association between the shift of the CR and the MTR event, as Liu et al. (2013) assumed, is thus not clear.
A mitogenomic analysis reported a similar MTR event in Lepidoptera and Hymenoptera (Wu et al., 2014), which leads to the hypothesis that the MTR "trnI-trnQ-trnM ! trnM-trnI-trnQ" is homoplastic when considering insects as a whole. Yang et al. (2016) suggested that the MTR is based on a TDRL event. In contrast, the transposition trnV-rrnS ! rrnS-trnV, reconstructed for Trigonidiidae, but revealed as well for Ornebius bimaculatus, as an unrelated species, implies homoplasy ( Figure 2; MTR event '(3)'). This is also supported by the results of Sanno et al. (2021) and Ma and Li (2018), who found the transposition trnV-rrnS ! rrnS-trnV occurring two times independently in Grylloidea. Based on our data we suggest that the evolution of mitochondrial gene order does not follow a linear pattern. This is deviant from previous assumptions, that a high rate of nucleotide substitutions is correlated with a high rate of gene rearrangements in insects (Shao et al., 2003). More research is needed to understand the evolution of the mitogenome in Orthoptera because of its characteristics seem to differ strongly from other insects. . Based on our phylogeny, these species are unrelated. Hence, we suggest that this MTR has evolved convergently in these species. A similar pattern was observed by a study that analysed the mitogenomes of Hymenoptera, in which an MTR occurred independently in at least five different families (Cameron, 2014).
Convergence of MTR events is not unusual in insects, even at larger taxonomic scales, as previously already reported by the similar gene orders in hymenopterans and Lepidoptera (Babbucci et al., 2014). That convergence of different traits appears to be common in Orthoptera has previously already been suggested, for example, for morphological traits (Husemann et al., 2012), or for chromosome formation (e.g. Palacios-Gimenez et al., 2018) and numbers . At least for chromosome numbers, a similar pattern, as found here for mitochondrial genome rearrangement has been found: variation occurred largely at terminal nodes rather than as synapomorphies of deeper branches . In general, much remains to be learned about the evolution of gene order and similar traits in Orthoptera. However, we can conclude from our analyses, that mitochondrial gene order has only limited power as a phylogenetic marker in Orthoptera.
Our results generally have to be considered with caution as it remains probable that not all mitogenomes available in the databases are correctly assembled and such errors may lead to false conclusions.
Yet, analysing potential errors is beyond the scope of our study.
However, it is possible that incorrectly assembled mitogenomes would rather lead to additional artificial variation and mitogenomic gene order may be even more conserved than observed in our analyses. More large-scale sequencing efforts will likely shed more light on this in the future.

Comparison of phylogenetic patterns with previous analyses
We here used published mitogenomic data to reconstruct a phylogeny of Orthoptera, similar to previous approaches (e.g., Song et al., 2015;Song et al., 2018). The phylogenetic reconstruction by BI and ML ( Figure 1 and Figure S1) enabled a subsequent analysis of MTR events based on the information of the gene orders in terminal nodes. As the validity of our results rely on a robust and accurate phylogeny, we discuss our present phylogenetic tree in the light of the most recent and comprehensive phylogeny of Orthoptera (Song et al., 2015;Song et al., 2020). Overall, our results are largely consistent with this phylogeny, which was based on transcriptomic and whole mitogenome data. This supports the idea that mitochondrial genomes are expected to recover intraordinal phylogenetic relationships concordant with other sources of data with high nodal support (Fenn et al., 2008).
However, some differences were observed, which may be expected considering differences in the species composition of the datasets and the different marker systems used (mitogenomes vs. transcriptomes).
Overall, the placement of taxa within Tettigonidae is similar in both phylogenies, but no support is given for the monophyly of some subfamilies, for example, Tettigoniinae and Conocephalinae (Song et al., 2020). All subfamilies in Ensifera, except Pseudophyllinae and Mecopodinae, are well-supported in our tree (pp >0.95; Figure 1).
In The same is true for Oxya, Euchorthippus, Chorthippus, Euthystira, and Pseudochorthippus, which were not included in the study by Song et al. (2020). Moreover, Gonista bicolor is considered as sister taxon to Phlaeoba spp. in our analysis but forms a sister taxon to Ceracris spp. in Song et al. (2020).
In contrast to the analyses by Song et al. (2020), where the tribe Podismini is sister group to the rest of Acrididae, in our study a cluster consisting of Calliptaminae, Shirakiacris, Choroedocus, Cyrtacanthacridini, and Catantopini forms the sister group to the remaining Acrididae. Next, a clade including the genera Oxya, Caryanda, Spasthosternum, Pseudoxya, and Hieroglyphus splits off after Podismini according to Song et al. (2020), but in our tree, these form the sister clade to Podismini, suggesting a closer relationship ( Figure 1). Acrididae is considered monophyletic by Song et al.
(2020), but paraphyletic when using four nuclear loci and mitogenomic data combined, as Romaleidae clustered with Acrididae (Song et al., 2015). In our analyses, Dericorythidae clustered within Acrididae suggesting paraphyly as well (Figure 1 and Figure S1). As we only included a single species of Romaleidae, the monophyly of this family is yet to be resolved. The remaining Acrididae are not discussed in detail here, as the group is in strong need of revision and the systematics remain unclear. However, as both our BI and ML trees have overall high nodal support and are largely consistent in the topology with the so far largest phylogenetic study by Song et al. (2020), the discussed inconsistencies do not diminish the validity of our findings on MTR evolution in Orthoptera.

MATERIALS AND METHODS
Following the current taxonomy outlined in the Orthoptera Species File (Cigliano et al., 2022), we obtained 280 mitogenomes from 256 species (including three outgroups) for the analyses (Table S1).
Out of these, we downloaded 239 mitogenomes of Caelifera and  Altschul et al., 1990). As this species has been labelled incorrectly by the provider, it should be tagged in the database. Musca domestica was subsequently used as an additional more distant outgroup. As part of the data collection, we provide a documentation of the gene order as deposited in GenBank (accessed between May and July, 2021). We included supplementary information, such as the total genome length, reference, Accession ID (if obtained from GenBank), suborder, family, subfamily, and tribe (see Table S1).
Using the IQTree Web Server (Trifinopoulos et al., 2016), we inferred a phylogeny based on ML. We performed the analysis with the best partitioning scheme and otherwise default settings, that is, with 1000 iterations of ultrafast bootstrapping (UFBoot; Minh et al., 2013), a minimum correlation coefficient of 0.99, and Shimodaira-Hasegawalike approximate likelihood ratio test (SH-aLRT) with 1000 replicates (Guindon et al., 2010;Minh et al., 2013)

TreeREx analysis
We used the topology resulting from BI as input for TreeREx (Bernt et al., 2008) in combination with the gene order of each taxon to reconstruct rearrangement scenarios of the mitogenome gene order.
We directly obtained the gene order information from GenBank and saved it in a .txt file (see Appendix S1). TreeREx was then used to reconstruct gene orders of deeper nodes based on the gene order of the terminal nodes of the tree. We applied the TreeREx algorithm on 269 of the 280 mitogenomes in our dataset. Eleven mitogenomes with gene duplications, deletions, and missing data had to be excluded (see Table S3). Their gene orders were replaced by the ancestral insect gene order (Boore, 1999; in the input gene order file. We studied the set of 13 protein-coding genes, 22 tRNAs, and two rRNA units based on the common insect ancestor . Position and number of CRs, origin of replication, occurrence of pseudo-genes, and other non-coding regions were not considered to create a consistent input dataset. We ran TreeREx with the default parameters: strong consistency method (Às), weak consistency method (Àw), and parsimonious weak consistency method (-W).
These methods use algorithms with different stringencies and were applied to make sure that every node of the phylogenetic tree gets assigned a gene order regardless of the confidence level (Bernt et al., 2008). We added alternative scenarios from patterns in breakpoints (Ào) and the default number of inversions/input permutations per binary tree (Àm = 2) to the algorithm. Subsequently, we converted the output .dot-file to a graphic svg file with Graphviz (Gansner & North, 2000) to visualize the combined tree consisting of the phylogenetic information and the MTR events (see Appendix S2 for original TreeREx output data).

Editing and visualization of TreeREx results
Finally, we mapped the MTR events on the complete BI tree and dis-  investigation; writingreview and editing; supervision; project administration; methodology.

SUPPORTING INFORMATION
Additional supporting information can be found online in the Supporting Information section at the end of this article.
Appendix S1. Gene order file (as .txt file) that contains all gene orders downloaded from GenBank and was used for the TreeREx analysis in combination with the BI tree. Note that all species with gene deletions or duplications (Table S3) were replaced by the ancestral insect gene order.
Appendix S2. Original TreeREx output (Bayesian inference tree as . svg file) with confidence levels (C values) indicated in green and yellow boxes. C = 1: consistent; C = 0.5: k-consistent; P = parsimony value, that is, how much better the solution was than a possible alternative. "I", inversion; "T", transposition; "iT", inverse transposition; TDRL, tandem-duplication/random loss   Table S3 for those excluded taxa. Letters in white circles: novel control region position. Collapsed branches are indicated by triangles. Dots on nodes indicate a confidence level of 1 (= consistent). Underlined genes are encoded on the light strand.
An output scheme inferred with CrEx is shown for visualization of multiple MTR events of Lipotactes tripyrga. Monophyletic genera occur as "Genus spp." monophyletic species as "ssp." Note that the MTR scenarios of Euborellia arcanum are not shown despite its multiple modifications, since it was excluded from the TreeREx analysis.
Gene names are according to IUPAC IUB (Cornishbowden, 1984). The legend on the right translates amino acid coding gene names. Table S1. List of taxa used in the study for phylogenetic inference and gene order analyses. Additional taxonomic information according to the Orthoptera species file (Cigliano et al., 2022) and genome length is provided. "GB" = GenBank, "OH" = O. Hawlitschek, "Prepr." = preprint.