A phylogenomic perspective on the relationships of subfamilies in the family Geometridae (Lepidoptera)

Geometrid moths, the second largest radiation of Lepidoptera, have been the target of extensive phylogenetic studies. Those studies have flagged several problems in tree topology that have remained unanswered. We address three of those: (i) deep nodes of Geometridae (subfamilies Sterrhinae + Larentiinae, or Sterrhinae alone as sister to all other subfamilies), (ii) the taxonomic status of subfamily Orthostixinae and (iii) the systematic position of the genus Eumelea (classified in Desmobathrinae: Eumeleini or incertae sedis earlier). We address these questions by using a phylogenomic approach, a novel method on these moths, with up to 1000 protein‐coding genes extracted from whole‐genome shotgun sequencing data. Our datasets include representatives from all geometrid subfamilies and we analyse the data by using three different tree search strategies: partitioned models, GHOST model and multispecies coalescent analysis. Despite the extensive data, we found incongruences in tree topologies. Eumelea did not associate with Desmobathrinae as suggested earlier, but instead, it was recovered in three different phylogenetic positions, either as sister to Oenochrominae, Geometrinae or as sister to Oenochrominae + Geometrinae. Orthostixinae, represented by its type species, falls within Desmobathrinae. We propose the following taxonomic changes: we raise Eumeleini to subfamily rank as Eumeleinae stat. nov. and we treat Orthostixinae as a junior synonym of Desmobathrinae syn. nov.


INTRODUCTION
Robust phylogenetic hypotheses are the backbone of evolutionary hypothesis testing (Rokas & Carroll, 2005). However, the robustness of these hypotheses depends on representative taxon sampling and data (e.g., the number of genes). For highly species-rich groups, it remains unclear how the robustness of the results depends on taxon sampling versus dataset size. In insects, for instance, the standard is to infer hypotheses based on a reasonable number of taxa representing the known variation within the taxon. However, these phylogenetic trees can sometimes be poorly supported, with vagueness in the position of some difficult groups and unresolved deep radiations (Misof et al., 2014;Rota et al., 2022;Zahiri et al., 2013). Some recent studies, however, focus on maximizing gene number rather than taxon sampling (Kawahara & Breinholt, 2014;Peters et al., 2014;Rota et al., 2022), and even complete genomes are analysed. Such approaches are aimed at better understanding the relationships of conflicting groups. The downside of these analyses is that the number of analysed taxa is usually restricted, and only limited phylogenetic questions can be addressed. This is the case with the geometrid moths (Lepidoptera). The Geometridae are a lineage with more than 24,000 described species and are among the most species-rich groups of insects (Rajaei et al., 2022). Geometrids have been the focus of a series of phylogenetic studies for instance Sihvonen et al., 2011), aimed at obtaining reliable hypotheses about the evolutionary history of this diverse group. The most comprehensive phylogenetic tree to date included 1206 taxa , but despite being one of the most extensive lepidopteran phylogenetic trees, it covers only 4% of known species. The ever-increasing taxon sampling (Abraham et al., 2001;Brehm et al., 2019;Murillo-Ramos et al., 2019;Sihvonen et al., 2011;Sihvonen et al., 2020;Wahlberg et al., 2010;Yamamoto & Sota, 2007) has helped to improve our understanding of the phylogenetic relationships within Geometridae, especially at the level of genera and tribes. Despite these advances, the phylogenetic relationships among some Geometridae subfamilies have remained in flux, and different relationships have been inferred in different studies (Abraham et al., 2001;Brehm et al., 2019;Murillo-Ramos et al., 2019;Sihvonen et al., 2011;Sihvonen et al., 2020;Wahlberg et al., 2010;Yamamoto & Sota, 2007). The underlying causes, which have resulted in slightly different phylogenetic relationships, between some studies are unknown. The conflicting variation of tree topologies may be caused mainly by gradually broadening taxon sampling and increased availability of genetic data for phylogenetic analyses. The genetic data used in such studies has increased from 3 genes to 11 genes (Abraham et al., 2001;Murillo-Ramos et al., 2019;Sihvonen et al., 2011).
Some of the phylogenetic studies that have included hundreds of taxa and several genetic markers have inferred a sister relationship between the subfamilies Sterrhinae and Larentiinae (Sihvonen et al., 2011;Yamamoto & Sota, 2007), those being sister to all other Geometridae (Abraham et al., 2001;Sihvonen et al., 2011;Wahlberg et al., 2010;Yamamoto & Sota, 2007). Together these subfamilies represent almost 40% of the total Geometridae species. This scenario was widely accepted until an alternative hypothesis was proposed by the latest molecular study of Murillo-Ramos et al. (2019), based on 11 markers and more than a thousand species. Although poorly supported, in that study Sterrhinae were inferred as sister to all Geometridae. In the mentioned study, a few other new views on the phylogenetic relationships of some subfamilies within the Geometridae were also found ( Figure 1). Beyond the overall subfamily-level topology, there are several unresolved cases in the geometrid phylogeny and classification. These include for instance the validity of the subfamily Orthostixinae, and the phylogenetic position of the enigmatic tribe Eumeleini and its single genus Eumelea Duncan & Westwood. Orthostixinae is a small group of four genera and 17 species, distributed in the Palaearctic and Indo-Pacific (Hausmann, 2001). Orthostixinae shares morphological features with both Ennominae (Holloway, 1996) and Desmobathrinae (Beljaev, 2008;Hausmann, 1996;Murillo-Ramos et al., 2021). In their molecular phylogenetic analysis, Sihvonen et al. (2011) did not assess the systematic position of Orthostixinae sensu stricto, because the subfamily was not represented by the type genus but by a species of Naxa Walker.
Naxa was placed in Ennominae, supporting the morphological view of Holloway (1996), while Beljaev (2008) and the recent morphological revision of geometrid moths by Murillo-Ramos et al. (2021) have suggested that Orthostixinae could be part of Desmobathrinae. So far, the type genus Orthostixis has not been included in molecular phylogenies.
F I G U R E 1 Molecular phylogenetic hypotheses for the relationships of the subfamilies of geometrid moths, with questions addressed in this study highlighted. A: represents the results by Murillo-Ramos et al. (2019) based on 1200 taxa and up to 11 genetic markers per sample. B: represents the hypothesis proposed by Sihvonen et al. (2011), based on 164 taxa and up to eight genetic markers per sample. Epidesmiinae was proposed as a new subfamily by Murillo-Ramos et al. (2019), which was considered as part of Oenochrominae (1) in the Sihvonen et al. (2011) hypothesis (green). The root of Geometridae has either been Sterrhinae, or Sterrhinae + Larentiinae (salmon), and Eumelea has been considered the sister to Geometrinae or Oenochrominae (blue).
The taxonomic history of Eumelea has been even more unstable. This genus was part of Oenochrominae s.l., then it was transferred to Desmobathrinae (Holloway, 1996), where it was classified as valid at tribe rank by Holloway (1996). Molecular studies have suggested that Eumeleini are sister to Oenochrominae (Sihvonen et al., 2011), while morphological and other molecular analyses have suggested instead that this tribe could be part of the Geometrinae (Beljaev, 2008) or Ennominae .
The uncertainty surrounding the phylogenetic positions of Orthostixinae and Eumeleini within Geometridae may relate to the difficulty of obtaining sequence data of sufficiently high resolution. In previous studies, only a few genetic markers on these taxa were successfully sequenced, and the genetic information was not conclusive. However, new protocols for generating whole-genome sequencing data from museum samples and old DNA extracts  have been successfully implemented for other Lepidoptera (Call et al., 2021). These developments have paved the way towards increasing the available genetic information for museum specimens, including taxa that are rare, extinct, or otherwise difficult to collect.
Here, we revisit the phylogenetic relationships of geometrid moths, using for the first time on these insects, whole-genome sequencing data. To this end, we generated 21 new whole-genome shotgun data sets using representative species of all Geometridae subfamilies, thus intending to recover the deepest nodes and provide insights into the relationships among the nine currently recognized lineages of Geometridae (Rajaei et al., 2022). We analysed the genome assemblies against the 5286 single-copy orthologs (BUSCO) of a Lepidoptera gene set (Manni et al., 2021), and subsequently manually curated a gene set of 1000 protein-coding genes.
Through analyses of these new data, combined with our previous data and analyses, we asked: (I) Do the data support the hypothesis of Sterrhinae + Larentiinae (Sihvonen et al., 2011)

Taxon sampling and DNA extracts for genome sequencing
Twenty-one geometrid taxa were chosen for genome sequencing from DNA extracts used in Murillo-Ramos et al. (2019), ensuring at least one taxon per subfamily was represented. In addition, a further 18 species were included from available genomes or assemblies produced by Rota et al. (2022) (See Table S1). The final taxon set represents 39 species, 11 of which belong to outgroup species from other lepidopteran families (Table S1).

Library preparation and sequencing
We prepared libraries for whole-genome sequencing from the extractions used in Murillo-Ramos et al. (2019). Due to the presence of high molecular-weight DNA in these extractions, fragmentation was carried out by sonication to yield fragments of approximately 200-300 bp in size. We used Bioruptor ® with the following settings: (M) medium power output, 30 s ON/90 s OFF pulses up to 45 min in a 4 C water bath, followed by vacuum centrifugation and resuspension in 50 μL elution buffer. The prepared libraries followed a modified version of the protocol for Blunt-End Illumina Library construction in Meyer and Kircher (2010). Full details are given in Twort et al. (2021). Briefly, DNA is blunt-end repaired, followed by column purification with the MinElute purification kit (Qiagen). The next step is adapter ligation, followed by reaction purification and adapter fill-in. The indexing of each sample utilized a unique dual index strategy. To avoid amplification bias, we carried out six independent indexing PCR reactions for each sample and pooled them together before the final magnetic bead cleanup with Agencourt AMPure XP beads. We performed a two-step clean-up process, to remove long DNA fragments we used an initial bead concentration of 0.5X. This was followed by a 1.8X concentration to recover the library, with the expected size range of $300 bp. We quantify and We checked the quality of raw reads with FASTQC v0.11.8 (Andrews, 2010), followed by the removal of reads with ambiguous bases using Prinseq 0.20.4 (Schmieder & Edwards, 2011). Trimmomatic 0.38 (Bolger et al., 2014) was used to remove low-quality bases from the beginning (LEADING:3) and end (TRAILING:3), by evaluating read quality with a sliding window approach (4 bp window with average PHREAD of >25) and removing reads less than 30 bp in size. We carried out De novo genome assembly for each sample with SPAdes v3.13.0 (Bankevich et al., 2012), using a multi-kmer approach, with kmer values of 21, 33 and 55. Final genome assemblies can be accessed at Zenodo: DOI https://doi.org/10.5281/ zenodo. 7,801,402.

Reference gene set
To generate a reference gene set, publicly available reference genomes (see Table S1) were used. We used BUSCO v 4.1.4 (Manni et al., 2021) to identify single-copy orthologs using the Lepidoptera gene set (Lepidoptera_odb10, 5286 genes screened). Genes were aligned with MAFFT v7.310 (Katoh & Standley, 2013), before filtering and manual checking. To reduce the number of alignments to screen, a minimum of 10 sequences per gene was required. The alignments were then manually screened to ensure orthology, remove pseudogenes and remove difficult-to-align genes. This resulted in a final gene set of 1000 genes (see Table S2 for gene descriptions, retrieved from OrthoBD V10).

Orthologue identification and alignments
To identify and extract the 1000 reference genes from our assembled genomes we used the MESPA v7.310 pipeline (Neethiraj et al., 2017).
The extracted sequences were aligned to our existing reference align-

Sanger sequencing of additional Geometridae subfamilies
A total of 183 additional samples (Table S3) Table S3 for Accessions. Additionally, the 11 Sanger-sequenced genes were mined from the de novo Orthostixis cribaria assembly (sample PS307) using a BLAST approach.
Bombyx mori Linnaeus gene sequences (taken from the Murillo-Ramos et al. (2019) dataset) were queried against the O. cribaria assembly using a tBlastn approach (1e-5 cut-off). The output of the BLAST search (coordinates of BLAST hits within the genome) was used to extract the genes from the assembly using a set of Python scripts written by Carlos Peña (https://github.com/carlosp420/PyPhyloGenomics). The final dataset consisted of 1388 taxa with up to eleven markers.

Datasets
We built three datasets as follows: (1) a concatenated dataset of the 1000 genes for 39 taxa to be analysed in a supermatrix approach.

Analyses of the concatenated dataset
We conducted all the phylogenetic analyses under a Maximumlikelihood (ML) approach in IQ-TREE2 V2.0.7 . We implemented four search strategies: (1) a dataset partitioned by genes; (2) a dataset partitioned by codon position across genes; (3) a dataset with nucleotides translated to amino acids and (4) a dataset without partitions.
The best substitution models that fit our data were selected with ModelFinder (Kalyaanamoorthy et al., 2017). The dataset without partitions was analysed under the GHOST model. GHOST is a mixture model, consisting of several classes that accommodate heterotachous evolution among lineages (Crotty et al., 2020). We tested four categories under a GTR model. For all our search strategies, support for nodes were evaluated with 1000 ultrafast bootstrap (UFBoot2) approximations (Hoang et al., 2018), and SH-like approximate likelihood ratio test (Guindon et al., 2010). To reduce the risk of overestimating branch supports in the UFBoot2 test, we implemented the -bnni option. We visualized and edited the trees in FigTree v1.4.3 and the R package ggtree (Yu et al., 2017).

Analyses for individual gene sets
To consider possible conflicting gene histories, we analysed nucleotide sequences for each gene region in IQ-TREE2 with 1000 UFBoot2 approximations. We jointly used the ML and UFBoot2 gene trees for species-tree inference under a multi-species coalescent model as implemented in ASTRAL-III 5.7.1 (Zhang et al., 2018). To consider the uncertainty in gene trees, we quantified nodes support with the set of bootstrap trees for each gene (Ài -b -o options). However, recent versions of ASTRAL are also able to quantify node support without bootstrapped trees but based on the maximum-likelihood gene trees alone. To do so, we performed a second analysis where we assessed node support by posterior probabilities (Ài -o option).

Hypotheses testing and most influential genes
To test if our data provide support for either the Murillo-Ramos et al.
(2019) or the Sihvonen et al. (2011) hypotheses, we quantify the proportion of genes that support each node in our inferred topologies, by using the gene concordance factor (gCF) in IQ-TREE2. In addition, we tested if the inferred trees from our datasets have significant differences in log-likelihoods. To do so, we conducted a tree topology test in IQTREE2 with an approximately unbiased (AU) test (Shimodaira, 2002). If we detected topological differences, then we investigated possible causes of disagreements. Thus, we identified the genes contributing the most phylogenetic signal to those topologies and we calculated the differences among the gene-wise log-likelihood given the trees. Once we identified the most influential genes, we removed them from the dataset and repeated the IQ-TREE2 run with a partition model. We visualized and edited the final trees in FigTree and the R package ggtree (Yu et al., 2017).
To test topological incongruence in internal branches, we implemented a quartet-based sampling (QS) method (Pease et al., 2018).
QS quantifies branch support in phylogenetic trees and assesses the confidence, consistency and informativeness of internal branches.
Analyses were carried out using concatenated alignment. We compared the topologies recovered from analyses partitioned by codon, the GHOST model and our Sanger dataset including 1388 taxa. We ran 10,000 replicates at each node and used IQTREE2 to evaluate tree model likelihoods.
The phylogenetic position of Orthostixinae and Eumelea

Shotgun genome data
Sequencing of the 21 genomes undertaken in this study resulted in 702 million reads, with an average of 35 million reads per sample ( Sterrhinae as sister to Larentiinae or to all geometrids?
Based on a dataset of up to 1000 genetic markers per sample, we

Hypotheses testing and most influential genes
Based on the genes concordant factor, 17% of genes are found to support the sister relationship of Sterrhinae + Larentiinae ( Figure S2a), while 13% of genes support Sterrhinae as sister to all geometrids ( Figure S2b). The tree topology test rejected all the hypotheses except for the best tree recovered with data partitioned (Table S4). We investigated possible causes of topological differences and identified the genes contributing the most phylogenetic signal in the whole dataset that could potentially influence the results. According to the gene-wise log-likelihood test, 78 genes provided the strongest phylogenetic signal in the data (Supplementary File S1), therefore, we removed them from the dataset and reran IQTREE.
However, after the removal experiment, there were no changes in topology.
While the topology recovered with our data partitioned by genes, In all data analyses, the phylogenetic position of Orthostixis is maintained with moderate support (0.4, except in 1388 taxa data with weak QC = À0.01). The branch leading to Eumelea was not conclusive and the results are discordant. QC values from partitioned data and Sanger data were low (QC = À0.26, and À0.15), while the data analysed with the GHOST model was moderate (QC = 0.27).

The phylogenetic position of Orthostixinae and Eumelea
All subfamilies within Geometridae were monophyletic except for Desmobathrinae. All our tree search strategies, including the re- and latitudes, being adapted to cooler and humid conditions (Brehm & Fiedler, 2003;Hausmann, 2004;Holloway, 1986Holloway, , 1997.
For these reasons, we discuss the tree incongruence in detail in the following paragraphs.
Incongruence in tree topology is a major challenge for phylogenomic studies, and there remains little consensus on the best way to treat such differences, especially, if the different hypotheses are highly supported (Kapli et al., 2020;Philippe et al., 2011). In our case, one intuitive interpretation of the incongruences in our topologies could be explained by the effect of our tree search strategies, the nature of our data and limited taxon sampling.
F I G U R E 2 Phylogenetic hypothesis of the Geometridae moths based on phylogenomic data of thousand genetic markers. A: phylogenetic tree from the concatenated dataset, with partitions by codon across genes (the same topology was recovered with partitions by gene and amino acids One of our tree search strategies was data partitions. By far, partitioned analyses remain the most widely used method for analysing multilocus alignments (Kainer & Lanfear, 2015). However, in partitioned analyses, the data are always constrained by using a priori knowledge (Lanfear et al., 2012). This way of analysing the data is prone to introduce some bias and errors during tree estimation, especially if the partitions include insufficient data information.
The tree incongruences could also be systematic errors, which are the result of misspecification or violation of the model assumptions (Reddy et al., 2017), in which we over/under-estimate support. Our data partitioned by gene, codon or amino acids strongly support Sterrhinae + Larentiinae as sister groups, moreover, the tree topology test suggested this hypothesis as the best explained by our data. However, we interpret these results cautiously. A common observation in this, we believed that UFBoot2 and SH-like values may be overestimated in our data analyses. We have not tested the complexity of our dataset, but we acknowledge that our dataset is largely-sparse and has a big percentage of missing data. Given the partition strategies, we do not discard model misspecification, but more analysis needs to be done to confirm or reject this assumption.
The GHOST model, which we implemented also, is an alternative way to analyse phylogenomic data, which minimizes constraints and offers interesting biological insights when applied to the empirical data (Crotty et al., 2020). The topology inferred under a mixture model was not strongly supported (Figure 2b), but it was congruent with our species tree inferred from ASTRAL (Supplementary S1). Contrary to concatenation methods, summary trees have been proven to be more efficient and robust concerning poor taxon sampling and the long-branch attraction effect (Edwards et al., 2016;Liu et al., 2015).
Beyond the posterior probabilities and bootstraps from our species tree inference, we also used gCF as a complementary approach to analyse the phylogenetic position of Sterrhinae, but the percentage of genes supporting Sterrhinae as sister to all geometrids was also low (2019) study is the most complete phylogenetic study regarding taxon choices, and we believe the addition of more taxa compared to the Sihvonen et al. (2011) study helped to break up this grouping on their phylogenetic tree. It seems to us that increased taxon sampling will be a way to improve phylogenetic accuracy, helping to overcome the bias.
If the observed differences in the estimated topologies reflect violations of models, those may be difficult to detect, and further analyses would be required to discard one of the resulting hypotheses.
Our studies demonstrate that even with the addition of significantly more genetic data, deep nodes of geometrid lineages are still not unambiguously understood.

The phylogenetic position of Orthostixinae and Eumelea
All our analyses were congruent in grouping Orthostixinae within Desmobathrinae and in recovering Eumelea as a separate lineage.

Some earlier studies highlighted morphological similarities between
Orthostixinae and Desmobathrinae and questioned the validity of Orthostixinae as a subfamily (Beljaev, 2008;Hausmann, 1996Hausmann, , 2001. Previous molecular studies tried to infer the phylogenetic position of this subfamily Sihvonen et al., 2011); however, the lack of fresh material in those studies prevented the sequencing of an adequate number of genetic markers that could have conclusively resolved the systematic position of Orthostixinae within Geometridae.
Taking advantage of protocols developed to deal with old DNA extracts and museum samples , in this study we successfully extracted DNA sequences from the genome of Orthostixis cribraria, the type species of the subfamily and our results support Orthostixis as nested within Desmobathrinae. Despite the number of genes recovered in Orthostixis and Ozola being fewer than in the rest of the analysed taxa (Table 1) (Figures 4, 8), tapering ansa on tympanal organ (Figures 4, 8), tegumen with weakly developed lateral loops (Figures 3, 7), presence of  (Figure 3), although superficially similar to Ennominae, is thus considered a non-synapomorphic character. The current evidence, therefore, supports the Orthostixis association with Desmobathrinae, but the systematic position of Naxa still needs more research (see details in Holloway, 1996, Sihvonen et al., 2011. Thus, based on molecular and morphological data, we formally downgrade Orthostixinae Meyrick (e.g., Hausmann, 2001) to be a junior synonym of Desmobathrinae Meyrick syn. nov.
The phylogenetic position of Eumelea has also been controversial.
It has been recovered as sister to Oenochrominae (Sihvonen et al., 2011) or as sister to Geometrinae (Murillo-Ramos et al., 2021) in molecular studies, and our results placed this genus as sister to Oenochrominae, Geometrinae or as sister to Oenochrominae + Geo-

Concluding remarks
In summary, our results provide several steps forward in understanding better-focused parts in the phylogeny of Geometridae. We high-

SUPPORTING INFORMATION
Additional supporting information can be found online in the Supporting Information section at the end of this article.