Phylogenomic analysis of protein‐coding genes resolves complex gall wasp relationships

Gall wasps (Hymenoptera: Cynipidae) comprise 13 distinct tribes whose interrelationships remain incompletely understood. Recent analyses of ultra‐conserved elements (UCEs) represent the first attempt at resolving these relationships using phylogenomics. Here, we present the first analysis based on protein‐coding sequences from genome and transcriptome assemblies. Unlike UCEs, these data allow more sophisticated substitution models, which can potentially resolve issues with long‐branch attraction. We include data for 37 cynipoid species, including two tribes missing in the UCE analysis: Aylacini (s. str.) and Qwaqwaiini. Our results confirm the UCE result that Cynipidae are not monophyletic. Specifically, the Paraulacini and Diplolepidini + Pediaspidini fall outside a core clade (Cynipidae s. str.), which is more closely related to the insect‐parasitic Figitidae, and this result is robust to the exclusion of long‐branch taxa that could mislead the analysis. Given this, we here divide the Cynipidae into three families: the Paraulacidae stat. prom., Diplolepididae stat. prom. and Cynipidae (s. str.). Our results suggest that the Eschatocerini are the sister group of the remaining Cynipidae (s. str.). Within the Cynipidae (s. str.), the Aylacini (s. str.) are more closely related to oak gall wasps (Cynipini) and some of their inquilines (Ceroptresini) than to other herb gallers (Aulacideini and Phanacidini), and the Qwaqwaiini likely form a clade together with Synergini (s. str.) and Rhoophilini. Several alternative scenarios for the evolution of cynipid life histories are compatible with the relationships suggested by our analysis, but all are complex and require multiple shifts among parasitoids, inquilines and gall inducers.


INTRODUCTION
Gall wasps (Hymenoptera: Cynipidae) induce the development of highly modified plant tissues, termed galls, in which their immature stages develop (Melika & Abrahamson, 2002; Stone et al., 2002). The cynipid larva is enclosed inside a gall chamber lined with specialised nutritive cells formed by the plant in response to signals released by the gall wasp egg and larva (Harper et al., 2004; Hearn et al., 2019; Stone & Schönrogge, 2003). While all cynipids appear to induce the development of such nutritive tissues, several lineages, termed inquilines, can only induce nutritive tissue development within galls initiated by other species (Sanver & Hawkins, 2000). The inquilines can thus be seen as cynipids that induce a 'gall within a gall'. The presence of inquilines can negatively affect the fitness of the primary gall inducer, in many cases killing it (László & Tóthmérész, 2006).
Female Periclistus Förster inquilines have been reported to kill the larva of the inducing Diplolepis gall wasp by stabbing it with their ovipositor, potentially injecting harmful substances in the process (Shorthouse, 1973).
Several hypotheses on the mechanism of cynipid gall induction have been advanced, partly inspired by knowledge of other gall-inducing organisms: secretion of auxins (Tooker & Helms, 2014), injection of virus-like particles (Cambier et al., 2019; Cornell, 1983), manipulation of plant nodulation factors (signalling molecules inducing nodules in legume roots; Hearn et al., 2019) or involvement of bacterial or fungal symbionts (Hearn et al., 2019). However, in contrast to some other gall induction systems (Harris & Pitzschke, 2020), there is no conclusive evidence for any of these hypotheses in cynipids, even though our understanding of the changes in gene expression patterns in early gall development is quickly increasing (Cambier et al., 2019; Hearn et al., 2019; Martinson et al., 2022).
Our understanding of the evolutionary origin of cynipid gall inducers and inquilines is equally poor. It has generally been assumed that the phytophagous forms constitute a monophyletic lineage, the family Cynipidae, although it has been surprisingly difficult to find morphological characters supporting their monophyly (Liljeblad & Ronquist, 1998; Ronquist, 1999; Ronquist et al., 2015). The Cynipidae are deeply nested within the insect-parasitic Apocrita (Blaimer et al., 2023; Heraty et al., 2011; Klopfstein et al., 2013; Peters et al., 2017; Ronquist, 1995, 1999; Sharkey et al., 2011), and all other members of the superfamily Cynipoidea are insect parasitoids, so it has long been clear that the phytophagous gall inducers and inquilines must have evolved from insect-parasitic ancestors. Besides the Cynipidae, the superfamily Cynipoidea comprises the families Austrocynipidae, Ibaliidae, Liopteridae and Figitidae (Ronquist, 1995, 1999). The life history of several species of ibaliids and figitids is well-studied (Ronquist, 1999, and references cited therein). They are all koinobiont endoparasitoids in the early larval instars. Towards the end of their development, they emerge and consume the remains of the moribund host as ectoparasitoids. The most diverse lineage is the Figitidae, which has appeared as the sister group of the Cynipidae in most previous analyses (Buffington et al., 2007, 2012; Ronquist, 1995, 1999; Ronquist et al., 2015; but see Blaimer et al., 2020).
The Diplolepidini induce galls on roses (Rosa L.) and include the well-known bedeguar gall wasp, Diplolepis rosae (L.). The Pediaspidini and Eschatocerini are two small tribes, originally including a single genus each: Pediaspis Tischbein, a European genus inducing galls on maples (Acer L.), and Eschatocerus Mayr, a South American genus associated with galls on Vachellia Wight & Arn. (previously Acacia; Maslin et al., 2003) and other woody Fabaceae. The inquilines are grouped in this system into the Synergini, and the remaining gall inducers, mostly associated with herbaceous host plants, in the Aylacini.
Subsequent analyses of molecular data and combined molecular, morphological and life history data (Nylander et al., 2004; Ronquist et al., 2015) have revealed that the Synergini (s. lat.) are not monophyletic either. Specifically, Periclistus (inquilines in Diplolepis Geoffroy galls on Rosa) and Synophromorpha Ashmead (inquilines in Diastrophus Hartig galls on Rubus L.) are closely related to the Aylacini (s. lat.) genera Diastrophus and Xestophanes Förster, forming a lineage (now recognised as the Diastrophini) of gall inducers and inquilines associated with herbaceous and woody hosts in the Rosaceae. The remaining Aylacini (s. lat.) fall into three distinct lineages: (1) the Aylacini (s. str.), gall inducers associated with poppies (Papaver L.); (2) the Aulacideini, gall inducers mostly associated with Asteraceae and Lamiaceae, but also with a few other families, including Papaveraceae; and (3) the Phanacidini, gall inducers mainly associated with Asteraceae and Lamiaceae, and often inducing stem galls. The remaining inquilines fall into two distinct monophyletic lineages: (1) the Ceroptresini, including the single genus Ceroptres Hartig associated with Cynipini oak galls, and (2) a lineage comprising the remaining inquilines of Cynipini oak galls and Rhoophilus Mayr, a South African inquiline in lepidopteran galls on species of Searsia F. A. Barkley (Anacardiaceae) (van Noort et al., 2007). Several analyses have supported a sister-group relationship between Rhoophilus and the remaining members of the clade (Ide et al., 2018; Liljeblad & Ronquist, 1998; Ronquist et al., 2015), and recently, it was proposed to recognise separate tribes for Rhoophilus, the Rhoophilini (Lobato-Vila et al., 2022), and the remaining members, the Synergini (s. str.) (Table 1). Recent work has also shown that the latter, previously considered to consist entirely of inquilines, includes at least one deeply nested lineage comprising Synergus itoensis Abe, Ide & Wachi and related species that induce galls de novo inside acorns (Abe et al., 2011; Gobbo et al., 2020; Ide et al., 2018).
In recent years, two additional lineages associated with woody host plants have been added: (1) the tribe Qwaqwaiini, based on a newly discovered gall inducer on Scolopia Schreb. (Salicaceae) in South Africa (Liljeblad et al., 2011), and (2) the Paraulacini, a lineage of temperate South American cynipids associated with galls on southern beeches (Nothofagus Blume; Nothofagaceae) (Nieves-Aldrey et al., 2009). Somewhat surprisingly, a member of the Paraulacini was recently shown to be a parasitoid (Rasplus et al., 2022; see also Discussion below).
In summary, the current classification of the Cynipidae comprises 13 tribes (Table 1). The most recent 'old-school' analysis of these relationships based on combined data from five molecular markers, morphology and life history data (Ronquist et al., 2015) failed to resolve relationships among these tribes, with three notable exceptions (Figure 1a): (1) Diplolepidini and Pediaspidini, both gallers of woody-rosid host plants, are sister groups; (2) the two major lineages of herb gallers, the Aulacideini and Phanacidini, are sister groups; and (3) the Rhoophilini and Synergini (s. str.) are sister groups, as mentioned above. Many of the tribes appear to represent isolated lineages, with no close relatives among the other tribes (Nylander et al., 2004; Ronquist et al., 2015).
Recently, Blaimer et al. (2020) used phylogenomic analysis of ultra-conserved element (UCE) DNA sequence data (Faircloth et al., 2012) to infer relationships among a phylogenetically diverse
TABLE 1 Life history of the 13 tribes of Cynipidae recognised currently (Lobato-Vila et al., 2022; Ronquist et al., 2015).
relationship first hinted at in Hymenoptera-wide analyses (Peters et al., 2017; see also Blaimer et al., 2023) and indicating that herbivorous cynipids do not form a monophyletic clade. Finally, the analysis suggested that the Eschatocerini may be the sister group of the Figitidae, although the evidence for this was weak and alternative placements appeared under some analysis settings. Although these results seemingly conflict substantially with those of Ronquist et al. (2015), all robustly resolved relationships among major lineages are actually consistent between the studies, except for the different rooting of the cynipoid tree and a minor difference in the position of Euceroptresinae (Figure 1). The major advances in the UCE analysis are the drastically different rooting and the increased resolution of relationships among major lineages of figitids and cynipids.
The analysis we present here is the first phylogenomic analysis based on genome and transcriptome assemblies, and it allows a largely independent test of the results from the UCE analysis. In contrast to the UCE analysis, our taxon sampling is focused on cynipids. It lacks ibaliids and liopterids, and is relatively sparse with respect to figitids. However, it includes representatives of all cynipid tribes except the recently recognised Rhoophilini. Importantly, it includes the Qwaqwaiini and Aylacini (s. str.), both of which were missing from the UCE analysis. Blaimer et al. (2020) used Aylax salviae Giraud as their only representative of the tribe Aylacini (s. str.). However, this species has long been placed in the genus Neaylax Nieves-Aldrey (Nieves-Aldrey, 1994, 2001), which does not belong to the Aylacini (s. str.) but instead is deeply nested inside the Aulacideini (Ronquist et al., 2015). Specifically, Neaylax salviae belongs to a clade of Aulacideini gallers of Lamiaceae related to the genus Antistrophus Walsh (Ronquist et al., 2015), and this is entirely consistent with the placement of 'Aylax' salviae in the UCE analysis (Blaimer et al., 2020). Another key taxon represented in our analysis but not in the UCE analysis is the single species in the recently described genus Protobalandricus Nicholls, Stone & Melika, P. spectabilis (Kinsey), which represents a divergent sister group to all other sampled Cynipini (Nicholls et al., 2018).
FIGURE 1 Recent hypotheses of cynipid relationships. (a) Combined analysis of data from five molecular markers, morphology and life history (Ronquist et al., 2015: fig. 2). The tree was rooted on Ibaliidae; the analysis lacked more distant outgroups. (b) Phylogenomic analysis of 1147 UCE loci (Blaimer et al., 2020: fig. 1). The tree was rooted using multiple outgroups (indicated by the stalk). Some figitid lineages lacking in Ronquist et al. (2015) removed to facilitate comparison. In both cases, we only show clades with at least 95% support (posterior probability or bootstrap support). Figitid lineages are shown in smaller font.
TABLE 2 Overview of the species included in this study.
Importantly, by focusing on data from protein-coding genes, we can use sophisticated substitution models that accommodate variation in amino acid profiles across sites. These models are known to resolve some issues with long-branch attraction that can affect analyses under standard models, such as those used in the UCE analysis (Kapli & Telford, 2020). The most surprising UCE results do involve the placement of long, isolated lineages (the Paraulacini, Eschatocerini and Diplolepidini + Pediaspidini), and the tree is rooted with distant outgroups. It is therefore possible that long-branch attraction may be an issue. Based on the results of our analysis, which largely confirm and complement the UCE results, we propose a new family-level classification of the Cynipidae. We also discuss the implications of our findings for inference of the evolutionary origin of cynipoid gall inducers and inquilines.

Taxon sampling
Thirty-seven species were chosen to represent all of the currently recognised tribes of cynipid gall wasps except Rhoophilini, and to include as much of the phylogenetic diversity within each lineage as possible (Table 2). New genomic data were generated for 21 of the 37 species (  1). The selection of exemplars for this study was completed before the appearance of the recent UCE study (Blaimer et al., 2020), but it does cover all major cynipid lineages detected in that analysis.

De novo genome assemblies
Two protocols were followed (Table S1). For the Andricus curvator Hartig and A. quercusramuli (L.) assemblies, DNA was extracted from single adults using the Thermo Scientific KingFisher Cell and Tissue DNA Kit and the KingFisher Duo magnetic particle processor. Genomes were sequenced by the Swedish National Genomic Infrastructure from ChromiumX libraries (Zheng et al., 2016)  Fastqc (version 0.11.9) (Andrews, 2010). Most genome assemblies were constructed using SPAdes (version 3.14.0) (Bankevich et al., 2012), with most species run in isolate mode with coverage cut-off estimated automatically and default k-mers. Exceptions to this were Cecinothofagus ibarrai Nieves-Aldrey & Liljeblad, Callaspidia notata (Fonscolombe), Periclistus spJH-2016 and Phaenoglyphis villosa (Hartig), which were assembled without a coverage cut-off, and Fumariphilus hypecoi and Eschatocerus acaciae Mayr, which were both assembled with an additional k-mer of 99. Data for several species were first published in Hearn et al. (2019), but were reassembled as described here for consistency (Table S1, assembly origin column). Synergus species genomes and the Biorhiza pallida and Ganaspis species 1 transcriptomes were not re-assembled here. Quality statistics for all genomes and transcriptomes are given in Table S1.

Gene finding
To find conserved genes suitable for phylogenetic analysis, we predicted Hymenoptera and Eukaryota BUSCOs for each genome and transcriptome using BUSCO version 4.0.6 and OrthoDB version 10 (Simão et al., 2015). The Hymenoptera dataset consisted of 5991 BUSCO groups predicted from 40 species (Table S1).
Lineage-specific BUSCO datasets are composed of genes present almost universally as single-copy genes, although duplications within test datasets can occur (Simão et al., 2015).
Only sequences classified as complete single-copy BUSCOs were used in our analysis. A predicted BUSCO is defined as complete if its length is within two standard deviations of that BUSCO group's mean length, that is, within 95% of its expected length (Simão et al., 2015; Waterhouse et al., 2018). BUSCOs were divided into categories based on the number of species in which the gene was retrieved in a complete, single copy. In total, we found 5890 complete single-copy BUSCOs in at least one of the 37 genomes/transcriptomes (Table 3). Our phylogenetic analyses focused on the 542 genes that were present in 34 or more of the 37 taxa (representing a total of 1.24 Mb of nucleotide sequence data after alignment) and subsets of this dataset. We also ran one analysis using a quality-filtered subset of the 1898 genes present in 30 or more taxa. The completeness of each genome/transcriptome assembly is given in Table S1.
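The occupancy-based selection described above can be illustrated with a minimal Python sketch. It assumes that the BUSCO results have already been parsed into a mapping from gene identifier to the set of taxa in which that gene was recovered as a complete single-copy BUSCO; the function name, the gene names and the taxon names are hypothetical and are not taken from the scripts used in this study.

```python
from typing import Dict, List, Set


def genes_by_occupancy(presence: Dict[str, Set[str]], min_taxa: int) -> List[str]:
    """Return the sorted gene IDs recovered as complete single-copy
    BUSCOs in at least `min_taxa` taxa."""
    return sorted(gene for gene, taxa in presence.items() if len(taxa) >= min_taxa)


# Toy input (hypothetical gene and taxon names): gene -> taxa with a
# complete single-copy copy of that gene.
presence = {
    "geneA": {"sp1", "sp2", "sp3"},
    "geneB": {"sp1", "sp2"},
    "geneC": {"sp3"},
}

selected = genes_by_occupancy(presence, min_taxa=2)
print(selected)
```

With real data, a call with `min_taxa=34` on a 37-taxon mapping would correspond to the threshold used for the main dataset, and `min_taxa=30` to the more relaxed threshold.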

Alignment and quality scoring
Sequences were aligned using ClustalOmega version 1.2.4 (Sievers et al., 2011), and the alignments were filtered using Gblocks version 0.91b (Talavera & Castresana, 2007), with default parameters except for gap treatment, which was set to 'all' to retain more phylogenetic information (Kück et al., 2010). For the purpose of phylogenetic reconstruction based on multiple genes, custom scripts were used to concatenate the desired alignments.
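The custom concatenation scripts are not reproduced here. As a rough sketch of what such a step involves, the following Python function joins per-gene alignments into a supermatrix and pads taxa that are missing from a gene with gap characters. The data layout (one dictionary per gene, mapping taxon to aligned sequence), the function name and the toy sequences are assumptions made for illustration only.

```python
from typing import Dict, List


def concatenate(alignments: List[Dict[str, str]], taxa: List[str],
                gap: str = "-") -> Dict[str, str]:
    """Concatenate per-gene alignments into one supermatrix.

    Each alignment maps taxon -> aligned sequence. A taxon missing
    from a gene is padded with gap characters for that gene.
    """
    parts: Dict[str, List[str]] = {t: [] for t in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in an alignment must have equal length")
        n_cols = lengths.pop()
        for t in taxa:
            parts[t].append(aln.get(t, gap * n_cols))
    return {t: "".join(chunks) for t, chunks in parts.items()}


# Toy example with hypothetical taxa and sequences.
gene1 = {"sp1": "MKV", "sp2": "MRV"}
gene2 = {"sp1": "LL", "sp3": "LI"}
supermatrix = concatenate([gene1, gene2], taxa=["sp1", "sp2", "sp3"])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

Padding with gaps keeps every row the same length, which is what allows genes present in only 34 to 36 of the 37 taxa to be combined in a single matrix.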
The putative quality of alignments was scored using the fraction of the total alignment length retained after Gblocks filtering. As alternative quality filtering and scoring options, we used HmmCleaner version 0.180750 (Di Franco et al., 2019) and OD-Seq version 1.22.0 (Jehl et al., 2015), in the former case with and without previous Gblocks filtering, and in the latter after previous Gblocks filtering.

Phylogenetic analysis
Phylogenetic analysis was performed using IQ-Tree version 1.6.12 (Nguyen et al., 2015) and PhyloBayes version 1.8 (Lartillot & Philippe, 2004) using models that accommodate site-specific amino acid profiles. Specific settings for each program are given below.

IQ-Tree
We used IQ-Tree for maximum-likelihood analyses based on the C60 + I + G5 model. The C60 option specifies a fast approximation of an amino acid profile mixture model with 60 profile categories estimated from reference data (Wang et al., 2018). We modelled rate variation across sites using a mixture of invariable sites and a discrete approximation of a gamma distribution with five categories (i.e., I + G5). Support values were estimated using the ultrafast bootstrap (Hoang et al., 2018; Minh et al., 2013) with 2000 replicates per analysis. For each inference problem, we ran two independent analyses to confirm that phylogenetic relationships and support values were consistent. All runs used 32 CPU cores.

PhyloBayes
In PhyloBayes, we used the CAT-F81 model. The CAT model infers the amino acid profile for each site from the data, assuming that the profiles come from a Dirichlet process mixture. We assumed that the exchangeability rates were the same (F81) rather than trying to estimate them from the data (the GTR option). Estimating the exchangeability rates was too computationally complex for the analyses we attempted, and it is not obvious that the results would be more accurate, as rare changes can be explained both by unusual amino acid profiles and by low exchangeability rates under the CAT-GTR model, creating an identifiability problem that is potentially problematic. Rate variation across sites was modelled using a discrete approximation of the gamma distribution with four categories. For each inference problem, we ran two independent PhyloBayes analyses for 72 h using the MPI version on 32 CPU cores. Convergence diagnostics and consensus trees were generated for each pair of analyses using the bpcomp program in the PhyloBayes package, retaining every tenth sample and using a burn-in of 25% of samples. In all cases, the mean difference in split frequencies was less than 0.005, usually much less. The maximum difference in split frequencies was 0.09 for the analysis of the problematic 36-taxon dataset (see below), but was below 0.05 for all other analyses.
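To make the two convergence statistics concrete, the following Python sketch computes the maximum and mean absolute difference in split frequencies between two runs. It is an illustration of the quantities only, not a reimplementation of bpcomp: the input format (a dictionary from a split label to its posterior frequency after burn-in and thinning), the function name and the toy values are assumptions, and any frequency cut-offs that bpcomp applies internally are ignored.

```python
from typing import Dict, Tuple


def split_frequency_diffs(freqs_a: Dict[str, float],
                          freqs_b: Dict[str, float]) -> Tuple[float, float]:
    """Return (max, mean) absolute difference in split frequencies.

    A split that was never sampled in one run is given frequency 0.0
    in that run.
    """
    splits = set(freqs_a) | set(freqs_b)
    if not splits:
        raise ValueError("no splits to compare")
    diffs = [abs(freqs_a.get(s, 0.0) - freqs_b.get(s, 0.0)) for s in splits]
    return max(diffs), sum(diffs) / len(diffs)


# Toy frequencies for two hypothetical runs.
run1 = {"split1": 1.00, "split2": 0.60, "split3": 0.10}
run2 = {"split1": 1.00, "split2": 0.55}

max_diff, mean_diff = split_frequency_diffs(run1, run2)
print(f"maxdiff = {max_diff:.3f}, meandiff = {mean_diff:.3f}")
```

Small values of both statistics indicate that the two independent chains sampled very similar sets of bipartitions at similar frequencies.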

Individual gene tree analysis
As a complement to the analyses based on concatenated gene data, we also assessed node support using metrics summarising the information for individual gene trees. We used as the species tree the one inferred in PhyloBayes under the CAT-F81 model and based on the best third of the alignments that include at least 34 of the 37 taxa. Each individual gene tree was reconstructed using maximum likelihood with the best fit substitution model automatically selected by ModelFinder. First, using IQ-Tree 2 (Minh, Schmidt, et al., 2020), we calculated the gene concordance factor (gCF), which reflects the proportion of genes supporting a node while correcting for potential uneven taxon sampling per gene, and the site concordance factor (sCF), which estimates the concordance at the level of individual sites (Minh, Hahn, & Lanfear, 2020; Mo et al., 2022). Second, using RAxML version 8.2.12, we calculated internode certainty (IC), which informs about the certainty of a bipartition by considering its occurrence in a set of gene trees relative to the occurrence of the second-best bipartition (Salichos & Rokas, 2013).
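The two gene-tree metrics can be illustrated with a short Python sketch. It assumes that the counts of gene trees have already been tabulated for a given branch; the function names and the toy counts are hypothetical. The IC calculation follows the entropy-based definition for two competing bipartitions, and the negative sign for a reference bipartition that is rarer than its main conflict is a convention assumed here rather than something stated in the text above.

```python
import math


def gene_concordance_factor(n_concordant: int, n_decisive: int) -> float:
    """gCF as the percentage of decisive gene trees containing the branch."""
    if n_decisive <= 0:
        raise ValueError("need at least one decisive gene tree")
    return 100.0 * n_concordant / n_decisive


def internode_certainty(f_ref: float, f_conflict: float) -> float:
    """IC for a reference bipartition versus its most frequent conflict.

    IC = 1 + p1*log2(p1) + p2*log2(p2), with p1 and p2 the relative
    frequencies of the two bipartitions. The value is negated when the
    reference bipartition is the rarer one (assumed convention).
    """
    total = f_ref + f_conflict
    if total <= 0:
        raise ValueError("at least one bipartition must be observed")
    ic = 1.0
    for f in (f_ref, f_conflict):
        p = f / total
        if p > 0:
            ic += p * math.log2(p)
    return -ic if f_ref < f_conflict else ic


# Toy counts (hypothetical): 75 gene trees contain the reference
# bipartition, 25 contain the strongest conflicting bipartition.
ic_value = internode_certainty(75, 25)
gcf_value = gene_concordance_factor(45, 100)
print(f"IC = {ic_value:.3f}, gCF = {gcf_value:.1f}%")
```

Under this definition, IC is 1 when no gene tree supports the conflicting bipartition and 0 when the two bipartitions are equally frequent.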

UCE analysis
To directly test our results against the previous UCE analysis by Blaimer et al. (2020) and extend this analysis, we compiled a dataset combining data from both studies. We first blasted the UCEs in the previous study to our Gblocks-filtered data for genes present in at least 34 of the 37 taxa in our analysis. We saved all matches with an e-value < 10⁻³⁰, and then we removed UCEs with multiple hits in the same genomes and UCEs that were too close (less than 50 bp on either side) to the edge of a contig/scaffold. For the phylogenetic analysis, we followed Blaimer et al. (2020). Specifically, we adopted the SWSC-EN partitioning scheme, then used PartitionFinder2 (Lanfear et al., 2017) to find the optimal partitioning scheme, and finally ran an ML search in IQ-Tree under default settings for partitioned data, that is, using ModelFinder (Kalyaanamoorthy et al., 2017) to find the best nucleotide model for each partition, and then running an ML search for the best tree. We used 1000 ultrafast bootstrap replications in IQ-Tree to assess branch support in the resulting tree.
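The three hit-filtering criteria described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Hit` record and its field names are hypothetical, coordinates are taken to be 1-based and inclusive, and a UCE with multiple hits is dropped only for the genome in which the multiple hits occur. The text does not specify that last point, so it is an interpretation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    uce: str          # UCE locus name
    genome: str       # genome/transcriptome in which the hit was found
    evalue: float     # BLAST e-value
    start: int        # hit start on the contig (1-based, inclusive)
    end: int          # hit end on the contig (1-based, inclusive)
    contig_len: int   # length of the contig/scaffold


def filter_hits(hits: List[Hit], max_evalue: float = 1e-30,
                min_edge: int = 50) -> List[Hit]:
    """Keep significant, unique hits that are not near a contig edge."""
    significant = [h for h in hits if h.evalue < max_evalue]
    per_pair = Counter((h.uce, h.genome) for h in significant)
    kept = []
    for h in significant:
        if per_pair[(h.uce, h.genome)] > 1:
            continue  # multiple hits for this UCE in the same genome
        left = min(h.start, h.end) - 1
        right = h.contig_len - max(h.start, h.end)
        if left < min_edge or right < min_edge:
            continue  # closer than min_edge bp to either contig edge
        kept.append(h)
    return kept


# Toy hits (hypothetical values).
hits = [
    Hit("uce1", "g1", 1e-50, 200, 400, 1000),  # passes all filters
    Hit("uce2", "g1", 1e-10, 200, 400, 1000),  # e-value too high
    Hit("uce3", "g1", 1e-40, 10, 300, 1000),   # too close to the left edge
    Hit("uce4", "g1", 1e-40, 100, 300, 1000),  # duplicated in g1
    Hit("uce4", "g1", 1e-35, 500, 700, 1000),  # duplicated in g1
]
kept = filter_hits(hits)
print([h.uce for h in kept])
```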

Tree figures
Illustrations of phylogenetic trees were generated using the R package ggtree version 3.2.1 (Yu et al., 2017) running under R version 4.1.1.

Alignment quality and phylogenetic signal
We first explored the data by analysing the dataset that contained the 31 genes that were present in all 37 taxa (Table 2). We will refer to this as the T37-G31 dataset, for 37 taxa and 31 genes. We then successively expanded the amount of data by analysing the T36-G123 dataset (123 genes present in 36 or more taxa), T35-G296 dataset (296 genes present in 35 or more taxa) and T34-G542 dataset (542 genes present in 34 or more taxa). This series represents a trade-off between completeness in terms of taxa and amount of genomic data included.
When these four datasets were analysed with IQ-Tree and a model accommodating site-specific amino acid profiles (C60 + I + G5), we discovered striking differences in topology (Figure 2). In the smallest dataset (T37-G31; Figure 2a), including only the complete alignments, Eschatocerus (Eschatocerini) diverges early in the tree. However, in the next smallest dataset (T36-G123; Figure 2b), Eschatocerus is instead grouped inside the core cynipid lineages, in a clade together with Protobalandricus (Cynipini), Phanacis Förster (Phanacidini) and Iraella (Aylacini s. str.). This is a somewhat surprising result, as it breaks the monophyly of the oak gall wasps (Cynipini), long presumed to be a monophyletic group. It also moves Phanacis (Phanacidini), representing one major herb-galling clade, from a sister-group relationship with the other major herb-galling clade (Aulacideini) to a position within a heterogeneous collection of lineages. As more genes (and more gaps) are added, Eschatocerus changes again to an early-diverging position (Figure 2c,d), but Protobalandricus remains outside the Cynipini, even though the support for this is quite poor in the largest dataset (Figure 2d).
In trying to understand these results, we noted that Eschatocerus is a long-branch taxon, and that the three other taxa that group with Eschatocerus in the next smallest dataset (Figure 2b) have three of the five most incomplete genome assemblies in terms of the number of retrieved genes (Table S2). This suggests that the clade consisting of Eschatocerus, Protobalandricus, Phanacis and Iraella may be spurious and caused by long-branch attraction and/or poor or misleading gene alignments.
We looked at long-branch attraction first. The C60 model in IQ-Tree is an approximation of the CAT model in PhyloBayes and may not accurately represent site-specific amino acid profiles in cynipoids. Such deviations could potentially cause problems with long-branch attraction in our analysis. To check this possibility, we repeated the analysis of the two smallest datasets (the others were too large) in PhyloBayes using the CAT-F81 model (Figure 3). The results were identical to those obtained with IQ-Tree, suggesting that the topological changes are not caused by problems with the C60 approximation.
sometimes removed substantial portions of the alignments. If a substantial portion of an alignment is unreliable, then the part that remains after filtering could also be potentially of doubtful quality. To examine this possibility, we divided the T34-G542 dataset into six approximately equal gene subsets based on the proportion of the alignments removed by Gblocks. When analysed with IQ-Tree under the C60 + I + G5 model, the three best data subsets (that is, the three subsets with the smallest proportion of site columns removed by Gblocks) resulted in trees (Figure 4a-c) that were identical to each other and to the tree from the no-gaps dataset T37-G31 (Figure 2a), except for a few minor details, most of which were not well supported. Notably, Eschatocerus always diverged early, Cynipini was monophyletic and Phanacis grouped with Aulacideini in all these trees.
The results for the three worst data subsets (Figure 4d-f) differed among themselves and from the results of the no-gaps dataset (T37-G31) in several respects, often involving contrasting placements of the four problematic taxa mentioned previously (Eschatocerus, Protobalandricus, Phanacis and Iraella) or unusual arrangements of more basal branching events, but with low support. Thus, it appears that the phylogenetic signal is consistent in the best gene alignments, that is, those that contain only small portions that are detected as problematic by the Gblocks filter.
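The division of gene alignments into quality-ranked subsets can be sketched in Python as follows. The sketch assumes that the subsets are made approximately equal in number of genes (rather than in number of sites, which the text does not specify), and that the fraction of columns removed by Gblocks has already been computed for each gene. The function name, gene names and values are hypothetical.

```python
from typing import Dict, List


def quality_subsets(removed_fraction: Dict[str, float],
                    n_subsets: int = 6) -> List[List[str]]:
    """Rank genes from best (least removed) to worst and split the
    ranking into `n_subsets` groups of approximately equal size."""
    ranked = sorted(removed_fraction, key=lambda g: (removed_fraction[g], g))
    base, extra = divmod(len(ranked), n_subsets)
    subsets, start = [], 0
    for i in range(n_subsets):
        size = base + (1 if i < extra else 0)  # spread the remainder
        subsets.append(ranked[start:start + size])
        start += size
    return subsets


# Toy example: fraction of alignment columns removed per gene.
removed = {"g1": 0.05, "g2": 0.40, "g3": 0.10, "g4": 0.26,
           "g5": 0.00, "g6": 0.75, "g7": 0.15}

for rank, subset in enumerate(quality_subsets(removed, n_subsets=3), start=1):
    print(rank, subset)
```

In this scheme, the first subsets returned correspond to the best alignments and the last ones to the worst.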
To further test the effect of alignment quality, we also explored partitions of the T34-G542 dataset generated using other filtering and scoring methods. Specifically, we tried HmmCleaner, Gblocks + HmmCleaner and Gblocks + OD-Seq, and then divided the gene alignments into subsets based on how many sites (HmmCleaner and Gblocks + HmmCleaner) or how many sequences (Gblocks + OD-Seq) were removed by each of these pipelines. These tools represent different approaches to data cleaning: Gblocks removes site columns, OD-Seq removes outlier sequences and HmmCleaner removes sequence fragments. In all cases, the IQ-Tree analyses of the highest quality data subset or subsets resulted in trees that were identical or almost identical to the tree from the no-gaps analysis (Figures S1-S3). The quality scores for OD-Seq are few, and it was difficult to devise criteria that generated partitions of equal size. We therefore ended up with the best partition (no sequences removed) being much smaller than the other ones (approximately 2900 sites vs. 25,600-56,800 sites), and resulting in a poorly resolved tree with some unusual features (Figure S3A). Analysis of the next best OD-Seq partition, however, retrieved a tree that was highly similar to the no-gaps tree (compare Figure 2a to Figure S3B).
Based on these results, we conclude that it is mainly poor-quality alignments that generate the somewhat unexpected placements of Eschatocerus, Protobalandricus, Phanacis and Iraella in analyses of the T36-G123, T35-G296 and T34-G542 datasets.
Could the non-monophyly of Cynipidae be the result of long-branch attraction, pulling isolated cynipid lineages towards the outgroups? To examine this question, we focused on a dataset consisting of the two best subsets of the T34-G542 dataset according to the Gblocks criterion, and we used PhyloBayes as the best approach for detecting long-branch attraction. The analysis of the complete taxon set resulted in the tree with non-monophyletic Cynipidae (Figure 5a).
From this dataset, we then removed in turn Cecinothofagus, Eschatocerus, outgroups, Cecinothofagus + Eschatocerus, and Eschatocerus + outgroups. These were the only removals of long-branch taxa that left a sufficient number of remaining lineages to test non-monophyly of Cynipidae. In all cases, the support for non-monophyletic Cynipidae remained at 100% (Figure 5b-f). The results were almost identical when the same datasets were analysed with IQ-Tree (Figure S4). In conclusion, we find no evidence that the non-monophyly of Cynipidae is an artefact caused by long-branch taxa.
FIGURE 5 Testing the potential effect of long-branch taxa on phylogenetic results. For these analyses, we used the best third of the T34-G542 alignments, that is, the alignments where Gblocks filtered out 26% or less of the sites (see Figure 3). Notably, the phytophagous groups (gall inducers and inquilines, green) remain diphyletic in all analyses with respect to the parasitoid lineages (black).

Gene tree analysis
The gene tree concordance analysis shows that there is consistent signal across gene trees for the deep splits in the superfamily, that is, between Cecinothofagus and the remaining taxa, and between Diplolepis + Pediaspis on one hand and the remaining Cynipidae and Figitidae on the other (Figure S5). This is reflected both by a positive IC and a gCF > 40%. However, the relationships among Eschatocerus, remaining Cynipidae (s. str.) and Figitidae are not consistently resolved across gene trees. Similarly, this analysis indicates a fair amount of inconsistency across gene trees concerning tribal relationships within the Cynipidae (s. str.) excluding Eschatocerus. This could be because of errors in the assemblies, errors in gene tree inference due to lack of data or biases in the simplified model used, or it could reflect genuine topological variation among the gene trees. However, we did not pursue this further.

Trade-off between taxon completeness and data quality filtering
To further explore the trade-off between taxon completeness and alignment quality filtering, we ran an IQ-Tree analysis under the C60 model also for the best gene alignments (those where Gblocks removed at most 10% of alignment columns) that had data for 30 or more of the taxa. The size of the filtered dataset (811,373 amino acid sites for 37 taxa) was close to the limit of what was computationally feasible with IQ-Tree under the C60 model on the cluster we used. This analysis is more relaxed with respect to taxon completeness and stricter with respect to alignment quality compared with the high-quality T34-G542 analysis (Figure 5a). The resulting tree (Figure S6) differed only in minor details from the T34-G542 tree but was less resolved, indicating that this analysis provided a less favourable balance between signal and noise.

Extended UCE analysis
The combined UCE analysis of our data and the data of Blaimer et al. (2020) resulted in a tree (Figure S7) that is entirely consistent with the high-quality T34-G542 tree (Figure 5a). Notably, the clade consisting of Iraella, Cynipini and Ceroptresini remained strongly supported, confirming our conclusion that the Aylacini (s. str.) are not closely related to the Aulacideini + Phanacidini. The Eschatocerini were placed as sister to the remaining Cynipidae (s. str.) with strong support, in accordance with our high-quality T34-G542 tree and with some analyses of Blaimer et al. (2020). Interestingly, this analysis placed the Qwaqwaiini as sister group to the Rhoophilini, with strong support. Importantly, the extended UCE analysis suggests that the phylogenomic results presented here and in Blaimer et al. (2020) are robust to the choice of different types of data, analytical approaches and sampling of ingroup and outgroup taxa.

Phylogenetic relationships
As our best phylogenomic estimate of relationships, we present the PhyloBayes (CAT-F81) analysis of the two best subsets of the T34-G542 dataset according to the Gblocks criterion (Figure 6; see also Figures 5a and S4A), as it appears to represent the best trade-off between signal and noise as judged by clade credibility values, and uses the most sophisticated of the analytical approaches explored here. The tree is also largely congruent with (and never conflicts strongly with) the results from any of the other analyses of quality-filtered data. It is also entirely consistent with the results of the extended UCE analysis (Figure S7). On the tree, we have indicated [...]
Among the more early-diverging cynipoid lineages, the Diplolepidini, represented by Diplolepis, and the Pediaspidini, represented by Pediaspis, form a strongly supported clade, which appears to be the sister group of Figitidae + Cynipidae (s. str.). We propose here that this clade be recognised as a separate family, the Diplolepididae stat. prom.
Given this, we suggest that these tribes are recognised as separate subfamilies, the Diplolepidinae stat. prom. and the Pediaspidinae stat. prom.
Finally, our results support the conclusion of Blaimer et al. (2020) that the Paraulacini (represented by Cecinothofagus) form the sister group of the remaining cynipoid lineages. We propose here that also this clade be recognised as a separate family, the Paraulacidae stat. prom. (Figure 6). The new classification is discussed in more detail in the Taxonomy section below.
FIGURE 6 The tree is based on the best third of the alignments that include at least 34 of the 37 taxa (T34-G542 dataset, Gblocks filtering removed less than 26% of sites), analysed using PhyloBayes under the CAT-F81 model (the same analysis shown in Figure 5a). Current cynipid tribes and figitid subfamilies are indicated, together with the proposed new classification of cynipid lineages into three distinct families.

TAXONOMY
Given that our results provide solid and independent confirmation of the results from the UCE analysis (Blaimer et al., 2020) regarding the non-monophyly of Cynipidae, we find it appropriate to revise the family-level classification to reflect these findings here. As the circumscription of the 13 cynipid tribes and potential apomorphies characterising each of them have been discussed at length previously (Lobato-Vila et al., 2022; Ronquist et al., 2015), we just give a brief formal synopsis of the proposed family classification here.
The Diplolepidini Latreille (Ronquist, 1999)
Diagnosis: Distinctive morphological and biological features that separate this family from the Cynipidae (see key to tribes in Ronquist et al., 2015) are: female antenna with 12 or more flagellomeres and male antenna without modified F1 (Figure 8e); mesopleuron with a mesopleural longitudinal impression (Figure 8b); scutellar foveae faint or absent (Figure 8a); metatarsal claws simple; and hypopygium either ploughshare-shaped (Figure 8c) or hypopygial spine short. Gall inducers on Rosa or Acer.
Comments: Potential apomorphies of the Diplolepididae include the mesopleuron with a mesopleural longitudinal impression, faint or absent scutellar foveae and the female and male antenna having 12 or more flagellomeres (Ronquist et al., 2015: couplet 3 in the key to cynipid tribes). A quantitative analysis of the available morphological and biological evidence for Diplolepididae monophyly is still missing.
Before such an analysis is attempted, however, it would be valuable to reassess the morphological evidence in the light of the phylogenomic results. It is clear that such an analysis would have to span all major cynipoid lineages, and not be restricted to the former cynipid groups.
Circumscription: The family includes two subfamilies, Diplolepidinae and Pediaspidinae, corresponding to the tribes Diplolepidini and Pediaspidini in Ronquist (1999), here elevated to subfamily status.
Comments: Putative morphological apomorphies for the Diplolepidini include the ploughshare-shaped hypopygium, the broad and crenulate mesopleural impression, and the lack of lateral propodeal carinae (Ronquist et al., 2015). However, quantitative analyses have struggled to identify unique or distinct apomorphies for the subfamily, partly because of variation among the constituent taxa, and partly because of the previous difficulties in resolving relationships among cynipid tribes (Ronquist et al., 2015). There is some uncertainty surrounding the interpretation of the hypopygial character. Vyrzhikovskaja (1963) claims that the hypopygium of Liebelia is straight and not ploughshare-shaped as in Diplolepis, a claim that is repeated elsewhere without reference to the original source (Melika, 2006; Pujade-Villar et al., 2020). Nonetheless, the hypopygium of L. magna Vyrzhikovskaja clearly appears to be ploughshare-shaped in the SEM illustration provided by Liljeblad et al. (2008), albeit less extremely so than the hypopygium of Diplolepis. The hypopygium of L. dzhungarica also appears to have some distinct similarities in shape to that of Diplolepis (Abe, Melika & Stone, 2007). Given this uncertainty, the character would clearly warrant more detailed study in these and related taxa.
Comments: The Pediaspidini are characterised by several unique or distinct apomorphies, among them the posteromedian scutellar impression (Ronquist et al., 2015). Himalocynips, a genus with a single species that was described within its own subfamily (Himalocynipinae Yoshimoto, 1970), was included within the Pediaspidini by Ronquist (1999). Its phylogenetic proximity to Pediaspis was later supported by a morphological phylogenetic analysis (Liljeblad et al., 2008). The biology and host plant of this species are, however, unknown, although it may (as for Pediaspis) be a galler on Acer (Sapindaceae). We were unable to include this rare and poorly studied species in our analysis, and a molecular confirmation of its placement within the Pediaspidini and the Diplolepididae is an obvious priority for future studies.
One genus gall inducer on Acer spp. (Sapindaceae); the other with unknown biology.
Comments: The position of the Eschatocerini is still highly uncertain, and its life history is also poorly studied. Future studies will have to show whether it truly belongs to the Cynipidae (s. str.), or elsewhere in the Cynipoidea, probably then as a separate family. The potential apomorphies of each of the remaining tribes have been analysed previously (Ronquist et al., 2015), although it would be valuable to reassess the morphological and biological evidence and re-analyse it in the light of the new phylogenomic results. The same applies to potential apomorphies for the Cynipidae (s. str.). In the latter case, there are no known apomorphies at present.
Although there is growing evidence that the Phanacidini and Aulacideini are sister groups, we prefer to keep them as separate tribes (in contrast to Blaimer et al., 2020), as there are distinct biological and morphological differences between the groups. The Phanacidini tend to be small and elongate species, and most of them are stem gallers. The Aulacideini tend to be larger and their body form is more rounded. They usually induce galls in other plant parts.
Although one could argue for the grouping of tribes within the Cynipidae (s. str.) into subfamilies, we consider it premature to do so at the current time. In particular, it would be advantageous if the position of the Eschatocerini could be determined unambiguously before further refinement of the classification is considered. Thus, for now, we argue that all extant tribes of the family should remain in a single subfamily, the Cynipinae (s. str.).

Alignment quality and phylogenetic signal
Assembling genomes or transcriptomes from short sequence reads and finding single-copy orthologs in those assemblies are challenging tasks. Thus, one might expect some variation in the quality of the resulting gene datasets. There is a plethora of tools for aligning the gene sequences in those datasets, and for filtering out alignment sites or sequences that may provide noisy or misleading phylogenetic signal. Nevertheless, it may be difficult to eliminate such data issues.
Our phylogenetic results varied depending on which gene alignments were included but were consistent for the high-quality alignments, regardless of the method used to identify the latter (alignment completeness, Gblocks results, HmmCleaner results, Gblocks + HmmCleaner results, or Gblocks + OD-Seq results). This suggests that we had problems with misleading phylogenetic signal in poor alignments, rather than true conflict between different gene trees. This is also supported by the fact that the four taxa that were apparently incorrectly grouped together in analyses including poor alignments (Eschatocerus, Iraella, Phanacis and Protobalandricus) also were represented by some of the most incomplete genome assemblies. It is interesting to note that the taxa represented by transcriptomes (Biorhiza and Ganaspis) were not affected by similar problems with unstable phylogenetic placements, despite the rather incomplete representation of the genome in these transcriptomes (Table S2). This, in addition, supports the conclusion that some alignments included misleading phylogenetic signal from poor-quality genome assemblies and gives some confidence in the tree resulting from analysis of the high-quality alignments.
Interestingly, our results also suggest that quality filtering tools, such as the ones we tested (Gblocks, HmmCleaner and OD-Seq), are better at identifying problem alignments than they are at filtering out erroneous or misleading sites and sequences. None of these tools were able to remove the misleading phylogenetic signal from the poor alignments, although they might have had some positive effect.
The ultimate cause of the discordant phylogenetic signal remains unclear. The four problematic taxa may group together in some analyses simply because they share divergent or incorrect sequences for some genes. The signal could be entirely erroneous, for example, through sharing of specific gene pairs that can easily be merged into chimeric sequences in challenging genome assemblies, resulting in positively misleading phylogenetic signal that groups them together.
As several alternative approaches to filtering out poor gene alignments gave consistent end results, we are fairly confident that our phylogenetic analysis is not misled by erroneous genome assemblies. Using models that accommodate among-site variation in amino acid profiles, we applied some of the best available tools for resolving long-branch attraction due to model shortcomings (Kapli & Telford, 2020). We also note that removal of long-branch taxa in various combinations revealed no sign of an alternative phylogenetic signal obscured by long-branch attraction effects (Figure 5).

Phylogenetic relationships
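The phylogenetic results from our analysis are largely congruent with and complement those of the earlier UCE analysis (Blaimer et al., 2020). Here we highlight the major agreements and disagreements between these two phylogenomic studies.
i. Division of Cynipidae into three families and placement of Eschatocerini: Both studies support division of the Cynipidae into three separate lineages, recognised here as the families Paraulacidae, Diplolepididae and Cynipidae (s. str.). However, the evidence on the placement of Eschatocerini is slightly different. Our analysis suggests that the Eschatocerini belong to the Cynipidae (s. str.), forming the sister group of the remaining lineages in that clade, while the UCE analysis favoured a sister-group relationship between the Eschatocerini and Figitidae. Interestingly, the extended UCE analysis presented here (Figure S7) strongly supported the Eschatocerus position favoured in our main analysis.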
However, as the Eschatocerus genome assembly is one of the least complete in our study, further genomic sequencing of this taxon would be highly desirable.
ii. Rejection of monophyly of cynipid herb gallers: Our study provides even stronger support for the conclusion of the UCE analysis (Blaimer et al., 2020) that the herb-galling clade of Aulacideini + Phanacidini is monophyletic. Both analyses are consistent with previous studies suggesting that these two tribes are reciprocally monophyletic (Liljeblad & Ronquist, 1998; Ronquist et al., 2015).
Unlike Blaimer et al. (2020), we prefer to keep the tribes Aulacideini and Phanacidini separate until there is evidence that this would conflict with phylogenetic relationships. Blaimer et al. (2020) also concluded that the cynipid herb gallers (apart from a few species in the tribe Diastrophini) form a monophyletic clade, Aylacini (s. lat.), which is the sister group of all remaining Cynipidae (s. str.). As mentioned above, this interpretation is based on the incorrect assumption that Neaylax salviae (which they name Aylax salviae) belongs to the Aylacini (s. str.). In fact, this species belongs to a clade of Lamiaceae gallers in the Aulacideini (Ronquist et al., 2015) and is unrelated to the true Aylacini (s. str.), all known species of which are associated with poppies (Papaveraceae). Our study is the first phylogenomic analysis to include a true representative of the Aylacini (s. str.), Iraella hispanica Nieves-Aldrey, and our analysis clearly shows that herb gallers in Aylacini (s. str.) and in Aulacideini + Phanacidini are phylogenetically divergent. Instead, Aylacini (s. str.) is deeply nested within the sister-group of Aulacideini + Phanacidini, a clade that is dominated by inquilines and gall inducers associated with woody rosids (the only exception being a few species of Diastrophini that are gallers of herbs in the genus Potentilla). Thus, galling of herbs in the family Papaveraceae by the Aylacini (s. str.) appears to be secondary. Our results are consistent with several earlier analyses suggesting that the Aylacini (s. str.) form a lineage that is distinct from that of the Aulacideini and Phanacidini (Liljeblad & Ronquist, 1998; Nylander et al., 2004; Ronquist et al., 2015), and they agree with preliminary analyses of a recent genome assembly of Aylax minor Hartig, another member of the Aylacini (s. str.) (AB, unpublished data).
iii. Phylogenetic placement of the Qwaqwaiini: Ours is the first phylogenomic analysis to include the Qwaqwaiini. Our analysis places Qwaqwaiia scolopiae, the only known species in the Qwaqwaiini, as the sister group of the inquiline clade consisting of Synergini (s. str.). The extended UCE analysis (Figure S7) further suggests that the Qwaqwaiini forms the sister group of Rhoophilini, and that these two groups together form the sister group of the Synergini (s. str.). [...] (Ronquist, 1995, 1999).

Evolutionary implications
We end by discussing some alternative scenarios for the origin of cynipoid gall inducers and inquilines in the new light shed on this problem by the phylogenomic analyses and by recent findings on the life history of key groups. We aim to identify the most critical knowledge gaps and the most promising avenues towards making further progress in the quest of understanding the early evolution of cynipoids.

Transitions between phytophagous and parasitoid life cycles
The phylogenomic results suggest two alternative scenarios for the origin of gall inducers and inquilines from insect-parasitic ancestors, both appearing among the reconstructions presented by Blaimer et al. (2020). One scenario (independent phytophagy) assumes that the parasitoid life history traces back to the ancestral cynipoid (Figure 10a). If so, then herbivory (inquilinism/gall induction) must have originated twice from such ancestors (in Diplolepididae and Cynipidae s. str.).
The other scenario (parasitoid reversal) assumes instead that it is the phytophagous habit that traces back to the cynipoid ancestor (Figure 10b). If this scenario is correct, then parasitoids must have evolved twice independently from phytophagous ancestors (in Paraulacidae and Figitidae s. lat.).
The evidence for or against these scenarios hinges critically on the life history and phylogenetic placement of two lineages, the Paraulacidae and Eschatocerini. The Paraulacidae are associated with Nothofagus galls induced by the chalcidoid genus Aditrochus, but it has been unclear whether they are phytophagous inquilines or parasitoids of some other gall inhabitant (Nieves-Aldrey et al., 2009; Ronquist et al., 2015), which has added considerable uncertainty to the reconstruction of evolutionary scenarios (Blaimer et al., 2020; Ronquist et al., 2015). Interestingly, recent genetic analyses have provided a case where the genome of a paraulacid, Cecinothofagus ibarrai, was retrieved from a larva of Aditrochus coihuensis Ovruski, together with the Aditrochus coihuensis genome (Rasplus et al., 2022). This suggests that the Paraulacidae are not only parasitoids but also likely koinobiont endoparasitoids in early larval instars, like all other insect-parasitic cynipoids in the sister group of Cynipidae (s. str.) including Parnipinae, Ibaliidae and numerous Figitidae (Ronquist, 1999, and references cited therein; Ronquist et al., 2018). Thus, if the parasitoid reversal scenario is correct, a complex koinobiont life history must have evolved twice independently in the Cynipoidea from phytophagous ancestors, which appears unlikely if not entirely impossible.
Another important piece in the puzzle is the life history of the [...] excluded, in which case the independent phytophagy scenario (Figure 10a) would gain additional support.
FIGURE 10 Two possible scenarios for the origin of major life history types in the Cynipoidea. (a) Independent phytophagy scenario. The ancestor of cynipoids was a koinobiont endoparasitoid (at least in early instars) of gall-inducing insects (orange lineages, origin of koinobionts marked with "K"). The ancestral life history persists today in the Paraulacidae and basal lineages of Figitidae (s. lat.), like the Parnipinae. Gall inducers and inquilines originated twice from these koinobionts of gall insects ("G"). (b) Parasitoid reversal scenario. The koinobiont endoparasitoids of gall insects ("K") evolved independently in the Paraulacidae and Figitidae (s. lat.), possibly in both cases from phytophagous gall inducers and inquilines ("G"). In both scenarios, advanced figitid lineages (in red) remained koinobiont parasitoids of insects but colonised hosts in other environments.
Interestingly, both Diplolepididae and Cynipidae (s. str.) include species whose genomes encode plant cell wall degrading enzyme genes (Hearn et al., 2019). These may have been acquired from an herbivorous shared common ancestor, or alternatively they may be essential components of cynipid herbivory that have been acquired convergently during independent evolution of galling lifestyles. Analyses of whether such complex genomic features associated with the two different life histories are likely to have a shared history or separate origins provide one of the most promising ways of distinguishing between the two possible scenarios.
FIGURE 11 Three possible scenarios for the origin of gall inducers (green lineages) and inquilines (blue lineages) in the Cynipidae (s. str.). The tree is based on the analysis presented here, augmented with additional taxa (preceded with '*') based on the extended UCE analysis (Figure S7) and other recent analyses (Blaimer et al., 2020; Lobato-Vila et al., 2022; Ronquist et al., 2015). Eschatocerus was excluded from the tree because of uncertainty concerning its life history. Added clades with members from more than one genus marked with '+'. One of the most parsimonious reconstructions, weighting transitions equally, largely agrees with the gallers-first scenario but suggests that Synergus itoensis and related species regained the ability to induce galls on their own.

Transitions between gall-inducing and inquiline life cycles
The evolution of gall inducers and inquilines is clearly more complex than the origin of phytophagy, even though inquilines are only known from the Cynipidae (s. str.). If the transitions between gall inducers and inquilines were always in one direction, there would be two alternative scenarios (see also Blaimer et al., 2020; Ronquist et al., 2015).
If gall inducers always evolved from inquilines (inquilines-first scenario), then gall inducers must have evolved at least seven times independently, taking all available phylogenetic evidence into account (Figure 11a). At the other extreme, if all inquilines evolved from gall inducers (gallers-first scenario), there would have been at least 11 independent origins of inquiline cynipids from gall inducers (Figure 11b).
Thus, if we assume that transitions have always been in one direction, then the inquilines-first scenario appears more likely (Figure 11a). However, the only transition that appears to be clearly supported by phylogenetic evidence at this point is the origin of gall induction by Synergus itoensis and close relatives from inquiline ancestry within the Synergini (s. str.), because the gall inducers are so deeply nested within inquiline lineages (Ide et al., 2018). If we are willing to assume that this represents a reversal from an inquiline life cycle, then the gallers-first scenario provides a more parsimonious explanation of the remaining transitions (Figure 11c; note that there are alternative reconstructions with the same total number of changes but with more independent origins of gall inducers).
Which hypothesis is better supported depends crucially on the relative ease (in evolutionary terms) or weight (in terms of inferred state changes) of transitions between the alternative states of gall induction and inquiline life cycles (Stone & French, 2003). While both gall inducers and inquiline cynipids can cause the development of nutritive gall tissues on which the larvae feed, only true inducers can cause the development of gall tissues de novo, and the development of the structurally complex outer gall tissues that characterise many cynipid galls. If it is easier to transition from full gall induction to a simpler inquiline life history than vice versa, then a gallers-first scenario may be more likely a priori. Alternatively, it might be a relatively minor step in evolutionary terms for cynipids to transition from inquilinism to becoming a gall inducer. We currently know too little about the differences between these alternative life histories to provide any clear weighting of transition probabilities between them, beyond suspecting that unweighted parsimony may potentially be an unreliable guide.
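How the preferred scenario depends on transition weights can be illustrated with a small weighted-parsimony (Sankoff-style) calculation. The Python sketch below is a toy example under stated assumptions: the five-tip tree, the cost values and the function names are hypothetical and are not the tree or reconstruction of Figure 11. Forbidden transitions are given infinite cost, so the inquilines-first and gallers-first scenarios become two extreme settings of the same cost matrix.

```python
# Toy weighted parsimony (Sankoff-style) for a two-state life-history character.
# "G" = gall inducer, "I" = inquiline. Tree, costs and names are hypothetical.
import math

STATES = ("G", "I")


def subtree_costs(tree, cost):
    """Return {state: minimum cost of the subtree if its root has that state}."""
    if isinstance(tree, str):  # a tip is written as its observed state
        return {s: (0.0 if s == tree else math.inf) for s in STATES}
    child_tables = [subtree_costs(child, cost) for child in tree]
    return {
        s: sum(min(cost[s][t] + table[t] for t in STATES) for table in child_tables)
        for s in STATES
    }


def min_changes(tree, cost):
    """Minimum total transition cost over all root states."""
    return min(subtree_costs(tree, cost).values())


def cost_matrix(galler_to_inquiline, inquiline_to_galler):
    """Build cost[from][to]; staying in the same state is free."""
    return {
        "G": {"G": 0.0, "I": galler_to_inquiline},
        "I": {"G": inquiline_to_galler, "I": 0.0},
    }


# Toy tree: one galler nested among inquilines, plus a mixed sister pair.
toy_tree = (("G", "I"), ("I", ("I", "G")))

unweighted = cost_matrix(1.0, 1.0)
inquilines_first = cost_matrix(math.inf, 1.0)  # only inquiline -> galler allowed
gallers_first = cost_matrix(1.0, math.inf)     # only galler -> inquiline allowed

for label, matrix in [
    ("unweighted", unweighted),
    ("inquilines-first", inquilines_first),
    ("gallers-first", gallers_first),
]:
    print(label, min_changes(toy_tree, matrix))
```

On this toy tree the inquilines-first setting needs two changes and the gallers-first setting three, but intermediate weights (for example, making the gain of gall induction three times as costly as its loss) can reverse the ranking, which is the sense in which unweighted parsimony may be an unreliable guide.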
While the evolution of gall induction in Synergus itoensis shows that gall induction can evolve from inquilinism, the galls they induce consist only of nutritive tissues and lack morphologically complex non-nutritive tissues. Some Synergini inquilines do modify the complex gall morphology of host galls usurped at a very early stage in their development (Pénzes et al., 2009), but no case is yet known of a shift from inquilinism to gall induction that also includes the ability to induce complex gall phenotypes.
Again, the life history of some key taxa is important in weighing these alternative scenarios. Demonstration that Eschatocerini are inquilines would strengthen the inquilines-first scenario, while demonstration that they are true gall inducers would strengthen support for the gallers-first scenario. The Ceroptresini are commonly assumed to be inquilines, but detailed studies of their life history are sorely lacking, and there is one report claiming that members of Ceroptres are true gall inducers (Blair, 1949). The Qwaqwaiini is another taxon for which more detailed life history information would be valuable.
According to the only existing report, it is a gall inducer (Liljeblad et al., 2011), but it remains possible that it could be an inquiline, like most members of the Synergini (s. str.) and Rhoophilini. Such a demonstration would strengthen support for the inquilines-first scenario by removing one of the independent origins of gall inducers. That inquilines can easily originate from gall inducers is suggested by observations of facultative intraspecific inquilinism in Diastrophus (Diastrophini) (Pujade-Villar, 1984), and it has also been suggested that the remarkable parallelisms between the Aulacideini and Phanacidini in the evolution of host plant preferences could be due to facultative or obligate inquilinism among some cynipid herb gallers (Nieves-Aldrey et al., 2004; Ronquist & Liljeblad, 2001). Finally, we note that an ancestral state for inquilinism in Cynipidae (s. str.) requires that the ancestral host was not itself a cynipid gall inducer.
While rare examples of inquiline cynipids developing in non-cynipid galls are known (Askew, 1999; van Noort et al., 2007), it is notable that the vast majority of inquiline cynipids develop in cynipid galls.
Transitions between strikingly different life histories, such as those between koinobiont endoparasitoids, gall inducers and inquilines in cynipoids, should have major effects on genomes. For instance, transitions to or from a koinobiont endoparasitic life history should involve recruitment or loss of a swathe of genes or gene functions associated, for example, with suppressing or evading host immune systems, maintaining basic physiological functions within a host body and adjusting larval development and feeding patterns so that the host larva survives and develops normally as long as possible.
This should be noticeable as an unusual number of protein-coding genes with markedly increased or decreased rates of non-synonymous evolution along branches of the phylogeny involving life history changes. Similarly, the genes undergoing unusual amounts of change should also belong to particular functional categories. Transitions between gall inducers and inquilines may be less dramatic but should nevertheless leave similar genomic signatures. A recent study suggests that this is indeed the case for the transition from inquilines to gall inducers in species related to Synergus itoensis (Gobbo et al., 2020). Whether such genomic signatures of life history transitions can be detected deeper down in the cynipoid tree remains unclear. However, this is clearly a possibility that is well worth investigating, and the genomic data reported here represent a first step in supporting such a line of research.
Note that the combined analysis strongly supports a sister-group relationship between Qwaqwaiini and Rhoophilini, and a sister-group relationship between Eschatocerini and remaining Cynipidae (s. str.), and that Aylacini (s. str.) is not closely related to Phanacidini + Aulacideini.

AUTHOR CONTRIBUTIONS
Table S1. Species metadata and assembly statistics for the genomes and transcriptomes used in this study. Columns contain information on collection location where available, whether the data was assembled here or previously, and links to raw read data. Genome/transcriptome assembly quality metrics include complete and partial BUSCO (version 4.0.6) scores for both Hymenoptera and Eukaryota datasets respectively, assembly size, N50 and number of contigs, repeated for contigs >200 bp.
How to cite this article: Hearn, J., Gobbo, E., Nieves-Aldrey, J.L., Branca, A., Nicholls, J.A., Koutsovoulos, G. et al. (2024) Phylogenomic analysis of protein-coding genes resolves complex gall wasp relationships. Systematic Entomology, 49(1), 110-137. Available from: https://doi.org/10.1111/syen.12611
[...] set of cynipoids. Their taxon set includes representatives of all families except the Austrocynipidae and spanned a significant amount of the known diversity within each family. Several surprising results emerged from this UCE analysis (Figure 1b). First, the Liopteridae and Ibaliidae were placed within the Figitidae (s. lat.), among the early-diverging gall-associated lineages. Second, the Paraulacini and the Diplolepidini + Pediaspidini were placed outside the clade formed by the Figitidae and the remaining Cynipidae (the Cynipidae s. str.), a [...]
[...] Melika in Karimpour, Tavakoli & Melika | Gall inducer on Tragopogon L. (Asteraceae) | NG
Fumariphilus hypecoi (Trotter) | Gall inducer on Hypecoum L. (Papaveraceae) | NG
Hedickiana levantina (Hedicke) | Gall inducer on Salvia L. (Lamiaceae) | NG
Isocolus centaureae Dyakonchuk | Gall inducer on Centaurea L. (Asteraceae) | NG
Aylacini (s. str.)
Iraella hispanica Nieves-Aldrey | Gall inducer on Papaver L. (Papaveraceae) | NG
Ceroptresini
Ceroptres masudai Abe | Inquiline in Andricus Hartig galls on Quercus L. | NG
FIGURE 2 Phylogenetic relationships according to IQ-Tree analyses of different data subsets under the C60 + I + G5 substitution model. Support values (ultrafast bootstrap method in %) are shown on branches only if they are less than 100%. (a) Analysis of the 31 genes present in all 37 taxa (dataset T37-G31). (b) Analysis of the 123 genes present in 36 or more taxa (T36-G123). (c) Analysis of the 296 genes present in 35 or more taxa (T35-G296). (d) Analysis of the 542 genes present in 34 or more taxa (T34-G542). Note that the smallest dataset (a) results in strong support for Cynipini monophyly (blue clade).
FIGURE 3 Phylogenetic relationships according to PhyloBayes analyses of the two smallest data subsets under the CAT-F81 model. Support values (posterior probability in %) are shown on branches only if they are less than 1.0. (a) Analysis of the 31 genes present in all 37 taxa (dataset T37-G31). (b) Analysis of the 123 genes present in 36 or more taxa (T36-G123). Despite the more sophisticated CAT-F81 model, which learns the amino acid profiles from the data, the results are virtually identical to the corresponding results of IQ-Tree (Figure 2a,b).
[...] Cynipidae tribes Diplolepidini + Pediaspidini (represented by Diplolepis and Pediaspis) and Paraulacini (represented by Cecinothofagus Nieves-Aldrey & Liljeblad) lineages both fall outside a core cynipid clade that apparently constitutes the sister group of the Figitidae. In some analyses, the Eschatocerini (represented by Eschatocerus) also fall outside this clade.
Phylogenetic results for six equally-sized, quality-ranked subsets of the T34-G542 dataset, analysed using IQ-Tree under the C60 + I + G5 model. The raw alignments were subjected to filtering and quality ranking by Gblocks. Support values (ultrafast bootstrap in %) are shown on branches only if they are less than 100%. (a) Less than 13% of sites filtered out (best quality). (b) From 13% to 26% filtered out. (c) From 26% to 37% filtered out. (d) From 37% to 47% filtered out. (e) From 47% to 59% filtered out. (f) More than 59% filtered out (worst quality). The three best subsets (a-c) yield congruent results except for the position of Eschatocerus, which varies slightly but without strong conflict in support values. All have monophyletic Cynipini (blue lineages), and none of them group Phanacidini, Aylacini and Eschatocerini with each other or with Protobalandricus, as seen in some of the poor-quality data subsets (d, f).
For the best possibility of detecting model-related long-branch attraction effects, we used PhyloBayes and the CAT-F81 model. Branch support values (posterior probability in %) are only shown if they are less than 1.0. (a) Analysis of the full taxon set. (b) Cecinothofagus excluded. (c) Eschatocerus excluded. (d) Outgroups excluded. (e) Cecinothofagus and Eschatocerus excluded. (f) Eschatocerus and outgroups excluded. Regardless of taxon exclusion, the relationships among the included lineages remain identical to those in the full analyses, except for a slight variation in the position of Eschatocerus when outgroups are excluded (d).
It is more difficult to exclude the possibility of shortcomings in the substitution model used for probabilistic inference resulting in artificial long-branch attraction. Resolving cynipoid relationships involves determining the branching order of several long-branch taxa (i.e., groups linked to other members of the taxon set by a long, nondividing branch inserting deep in the phylogeny), including the Eschatocerini, Paraulacidae and Diplolepididae. The problem is accentuated by the long evolutionary distance between known cynipoid and outgroup genomes.

i. Division of Cynipidae into three families and placement of Eschatocerini: Both studies support division of the Cynipidae into three separate lineages, recognised here as the families Paraulacidae, Diplolepididae and Cynipidae (s. str.). However, the evidence on the placement of Eschatocerini is slightly different. Our analysis suggests that the Eschatocerini belong to the Cynipidae (s. str.), [...]
(a) Inquilines-first scenario. Gall inducers evolved repeatedly from inquilines, which always represent an intermediate stage in the origin of true gall inducers. (b) Gallers-first scenario. In this scenario, inquilines always represent gall inducers that have lost the ability to initiate galls. (c) Parsimonious gallers-first scenario.

Figure S4. Testing the potential effect of long-branch taxa on phylogenetic results using IQ-Tree under the C60 + I + G5 model rather than PhyloBayes. Otherwise, this is an identical replicate of the analyses shown in the Main Text (Figure 4). Branch support values (ultrafast bootstrap) are only shown if they are less than 100%. (A) Analysis of the full taxon set. (B) Cecinothofagus excluded. (C) Eschatocerus excluded. (D) Outgroups excluded. (E) Cecinothofagus and Eschatocerus excluded. (F) Eschatocerus and outgroups excluded. Results are identical to those obtained with PhyloBayes and the CAT-F81 model (Figure 4), except for a minor and weakly supported difference in the positions of Ceroptres and Iraella when Eschatocerus and outgroups are excluded (F).

Figure S5. Results from gene tree concordance analysis. (A) Gene tree consistency indices. (B) Gene tree concordance factors. (C) Site concordance factors. The shading of nodes corresponds to the certainty or concordance factor values, with darker shades corresponding to higher values.

Figure S6. Comparison of IQ-Tree C60 analyses using different criteria for taxon completeness and alignment quality. (A) Analysis of the T34-G532 dataset, including the alignments with less than 26% of site columns removed by Gblocks. (B) Analysis of the T30-G1898 dataset, including the alignments with less than 10% of site columns removed by Gblocks. The latter analysis provides notably less robust resolution of the Qwaqwaiini (66% vs. 98% bootstrap support) and the Aulacideini (90% vs. 96% for a different clade). The positions of Ceroptresini and Eschatocerini remain uncertain in both analyses.

Figure S7. Results from the IQ-Tree analysis of the combined UCE data from this study and that of Blaimer et al. (2020). Branch support values (ultrafast bootstrap) are only shown if they are less than 100%.

(Table 2). Our Cynipini selection included Protobalandricus spectabilis, which induces galls on Quercus section Protobalanus (Trelease) A. Camus oaks in California, and seven other species from diverse Cynipini genera: Andricus Hartig, Belonocnema Mayr, Biorhiza Westwood, Druon Kinsey and Neuroterus Hartig. In the Aulacideini, we included two gallers of Asteraceae (Isocolus centaureae Dyakonchuk and Aulacidea tavakolii Melika), one galler of Lamiaceae (Hedickiana levantina (Hedicke)) and one galler of Papaveraceae (Fumariphilus hypecoi (Trotter)), thus covering much of the diversity in host plant preferences in the group. The last species, F. hypecoi, was originally placed in the genus Aylax Hartig, but has been known for some time to belong to the Aulacideini (Nieves-Aldrey, 2022; Ronquist et al., 2015) rather than the Aylacini (s. str.). Our Diastrophini selection included one inquiline (Periclistus) and one gall inducer (Diastrophus), covering both of the major life history strategies in the tribe. Our Synergini (s. str.) selection was unfortunately restricted to the most species-rich genus, Synergus Hartig, but it included one gall inducer (S. itoensis) and three inquilines (S. gifuensis Ashmead, S. japonicus

Table S2. Completeness of different BUSCO gene sets per taxon. The table gives the number of genes of each gene set present in each taxon, as well as the proportion of the total set being present in the taxon. Taxon names correspond to those used in the datasets. Note that, because of taxonomic changes or incorrect annotations in public databases, Beloc_tr is used for Belonocnema kinseyi, Calli_sp for Neuroterus valhalla and Andr_qln for Druon quercuslanigerum.
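The two quantities reported per taxon in this table (gene count and proportion of the gene set) can be computed as in the sketch below. It assumes a simple in-memory representation, with each taxon mapped to the identifiers of the genes recovered for it; the function name `completeness_per_taxon` and the gene identifiers in the usage note are hypothetical, and this is not the script used to produce the table.

```python
from typing import Dict, Iterable, Set, Tuple


def completeness_per_taxon(genes_by_taxon: Dict[str, Iterable[str]],
                           gene_set: Set[str]) -> Dict[str, Tuple[int, float]]:
    """Return, for each taxon, (number of gene-set genes present, proportion of the set)."""
    total = len(gene_set)
    if total == 0:
        raise ValueError("gene_set must not be empty")
    result: Dict[str, Tuple[int, float]] = {}
    for taxon, genes in genes_by_taxon.items():
        # Count only genes that belong to the reference set; duplicates count once.
        present = len(set(genes) & gene_set)
        result[taxon] = (present, present / total)
    return result
```

Called with a four-gene reference set and a taxon labelled `Beloc_tr` that has two of those genes plus one outside the set, the function returns a count of 2 and a proportion of 0.5 for that taxon.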