The Chalcidoidea bush of life: evolutionary history of a massive radiation of minute wasps

Chalcidoidea are mostly parasitoid wasps that include as many as 500 000 estimated species. Capturing phylogenetic signal from such a massive radiation can be daunting. Chalcidoidea is an excellent example of a hyperdiverse group that has remained recalcitrant to phylogenetic resolution. We combined 1007 exons obtained with Anchored Hybrid Enrichment with 1048 ultra‐conserved elements (UCEs) for 433 taxa including all extant families, >95% of all subfamilies, and 356 genera chosen to represent the vast diversity of the superfamily. Going back and forth between the molecular results and our collective knowledge of morphology and biology, we detected bias in the analyses that was driven by the saturation of nucleotide data. Our final results are based on a concatenated analysis of the least saturated exons and UCE datasets (2054 loci, 284 106 sites). Our analyses support an expected sister relationship with Mymarommatoidea. Seven previously recognized families were not monophyletic, so support for a new classification is discussed. Natural history in some cases would appear to be more informative than morphology, as illustrated by the elucidation of a clade of plant gall associates and a clade of taxa with planidial first‐instar larvae. The phylogeny suggests a transition from smaller soft‐bodied wasps to larger and more heavily sclerotized wasps, with egg parasitism as potentially ancestral for the entire superfamily. Deep divergences in Chalcidoidea coincide with an increase in insect families in the fossil record, and an early shift to phytophagy corresponds with the beginning of the “Angiosperm Terrestrial Revolution”. Our dating analyses suggest a middle Jurassic origin of 174 Ma (167.3–180.5 Ma) and a crown age of 162.2 Ma (153.9–169.8 Ma) for Chalcidoidea. During the Cretaceous, Chalcidoidea may have undergone a rapid radiation in southern Gondwana with subsequent dispersals to the Northern Hemisphere. This scenario is discussed with regard to knowledge about the host taxa of chalcid wasps, their fossil record and Earth's palaeogeographic history.


Introduction
Chalcidoidea (jewel wasps, chalcidoid wasps, hereafter called chalcid wasps) are among the most speciesrich, ecologically important, biologically diverse and morphologically disparate groups of terrestrial organisms.These minute wasps (mostly 0.5-2 mm in size) are ubiquitous in almost every terrestrial habitat on Earth.Their diversity is staggering with 2731 genera and 27 021 species placed in 27 families and 87 subfamilies (Table S1a) with estimates of >500 000 species (Heraty, 2009;Noyes, 2019).Although most species are parasitoids, phytophagous species are known from 11 families (B€ ohmov a et al., 2022;Burks et al., 2022).Their animal host range includes 13 insect orders, spiders, ticks, mites, pseudoscorpions and even gallforming nematodes (Austin et al., 1998;Gibson et al., 1999).Chalcid wasps attack all life stages of their hosts from eggs to adults, as internal or external parasitoids, and they can be primary parasitoids, hyperparasitoids (parasitoids of parasitoids) or even tertiary parasitoids (parasitoids of hyperparasitoids).The economic importance of Chalcidoidea in pest management is unparalleled and they are widely used in biological control programmes against major pests throughout the world (Noyes and Hayat, 1994;Heraty, 2009).
A few studies have addressed higher-level relationships within Chalcidoidea, although with only a sparse sampling of genes (Sanger data sets) or taxa (NGS datasets).Munro et al. (2011) used 18S + 28S ribosomal DNA for 649 species of Chalcidoidea in 19 families and 343 genera; Heraty et al. (2013) used a combination of 18S + 28S + morphology for 283 species in 19 families and 268 genera; Peters et al. (2018) analysed 3239 genes from transcriptomes for 37 species in 16 families and 35 genera; and Zhang et al. (2020) extended the transcriptome dataset to 5591 genes for 55 species in 17 families and 48 genera.Despite these efforts, the higher-level relationships of Chalcidoidea remain largely unresolved.Genome-scale data (transcriptomes) have proven particularly frustrating, presumably because of the lack of signal associated with an old, rapid radiation and the increasing probability of observing conflicting signals between markers (Peters et al., 2018;Zhang et al., 2020).When chalcid wasps are included in studies of the Hymenopteran treeof-life, conflicts or lack of signal that are reflected in poor statistical support of (some) nodes are highlighted in all datasets: Branstetter et al. (2017) 854 ultraconserved elements (UCEs), nine species of Chalcidoidea in nine families and nine genera; Peters et al. (2017) 3256 protein coding genes, six species in six families and six genera; and Tang et al. (2019) mitochondrial genomes, seven species in six families and seven genera.Blaimer et al. (2023) with 1100 UCEs addressed the largest sampling of Chalcidoidea for genomic data with 148 species in 23 families and 142 genera, but the results within Chalcidoidea were still poorly resolved, with several families not monophyletic.With the scarce taxonomic sampling of previously published phylogenomic datasets, it has not been possible to test the monophyly of all chalcid families or subfamilies that were questioned by Sanger datasets or morphology (Munro et al., 2011;Heraty et al., 2013).These earlier studies identified a number of incertae sedis taxa that can range from a single species to a clade.These lineages are never included within defined (sub)families and defy accurate placement in the resulting trees.Is the lack of definitive placement the result of limited sampling of taxa or genetic data, or could these represent the survivors of independent, possibly old evolutionary lineages, that still need to be resolved.More importantly, reduced taxonomic sampling coupled with poor resolution of phylogenetic trees limits our understanding of the drivers that fuelled this outstanding diversity of life forms and biologies across space and time.This overview of previous works suggests that the early evolution of Chalcidoidea is likely to represent a difficult phylogenetic problem, and to date, molecular data alone have not helped to resolve many of the problems.A major collaborative effort to provide a resolved morphological tree for the superfamily (233 morphological characters scored on 283 species in 19 families; Heraty et al., 2013) did recover several family-level groups not found in the earlier analysis of ribosomal markers alone, but still failed to recover families or relationships that are considered to be wellsupported or provided controversial results.
Within Chalcidoidea, when phylogenomic studies focused on smaller taxonomic units, the results were either in strong agreement with morphology, behaviour or biogeographic hypotheses (e.g.Baker et al., 2020;Rasplus et al., 2020), or strongly conflicting on some areas of the tree with intuitive and previously supported hypotheses (e.g.Cruaud et al., 2021;Zhang et al., 2022).When there is conflict, there is always the possibility that properties of the genomic data or confounding signal in morphological data may be affecting the results.Only with a thorough analysis and vetting of the data can we begin to understand the interplay between systematic bias and morphological convergence.
While genomic data offer great promise to resolve the tree of life, analytical challenges must be overcome.When more markers are analysed, the probability of observing conflicting signal among them increases (Kumar et al., 2012;Philippe et al., 2017;Zhang et al., 2020Zhang et al., , 2022)).Highly supported trees can be inferred that, without feedback from taxonomists, are considered to be accurate even though inferences may be flawed (Wiens, 2004;Zhang et al., 2022).Heterogeneity in base composition and in evolutionary rates inferred for both taxa and markers are major causes of analytical bias in phylogenomic analyses (e.g.Boussau et al., 2014;Romiguier and Roux, 2017;Borowiec et al., 2019;Rasplus et al., 2021).Recent studies have brought to light an old nemesis, mutational saturation (Philippe and Forterre, 1999), as a potentially major source of errors, especially for deep-time inferences (Borowiec et al., 2015;Borowiec, 2019;Duchêne et al., 2022).Ideally, to detect and reduce inference biases, analysed matrices should be rich both in taxa (Heath et al., 2008) and molecular characters of different origins (e.g.coding vs. noncoding; Reddy et al., 2017).They should also be mixed with or interpreted in the light of morphology, natural history and other data types (Wiens, 2004), which also can be misleading and must be interpreted cautiously.Thorough analysis of very large datasets that implement many proof-checking steps is computationally intensive.In addition, it is still impossible to perfectly describe evolutionary processes with mathematical models, which inevitably introduces bias (Kumar et al., 2012;Reddy et al., 2017).When independent data types do not converge towards the same results, molecular trees should certainly be acknowledged as valuable contributions, but considered only as hypotheses, instead of being hailed as "the resolved tree of life".
In this study, we brought together taxonomists and museum curators to assemble a taxon-and markerrich dataset [exons (protein coding genes) + UCEs and their flanking regions] for Chalcidoidea.To find our way through a forest of phylogenetic trees, we evaluated topologies obtained from each genomic data type in the light of our knowledge of chalcid morphology and natural history.Taxa or groups for which we inferred unlikely relationships were used to detect and reduce potential bias in the molecular datasets.In particular, we evaluated results for five specific benchmark relationships.Departure of results from benchmarks was not necessarily regarded as "wrong" but it was an indication that analytical issues needed to be carefully examined.(i) Basal relationships of Mymaridae and Rotoitidae.This has been proposed based on morphological characters (Gibson and Huber, 2000;Heraty et al., 2013).It should be noted that during the preparation of this manuscript, it was brought to our attention that the name Baeomorphidae is a senior synonym of Rotoitidae.Here, both names are used in text and figures up to the discussion section in which only Baeomorphidae as well as new family/subfamily names resulting from this study are used.(ii) Monophyly of Eurytomidae, including Heimbrinae (Lotfalizadeh et al., 2007;Gates, 2008).(iii) Monophyly of Chalcididae including Cratocentrinae; although somewhat difficult to demonstrate with molecular data alone (Cruaud et al., 2021), this is considered to be a well-supported family based on several morphological synapomorphies (Wijesekara, 1997;Heraty et al., 2013;Cruaud et al., 2021).(iv) Monophyly of Eurytomidae + Chalcididae (Cruaud et al., 2021).(v) Monophyly of the Eupelmidae in the broad sense (including Neanastinae and Metapelmatinae), which although questioned in the past (Gibson, 1989), has been proposed based on a number of derived and correlated morphological features associated with the ability to jump (Gibson, 1986b(Gibson, , 1989(Gibson, , 2008;;Heraty et al., 2013).(vi) Monophyly of Agaonidae including Sycophaginae, which is supported by a complex of morphological and life-history traits (Bou cek, 1988;Heraty et al., 2013), and their association with figs.In essence, these benchmarks were used in a process of reciprocal illumination to assess the results of our current molecular and morphological hypotheses (Hennig, 1966;Mooi and Gill, 2016).
Finally, we have used the combined exons + UCEs least biased dataset to provide the foundation for a new classification of the superfamily that was recently published (Burks et al., 2022).Here we also discuss the evolutionary history of Chalcidoidea and infer a timeline for its origin and worldwide colonization.

Taxonomic sampling
Representatives of all extant families, 80 extant subfamilies (95.2%), 68 extant tribes (77.3%) and 356 genera (13%) of chalcid wasps were included.Representatives of eight incertae sedis taxa at the suprafamilial or tribal levels, and of one new subfamily were also included.Our combined analysis included a total of 433 taxa (414 ingroups, 19 outgroups; Table S1b), of which 414 had sequences for exons [anchor hybrid enrichment (AHE) sequences] while 407 had sequences for UCEs.Exons and UCEs were obtained from the same species in 57% of the ingroup taxa, and congeneric specimens (monophyletic genera) were used in the remaining 43%.For seven taxa (four outgroups, three ingroups), exons and UCEs were obtained from species in different genera (all very closely related for ingroups).Our outgroups include a diverse array of Proctotrupomorpha, including Platygastroidea (two genera), Cynipoidea (five genera), Proctotrupoidea (two genera), Diaprioidea (five genera) and Mymarommatoidea (two genera).These outgroups form a grade in which the monophyletic Chalcidoidea was embedded in all recent analyses of Hymenoptera relationships (Heraty et al., 2011;Klopfstein et al., 2013;Branstetter et al., 2017;Peters et al., 2017;Blaimer et al., 2023).
The supplementary documents also include results obtained with a larger exon dataset of 520 taxa (494 ingroups, 26 outgroups; hereafter referred to as the AHE520 dataset).AHE414 is the subset of AHE taxa that could be paired with the UCE dataset (Table S1b).As compared to the AHE414 dataset, sampling within some families in the AHE520 data was larger but the same evolutionary lineages were included.We chose not to combine all of the taxa in the AHE520 dataset with the UCE dataset to (i) avoid potential issues with adding considerably more missing data, (ii) enable better comparison between properties and phylogenetic signal brought by the exons and UCEs, and (iii) decrease computational burden.Nevertheless, trees obtained with the AHE520 dataset (exons with RY coding of the third codon position and coded as amino acids) were compared with those obtained with the AHE414, the UCE, and the combined datasets.

Library preparation and sequencing
DNA extraction.DNA was extracted nondestructively using the DNeasy Blood and Tissue kit (Qiagen, Hilden, Germany) either at CBGP (Montferrier-sur-Lez, France); NMNH (Washington DC, USA) or UCR (Riverside, CA, USA), as outlined in previous work (Blaimer et al., 2016a, b;Cruaud et al., 2019;Baker et al., 2020;Zhang et al., 2020).DNA was quantified with a Qubit 2.0 Fluorometer (Invitrogen/ Thermo Fisher Scientific, Waltham, MA, USA).Vouchers were subsequently remounted on cards or slide mounted and deposited in either institution or returned to their owner (Table S1b).When possible, DNA extracts were shared among laboratories.
Exons.Exons were obtained for 520 taxa following the protocol in Zhang et al. (2022), of which 51 taxa were retrieved from previously published transcriptomes or genomes (Peters et al., 2018;Zhang et al., 2020), and 469 taxa (363 used in combined dataset) were enriched using the anchored hybrid enrichment (AHE) probe sets (Hym_Ich or Hym_Cha; Baker et al., 2020;Zhang et al., 2022).See Supporting information Methods S1 for further details.

Assembly of datasets
Exons.Assembly of loci, orthology assessment and contamination checking for the AHE520 dataset followed Zhang et al. (2022).Exons were selected for 414 taxa (AHE414) that were compatible for data combination with the UCE taxa, either as the same extraction, same species, same genus, or in three cases as closely related genera (Table S1b).However, the assembly protocol used for the AHE520 dataset was overly stringent, resulting in 28.5% of the nucleotides in the AHE414 dataset being ambiguous or missing.Therefore, raw data for the anchor-enriched taxa selected to be paired with UCEs (Table S1b) were re-processed to increase matrix completeness.Quality filtering and adapter trimming were performed with Trimmomatic-0.36 (Bolger et al., 2014) (LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:90).Paired reads were then analysed with HybPiper (Johnson et al., 2016) to retrieve exons, introns and supercontigs (introns-exons-introns).All exons of the AHE520 dataset (without contamination) were used as the reference database.One fasta file was then created with all sequences for each exon identified by HybPiper.However, comparison with the 989 reference AHE520 exons revealed that HybPiper exons were shorter than the reference exons.Therefore, exons were instead retrieved from HybPiper supercontigs using the longest sequence of each of the 989 exons of the AHE520 dataset as a reference.Supercontigs for each locus were aligned with the reference longest exon for that locus using LASTZ release 1.02.00 (Harris, 2007).In each of the 989 alignments, nucleotides before the 5 0 end and after the 3 0 end of the reference exons were trimmed.The final 989 loci were aligned with MAFFT and translated to amino acids.This latter step revealed that 24 reference exons contained one to two small introns.In these cases, loci were cut into coding versus noncoding sequences, and noncoding sequences were discarded.We ended up with 1007 exons (final % of ambiguous/missing nucleotides = 19.1%).
UCEs.Assembly of UCEs followed Cruaud et al. (2019).Quality control checks were performed with FastQC v.0.11.2 (Andrews, 2010).Quality filtering and adapter trimming were performed with .Overlapping reads were merged using FLASH-1.2.11 (Magoc and Salzberg, 2011) and demultiplexed with a bash custom script (Cruaud et al., 2019).Assembly of cleaned reads was performed with CAP3 (Huang and Madan, 1999) or, when the number of paired reads exceeded 300 000, with Trinity (Grabherr et al., 2011).Contigs were aligned to the set of 1432 reference UCEs that resulted from the assembly of the 2749 probes of Faircloth et al. (2015) using LASTZ.Contigs that aligned with more than one reference UCE were removed and different contigs that aligned with the same reference UCE were filtered out using Geneious R11.1.44 (https://www.geneious.com;samfiles were uploaded in Geneious and sorted by the number of contigs that aligned with the reference loci to identify multiple hits).Only UCEs for which sequences were available for >50% of the taxa were kept in the next steps of the analysis (N = 1048).
From this point, methods only refer to the AHE414 dataset that was formally compared and then combined with the UCE dataset and to the UCE dataset.Details for the methods used for treatment of the AHE520 dataset can be found in the Methods S1.

Quality controls of datasets
Alignment cleaning.Loci (exons and UCEs) were aligned with MAFFT using the -linsi option (Katoh and Standley, 2013).Exons (AHE) were translated to amino acids using EMBOSS (Rice et al., 2000) and sequences with stop codons were removed.Two successive rounds of TreeShrink (Mai and Mirarab, 2018) were performed on each locus (exons analysed as nucleotides) to detect and remove abnormally long branches in individual gene trees.The per-species mode was used and b (the percentage of tree diameter increasing from which a terminal should be removed) was set to 20.Loci were re-aligned with MAFFT after each round of TreeShrink.Gene trees were inferred with IQ-TREE v.2.0.6 (Minh et al., 2020b) with the best-fit model selected by ModelFinder (Kalyaanamoorthy et al., 2017).Positions with >50% gaps and sequences with >25% gaps were removed from the alignments of UCEs using SEQTOOLS (package PASTA; Mirarab et al., 2014) to speed up inference of individual gene trees.

Contaminations.
A BLAST search of all DNA sequences on themselves (i.e. by using sequences of all exons/UCEs for all samples as query and target sequences) was performed (blastn with -evalue 1e-20 and -max_target_seqs 2).Only hits for which the same locus was identified as both target and query sequence, and for which samples were different for target and query sequences were kept for downstream analysis.Putative contaminations were identified using a script that scored hits according to four criteria appropriate for our data (see https://github.com/mjy/cgq for details): (i) taxon difference: either subfamily or family was different between target and query sequences; (ii) proportional difference: the percentage of similarity between target and query was >99.95; (iii) proportional length difference: length of match divided by the smaller of length of the target and query was >0.95; (iv) plate similarity: target and query sequences were obtained from specimens processed on the same plate for DNA extraction/library preparation.When a criterion was met, a score of 1 was attributed, otherwise the score was 0. A composite score was calculated as the sum of criteria 1 to 4. When the composite score equalled 4 both target and query loci were considered as potentially contaminated.The ratio of query species DNA concentration to target species DNA concentration was calculated.When ratio ≦0.3 the sequence with the smaller qubit concentration was excluded from the dataset, otherwise both sequences were excluded.

Properties of taxa, loci, trees and exploration of bias
Workflow to detect and decrease bias.We focused on saturation, a property that could artificially distort relationships of an old group such as Chalcidoidea.We also explored whether properties of taxa (GC content and heterogeneity of evolutionary rates between taxa) could explain placements that were unexpected based on our six benchmark criteria (described in the introduction) or life-history traits.
We analysed data subsets that were less and less saturated, and analysed whether resulting topologies had a better fit to our benchmark criteria, contemporary classifications or biological (natural history) data.Reciprocally, morphological/biological data were re-examined to highlight convergences and assess support for unexpected relationships.Thus, to assess fit to morphological data we for example examined whether currently recognized (sub)family-level taxa were monophyletic in resulting trees.To assess fit to biological data, we determined whether clades supported by biological properties were monophyletic in resulting trees.
In order to incrementally decrease saturation, exons were analysed as three datasets: first as nucleotide sequences (exons), then with RY coding of the third codon position of each amino acid (exonsRY), and finally as amino acids (exonsAA; Table 1).RY coding of the third codon position of each amino acid was performed with the script RYplace.py(Ballesteros and Hormiga, 2016) and translation to amino acids was performed with EMBOSS.
In order to reduce saturation in the UCE dataset, nucleotide positions in each locus were kept only when they were present in ≥50%, 70% or 90% of the taxa.This resulted in UCE datasets with levels of saturation comparable to that of the concatenated exonAA dataset (Fig. 1).In addition, to avoid bias resulting from misalignment of extremities of short sequences, sequences with >25% gaps (i.e.shorter/incomplete sequences) were removed from each UCE (all trimmings with SEQTOOLS).Three corresponding UCE datasets were thus built, hereafter referred to as UCEs50-25, UCEs70-25 and UCEs90-25 (Table 1).
Calculation and analysis of properties.The R 2 of the linear regression of uncorrected p-distances against inferred distances in individual gene trees was used as a proxy for saturation, and  S2b.
*Note that the locus shared among the exons and the UCE dataset was removed from the UCE dataset before running the combined analysis.
saturation of loci was calculated as in Borowiec et al. (2015).Other properties of loci, properties of concatenated datasets and GC content of taxa (Table S2a) were calculated with AMAS (Borowiec, 2016).Long branch (LB) score heterogeneity for taxa in trees (taxon's percentage deviation from the average pairwise distance between taxa on a given tree) was used as a proxy of evolutionary rate of taxa and was calculated with TreSpEx (Struck, 2014).Statistical analyses of loci, tree and taxa properties were performed in R ( R Core Team, 2018).Hierarchical clustering of taxa based on GC content and LB scores was performed with the package cluster (Maechler et al., 2018).Strength and direction of association between variables were assessed (i) with Spearman's rank correlation using PerformanceAnalytics (Peterson and Carl, 2018) or (ii) by fitting linear models (with log-transformation of variables when relevant (Ives, 2015).Significant deviations from model assumptions (normality of residuals, homoscedasticity) and absence of highly influential data points were detected with DHARMa (Hartig, 2022) and performance (L€ udecke et al., 2021).A likelihood ratio test was used to test the significance of fixed factors.A Tukey post hoc test was used when more than two groups were compared [packages emmeans (Lenth, 2021) and multcomp (Hothorn et al., 2008)].Graphs were generated with ggplot2 (Wickham, 2016).

Combined dataset
The exonsAA and UCEs90-25 datasets that were considered as the best hypotheses (see "Results" section) were combined (Table 1).Before combination, overlap between exons and UCEs was tested with reciprocal BLAST (UCEs not trimmed with SEQTOOLS; blastn with -evalue 1eÀ20).Only a single locus was shared between datasets and it was removed from the UCE dataset before combination.
For IQ-TREE analyses, loci were merged and the resulting dataset was analysed (i) without partitioning, (ii) with one partition for each locus and (iii) with one partition for each data type (for the combined dataset only; one partition for the exons another for the UCEs).Best-fit models for each partition were selected with the Bayesian Information Criterion as implemented in ModelFinder.FreeRate models with up to ten categories of rates were included in tests for the unpartitioned exon and UCE datasets, but only common substitution models were tested when datasets were partitioned by locus.The candidate tree set for all tree searches was composed of 98 parsimony trees +1 BIONJ tree and only the 20 best initial trees were retained for NNI search.Statistical support of nodes was assessed with ultrafast bootstrap (UFBoot, Minh et al., 2013) with a minimum correlation coefficient set to 0.99 and 1000 replicates of SH-aLRT tests (Guindon et al., 2010).Gene (gCF) and site (sCF) concordance factors (Minh et al., 2020a) also were calculated.
In order to account for possible heterotachy, the combined exonsAA + UCEs90-25 dataset was also analysed with a GHOST model (Crotty et al., 2020) in IQ-TREE with four mixture classes in conjunction with the best-fit model chosen by ModelFinder that was fitted to each of the two partitions (i.e. one partition for each type of marker).Exploration of tree space and inference of statistical  1.For each panel, letters above box plots reflect pairwise comparisons of marginal means estimated from the best-fit models; distributions sharing a letter do not differ significantly.Points: raw data (Table S2a).In (e), saturation was assessed by calculating the R 2 of the linear regression of uncorrected p-distances against inferred distances in individual gene trees.Highest R 2 are for least saturated loci.The scale of the Y axis is reversed to better show decrease in saturation.(f) The convergence of trees as saturation decreases.The Y axis shows the relative RF distance between pairs of trees obtained with either the combined exons or the combined UCEs.The X axis show the absolute value of the difference between the medians of the R 2 of the linear regression of uncorrected p-distances against inferred distances in gene trees that were combined to get the compared trees [cf.(e)].Four comparisons were performed in each case as datasets were analysed with and without partitioning.
support for nodes followed those used with mixed models.Analyses with the CAT-GTR model to deal with substitutional heterogeneity were attempted with Phylobayes-MPI v.1.9(Lartillot et al., 2013) but were computationally not tractable as expected from previous reports (PhyloBayes-MPI 1.9 manual: https://github.com/bayesiancook/pbmpi and Whelan and Halanych (2017); memory issues with all tests using ≤96 CPUs and 192Go for a single chain).
For parsimony analyses with TNT, we used New Technology Search with default Sectorial Search, Ratchet, Drifting and Tree Fusing search criteria, and finding the minimum length ten times.We only analysed the final datasets: combined exonsAA+UCEs90-25 [433 taxa], UCEs90-25 [407 taxa], exonsAA [414 taxa] using the appropriate amino acid (prot) or DNA partition.Bootstrap support was assessed using absolute frequencies and 1000 replicates.

Divergence time estimates
Time-calibrated trees were generated with MCMCtree (Yang and Rannala, 2006).Twenty-one fossils (Table S3a) were used as calibration priors (see for rationale for selecting calibrations).Using lognormal or even normal priors may excessively restrict the breadth of posteriors (e.g. Brown and Smith, 2018).Following previous studies (e.g.Cruaud et al., 2012a;Tong et al., 2015;Blaimer et al., 2023) and given the large uncertainty for the Chalcidoidea fossil record, uniform distributions were used as calibration densities.Analyses were run with uncorrelated relaxed clock models.The combined IQ-TREE tree (partitioning by data type) was used as the input tree.Five datasets, each composed of 10 000 amino acid sites randomly selected (custom script in https://github.com/acruaud/saturniidae_phylogenomics_2022; Rougerie et al., 2022) from the exonsAA partition +10 000 nucleotide sites randomly selected from the UCEs90-25 partition, were used as sequence data to make computation tractable.Each dataset was partitioned into two partitions, exonsAA (WAG + G model) and UCEs (GTR + G model).Four chains were run for each dataset; 20 000 generations were discarded as burnin and chains were run for 2 million generations with sampling every 100 generations.Convergence was assessed in Tracer (Rambaut et al., 2018).Possible conflicts between priors and data were assessed by running MCMCtree without sequence data.Posterior estimates obtained with the different datasets were compared and combined with LogCombiner 2.6.0 (Bouckaert et al., 2019).

Historical biogeography
Distributions of species in each genus were mined from the Universal Chalcidoidea Database (Noyes, 2019).Occurrences were double-checked by the authors who are experts in each group.Species used as biocontrol agents or accidentally introduced with their host (plant or insect; Rasplus et al., 2010;Noyes, 2019) were removed.Genera were scored as present/absent in the six following biogeographical areas: Neotropical, Nearctic, Afrotropical, Palaearctic, Oriental and Australasian (Table S4a).The chronogram built with MCMCtree was used as input, but only one specimen per genus was included and outgroups were pruned to avoid artefacts.We acknowledge that a species-level analysis of ancestral areas would have been more appropriate than at the genus-level, because by using the genus-level the results could be biased towards more widespread ancestors (BioGeoBears manual).However, in old, hyperdiverse and widely distributed groups, covering the entire taxonomic and geographical diversity is impossible.Instead of keeping single or a couple of species that were randomly sampled and could bias results towards too restricted areas we chose to use generic distribution as input.Ancestral area estimations were performed in the R package BioGeoBEARS 1.1.1(Matzke, 2014).Dispersal-Extinction-Cladogenesis (DEC; Ree and Smith, 2008), BAYAREALIKE and DIVALIKE (Ronquist, 1997) models were used with and without considering the jump parameter for founder events (+J; Matzke, 2014).Model selection was performed based on statistical (AICc; Matzke, 2022) and nonstatistical (i.e.biological and geographical) considerations (Ree and Sanmartin, 2018).The maximum number of areas that a taxon could occupy was set to 6.To account for the main geological events that occurred during the diversification of Chalcidoidea, we defined five time periods with different dispersal rate scalers: (i) from their mean crown age to 145 Ma (Jurassic); (ii) 145 to 100 Ma (Early Cretaceous); (iii) 100 to 66 Ma (Late Cretaceous); (iv) 66 to 23 Ma (Palaeogene); and (v) 23 Ma to present (Table S4b).

Exploration of biases
The initial set of exons (AHE414 exons as nucleotides; Table 1) had a high number of loci recovered across taxa and was less saturated and less GC rich than the initial set of UCEs (UCEs50-25) that, in comparison, contained longer loci and more parsimony informative sites (Fig. 1; Table S2a).All of our phylogenetic analyses from either exons or UCEs returned somewhat similar results (Figs 2, 3 and S1; Appendix S1).Chalcidoidea was always monophyletic, as were many of the currently recognized families and subfamilies.
However, in the most saturated datasets (exons as nucleotides and UCEs50-25), there was overall lack of support for many of our benchmark relationships, with discordance in (i) the placement of Mymaridae or Rotoitidae (=Baeomorphidae) as sister to the remaining Chalcidoidea; (ii-iv) Eurytomidae and Chalcididae were neither monophyletic or grouped together; (v) Eupelmidae were not monophyletic; and (vi) Sycophaginae did not group with Agaonidae (Fig. 2).Clustering analyses of taxa properties (GC content and LB scores) did not provide evidence that these initial unexpected placements were driven by compositional bias or long branch attraction (Fig. S2; Table S5).Indeed, monophyletic groups with different GC content as well as polyphyletic groups with similar LB scores were recovered in the different trees.
As we reduced saturation (Fig. 2), we began to see some of our benchmarks stabilize in the results, such that (i) Mymaridae were sister to the remaining Chalcidoidea, (ii) Eurytomidae were monophyletic; (iii-iv) Chalcididae + Eurytomidae were monophyletic, but Chalcididae were paraphyletic with Cratocentrinae sister to the remaining Chalcididae + Eurytomidae.Other topological changes that may be attributed to reduction of mutational saturation are listed in Table S6.2013; 300 taxa).Vertical red bars with an X were not recovered as monophyletic in that analysis.Faded colour bars represent that the clade was included but relationships alternated.P indicates paraphyletic lineages.Clades without an X or bar were supported; the lack of a bar indicates the clade was supported but the deeper relationships were not.Higher group names refer to the classification before Burks et al. (2022).Family abbreviations expanded in Table S1.S2c) showed that trees obtained from the likelihood analyses of the exons and UCEs were the most similar when saturation was the lowest (i.e. with exonsAA and UCEs90-25).They also were more concordant with most, but not all, of our benchmark criteria.The exonsAA and UCEs90-25 datasets were thus combined to produce a phylogenetic tree on which phylogenetic relationships were discussed and dating analyses were performed.

Phylogenetic relationships
Differences were observed when the combined dataset was partitioned by data type (AA vs. UCEs) or by locus in IQ-TREE, but they were all observed in poorly supported sections of the topology (Fig. S1).The GHOST model did not improve the topology and one higher-level grouping disappeared (the "Tiny Wasp"; Fig. S1).Given that short loci generated numerical instability for the estimation of model  S2d).(c) Comparison of branch length for the backbone nodes and other ingroup nodes.Points: raw data (Table S2c).S1 for complete information on sampling) with successive grey and white boxes grouping the tip labels.The new familial classification from Burks et al. (2022) is shown to the right.For clarity, ancestral ranges are given only up to family level and only for the BAYEAREALIKE + J model (which was selected by AICc).All inferences of ancestral ranges are provided in Fig. S5.Inferences of ancestral ranges were conducted with only one specimen per genus as shown with brackets that connect tips.Current distribution of genera is shown with coloured boxes at tips.Sampling area of specimens is indicated in tip labels.NEO = Neotropical; NEA = Nearctic; AFR = Afrotropical; PAL = Palaearctic; ORI = Oriental; AUS = Australasian.UKN = Unknown when collection data are unavailable.Stars indicate that specimens were sampled in areas where species was introduced or not yet cited.Sampling area for the specimen used for sequencing exons is listed first, sampling area for the specimen used for sequencing UCEs is listed second; n.a. is used when no specimen was sequenced and only one sampling area is reported when exons and UCEs were obtained from specimens sampled in the same areas (or from the same specimen).Unless specified, nodes are supported by SHaLRT ≥80%, UFBoot ≥95% and sCF ≥34.3 (minimum support for a family that is well defined morphologically, Trichogrammatidae).Nodes with a grey circle are supported by SHaLRT <80% or UFBoot <95%; nodes with a black circle are supported by SHaLRT <80% and UFBoot <95%; nodes with a black triangle are supported with sCF <34.3.Images on the left of tentative family names are all at the same scale.Images on the right of tentative family names have been magnified.Photos ©K.Bolte (Baeomorphidae); ©J.-Y.Rasplus (all others).parameters as reported by IQ-TREE (partitioning by locus), we chose to rely on the IQ-TREE mixed model combined tree (Figs 4a and 5) for discussion of phylogenetic relationships.
More UCEs than exons supported the IQ-TREE combined tree (gCF; Fig. 4b), but a significant component of discordance was due to the lack of resolution of gene trees (gDFP).Distributions of sCF for nodes were identical for the two types of markers (Fig. 4b).
Statistical support was higher for longer branches (Fig. S3A,B).Absolute RF distances between the exon-sAA and the combined trees or the UCEs90-25 and the combined trees were close (ten more branches shared between the UCEs90-25 and the combined trees; Table S2c).The combined tree was thus considered as an acceptable synthesis of the two types of markers.
Nearly identical results were recovered from our combined IQ-TREE and TNT analyses (Figs 2, 3 and S1, RF between IQ-TREE mixed model and TNT trees = 98).Of the 25 extant families (Table S1a), 18 were recovered as monophyletic with strong support, while seven were recovered as paraphyletic or polyphyletic (Figs 3, 4a and S1).The worst case was Pteromalidae, which were spread across the entire tree (PTER; Fig. 5).Aphelinidae and Eulophidae were both monophyletic, but only if one genus was excluded from each family, Cales and Trisecodes, respectively.Finally, Agaonidae (two lineages), Chalcididae (two lineages) and Eupelmidae (five lineages) were polyphyletic.
There were some higher-level relationships inferred that may reflect biology more so than morphology (Figs 4a and 5).A clade of gall-associates emerged (hereafter referred to as the "Gall Clade"), which had not been proposed previously based on morphology.A clade of chalcids with planidial larvae as already found by Zhang et al. (2022) also was recovered.A group of "Tiny Wasps" (usually <4 mm in length and soft-bodied) mostly associated with Hemiptera was inferred as a monophyletic group with IQ-TREE.However, most analyses supported this group as a grade of small soft-bodied chalcids leading to a clade of larger more robustly sclerotized wasps that include Chalcididae and Eurytomidae.This latter higher level grouping (hereafter referred to as the "Weird clade") was unexpected based on either morphology or natural history.
All but two nodes (#1 and #2; Fig. 4a) in the backbone (i.e. the first 12 bifurcations) of the combined IQ-TREE were closer to each other than most ingroup nodes, suggesting near simultaneous old divergences for most nodes (Fig. 4c).Only these two backbone nodes were recovered in the set of gene trees.Sixty percent of the backbone nodes have SH-aLRT and UFBoot higher than the suggested cut-off for validity (≥80% and ≥95%, respectively; IQ-TREE manual; Figs 4a, 5 and S1).A generic cut-off for sCF across a tree that spans such a long time makes little sense.Hence, we used the lowest sCF obtained for families that are well-defined morphologically and biologically (Heraty et al., 2013): 34.3 for Trichogrammatidae.Using this value as a cut-off, 50% of the backbone nodes are supported (Figs 4a, 5 and S1).We observed a significant overall decrease in all statistical support for nodes (gCF, sCF, UFBoot, SH-aLRT) with increasing age (Figs 4d and S3C).

Divergence time estimates and historical biogeography
The chronogram obtained from the combined tree is shown in Figs 5 and S4.Divergence time estimates and confidence intervals for all nodes are given in Table S3b.Estimates of divergence time indicate that Chalcidoidea diverged from their sister group (Mymarommatoidea) 174.0 Ma [95% Equal-tail Confidence Interval (95% CI) 167.3-180.5 Ma].Crown Chalcidoidea is dated at 162.2 Ma (153.9-169.8Ma).The four first splits on the backbone [Mymaridae; Baeomorphidae (= Rotoitidae); "Tiny Wasp clade"; all other chalcid wasps] occur over a timespan of ~53 Myr.From ~110 Ma, divergences are closer in time with the remaining eight splits on the backbone spanning ~24 Myr.Ancestral range estimations for all biogeographical models are provided in Fig. S5.AICc favoured the BAYAREALIKE+J model (Table S4c; Fig. 5).A South Gondwanan origin of Chalcidoidea [Australasian (DEC, DEC + J, BAYAREALIKE + J; DIVALIKE + J) or Australasian + Neotropical (BAYAREALIKE; DIVALIKE)] is suggested by all models, with colonization of the rest of the world more or less delayed in time depending on the model used (Fig. S5).

Mutational saturation disturbs phylogenomic inferences
This study expands upon earlier investigations (Munro et al., 2011;Heraty et al., 2013;Peters et al., 2018;Zhang et al., 2020;Blaimer et al., 2023) to yield a more comprehensive phylogenetic framework for higher relationships within Chalcidoidea.We used a comparison of phylogenomic inferences from molecular datasets (exons for 414 taxa and UCEs for 407 taxa) and morphological/biological/ecological knowledge to enable us to detect systematic bias attributable to mutational saturation, and, conversely, morphological convergence.With decreasing saturation, exons and UCEs topologies tended to become more similar (Figs 1, 2 and S1; Appendix S1) and relationships of groups for which placement could be evaluated based on morphological data tended to become more concordant with morphology or biology (Table S6), though not always.Thus, we confirm that mutational saturation is an important source of error in phylogenomics, especially for deep-time inferences.
We comment below on the results with respect to selected groups of taxa, including our benchmarks, and what they reveal about inference bias due to saturation.

Mymaridae
and Baeomorphidae (=Rotoitidae).Mymaridae was recovered as sister to all other Chalcidoidea with strong support (Benchmark 1) in all but the most saturated UCE datasets.With this last dataset, Baeomorphidae was inferred as sister to the rest of chalcid wasps including Mymaridae.With decreasing saturation, the SHaLRT and UFBoot support for Mymaridae as sister to other Chalcidoidea increased in the UCE trees.Mymaridae and Baeomorphidae have long been hypothesized as the and second lineages of Chalcidoidea to diverge from their common ancestor (Gibson and Huber, 2000;Munro et al., 2011;Heraty et al., 2013).Blaimer et al. (2023) recovered the same relationships with strong support, but here we confirm this with a greater sampling of taxa.
Chalcididae and Eurytomidae.The polyphyly of Eurytomidae (Benchmark 2) as well as the nonmonophyly of Chalcididae + Eurytomidae (Benchmark 4) in the most saturated exon and UCE datasets can be attributed to mutational saturation.Indeed, Eurytomidae is a wellsupported family (Heraty et al., 2013).Cratocentrinae is considered to be the sister group of other Chalcididae (Cruaud et al., 2021).However, a monophyletic Chalcididae (Benchmark 3) was only recovered in the parsimony analyses of the AHE 414 and 520 datasets (Figs 3 and S1).A mesothoracic spiracle that is hidden in Eurytomidae and all Chalcididae except for Cratocentrinae gives support to Cratocentrinae being sister to the two other taxa, but 15 synapomorphies support Chalcididae as monophyletic (Cruaud et al., 2021).This suggests that despite high statistical support (SHaLRT = 94/ UFBoot = 97 for exonsAA; 100/100 for UCEs90-25 and combined), IQ-TREE inferences may be incorrect but this result requires more investigation.Benchmark 2 (monophyly of Eurytomidae) was only supported by the prot-AA dataset of Blaimer et al. (2023); in other cases, Heimbrinae was placed outside of this clade, similar to our most saturated analyses (Fig. 2).With regard to our Benchmark 3 (monophyly of Chalcididae), they did not include Cratocentrinae, so this latter hypothesis was not tested by them.
Calosotinae (Eupelmidae).Although morphology weakly supports Calosotinae as monophyletic (Gibson, 1989;Heraty et al., 2013), it was always recovered as polyphyletic in our analyses.Indeed, a group of Calosotinae (Eusandalum, Pentacladia and Paraeusandalum), which exhibit V-shaped notauli, never clustered with other Calosotinae that show paramedially parallel notauli (Gibson, 1989).This group is instead more closely related to several Pteromalidae genera (Heydenia, Ditropinotella, Grooca and Solenura), a result that is somewhat corroborated by morphology.With decreasing saturation, species belonging to Calosotinae were less scattered across the trees, and Calosotinae with V-shaped notauli became sister to the clade composed of other Calosotinae, some Pteromalidae genera and Eupelminae, with the result that a core group of Eupelmidae (Calosotinae and Eupelminae) was not monophyletic.This result was likewise supported by all of our preferred phylogenomic datasets (Fig. 3).Convergent modifications of mesosomal structure in "jumping" taxa, including an enlarged acropleuron (actually the external manifestation of profound changes in internal skeletomusculature; (Gibson, 1989) and associated pegs or spines on the mesotarsal segments linked to the ability to jump could have misled morphological studies (Peters et al., 2018;Zhang et al., 2020).Notably, none of the studies based on molecular data alone supported monophyly of a clade with jumping abilities that includes Eupelmidae, Cynipencyrtidae, Encyrtidae, Tanaostigmatidae and some Aphelinidae.Monophyly of Calosotinae, Eupelmidae and a clade that included Cynipencyrtidae, Tanaostigmatidae and Encyrtidae was only found in the combined morphological and molecular analysis of Heraty et al. (2013).Modifications linked to the ability to jump may be at the origin of one of the two characteristics considered as apomorphies defining the clade (mesoscutal lateral lobes "shoulder-like" on either side of pronotum).The presence of parapsidal lines in Calosotinae and in Solenura and Grooca (Gibson et al., 1999) is a potential argument to redefine Calosotinae, with not all taxa having an enlarged acropleuron.Despite all of our efforts, our fifth benchmark criterion, monophyly of Eupelmidae in the broad sense (including Neanastinae and Metapelmatinae) or even monophyly of Eupelminae and Calosotinae (the narrow sense), was never achieved and so was also the case in Blaimer et al. (2023).
Agaonidae (Benchmark 6).Sycophaginae and all of the other subfamilies of Agaonidae are associated with Ficus (Moraceae).They have been considered as part of the same family based on morphological data (Heraty et al., 2013), but this result was not supported by molecular data (Munro et al., 2011;Blaimer et al., 2023).In our results, Sycophaginae consistently clustered away from other Agaonidae.With decreasing saturation in the exon dataset and with UCEs, Sycophaginae were consistently recovered in the same group of pteromalids [now Pteromalidae sensu stricto (s.s.); Burks et al., 2022].While several morphological characters group Sycophaginae and other Agaonidae, several others are shared with lineages of Pteromalidae with which Sycophaginae clustered on our trees.Characters supporting a close relationship between Pteromalidae and Sycophaginae include the separated postgenae with an interceding lower tentorial bridge impressed relative to the postgena, the structure of the antenna, being 14-segmented with a terminal button, and presence of an axillular sulcus in all but a few highly derived species (Heraty et al., 2013).In addition, the gall-associated biology of Sycophaginae is similar to that of Colotrechninae (Pteromalidae), another lineage with which Sycophaginae cluster.This case is another example of a result not concordant with a benchmark criterion that is therefore rejected.Encyrtocephalus (Melanosomellidae).As saturation decreased, the genus Encyrtocephalus grouped within the Gall clade (Fig. 2).Morphologically, Encyrtocephalus shares characters with Melanosomellini and only from a majority of them by the large supracoxal flange of propodeum and a curved stigmal vein.However, in all our analyses, including parsimony, Encyrtocephalus never grouped within Melanosomellini or as its sister group, but instead was mostly recovered as sister to Tanaostigmatidae, a result that requires more investigation.This was not among our benchmark criteria, but an example of where better taxon sampling may help to resolve conflicting relationships.
Other bias and possible options to improve inferences.No objective criterion has been proposed so far to determine what fraction of genes/sites should be removed from a dataset to converge to a "correct" topology and it is unlikely that such a criterion will emerge in the future.Therefore, we advocate that the IQ-TREE combined tree (Fig. 5) is the best compromise we could achieve with this dataset, current evolutionary models and inference methods.In addition, this tree is close to the topology inferred with parsimony (Fig. 3).Results may be improved in the future either with increasing taxonomic sampling and/or better evolutionary models, assuming that analyses are computationally tractable (it was, for example, not feasible to use a CAT-GTR model on our dataset).However, the increased taxonomic sampling of the AHE520 dataset yielded nearly identical results both with maximum-likelihood and parsimony approaches (Figs 3  and S1).Bias will be difficult to track and alternative relationships hard to evaluate given the extreme plasticity of morphological characters in chalcid wasps.Indeed, the astounding diversity of morphologies that evolved in c. 160 Myr will continue to complicate the finding of strong synapomorphies to support many of the groups.To the best of our knowledge, no morphological analysis provides convincing evidence to reject the global topology inferred with the molecular dataset, but there are data suggesting that alternative placements of a few groups are as plausible as those recovered here.

Validity of current families
The complete revision, that relies on the present phylogenetic hypothesis and in which 50 extant families are now recognized has been published elsewhere (Burks et al., 2022).The main changes to the familial classification that resulted from this study are discussed below.To help readers, new family names are mapped on Fig. 5, and stem and crown ages are listed in Table S3c.Of the 25 previously (before Burks et al., 2022) recognized (extant) families, seven were recovered as paraphyletic or polyphyletic in the least biased molecular datasets, which confirms both rampant morphological convergence within Chalcidoidea and the problem of inadequate diagnoses of families such as the former characterization of Pteromalidae.
Chalcididae.Until proven otherwise and following the discussion in the previous section, Cratocentrinae is maintained within Chalcididae.
Trisecodes is either recovered as sister to Systasini (exonsAA 433/520, combined) or Trichogrammatidae (UCEs90-25).Trisecodes exhibits the 3-segmented tarsi of Trichogrammatidae, but this is a characteristic that has occurred several times independently across Chalcidoidea.Trisecodes shares with Systasini the presence of a mesofurcal pit on the mesotrochantinal plate between the mesocoxal insertions, which suggests a closer relationship between the two groups.Defining a family grouping for Trisecodes and Systasini (Systasidae) seemed the best solution, even though Trisecodes differs from Systasini in tarsomere and flagellomere count (Burks et al., 2022).
Eupelmidae.This family was never recovered as monophyletic in our analyses.No single morphological feature is unique to Eupelmidae (Gibson, 1989;Heraty et al., 2013), casting doubt on its validity.In all molecular trees, Neanastatinae (including Metapelma) and Calosotinae are polyphyletic, while Eupelminae are recovered as monophyletic.Furthermore, Eopelma never groups with other eupelmid clades and is instead consistently recovered as sister to Storeya, the unique genus of Storeyinae (Pteromalidae).The status and placement of the current genera and subfamilies of Eupelmidae are thoroughly discussed by Burks et al. (2022), but Metapelmidae and Neanastatidae were removed from Eupelmidae and treated as distinct families, and Eopelma was treated as incertae sedis in Chalcidoidea.
Eriaporidae.In all topologies, Cecidellis (Pteromalidae) renders Eriaporidae paraphyletic.However, although the genus is well-defined by a short lamina the posterior propodeal margin in the female, it resembles some Eriaporidae in coloration and eye divergence, and Pireninae in body shape features.In addition, Eriaporidae was recovered as sister to Pireninae (Pteromalidae) in all our reconstructions, a group with which it shares several morphological characters.Therefore, a single family, Pirenidae containing Erioporinae and Cecidellis in Cecidellinae was proposed by Burks et al. (2022).
Pteromalidae.Pteromalidae s.l. is scattered across all inferred trees.This hyperdiverse family that contains 33 subfamilies and nearly 650 genera has long been considered as a "taxonomic garbage can" for taxa that could not be easily included in other chalcid families (Burks et al., 2022).Pteromalidae has long been shown to be polyphyletic (Munro et al., 2011;Heraty et al., 2013;Peters et al., 2018;Zhang et al., 2020), but the lack of robust support for previous phylogenies has precluded any taxonomic rearrangement.Although the backbone of our Chalcidoidea tree remains unresolved in some places, shallower clades are strongly supported which has enabled a first revision of the former Pteromalidae (Burks et al., 2022).Our results do support a Pteromalidae sensu stricto (s.s.) that includes a number of odd fig-wasp parasitoids and the Sycophaginae that were previously treated as Agaonidae.Twenty families, that mostly correspond to the current subfamilies or tribes of pteromalids apart from Pteromalidae s.s., have thus been erected and their circumscription redefined by Burks et al. (2022).Most of these new families are ancient lineages (100-80 Ma; Table S3c) that are likely to have been grouped together within Pteromalidae based on symplesiomorphies, or because they had no apparent affinities with any other family.Other subfamilies have been included in other existing families that were redefined (e.g.Keiraninae and Chromeurytominae within Megastigmidae) or remained incertae sedis (e.g.Storeyinae) (Burks et al., 2022).

Higher level relationships
From here on, family names refer to the new classification (Burks et al., 2022).
Relationships in some places are still unresolved, although it is difficult to say whether it is to the consequence of a lack of signal or noise creating conflicting signal across the genome as previously suggested (Zhang et al., 2020).Depending on which measure of statistical support is used, the backbone is either a rake or moderately supported (Table S2e; Figs 4, 5  and S1).Only two backbone nodes are recovered in the set of gene trees (#1 and #2; Fig. 4), which is likely because of a lack of signal linked with the short length of loci that results in unresolved gene trees (Fig. 4b).Three backbone nodes (#4-6; Fig. 4) do not receive any significant statistical support (gCF, sCF, UFBoot and SH-aLRT) and the corresponding part of the topology should be better regarded as a polytomy.As with the early evolution of birds (Suh, 2016), phylogenomics may not be able to resolve these difficult nodes because of near-simultaneous speciation.Indeed, from backbone node #2 ("Tiny Wasp clade"), one lineage appears, on average, every 3 Myr (Table S3b; Fig. 5) and the branching pattern is characterized by very short branches that are likely to lead to gene tree incongruence.
Below we discuss some of the higher-level relationships inferred with decreasing saturation in the individual datasets or emerged in the combined tree in agreement with hidden support in the individual datasets (Fig. 4a).
"Planidial clade" (Eucharitidae + Perilampidae + Eutrichosomatidae + Chrysolampidae).We confirm that families with planidial larvae (Zhang et al., 2022) form a monophyletic group that is recovered in all trees.Notably, with the inclusion of Eutrichosomatidae, planidia are defined by their behaviour (parasitoids that move across different larval instars of the host and then finish development externally on the host pupa) and not as a clade by their morphology (Zhang et al., 2022).There is no morphological support for this clade based on immatures or adults.Statistical support for this clade in the combined tree is high (SHaLRT = 100/ UFBoot = 100/gCF = 0.197/sCF = 36.5)with, again, the exception of gCF.Sister to the "planidial clade" we recovered a clade of two old pteromalid lineages that split between 89 and 96 Ma that are now considered as one family (Spalangiidae; Burks et al., 2022).Only larvae of Spalangiinae have been described so far (Tormos et al., 2009) and, interestingly, Spalangia larvae appear to be mobile (Gerling and Legner, 1968), with a series of tubercles across the ventral region of body segments II-XII (Tormos et al., 2009).This latter feature also is documented in several lineages of the planidial clade [Chrysolampidae: Chrysolampinae (Askew, 1980;Darling and Miller, 1991); Philomidinae (Darling, 1992); and Eutrichosomatidae (Baker and Heraty, 2020)], which may corroborate the close relationship of these taxa and potentially expand the definition of the planidial clade.
"Gall clade" (Cynipencyrtidae + Epichrysomallidae + Melanosomellidae + Ormyridae + Tanaostigmatidae).This higher-level grouping is recovered here for the first time.Again, with the exception of gCF, statistical support in the combined tree is high (100/100 /0.049/34.2).On the morphological side, there is limited evidence to support this clade that groups wasp lineages previously classified in four different families.However, from the perspective of life-history strategy all lineages are gall-associated wasps.
"Tiny wasps" (Aphelinidae + Azotidae + Calesidae + Neodiparidae + Encyrtidae + Eulophidae + Eunotidae + Idioporidae + Signiphoridae + Trichogrammatidae).This higher-level grouping is highlighted for the first time although it was recovered as monophyletic only in the IQ-TREE combined tree with moderate support (100/74/0/30.9).However, the grouping of these taxa near the base of Chalcidoidea, either as a monophyletic or polyphyletic group, with the exclusion of Mymaridae and Baeomorphidae, is maintained across all of our analyses (Fig. S1).The backbone node (#3) that splits Mymaridae/ Baeomorphidae/"Tiny Wasp clade" from the other Chalcidoidea is moderately supported (100/73/0/37.9)and receives hidden support from the exonsAA (sCF = 38.6)and the UCEs90-25 (sCF = 40.4)datasets (Fig. S1).Statistical support for the group is possibly weakened by the ambiguous placement of the rogue Trisecodes (cf.previous section) that is either nested within this clade (UCEs) or that clusters with Systasidae and Tetracampidae (exonsAA).Tiny wasps form a grade in the TNT combined tree (Fig. 3).From a morphological point of view, the "Tiny Wasp clade" has hardly any support other than usually being small and mostly soft-bodied (except some Eulophidae).The group comprises several lineages characterized by a reduction in the number of flagellomeres or tarsomeres, and the frequent presence of a mesophragma that extends into the metasoma through a broad union with the mesosoma.Biologies are diverse in this group and there are no clear trends.With the exception of Eulophidae, which parasitize nearly all insect orders, and Trichogrammatidae, which are egg parasitoids, lineages of the "Tiny Wasp clade" are more frequently associated with Hemiptera.They appear to be mainly endoparasitoids of exophytic hosts (such as mealybugs), while other chalcid wasps are more frequently ectoparasitoids of endophytic hosts.In the same manner, species of this clade also appear to be more frequently oophagous than other chalcid wasps.Confirmation of this clade as a monophyletic lineage will require increasing taxonomic sampling, which may help to stabilize the placement of Trisecodes and would greatly facilitate a formal study of the evolution of host-associations within Chalcidoidea.
"Weird clade" (Megastigmidae + Leucospidae + Agaonidae + Metapelmatidae + Eurytomidae + Chalcididae + several poorly diversified families).This higher-level grouping was unexpected based on morphology or biology.Yet, this "Weird clade" is statistically slightly better supported than the "Tiny Wasp clade" in the combined tree (99.2/95/0/35.6).Hidden sCF support for this clade is 35.9 for UCEs and 34.3 for exonsAA.From a morphological point of view the "Weird clade", as its name indicates, groups disparate lineages of chalcid wasps that exhibit contrasting morphologies that could be correlated with their diverse biologies (Figs 4 and 5).Many families, such as Boucekiidae, Chalcedectidae, Cleonymidae, Lyciscidae, Metapelmatidae and Pelecinellidae, but also several genera of Eurytomidae and a few Chalcididae, are parasitoids of xylophagous insects (mostly Coleoptera).Leucospidae have shifted to solitary bees and wasps that nest in wood or in mud nests, but L. dorsigera also has been reported as a hyperparasitoid of xylophagous beetles (Hesami et al., 2005).Agaonidae, Megastigmidae and Eurytomidae are mostly phytophagous, but the last two families also comprise parasitoid species.Agaonidae enter figs (inflorescences of Ficus, Moraceae) through a small aperture called the ostiole and exhibit strong morphological adaptations (e.g.mandibular appendage, anelli fused into a hook-like process, short protibia with spurs; Cruaud et al., 2010).Some lineages have more specialized biology such as Moranilidae that are predators of mealybug eggs, Asaphesinae (incertae sedis) that are hyperparasitoids of aphids through Braconidae wasps, and Enoggerinae (incertae sedis) that are oophagous parasitoids of Coleoptera.Coelocybidae are gallassociated wasps.Finally, Chalcididae attack nearly all insect orders, but mostly Lepidoptera.
Sister group to Chalcidoidea.Finally, with the exception of the exonsAA and AHE520AA trees (weak support for Diapriidae as sister to Chalcidoidea), all of our results strongly support the sister-group relationship between Mymarommatoidea and Chalcidoidea.Gibson (1986a) was the first to propose a sister-group relationship between these two superfamilies.Munro et al. (2011) recovered Diaprioidea + Mymarommatoidea as sister to Chalcidoidea, although when sequence data were combined with morphology (Heraty et al., 2013), mymarommatids were sister to Chalcidoidea.A sister relationship with mymarommatids also was recovered in the UCE results of Blaimer et al. (2023).
Morphological convergences.Higher level morphological convergences are confirmed as exemplified by the evolution of an enlarged acropleuron and correlative transformation of legs linked to the ability to jump that may have happened at least seven times independently during the evolutionary history of chalcid wasps in (i) Encyrtidae, (ii) Lambrodegma and Neanastatus, (iii) Eopelma, (iv) Metapelma, (v) Tanaostigmatidae, (vi) Cynipencyrtus and (vii) Calosotinae; all relationships except Metapelma with maximum support.

Time line and historical biogeography
An important result of our study is a revision of the temporal scale over which Chalcidoidea have evolved and dispersed throughout the world (Figs 5 and S4; Table S3).Likely because our taxonomic sampling is one or two orders of magnitude higher than most previous phylogenomic studies (Branstetter et al., 2017;Peters et al., 2017Peters et al., , 2018;;Tang et al., 2019), we infer an older crown age for Chalcidoidea: 162.2 (154.0-170.0)Ma.Nevertheless, this age is close to the recent estimates made from the most-representative phylogeny of 3), Blaimer et al., 2023].
Compatibility of ages with the fossil record, Earth's palaeogeological history and previous works.Importantly, our estimates are compatible not only with the meagre fossil records of chalcid wasps, but also with Earth's palaeo-geological history (Fig. 5).The oldest fossils of chalcid wasps belong to the so-called "soft-bodied" chalcid wasps (Mymaridae, Baeomorphidae and "Tiny Wasp clade"; Haas et al., 2018).The oldest putative chalcid fossil is Minutoma yathribi Kaddumi, 2005 from Jordanian amber (Albian, ~113.0-100.5 Ma; Kaddumi, 2005).But the uncertain affinities of this fossil precluded its use in our analyses.The oldest unambiguous chalcid fossils are a Mymaridae [Myanmymar aresconoides ;Poinar Jr. and Huber, 2011] and a Baeomorphidae (Baeomorpha liorum; Huber et al., 2019) from Myanmar (Burmese) amber (minimum age 98.2 Ma; see rationale for calibration priors in the Supporting information Materials and Methods).Species of Baeomorphidae are also frequent in Cretaceous ambers of the Northern Hemisphere [in the retinites of Baikura (minimum age 94.3 Ma) and of Yantardakh (minimum age 83.6 Ma; Gumovsky et al., 2018), and Canadian ambers of Cedar Lake and Grassy Lake, which are Campanian in age (83.6-72.1 Ma;McKellar et al., 2008)].
Only a dozen fossils of "hard-bodied" Chalcidoidea are known from Cretaceous formations.Among them, three were not used as calibrations because they have uncertain affinities that prevent them from being assigned to a clade.Nevertheless, all of them fit relatively well within the proposed time frame for Chalcidoidea.Diversinitidae (Myanmar amber) possibly belongs to the group of "hard bodied" taxa (Haas et al., 2018); however, it has uncertain relationships with extant chalcid wasps.Two other undescribed fossils clearly belong to this large clade: (i) a few specimens with uncertain morphological affinities (Pirenidae or Micradelinae) (A.Gumovsky and M.-D. Mitroiu, pers. comm.) from Taimyr amber (86.3-83.6Ma), and (ii) an unidentified "torymid" specimen from Canadian amber (83.6-72.1 Ma) [fig. 4c in McKellar and Engel, 2012, that probably belongs instead to an extinct lineage].
Splits between Neotropical and Australian lineages also corroborated our new time frame for chalcid wasps.Indeed, several clades whose ancestor probably dispersed through the Antarctic land bridge pre-date connection break-ups and temperature decreases (i.e. 45 Ma; van den Ende et al., 2017).Thus, the clade grouping the Australian Aeschylia with the Neotropical Aditrochus and Plastobelyta (Melanosomellidae), all of which are gallers on Nothofagus, is dated at 60.1 Ma (Fig. S4; Table S3b), while the split between the Australian Liepara and the south Andean Lanthanomyia (Coelocybidae) that is, among others, a parasite of Aditrochus species is dated at 46.2 Ma.The split between the Neotropical Erotolepsia and its Australian/ Oriental sister Papuopsia (Spalangiidae) is estimated at 49.9 Ma.The stem age of Neotropical Lyciscidae that are nested within an Australian clade is estimated at 33 Ma, which is a very late time to cross Antarctica.
Only a few dating analyses have been performed for groups of chalcid wasps.Variability in mean age estimates were noted depending on the study, datasets and methods used (differences up to 12 Myr).Our estimates for the age of the planidial clade ] and ] are close to previous estimates (Murray et al., 2013;Zhang et al., 2022).The mean age for crown Agaonidae at 60.7 Ma (50.7-70.9) is 15 Myr younger than previous estimates of 75 Ma (94.9-56.2;Cruaud et al., 2012b) but this difference is likely to have been a consequence of the reduced set of taxa used in our study.Nevertheless, ages of all other groups of wasps that are strictly associated with Ficus (Epichrysomallidae, Sycophaginae, and other pteromaline fig wasps) postdate the age of Agaonidae.
Insights on the origin of Baeomorphidae.Recent findings by Heraty et al. (2018), strongly suggest that Baeomorphidae may be egg parasitoids of Peloridiidae (moss-feeding bugs) and/or of Myerslopiidae (treehoppers), ancient families of Hemiptera, that are (respectively) sister to all other Auchenorrhyncha and to all other Membracoidea (Johnson et al., 2018).Interestingly, both peloridiids and myerslopiids also have an amphi-Pacific distribution and Myerslopiidae fossils were only found at Crato in .In addition, the split between peloridiids from ; Ye et al., 2019) is close to our estimate for Baeomorphidae and corroborates our time-scale for Chalcidoidea.The fossil record shows that Baeomorphidae were largely distributed not only in Laurasia, but also on the Myanmar terrane, between the lower Cenomanian and the Campanian (98.2-72.1 Ma; Gumovsky et al., 2018;Huber et al., 2019), which suggests, in the framework of our scenario, a long northward dispersal.Our results contradict the hypothesis of a Laurasian origin for Baeomorphidae (Gumovsky et al., 2018) that has been made outside of a formal phylogenetic framework.However, given the area of origin for Mymaridae, and the current distribution of Rotoita and Chiloe, whatever the position of Baeomorpha + Taimyromorpha (fossil taxa) within Baeomorphidae, it would lead to a South Gondwanan origin for Baeomorphidae and a northward dispersal between 150 and 100 Ma.Their distribution fits well with that of fossils of Coleorrhyncha, the lineage to which Peloridiidae belongs and is now the only extant representative.Coleorrhyncha are divided into two large groups: the Progonocimicidoidea and the Peloridioidea.All extinct families of Peloridioidea occurred in Laurasia (between 201.6 and 112.6 Ma).The oldest fossil of Progonocimicidae occurred in Australia and was dated back to the Changhsingian, the uppermost stage of Permian (254.1-251.9Ma).Progonocimicidae are recorded from Gondwanan and Laurasian Triassic formations (Evans, 1956) but became mostly confined to Laurasia later on (Jurassic and Cretaceous) with the notable exceptions of two Gondwanan fossils found in Cretaceous ambers (Lebanese and Myanmar; Szwedo et al., 2011;Jiang et al., 2019).Most Coleorrhyncha groups declined during the mid-Cretaceous biotic crisis when vegetation was replaced by modern angiosperms and did not survive the Chicxulub impact (Dong et al., 2014).Our scenario suggests that baeomorphids expanded northward and experienced the same extinction event.We recognize that based on the ancient age and presence of numerous Laurasian fossils, their area of origin may be challenged but the retraction to the current relictual distribution is certain.Furthermore, the scarce Gondwanan deposits prevent to ascertain their absence in the Southern Hemisphere during the Early Cretaceous.
Divergences that coincide with increases in insect fossil diversity and diversification of flowering plants.The origin of Chalcidoidea and divergence of the two first lineages (Baeomorphidae 153.1 Ma; "Tiny Wasp clade" 136.1 Ma) coincide with a rise in insect fossil families in the late Jurassic to the Hauterivian-Barremian (Schachat et al., 2019).The next splits on the backbone (from 110.3 Ma) coincide with the second sharp increase in fossil diversity through the Albian and Cenomanian (Schachat et al., 2019).This rapid radiation of "hard-bodied" Chalcidoidea, between 110 and 80 Ma, coincides with the onset and diversification of flowering plants and holometabolan insects.Although this hypothesis should be formally tested through a thorough compilation of host associations and reconstruction of ancestral life histories, the first lineages to diverge (Mymaridae, Baeomorphidae, "Tiny Wasp clade") are first likely to have been oophagous and subsequently mostly associated with hemipteran hosts (Aphelinidae, Azotidae, Calesidae, Encyrtidae, Eunotidae, Signiphoridae).Recently it was discovered that the sister group to Chalcidoidea, Mymarommatoidea, are parasitoids of the eggs of Lepidopsocidae (Psocoptera; Honsberger et al., 2022), thus adding further support to an ancestral habit of egg parasitism for Chalcidoidea.Subsequently, chalcid wasps switched to virtually all orders and life stages of Holometabola.Interestingly, the first shifts to phytophagy (e.g.stemcrown gall clade: 102.1-98.2Ma) correspond to the beginning of the "Angiosperm Terrestrial Revolution" c. 100 Ma (Benton et al., 2022).
A first hypothesis on the global historical biogeography of chalcid wasps.Our biogeographical scenario for Chalcidoidea should be regarded as a first hypothesis.Indeed, although our sampling is highly representative of the main lineages occurring on Earth (few suprageneric taxa are missing) and representative of the distribution of described genera (Noyes, 2019), several hyperdiverse clades have been scarcely sampled (e.g.Trichogrammatidae, Encyrtidae).Finally, neither model of biogeographical inference appears fully realistic to cover the entire range of biological traits that may regulate dispersal and therefore impact biogeographical processes.The size of chalcid wasps varies from 0.15 to >25 mm long, and the tiniest ones can easily be blown by the wind (Glick, 1939) or disperse actively over long distances (≤100 km; (Grillenberger et al., 2009;Lander et al., 2014).Lifehistory characteristics also may be important in regulating dispersal; for example, idiobiont parasitoids are usually generalists, a characteristic that may facilitate establishment in newly colonized habitats.However, the success of long-distance dispersal remains to be demonstrated and relies mostly on host availability and quality.
Vicariance and therefore the support for the DEC/ DIVA models, can be suspected when species of chalcid wasps are subdivided geographically together with ancestral host species.Such allopatric speciation appears to be a frequent scenario in chalcid wasps (e.g.Nicholls et al., 2010).Conversely, chalcid wasps can undergo rapid speciation by the emergence of ecotypes (Malec et al., 2021) or cytoplasmic incompatibility (Gebiola et al., 2016) and thus may be prone to sympatric speciation (K€ onig et al., 2019).Logically, the range of a parasitoid species will be limited by the range of its host(s).The higher statistical support for the BAYAREALIKE + J model suggests an important role for sympatric speciation, a rapid colonization of the widespread host range and long-distance dispersal.Because the relative contribution of the different speciation modes in chalcid wasps is unknown, we took a conservative position and considered an East-Gondwanan origin for chalcid wasps that is inferred by most models (DEC, DEC + J, BAYAREALIKE + J; DIVALIKE + J) as the most likely.
Although there is no formal biogeographical analysis of Mymaridae published yet, their East-Gondwanan origin (inferred by all models) is reasonable.Indeed, the Australian fauna contains the earliest diverging lineages of Mymaridae as well as a large diversity of genera that encompasses all familial subgroups (Lin et al., 2007).Extant genera of Baeomorphidae occur in southern temperate rainforests of Chile (Chiloe) and New Zealand (Rotoita) and show a disjunct amphi-Pacific distribution (van den Ende et al., 2017).The split between these genera (91.7 Ma) appears contemporaneous with the breakup of southeast Gondwana and with the drift of Zealandia away from Antarctica (between 95 and 84 Ma) (Schellart et al., 2006;Mortimer et al., 2017).The conflicting hypothesis of the Laurasian origin of Baeomorphidae is discussed in the Supporting information.
Colonization of the rest of the world by the ancestors of other lineages is more or less delayed in time depending on the model used.Models that suppose identical inheritance of ancestral distributions by the two descendants immediately after speciation (BAYAREALIKE) suggest that the Cretaceous and rapid radiation of Chalcidoidea occurred in southern Gondwana and favour a subsequent "multiple dispersal out of Gondwana" hypothesis for younger lineages.Models that suppose allopatric speciation and vicariance (DEC, DIVA) infer a rapid colonization of the Northern Hemisphere (ancestors of all Chalcidoidea but Mymaridae and Baeomorphidae) with subsequent multiple recolonizations of southern Gondwana (Australia and/or Neotropics) during the radiation of the group.As detailed above, assumptions made by biogeographical models do not fully describe speciation modes in chalcid wasps.Therefore, and with regard to elements discussed below, we consider the scenario inferred by BAYAREALIKE + J, which furthermore was selected by statistical tests, as plausible.
The widespread distribution of most groups included in the "Tiny Wasp clade" makes corroboration of a southern Gondwana origin (inferred by DEC + J, DIVALIKE + J, and BAYAREALIKE without or +J) difficult.Diversity should not be considered as a clue for origin.However, the diversity and placement of some Australian or Neotropical endemic taxa in the topology support a southern Gondwana origin (Fig. S5).Within Eulophidae, Ophelimus and Perthiola (mainly Australian) are sister to all Entiinae, while Aleuroctonus and Dasyomphale (Neotropical) are the first lineages to diverge within Euderomphalini.In addition, Cales is subdivided in two species groups, one occurring mostly in Australasia, the other in the Neotropics (Mottern et al., 2011), therefore it possibly originates in southern Gondwana.Conversely, a Palaearctic origin for the "Tiny Wasp clade" (inferred by DEC and DIVALIKE) supposes a fast recolonization of the Neotropics through a pathway that is difficult to identify; possibly through Africa but no members of early diverging lineages within clades of tiny wasps currently in Africa.
A Gondwanan origin of the "hard-bodied" chalcid wasps (inferred by BAYAREALIKE without or +J) is well-supported by (i) the cosmopolitan lineages that have their early diverging taxa occurring in Australia, such as Eucharitidae with Akapalinae (Murray et al., 2013), Perilampidae with Euperilampus (Zhang et al., 2022), Pteromalidae s.s. with Sycophaginae (Cruaud et al., 2011), and Colotrechninae as well as Megastigmidae with Keiraninae and Chromeurytominae; and (ii) lineages that originated in southern Gondwana and are mostly Australian and Neotropical (Melanosomellidae, Lyciscidae, Coelocybidae).Additionally, two other families, Boucekiidae (represented by Boucekius) and Pelecinellidae (Leptofoenus) could be included here as they have Australian representatives (Chalcidiscelis and Doddifoenus, respectively), that, unfortunately, could not be included in our sampling.Conversely, scenarios that infer a widespread ancestral area for the "hard-bodied" chalcid wasps (all areas for DEC without or +J or Afrotropical + Oriental + Palaearctic for DIVALIKE without or +J) suggest successive long distance dispersals and cladogenetic events between Laurasia and either the Neotropics or Australasia that seem less plausible given the distance between these areas, the specificity of the chalcid-host association, the larger size of these wasps, and the absence of African lineages in early diverging clades.
On long distance dispersals and pathways of colonization of the northern hemisphere during the Cretaceous.Scenarios that favour a South Gondwanan radiation (BAYAREALIKE without or +J) identify several subfamilial or familial lineages that the northern hemisphere (Palaearctic or Oriental region) during the Cretaceous, at a time when intercontinental dispersals were difficult.Four of them may derive from ancient dispersal events between Antarctica + Australia and Africa (where they also occur) or from a colonization of the Neotropics followed by dispersal to Africa and recent colonization of the Palaearctic region: Eunotidae between 125 and 65 Ma; the ancestor of Neodiparidae + Signiphoridae + Azotidae between 133 and 127 Ma; Ceidae between 102 and 53 Ma and Cleonymidae between 80 and 33 Ma.Two other apparent long-distance dispersal events are difficult to explain as no members of these clades are presently known outside the Palaearctic region (Rivasia and Micradelus between 81 and 77 Ma, and Cynipencyrtus at 90.6 Ma).Some other lineages possibly colonized the Northern Hemisphere through the Neotropics (Exolabrum and Herbertia between 104 and 92 Ma; Trisecodes between 86 and 41 Ma).
Finally, Torymidae was hypothesized to have originated in the Palaearctic region (Jan sta et al., 2018).Scenarios that favour a South Gondwanan radiation suggest a dispersal from southern Gondwana to Laurasia between 106 and 80 Ma, possibly through South America and Africa.In this case, the first lineages of Torymidae have either disappeared from South America or have not yet been sampled there.Another possibility, that also may be the route used by Eopelma (c.86.6 Ma) and a new subfamily close to Erotolepsiinae (c.96 Ma) to reach the Northern Hemisphere (Sunda), would be through drifting India or through ancient terranes (Hall, 2012) or the Trans-Tethyan island arc that separated from northern Australia c. 120 Ma (Westerweel et al., 2019), bringing away the Gondwanan fauna (Poinar, 2018).Conversely, the Trans-Tethyan island arc might have been used to colonize southern continents for scenarios that infer a Gondwanan + Laurasian radiation, though as previously emphasized, multiple recolonizations of southern Gondwana seem less likely.

Conclusion
Chalcidoidea has deep origins in the middle Jurassic (95% CI for stem age = 167.3-180.5 Ma) followed by a tremendous diversification in the Palaeogene concomitant with the radiation of plants and the insects that feed upon them.The combined analysis of the exonsAA and UCEs90-25 datasets (2054 loci and 284 106 sites) produced a generally well-supported hypothesis.In some cases, we found clades that matched more with natural history over their morphological support and in others we found some historical morphological groups (i.e.Eupelmidae, Cleonyminae) that were never recovered in our phylogenomic results, which confirm previous analyses (Heraty et al., 2013).In either case, we need to further explore these groups to find the basis of disagreement.Notably, we found a clade of gall-associated wasps that was not previously envisioned, but is indeed a good example of molecules suggesting a clade that is agreeable with re-examined morphology (Burks et al., 2022).Our biogeographical inference hypothesizes a general dispersal of taxa from a southern Gondwanan origin; however, a much larger sampling is required to assess this.Importantly, even with large and independent datasets, the importance of taxonomic evaluation and attempts to reduce saturation and homoplasy were all important factors in developing a concrete phylogenetic hypothesis for this massive radiation.

Fig. 1 .
Fig.1.Comparison of properties of the analysed datasets.Datasets (AHE414 and UCEs) are described in Table1.For each panel, letters above box plots reflect pairwise comparisons of marginal means estimated from the best-fit models; distributions sharing a letter do not differ significantly.Points: raw data (TableS2a).In (e), saturation was assessed by calculating the R 2 of the linear regression of uncorrected p-distances against inferred distances in individual gene trees.Highest R 2 are for least saturated loci.The scale of the Y axis is reversed to better show decrease in saturation.(f) The convergence of trees as saturation decreases.The Y axis shows the relative RF distance between pairs of trees obtained with either the combined exons or the combined UCEs.The X axis show the absolute value of the difference between the medians of the R 2 of the linear regression of uncorrected p-distances against inferred distances in gene trees that were combined to get the compared trees [cf.(e)].Four comparisons were performed in each case as datasets were analysed with and without partitioning.

Fig. 5 .
Fig. 5. Global historical biogeography of Chalcidoidea and new classification.The chronogram obtained from the complete set of ingroup taxa is illustrated.The previous classification is used to annotate tips (four letter prefixes; see also TableS1for complete information on sampling) with successive grey and white boxes grouping the tip labels.The new familial classification fromBurks et al. (2022) is shown to the right.For clarity, ancestral ranges are given only up to family level and only for the BAYEAREALIKE + J model (which was selected by AICc).All inferences of ancestral ranges are provided in Fig.S5.Inferences of ancestral ranges were conducted with only one specimen per genus as shown with brackets that connect tips.Current distribution of genera is shown with coloured boxes at tips.Sampling area of specimens is indicated in tip labels.NEO = Neotropical; NEA = Nearctic; AFR = Afrotropical; PAL = Palaearctic; ORI = Oriental; AUS = Australasian.UKN = Unknown when collection data are unavailable.Stars indicate that specimens were sampled in areas where species was introduced or not yet cited.Sampling area for the specimen used for sequencing exons is listed first, sampling area for the specimen used for sequencing UCEs is listed second; n.a. is used when no specimen was sequenced and only one sampling area is reported when exons and UCEs were obtained from specimens sampled in the same areas (or from the same specimen).Unless specified, nodes are supported by SHaLRT ≥80%, UFBoot ≥95% and sCF ≥34.3 (minimum support for a family that is well defined morphologically, Trichogrammatidae).Nodes with a grey circle are supported by SHaLRT <80% or UFBoot <95%; nodes with a black circle are supported by SHaLRT <80% and UFBoot <95%; nodes with a black triangle are supported with sCF <34.3.Images on the left of tentative family names are all at the same scale.Images on the right of tentative family names have been magnified.Photos ©K.Bolte (Baeomorphidae); ©J.-Y.Rasplus (all others).

Table 1
Description of the exons (AHE414) and UCE datasets.
Detailed properties are given in Table