Untying Gordian knots: unraveling reticulate polyploid plant evolution by genomic data using the large Ranunculus auricomus species complex

Kevin Karbstein , Salvatore Tomasello , Ladislav Hoda c , Natascha Wagner , Pia Marin cek , Birthe Hilkka Barke , Claudia Paetzold and Elvira H€ orandl Department of Systematics, Biodiversity and Evolution of Plants (with Herbarium), Albrecht-von-Haller Institute for Plant Sciences, University of G€ottingen, 37073 G€ottingen, Germany; Georg-August University School of Science (GAUSS), University of G€ottingen, 37073 G€ottingen, Germany; Department of Biogeochemical Integration, Max Planck Institute for


Introduction
Polyploidy, the presence of more than two chromosome sets, occurs across eukaryotes (Otto & Whitton, 2000;van de Peer et al., 2017). All flowering plants are of ancient polyploid origin, and several additional polyploidization events were found in various lineages (van de Peer et al., 2017;Leebens-Mack et al., 2019). Young polyploidization events (neopolyploidy) tend to be followed by an upshift of diversification rates (Landis et al., 2018). Two different main polyploidization types exist: allopolyploids are formed by hybridization between different species/lineages whereas autopolyploids arise within species (Comai, 2005;Blischak et al., 2018). Consequently, autopolyploids contain genetically similar subgenomes whereas allopolyploids are composed of previously diverged subgenomes. Both types can occur within the same species or species complex (H€ orandl, 2022). Allopolyploidy is frequently associated with higher degrees of genomic, transcriptomic, and epigenomic changes than autopolyploidy, and is thus considered particularly likely to create novel genomic features (Abbott et al., 2013;Wendel, 2015;Spoelhof et al., 2017;Rothfels, 2021).
Polyploid lineages are not only shaped by evolutionary origin and genomic contributions of progenitors, but also by postorigin processes, resulting in a mosaic-like genome structure (Soltis et al., 2015). Expression bias due to epigenetic changes and homoeologous exchanges (HEs) directly after allopolyploidization can distort original genomic contributions, leading to subgenome expression and/or sequence dominance (Blischak et al., 2018;Alger & Edger, 2020). Moreover, Mendelian segregation in the first diploid hybrid generations before polyploidization, backcrossing of polyploids to their sympatric progenitors, and/or gene flow among polyploid lineages might influence original contributions (Hoda c et al., 2018(Hoda c et al., , 2019; Melich arkov a et al., 2020; Wagner et al., 2020).
Reduced representation sequencing (RRS; Davey et al., 2011) genomic data provide orders of magnitude more information than traditional markers and have proven to be effective at resolving diploid phylogenetic relationships among species that diversified c. 0.1-50 million years ago (Ma) (e.g. Pellino et al., 2013;Hipp et al., 2014;Eaton et al., 2017;Carter et al., 2019;Wagner et al., 2020). Restriction site-associated DNA sequencing (RAD-Seq) covers a subset of noncoding and coding regions across the entire genome and is mostly used for analyzing closely related groups within species or genera (Davey et al., 2011;McKain et al., 2018). Target enrichment aims at collecting rather conservative low-copy nuclear genes able to resolve relationships at the genus level (Schmickl et al., 2016;Carter et al., 2019). Although RAD-Seq yields many more loci and single nucleotide polymorphisms (SNPs) than target enrichment, read assembly is bioinformatically more challenging and locus dropout can become problematic with increasing genetic divergence (Eaton et al., 2017;Karbstein et al., 2020c). Target enrichment loci are usually longer, allowing for allele phasing, gene tree estimation, and coalescent-based approaches. Allelic information is particularly important for correct phylogenetic inferences in highly reticulate, young evolutionary relationships (Eriksson et al., 2018). In addition, while sequencing targeted nuclear genes, plastid data can be easily gained from off-target reads (Hyb-Seq) (Weitemier (a) (b) Fig. 1 Evolutionary processes in neopolyploid species complexes and methods to address these processes. The figure was drawn to trace evolutionary processes in young species groups, like the here studied Ranunculus auricomus complex. (a) Evolution of an apomictic polyploid complex from two sexual progenitor species and evolution of lineages after origin (concept of Babcock & Stebbins (1938); see also Grant (1981) and Coyne & Orr (2004) for modern interpretations). (b) Description of evolutionary processes and applied methods. CP, chloroplast (plastid) regions; RAD-Seq, restriction site-associated DNA sequencing loci; TEG, target enriched nuclear genes. See Fig. 2 for the sampling and Fig. 3 for the detailed workflow applied in this study.  Phytologist et al., 2014;Folk et al., 2015). Nuclear-plastid discordances help to identify past hybridization or allopolyploidization events (Huang et al., 2014;Stull et al., 2020) and the maternal progenitor of polyploids. Consequently, a promising approach is the combination of both RAD-and Hyb-Seq to cover the entire plant genome and unravel reticulate polyploid plant evolution.
In this study, we thus aim at understanding recent reticulate relationships and speciation by using the polyploid Ranunculus auricomus plant species complex as a model system. This complex ranges from Europe to Siberia, and spans arctic, boreal, temperate, and Mediterranean climates (Jalas & Suominen, 1989). More than 840 taxa (morphospecies) have been described; they are mainly tetraploid or hexaploid apomicts and inhabit stream sides, semi-dry to marshy meadows, and forests (Jalas & Suominen, 1989;Karbstein et al., 2020cKarbstein et al., , 2021a. Only four diploid and one tetraploid, genetically and geographically distinct, recently newly circumscribed sexual species exist (Fig. 2;Karbstein et al., 2020b,c): Ranunculus cassubicifolius, Ranunculus flabellifolius, Ranunculus envalirensis, Ranunculus marsicus (only tetraploid), and Ranunculus notabilis. Ranunculus cassubicifolius and R. flabellifolius are characterized by nondissected basal leaves whereas the other species show a strongly heterophyllous leaf cycle within a year and dissected basal leaves during anthesis (Karbstein et al., 2020c). Sexuals diverged 0.83-0.58 Ma from a European-wide, forest-understory ancestor during Pleistocene climatic fluctuations (Tomasello et al., 2020). Studies on single lineages revealed that apomictic polyploids arose from hybridization of sexual progenitors (Paun et al., 2006;H€ orandl et al., 2009;Pellino et al., 2013;Hoda c et al., 2014Hoda c et al., , 2018. Polyploids occupy larger, more northern areas, possess higher levels of genome-wide heterozygosity, and are obligate apomictic or with low levels of facultative sexuality (Karbstein et al., 2021a). Nevertheless, due to missing information on sexual progenitors, limited sampling, and low resolution of traditional molecular markers, the evolution of the entire complex remains unclear.
In this study, we use RRS genomic data of all five sexual progenitors and 78 polyploid R. auricomus taxa, and a comprehensive workflow to tackle the outlined issues. We combined genomic RAD-Seq loci, phased nuclear genes, plastid regions, and previous knowledge (sexual progenitors, ploidy, reproduction modes) with up-to-date structure, network, and SNPdiscovery methods (Figs 1, 3), to address the following questions: (1) Do RAD-Seq and phased nuclear gene data reflect a clear genetic and/or geographical structure? (2) Do nuclear and plastome data deliver conflicting signals? (3) Are apomictic lineages of autopolyploid or allopolyploid origin? (4) How many progenitors contributed to their genomes? (5) To which extent are polyploid genomes influenced by post-origin evolution?

Population sampling
We included all five sexual species and 78 of the most widespread, triploid to hexaploid apomictic R. auricomus taxa (Fig. 2, see references in legend). We collected 293 samples from 248 populations across Europe for genetic analyses (Supporting Information Table S1). The sampling included three populations per taxon on average, and 29 diploid sexual, three triploid, 206 tetraploid, and 10 hexaploid facultative to obligate apomictic populations (Karbstein et al., 2021a). Further details about locations, ploidy, reproduction modes, samples per population, and material for DNA sequencing are given in Table S1. In almost all cases, we selected the same samples among datasets. Data sampling and the workflow are described in Fig. 3.

Laboratory work, locus assembly, and parameter optimization
We used the already sequenced 280 R. auricomus and two outgroup RAD-Seq samples from Karbstein et al. (2020cKarbstein et al. ( , 2021a. DNA extraction, quality check, RAD library preparation, singleend sequencing of 100 bp reads (Baird et al., 2008), and subsequent quality filtering also followed Karbstein et al. (2020cKarbstein et al. ( , 2021a. For target enrichment, we added 85 newly sequenced samples to the already existing 28 samples sequenced by Tomasello et al. (2020), totaling 113 accessions. All plastid data (CP) from off-target reads is published herein. The employed bait set, library preparation, hybrid capture, and Illumina sequencing followed Tomasello et al. (2020).
For de novo assembly of RAD-Seq loci and parameter optimization, we used IPYRAD v.0.9.14 (Eaton & Overcast, 2020). We exactly followed Karbstein et al. (2021a): the within-sample clustering was optimized separately for each ploidy level balancing number of clusters, cluster depth, and clusters rejected due to high heterozygosity. Then, the among-sample clustering (merged assembly) was optimized for number of polymorphic loci, SNPs, loci filtered by maximum number of SNPs, removed duplicates, shared loci, and new polymorphic loci. To assess the effects of number of loci and missing data on (phylo)genomic analyses, we selected different minimum amounts of samples per locus, i.e. 74% ('min10'), 55% ('min30'), and 44% ('min50') missing data. We allowed a maximum number of four alleles per locus because the majority of samples were tetraploid and alignments showed almost never more than four alleles (randomly checked with  Kearse et al., 2012). Moreover, diploid base calling of IPYRAD leads to lumping of alleles in only two phases/haplotypes. Nevertheless, loss of genetic information is unlikely due to rare occurrence of more than two bases per site (randomly checked with GENEIOUS) and low overall genetic differentiation. In addition, choosing only one RAD-Seq SNP per locus (SNMF, PHYLONETWORKS) eliminates the bias of merging different polyploid subgenomes in genetic analysis. For target enrichment data analysis, reads were processed with HYBPHYLOMAKER v.1.6.4 (F er & Schmickl, 2018), using target regions as pseudoreference for read mapping (Notes S1; Table S2). Data filtering yielded 579 genes from which the most informative, nonhomoplasious, and free-from-paralog-sequences 48 genes were selected (Notes S2; Table S3). We phased the selected loci using a similar approach as described in Eriksson et al. (2018) (Table S4). We processed the mapped BAM files of all samples with SAMTOOLS v.0.1.19 (Li et al., 2009). The polyploid samples were phased further, looking at the phased BAM files in IGV v.2.8.9 (Robinson et al., 2011) and manually adding additional alleles to the alignments when detected. Off-target plastid reads were also assembled with HYBPHYLOMAKER using the Ranunculus repens plastome (Dann et al., 2017; Table S5) as reference, excluding plastome regions with high amounts of missing data (c. 50% plastome completeness; Notes S3; Table S6).
RADPAINTER calculates the nearest neighbor haplotype coancestry values, which gives information about genetic similarity of an individual to all other individuals across alleles and loci. We used FINERADSTRUCTURE to assign individuals to groups (1000 000 burn-in; 1000 000 iterations) including a tree building MCMC algorithm (100 000 burn-in), and plotted the results using 'FINERADSTRUCTUREPLOT.R' (R v.4.0.3; R Core Team, 2020). We selected the min10 dataset (minimal 10% samples per locus; 97 312 loci, 74% missing data) for further interpretations because it yielded the best genetic resolution (Fig. S1).

Genetic structure (nuclear genes)
To unravel genetic structure based on phased nuclear genes, we utilized the species delimitation software STACEY v.1.2.1 Fig. 2 Locations of studied Ranunculus auricomus populations across Europe. We investigated 248 sexual and apomictic populations (Supporting Information  Table S1). Symbols represent reproduction modes of populations (colored circles = sexuals or subgenomes, dark gray solid triangles = obligate apomicts, dark gray dashed squares = facultative apomicts, light gray solid triangles = tested obligate apomicts, light gray dashed squares = tested facultative apomicts; Karbstein et al., 2021a). The color scheme of sexual species was also applied to Figs 4-8. The species complex occurs in entire Europe (except for the southern Mediterranean) and sexual species are mainly distributed around the sampling regions (details in Karbstein et al., 2020c;Tomasello et al., 2020

Research
New Phytologist (Jones, 2017b). This coalescent-based approach models the genetic structure using allelic information (Jones, 2017b;Andermann et al., 2019), and is appropriate for mixed-ploidy datasets. Input files were prepared in BEAUTI v.2.6.1 (Bouckaert et al., 2014) using the 48 phased loci. Each sample was treated as 'minimal cluster' (alleles of one accession are not allowed to be split into different species). Sequence substitution models were selected for each locus separately using the Bayesian information criterion (BIC) in MODELTEST-NG v.0.1.6 (Darriba et al., 2020). Substitution models, clock models, and gene trees were treated as unlinked for all loci. To reduce the search space, parameters of the substitution models were fixed to those found in MODELTEST-NG. Detailed STACEY settings are described in Notes S4.

Plastome (CP) and nuclear (RAD-Seq, TEG) phylogenies
To investigate the presence of nuclear-plastid discordances, we used 71 selected plastid regions to infer a maximum likelihood (ML) tree and 100 Bootstrap (BT) replicates with RAXML-NG v.0.9.0 (Kozlov et al., 2019). Models of sequence evolution were assessed for each region separately using MODELTEST-NG. All alignments were concatenated with AMAS v.0.98 (Borowiec, 2016) and different regions were treated as different partitions, each with its respective sequence evolution model. To gain additional information about haplotype evolution, we calculated neighbor-net networks running SPLITSTREE v.4.14.6 (Huson & Bryant, 2006) as described in Karbstein et al. (2021a).
For each of the nuclear datasets (RAD-Seq, TEG), trees and quartet sampling (QS) analyses (Pease et al., 2018) were calculated following Karbstein et al. (2021b) (Figs S11-S15). Since QS revealed several conflicting patterns, the assumptions of bifurcating relationships were clearly rejected, and hence trees were not used for further analyses.

Subgenome contribution of polyploids (RAD-Seq, nuclear genes)
To investigate polyploid genomes in more detail, we selected polyploid individuals with obvious reticulation signals per main Bioinformatic pipeline to resolve neopolyploid groups. Here, we used the Ranunculus auricomus species complex as a model system. We analyzed sexual species and apomictic taxa together, incorporating a priori (multi-approach) information about sexual species (Karbstein et al., 2020b,c) and ploidy levels and reproduction modes of polyploids (Karbstein et al., 2021a). Analyses are based on the optimized alignments of three datasets covering different parts of the genome: Restriction site-associated DNA sequencing (RAD-Seq), nuclear target enrichment (TEG), and plastid regions (chloroplast, CP). We used RAD-Seq datasets to perform genetic structure analyses, phylogenetic networks, and single nucleotide polymorphisms (SNP)s discovery analyses. Moreover, we computed genetic structure (species delimitation) analysis and phylogenetic networks based on phased nuclear genes. A maximum likelihood (ML) tree and distance-based networks of CP data were also included to get further details about nuclear-plastid discordances and maternal progenitors of polyploids. All results were combined (consensus) and polyploids were evaluated for subgenome contributions and formation types (Figs 7, 8; Consensus results Consensus results

Research
New Phytologist genetic cluster for further subgenome contribution testing (Tables 1, S1). We calculated the median across all coancestry values within the RADPAINTER matrix. This median reflects the general genetic relatedness among all samples, and we interpreted single values above this threshold as significant reticulation signal (subgenome contributions). The same procedure was also applied to the STACEY posterior probability. To ensure comparability among datasets, we aimed at selecting the same individuals (except for the monophyletic taxon 'R. 9 elatior' (H 2 ), see also Figs S11-S15).

Phylogenetic network analyses (RAD-Seq)
To corroborate the already gained genetic structure information, we carried out analyses with maximum pseudolikelihood, coalescent-based PHYLONETWORKS v.0.12.0 using unlinked SNPs per species (Sol ıs-Lemus et al., 2017). We applied the R function SNPS2CF.R v.1.2 (Olave & Meyer, 2020) to transform SNPbased min30 RAD-Seq alignments into quartets and quartet concordance factors (CF; proportion of locus-based trees supporting the quartet). A custom R script converted the ustr (one SNP per locus, two phases) IPYRAD output file into an adequate input format for SNPS2CF. We created 10 subsets each containing one tetraploid accession (H 1 -H 10 ) and all available accessions of diploid sexual progenitors (except R. marsicus due to no significant subgenome contribution in previous analyses). We specified 'between species only', no maximum number of SNPs, maximum number of quartets of 1000, and 100 BTs.
Based on the quartet CF matrices and quartet CF starting trees, SNaQ analyses were run with default settings to receive networks and inheritance probabilities (proportions of subgenome contributions inherited from progenitors). We initially allowed no hybridization event. Afterward, the output was used as a start network (net0) for the following analysis allowing one hybridization event (net1). Since SNaQ takes no constraints and polyploids are derived from their diploid parents, we selected the network with the polyploid as hybrid. In most cases, this was the likeliest network (except in H 4 , H 5 , and H 9 ; see the Discussion section and network results on FigShare).

Phylogenetic network analyses (nuclear genes)
Moreover, we performed network analyses based on the 48 phased nuclear genes. We also investigated H 1 -H 10 , and used gene trees as input and two different maximum pseudolikelihood, coalescent-based approaches taking alleles per species into account: SNaQ implemented in PHYLONETWORKS as for RAD-Seq data and InferNetwork_MPL implemented in PHYLONET v.3.8.2 (Than et al., 2008;Wen et al., 2018). For each polyploid tested, alignments were modified to include all appropriate diploid accessions and the respective polyploid individual. Models of sequence evolution were selected with MODELTEST-NG, and 100 BT gene trees were inferred with RAXML-NG for each of the 48 loci. Therefore, 100 gene trees per locus were used as input. For PHYLONETWORKS, we took the gene trees and a mapping-alleles-to-species file to calculate a CF table. We continued the analyses as for the RAD-Seq dataset, with the only exception that the starting tree was inferred running ASTRAL III v.5.6.3 (Zhang et al., 2018). For the PHYLONET MPL analyses, the polyploid was always specified as the putative hybrid. We performed 10 runs per search, each returning five optimal networks. After the search, the returned species networks were optimized for branch lengths and inheritance probabilities under full likelihood, using the default settings.
Origin of polyploids (RAD-Seq, nuclear genes, CP) To infer parental subgenome contribution per polyploid, we listed all previously generated results in Table S7 (RAD-Seq, nuclear genes). Parent 1 (P 1 ) is always the parent with the largest subgenome contribution (coancestry/posterior probability values, inheritance probabilities) followed by the other parental contributions (P 2 -P 4 ). We applied multiple criteria for building a consensus based on all generated results (Table S7): (1) Take the most abundant parent within a column; (2) if there are two equal abundant parents (e.g. two-times 'C' and 'F') within a column, both parental subgenome contributions were taken ('C/F'). To validate the obtained consensus and to infer genome evolution (autopolyploid vs allopolyploid), we submitted all previous results before consensus building to the full likelihood approach implemented in PHYLONET (Yu et al., 2012). The CalcProb function calculates the likelihood of gene trees under a given species network and thus the total likelihood of the same network.
To include genetic structure results, networks were manually constructed using the tree backbone topology in Karbstein et al. (2020c) and the first two putative progenitors identified by these methods. The autopolyploid scenario was tested utilizing the ASTRAL III trees already used as starting tree for the PHYLONETWORKS analyses. We rooted all networks with R. cassubicifolius to make scenarios more comparable. To compare autopolyploid scenarios with allopolyploid ones, we scored results applying the Akaike information criterion (AIC), taking into account that the number of parameters in a tree/network is equal to the number of branch lengths plus (for the networks) the parental contributions (i.e. k = 8 and k = 13 for the tree and the networks, respectively). We determined the final subgenome contribution(s) by correcting the consensus results by the previously generated full likelihood approach results and inferred haplotypes (Table S7).

Origin of SNPs (RAD-Seq)
To investigate post-origin evolution of allopolyploids in more detail, we carried out SNIPLOID (v.March 17, 2016;Peralta et al., 2013;Wagner et al., 2020; scripts on Github). SNIPLOID compares an allotetraploid and a diploid putative parental species (DIPLOID2) with a diploid parental reference (DIPLOID1). The resulting SNPs were categorized: cat1&2 result from interspecific, post-origin hybridization (SNPs either DIPLOID1/2); cat3/4 represent lineage-specific, post-origin SNPs (SNP does not match SNPs of DIPLOID1/2); cat5 represents the homeoor hybrid-SNPs from both parents from the allopolyploidization We created references of diploids by merging all accessions of a single progenitor species into a single FASTQ file to include dominant parental SNPs and to reduce bias by selecting only one individual, and conducted within-sample clustering in IPYRAD  (SNPs)). The darker the square, the higher the genetic similarity between a pair of individuals (positive value range; legend on the right). (b) Similarity matrix of STACEY species delimitation analyses of 111 R. auricomus individuals (48 phased nuclear genes). Posterior probabilities for belonging to the same cluster (1.0 = maximum, 0.0 = minimum) are shown for pairs of individuals (legend on the right). Samples were grouped according to tree structures, and main clusters were determined visually and according to the tree support (figures on FigShare). We indicated supported genetic clusters with solid lines (I-V), shared similarity among (sub)clusters with dotted lines, and sexual species with colored, broadly dashed squares (subgenomes C, F, M, N, and E; coloring according to Fig. 2). Subgenome M (tetraploid Ranunculus marsicus) showed no significant genetic similarity/posterior probability signals to polyploid apomicts and was therefore illustrated in brackets. Small black squares ('H n ') indicate tested tetraploids (Table 1,  Supporting Information Table S1). Using lines and colored circles/ellipses, we highlighted the potential parental subgenome contributions for each polyploid (P 1 -P 4 ; see Fig. S1; Table S7 for more details). (2022)

Research
New Phytologist (settings identical to Karbstein et al., 2020c). Obtained consensus files containing ambiguity codes (ignored) were used as DIPLOID1 (reference) and merged FASTQ files as DIPLOID2. We specified a minimum read depth per position of 20 to filter sequencing errors. We excluded the category 'others' (heterozygous positions of DIPLOID2) from final results. Moreover, we always observed dominance of interspecific SNPs of cat2 compared with cat1 SNPs, independent of parental combination. This was probably due to the majority rule base call references neglecting natural genetic variation. Therefore, we generally summarized both categories to 'cat1&2' to avoid biases within interspecific SNP category.

Results
Genetic structure analyses RADPAINTER+FINERADSTRUCTURE structure analysis based on RAD-Seq-SNPs revealed three supported main clusters (Fig. 4a). Sexual species clustered with several polyploid apomicts: (I) R. cassubicifolius (subgenome C) with tetraploid to hexaploid taxa, (II) R. flabellifolius, R. marsicus, and R. notabilis (subgenomes F, M, and N) with triploid to hexaploid taxa, and (III) R. envalirensis (subgenome E) with tetraploid taxa. Commonly, polyploids showed high coancestry values, i.e. orange to red colors, with different clusters indicating reticulation events (see particularly polyploids H 1À H 10 ; Fig. 4a). Highest values were found with sexual subgenomes occurring in the same cluster (Table S7). Polyploids of cluster I showed highest similarity values with subgenome C and lowest ones with subgenomes N and F. In contrast, polyploids of cluster II are genetically more heterogeneous and divided into subclusters IIa and IIb: polyploids of IIa shared high similarity values with subgenome N and low coancestry values with subgenomes F and C whereas polyploids of IIb showed low values with F and E. In cluster III, polyploids only exhibited high similarity to subgenome E. STACEY analysis based on phased nuclear genes revealed similar results ( Fig. 4b; Table S7). Each sexual species is surrounded by polyploid apomictic taxa, which showed several reticulations and highest posterior probabilities with intra-cluster sexual progenitor subgenomes. The former RAD-Seq cluster II is divided into three distinct clusters each containing a single sexual species (II-IV), and many polyploids of the former RAD-Seq cluster II are incorporated into cluster V. In addition, polyploids of cluster I also shared significant posterior probabilities to subgenome E, and polyploids of cluster V to subgenomes F, C, and N. The polyploid sexual subgenome M shared no significant coancestry/posterior probability values with polyploid apomicts. Whereas RAD-Seq clusters I/nuclear gene clusters I + II were predominantly characterized by undivided basal leaf taxa, the remaining clusters exhibited only dissected ones.
SNMF structure analysis based on unlinked RAD-Seq SNPs also unraveled three to four main clusters (Fig. 5a,b). Although polyploid apomictic taxa were characterized by a dominant genetic partition, they also showed 1-3 minor ones (Fig. S8a,b). The likeliest number of clusters, K = 3, showed a west-east distribution of clusters across Europe (Fig. 5a,b). The clusters themselves are north-south distributed. Ranunculus envalirensis and related polyploids (E, green partition) mainly inhabit regions in western Europe. Ranunculus flabellifolius, R. marsicus, and R. notabilis and related polyploids (F, M, and N, orange partition) predominantly occupy central Europe. Ranunculus cassubicifolius and related polyploids (C, blue partition) are mainly distributed in central-eastern Europe. Results of K = 4 showed the emergence of a central European cluster without a sexual species (gray partition) out of the former green one (Fig. 5a,b). The SNMF results are comparable to previous ones (except that the gray partition is predominantly found in RAD-Seq cluster III/nuclear gene cluster V).

Plastome phylogeny compared with nuclear data
The ML tree based on plastid regions (CP) revealed four wellsupported main clades (i.e. haplotype groups; BT = 100; Figs 6a, www.newphytologist.com S9). In general, within-clade relationships were mainly low or not supported (FBP < 70). Clade I consists of haplotypes from R. cassubicifolius (C), R. flabellifolius (F), and various related polyploids. Accessions of R. cassubicifolius and R. flabellifolius were completely intermingled, contrary to nuclear datasets (Figs 4, 5). Clade II contained only haplotypes from polyploid taxa. The remnant two haplotype clades III and IV consisted of R. envalirensis (E) and few polyploids, and R. notabilis (N), R. marsicus (M), and various polyploids, respectively. Interestingly, the diploid R. notabilis and the tetraploid sexual R. marsicus belong to the same haplotype group contrary to nuclear gene data (Fig. 4b).
The splits graph of the neighbor-net analysis also exhibited four differentiated clusters with low genetic distance (Figs 6b, S10).

Phylogenetic networks combined with genetic structure and plastid data
Phylogenetic networks based on RAD-Seq-SNPs and alleles of phased nuclear genes mainly showed two different subgenome contributions for polyploids (Figs 7, 8; Table 1). These polyploids were usually characterized by a dominant (P 1 = 51-99% inheritance probability, mean 74%) and a minor subgenome DNA sequence contribution (P 2 = 1-49% inheritance probability, mean 26%). Concerning PHYLONET likelihood + AIC calculations, reticulate evolution and thus allopolyploid origin was confirmed in most cases. Within cluster I (Fig. 4a,b), polyploids H 1 -H 4 possessed the dominant subgenome C whereas minor ones came from F followed by E and N. Their blue haplotype C + F matched the dominant subgenome C. Final results indicated that 'R. 9 platycolpoides' (H 1 ) is composed of subgenomes C and N, 'R. 9 elatior' (H 2 ) of C and F, 'R. 9 pseudocassubicus' (H 3 ) of C and E, and 'R. 9 hungaricus' (H 4 ) of C and F. Moreover, we inferred varying subgenome contributions for the polyploid 'R. 9 pilisiensis' (H 7 ) of cluster II (RAD-Seq)/III (nuclear genes), but consensus supported by CP results revealed subgenome F as the dominant one. The likeliest scenario is an autopolyploid origin. The polyploid 'R. 9 indecorus' (H 8 ), positioned in cluster II (RAD-Seq)/IV (nuclear genes), showed three subgenomes, whereas N was the slightly dominant one, and C and E the minor ones. 'R. 9 fissifolius' (H 5 ) was characterized by the orange haplotype N and is also composed of three different subgenomes (E, F, and N).

Research
New Phytologist Fig. 7 (a-h, left) Reconstructed phylogenetic networks based on restriction site-associated DNA sequencing (RAD-Seq), nuclear target enrichment genes (TEGs), and plastid data (CP). Final networks of allopolyploids are based on genetic structure and phylogenetic network results (consensus results) corrected by the full likelihood approach + Akaike information criterion (AIC) calculations in PHYLONET and CP data. P 1 defines the largest subgenome contribution, followed by P 2 and P 3 . The network topology follows the published rooted phylogeny of Ranunculus auricomus sexuals (without tetraploid Ranunculus marsicus; Karbstein et al., 2020c). Curves indicate subgenome contributions (P 1 -P 3 ). (a-h, right) Bar charts based on SNIPLOID results show single nucleotide polymorphism (SNP) origins in percentages (cat1&2 = SNPs identical to DIPLOID2 or DIPLOID1/reference, cat3/4 = derived SNPs, cat5 = homeo-SNPs). Concerning H 5 and H 8 , we calculated two SNIPLOID analyses because three parents have contributed to their origin. Coloring of sexual progenitor subgenomes is according to Fig. 2 The polyploids 'R. 9 glechomoides' (H 6 ), 'R. 9 subglechomoides' (H 9 ), and 'R. 9 leptomeris' (H 10 ) exhibited subgenome E as the dominant contribution. In most genetic structure analyses, these polyploids were also situated close to E. CP analyses showed the green haplotype E for H 6 and H 10 , but not for H 9 . Final results indicated E and F subgenome contributions for H 6 and H 10 . 'R. 9 subglechomoides' (H 9 ) exhibited the gray, unknown haplotype U and PHYLONET AIC + likelihood calculations detected similarly-likely scenarios of reticulate (E and F) or tree-like evolution (E) ( Table 1).

Discussion
Evolution of neopolyploids is an emerging and bioinformatically challenging field, with important consequences for understanding plant speciation and subsequent macroevolution (Soltis et al., 2015;Landis et al., 2018;Rothfels, 2021). The applied workflow disentangled polyploid relationships, parental contributions, geographical patterns, and genomic compositions in the less than 1.0 Ma R. auricomus complex. Our results confirmed that allopolyploidy is the dominant polyploid formation type (Sochor et al., 2015;Dauphin et al., 2018;H€ orandl, 2018;Rothfels, 2021), but also demonstrated substantial post-origin evolution, which is unique compared to other studies. The R. auricomus model system is thus the first well-studied large apomictic polyploid species complex using RRS genomic data. Several genomic datasets (RAD-Seq, nuclear genes, plastid data) along with a priori (multi-approach) information were combined with up-to-date bioinformatic tools into a comprehensive workflowstarting with diploid progenitors and ending with the origin of polyploid derivatives, to receive a complete picture of the underlying reticulate evolutionary processes.  (Figs 2, 7). Differently dashed lines to the left and right specify parental subgenome contributions of allopolyploids. Subgenome dominance is shown by the relative position of the polyploid to the progenitors, for example, 'R. 9 elatior' is closer to subgenome C indicating C subgenome dominance. We also illustrate characteristic basal leaf types of taxa during anthesis (variation not covered, two types illustrated for R. flabellifolius due to the occurrence of undivided and divided types, frequently observed in collections). The hybrid scheme is based on results of Table 1. Currently used taxon names of allopolyploids are given, but we specified the polyploids as nothotaxa because of known hybrid origin (H€ orandl, 2022

Integration of datasets and analyses
Our study demonstrates that the combination of different RRS genomic datasets is necessary to resolve young reticulate relationships. RAD-Seq provided the highest number of genetic information, which is important to tackle among and within-species relationships of groups characterized by low genetic divergence, reticulations, and ILS. RADPAINTER tolerates moderate to high amounts of missing data and compares all SNPs independent of ploidy levels across loci and individuals (Malinsky et al., 2018;Wagner et al., 2021), which is important for allopolyploid analysis. Moreover, the employed SNMF algorithm easily takes mixed-ploidy datasets; it is not only faster, but also less sensitive to apomictic population structure and missing data than the popular STRUCTURE software (Frichot et al., 2014;Stift et al., 2019;Frichot & Franc ßois, 2020). RADPAINTER and SNMF impressively showed hybridity of polyploids. However, drawbacks are the extraction of only the average evolutionary signal (RADPAINTER) or incorporation of less genetic information (SNMF). Therefore, analyses based on phased nuclear genes were conducted to allow more accurate inferences of reticulate polyploid relationships (Dauphin et al., 2018;Eriksson et al., 2018;Lautenschlager et al., 2020;Rothfels, 2021). Coalescent-based STACEY species delimitation compared to RADPAINTER more clearly delimited the genetic structure of the polyploid complex (Fig. 4a,b). STACEY takes allelic information into account, and was thus also able to find shared high posterior probability between allopolyploids and their sexual progenitor species. Compared to RADPAINTER and STACEY, limitations of phylogenetic networks (PHYLONETWORKS, PHYLONET) are the restriction to only two progenitors in hybrid modeling. Nevertheless, these analyses with subsequent allopolyploid vs autopolyploid origin testing based on phased nuclear genes (see also Tiley et al., 2021) were the most important parts to unravel final subgenome contributions of polyploids. Recently, methods have been developed able to assign alleles to subgenomes and infer advanced polyploid phylogenetic networks (Jones, 2017a;Freyman et al., 2020;Lautenschlager et al., 2020;Slenker et al., 2021). These methods need subgenomes that are genetically well-differentiated, knowledge about diploid parental contributions, and/or small-to medium-sized datasets (< 100 loci), which is still unfeasible for analyzing young nonmodel polyploid groups with large-scale genomic information.
RAD-Seq and Hyb-Seq datasets represent the genome and can be much more easily gained and analyzed at lower costs for a large number of samples than entire transcriptomes or genomes (McKain et al., 2018;Johnson et al., 2019). Comprehensive RAD-Seq and Hyb-Seq results were supported by well-resolved haplotype groups. CP data unraveled nuclear-plastid discordances and pinpointed maternal progenitors, which in most cases corresponded to the dominant progenitor subgenome. Performing several network methods across different datasets informed by plastid information (i.e. consensus making) is therefore the best way to get a reliable picture of young polyploid evolution. Moreover, disentangling genetic information (SNPs) of polyploids for post-origin processes using SNIPLOID provides crucial information about divergence and stability of lineages. A limitation of SNIPLOID is that the parental species must be defined for the input, only single samples can be analyzed, and that the algorithm is so far limited to tetraploids. However, results for the tested polyploids here were strikingly similar, suggesting that they reflect well the post-origin processes of this neopolyploid complex.
Genetic structure, nuclear-plastid discordance, and origin of polyploids Despite predominant apomictic reproduction, young allopolyploid lineages interacted many times with their sexual progenitor species and each other, resulting in large, network-like relationships. Genetic structure analyses based on RAD-Seq and nuclear genes surprisingly revealed that all 83 included R. auricomus taxa were grouped in only 3-5 main clusters (Figs 4,5). No clear cluster-specific morphological trend is recognizable, except that taxa with undivided basal leaves were predominantly found in cluster I. Each main cluster contained at least one sexual species surrounded by polyploid apomictic taxa, indicating that polyploids received a main genetic contribution from their clusterspecific diploid progenitors. This pattern is rarely found in other polyploid complexes, where usually groups with several diploid and (allo)polyploid taxa, but also groups with polyploids only were detected (Kirschner et al., 2015;Carter et al., 2019;Wagner et al., 2020). The observed main clusters are west-east distributed in Europe (Fig. 5a), each ranging from southern to northern Europe, respectively. This pattern probably reflects the allopatric distributions of sexual progenitors in combination with migration of populations due to past climatic changes (Abbott et al., 2013;Tomasello et al., 2020).
Moreover, the reticulate evolution of the R. auricomus complex is supported by several nuclear-plastid discordances. For example, the central European subcluster (Fig. 5b) widely corresponds to the gray 'polyploid-only' haplotype of the ML plastid tree and splitsgraph (Fig. 6a,b). Therefore, these polyploids possess a plastid type from an unknown diploid, suggesting an already extinct or unsampled maternal sexual progenitor. The fact that R. auricomus populations have been extensively studied for more than 80 yr (Koch, 1939;Borchers-Kolb, 1985;H€ orandl & Gutermann, 1998) and ploidy levels have been well documented within the last decades in central Europe (Jalas & Suominen, 1989;Dunkel et al., 2018;Paule et al., 2018;Karbstein, 2021;Karbstein et al., 2021a) makes it unlikely that an extant diploid was simply overlooked. Extinction of sexuals is a commonly considered or observed phenomenon in young polyploid complexes shaped by past climatic fluctuations (Sochor et al., 2015;Rothfels, 2021) and is supported here by missing speciation events between 0.6 and 0.3 Ma (Tomasello et al., 2020).
The tested allopolyploids are composed of various subgenome combinations. The presence of multiple different copies in (allo) polyploids probably provides larger physiological and phenotypic flexibility to respond to different environmental conditions (H€ orandl, 2006;Blaine Marchant et al., 2016;van de Peer et al., 2017;Karbstein et al., 2021a). This partly explains the larger geographic distribution of polyploid apomicts compared to their diploid progenitors (Bierzychudek, 1985 ). This lineage might also represent a segmental allopolyploid, as autopolyploidy and allopolyploidy are connected by transitions (Comai, 2005;Spoelhof et al., 2017). All diploid sexual progenitor species were involved in allopolyploid formation (Fig. 8). Polyploids were probably formed multiple times out of different progenitor combinations. Extant sexuals have restricted ranges and are separated by thousands of kilometers across Europe (Fig. 2). Nevertheless, all main genetic clusters are present in central Europe (Fig. 5a,b). Sexual progenitors might have repeatedly hybridized in this region during past interglacial times, resulting in multiple allopolyploidization 'waves' with varying subgenome contributions. The phenotypic diversity of polyploid R. auricomus biotypes with more than 840 described morphospecies is therefore probably formed from only four extant and at least one likely extinct diploid sexual progenitor species. Other studies already demonstrated that few diploid progenitors were capable of producing a magnitude of allopolyploids, for example in Botrychium (Dauphin et al., 2018), Rubus (Sochor et al., 2015;Carter et al., 2019), or Taraxacum (Kirschner et al., 2015).

Post-origin genome evolution
We observed in the proportions of SNPs/alleles a dominance of one subgenome over the other which has the consequence that allopolyploids are grouped close to one of their diploid progenitors (Figs 3-7; Table 1). Per main genetic cluster, allopolyploids were usually composed of a dominant intra-cluster subgenome and 1(À2) minor, varying inter-cluster subgenome(s), although trigenomic polyploids showed rather similar contributions. Plastid data supported the dominant subgenome contribution and, in some cases, unraveled an additional or extinct progenitor (Table 1). Subgenome expression and sequence dominance is a common feature of allopolyploid post-origin evolution (Blischak et al., 2018;Alger & Edger, 2020;Mason & Wendel, 2020). In our neopolyploids, we detected considerable proportions of interspecific, post-origin SNPs (62-93%), and only a minority of SNPs from hybridogenic origins (3-36%; Fig. 7). Subgenome dominance is probably caused by HEs, segregation after hybridization, and gene flow due to facultative sexuality of apomicts after polyploidization. Homoeologous exchange is able to transfer chromosome segments of one parent to the homoeolog of the other during crossovers at meiosis, resulting in genomically homozygous regions (Mason & Wendel, 2020). This mechanism could have slightly increased the proportions of cat1&2 interspecific SNPs in R. auricomus polyploids directly after their origin. In addition, since apomixis establishes only stepwise, the first diploid hybrid generations still exhibit predominant sexual reproduction (Barke et al., , 2020, allowing for extensive Mendelian segregation and a substantial increase of cat1&2 interspecific SNPs. However, after shift to obligate apomixis, HE and Mendelian segregation are no longer effective and hence explain probably not all of the observed substantial subgenome dominance. Facultative sexuality and maintenance of functional pollen probably allowed backcrossing between the newly formed polyploid hybrids and their parental species, and among polyploids. Varying degrees of observed facultative sexuality under natural conditions (mean 2%, range 0-34%; Karbstein et al., 2021a) potentially enabled these scenarios. In apomictic Rubus or Pilosella polyploids, which are also characterized by highly variable, facultative sexuality, post-origin scenarios of interlineage gene flow and backcrossing to sexual parents have also been considered (Sochor et al., 2015;H€ orandl, 2018;Nardi et al., 2018;Carter et al., 2019).
Within main clusters, the majority of described polyploid morphospecies are intermingled and nonmonophyletic (except for H 2 , H 6 , H 7 ; Figs S11-S15). Support metrics of these trees showed highly conflicting signals at middle and terminal branches, suggesting that cladogenetic (bifurcating) speciation from a polyploid ancestor is unlikely due to the presence of many reticulation processes. According to Grant's (1981) definition, the R. auricomus complex is in an early mature stage of evolution, with extant diploid progenitors and a broad array of apomictic hybrid biotypes, but still without stable lineages. This hypothesis is confirmed by the low proportion of lineage-specific SNPs (2-5%; Fig. 7), which is very low compared to similarly aged or older allopolyploids (Gordon et al., 2020;Wagner et al., 2020). Genetic information on lineage characteristics and stability is crucial for the classification and delimitation of species (Grant, 1981;H€ orandl, 2018), and gained knowledge of this study can be used in future comprehensive taxonomic revisions of the large R. auricomus species complex.

Supporting Information
Additional Supporting Information may be found online in the Supporting Information section at the end of the article.               Notes S1 HYBPHYLOMAKER settings (target enrichment analysis).
Notes S3 Further sample filtering for target enrichment data analysis.
Notes S4 Detailed STACEY settings (target enrichment data analysis).

Research
New Phytologist