Genomes shed light on the evolution of Begonia, a mega‐diverse genus

Summary Clarifying the evolutionary processes underlying species diversification and adaptation is a key focus of evolutionary biology. Begonia (Begoniaceae) is one of the most species‐rich angiosperm genera with c. 2000 species, most of which are shade‐adapted. Here, we present chromosome‐scale genome assemblies for four species of Begonia (B. loranthoides, B. masoniana, B. darthvaderiana and B. peltatifolia), and whole genome shotgun data for an additional 74 Begonia representatives to investigate lineage evolution and shade adaptation of the genus. The four genome assemblies range in size from 331.75 Mb (B. peltatifolia) to 799.83 Mb (B. masoniana), and harbor 22 059–23 444 protein‐coding genes. Synteny analysis revealed a lineage‐specific whole‐genome duplication (WGD) that occurred just before the diversification of Begonia. Functional enrichment of gene families retained after WGD highlights the significance of modified carbohydrate metabolism and photosynthesis possibly linked to shade adaptation in the genus, which is further supported by expansions of gene families involved in light perception and harvesting. Phylogenomic reconstructions and genomics studies indicate that genomic introgression has also played a role in the evolution of Begonia. Overall, this study provides valuable genomic resources for Begonia and suggests potential drivers underlying the diversity and adaptive evolution of this mega‐diverse clade.


Introduction
The mechanisms underlying the diversification of large clades of closely related species (often designated taxonomically as genera) remain one of the biggest mysteries in plant biology (Frodin, 2004). Although speciose genera have received widespread attention from evolutionary biologists, typically few genomic resources are available for these closely related, species-rich clades. Representative completely assembled nuclear genomes of only three of the 10 largest angiosperm genera (Frodin, 2004) have been published, namely Solanum (A. Song et al., 2019), Dendrobium (Yan et al., 2015) and Begonia (Griesmann et al., 2018). However, these genomic studies either focused on the specific characteristics of the reference species (A. Yan et al., 2015;Song et al., 2019) or were part of a large comparative genomic project (Griesmann et al., 2018); none of them used genomic data to explore the evolutionary patterns in these mega-diverse clades.
Begonia L. (Begoniaceae, Cucurbitales) is well known for a huge diversity of leaf shapes, patterns and textures (Fig. 1). The genus is pantropical and comprises more than 2000 currently accepted species (Hughes et al., 2015) of herbs and occasionally subshrubs; it thus represents an excellent evolutionary study system for processes that generate numerous closely related species. Species of Begonia are mostly narrow endemics occupying specific microhabits. Begonia has high species diversity in the New World and Asia and relatively low species numbers in Africa, the continent of its putative origin (Neale et al., 2006). This high species diversity forms a stark contrast with its sister genus, the monotypic Hillebrandia Oliv. comprising the rare Haiwaiian endemic H. sandwicensis. Previous studies have suggested that the overall patterns of speciation in Begonia may be driven by local speciation in fragmented habitats (Hughes & Hollingsworth, 2008), hybridization and polyploidization (Dewitte et al., 2011).
Most begonias are shade-adapted and become sun-damaged when exposed to full sun. However, exceptions and intermediates regarding habitat preferences also exist. Begonia species exhibit a continuum of light adaptation ability ranging from deep shade to full sun, affording us the opportunity to unravel mechanisms of adaptation to cope with variable levels of light. Understanding how shade-adapted species optimize photosynthesis and physical defense, while suppressing the shade-avoidance syndrome (SAS; strong elongation growth away from shaded microconditions and accelerated flowering), will be valuable for crop improvement (Gommers et al., 2013). However, the genetic footprints and molecular basis of shade adaptation on the genome level remain elusive.
We de novo sequenced and assembled chromosome-scale genomes of four Begonia species, and generated whole genome shotgun (WGS) data for an additional 74 Begonia species, representing 37 of the 70 recognized sections of Begonia (Moonlight et al., 2018) across all three major continental clades (Supporting Information Fig. S1). We compared the four Begonia genomes and reconstructed the paleogenome of Begonia, explored the evolutionary impact of a Begonia-specific whole genome duplication (WGD) event (Brennan et al., 2012), analyzed the content of transposable elements (TEs), and analyzed their potential impact on the genomic landscape and potentially on species adaptation. We also examined cytonuclear incongruences detected in our study, and investigated the molecular basis of shade adaptation in Begonia.
The interpretation of Begonia genomic diversity in an evolutionary context will not only contribute to a better understanding of the origin, evolution and shade adaptation of this megadiverse clade, but also provide valuable reference genomes for molecular breeding of these highly valued ornamental plants.

Materials and Methods
Sample collections and DNA/RNA extraction All Begonia samples were collected from the glasshouse in Fairy Lake Botanical Garden (Shenzhen, China) where plants were cultivated at 26°C : 18°C (day : night) with a relative humidity of 65-80%. Specimens have been deposited in the Herbarium of Fairy Lake Botanical Garden. For WGS, genomic DNA from young leaves of each individual was extracted using the cetyl trimethylammonium bromide (CTAB) method (Porebski et al., 1997). For single tube long fragment read (stLFR) sequencing, high-molecular-weight genomic DNA was isolated using the IrysPrep ® Plant Tissue DNA Isolation kit (RE-014-05; Bionano Genomics, San Diego, CA, USA) following the manufacturer's instructions. DNA quality and quantity were evaluated using pulsed field gel electrophoresis and Qubit ® 3.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). For transcriptome sequencing, total RNA from different tissues (root, stem/ rhizome, leaf, peduncle and flower) from four Begonias were isolated using TRIzol ® reagent (Invitrogen, Carlsbad, CA, USA), respectively. All the RNA samples were quality controlled using a NanoDrop TM One UV-Vis spectrophotometer (Thermo Fisher Scientific) and a Qubit ® 3.0 Fluorometer.

Library preparation and sequencing
All DNA libraries for WGS were constructed using the MGIEasy FS DNA Library Prep Set (1000006988) with 300-500 bp fragment sizes, and sequenced on an Illumina Hiseq2000 platform to generate paired-end (PE) reads of 150 bp. Transcriptome libraries were constructed with a MGIEasy RNA Library Prep Kit (1000006384) with inserts of 200-400 bp and sequenced with PE reads of 100 bp. More than 5 Gb of sequence data were generated for each library. The stLFR library was prepared with the MGIEasy stLFR Library Prep Kit (1000005622) (Wang et al., 2019) and sequenced with PE reads of 100 + 42 bp, generating > 150 Gb of raw sequence data for each library. 10 9 Genomics Chromium TM Genome libraries with insert sizes of 350-500 bp were prepared with Chromium Genome Reagent Kit (v2 Chemistry, Pleasanton, CA, USA) following the manufacturer's protocols with modified PCR primers to introduce sequencing primers suitable for the BGISEQ-500 platform and then sequenced with PE reads of 150 bp. SMART library preparation and sequencing details are given in Methods S1.

Genome assembly
For assembly of the sequences from 10 9 Genomics Chromium and stLFR libraries, the clean reads were obtained using an inhouse script and de novo assembled using SPERNOVA (v.2.1.1) (Weisenfeld et al., 2017) with default parameters. A minimum fasta record size of 100 bp was specified at the 'mkoutput' stage for outputting the assembly in the 'pseudohap' style. De novo assemblies of the PacBio long reads for B. masoniana and B. darthvaderiana were conducted by CANU (v.0.1) (Koren et al., 2017). Subsequently, two rounds of iterative corrections were performed with PacBio long reads using the software RACON (v.1.2.1) (Vaser et al., 2017), and two rounds of corrections with PILON (v.1.22) (Walker et al., 2014) using 10 9 Genomics reads (see details in Methods S1).

Variant analysis
A total of 468 Gb 150 bp PE Illumina reads were generated, yielding an average coverage of 79 per accession. Raw reads were quality controlled using TRIMMOMATIC (A. M.  to remove adaptors and low-quality bases. The clean reads were aligned against the reference genome of B. masoniana using BWA-MEM (v.0.7.10) (Li, 2013) with default parameters. Variant detection was performed using the genome analysis toolkit (GATK; v.3.5-0-g36282e4) (Mckenna et al., 2010) following the best practices workflow for variant discovery. The resulting BAM files were locally realigned using the INDELREALIGNER to remove erroneous mismatches around small-scale insertions and deletions. Variants were called in each accession separately using the HAPLOTYPECALLER and individual gVCF files were merged using GENOTYPEGVCFs. This two-step approach includes quality recalibration and regenotyping in the merged vcf file, ensuring variant accuracy. Single nucleotide polymorphisms (SNPs) were filtered based on the following criteria: SNPs in repeat regions; SNPs with read depth > 1000 or < 5; SNPs with missing rate > 40%; SNPs with < 5 bp distance with nearby variant sites; and nonbiallelic SNPs were removed. Phylogenetic reconstruction, admixture analysis, principal component analysis (PCA), diversity statistics and ABBA-BABA analysis based on the SNP data are detailed in Methods S1.

Phylogenetic analysis
For nuclear phylogenetic analysis, SNPs within 4000 Begonia single-copy nuclear genes identified using the software ORTHOFINDER (Emms & Kelly, 2015) with four newly sequenced Begonia genomes with default settings were extracted from vcf files and filtered based on sequence length (> 100 bp) and taxon occurrences (> 50%), aligned with MAFFT (Katoh et al., 2005), and trimmed with GBLOCKS (Talavera & Castresana, 2007). A supermatrix method was used to infer the nuclear phylogeny using RAXML (v.7.2.3) (Stamatakis, 2006). The maximum-likelihood tree inferred from concatenated nuclear SNPs was used as a starting tree to estimate species divergence time using MCMC TREE as implemented in PAML (Yang, 2007). One calibration point of the Begonia crown group (24 million years ago (Ma) AE 3.57 million years with a normal distribution) was defined following Moonlight et al., 2018). For plastid phylogenetic analysis, we newly generated 78 Begonia plastid genomes with NOVOPLASTY (Dierckxsens et al., 2017) using the seed sequence of rbcL. These plastid genomes were annotated and the conserved 83 plastid protein-coding genes were extracted for phylogenetic inference in GENEIOUS 10.0.2 (Biomatters, Auckland, New Zealand). The concatenated nucleotide dataset was evaluated with PARTITIONFINDER (Lanfear et al., 2012) for the optimal data partition scheme and the associated nucleotide substitution models, with an initial partitioning strategy by both locus and codon positions, resulting in 13 partitions. The concatenated dataset was analyzed using RAXML (v.7.2.3) (Stamatakis, 2006) with 500 bootstrap replicates.

Genome synteny
The syntenic blocks between two species were defined by MCSCAN (Tang et al., 2008) based on core-orthologous gene sets identified by BLASTP (E-value ≤ 1e-5; number of gene pairs required to call synteny ≥ 5). The resulting dot plots were inspected to confirm the paleoploidy level of Begonia in relation to the other genomes by counting the syntenic depth in each genomic region. The synonymous Ks value for homologous gene pairs was calculated using the software PAML (Yang, 2007) and a custom perl script (https://ftp.cngb.org/pub/CNSA/data3/ CNP0001056/CNS0227982/CNA0013976/), respectively (see details in Methods S1).

Chlorophyll fluorescence analysis
For Chl fluorescence measurement, the plants were dark-adapted for 30 min before the measurements with the MAXI version of the Imaging-PAM M-Series Chl fluorescence system (Heinz-Walz Instruments, Effeltrich, Germany), as described by Jin et al. (2018). For measurements of the light-response curves of photosystem II (PSII) quantum yield (Ф PSII ), plant leaves were illuminated at the following light intensities: 0, 1, 21,56,111,186,281,336,396,461,531,611,701 and 801 lmol pho-tonsÁm À2 s À1 . The PSI electron transport rate (ETRI) was measured using light gradients of 0,5,13,31,89,167,209,325,496,754 lmol photons m À2 s À1 .
Identification and phylogenetic analysis of light-harvesting Chl a/b-binding proteins superfamily For the identification of LHCs, all LHCs previously described from Arabidopsis and Oryza sativa (Umate, 2010) were retrieved from the database of TAIR (www.arabidopsis.org) and NCBI (www.ncbi.nlm.nih.gov/protein/), respectively. Representative members of the subfamilies of Arabidopsis were used as queries to perform BLASTP searches against the protein database of each species with an E-value cutoff of 1e-5. Candidate sequences identified as LHC orthologs were then aligned using MAFFT (Katoh et al., 2005) to remove those that did not contain the intact domain (PF00504). For phylogenetic analysis, sequences of LHC orthologs of four Begonias, Crocus sativus, as well as Arabidopsis thaliana and O. sativa were aligned using MAFFT (Katoh et al., 2005), followed by phylogenetic reconstruction with PHYML (v.3.1) (Guindon & Gascuel, 2003).

Genome sequencing and genome characteristics
Seventy-eight Begonia species were sequenced to acquire genome skim data for comparative genomic studies ( Fig. 1 (Table S5).
To evaluate the quality of the assemblies, RNA-sequencing reads from root, stem/rhizome, flower, peduncle and leaf tissues were mapped to their cognate assemblies (Table S6). About 90.92-98.83% of the reads were aligned to their corresponding genomes (Table S7). The completeness of the assemblies in terms of gene content was assessed with Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis. Of the core 1614 conserved plant genes evaluated, 97.00, 91.00, 92.20 and 96.80% were complete in the assemblies for B. loranthoides, B. masoniana, B. darthvaderiana and B. peltatifolia, respectively; c. 0.80-2.50% of the genes were fragmented (Table S5). Collectively, these results demonstrated that our four genome assemblies were of high quality in terms of contiguity, base accuracy and genome completeness.

Repeat annotation and gene prediction
Repetitive elements were estimated to represent 66.52%, 68.40%, 70.33% and 51.47% of the genome assemblies in B. loranthoides, B. masoniana, B. darthvaderiana and B. peltatifolia, respectively (Table S5). Most of these repeats were TEs that were further subclassified into nine groups (Table S8). Long-terminal repeat retrotransposons (LTR-RTs) represent 42.80-65.60% of the genome assembiles, with Gypsy elements being the most abundant transposon superfamily in all four Begonia species (30.39-48.60%), followed by the Copia superfamily (7.32-18.36%) ( Table S8). The pattern of LTR distribution patterns varied across the genomes for different elements (Fig. 2a). The density of Gypsy scaled negatively with that of the genes whereas Copia was distributed more evenly across the genome and showed no obvious correlation with gene elements (Figs 2a, S5). This is expected since it is known that Gypsy elements accumulate predominantly in heterochromatin and centromeres, whereas Copia elements are normally scattered across the genome (Neumann et al., 2011).
De novo and homology-based approaches were combined to predict protein-coding genes. In total, 22 059, 22 861, 23 444 and 23 010 complete genes were predicted for B. loranthoides, B. masoniana, B. darthvaderiana and B. peltatifolia, respectively, with the highest gene density being 69 genes per Mb in B. peltatifolia, and 28-31 genes per Mb in the other three species (Table S5; Fig. S6), which correlated with the relatively small genome size of B. peltatifolia among the four analyzed Begonia species. The numbers of protein-coding genes are relatively consistent within Cucurbitales (18 292-32 203), but except for B. peltatifolia the gene densities in Begonia (28-31 genes per Mb) are near two-fold lower than those in Cucurbitaceae (64-117 genes per Mb), probably due to higher transposon content (Table S9).

Whole genome duplication and gene evolution
The fraction of synonymous substitutions per synonymous site (K S ) distributions of paralogs clearly showed two peaks ( Fig. 2e), one around 1.5 representing the c hexaploidization event shared by the core eudicots (Jaillon et al., 2007;Chanderbali et al., 2017), and the other around 0.5 indicating that a lineage-specific WGD event occurred in Begonia. By performing a comparative genomic analysis of Begonia with Vitis vinifera, we identified a 2 : 1 syntenic depth ratio (Fig. 2b), which confirms the WGD previously reported in Brennan et al. (2012). We speculate that the WGD event occurred 35 AE 8 Ma ( Fig. 2c; Methods S1), and hence before the split of Begonia (median, c. 25 Ma) and its monotypic sister Hillebrandia, the only other genus of Begoniaceae (Moonlight et al., 2018). This is supported by the fact that Hillebrandia has also been found to possess the WGD, probably indeed shared with Begonia (Mart ınez, 2017). Following the WGD event, 2850 gene families were retained in the common ancestor of the four species of Begonia we sequenced. The retained gene duplicates shared by the four species were considered as core retained genes. This set was enriched for terms such
Individual species retained some specific groups of duplicated genes (Figs S8-S11); for instance, genes annotated as involved in the 'cutin, suberine and wax biosynthesis' pathway were differentially retained in B. loranthoides. This gene retention might be associated with the characteristic waxy leaves of this species. Specific retention of genes involved in 'phenylpropanoid biosynthesis' and 'flavonoid biosynthesis' might be responsible for the colorful leaves of B. masoniana and B. darthvaderiana (Fig. S2). As variegated leaves are commonly found among Begonias and are largely attributed to the accumulation of anthocyanins, we looked more closely at the anthocyanin biosynthesis pathway gene families. We found that in contrast to expansion of gene families of the upstream general phenylpropanoid pathway from Curcubitaceae, Begonia species show significant expansion of gene families related to anthocyanin biosynthesis, especially for Chalcone synthase (CHS) in B. masoniana and Flavanone 3hydroxylase (F3H) and Dihydroflavonol 4-reductase (DFR) in B. darthvaderiana (Fig. S12), and there is recent evidence for relaxed selective constraints and differential expression of paralogs in CHS in Begonia (Emelianova et al., 2021).
Based on a high-confidence phylogenetic tree reconstructed by 193 single-copy nuclear gene families of 13 angiosperm species including V. vinifera, Populus trichocarpa, Glycine max, Prunus mume and five species from Cucurbitaceae, we identified gene families that have experienced significant expansions and contractions during the evolution of Begonia and related species (Fig. S13). Twenty gene families, including 1071 genes, were significantly expanded (P < 0.05) in Begonia species compared to the other groups. GO and KEGG enrichment analyses found these to be enriched in terms including 'zinc ion binding', 'transition metal ion binding' and 'metal ion binding' (Table S10), which are primarily involved in the 'Oxidative phosphorylation', 'Endocytosis' and 'Pyrimidine metabolism' pathways (Table S11). Surprisingly, many resistance-and defenserelated gene families such as 'NB-ARC' were significantly contracted in the Begonia lineage. The TIR-NBS-LRR (TNL) disease resistance gene family (Kim et al., 2009) was completely lost (Fig. S14). We looked for expansion of other disease-related genes and found that only the Autophagy 17 (APG17) family showed significant expansion in Begonia (Table S12). Other GO terms which were underrepresented in the set of contraction gene families included the 'protein kinase domain', 'Cytochrome p450' and 'Terpene synthase' gene families (Table S13).

Chromosome evolution
To reconstruct the evolutionary events leading to current genome structures in Begonia, homologous chromosome segments between different species were identified. There were 122 shared syntenic blocks in the four species of Begonia sequenced here, which accounted for 74.6%, 78.6%, 67.0% and 74.6% of the B. loranthoides, B. masoniana, B. darthvaderiana and B. peltatifolia genomes, respectively. The lowest percentage of syntenic blocks in B. darthvaderiana among those of the four species was consistent with the low TE proportion in these regions compared to B. masoniana and B. loranthoides (Fig. S15). Synteny analyses between them showed that each chromosome had a nearly oneto-one syntenic relationship with chromosomes from other species (Fig. 2b); the relationship was especially strong for those three species with the same chromosome number. Some large inversions could be inferred for each species. One translocation was detected in chromosomes 2 and 17 in B. loranthoides. Chromosome fissions and fusions were identified in the genomes of B. loranthoides and B. masoniana. We suggest that chr9 and chr12, chr1 and chr18, chr3 and chr11, and chr8 and chr19 in B. loranthoides experienced breakage with fusion to chr4, chr1, chr11 and chr15 in B. masoniana, respectively (Fig. 2b).
Conserved gene adjacencies suggest that the ancestral Begonia karyotype reconstructed based on the four species noted above consisted of 22 conserved ancestral regions (CARs), following an ancestral WGD that occurred early in the history of Begoniaceae characterizing all extant members (Fig. S16). From the 22 CARs of the ancestral karyotype, the 15 chromosomes of B. masoniana might be derived by three fusions, and four deletions, while the 15 chromosomes of B. darthvaderiana were shaped through one fission, four fusions and three deletions, the 15 chromosomes of B. peltatifolia through four fissions and 11 fusions, and the 19 B. loranthoides chromosomes through seven fissions and 10 fusions (Fig. S16). Although B. peltatifolia has the same chromosome number as B. masoniana and B. darthvaderiana, it appears to have undergone a large number of chromosome fissions and fusions after the split from their common ancestor. This suggests that genomic rearrangements may be even more frequent in Begonia than apparent from the highly variable chromosome numbers (2n = 16-156) (Dewitte et al., 2009(Dewitte et al., , 2011.

Transposable elements evolution and distribution
Transposable elements generally comprise the bulk of plant genomic DNA and their numbers show a positive correlation with genome size (Wendel et al., 2016). In our Begonia samples, this also appears to be the case: B. peltatifolia has both the smallest genome and the smallest number and proportion of TEs (Fig. S17). Amongst the most abundant superfamilies of TEs, the number of Gypsy and Copia LTR elements were most strongly and positively correlated with genome size (Fig. S17). As the four Begonia species have similar numbers of protein-coding genes (Table S5), genome size variations between them are essentially attributed to the variation of TE abundance between the different Begonia species.
The investigation of TE representation in our four Begonia genome assemblies showed that they had different compositions of TE superfamilies ( Fig. 3a; Table S8), and are quite variable for full -length Gypsy and Copia families (Fig. S18). The analysis of full-length LTR-RTs indicated several transposon bursts occurred during the last 8 million years, including recent expansions in all species, especially in Gypsy elements compared to Copia (Fig. 3b). When the full-length Gypsy and Copia families were analyzed in  (Fig. 3c).
To determine the historical dynamics of the different lineages of Gypsy and Copia elements in the Begonia genomes, the divergence of their reverse transcriptase (RT) sequences was analyzed. Evolutionary analysis revealed different patterns among different LTR lineages in the four species (Fig. S19). For example, SIRE elements of Copia show a recent activity burst from a few ancestor sequences in B. masoniana, B. loranthoides and B. darthvaderiana, but no burst is observed in B. peltatifolia. A few speciesspecific bursts were also observed for the Gypsy Tekay element in all four genomes investigated (Fig. S19). Furthermore, several Copia Ivana and Gypsy CRM copies from the common ancestor of the four Begonia species we investigated have been maintained and are still active (Fig. S19). These findings show that Begonia has a long and ongoing history of active TE elements.
Based on the presence and abundance of TE elements in each species, PCA recovered three well-circumscribed Begonia lineages, corresponding to the three major geographical groups of the genus, indicating similar TE compositions in geographically restricted clades (Fig. 3d). Closely related species showed similar TE abundance, even in some species that have diverged more than 10 Ma (Fig. S20). Our investigation reveals a congruence of TE abundance with the phylogenetic tree, indicating that TEs are specifically accumulating across clades of species.
To look for effects of TE activity on gene function, we analyzed TE distribution upstream and downstream of genes. The numbers of genes with adjacent Copia and Gypsy elements

Research
New Phytologist insertions was very similar for the four Begonia species (Fig. S21a). However, comparison of other enriched TE families surrounding genes indicated different composition patterns in these four Begonia species (Fig. S21b). About 743-2751 (3.23-12.03%) and 1705-2378 (7.41-10.78%) genes have TE insertions in their intron and promoter regions, respectively (Fig. S22). Functional enrichment analysis of those genes with TE insertions identified stress-related and metabolic process pathways as over-represented in the set (Tables S14, S15), with distinct differences between the basal African lineage represented by B. loranthoides and the three Asian species (Fig. S23).

Evolution of shade adaptation
As classical shade-dwelling plants, all Begonias have lower total Chl and lower ratios of Chla/b compared with those of a typical sun-exposed plant such as Gerbera hybrida (Table S16). Through comparative genomic analysis, we found several gene families belonging to the core components of light perception; that is, Cryptochromes (CRYs), Phototropins (PHOTs), Phytochromes and UV Resistance Locus 8 (UVR8) were obviously expanded in Begonia following the lineage-specific WGD event compared to other plant species (Figs 4d, S24-S27; Table S17). Furthermore, we found that all these lineage-specific WGD-retained photoreceptor genes of B. masoniana displayed differential expression responding to light and dark treatment. Notably, the two retained copies of UVR8 showed divergent expression between light and dark (Fig. 4f). Thus, a higher copy number of these genes might contribute to shade adaptation by increasing the complexity of the light response regulation network .
Although  (Table S16), along with lower maximum photochemical efficiency of PSII (F v /F m ) and quantum yield than the two semishade species (Fig. 4a-c). Comparative gene family analyses revealed significant expansions of the LHCs family in the two 'deep shade' Begonias ( Fig. 4e; Table S18). Notably LHCB, and especially the LHCB1 subgroup, show prominent expansions in these two shade-dwelling species due to parallel tandem duplications (Figs 4e, S28). All the duplicated LHCB1 gene pairs showed upregulation in the dark, and downregulation in the light (Fig. 4f), which may indicate their strengthened ability of light harvesting under low light. Together, these results suggest that both WGD-and tandem-driven photoreceptors and light-harvesting genes contribute to shade adaptation of Begonia.

Genetic variation and admixture patterns
Begonia originated in Africa and spread across all the tropical regions except Australia (Neale et al., 2006). We selected 78 accessions (Fig. 5a) that cover the full distribution of Begonia, representing 37 out of 70 sections, to investigate patterns of genetic variation across the genus. We detected 1 137 696 SNPs and 66 862 small indel variants (< 10 bp). Phylogenetic analysis using a subset of 926 407 SNPs within regions of putatively single-copy genes (SCGs) clearly differentiate Begonia accessions into three distinct clades (Fig. 5c). A weakly supported African clade is sister to a clade consisting of two monophyletic lineages including one consisting of largely Neotropical accessions and one consisting of exclusively Asian accessions.
Genetic clustering analysis with ADMIXTURE showed an optimal value of K = 3 subpopulations (Fig. 5d), which is consistent with the PCA (Fig. 5b). We observed evidence of interspecific admixture within the Neotropical and African Begonia accessions, respectively, and the highest nucleotide diversity (p) in the Neotropical accessions (0.0005755) compared with that of the African (0.0002595) and Asian (0.0002434) accessions (Fig. S29).

Phylogenomic incongruences and hybridizations
Species of Begonia are known to hybridize in nature (Peng & Ku, 2009;Hughes et al., 2018), and previous work (Goodall-Copestake et al., 2010) identified possible hybridization events early in Begonia evolution. To investigate this further we compared phylogenetic inferences between plastid and nuclear phylogenies. The plastid tree supports the African origin of Begonia and shows successive divergences of four major clades, corresponding to the African, Neotropical I, Neotropical II and Asian clades (Figs 5c, S30, S31). Our plastid phylogeny differs from previous phylogenetic studies based on three plastid markers (Moonlight et al., 2018) in the position of the yellowflowered African Begonia (YFAB) clade. The YFAB clade forms a sister group with the Fleshy-fruited African Begonia (FFAB) clade in our study (Fig. 5c) whereas in previous multilocus studies it diverged at the base of Begonia (Moonlight et al., 2018). The nuclear trees (Figs S32-S34) and TE topology (Fig. S20) in our study consistently recovered a topology with three major geographically restricted clades: the African, Neotropical and Asian clades. Some conspicuous incongruences between nuclear and plastid trees can be identified within the Neotropical clade: the well-resolved EB (East Brazil) clade containing sections Trachelocarpus, Pereira, Astronthrix, Solananthera, Gaerdtia and Latistigma in the nuclear tree is split into three independent lineages (EB1, EB2, EB3) diffusely distributed between the two Neotropical clades in the plastid tree (Fig. 5c). The position of the two SDAAB (Seasonally dry adapted African Begonia) accessions also show strong cytonuclear incongruence, suggesting hybridization, introgression or incomplete lineage sorting (ILS).
We observed strong discordance for the Neotropical species in the species tree constructed with ASTRAL-III (Fig. S32). Hybridization or ILS are possible explanations for this and are also suggested by the SPLITSTREE network analysis which revealed a reticulate evolution for these Neotropical accessions (Fig. 5f).

Discussion
Putative WGDs have been identified across the eukaryote tree of life, especially in the green plant clade. Many of these WGDs are considered as driving forces contributing to species diversification and evolutionary innovations (Van de Peer et al., 2017;Ren et al., 2018;Leebens-Mack et al., 2019;Wu et al., 2019). WGD may be followed by lineage-specific loss of duplicated genes, contributing to adaptation to new niches, survival in response to environmental stress and subsequent rapid accumulations of species diversity (Landis et al., 2018;Ren et al., 2018;Van de Peer et al., 2021). In this study, we confirmed the occurrence of a lineage-specific WGD event in the common ancestor of Begoniaceae (c. 35 Ma), before the split of cosmopolitan Begonia (median, c. 25 Ma) from the Hawaiian endemic Hillebrandia (Moonlight et al., 2018) (Fig. 2c). As a shade plant, shade adaptation is the key driving force underlying the diversification of Begonia. We provide evidence that the expansion of light signaling pathway genes retained following WGD may have contributed to shade adaptation of Begonia (Fig. 4).
However, WGD is not always associated with species diversification (Landis et al., 2018), as shown in the stark contrast of species diversity between the two genera. The present lack of species diversity in Hillebrandia on the Hawaiian Archipelago is potentially linked to its relict status on the older islands (Clement et al., 2004) and highly homozygous genome (Mart ınez, 2017). It is tempting to speculate that H. sandwicensis is a dying ember of a once much more species-rich clade, with diversity having been extinguished in the scramble to colonize the archipelago as islands sank and emerged during its geological evolution.
In addition to WGD, hybridization and introgression have also contributed to the species diversity of Begonia. Through population genomic analysis, we detected several putative hybridization events, especially in the Neotropical clade (Fig. 5). These events may have partially contributed to the exceptional species diversity and genetic diversity of Neotropical Begonia though novel combinations of genotypes, introgression and hybridization-based genome rearrangements or TE activation. Further genomic studies on Neotropical Begonia might help elucidate which factors have contributed to this high species diversity.
Plant genomes tend to accumulate large amounts of LTRs, and these have been shown to create different landscapes across closely related taxa. The presence and activity of TEs in plant genomes has been widely observed in many other plant groups, from largely studied taxa such Brassicaceae (Joly-Lopez & Bureau, 2014; Rogivue et al., 2019), Solanaceae (Parisod et al., 2012;de Assis et al., 2020) and Poaceae (Ma et al., 2004;Altinkut et al., 2006;Wyler et al., 2020), to nonmodel plant groups such as Quercus (Mascagni et al., 2019), Passiflora (Sader et al., 2021), Anacyclus (Vitales et al., 2019) or Melampodium (McCann et al., 2020), among many others. We show that transposons are also an important source of genetic variation in Begonia. Two thousand genes in Begonia genomes have TE insertions in their promoter regions (Fig. S22). KEGG functional annotations of these genes with TE insertions in the promoter regions revealed a similar pattern for the three Asian Begonias with enrichment in the pathways of carbohydrate and energy metabolism (Fig. S23). This consistency suggested that TE insertions in the promoter regions might be under some selection constraints rather than neutral and random processes (Baduel et al., 2019). Moreover, the GO enrichment analyses found these genes with TE insertions to be specifically enriched in the function of photosynthesis, negative regulation processes, response to biotic stimulus and stress, and defense response (Tables S14, S15). This result suggests that TE insertions into the regulatory regions in Begonia genomes might play some adaptive role, as has been demonstrated in Arabidopsis Baduel et al., 2019) and maize (Freeling et al., 2015).
In summary, we have assembled for the first time four chromosome-level genome assemblies of Begonia, and also provide WGS data for 74 representative species within the genus. Through comparative genomics, we confirmed that a lineagespecific WGD event pre-dates the radiation of Begonia and may have provided substantial genetic materials for the phenotypic evolution and shade adaptation. Moreover, we found considerable variation in the compositions and abundance of TEs, and strong phylogenetic signal in TE feature clustering. Speciesspecific patterns of TE insertions in promoters and introns might have played a role in the adaptative evolution of Begonia. Furthermore, we provide evidence for introgression during the evolution of Begonia, especially for the Neotropical clade. This study not only provides high-quality genomic resources for Begonia, but

Research
New Phytologist also reveals new insights into the evolution mechanisms of a mega-diverse clade.

Supporting Information
Additional Supporting Information may be found online in the Supporting Information section at the end of the article.                                  Methods S1 Supplemental methods.
Table S1 Summary of 78 Begonia species for whole genome shotgun sequencing.

Table S2
Genome size estimation based on K-mer analysis.          Table S13 Statistics and annotations of the contracted gene families in Begonia.

Table S14
The significantly enriched GO terms of biological processes for genes with TEs inserting in introns across four Begonia species.

Table S15
The significantly enriched GO terms of biological processes for genes with TEs inserting in promoter across four Begonia species.
Table S16 Chlorophyll data of the sun-loving plant Gerbera hybrida and four Begonia species.
Table S17 Comparisons of the gene numbers for the light signaling genes in 10 angiosperm genomes.