Genome downsizing after polyploidy: mechanisms, rates and selection pressures

SUMMARY An analysis of over 10 000 plant genome sizes (GSs) indicates that most species have smaller genomes than expected given the incidence of polyploidy in their ancestries, suggesting selection for genome downsizing. However, comparing ancestral GS with the incidence of ancestral polyploidy suggests that the rate of DNA loss following polyploidy is likely to have been very low (4–70 Mb/million years, 4–482 bp/generation). This poses a problem: how might such small DNA losses be visible to selection, overcome the power of genetic drift and drive genome downsizing? Here we explore that problem, focussing on the role that double-strand break (DSB) repair pathways (non-homologous end joining and homologous recombination) may have played. We also explore two hypotheses that could explain how selection might favour genome downsizing following polyploidy: to reduce (i) the nitrogen (N) and phosphorus (P) costs associated with nucleic acid synthesis in the nucleus and the transcriptome and (ii) the impact of scaling effects of GS on cell size, which influences CO2 uptake and water loss. We explore the hypothesis that losses of DNA must be fastest in early polyploid generations. Alternatively, if DNA loss is a more continuous process over evolutionary time, then we propose that it is a byproduct of selection elsewhere, such as limiting the damaging activity of repetitive DNA. If so, then the impact of GS on photosynthesis, water use efficiency and/or nutrient costs at the nucleus level may be emergent properties, which have advantages, but not ones that could have been selected for over generational timescales.


INTRODUCTION
Polyploidy, or whole-genome duplication (WGD), is prevalent in many vascular plant lineages and is a major driver of evolutionary novelty and speciation (Escudero and Wendel, 2020; Fox et al., 2020). It has been suggested that all angiosperms (flowering plants) have experienced one or more episodes of polyploidy in their ancestry (Bowers et al., 2003; Jaillon et al., 2007; Jiao et al., 2011; Landis et al., 2018; Vision et al., 2000; Wendel, 2015), and polyploidy is also commonly encountered among individuals of extant species (Kolář et al., 2017). Indeed, angiosperms have the highest incidence of ancestral WGDs among all land plant groups (Van de Peer et al., 2017). Post-polyploidy genome divergence is typically associated with various 'diploidisation' processes, including loss of gene duplicates and descending dysploidy, which returns the chromosome number to a diploid-like state (Dodsworth et al., 2015a; Mandáková and Lysak, 2018; Wendel, 2015). In addition, since the genome sizes (GSs) of polyploids are typically smaller than expected given the incidence of WGDs in their ancestry, diploidisation is also considered to be accompanied by extensive loss of DNA, or 'genome downsizing' (Leitch and Bennett, 2004; Pellicer et al., 2018; Zenil-Ferguson et al., 2016).
An analysis of the distribution of GSs across angiosperms reveals that, in contrast to gymnosperms (the other extant group of seed plants), most angiosperms have small GSs, with a modal size of only 0.58 Gb/1C (Dodsworth et al., 2015b; Pellicer et al., 2018), despite a huge 2400-fold range in GS (Figure 1a). Although the GS of the ancestral seed plant is unclear, largely because of the extinction of many key seed plant lineages (Hilton and Bateman, 2006), the frequency of polyploidy across many angiosperm lineages (Van de Peer et al., 2017) suggests that, unlike in gymnosperms, genome downsizing has accompanied the divergence of angiosperms. However, the mechanisms, rates and selection pressures driving this genome downsizing are not well understood. Here we explore each of these in turn, to provide new insights into the impact of polyploidy on the immense diversity of GSs encountered in angiosperms and into why most have small genomes.

IMPACT OF WHOLE-GENOME DUPLICATIONS ON GENOME SIZE DIVERSITY IN ANGIOSPERMS
A large number of WGD events in the ancestry of angiosperm lineages have been inferred from studies of mutation patterns in gene duplicates (Ks analysis), studies of divergence of paralogues (MAPS analysis) and synteny analyses (Jiao et al., 2011; Li et al., 2015; Van de Peer et al., 2017; Landis et al., 2018). To look for signatures of these numerous WGD events in the distribution of GSs amongst extant species, we reconstructed ancestral GSs across all angiosperms. Using the same approach as Carta et al. (2020), but with an expanded dataset of 5866 species (Figure 1b), we recovered an ancestral GS of 1.70 Gb/1C at the base of all angiosperms, close to previous inferences based on smaller datasets (1.42 Gb/1C, Puttick et al., 2015; 1.69 Gb/1C, Carta et al., 2020). Figure 1b also shows that all major lineages (e.g. monocots, superasterids, superrosids) and most branches were similarly reconstructed as having small GSs, typically below 2 Gb/1C (blue and green branches), as reported in previous ancestral GS reconstructions (Puttick et al., 2015; Soltis et al., 2009; Simonin and Roddy, 2018; Carta et al., 2020). Yet the divergence of many of these lineages has been accompanied by numerous WGD events. For example, all Poaceae species have undergone three WGD events (ρ, σ and τ) subsequent to a putative WGD event (ε) at the base of all angiosperms (Figure 1b; Van de Peer et al., 2017; Clark and Donoghue, 2018). Nevertheless, the ancestral GS of the grass family Poaceae was reconstructed as 1.53 Gb/1C, which is smaller than that at the base of all monocots (1.77 Gb/1C; Figure 1b). This extended analysis indicates that, despite numerous ancient WGD events, genome downsizing following polyploidy must have been common and widespread during angiosperm evolution, or at least that these polyploidy events have left no signature in GS reconstructions based on data from extant species.
Figure 1. (a) The contrasting distributions of genome sizes (GSs), with mean, modal and median 1C-values and ancestral GS, in angiosperms and gymnosperms. In angiosperms, despite GSs ranging 2400-fold, most species have small genomes (mean GS = 5.02 Gb/1C), with an ancestral GS of 1.70 Gb/1C. The distribution of GS between 0 and 5 Gb/1C is shown in detail. In gymnosperms, GSs range only 16-fold but have a mean value approximately 3.6 times greater than that of angiosperms (mean GS = 17.95 Gb/1C). Data for 10 770 angiosperms and 420 gymnosperms were taken from the Plant DNA C-values database (release 7.1) (Pellicer and Leitch, 2020).
(b) Ancestral reconstruction of GS across the major lineages of seed plants. Ancestral GS (a1C, in Gb) was reconstructed following the approach of Carta et al. (2020), but with an updated dataset of 5866 species extracted from the Plant DNA C-values database (Pellicer and Leitch, 2020) combined with the GBOTB large tree (GenBank taxa with a backbone provided by Open Tree of Life v9.1) from Smith and Brown (2018). After log10 transformation of 1C-values, a1C was reconstructed by maximum likelihood and plotted using the phytools functions fastAnc and contMap (Revell, 2012). The rho (ρ), sigma (σ) and tau (τ) whole-genome duplication (WGD) events within monocots and the WGD at the base of all angiosperms (Jiao et al., 2011) are shown. The outer ring shows the distribution of GS diversity across the tree, with the value for each species shown as a bar. Symbols mark the ancestral branches leading to angiosperms (a1C = 1.70 Gb) and to gymnosperms (a1C = 8.22 Gb).
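The reconstruction above used the phytools function fastAnc in R, which obtains maximum-likelihood ancestral states by exploiting the fact that, under Brownian motion, the root state computed by Felsenstein's contrasts (pruning) algorithm is the ML root estimate. The sketch below is a minimal illustration of that root computation on log10-transformed 1C-values. It is not the authors' pipeline. The tree, branch lengths and GS values are hypothetical, branch lengths must be positive, and the re-rooting step fastAnc uses to estimate every internal node is not shown.

```python
import math

def prune(node):
    """Return (estimate, effective branch length) for a subtree under Brownian motion.

    A leaf is {"value": log10_gs, "length": bl}; an internal node is
    {"children": [...], "length": bl}. The node estimate is the precision-weighted
    mean of its children, and the combined variance 1 / sum(1 / v_i) is added to
    the node's own branch length before passing it up the tree.
    """
    if "value" in node:
        return node["value"], node["length"]
    estimates = [prune(child) for child in node["children"]]
    precision = sum(1.0 / v for _, v in estimates)
    mean = sum(x / v for x, v in estimates) / precision
    return mean, node.get("length", 0.0) + 1.0 / precision

def root_gs(tree):
    """ML root state, back-transformed from log10 space to Gb/1C."""
    log_root, _ = prune(tree)
    return 10 ** log_root

# Hypothetical tree ((A:1, B:1):1, C:2) with 1C-values in Gb
tree = {
    "children": [
        {"children": [
            {"value": math.log10(0.5), "length": 1.0},
            {"value": math.log10(2.0), "length": 1.0},
        ], "length": 1.0},
        {"value": math.log10(1.0), "length": 2.0},
    ],
}
print(f"reconstructed root GS: {root_gs(tree):.2f} Gb/1C")
```

Working in log10 space, as in the Figure 1b legend, treats proportional changes in GS symmetrically. A doubling and a halving then cancel out.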
Given this analysis, it is not surprising that the average GS of angiosperm species is not correlated with ploidy level predicted from chromosome counts (Leitch and Bennett, 2004; Zenil-Ferguson et al., 2016), and most angiosperm species have smaller GSs than might be expected given the incidence of polyploidy in their ancestries. For example, the lineage leading to Arabidopsis thaliana is thought to have undergone multiple WGD events (Van de Peer et al., 2017), such that it is predicted to be 48-ploid (Wendel, 2015). Given an ancestral GS for angiosperms of 1.70 Gb/1C (Figure 1b), this would give a GS of approximately 82 Gb/1C, which is approximately 512 times larger than its actual GS of just 0.16 Gb/1C (Bennett et al., 2003). In contrast, available data in ferns (Clark et al., 2016) and gymnosperms (Farhat et al., 2019; Ickert-Bond et al., 2020; Wu et al., 2020) show that GS does reflect ploidy level estimated from chromosome counts, and for gymnosperms, outside of Gnetales, there have been no widely accepted ancestral WGD events since the common ancestor of all seed plants (Van de Peer et al., 2017; Zwaenepoel and Van de Peer, 2019).
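The Arabidopsis comparison is a simple multiplication, written out below as a small sketch. The function name is illustrative. It assumes, as the text does, that the ancestral angiosperm GS is multiplied by the predicted ploidy with no DNA loss.

```python
def expected_gs(ancestral_gb_1c, ploidy_multiplier):
    """Expected 1C GS if no DNA had been lost after the WGDs (simple multiplication)."""
    return ancestral_gb_1c * ploidy_multiplier

expected = expected_gs(1.70, 48)   # predicted 48-ploid Arabidopsis lineage
observed = 0.16                    # actual A. thaliana GS, Gb/1C
fold_excess = expected / observed
print(f"expected {expected:.1f} Gb/1C, {fold_excess:.0f}-fold the observed GS")
```

The unrounded product (81.6 Gb/1C) gives a ratio of about 510. The figure of approximately 512 in the text comes from rounding the expected GS to 82 Gb/1C first.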
In the forthcoming sections, we explore possible mechanisms that may be responsible for genome downsizing, provide the first estimates of the rates of genome downsizing and suggest possible sources of selection that might have played a relevant role in driving genome downsizing.

MECHANISMS UNDERPINNING GENOME SIZE DIVERGENCE
The remarkable variation in GS across extant seed plants arose through differences in the activities of processes that increase and decrease GS. Tandem duplications, unequal recombination, insertion/deletion of (retro)transposable elements, chromosome non-disjunction and polyploidy, coupled with genetic drift and/or selection, all contribute to GS change (Bennetzen et al., 2005;Panchy et al., 2016). Of these, polyploidy is a driver of step changes in GS, acting to generate multiple copies of the genome, both genic and non-genic.
It is notable that many of the mechanisms that lead to GS change are associated with DNA double-strand break (DSB) repair involving DNA recombination (Schubert and Vu, 2016). These include (i) the deletion and insertion of transposable elements, particularly retroelements in plants (e.g. Cossu et al., 2017; Hawkins et al., 2009; Senerchia et al., 2014), (ii) the amplification and deletion of tandem repeats (Vitales et al., 2019) and (iii) structural rearrangements of chromosomes, which can move sequences into new chromatin domains with different local recombination frequencies, leading to altered patterns of DNA loss and gain (Ren et al., 2018). In addition, aberrant control of recombination at meiosis can lead to the formation of univalents and multivalents, resulting in non-disjunction and polyploidy, all of which impact GS. Thus, understanding GS diversity requires recognition of the activities of DSB repair and DNA recombination pathways.
There are numerous ways in which DSBs can be repaired (Figure 2a), each of which can lead to different outcomes in terms of DNA loss and gain. If these outcomes persist over multiple generations, they will drive GS divergence (Schubert and Vu, 2016). For example, homologous recombination (HR) involving Holliday junctions can lead to GS stasis, upsizing or downsizing, depending on the sequences involved and how the junctions are resolved (Figure 2b). In contrast, single-strand annealing (SSA) and non-homologous end joining (NHEJ) will likely lead to varying degrees of genome downsizing (Figure 2a). In tobacco (Nicotiana tabacum), it is estimated that most DNA repair occurs via SSA, with about one-third via HR, but such ratios could vary depending on cell type, cell cycle and developmental stage. Variation can also arise in the extent of resectioning (Figure 2a) at a DSB, which in turn can influence the DNA repair pathway adopted and hence the amount of DNA lost during DSB repair (Clouaire and Legube, 2019; Shibata et al., 2011). Mechanisms can also be intrinsically biased in their action. For example, in Drosophila melanogaster (fruit flies), non-allelic gene conversion (Figure 2b) can lead to approximately 3.5 times more deletions than insertions (Assis and Kondrashov, 2012).
The repair pathway adopted can also differ depending on the genomic location of the sequences or the proximity of homologous sequences. For example, in transgenic A. thaliana, deletion of DNA occurred in about one-third of repair events involving HR when templating was possible from adjacent sequences, as in tandemly arranged genes.
Furthermore, studies have shown that DSB repair can differ between species, with outcomes that have the potential to impact GS. For example, distinct differences in DSB repair pathways were observed between A. thaliana and barley (Hordeum vulgare), which differ 35-fold in GS (157 and 5500 Mb/1C, respectively) (Vu et al., 2017). DSB repair in Arabidopsis was skewed towards more frequent and larger deletions, whereas barley was characterised by repairs that led to more and larger insertions of DNA. These differences were considered to arise in part from the more extensive resectioning at DSBs observed in Arabidopsis compared with barley, perhaps owing to less efficient protection of free DNA ends at the breakpoints. Similarly, in comparisons of multiple Drosophila species with Laupala (crickets), whose genomes are 11-fold larger, the non-long terminal repeat (non-LTR) transposable elements studied in Laupala showed more insertions and lower rates of DNA deletion, owing to fewer large (>15 bp) indels (Petrov et al., 2000).
Figure 2. (a) (i) In HR involving Holliday junctions, resectioning (i.e. the removal of DNA) generates two 3′ single-stranded DNA (ssDNA) overhangs, which provide a platform for recruiting HR repair-related proteins (e.g. Rad51, BRCA1 and BRCA2). The 3′ single-stranded overhangs then invade homologous DNA sequences to form a structure known as the D-loop. A double Holliday junction arises after DNA synthesis (dashed blue lines extending from the DNA strands) replaces the DNA sequences lost by resectioning and the strand ends are ligated. (ii) In contrast, repair of a DSB via the single-strand annealing (SSA) pathway of HR is associated with DNA loss. This loss is caused by resectioning, which can be extensive because it continues until regions of sequence homology (red segments in the DNA strands) are encountered between the two strands. These homologous regions then form a duplex, which is ligated, while any 3′ overhangs lacking sequence homology are deleted. (iii) Alternative NHEJ (alt-NHEJ) requires only microhomologies (2–25 bp) generated by resectioning to enable strand annealing and end joining, and is accompanied by deletions of variable length and minor insertions at the break sites (≤3 bp). (iv) In contrast, classical or canonical NHEJ (c-NHEJ) does not require resectioning and is usually associated with short microdeletions (<30 bp) and minor insertions at the DSB (≤3 bp).
See Table 1, which lists the functions and abbreviations of the proteins shown.
(b) Processing of Holliday junctions. How the Holliday junctions in (a) are processed influences the genetic outcome and hence GS evolution. The Holliday junctions can undergo dissolution without cleavage of the DNA, resulting in no GS change. Alternatively, depending on which two of the four DNA strands are cleaved (alternative positions of the scissors), the outcome is either a non-crossover or a crossover event, the latter potentially leading to GS divergence.
In the tunicate Oikopleura dioica, genes essential for canonical NHEJ are missing, and DSBs are instead repaired by alternative NHEJ (alt-NHEJ). Deng et al. (2018) suggested that the mutations introduced by alt-NHEJ ultimately led to the organism's small, compact genome. In all such examples, differences in DSB repair sustained over multiple generations can be expected to lead to GS divergence. Overall, the repair pathways used (Figure 2a) can vary at the molecular level (e.g. histone phosphorylation levels, expression levels of repair protein genes), at the cellular level (e.g. cell type, cell cycle), between developmental stages, and with species and environment (e.g. under stress) (Boyko et al., 2006a,b; Clouaire and Legube, 2019; Harris et al., 2018; McVey and Lee, 2008; Shrivastav et al., 2008; Vu et al., 2014). Any bias in the germline over generations in the relative frequencies of these DSB repair pathways, or in the activity of the different proteins involved (Table 1), may result in GS divergence. Such bias could arise in (i) the relative activities of the DNA repair mechanisms (Figure 2a), (ii) the tendency of Holliday junctions to resolve with or without a crossover (Figure 2b), (iii) the frequency of intra-chromatid crossovers (Figure 2c), (iv) the frequency of misalignment between homologous chromosomes or sister chromatids, leading to inter-chromosomal or unequal crossovers, respectively (Figure 2c), and/or (v) the tendency of chromosomes to recombine with extrachromosomal DNA (e.g. during retroelement insertion). If selection favours any such bias towards DNA loss, it will drive genome downsizing.
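The biases listed above are qualitative. As a purely illustrative sketch, the expected net change in DNA per DSB repair event can be written as the frequency-weighted sum of (mean bp inserted − mean bp deleted) across repair pathways. A persistently negative value in the germline would translate into gradual downsizing. All pathway frequencies and indel sizes below are hypothetical placeholders, not measured values, and crossover outcomes of HR that can change GS in either direction are not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RepairPathway:
    name: str
    frequency: float         # fraction of DSBs repaired by this pathway (hypothetical)
    mean_deleted_bp: float   # mean bp deleted per repair event (hypothetical)
    mean_inserted_bp: float  # mean bp inserted per repair event (hypothetical)

def expected_net_bp_per_dsb(pathways):
    """Frequency-weighted mean of (inserted - deleted) bp; negative means net DNA loss."""
    total = sum(p.frequency for p in pathways)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("pathway frequencies must sum to 1")
    return sum(p.frequency * (p.mean_inserted_bp - p.mean_deleted_bp) for p in pathways)

# Hypothetical mix loosely following Figure 2a: non-crossover HR is size-neutral,
# SSA deletes DNA between repeats, and both NHEJ routes cause small deletions.
pathways = [
    RepairPathway("HR, non-crossover", 0.30, 0.0, 0.0),
    RepairPathway("SSA", 0.40, 500.0, 0.0),
    RepairPathway("alt-NHEJ", 0.10, 20.0, 1.0),
    RepairPathway("c-NHEJ", 0.20, 5.0, 1.0),
]
net = expected_net_bp_per_dsb(pathways)
print(f"expected net change: {net:.1f} bp per DSB")
```

Shifting frequency between pathways, for example from SSA to non-crossover HR, changes the sign or size of the result. This is the kind of bias that, sustained in the germline, would cause GS divergence.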

PREDICTING RATES OF GENOME SIZE CHANGE FOLLOWING WHOLE-GENOME DUPLICATIONS
To gain insights into the drivers of genome downsizing, it is necessary to determine the rate of DNA loss following polyploidy and so establish whether DNA losses are large enough to be visible to selection. To the best of our knowledge, such rates have not previously been estimated, so we took the following approach.
We used available information on (i) the phylogenetic relationships between species, (ii) GS data and (iii) the estimated dates of WGD events identified in Van de Peer et al. (2017) to predict the rate of GS change following the most recent WGD event (Figure 3). This was done by reconstructing the ancestral GS before each WGD and comparing the appropriate multiple of this value (to account for the WGD) with the actual GS. Reconstructing the GS of a lineage leading to a WGD event requires data from 'diploid' lineages that diverged prior to that event, a situation that occurs seven times on the phylogeny in Figure 3. The results reveal that the rate of DNA loss per generation or per million years is low (Figure 3). For example, walnut (Juglans regia) is predicted to have experienced a WGD approximately 50 million years ago. A reconstructed ancestral GS of 712 Mb/1C before the WGD therefore gives an expected GS of 2 × 712 Mb/1C = 1424 Mb/1C for extant J. regia. However, its actual GS is 620 Mb/1C, indicating that over half the DNA has been lost since the WGD. Assuming a generation time of 30 years, this equates to only 482 bp/generation. Across seven species, representing both eudicots and monocots, the GS losses per generation are similarly low (4–482 bp/1C per generation, depending on species). It is, therefore, difficult to explain what may be driving genome downsizing following WGD, given that the changes in GS per generation are so small (at most ~10² bp/1C). By comparison, the GSs of most plant species are 10⁸–10¹¹ bp/1C, i.e. six to ten orders of magnitude larger. Such small losses of DNA are unlikely to be visible to selection. Assuming a constant rate of DNA loss per generation, we would therefore expect GS to drift, leading to either genome upsizing or downsizing subsequent to polyploidy.
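The walnut calculation can be written out as a short function. This sketch, like the text, assumes a constant (linear) rate of loss since the WGD. The function and parameter names are illustrative.

```python
def dna_loss_since_wgd(ancestral_mb, wgd_multiplier, extant_mb, wgd_age_myr, generation_time_yr):
    """Average DNA loss since a WGD, assuming a constant (linear) loss rate."""
    expected_mb = ancestral_mb * wgd_multiplier          # GS if no DNA had been lost
    lost_mb = expected_mb - extant_mb
    generations = wgd_age_myr * 1e6 / generation_time_yr
    return {
        "expected_mb": expected_mb,
        "lost_mb": lost_mb,
        "fraction_lost": lost_mb / expected_mb,
        "mb_per_myr": lost_mb / wgd_age_myr,
        "bp_per_generation": lost_mb * 1e6 / generations,
    }

# Walnut (Juglans regia), values from the text: 712 Mb/1C ancestral, duplication,
# 620 Mb/1C today, WGD ~50 Myr ago, 30-year generation time
walnut = dna_loss_since_wgd(712, 2, 620, 50, 30)
print(f"expected {walnut['expected_mb']} Mb/1C, lost {walnut['fraction_lost']:.0%}, "
      f"{walnut['mb_per_myr']:.1f} Mb/Myr, {walnut['bp_per_generation']:.0f} bp/generation")
```

Setting wgd_multiplier=3 covers a genome triplication, as described in the Figure 3 legend.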
Therefore, the following question arises: why is there a general trend towards genome downsizing? We address this question in the following sections.

POTENTIAL SOURCES OF SELECTION ACTING ON GENOME SIZE FOLLOWING WHOLE-GENOME DUPLICATIONS
There are at least two possible sources of selection that could enhance plant fitness following a WGD event and lead to genome downsizing. In species with large GSs, these are (i) the increased nutrient costs of nucleic acids and (ii) the detrimental effects of increased cell size on gas exchange, and hence on photosynthetic efficiency and water use efficiency. Each is considered below.
Selection for small genomes generated by the nutrient costs of nucleic acids
Nucleic acids are composed of approximately 15% nitrogen (N) and 9% phosphorus (P) and are, therefore, amongst the most N- and P-demanding molecules of the cell (Sterner and Elser, 2002). When nutrients are limited, as is typical of many natural systems, the allocation of N and P to nucleic acid synthesis, maintenance, repair and transcription must be traded off against other key processes, including photosynthesis and protein synthesis, and hence affects growth and fitness (Hessen et al., 2010).
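To give a sense of scale, the sketch below converts a 1C genome size into the mass of N and P held in the DNA of a single nucleus. It uses the 15% and 9% fractions given above and the commonly used conversion of 1 pg of DNA ≈ 978 Mbp. It deliberately counts only nuclear DNA in one nucleus (2C by default). It ignores histones, RNA, organellar DNA and repair costs, all of which the text argues also matter. The function name is illustrative.

```python
MBP_PER_PG = 978.0   # commonly used conversion: 1 pg of DNA is about 978 Mbp
N_FRACTION = 0.15    # N content of nucleic acids (from the text)
P_FRACTION = 0.09    # P content of nucleic acids (from the text)

def nuclear_dna_np(gs_1c_mbp, c_level=2):
    """Return (N pg, P pg) in the DNA of one nucleus holding c_level copies of the 1C genome."""
    dna_pg = gs_1c_mbp * c_level / MBP_PER_PG
    return dna_pg * N_FRACTION, dna_pg * P_FRACTION

arabidopsis_n, arabidopsis_p = nuclear_dna_np(157)     # A. thaliana, 157 Mb/1C
viscum_n, viscum_p = nuclear_dna_np(85_000)            # Viscum album, ~85 Gb/1C
print(f"A. thaliana: {arabidopsis_n:.3f} pg N, {arabidopsis_p:.3f} pg P per 2C nucleus")
print(f"V. album: {viscum_n:.1f} pg N, {viscum_p:.1f} pg P per 2C nucleus")
print(f"ratio: {viscum_n / arabidopsis_n:.0f}-fold")
```

Because the cost is linear in GS, the per-nucleus N and P demand scales directly with the 2400-fold range in angiosperm GS. It is then multiplied by the number of nuclei in the plant.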
Evidence of selection for N-efficient nucleotide and amino acid usage has been observed in bacteria (Shenhav and Zeevi, 2020) and plants (Acquisti et al., 2009a,b; Kelly, 2018). For example, Kelly (2018) found the strongest selection for codons with efficient N usage in the genomes of plants that need more N for photosynthesis (i.e. C3 rather than C4 plants). These examples suggest that there is selection for N and P use efficiency at the genomic level. They therefore support a model in which species with larger genomes may be at a disadvantage when competing with smaller-genome plants in nutrient-poor environments. This prediction is supported by studies showing that GS influences the ecological selection of species under nutrient limitation. GS analysis of species growing in long-term fertilised grasslands in the United Kingdom and Germany showed that polyploid species with large GSs competed most successfully only in the presence of high levels of N and P, whereas species with smaller GSs were more successful when one or both nutrients were limited (Guignard et al., 2016; Šmarda et al., 2013). These data support the hypothesis that GS represents a significant cost to a plant, influencing fitness. In species of the plant genus Primulina that are specialists on limestone karst, leaf N levels are positively correlated with GS, indicative of a nutrient cost of the genome (Kang et al., 2015). Furthermore, lab-based experiments using both Chamerion angustifolium and A. thaliana found that synthetic neotetraploids (with double the GS of diploids) had a competitive advantage over their diploid progenitors only when grown under both N- and P-enriched conditions, where they generated more biomass and had increased reproductive fitness (Anneberg and Segraves, 2020; Bales and Hersch-Green, 2019; Walczyk and Hersch-Green, 2019).
Of note, the parasitic mistletoe (Viscum album) has a particularly large genome (approximately 85 Gb/1C), perhaps because its parasitic lifestyle frees it from N and P limitation, allowing the genome to drift to a large size. In contrast, some of the smallest genomes are found in insectivorous plants (e.g. Genlisea and Utricularia), which typically have streamlined genomes, perhaps owing to evolutionary or ecological selection on GS in low-nutrient environments (Veleba et al., 2020).
Selection for increased N and P use efficiency under nutrient limitation may, therefore, generate selection pressure to eliminate DNA, especially 'junk' DNA such as certain repetitive sequences. The selective advantage of such DNA elimination may lie not only in the lower N and P costs of the repetitive DNA itself (e.g. arising from proteins, DNA replication and maintenance, and DNA repair), but also in the RNA costs associated with repeats. The latter include transcriptional costs associated with the selfish (retro)transposition of repeats during their amplification, as well as costs associated with their silencing via canonical and non-canonical RNA-directed DNA methylation (RdDM) pathways (Kenchanmane Raju et al., 2019). The magnitude of these costs, and the extent to which cellular nutrient allocation trade-offs are influenced by GS, remain to be determined. Even so, the experimental data noted above suggest that these costs may be visible to selection in angiosperms. It is also possible that the greater nutrient use efficiency of small-genome species is an emergent property of genome downsizing processes that are not themselves under such selection (e.g. some DNA recombination and repair pathways). Over many generations, the cumulative effects of many small DNA losses could nevertheless yield sufficient gains in nutrient use efficiency that individuals with substantially smaller GSs have ecological advantages.
Figure 3. The predicted impact of major whole-genome duplication (WGD) events on genome size (GS) for 56 angiosperms and the predicted rate of GS change following the most recent WGD for five different angiosperm lineages on a dated phylogenetic tree. Each rectangle on a branch indicates a WGD event (grey, yellow or brown). The colour of the branches shows the predicted GS, taking into account the impact of the WGDs. The colour of branches in the 'Extant GS' panel indicates the GS range in which the GS of the species falls. The polyploid species with predicted rates of GS change following the most recent WGD event are shown in red for five angiosperm lineages (dotted grey boxes). All extant GSs are given as 1C-values. Ancestral GS reconstruction of the sister lineages and outgroups to the polyploid (excluding the polyploid itself) provides an estimate of the ancestral GS before the WGD event (red star). For each polyploid species analysed, the appropriate multiple of this ancestral GS (to account for the WGD, i.e. double the value for a genome duplication or triple for a genome triplication) is compared with the actual GS of the polyploid species to calculate the rate of DNA loss since the WGD event. The image with polyploidy events is modified from Van de Peer et al.
Selection for small genomes to maximise photosynthesis and/or water use efficiency
GS is positively correlated with cell size (Beaulieu et al., 2008; Jovtchev et al., 2006), as the quantity of nuclear DNA sets a lower limit on the volume of the nucleus and consequently a lower limit on the size of the cell that houses and supports it (Cavalier-Smith, 2005; Doyle and Coate, 2019; Greilhuber and Leitch, 2013; Šímová and Herben, 2012). These patterns are observed in both animals and plants. In animal cells such as erythrocytes, which carry oxygen to tissues, the scaling relationship between GS and cell size affects the surface area-to-volume ratio of cells and hence the efficiency of oxygen delivery. Species with bigger GSs, and hence larger cells, have a lower surface area-to-volume ratio, which reduces the speed of oxygen delivery. Selection for small GS in animals with high rates of metabolism (e.g. bats and hummingbirds) may be driven by the need to maximise rapid delivery of oxygen. In contrast, selection on GS may be relaxed in animals with slow rates of metabolism and hence lower oxygen demands (e.g. lungfish), enabling GS and cell size to drift upwards (Gregory, 2001). In plants, such scaling relationships can affect leaf gas exchange by determining the minimum stomatal pore size, which in turn is inversely correlated with stomatal density (Beaulieu et al., 2008). Indeed, ploidy level and/or GS affect the density of cells in leaves, the maximum rate of carboxylation in photosynthesis and water use efficiency (Roddy et al., 2020; Wilson et al., 2021), which may also explain why GS is correlated with seed number and seed mass.
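The surface area-to-volume argument can be illustrated with simple geometry. This sketch assumes, for illustration only, that cells are spheres and that cell volume scales in proportion to GS. Real cells are neither spherical nor strictly isometric, so the numbers indicate direction rather than magnitude.

```python
import math

def sphere_surface_to_volume(volume):
    """Surface area-to-volume ratio of a sphere of the given volume (equal to 3 / r)."""
    radius = (3 * volume / (4 * math.pi)) ** (1 / 3)
    return 3 / radius

# Assumption for illustration: doubling GS doubles cell volume
ratio = sphere_surface_to_volume(1.0) / sphere_surface_to_volume(2.0)
print(f"doubling cell volume lowers SA:V by a factor of {ratio:.2f}")
```

Under these assumptions, SA:V falls as volume^(-1/3). Each doubling of cell volume therefore reduces it by a factor of 2^(1/3), about 1.26, regardless of the starting size.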
Plant species with larger genomes typically have larger, sparser stomata, which limits stomatal conductance to CO2 and water (Franks and Farquhar, 2001; Simonin and Roddy, 2018) and slows the response of stomata to daily changes in environmental conditions (Lawson and Vialet-Chabrand, 2019; Roddy et al., 2019). Given these relationships, selection on stomatal size to optimise the trade-off between photosynthesis and water use efficiency may favour genome downsizing, particularly under water stress and during periods of low atmospheric CO2 concentration, such as the present day (Franks and Beerling, 2009). Under such conditions, small stomata at high density supply CO2 to photosynthetic tissues more effectively. They also increase water use efficiency, because small stomata close more rapidly than large ones (Lawson and Vialet-Chabrand, 2019; Roddy et al., 2019). This is supported by a study of vernal geophytes, which showed that species with adaptations that maintain humid conditions and higher CO2 concentrations around the stomata had larger stomata and GSs than those without (Veselý et al., 2020). Nevertheless, existing evidence relies mostly on correlations, and more research is urgently required to demonstrate causal effects of GS change on photosynthetic rates.
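Why many small stomata can outperform fewer large ones can be sketched with the anatomical maximum stomatal conductance equation used by Franks and Beerling (2009). In that equation, conductance is limited by diffusion through the pore plus an end correction that grows with pore size. This is a minimal sketch assuming that formulation. The diffusivity and molar volume constants are approximate assumed values near 25 °C. The stomatal densities, pore areas and depths are hypothetical.

```python
import math

# Assumed physical constants (approximate values near 25 °C)
DIFFUSIVITY_WATER_VAPOUR = 2.49e-5  # m^2 s^-1
MOLAR_VOLUME_AIR = 0.0224           # m^3 mol^-1

def g_max(density_per_mm2, pore_area_um2, pore_depth_um):
    """Anatomical maximum stomatal conductance to water vapour (mol m^-2 s^-1)."""
    density = density_per_mm2 * 1e6   # stomata per m^2
    area = pore_area_um2 * 1e-12      # maximum pore area, m^2
    depth = pore_depth_um * 1e-6      # pore depth, m
    end_correction = (math.pi / 2) * math.sqrt(area / math.pi)
    return (DIFFUSIVITY_WATER_VAPOUR / MOLAR_VOLUME_AIR) * density * area / (depth + end_correction)

# Same total pore area per mm^2 (20 000 um^2), split into few large or many small stomata
few_large = g_max(100, 200, 10)
many_small = g_max(400, 50, 5)    # hypothetical: pore depth scales with stomatal size
print(f"few large: {few_large:.2f}, many small: {many_small:.2f} mol m^-2 s^-1")
```

Even if pore depth is held constant, the smaller end correction of small pores gives the high-density configuration a higher maximum conductance for the same total pore area.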
However, just as nutrient use efficiency may be an emergent property of other processes acting over multiple generations (see above), so too could any beneficial effects of smaller GS on cell size, and their consequences for photosynthesis and water use efficiency, be a byproduct of those processes. If so, species with smaller genomes may have advantages, but not ones that would have been under selection over generational timescales.

HOW MIGHT GENOME DOWNSIZING ARISE?
Lynch and Conery (2003) suggested that the population sizes of complex eukaryotes may be too small for selection to act on GS. Furthermore, Lynch and Marinov (2015) proposed that the energetic costs of nucleic acids (DNA replication, transcription and the translation costs of associated histones) are unlikely to be sufficient to exert selective pressure for genome downsizing. Instead, they argued that GS has drifted to large values in the most complex multicellular organisms, which have relatively small effective population sizes. However, these analyses are based on theoretical predictions using data from prokaryotes and eukaryotes with small GSs (i.e. up to 157 Mb/1C). Such dynamics may therefore not apply across the full 2400-fold range of GSs found in plants, and new dynamics may become apparent as GS increases. Indeed, the energetic cost of GS may be more visible to selection than these analyses suggest, and other costs (e.g. N and P) have not been explored as thoroughly. Certainly, GS evolution in angiosperms does not appear to be dominated by genetic drift. Drift would give rise to a normal distribution of GS centred on large genomes (owing to the frequency of ancestral WGDs in angiosperms), rather than the strongly skewed distribution dominated by small genomes (Figure 1a). By contrast, gymnosperms, which have a much lower frequency of polyploidy than angiosperms, have a near-normal distribution of GS (Figure 1a). Thus, the overall loss of DNA following polyploidy in angiosperms points towards a role for selection in reducing GS.
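The drift-versus-selection argument can be made concrete with Kimura's diffusion approximation for the probability that a new mutation becomes fixed. This is a standard population-genetics result, not an analysis from this study. The sketch assumes that the census size equals the effective size and that selection is additive (semidominant). How a loss of a few hundred bp would translate into a selection coefficient s is unknown, so the values of s below are arbitrary.

```python
import math

def fixation_probability(s, n):
    """Kimura's diffusion approximation for a new mutation at initial frequency 1/(2N).

    s: selection coefficient (heterozygote fitness 1 + s, homozygote 1 + 2s)
    n: population size, assumed here to equal the effective size Ne
    """
    if s == 0:
        return 1.0 / (2 * n)
    # expm1 keeps precision when s or 4Ns is very small
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)

n = 10_000
neutral = 1.0 / (2 * n)
for s in (1e-8, 1e-5, 1e-3):
    relative = fixation_probability(s, n) / neutral
    print(f"4Ns = {4 * n * s:g}: fixation probability is {relative:.2f}x the neutral value")
```

When |4Ns| is much less than 1, a variant fixes at essentially the neutral rate. This is the sense in which very small DNA losses may be invisible to selection unless Ne is large or the fitness effect per loss is larger than expected.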
In order to explain the observation that most angiosperms have small GSs despite (i) the high frequency of polyploidy, (ii) the small amounts of DNA predicted to be lost per generation following WGD ( Figure 3) and (iii) the power of drift in driving GS divergence (Lynch and Conery, 2003), we need to establish where a strong selection pressure might act on GS or where processes might be triggered that indirectly lead to DNA loss.

Genome downsizing arises through selection against genome upsizing
It is possible that genome downsizing arises because processes that increase GS (e.g. [retro]transposition) are more deleterious than those causing downsizing (e.g. deletions) and so are more strongly selected against. Rates of DNA loss would then exceed rates of gain, inevitably leading to a decrease in GS over time. This might continue until DNA losses themselves (e.g. through loss of gene function) become more deleterious than gains, at which point the balance of selection pressures may reverse. In a young polyploid such a scenario may indeed occur. Gene redundancy may mean there is little selection against gene loss in early generations, whereas amplification of elements, e.g. via (retro)transposons, may have costs associated with mobility and insertion. These include RdDM costs and costs associated with disrupted gene networks (e.g. through altered promoter and regulator activity) (Lisch, 2013). In contrast, in ancient polyploids, duplicate gene losses may have left little or no gene redundancy, so selection against processes that delete DNA may be stronger than selection against those that increase GS.

Genome downsizing is selected for in early-generation polyploids
The only way to reconcile a role for photosynthesis, water use efficiency and/or N and P costs as selection pressures driving genome downsizing is to assume that DNA loss is extensive and rapid in the early polyploid generations. Such rapid loss has been reported in the first two generations of colchicine-doubled Phlox drummondii (25% decrease in GS over two generations; Raina et al., 1994), and reductions of this magnitude might be visible to selection. However, direct evidence for rapid reductions in GS following natural polyploid formation is currently lacking. We do know from studies of the retention patterns of gene duplicates in yeast lineages that gene loss is at first random, affecting orthologs and paralogs equally, but that later, amongst the more slowly diverging genes, losses are no longer random and are instead frequently targeted at specific copies (Scannell et al., 2007). Such data suggest that genome dynamics may change with the age of the polyploid, and such changes could potentially also affect rates of DNA loss. Certainly, early-generation polyploids can show a much higher frequency of recombination between homologs and homeologs, leading to multivalent formation during meiosis, than older, more established polyploids (Lloyd and Bomblies, 2016; Yant et al., 2013). This will not only generate considerable chromosomal diversity in young polyploids, but also affect the segregation of diversity, including indels. For example, some young polyploid species show considerable karyotype diversity between individuals, as in allotetraploids of Tragopogon (approximately 40 generations and 80 years old) (Chester et al., 2012). Indeed, it has been hypothesised that a positive feedback loop exists, whereby homeologous recombination in young allopolyploids depletes DNA mismatch repair proteins, which enhances aberrant recombination and DNA loss, leading to even more homeologous recombination in future generations (Comai, 2000).
From such diversity, selection could potentially favour variants with smaller GS. It would be particularly relevant to know whether the recombination frequency and/or fitness of polyploid individuals with smaller GS increases when they are grown under nutrient-limited or water-stressed conditions, but such data are lacking.
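The gap between the Phlox drummondii result and the long-term rates given in the Summary (4 – 482 bp/generation) can be made concrete with simple arithmetic. The sketch below assumes a constant proportional loss per generation for the Phlox case, and uses a hypothetical 1 Gbp/1C genome (not the GS of any species discussed here) to ask how long a 25% reduction would take at a constant absolute rate. The function names are illustrative only.

```python
def per_generation_fraction(total_fraction_lost, generations):
    """Constant fractional loss per generation that compounds to total_fraction_lost."""
    return 1.0 - (1.0 - total_fraction_lost) ** (1.0 / generations)


def generations_needed(bp_to_lose, bp_per_generation):
    """Generations needed to lose bp_to_lose at a constant absolute rate."""
    return bp_to_lose / bp_per_generation


# Phlox drummondii figure from the text: 25% of GS lost over two generations.
phlox_rate = per_generation_fraction(0.25, 2)
print(f"{phlox_rate:.1%} of GS lost per generation")

# Hypothetical 1 Gbp/1C genome losing 25% (250 Mbp) at the Summary's rates.
target = 0.25 * 1e9
for rate in (4, 482):
    print(f"{rate} bp/generation: {generations_needed(target, rate):,.0f} generations")
```

Under these assumptions the Phlox result corresponds to roughly 13.4% of the genome lost per generation, whereas the same 25% reduction would take hundreds of thousands to tens of millions of generations at the long-term rates, which is why rapid early losses are needed if these selection pressures are to act directly.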

Genome downsizing is a byproduct of high recombination frequencies in polyploids
Genome downsizing may arise indirectly as a consequence of changes in recombination frequencies triggered by WGD. Initially, WGD is expected to increase the number of recombination events per generation, assuming regular bivalent formation, because of the increased chromosome number in the polyploid nucleus and the typical occurrence of one or two chiasmata per bivalent (Stapley et al., 2017). If such increased meiotic recombination is associated with an increased frequency of recombination-based DNA excision (e.g. intra-chromatid recombination), this could lead to greater DNA losses in early polyploids, which would decline as polyploid genomes diploidise. Nevertheless, in young polyploids, selection is predicted to favour reduced chiasma frequency to facilitate bivalent formation (Le Comber et al., 2010; Yant et al., 2013). Over longer timeframes, reductions in chromosome number as part of the diploidisation process following WGD (Dodsworth et al., 2015a; Wendel, 2015) may reduce chiasma frequency. Chromosome number reduction may itself be driven by selection for chromosome fusions that bring adaptive genes into linkage groups (Hoffmann and Rieseberg, 2008; Ritz et al., 2017). Whatever the cause, the return to a diploid-like chromosome number with polyploid age might act to reduce the rate of DNA loss over time.

Genome downsizing is a byproduct of selection for a nutrient-efficient transcriptome
Although data are limited, studies have shown that the size of the transcriptome in a cell can change with development (Sulpice et al., 2014), in association with polyploidy (Doyle and Coate, 2019) and with growth rate (Reef et al., 2010). Available estimates of transcriptome size indicate that RNA:DNA ratios typically range from about 1:1 to 20:1 in plants (e.g. Avicennia marina [0.51 Gbp/1C]; Reef et al., 2010). We are unaware of evidence that the RNA:DNA ratio changes systematically with GS. If the ratio is broadly constant, species with larger genomes would tend to have bigger transcriptomes. Much of this increase in transcriptome size may arise through the transcription of repeats, which themselves scale in number with GS (Novák et al., 2020) and which are likely to impose a nutrient cost that rises with GS and upon which selection may act.
Although few studies have addressed this, it has been estimated that as much as 50-80% of total cellular P is contained in nucleic acids (Hessen et al., 2010), probably mostly in ribosomal RNA. The size of this pool can be regulated, as can the relative sizes of the mRNA and other RNA fractions (Doyle and Coate, 2019), which include transcripts from repeats. Such differences arise through variation in (i) rates of transcription, (ii) the amount of RNA stored (especially pre-ribosomes in the nucleolus) and/or (iii) rates of RNA cycling (i.e. transcription and breakdown; Hessen et al., 2010).
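If the RNA:DNA ratio is roughly independent of GS, the P held in nucleic acids scales linearly with GS. The sketch below counts P atoms under loudly simplified assumptions that are ours, not those of the cited studies: one phosphate per nucleotide (so 2 P per bp of double-stranded DNA), a 2C nucleus, an RNA:DNA ratio expressed by mass with RNA and DNA nucleotides of similar mean mass, and no organellar DNA or other P pools. The function name and the 0.5 and 5 Gbp/1C genome sizes are hypothetical.

```python
def nucleic_acid_p_atoms(gs_bp_1c, c_value=2.0, rna_to_dna_ratio=1.0):
    """Rough count of P atoms in one nucleus' DNA plus the cell's bulk RNA.

    Illustrative assumptions (not from the cited studies):
    - every nucleotide carries one phosphate, so double-stranded DNA has 2 P per bp;
    - the nucleus holds c_value x the 1C amount (e.g. 2C before DNA replication);
    - RNA:DNA is a mass ratio and RNA and DNA nucleotides have similar mean masses,
      so RNA nucleotides ~= ratio x DNA nucleotides;
    - organellar DNA and all other P pools are ignored.
    """
    dna_p = 2 * gs_bp_1c * c_value
    rna_p = rna_to_dna_ratio * dna_p
    return dna_p + rna_p


small, large = 0.5e9, 5e9  # hypothetical 1C genome sizes in bp
for ratio in (1.0, 20.0):
    p_small = nucleic_acid_p_atoms(small, rna_to_dna_ratio=ratio)
    p_large = nucleic_acid_p_atoms(large, rna_to_dna_ratio=ratio)
    print(f"RNA:DNA {ratio:g}:1 -> {p_small:.2e} vs {p_large:.2e} P atoms "
          f"(x{p_large / p_small:g})")
```

Under these assumptions a tenfold larger genome carries a tenfold larger nucleic-acid P burden at any fixed ratio, and at the upper ratio of 20:1 the RNA pool holds twenty times more P than the DNA itself, which is consistent with the suggestion that transcriptome costs could outweigh those of the genome.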
As previously noted, the nutrient costs of the transcriptome can influence the selection of N-efficient nucleotides in plants with less N-efficient photosynthesis (Kelly, 2018). Perhaps these costs also influence GS indirectly, through selection for lower N and P costs of the transcriptome achieved by reducing the transcription of repetitive DNA. Wasteful transcription reduces the N and P available for photosynthesis and biomass production, a cost that may select for processes that eliminate repeats. Deleting repeats could potentially remove a substantial nutrient burden, far beyond that of the genome itself, since repeats are transcribed continuously in every cell, whereas the DNA of the genome may be less costly once it has been replicated (although it still requires maintenance and repair). Even repeats present in low copy numbers may have a substantial cost if they are transcriptionally active, as seen in some transposing repeats (Dias et al., 2015). In addition to transcriptome costs, even the small-scale elimination of an actively transcribed and/or transposing repeat could have a selective advantage if that repeat disrupted gene regulation and function (Lisch, 2013).
We know that polyploidy can stimulate extensive epigenetic reprogramming (Nieto Feliner et al., 2020), which can increase the transcriptional and transpositional activity of repeats (Petit et al., 2010) that escape silencing by the epigenetic machinery. In early polyploid lineages, this may result in the upregulation of NHEJ and/or unequal HR during DNA repair, leading to elevated frequencies of DNA elimination (Figure 2). Such upregulation may have a selective advantage if the eliminated DNA 'kills' actively transcribed and transposing repeats. Because of gene redundancy in young polyploids, there may be little selective disadvantage to any random loss of genes that this process also causes. The fixation of such a process may set in train processes that ultimately lead to substantial genome downsizing over time. This would continue until gene redundancy is exhausted and there is once again selection for more faithful HR DNA repair, without substantial, or any, DNA loss. A prediction, therefore, is that the altered regulation of, and selection for, different recombination pathways will vary with polyploid age.

Box 1. Summary of the review.
• Following polyploidy or WGDs, the genomes of angiosperms are typically downsized, pointing to the action of selection on GS.
• The differential activities of HR and NHEJ DNA repair pathways might be responsible for genome downsizing and the divergence of GS following polyploidy.
• Ancestral GS reconstructions suggest that rates of genome downsizing (base pairs per generation) following polyploidy are likely to be very low, raising the conundrum as to how selection might act.
• Possible sources of selection driving genome downsizing include reduced nutrient costs of the genome and transcriptome and/or reductions in cell size, facilitating efficient gas exchange for photosynthesis and enhanced water use efficiency.
• Genome downsizing may be a byproduct of processes that lead to smaller genomes, and the resulting smaller genomes might offer a selective advantage as an emergent property.

CONCLUDING REMARKS AND FUTURE PERSPECTIVES
Clearly, there is still much to learn about the mechanisms, rates and selection pressures involved before we can distinguish between these different hypotheses for genome downsizing following polyploidy. Nevertheless, addressing these uncertainties is now within reach given the growing availability of high-quality chromosome-level genome assemblies, large transcriptomic and methylome datasets, resolved phylogenetic relationships between species and advances in machine learning approaches. These data will provide information on genome structure and organisation, the recombinational landscape and measures of selection acting on nucleotides, codons (in the genome and transcriptome) and amino acids. Together they will give detailed insights into how and why genomes are downsized, by enabling searches for signals of selection across genes, non-coding sequences and chromosomal regions, as well as for signals of DNA deletions and insertions in comparative analyses of polyploids of different ages (recent to ancient). In addition, there is a need to combine these genomic studies with robust ecological experiments that track the fitness (e.g. photosynthesis and water use efficiency) of polyploids with different GSs, from different phylogenetic lineages, growing under different environmental conditions. Together, these data will enable us to distinguish between species-specific and general responses to selection pressures, and hence to build a holistic understanding of the evolutionary and ecological forces responsible for genome downsizing.