Low dispersal and ploidy differences in a grass maintain photosynthetic diversity despite gene flow and habitat overlap

Geographical isolation facilitates the emergence of distinct phenotypes within a single species, but reproductive barriers or selection are needed to maintain the polymorphism after secondary contact. Here, we explore the processes that maintain intraspecific variation of C4 photosynthesis, a complex trait that results from the combined action of multiple genes. The grass Alloteropsis semialata includes C4 and non‐C4 populations, which have coexisted as a polyploid series for more than 1 million years in the miombo woodlands of Africa. Using population genomics, we show that there is genome‐wide divergence for the photosynthetic types, but the current geographical distribution does not reflect a simple habitat displacement scenario as the genetic clusters overlap, being occasionally mixed within a given habitat. Despite evidence of recurrent introgression between non‐C4 and C4 groups, in both diploids and polyploids, the distinct genetic lineages retain their identity, potentially because of selection against hybrids. Coupled with strong isolation by distance within each genetic group, this selection created a geographical mosaic of photosynthetic types. Diploid C4 and non‐C4 types never grew together, and the C4 type from mixed populations constantly belonged to the hexaploid lineage. By limiting reproductive interactions between photosynthetic types, the ploidy difference probably allows their co‐occurrence, reinforcing the functional diversity within this species. Together, these factors enabled the persistence of divergent physiological traits of ecological importance within a single species despite gene flow and habitat overlap.


| INTRODUC TI ON
Geographical isolation leads to genetic divergence of populations and, over time, speciation (Bolnick & Fitzpatrick, 2007;Butlin et al., 2008;Jordan, 1905;Mayr, 1947;Templeton, 1981). Populations in allopatry may experience different local selection pressures, which can drive phenotypic divergence. Following the removal of geographical barriers, secondary contact between populations with different phenotypes allows gene flow to resume (Barton, 2001;Dong et al., 2020). If the different phenotypes are selectively equivalent, or one is advantageous over the whole species' range, secondary genetic exchanges will lead to a homogenization of the phenotype over time, potentially erasing some of the character states that evolved in isolation (Coyne & Orr, 2004). When populations with different locally adapted phenotypes come into secondary contact, the balance between gene flow and selection maintains the different phenotypes in their respective environments and can result in a stable hybrid zone (Abbott, 2017;Barton, 2001;Barton & Hewitt, 1985;Endler, 1977;Ingles & Biglione, 1952;Slatkin, 1973). These dynamics have been well studied in the case of simple traits controlled by a few genes and for quantitative traits (Gay et al., 2008;Mallet, 1986;Mallet et al., 1990;Nürnberger et al., 1995;Poelstra et al., 2014;Toews et al., 2016). Here, we test whether similar processes dictate the fate of complex physiological adaptations that emerge only when multiple genes are modified. C 4 photosynthesis is a complex physiological trait that evolved over the ancestral C 3 type and increases plant productivity in warm and arid conditions (Atkinson et al., 2016;Ehleringer & Monson, 1993;Hatch, 1987;Sage et al., 2012). The C 4 phenotype results from the coordinated action of multiple enzymes in a suitable leaf anatomy (Hatch, 1987;Sage et al., 2012;Williams et al., 2013). Most C 4 lineages evolved 5-30 million years ago (Christin et al., 2011), and extant species generally include a single photosynthetic type. The clearest exception is the palaeotropical grass Alloteropsis semialata, which presents unequalled intraspecific photosynthetic diversity, with C 4 genotypes and non-C 4 populations with different amounts of C 4 cycle activity (Ellis, 1974, Maquia et al., (2019). Populations are coloured based on their genetic lineage (see Figure S3). Populations with strong signs of admixture in the cluster analyses ( Figure 4) are indicated with triangles, and the positions of the densely sampled population (ZAM1715; Figure S1) and one with strong signs of introgression between diploid C 4 and diploid non-C 4 (TAN1603; Figure 5) are indicated 1981; Frean et al., 1980;Lundgren et al., 2016). The species originated in the Central Zambezian miombo woodlands of Africa around 3 million years ago (Bianconi et al., 2020;Lundgren et al., 2015; Figure 1).
Lineage isolation, potentially linked to episodic woodland contraction in the region during glacial cycles, was followed by the emergence of the C 4 phenotype, while migration during periods of re-expansion allowed secondary contacts between C 4 and non-C 4 groups (Bianconi et al., 2020;Olofsson et al., 2016), although whether hybrid zones were created remains unknown. The species later spread outside of its region of origin, with C 4 lineages colonizing the rest of Africa, Asia and Oceania, while a non-C 4 lineage migrated to southern Africa (Lundgren et al., 2015). Previous genome analyses have found evidence of admixture between C 4 and non-C 4 populations that happened recurrently over the past 2 million years (Bianconi et al., 2020;Olofsson et al., 2016), showing that reproductive barriers are at most partial. Despite their ability to hybridize, C 4 and non-C 4 lineages of A. semialata have coexisted in the Central Zambezian region for more than 1 million years (Bianconi et al., 2020). The factors allowing the persistence of these two photosynthetic types remain unknown.

Variation of ploidy levels has previously been reported within
A. semialata. All the non-C 4 individuals analysed so far are diploid, while C 4 individuals can be di-, tetra-, hexa-, octo-or even dodecaploids (Bianconi et al., 2020;Ellis, 1981;Frean & Marks, 1988;Liebenberg & Fossey, 2001;Lundgren et al., 2015;Olofsson et al., 2016). Genome analyses have shown that polyploidization happened repeatedly in different parts of the species range, in some cases facilitating mixing of non-C 4 and C 4 genetic groups (Bianconi et al., 2020). In the Central Zambezian region, diploid and hexaploid C 4 have been reported alongside diploid non-C 4 (Bianconi et al., 2020). The effect of this variation of ploidy levels on gene flow dynamics across the region remains unexplored.
In this study, we analyse the photosynthetic types, geographical distribution, genome dynamics and ploidy levels of A. semialata populations to test for mechanisms responsible for the coexistence of C 4 and non-C 4 populations in Central Zambezian miombo woodlands.
First, we characterize the geographical distribution of the different photosynthetic types to test the hypothesis that the photosynthetic innovation was accompanied by habitat displacement upon secondary contact (Brown & Wilson, 1956;Pfenning & Pfenning, 2010;Raabová et al., 2008;Schluter, 2000;Schluter & McPhail, 1992).
Second, we establish the distribution of ploidy levels among photosynthetic types and genetic lineages to test the hypothesis that ploidy differences generate reproductive barriers (Moyle et al., 2004;Ramsey & Schemske, 1998). Third, we quantify the population structure and introgression events to test the hypothesis that secondary exchanges coupled with selection created one or more hybrid zones among groups of the same ploidy level (Barton, 2001;Barton & Hewitt, 1985;Endler, 1977;Ingles & Biglione, 1952). Fourth, we evaluate the patterns of differentiation across the genome to test the hypothesis that the photosynthetic types are maintained by C 4 genes acting as barrier loci (Barton & Bengtsson, 1986;Ravinet et al., 2017;Wolf & Ellegren, 2017), due to the breakdown of the C 4 apparatus in hybrids (Brown & Bouton, 1993;Oakly et al., 2014). Overall, our investigations indicate that different states for the C 4 complex trait are maintained among diploids because of low dispersal, while the presence of polyploids increases the overall diversity.

| Population sampling and determination of photosynthetic types
Populations of Alloteropsis semialata were located during consecutive field trips conducted in the rainy seasons between 2012 and 2019 (Table S1). Populations were visually identified while driving or found during stop-and-walk searches. At least one individual was sampled from all identified populations. At each collection point, GPS coordinates were recorded and leaf material from individuals growing at least 1 m apart were sampled and dried in silica gel. For some populations, live cuttings or seeds were also collected. Large populations where different photosynthetic types were suspected to co-occur after inspecting the vein architecture with a hand lens were sampled more intensively. In particular, for one population, ZAM1715, a total of 103 individuals were collected along parallel transects and individual-level GPS coordinates were recorded ( Figure S1).
The photosynthetic type of each plant was determined using stable carbon isotopes. This method differentiates plants that grew using a C 4 photosynthetic cycle fixing the majority of their carbon with phosphoenolpyruvate carboxylase (PEPC) from non-C 4 plants that acquired most of their carbon without the C 4 cycle (non-C 4 plants; Bender, 1971;Stata et al., 2019). For some individuals, carbon isotope ratios were retrieved from previous studies (Table S1). For others, a total of 1-2 mg of dried leaf tissue was sampled and analysed using an ANCA GSL preparation module coupled to a 20-20 stable isotope analyser (PDZ Europa). The carbon isotopic ratio (δ 13 C, in ‰) was reported relative to the standard Pee Dee Belemnite (PDB).
Individuals with a δ 13 C value above −17‰ were considered to be C 4 whereas all other individuals were considered to be non-C 4 , which can include plants without any C 4 activity (referred to as C 3 plants) and plants with a weak C 4 cycle activity (Stata et al., 2019). Non-C 4 individuals of A. semialata from the Central Zambezian region previously analysed used a weak C 4 cycle, while those from southern Africa were C 3 (Bianconi et al., 2020;Lundgren et al., 2016). The generality of this pattern cannot be confirmed based solely on carbon isotopes and we consequently refer collectively to these plants as non-C 4 . If there was enough material left, the isotope measurements were repeated for individuals where the δ 13 C values did not match other individuals of the population and genomic group (Table S1).

| Estimation of genome sizes
Genome sizes for some individuals were retrieved from the literature (Bianconi et al., 2020;Olofsson et al., 2016) and used to infer ploidy levels based on known genome sizes of diploids and hexaploids of A. semialata (Olofsson et al., 2016). To supplement this data set, we estimated the genome sizes of additional populations from the Central Zambezian region. We first focused on individuals for which we had collected live cuttings and then used individuals grown from field-collected seeds. Finally, we added estimates based on silica gel-dried leaves for some populations in an effort to maximize the geographical and photosynthetic diversity of populations with genome size estimates (Table S1). The new genome sizes were estimated by flow cytometry using internal calibration standards, as described in Bianconi et al. (2020).

| Population-level RAD sequencing and genotyping
Genomic diversity was assessed using a reduced level sequencing approach. DNA was extracted from a 1-to 3-cm dried leaf fragment using the DNeasy Plant Mini Kit (Qiagen), following the manufacturer's protocol. Double-digested restriction-associated DNA sequencing (ddRADSeq) libraries were produced as previously outlined . Briefly, a barcoded and a common adaptor (Petersen et al., 2014) were ligated to ~200-700 ng DNA (7 µl DNA Raw reads were trimmed, cleaned and demultiplexed as previously described . In short, the program trimmomatic tool kit (Bolger et al., 2014) was used to remove adaptor and primer sequences (ILLUMINACLIP option in palindrome mode).
Using trimmomatic, low-quality bases (Q < 3) were further removed from the 5′ and 3′ ends, as were all bases with a low quality (Q < 15) in a four-base sliding window. The cleaned reads were demultiplexed and barcodes were removed using the process.RADtag.pl script from the program stacks (Catchen et al., 2013). Reads mapping to the chloroplast were then removed using samtools version 2.2.3 (Li, 2011;Li et al., 2009). Clean reads were mapped to the reference genome of A. semialata (ASEM_AUS1_v1.0; GenBank accession QPGU01000000;  using the default settings for paired-end reads in bowtie2 version 2.2.3 (Langmead & Salzberg, 2012). Variants (single nucleotide polymorphisms [SNPs]) covered by 2-20 reads were called from uniquely mapped reads using a combination of samtools and the consensus calling function in bcftools version 1.1.1 (Li, 2011) treating all samples as if they were diploids, because treating individuals as higher ploidy levels would be unreliable with this type and depth of sequencing (Stift et al., 2019). Finally, all variants were merged and filtered, only keeping individuals with less than 99% missing data and only keeping variants at least 1 kb apart on one of the nine chromosomes with a minor allele count of 56 (minor allele frequency of ~0.05) and less than 90% missing data.

| Phylogenetic analyses, population structure and tests for isolation by distance
A maximum likelihood phylogeny was inferred in raxml version 8.2.11 (Stamatakis, 2014) for the nuclear SNP alignment using a GTR+G substitution model. Node support was evaluated with 100 rapid bootstraps.
Population structure was evaluated with a principal component analysis (PCA) implemented in the r package adegenet (Jombart, 2008;Jombart & Ahmed, 2011;R Core Team, 2018) and with a Bayesian admixture analysis implemented in ngsadmix (Skotte et al., 2013). While some clustering analyses are robust to ploidy variation (Stift et al., 2019), PCAs tend to group individuals by ploidy level (Meirmans et al., 2018). These analyses were used here to describe the broad patterns of variation, with subsequent analyses repeated without suspected polyploids. In the PCA, missing data were coded as the mean of the allele frequency among individuals with data at each respective SNP, whereas they were coded as zero in the admixture analysis. The optimal number of population clusters in the admixture analysis was estimated using the Evanno et al., (2005) method, as implemented in clumpak (Kopelman et al., 2015) from 10 independent runs for an increasing number of population clusters from one to 20. A separate clustering analysis was performed solely on individuals of the densely sampled population ZAM1715.
An analysis of molecular variance (AMOVA) among the five genomic groups (see Results) was performed on the variants using arlequin version 3.5.2.2 (Excoffier & Lischer, 2010). Populations where non-C 4 and C 4 individuals co-occurred (see Results) were split into two groups based on the assignment of individuals to nuclear clades. This analysis was repeated excluding the nuclear clade that contains mainly hexaploid samples (clade IIIb) as well as the known polyploid population from clade II (ZAM1958) and those from clade IV (Table S1).
Isolation by distance was evaluated by calculating pair-wise F ST in vcftools version 0.1.14 (Danecek et al., 2011) between populations with at least three individuals. Populations from clade IV were not included as they span several continents and islands, making dispersal distances difficult to compare. Mixed populations were separated by nuclear clade, as above, and the polyploid population ZAM1958 was excluded, resulting in 96 populations (15 from clade I, 19 from clade II, 21 from clade IIIa and 41 from clade IIIb).
Geographical distance (in km) between each pair of these populations was extracted from the GPS coordinates taking the curve of the Earth into account in r. The relationship between genetic and geographical distances was tested using Mantel tests. These analyses were conducted independently for individuals belonging to the same or different genomic groups (see Results), in each case using the Spearman correlation and 9,999 permutations. For intragroup comparisons, rows and columns of the square, symmetrical distance matrix were shuffled simultaneously. The intergroup comparisons produced rectangular, nonsymmetrical matrices, and their columns and rows were shuffled independently. Slopes, intercepts and R 2 were estimated using linear regressions.

| Tests for introgression
Introgression between genetic clusters was tested with Patterson's D-statistics (ABBA-BABA statistics), using the software package dsuite version 0.4, which counts site patterns using population allele frequencies (Malinsky et al., 2020). Based on the phylogenetic tree (see Results), clade I was used as the outgroup

| Genome scans
Genetic differentiation (F ST and d xy ) was evaluated across the whole genome (nine chromosomes) of A. semialata between the two geographically overlapping non-C 4 and C 4 genomic groups containing diploids (groups II [excluding the hexaploid ZAM1958 population] and IIIa; see Results) as previously outlined . In short, F ST was calculated in 500-kb sliding windows (10-kb slide) from all variants using vcftools. All variants with a minor allele count of 21 were included to allow for a minor allele frequency of approximately 0.05 among the included individuals (107 non-C 4 from clade II and 104 C 4 from clade IIIa). Absolute divergence (d xy ) was calculated in 500-kb sliding windows (10-kb slides) from allele frequencies obtained from vcftools using a combination of custom python, r and bash scripts as previously outlined . Null distributions of F ST and d xy were obtained according to Olofsson et al., (2019). Windows above the 99 th percentile or below the 1 st percentile of the F ST and d xy null distributions were considered outliers. Finally, the F ST and d xy in windows were plotted along the genome using the python library matplotlib (Hunter, 2007). The values of the two statistics were retrieved from the windows containing three genes with known differential expression between photosynthetic types in A. semialata  and were compared to the rest of the genome. To evaluate the influence of the general increase in differentiation and divergence around putative centromeres , we also compared differentiation of windows containing the three C 4 genes to those from the rest of the genome with a gene density ≥50/Mb .

| Different photosynthetic types can co-occur in the same location
Populations were densely sampled across Central Zambezian miombo woodlands (87 distinct locations within Zambia and southwest Tanzania; Figure 1), where most of the photosynthetic diversity in Alloteropis semialata is found (Bianconi et al., 2020;Lundgren et al., 2016). Other populations across the global species' range were added to the data set to provide comparison points ( Figure 1).
Carbon isotope ratios can identify individuals that grew using the C 4 photosynthetic cycle (Bender, 1971). Values for a total of 593 individuals of A. semialata show a bimodal distribution, as previously reported ; Figure S2, Table S1).
All values above −17‰ were categorized as C 4 , with all others referred to as non-C 4 . The whole range of isotope values is observed within the Central Zambezian miombo woodlands, where C 4 and non-C 4 types geographically overlap (Figure 1). Most populations in this region are composed of individuals with a single photosynthetic type (either C 4 or non-C 4 ), but non-C 4 and C 4 carbon isotopes were detected in seven populations spread across Zambia (JKO2, JKO25, ZAM1503, ZAM1507, ZAM1715, ZAM1936 and ZAM1957; Table S1). In two of these (JKO2 and JKO25), the existence of different photosynthetic types (C 4 and non-C 4 ) was supported by a single individual, a lack of biological material prevented repetition of the isotopic measurements (Table S1), and these individuals were not genetically distinct from the other individuals in the same population (see below). These two populations were therefore not considered as mixed in this study. In the other five populations, repeated isotope measurements confirmed the existence of different photosynthetic phenotypes, with multiple individuals of each type sampled in four cases (Table S1). The coexistence of different photosynthetic types within five populations is thus unambiguous (populations indicated in Figure 1).
Within one of these five populations (ZAM1715), sampling of 103 individuals confirmed that C 4 and non-C 4 types are spatially mixed ( Figure S1; Table S1), and in some cases grow <3 cm apart. These distribution patterns show that the photosynthetic types overlap and can in some cases be found completely mixed.

| C 4 and non-C 4 types correspond to different genetic groups
All sampled individuals were assigned to previously delimited nuclear lineages (Bianconi et al., 2020;Olofsson et al., 2016) Figure S3).
The first four axes of a PCA on nuclear markers explain a total of 39.81% of the variance in the data set and jointly identify five clusters corresponding to the three nuclear phylogenetic clades I, II and IV, plus clade III divided on the fourth component into two groups referred to as IIIa and IIIb (Figure 2; Figure S3). In the phylogenetic tree, clade IIIb forms a poorly resolved group that is paraphyletic with respect to the well-supported clade IIIa ( Figure S3).
Based on our sampling, the nuclear clade IIIb is the most frequent in Zambia, but the distributions of clades II, IIIa and IIIb overlap geographically in the studied region (Figure 1). Our field records show that each group is found in both open grasslands and habitats with a tree cover ( Figure S4). In one population with different photosynthetic types (ZAM1936), the non-C 4 individuals occupied an open grassland while the C 4 individuals were spread in the adjacent woodland, but in the four others (ZAM1503, ZAM1507, ZAM1715 and ZAM1957), C 4 and non-C 4 individuals appeared mixed within the same habitat. In all five populations with differences in photosynthetic types, the non-C 4 individuals belong to clade II and the C 4 individuals belong to clade IIIb (Table S1). This pattern was confirmed with a dense sampling of population ZAM1715 ( Figure S1a).  Table S1), as previously reported for diploid and hexaploid A. semialata individuals, respectively (Bianconi et al., 2020;Olofsson et al., 2016). Four individuals with values close to 7 Gb/2C are possibly octoploids, but these values were estimated from silica gel-dried material, and analysis of fresh samples is needed to distinguish hexa-from octopolyploids ( Figure 3; Table S1).

|
In the Central Zambezian region, all 28 non-C 4 individuals with genome size estimates are diploids (Figure 3; Figure S2, Table S1).

| Distinct genetic groups are maintained despite gene flow
The optimal number of population clusters estimated using Bayesian admixture analyses as implemented in ngsadmix is three, which like the PCA based on nuclear markers splits the samples into groups corresponding to the non-C 4 clades I+II, the C 4 clade III, and the C 4 clade IV ( Figure 4; Figure S6). A secondary optimum of six population clusters then sorts the samples into the five genomic groups identified in the PCA with clade IIIa splitting into two sub-groups (Figure 4; Figure S6).
Most individuals are assigned to a single genomic group, but admixture between C 4 and non-C 4 clusters is clear in the population previously shown to contain dodecaploids (Bianconi et al., 2020), but also in some newly reported hexaploids from Zambia (Figure 4; Figure S7).
Importantly, admixture between distinct photosynthetic types is also suggested in some diploid individuals (Figure 4; Figure  The connectivity among populations was evaluated by testing for isolation by distance, excluding clade IV, which is spread among multiple continents and islands (Figure 1), and the polyploid population ZAM1958. After correcting for multiple tests, the relationship between genetic differentiation and geographical distance is significant within each of the four clades I, II, IIIa and IIIb ( Figure 6). However, the relationship is weaker within the non-C 4 group II, which might indicate the existence of undetected genetic subgroups. During field collections, we observed morphological variation among the non-C 4 from clade II, some having short, dense racemes while the others had long racemes similar to those of the C 4 plants. These two types were retrieved in the admixture analysis with seven clusters ( Figure S6).
When they were analysed separately, each showed evidence of isolation by distance ( Figure S9). While some barriers might exist within group II, gene flow is therefore prevalent within the other genomic groups and mainly restricted by geography ( Figure 6). When considering pairs of populations from different genomic groups, the pattern of isolation by distance is significant between the two non-C 4 clades I and II, and between the C 4 group IIIb that contains hexaploids and the non-C 4 group I as well as the C 4 group IIIa, which both contain diploids ( Figure S8). In these cases, however, the intercept is high (0.67, 0.58 and 0.33, respectively), and the three other comparisons do not show evidence of isolation by distance ( Figure S8). Similar conclusions are reached when considering the two subgroups from clade II with different inflorescence types ( Figure S9). Gene flow between the genomic groups is therefore limited and in most cases it is independent of the geographical distance ( Figure S8). These patterns, which mirror those reported in another grass species with marked F I G U R E 4 Genetic structure within Alloteropsis semialata. Results of an admixture analysis are shown for (a) three (K = 3) and (b) six clusters (K = 6). The nuclear groups are delimited at the bottom and coloured as in Figure  1. Each bar represents one individual. Samples with signs of admixture between different photosynthetic types are shown in detail on top, and the ploidy levels of samples within these populations are indicated with symbols next to the names; circle =2x, square =6x or 8x, triangle =12x ecotypes (Grabowski et al., 2014), argue against clear hybrid zones, and show that the main genomic groups generally retain their identity despite close proximity and episodes of gene exchange.

| Genetic differentiation is widespread across the genome
The differentiation between the overlapping diploid clades II (excluding the polyploid ZAM1958) and IIIa (non-C 4 and C 4 , respectively) was evaluated with genome scans (F ST and d xy ). Genomic divergence is generally increased in regions with low gene content and high levels of transposable elements ( Figure S10), a pattern attributable to reduced recombination around putative centromeres (Cruickshank & Hahn, 2014;Roesti et al., 2013), which was previously observed in A. semialata . Outside of these regions, numerous windows of low divergence suggest rampant introgression, while peaks of high differentiation are scattered across all chromosomes ( Figure S10).
Importantly, these peaks are not associated with genes known to be involved in C 4 photosynthesis. In particular, the three genes that play a role in the C 4 cycle and were shown to be differentially expressed among photosynthetic types of A. semialata  showed patterns of diversity and differentiation similar to the rest of the genome, even after excluding regions with low gene density (Figure 7). These patterns indicate that divergence is driven by background selection or multiple regions scattered around the genome that do not encompass the genes previously identified to be involved in the photosynthetic differentiation of this species.

| Photosynthetic types can share the same habitat
Based on phylogenomics studies conducted across the species' range, the different photosynthetic types of Alloteropsis semialata diverged during periods of geographical isolation, and thus the present-day F I G U R E 5 Evidence of introgression among C 4 and non-C 4 populations from the Central Zambezian region. (a) D statistics were used to test for gene flow between the non-C 4 clade II and one of the geographically overlapping C 4 clades. Generic positions are indicated at tips of the phylogeny, together with the groups considered in the tests (see Figure S3 for complete phylogeny). The arrow shows the introgression corresponding to a significant test. overlapping ranges represent a secondary contact that occurred more than 1 million years ago (Bianconi et al., 2020;Olofsson et al., 2016). Across the entire species' range, the ecological niches of C 4 and non-C 4 A. semialata have been shown to overlap (Lundgren et al., 2015) and this is confirmed here by the co-occurrence of C 4 and non-C 4 individuals in five populations spread across Zambia (Figure 1; Figure S1), which adds to previously reported mixed populations in South Africa (Frean et al., 1980;Liebenberg & Fossey, 2001). These cases of co-occurrence show that the photosynthetic types are selectively equivalent, at least in some environments.
C 4 photosynthesis is advantageous mainly in warm, open habitats (Ehleringer et al., 1997;Ehleringer & Pearcy, 1983), and some Central Zambezian miombo woodlands might lie at the point where the C 4 and non-C 4 types are energetically equivalent. Alternatively, biotic factors might lead to niche differentiation within the same habitat, as reported in several animal systems (Comeault et al., 2015;Lindtke et al., 2017;Losey et al., 1997;Olendorf et al., 2006), thereby allowing the two types to exist in various environments.
First, the two types might exploit different soil layers, with variation in rooting systems, which is affected by photosynthetic type (Wade et al., 2020), potentially allowing ecological partitioning of nutrient and water uptake (Fargione & Tilman, 2005;McKane et al., 2002;Phoenix et al., 2020). Second, mycorrhizal fungi might drive niche divergence in mixed populations, as previously reported in other systems (Osborne et al., 2018). Third, C 4 plants are generally less palatable and therefore preferentially avoided by herbivores (Boutton et al., 1978;Caswell et al., 1973). Such an advantage might counterbalance the energetic costs of the C 4 biochemical pathway, and allow C 4 populations to colonize habitats where non-C 4 individuals would otherwise be favoured. The coexistence of non-C 4 and C 4 A. semialata in mixed populations, nevertheless, demonstrates that they can grow in the same habitats. Our data therefore refute a complete habitat displacement of one of the photosynthetic types following secondary contact.

| Low dispersal probably prevents the homogenization of photosynthetic types among diploids
Previous genomic analyses found evidence of admixture between C 4 and non-C 4 plants in the Central Zambezian region (Bianconi et al., 2020;Olofsson et al., 2016), and the recurrent mixing of the genetic groups corresponding to the two photosynthetic types is confirmed here with our denser sampling. Strong admixture was detected in particular in polyploid individuals (Figure 4; Figure S7), mirroring other study systems (Fay et al., 2019;Grabowski et al., 2014;Novikova et al., 2020;Parisod et al., 2010;Schmickl & Koch, 2011).
In addition, we obtained statistical evidence of admixture between diploid groups of C 4 and non-C 4 individuals ( Figure 5), demonstrating that the distinct lineages are not completely incompatible. Despite these recurrent exchanges, the identity of each genomic group is maintained after more than 1 million years in the same region (Figures 2 and 4).
Despite overlapping ranges (Figure 1), the diploid non-C 4 and C 4 groups were never found in mixed populations (Figure 3). We hypothesize that the presence of diploids from the other photosynthetic type might interfere with reproduction. On the one hand, frequent pollination by the other type would decrease seed production if it results in unviable embryos or interferes with pollination F I G U R E 6 Patterns of isolation by distance within each group. For each of the four genetic groups within Alloteropsis semialata, pairwise genetic differentiation (F ST ) is plotted against geographical distance. (a) Non-C 4 clade I, (b) non-C 4 clade II, (c) C 4 clade IIIa and (d) C 4 clade IIIb. In each case, the p-value, obtained with a Mantel test and corrected for 10 tests (including intergroup analyses; Figure S8), is indicated. For significant relationships, slopes, intercepts and R 2 were estimated with linear regressions. Points are coloured as in Figure 1 (a) by the same type (e.g., Nishida et al., 2014). On the other hand, successful cross-fertilization would decrease fitness if the combination of genetic components disrupts the C 4 photosynthetic machinery through epistasis, as previously observed in crosses between photosynthetic types (in Atriplex and Flaveria; Brown & Bouton, 1993;Oakly et al., 2014). Our genome scans show that the divergence between C 4 and non-C 4 diploids is driven by a large number of regions scattered across chromosomes, which do not include the few genes with a known C 4 function that are differentially expressed between photosynthetic types of A. semialata (Figure 7 Independent of the mechanism, both pre-and post-zygotic negative interactions would decrease the fitness of the photosynthetic type that is at low frequency in a mixed population and therefore has a high probability to reproduce with the alternative photosynthetic type (Lewis, 1961;Ray et al., 1979;Toll & Willis, 2018). In the absence of ecological differentiation, the photosynthetic type that first colonized and became abundant in a given area would therefore retain an advantage due to its higher frequency. While some introgression might still occur, newly arrived individuals would gradually disappear from each locality (Nishida et al., 2020;Ray et al., 1979;Templeton, 1981;Whitton et al., 2017), leading to the spatial sorting of photosynthetic types (Nishida et al., 2020). In the case of A. semialata, the low connectivity between diploid populations from the Central Zambezian region (i.e., strong isolation by distance; Figure 6) probably makes complete homogenization of photosynthetic types very slow and potentially impossible. We conclude that the low dispersal of diploids of A. semialata is key to the maintenance of distinct photosynthetic types within this region.

| Polyploidization enables co-occurrence of different photosynthetic types
Polyploids have arisen repeatedly in different parts of the range of A. semialata (Bianconi et al., 2020). In South Africa, C 4 polyploids cooccur with non-C 4 diploids (Frean et al., 1980;Liebenberg & Fossey, 2001) and we report here several populations spread across Zambia where non-C 4 diploids grow mixed with C 4 hexaploids (Figures 1 and   3). Some of the polyploid populations reported here present clear signs of admixture among photosynthetic types (e.g., ZAM1958; Figure 4).
Polyploids might also facilitate gene flow among the two photosynthetic types, and tetraploidy in Arabidopsis lyrata has been shown to restore compatibility with diploids from the otherwise incompatible species Arabidopsis arenosa (Lafon-Placette et al., 2017). Future studies should test whether such a process occurs between diploids and hexaploids of A. semialata, but D-statistics indicate that gene flow occurs predominantly among the non-C 4 and C 4 clades that contain diploids ( Figure 5). Based on analyses of organelle genomes, the polyploids from the C 4 clade IIIb emerged long after the split of the C 4 and non-C 4 lineages (Bianconi et al., 2020), and the initial divergence of photosynthetic types therefore occurred in a diploid context. However, our results indicate that ploidy differences might be required to allow the mixing of non-C 4 and C 4 individuals in the same population.
In all five populations from the Central Zambezian region with both C 4 and non-C 4 individuals, the C 4 plants belong to the nuclear group associated with hexaploids (Figures 1 and 3c). Similarly, in South Africa where C 4 and non-C 4 individuals occasionally form mixed populations, the C 4 individuals are polyploids while the non-C 4 individuals are diploid (Frean et al., 1980;Frean & Marks, 1988;Liebenberg & Fossey, 2001). Diploids might be disadvantaged in the presence of conspecific tetraploids because of asymmetrical gene flow and reproductive interference (Husband et al., 2002), but post-pollination barriers preventing gene flow between diploids and hexaploids have previously been reported in another system (Castro et al., 2011;Münzbergová et al., 2013). If cross-fertilization is reduced between diploid and hexaploid A. semialata, explaining the rarity of tetraploids (none detected out of 61 samples with genome sizes from the Central Zambezian region), the ploidy difference would allow long term coexistence of conspecifics with contrasted photosynthetic types. Similar conclusions were reached for ecotypes of the switchgrass, showing that ploidy generally helps maintain intraspecific functional diversity (Grabowski et al., 2014).

| CON CLUS IONS
In this study, we assessed the population genomics of the grass Alloteropsis semialata to determine how divergent photosynthetic types can have coexisted over more than 1 million years within a region of Africa. We showed that C 4 and non-C 4 individuals overlap geographically, sometimes occurring mixed within the same habitat, ruling out complete displacement upon secondary contact.
We also found evidence of admixture and introgression between C 4 and non-C 4 populations. Groups with distinct photosynthetic types, however, behave as independent genomic entities, without classical hybrid zones. We suggest that selection against hybrids maintains the genetic groups, while low dispersal prevents their homogeneization. Polyploidy, which occurred repeatedly in the species, enables mixed populations, presumably because it strengthens prefertilization isolation. We conclude that low dispersal in the Central Zambezian region coupled with selection against hybrids and polyploidization explain the persistence of divergent photosynthetic types despite gene flow and a lack of habitat differentiation.

DATA AVA I L A B I L I T Y S TAT E M E N T
All raw reads are available in the short sequence archive under Accession nos. PRJNA560360 and PRJNA64872. Sample information is available in Table S1.