Hybridisation and chloroplast capture between distinct Themeda triandra lineages in Australia

Abstract Ecotypes are distinct populations within a species that are adapted to specific environmental conditions. Understanding how these ecotypes become established, and how they interact when reunited, is fundamental to elucidating how ecological adaptations are maintained. This study focuses on Themeda triandra, a dominant grassland species across Asia, Africa and Australia. It is the most widespread plant in Australia, where it has distinct ecotypes that are usually restricted to either wetter and cooler coastal regions or the drier and hotter interior. We generate a reference genome for T. triandra and use whole genome sequencing for over 80 Themeda accessions to reconstruct the evolutionary history of T. triandra and related taxa. Organelle phylogenies confirm that Australia was colonized by T. triandra twice, with the division between ecotypes predating their arrival in Australia. The nuclear genome provides evidence of differences in the dominant ploidal level and gene‐flow among the ecotypes. In northern Queensland there appears to be a hybrid zone between ecotypes with admixed nuclear genomes and shared chloroplast haplotypes. Conversely, in the cracking claypans of Western Australia, there is cytonuclear discordance with individuals possessing the coastal chloroplast and interior clade nuclear genome. This chloroplast capture is potentially a result of adaptive introgression, with selection detected in the rpoC2 gene which is associated with water use efficiency. The reason that T. triandra is the most widespread plant in Australia appears to be a result of distinct ecotypic genetic variation and genome duplication, with the importance of each depending on the geographic scale considered.


| INTRODUC TI ON
Understanding why some species are found in a broad range of environments whilst others are restricted to particular habitats is one of the fundamental questions in evolutionary biology. Widely distributed species are typically able to tolerate multiple environmental conditions, through a combination of extensive phenotypic plasticity and/or locally adapted genetic variation (Stamp & Hadfield, 2020).
Populations that are genetically adapted to their local environmental conditions can be defined as ecotypes (Hufford & Mazer, 2003).
A single species can be composed of multiple genetically distinct and locally adapted ecotypes (Linhart & Grant, 1996). It is increasingly clear that ecotypes can evolve multiple times in response to the same selection pressures (Butlin et al., 2014;James et al., 2021;Jones et al., 2012;Ostevik et al., 2012), and that this can happen over very short timescales (e.g. <250 generations, Papadopulos et al., 2021). The formation of distinct ecotypes is dependent on numerous factors including the levels of gene flow, the extent of standing genetic variation, mutation rate and the scale of environmental heterogeneity (Via & Lande, 1985). Determining how ecotypes evolve and interact when they come into secondary contact and hybridize is important for our understanding of how local adaptation can ultimately lead to speciation (Lowry, 2012;Nosil, 2012).
The origin of ecotypes in plants is often driven by the difference in soil water availability between habitats, with ecotypes from drier environments generally being smaller and flowering earlier (Latta et al., 2007;Lowry, 2012;Milano et al., 2016). Detailed studies in Panicum virgatum have shown that the genetic basis of ecotype differentiation is controlled by multiple loci distributed across the genome (Milano et al., 2016), and occasional gene flow among ecotypes can accelerate climate adaptation through the introgression of adaptive loci (Lovell et al., 2021). Polyploidy can accelerate ecotype divergence as relaxed selection on individual subgenomes can facilitate the accumulation of adaptive genetic variation enabling neo-or subfunctionalisation (Lovell et al., 2021). Genome duplication itself can also have immediate effects, with autopolyploids having increased cell size which can translate into larger plants that are more robust and produce more seed (Garbutt & Bazzaz, 1983), and which are potentially more invasive (Te Beest et al., 2012). Polyploidy can also increase reproductive isolation among ecotypes, reducing gene flow when they come into secondary contact so that they maintain their divergence (Olofsson et al., 2021). Additional work is required to determine the importance of ploidy in the formation and maintenance of ecotypes, with its role likely to be linked to population demographics and the evolutionary time over which the ecotypes have been isolated.
In plants, the chloroplast is responsible for photosynthetic energy production and optimizing its function for local environmental conditions is essential to maximize fitness. As a result, the encoded enzymes are often subject to positive selection (Hu et al., 2015;Gao et al., 2019;Yang et al., 2021). The organelles can be introgressed between species or ecotypes, in a process termed chloroplast capture.
The transfer of a resident chloroplast could provide an evolutionary advantage for a species colonizing a new habitat by effectively accelerating the process of local adaptation (Muir & Filatov, 2007;Percy et al., 2014;Tsitrone et al., 2003). Chloroplast capture is typically inferred phylogenetically as a result of incongruence between the chloroplast and nuclear genome topologies that can be explained by introgression. This process has been widely documented in families across the plant kingdom (Rieseberg & Soltis, 1991). However, many of the earlier cases of chloroplast capture were identified using individual markers and subsequent reanalysis with whole chloroplast genome data has not always confirmed these cases, as seen in willows (Percy et al., 2014;Wagner et al., 2021). It is therefore essential to use whole organelle sequences to reliably infer chloroplast capture, and test for signatures of positive selection to confirm if the acquisition was adaptive rather than the result of neutral processes (Bock et al., 2014).
Themeda triandra is one of the most widely distributed C 4 grass species in the world (Sage, 2017). It has a relatively recent evolutionary history, originating c. 1.5 milliion years ago (Ma), most likely in Asia (Dunning et al., 2017). Themeda triandra has rapidly spread across Asia, Africa and Australia where it is found across a range of climates (Snyman et al., 2013). It is a perennial tussock-forming grass that is capable of reproducing both sexually and asexually through apomixis (Evans & Knox, 1969). Sexual reproduction is more common in diploid populations and plants are self compatible (Hayman, 1960;Liebenberg, 1990), although the actual rates of selfing are still unknown (Ahrens et al., 2020). It is a dominant grassland species with significant ecological, cultural and economic importance (Snyman et al., 2013). The great morphological diversity displayed by T. triandra has led to the description of different taxa with several potential synonyms (Arthan et al., 2021), ranging from regional varieties with restricted distributions to the globally invasive T. quadrivalvis (Arthan et al., 2021).
In Australia, T. triandra is the most widely distributed plant species (Gallagher, 2016) and ecotypic differences (coastal versus inland form) appear to be largely a result of variation in ploidy level (Hayman, 1960). Coastal populations that grow in cooler and wetter climates are predominantly diploid, while inland populations from drier xeric areas are prominently tetraploid (Godfree et al., 2017).
Experimental manipulations of both ploidal levels collected from the same regions showed that under drought and heat stress the tetraploids produced up to four times as much seed as the diploids (Godfree et al., 2017), with elevated fitness attributed to its increased genome size. The repeated formation of polyploids within a population was thought to be the most parsimonious explanation for the origin of the tetraploids (Hayman, 1960), although a single polyploidisation event followed by the spread of that cytotype could not be ruled out (Godfree et al., 2017). This conclusion has been supported by work that shows polymorphisms within a population are common, and that tetraploids do not form their own distinct genomic group (Ahrens et al., 2020). However, both of these recent studies are limited to south-eastern Australia and their conclusions assume that the widespread inland tetraploid ecotype is derived from the coastal diploid form, and that the ecological differences between ploidy levels are not more ancient, predating the colonization of Australia. Recent phylogenetic studies have highlighted that Australia may have been colonized by T. triandra multiple times (Dunning et al., 2017), but it is not yet clear whether these colonisations correspond to the different coastal and inland ecotypes.
In this study we use whole genome sequencing data from over 80 Themeda accessions to reconstruct the evolutionary history of Themeda triandra in Australia. We specifically aim to test whether the two Australian ecotypes evolved rapidly following colonization, potentially due to auto-polyploidisation, or alternatively, if the ecotypes arrived in Australia independently. Overall, our results show that considering a broader evolutionary history rather than focusing purely on local diversity is essential to elucidate the mechanisms of rapid environmental adaptation in widely distributed species.

| Reference genome assembly
A de novo reference genome assembly was generated for a Themeda triandra accession (TtPh16-4) collected in 2016 from the Carranglan region of the Philippines (15°56′35.8 ″N 121°00′26.2 ″E). A PacBio library was prepared by The University of Sheffield Molecular Ecology Laboratory, and sequenced on two PacBio Sequel SMRT cells. The PacBio data were cleaned and assembled using Canu version 2.0 (Koren et al., 2017) with default parameters. Organelle genomes were then generated for the TtPh16-4 accession. The chloroplast genome was assembled using a genome walking approach (see below for details). The mitochondrial genome was manually assembled from the PacBio contigs. In brief, the complete set of mitochondrial genes was extracted from a mitochondrial assembly (NC_008360.1) of Sorghum bicolor, a closely related grass from the same tribe (Andropogoneae), and used as a Blastn version 2.8.1 query to identify the top-hit TtPh16-4 contig for each gene. These contigs were then truncated to the matching regions, retaining the intergenic regions if multiple loci were present on a single contig. Finally, duplicated regions were removed and the remaining contigs concatenated into a single pseudomolecule with gaps represented by 100 Ns. The completeness of the TtPh16-4 mitochondrial genome was estimated using the MITOFY version 1.3.1 webserver (Alverson et al., 2010).
The TtPh16-4 organelle genomes were used to mask organellar DNA in the Canu genome assembly prior to additional homologybased scaffolding. Contigs containing organellar DNA were first identified using Blastn, with a minimum alignment length of 1000 bp and sequence similarity ≥99%. These scaffolds were then masked using RepeatMasker version 4.0.6 (Smit et al., 2013) with the organelle sequences as a custom database. The organelle masked contigs were then scaffolded in relation to the genome of Sorghum bicolor (GenBank accession: GCA_000003195.3;McCormick et al., 2018) using RagTag version 2.1.0 (Alonge et al., 2021). The TtPh16-4 genome assembly completeness was estimated using BUSCO version 3.1.0 (Simão et al., 2015) with the poales_odb10 database, and by comparing the assembly size to the 1C genome size estimated for another individual collected from the same area (TtPh16-2) that was estimated by flow cytometry using the one-step protocol (Doležel et al., 2007) with minor modifications (see Clark et al., 2016).

| Sampling and whole-genome re-sequencing
An initial Australia wide survey was conducted using two regions of the ribosomal DNA (rDNA), ETS and a second portion containing ITS1, 5.8S and ITS2. These data were used to assess the diversity within the T. triandra/T. quadrivalvis clade. Our ITS/ETS data set contained 373 ingroup accessions from across the entire T. triandra range ( Figure S1A), including 33 previously published accessions (Dunning et al., 2017), and 340 newly sequenced accessions (271 from across Australia [ Figure S1B], and 58 and 11 from across Africa and Asia respectively). We extracted DNA, amplified the target region and generated a maximum parsimony phylogenetic tree as in Jobson et al. (2017). From these data we identified 12 weakly resolved clades ( Figure S1A) and selected 61 Australian T. triandra samples for whole genome resequencing to represent the geographic range within each. We also selected six outgroup samples for sequencing,

| Chloroplast genome assembly and phylogenetics
Chloroplast genomes were assembled from the raw whole genome sequencing data using NOVOPlasty version 4.2.1 (Dierckxsens et al., 2017) with default parameters and a matK seed alignment extracted from a chloroplast genome assembly of a closely related species (Andropogoneae: Heteropogon sp., GenBank accession: KY707768.1). The resulting contig options were aligned to preexisting T. triandra and T. quadrivalvis chloroplast assemblies from NCBI genbank using MAFFT version 7017 (Katoh et al., 2002), and manually rearranged so that the short single copy and inverted repeat was in the same orientation for each individual using Geneious version 5.3.6 (Kearse et al., 2012) if required. If the chloroplast assembly was incomplete, the process was repeated using a different seed alignment (the rbcL gene or the entire chloroplast sequence from the same Heteropogon sp. accession). Phylogenies were inferred with and without one of the inverted repeats. The inverted repeats within a plastome recombine with each other meaning they are generally identical and including them effectively inflates the weight to these positions (Blowers et al., 1989;Palmer, 1985). Maximumlikelihood phylogenetic trees with 100 bootstrap replicates were inferred using PhyML version 20120412 (Guindon et al., 2010), with the best-fit nucleotide substitution model selected with SMS version 1.8.1 (Lefort et al., 2017).

| Mitochondrial genome assembly and phylogenetics
We used a reference-based approach to generate consensus sequences for the mitochondrial genome (Bianconi et al., 2020). The 68 Themeda samples sequenced here (the 67 samples selected across the Australian T. triandra range plus TtPh16-4) were supplemented with 14 T. triandra and T. quadrivalvis data sets retrieved from the NCBI Sequence Read Archive (SRA; Burke et al., 2016;Dunning et al., 2017;Arthan et al., 2021). Prior to mapping the Themeda sequencing data to the TtPh16-4 genome, the data were cleaned using Trimmomatic version 0.38 (Bolger et al., 2014) to remove adaptor contamination, low quality bases (4 bp sliding window with mean Phred score <20) and short reads (<50 bp). NGSQC Toolkit version 2.3.3 (Patel & Jain, 2012) was then used to discard reads where 80% of the sequence had a Phred score <20 or the read contained an ambiguous base. Finally, PRINSEQ version 0.20.3 (Schmieder & Edwards, 2011) was used to remove duplicated reads. The cleaned data were then mapped to the TtPh16-4 reference genome using Bowtie2 version 2.3.4.3 (Langmead & Salzberg, 2012) with default parameters. Consensus sequences for each Themeda sample were generated from the short-read alignments to the mitochondria using previously described methods (Bianconi et al., 2020). In short, only bases with five times the expected nuclear genome coverage were called to remove the potential effect of organelle-nuclear transfers.
The alignment was subsequently trimmed with trimAl version 1.2rev59 (Capella-Gutiérrez et al., 2009) using the -automated1 option which optimizes alignment trimming for maximum-likelihood phylogenetic tree reconstruction. A maximum-likelihood tree was inferred from the 429,412 bp alignment using IQ-TREE version 1.6.12 (Nguyen et al., 2015) with 1000 ultrafast bootstrap replicates.

| Nuclear marker assembly and phylogenetics
We inferred the nuclear history of Themeda using two different data sets, the nuclear ribosomal DNA (rDNA) array and genomewide single-copy loci. We assembled the rDNA array using the same method as Becher et al. (2020). This approach consisted of using NOVOPlasty with a 594 bp reference rDNA sequence for the seed alignment from a closely related species (Andropogoneae: Heteropogon triticeus, GenBank accession: KY991073.1). The resulting assemblies were processed in the same way as the chloroplast (see above), and the alignment was subsequently trimmed to the 5.80 kb rDNA coding region consisting of 18S, ITS1, 5.8S, ITS2, 26S. A phylogenetic tree with 100 bootstrap replicates was inferred using PhyML, with the best-fit nucleotide substitution model selected with SMS.
A reference-based approach was used to generate consensus sequences for single-copy nuclear genes to infer phylogenetic relationships Olofsson et al., 2016Olofsson et al., , 2019. Our analysis focused on the 3303 Benchmarking universal single-copy orthologues (BUSCO; Simão et al., 2015) in the poales_odb10 database identified in the TtPh16-4 genome. Consensus sequences for each Themeda sample were then generated from the bowtie2 alignments generated above for each single-copy gene in the TtPh16-4 genome using previously described methods for low-coverage whole genome data Olofsson et al., 2016Olofsson et al., , 2019. Each individual gene alignment was subsequently trimmed with trimAl, short sequences (<200 bp) were discarded from the trimmed alignment, and the entire gene alignment was discarded if it was either <500 bp or did not include all samples (n = 2096 genes retained).
A maximum-likelihood tree was inferred from a concatenated nuclear alignment of all 2096 genes (alignment length = 3,393,588 bp) using IQ-TREE version 1.6.12 (Nguyen et al., 2015) with 1000 ultrafast bootstrap replicates. Individual gene trees were also generated using SMS and PhyML as described above. A DensiTree version 2.2.7 (Bouckaert, 2010) plot was made by overlaying the individual gene trees that could be transformed to be ultrametric using the chronopl function (lambda = 1) as part of the ape version 5.2 (Paradis & Schliep, 2019) package in R version 3.4.3. A coalescence species tree was generated from the individual gene trees using ASTRAL version 5.7.5 (Zhang et al., 2018) after collapsing branches with <10% bootstrap support using Newick utilities version 1.6 (Junier & Zdobnov, 2010). Phyparts version 0.0.1 (Smith et al., 2015) was used to evaluate individual gene tree support for the coalescence species tree. The results were visualized using the phypartspiecharts.py python script written by M. Johnson (available from: https://github. com/mossm atter s/phylo scrip ts/blob/maste r/phypa rtspi echarts).

| Population structure
Genotype likelihoods were estimated across the entire TtPh16-4 nuclear genome for the 81 Themeda acessions (68 sequenced here) using ANGSD version 0.929-13-gb5c4df3 (Korneliussen et al., 2014) and the bowtie2 alignments generated above with organelle sequences excluded. Mapped reads and bases with a Phred score <20 were discarded, a per-individual maximum depth of 20 was used, and sites had to be present in at least two individuals to be considered. A principal component analysis (PCA) of the genotype likelihoods was generated using PCAngsd version 0.973 (Meisner & Albrechtsen, 2018) to estimate a covariance matrix before plotting the results in R with eigenvector decomposition. The number of genetic clusters (K) in the genotype likelihoods was examined using NGSadmix (Skotte et al., 2013), with default parameters and 10 replicates for K between 1 and 10. The optimal K was determined using the ΔK method (Evanno et al., 2005) implemented using CLUMPAK (Kopelman et al., 2015). PCA plots and the admixture analysis were repeated for only the Australian T. triandra samples using default parameters. Pairwise F ST was estimated using ANGSD among Australian populations both globally and using a sliding window method (window = 50 kb, slide = 10 kb), with this analysis restricted to individuals that had 99% of their genome assigned to a single genetic cluster.
To support the population structure results, we specifically tested for introgression between accessions by calculating Patterson's D-statistic (ABBA-BABA statistic) and the f4-ratio using Dsuite version 0.4.r38 (Malinsky et al., 2021). To calculate these statistics we used the VCF file previously generated by ANGD and selected the combination of taxa depending on the scenario being investigated. The analysis was restricted to individuals assigned as diploid (see below), and in certain cases to those with known collection locations if geographic distance within Australia was deemed as important (Table S1). All p-values were Benjamini-Hochberg corrected to account for multiple testing and summary statistics are based solely on the significant results (corrected p-value < .05).

| Estimating ploidy
The ploidy of each sample was estimated using HMMploidy (Soraggi et al., 2021), a method which has been developed to infer ploidy from low-depth sequencing data. HMMploidy uses both sequencing depth and genotype likelihoods to infer ploidy, leveraging population frequencies to account for genotype uncertainty in low-coverage data (Soraggi et al., 2021). A multisample mpileup file was generated for HMMploidy from the bowtie2 alignments with SAMtools version 1.9 (Li et al., 2009), only including reads with a minimum read mapping quality (mapQ) of 20, counting anomalous read pairs and setting a maximum per-file depth of 100. Genotype likelihoods were generated using HMMploidy with default parameters, which calculates likelihoods for a range of ploidy levels (up to 6x). Ploidy levels were then inferred in 100 kb windows across the chromosomes from the TtPh16-4 reference genome, with a minimum number of two individuals per locus to be considered. The percentage of 100 kb windows supporting each ploidy level was then calculated, ignoring those that were inferred to be haploid.
Accessions were arbitrarily assigned to a single ploidy if it was supported by ≥60% of windows. If no single ploidy level was supported, but the possible polyploid levels had a combined support of ≥60% of windows, then the sample was generically assigned as polyploid.
Finally, if neither of these criteria were met the sample was classified as unknown ploidy.

| Inferring positive selection
Pairwise estimation of the ratio (ω) of synonymous (dS) and nonsynonymous (dN) substitutions between sequences was calculated using the Yang and Nielsen (2000) method implemented in yn00, distributed as part of the paml version 4.9j package (Yang, 2007). Site (M1a and M2a) and branch-site models (BSA and BSA1) models, to infer if a gene was evolving under significant positive selection, were implemented in codeml, distributed as part of the paml version 4.9j package (Yang, 2007). For these models we used the topology of the whole chloroplast genome tree ( Figure 1 and Figure S2). TtPh16-2). The BUSCO poales_odb10 database contains 4986 genes, of which 81.5% were complete in the TtPh16-4 genome (14.0% duplication, 2.4% fragmented and 16.1% missing). No direct genome size or chromosome count is available for TtPh16-4, but we assume it to be diploid given the genome assembly shows relatively low levels of duplication (14.0% according to BUSCO), HMMploidy inferring the sample to be diploid, and the assembly size being smaller than the 0.84 Gb 1C flow-cytometry genome size estimate for another accession from the same population, TtPh16-2. Furthermore, the 1C value of TtPh16-2 is approximately half that of previously studied tetraploids (Estep et al., 2014;Godfree et al., 2017), which also suggests TtPh16-2 is diploid.

| Phylogenetic relationships inferred from the chloroplast
As part of this study, we generated over 700 Gb (mean = 5.18 Gb, SD = 4.72 Gb per sample) of whole-genome resequencing data for 68 Themeda accessions, with an emphasis on sampling T. triandra (and potential taxonomic synonyms) from across its range in Australia (n = 61; Figure 1). These data were used to assemble whole chloroplast genomes, before being aligned with previously pub-  (Figures S2 and S3). (SD = 50 kb) and 99.3% of sites were identical (99.6% with T. triandra).

| Phylogenetic relationships inferred from the mitochondrial genome
The higher-order topology was identical for both organelles (Figure 1), although there were differences within each of the clades themselves ( Figures S2-S4). When comparing the coastal (green) and claypan (blue) T. triandra accessions from Clade I with the western Australia form from Clade II (yellow), there are 244 and 256 fixed biallelic SNPs respectively. Within Clade I there are 10 fixed biallelic SNPs separating the coastal (green) and Pilbara (blue) T. triandra accessions.

| Phylogenetic relationships inferred from the nuclear genome
The nuclear data set was generated by mapping the short-read data from 82 Themeda accessions to the TtPh16-4 reference, with a mean of 67.5% (SD = 9.6%) of data mapping per sample. As expected, a higher proportion of data mapped for T. triandra (69.9%; SD = 5.6%) than for the other species: T. quadrivalvis (54.2%; SD = 5.5%), T.
The nuclear phylogenetic relationships were inferred using data from 2096 single-copy genes with a combined alignment length of 3,393,588 bp. The concatenated maximum-likelihood tree and the coalescent species tree both support T. triandra as being monophyletic, sister to T. quadrivalvis (Figure 2 and Figures S5-S7). In the nuclear phylogeny the Australian T. triandra accessions are monophyletic, and within this group the two distinct clades identified in the chloroplast phylogeny are recapitulated with just a few exceptions.
Most notably is the Pilbara clade which is identified as Clade I based on the chloroplast data (in blue; Figure 1), but is nested deep within Clade II based on the nuclear genome ( Figure 2). There are also two accessions from Far North Queensland which are part of Clade II in the chloroplast genome analyses but are nested within Clade I based on the nuclear genome data, where they form a clade with other accessions from northern Queensland. Evaluating the individual gene tree support for the nuclear topology shows that many of the nodes are poorly supported beyond delimiting the main clades, with frequent minor conflicts (Figure 2b and Figure S5). This lack of gene-tree support for the species tree topology can be a result of incomplete lineage sorting, hybridisation and/or a lack of genetic variation.
The rDNA coding region alignment was 5.80 kb in length and 92.7% of sites were identical (94.3% within T. triandra). Although the rDNA tree is poorly resolved (Figure S8), several of the associations identified using the single copy nuclear loci are also recovered ( Figure 2), namely the association of the Pilbara accessions (blue) with those from Clade II (yellow), and the grouping of potential hybrid accessions from northern Queensland.

| Genetic variation and structure within Themeda
Genome-wide genotype likelihoods were used for the population genomics analyses. The principal component analysis largely

| Introgression in Themeda
The admixture analysis did not detect any signs of potential introgression among T. triandra and T. quadrivalvis, but this could be due  (Figure 1). Bootstrap support values for the concatenated maximum likelihood tree are shown, followed by local posterior probabilities from a coalescence species tree for the same nuclear loci. The pie charts on key nodes represent the individual gene tree support for the topology shown. (b) a densitree plot of overlaid nuclear gene trees. Colours match the chloroplast clades, and truncated branches are indicated. For samples not assigned to a clade a three letter abbreviation is used (PHI = Philippines and CHN = China), and the clade containing potential hybrids from northern Queensland (nQ) is indicated.

When testing for introgression between solely the Australian
T. triandra accessions and T. quadrivalvis there was low-level introgression detected (maximum D-statistic = 0.036; maximum f4-ratio = 0.019), but this was uniformly distributed among accessions regardless of distance between the T. triandra and T. quadrivalvis accessions considered. This probably indicates no ongoing or recent gene flow between T. triandra and T. quadrivalvis in Australia ( Figure S10D).
The D-statistic supported the admixture analysis inference of gene flow between Asian and Australian T. triandra ( Figure S11), and between the Australian nuclear clades I and II ( Figure S12).
High levels of admixture were detected between the Taiwanese and Australian T. triandra accessions (mean D-statistic = 0.058, F I G U R E 3 Nuclear genetic variation and structure within Themeda. (a) a principal component analysis across the first two axes is shown, with genetic groups coloured based on the chloroplast phylogeny shown in Figure 1. (b) the mean likelihood and standard error for a range of K's is shown, with these values used to calculate ΔK (c) as in Evanno et al. (2005). (d) the assignment to genetic clusters is shown for two values of K. samples are arranged within their chloroplast clade (indicated by that bar underneath the admixture plots), and ordered from west to east within each group. The asterisks indicate samples with a high degree of admixture from northern Queensland, and accessions from the Philippines (PHI) and China (CHN) are also indicated on the admixture plot.

| Polyploidy is restricted to the inland ecotype
All outgroups and non-Australian T. triandra accessions were assigned as diploid (Table S3). All accessions of the coastal ecotype (green clade in Figure 1) sampled in this study were assigned as diploid apart from two where the ploidy level was unclear (Table S3). No diploids were assigned in the claypan population (blue clade in Figure 1), with a majority (n = 6) classified as polyploid and one sample not assigned.
Finally, in the inland clade (yellow clade in Figure 1) a majority were assigned as polyploid (56.8%), along with diploid (29.7%) and unassigned accessions (13.5%). There was a lack of resolution within the polyploid assignments, and no accession was assigned as tetraploid, even though previous sampling in southeastern Australia indicated that the tetraploid cytotype was the most common ploidal level for the inland ecotype (Ahrens et al., 2020;Godfree et al., 2017). The phylogenetic placement of the polyploids ( Figure S13) and admixture analyses ( Figure S9D) hints that they are recent auto-polyploids, a conclusion supported by previous research showing tetraploids are more closely related to diploids from the same population that those of the same ploidy level elsewhere (Ahrens et al., 2020). All except one of the introgressed individuals of both ecotypes from northern Queensland were confidently assigned as diploid (supported by at least 95.6% of windows), indicating that these were not allopolyploids and probably represent early-generation hybrids.

| Positive selection in a single chloroplast gene
We looked at sequence variation within all 75 protein coding chloroplast genes to detect signs of positive selection between the two Australian T. triandra clades potentially involved in a chloroplast capture scenario (i.e., blue and yellow clades; Figure 1) using the ratio of synonymous to nonsynonymous nucleotide substitutions (ω). Purifying selection is indicated by ω < 1, whereas ω = 1 implies neutral evolution and ω > 1 indicates positive selection. Out of the 75 genes, 60 had no fixed differences, 11 had fixed differences but they were all synonymous mutations, and four (ndhf, rpl22, rpoA, and rpoC2) contained nonsynonymous mutations. Only rpoC2 had more than three fixed differences and was therefore used for positive selection analysis. A simple pairwise comparison between sequences using the Yang and  (Table S4).

| DISCUSS ION
Themeda triandra is a particularly interesting tropical grass species.
Despite being relatively young (median T. triandra crown age 1.48 Ma, 95% HPD range 0.78-3.45; Dunning et al., 2017), this species has become dominant in many African, Asian and Australian grasslands, and has even been dubbed the "The food of the Serengeti grazers" (Sage, 2017). It is the most widely distributed plant species in Australia (Gallagher, 2016) with two predominant ecotypes largely restricted to the wetter, cooler coastal regions or the drier, hotter interior. It is therefore a great model system to investigate rapid environmental adaptation. Here, we use whole-genome sequencing from over 80 Themeda accessions to show that divergence between the inland and coastal populations pre-dates the colonization of Australia by T. triandra based on the organelle phylogenies (Figure 1), and they are therefore not a result of a recent polyploidisation event. We also provide evidence of contemporary gene-flow where ecotypes come into contact.

| Ecotypes predate the colonization of Australia
This study confirms previous findings that Australia was colonized at least twice by Themeda triandra (Dunning et al., 2017). Interestingly, these independent colonisations actually represent the arrival of the two different ecotypes, one that inhabits the cooler and wetter coastal regions and the other that is found in the hotter and drier Australian interior (Figure 1). Previous studies have concluded that the differences between these ecotypes can largely be attributed to recent auto-polyploidisation events in some of the inland individuals (Ahrens et al., 2020;Godfree et al., 2017). While our results support the previous conclusions that ecotypes do largely segregate by ploidy level (Figure S13), the adaptation to these different environments probably pre-dates the colonization of Australia and any genome duplication events. This is further supported by the presence of known diploids in inland populations identified here, and in previous studies (Ahrens et al., 2020). Therefore, to understand the colonization of the dryland Australian interior it is essential to consider the ecotype's original diversification in Asia. A recent study of T. triandra in Yunnan-Guizhou Plateau in southwest China also found distinct cool-and warm-adapted lineages that are at least two million years old (Chu et al., 2021). However, the distinct Chinese populations are unlikely to be the source of the two Australian ecotypes as the Chinese samples form a monophyletic group based on chloroplast markers, although this is based on a reduced set of markers and a topology with relatively low support (Chu et al., 2021). Even though both ecotypes probably originated in Asia, further sampling across this continent is required to retrace their precise evolutionary origins.
Themeda triandra is also widespread in Africa where it grows in a wide range of climatic regions and exhibits a similar diversity in ploidy levels (Snyman et al., 2013) as found here. Although not currently possible, it would be interesting to compare the Australian results with a similar study in Africa. Potentially the African continent was also colonized by multiple ecotypes, and indeed it is notable that African accessions have a similar phylogenetic pattern, being paraphyletic for the chloroplast genome (placement of Tanzanian accession, Figure 1) and monophyletic for the nuclear genome ( Figure 2).
Comparisons between Australia and Africa will ultimately show if the broad climatic niche T. triandra inhabits on both continents is attributed to ancestral genetic variation or rapid convergent evolution.
Ecotypic divergence in other species has been shown to have a complex genetic background, and ploidy differences can accelerate the accumulation of divergent adaptive genetic variation (Lovell et al., 2021). Potentially polyploidy is having the same effect in T. triandra, with the inland ecotype more commonly undergoing genome duplication ( Figure S13). It is also likely that there is a complex genetic basis to the ecotype differentiation

| Hybridisation between ecotypes
The nuclear phylogenies and population genomics results both show incongruences with the organelle data and indicate that there is hybridisation between ecotypes with ongoing gene flow where they come into contact (Figures 2 and 3 and Figure S12).
The highest levels of introgression in our data set were localized in individuals from northern Queensland where diploid accessions of both ecotypes are found in close proximity and which have the appearance of early generation hybrids (Figure 2). Populations at increasing distance from this potential hybrid zone contain successively reduced signs of introgression. Themeda triandra only relatively recently colonized Australia (<1.3 Ma; Dunning et al., 2017) and at present it is unclear how stable the hybrid zone in northern Queensland is, and it may represent a promising geographic location to investigate the genetic basis of the two ecotypes, although undetected hybrid zones likely exist in other areas as well. It is also likely that the predominant ploidy differences between ecotypes aid in maintaining their divergence (Olofsson et al., 2021).
Potentially, there is also ongoing gene-flow from Asia into northern Australia ( Figure S11). Gene flow between Australian and Asia might also explain why when restricting the analysis to Australian accessions K = 3 appears as a secondary optimum as it is still detecting the signal of introgression into accessions from northern Australia from a nonincluded population. Further sampling, particularly in Southeast Asia, is required to confirm this conclusion.  Figures 1 and 2).

| Chloroplast capture in the Pilbara cracking claypans of Western Australia
However, there is clear nuclear and chloroplast discordance in these accessions indicating that the Pilbara population evolved within Australia as a result of adaptive divergence and chloroplast capture.

| Recent speciation of Themeda quadrivalvis
Whether T. quadrivalvis is a synonym of T. triandra, or if it is a separate species has been debated for some time (Keir & Vogler, 2006;Veldkamp, 2016;Dunning et al., 2017;Arthan et al., 2021). Themeda quadrivalvis is a globally distributed invasive weed and the only apparent fixed difference between the species is that T. quadrivalvis is annual whereas T. triandra is perennial. This is the first study to sequence multiple genomes of T. quadrivalvis and the results support a previous conclusion that this species has only recently diverged from T. triandra. In the early stages of speciation, a daughter species would sit within the larger paraphyletic parental species (Pennington & Lavin, 2016). This is exactly what we observed in the slower evolving chloroplast genome (Figure 1), whereas each species is monophyletic in the nuclear genome ( Figure 2). Despite occurring in the same geographic location within Australia, we failed to detect any meaningful ongoing gene flow between these putative species (Figures 3 and Figure S10), suggesting that they are now largely reproductively isolated in Australia and that T. quadrivalvis is the product of a recent speciation event. However, we detected introgression between T. quadrivalvis and T. triandra in Asia, involving up to 6.9% of the nuclear genome ( Figure S10). Further work is required to determine if gene flow is ongoing in the native range of T. quadrivalvis outside of Australia.

| CON CLUS ION
Themeda triandra represents one of the most recent and successful rapid radiations of grasses. In a relatively short space of time, it has become the most widely distributed plant species within Australia across a very broad ecological spectrum. Previous research restricted to New South Wales showed that on a relatively local scale adaptation to arid regions can be driven by genome duplication (Ahrens et al., 2020;Godfree et al., 2017). However, on larger continent-wide scales our research shows that background genetic variation may be more important. Indeed, the ability to occupy almost every climatic niche in Australia is probably a result of independent colonization of the continent by ecotypes within this species, with ploidy variation expanding each of their respective niches. Secondary contact between these ecotypes may further enhance local adaptation by facilitating the introgression of adaptive genetic variation. In summary, the ecotypic differences in Themeda triandra appear to be driven by both standing genetic variation and genome duplication, with the importance of either depending on the geographic scale considered.

CO N FLI C T O F I NTE R E S T
The authors declared no conflict of interest for this article.