Artificial selection causes significant linkage disequilibrium among multiple unlinked genes in Australian wheat

Abstract Australia has one of the oldest modern wheat breeding programs worldwide although the crop was first introduced to the country in 1788. Breeders selected wheat with high adaptation to different Australian climates, while ensuring satisfactory yield and quality. This artificial selection left distinct genomic signatures that can be used to retrospectively understand breeding targets, and to detect economically important alleles. To study the effect of artificial selection on modern cultivars and cultivars released in different Australian states, we genotyped 482 Australian cultivars representing the history of wheat breeding in Australia since 1840. Computer simulation showed that 86 genomic regions were significantly affected by artificial selection. Characterization of 18 major genes known to affect wheat adaptation, yield, and quality revealed that many were affected by artificial selection and contained within regions under selection. Similarly, many reported QTL and genes for yield, quality, and adaptation were also contained in regions affected by artificial selection. These included TaCwi‐A1, TaGw2‐6A, Sus‐2B, TaSus1‐7A, TaSAP1‐7A, Glu‐A1, Glu‐B1, Glu‐B3, PinA, PinB, Ppo‐D1, Psy‐A1, Psy‐A2, Rht‐A1, Rht‐B1, Ppd‐D1, Vrn‐A1, Vrn‐B1, and Cre8. Interestingly, 17 regions affected by artificial selection were in moderate‐to‐high linkage disequilibrium with each other with an average r 2 value of 0.35 indicating strong simultaneous selection on specific alleles. These regions included Glu‐B1, TaGw2‐6A, Cre8, Ppd‐D1, Rht‐B1, Vrn‐B1, TaSus1‐7A, TaSAP1‐7A, and Psy‐A1 plus multiple QTL affecting wheat yield and yield components. These results highlighted the effects of the long‐term artificial selection on Australian wheat germplasm and identified putative regions underlying important traits in wheat.

Australian wheat growers as they used European (predominantly British) cultivars which evolved in completely different climates and latitudes, and consequently, Australia relied on the importation of a considerable proportion of its wheat demand (Henzell, 2007). Since then, Australian wheat germplasm had two major shifts (Pugsley, 1983). The first was at the beginning of the 20th century when William James Farrer developed the early-maturing and high-quality cultivar "Federation," through successful crossbreeding between three cultivars of broad geographical origins (Henzell, 2007). The second has occurred since the beginning of the Green Revolution in 1970, where cultivars developed at the International Maize and Wheat Improvement Center (CIMMYT) have dominated breeding efforts across the country (Brennan & Quade, 2006;. Previous studies have shown that cultivars released before and after 1900 are genetically similar but differentiated from those released after 1970 . Today, Australia is one of the largest wheat exporters globally, due to the extensive efforts of the Australian wheat breeders in crossing and selecting high-yielding and high-quality cultivars adapted to Australian climates. Although breeding efforts have improved the genetic gain for different traits in Australian wheat, the overall genetic diversity in modern cultivars is reduced when compared to older cultivars . This reduction is a result of repeatedly selecting for the same desirable alleles that are controlling economically important traits from a limited gene pool. For example, the semidwarf alleles Rht-B1b and Rht-D1b became the main target for selection after the Green Revolution (Rebetzke & Richards, 2000). Different allele combinations for the photoperiod (Ppd) and vernalization (Vrn) genes that control flowering time have been selected for wheat grown in different agriproduction zones across Australia (Cane et al., 2013;Eagles et al., 2010;Eagles, Cane, & Vallance, 2009). Multiple quality genes have been also subject to selection across Australia depending on industrial end-use requirements (Cane et al., 2008;Crawford et al., 2011;Eagles et al., 2006). The cereal cyst nematode (CCN) resistance gene Cre8 has been widely adopted across Australia (Jayatilake et al., 2015;Safari et al., 2005;Williams, Willsmore, Olson, Matic, & Kuchel, 2006).
Both artificial and natural selection cause detectable changes in the allele frequencies of the selected sites and their flanking regions.
Several statistical methods have been developed to detect selection signatures in natural populations, and their concepts fall into one of three main categories: first, methods that detect abnormalities in the allele frequency spectrum within a single population such as the composite likelihood ratio (CLR) method (Nielsen et al., 2005); second, methods that target distortion in linkage disequilibrium or haplotype structure in a single population such as the integrated haplotype score (iHS) or the number of segregating sites by length (nSL) method (Ferrer-Admetlla, Liang, Korneliussen, & Nielsen, 2014;Voight, Kudaravalli, Wen, & Pritchard, 2006); and third, methods that exploit differentiation between populations such as F st (Weir, 1996) and the cross-population composite likelihood ratio test (XPCLR; Chen, Patterson, & Reich, 2010). Artificial selection can be also detected using methods proposed to identify natural selection (Beissinger et al., 2014). However, detecting selection in breeding populations has two advantages over that in natural populations.
Firstly, the founding parents of breeding populations are usually available, which allows a better estimation of allele frequency changes, compared to natural populations that encounter long-term simultaneous selection across a large number of traits (Beissinger et al., 2014). Secondly, experimental populations are typically much younger than natural populations, which increases the detection power particularly for haplotype-based methods due to the smaller number of historical recombination events (Tang, Thornton, & Stoneking, 2007).
Differentiating true signals of selective sweep from arbitrary genetic drift is the major problem affecting the power of different population genetics studies such as genomewide selection scans.
Irrespective of the statistical method, the approach widely used to declare a locus to be under selection, that is, when its score is an outlier given the empirical distribution of the statistical test, has been shown to reduce the detection power and to cause very high false-positive or false-negative error rates (Kelley, Madeoy, Calhoun, Swanson, & Akey, 2006;Teshima, Coop, & Przeworski, 2006). The problem was reported to be best addressed by simulating the empirical population under a neutrality assumption to compare the observed allele frequencies with those expected under genetic drift only. This approach enables significance thresholds specific to the studied population to be set (Pavlidis & Alachiotis, 2017). Similarly, empirical factors that can potentially have confounding effects on the results, such as ascertainment bias associated with SNP chip data, can also be simulated to take their distorting effect into consideration (Albrechtsen, Nielsen, & Nielsen, 2010).
In this study, we used 482 cultivars representing the history of wheat breeding in Australia since 1840 to scan for signatures of artificial selection. We also characterized the germplasm for allelic variation at 18 genes that control important traits. We compared post-Green Revolution (Post70) with pre-Green Revolution (Pre70) varieties, as well as cultivars released in different Australian states for both artificial selection tests and characterized genes. We attempted to identify the specific traits underlying the selective sweeps by examining their association with characterized genes and published QTL known to affect wheat adaptation, yield, and quality traits.

| Plant materials and genotyping
The 482 Australian cultivars used in this study were released between 1840 and 2011. Seed and pedigree information was obtained from the Australian Grains Genebank, Horsham, Victoria. The germplasm was genotyped using an Infinium iSelect 90K SNP bead chip assay (Wang et al., 2014). Full experimental details and genotype information were previously described in .
Cultivars released after 1970 were considered the post-Green Revolution cultivars (Post70).

| Amplicon resequencing
Eighteen genes known to affect wheat adaptation, yield, and grain quality traits that could have been targets of selection in Australian wheat breeding program were selected for amplicon resequencing: the characterized genes that controlled agronomic traits selected for analysis: Sus-2B (sucrose synthase 2), Ppd-D1 (photoperiod), ; those affecting quality traits: Glu-A1, Glu-B1, Glu-B3, Glu-D1 (glutenin), Ppo-A1, Ppo-D1 (polyphenol oxidase), Wx-B1 (waxy), PinA, PinB (puroindoline), and Psy-A1 (phytoene synthase 1); and the CCN resistance gene Cre8. A total of 31 PCR primers were used to characterize major alleles for these 18 genes (Table S1). PCR primers targeted the variants that differentiate the targeted alleles as described in Table S1. PCR primers were designed using Primer3 and were aligned to the wheat reference genome using GYDLE software (https ://www.gydle.com/) to ensure that they would not amplify multiple copies. All PCR products for each sample were quantified using qPCR, and the PCR products were pooled together. The pooled amplicons were cleaned using SPRI magnetic beads with a ratio of DNA:beads of 1:2. DNA libraries were prepared for next-generation sequencing using the KAPA Hyper Prep Kit. After attaching a unique barcode for each sample, samples were pooled in two groups and cleaned using SPRI magnetic beads with a ratio of DNA:beads of 1:1.
A KAPA Library Quantification Kit was used to quantify the library for sequencing. The pooled amplicons (two pools) were sequenced using two runs on an Illumina MiSeq (Reagent Kit v3, 0-to 300-nt paired-end reads). The runs generated 30.03 million reads with number of reads per sample for all amplicons ranging from 21,914 to 364,269.

| Analysis of resequencing data
The raw FASTQ files were first processed with the fastq_quality_trimmer from the FASTX toolkit (http://hanno nlab.cshl.edu/fastx_toolk it/). The minimum quality threshold was set to 37, and reads shorter than 150 nt were discarded. Paired reads were then aligned to the IWGSC genome assembly v0.3 of wheat cultivar Chinese Spring (https ://www.wheat genome.org) using GYDLE software (https :// www.gydle.com/) to ensure they were derived from the targeted locus. The amplicons were then divided into two classes. Class I amplicons exhibited SNP variants or short insertions-deletions. Class II amplicons contained large insertions-deletions or multiple variants in which many alternative alleles could not be aligned unambiguously to the reference allele. The MEM algorithm implemented in the BWA aligner (Li, 2013) was used to align the high-quality paired-end reads to the reference with a minimum seed length equal to 30 or 100 for the Class I amplicons and Class II amplicons, respectively. The resulting SAM files were processed with SAMtools (Li et al., 2009). Alignments with MAPQ values smaller than 10 were discarded, and only properly aligned pairs were kept. The Class I amplicon pairedend reads were aligned to the reference allele, and SAMtools was used to generate an mpileup file. Variants were called in VCF format with the VarScan software (Koboldt et al., 2012) using the mpile-up2cns option with a minimum of 10 reads and threshold p-value of 0.01. The Class II amplicon paired-end reads were aligned to a reference that included all expected alleles for each amplicon, and an extra filtering step was performed in which only paired reads that exactly matched and had the same insert size as one of the reference alleles were kept. This is appropriate for this study as the majority of cultivars used in this study are expected to have one of these known alleles, and because rare or novel alleles were not important for the purpose of this study.

| Genomewide scan for selection signatures
Four different statistical approaches were used to detect selection signals: F st , XPCLR, iHS, and nSL. F st was calculated following Nicholson et al. (2002). F st values were averaged using a sliding window size of 15 SNPs, with a step size of one SNP. The window size was chosen considering the linkage disequilibrium and marker coverage in this germplasm, as previously described . The XPCLR (Chen et al., 2010) scan was run with a 1 Mb grid size and 1 cM window size. The maximum number of SNPs within each window was set to 50. One random SNP was selected for each pair of SNPs with r 2 > 0.95 within each window. The default parameters for iHS and nSL were used in our analysis as described in Voight et al. (2006) and Ferrer-Admetlla et al. (2014). For all analyses, adjacent windows that passed the significance threshold and showed strong LD were considered as a single signature. The single-population methods iHS and nSL were also applied to the whole population with 482 cultivars. SNP genetic positions were obtained from Wang et al. (2014). Detected regions were compared to previously reported QTL in Australian or CIMMYT germplasm to define potential candidate genes or QTL subject to selection within these regions. Genetic positions were compared across different previously published maps using the CMap tool (http://ccg.murdo ch.edu.au/cmap/ ccg-live/).

| Simulating the diversity of Australian wheat germplasm
The polyploid genome simulator (PolySim) was used to simulate the Australian hexaploid wheat germplasm (Jighly et al., 2018). One hundred replicates with different random seeds for the following simulation scheme were conducted. Each replicate started with 500 individuals of a common ancestor population with seven diploid chromosomes and 100,000 loci per chromosome. The mutation rate was set to 1 × 10 -5 per loci taking as a guide from the mutation rate in maize which was previously estimated to be between 1 × 10 -4 and 4.9 × 10 -6 (Drake, Charlesworth, Charlesworth, & Crow, 1998).
Recombination was sampled from a Poisson distribution with lambda equal to one. The A genome evolution was set at generation 6,000, the B genome at generation 6,200, and the D genome at generation 11,000, and all of them had 500 individuals (population size). The hybridization between the A and B genomes was set at generation 12,000 with a population size equal to 500, while the targeted hexaploid population with 482 individuals evolved at generation 18,000 by hybridizing the tetraploid (AB) population with the D genome. The PolySim simulation was run for 25,000 generations. This number of generations was selected to be sufficient to ensure the progenitor population reached mutation-drift equilibrium before evolving new taxa and at the end of the simulation after 25,000 generations. Each new taxon started with one individual at its first generation and reached the target population size within 100 generations.
After running the previous simulation for 25,000 generations, the hexaploid population was randomly split into two subpopulations, one with 259 individuals representing the Pre70 subpopulation and the other with 223 individuals representing the Post70 subpopulation. Extra generations for each subpopulation were run with restricted gene flow between both subpopulations, while tracking the differentiation between them (F st ) after each generation (Hayes et al., 2009). The simulation stopped when it reached the empirical F st between both subpopulations, which was previously calculated as 0.13 . Ascertainment bias was simulated to mimic the development of the 90K SNP chip (Wang et al., 2014), which was developed using 19 bread wheat cultivars and 18 durum accessions. To do this, 19 individuals were randomly selected from the Post70 subpopulation and 9,858 random polymorphic loci in these 19 individuals were selected to simulate the 90K SNP genotyping (the same number of SNPs used in this study). The genotyping for the whole population (Post70 and Pre70) was then assumed to be conducted using these selected loci only. We also simulated ascertainment bias using 19 individuals from the Post70 population and 18 tetraploid genotypes to exactly mimic the discovery panel used in Wang et al. (2014). However, this analysis did not show any significant difference to the results generated from simulations performed using only the 19 individuals from the Post70 subpopulation (data not shown). Before running the previous one hundred replicates, we ran multiple simulations to define the best cross-pollination rate that produced similar LD patterns between the empirical data and our simulated data. Different studies reported outcrossing rates for wheat ranging from 0% to 10.6% (Hanson, Mallory-Smith, Shafii, Thill, & Zemetra, 2005;Lawrie, Matus-Cadiz, & Hucl, 2006). We first ran ten simulations using an outcrossing rate between 0.01 and 0.1 and a step equal to 0.01. This analysis revealed that the best LD match was a value between 0.03 and 0.04. We then ran another nine simulations with a step equal to 0.001 between 0.03 and 0.04 to define the optimal outcrossing rate.

| Linkage disequilibrium analysis
Pairwise linkage disequilibrium (LD) between the amplicon resequencing alleles and 90K SNPs located on the same chromosome was calculated, as r 2 , following Hill and Robertson (1968) using the R package "snpstats" (Clayton, 2015). Similarly, LD was assessed between the amplicon resequencing alleles and candidate regions for artificial selection.  estimated that the 99th percentile of the r 2 values of the background LD, LD between unlinked SNPs on different chromosomes, is equal to 0.161. Pairwise candidate regions for artificial selection were considered in LD if more than half of the SNPs in each region had an interregion LD higher than the 99th percentile of the background LD. Individual SNPs with very high LD between both regions that did not follow the general LD pattern were excluded to avoid mapping errors. Candidate regions that had less than five SNPs were also excluded from this analysis. The highest r 2 value from the remaining representative SNPs was reported.

| Simulated Australian wheat germplasm
To account for genetic drift and any ascertainment bias in the iSelect 90K SNP genotyping assay, simulations were performed to closely mimic the genomic architecture of the Australian wheat germplasm used in this study. After testing multiple values, an outcrossing rate equal to 0.033 was found to produce the best approximation of linkage disequilibrium decay between the simulated and empirical datasets ( Figure 1). In both the simulated and empirical populations, LD started at 0 cM with an r 2 value of ~0.4 and decayed to an r 2 of ~0.3 at 1 cM. The heterozygosity (He) in the simulated data was equal to 0.02 ± 0.006, which, as we discuss later, is an acceptable value for our germplasm. The simulation required 35.2 ± 4.7 generations to differentiate the Pre70 from the Post70 cultivars with an F st value equal to 0.13 ± 0.002. The number of generations required to differentiate the cultivars released in each state from those released in the other states by the corresponding F st value for each state ranged from 5.2 ± 2.3 for New South Wales (NSW) to 15.6 ± 3.6 for Queensland (QLD). Interestingly, differentiation-based methods (F st and XPCLR) did not detect any candidate regions for selection in NSW. Figure 3 shows a Manhattan plot for the four methods applied on the Post70 population, while the remaining comparisons can be found in Figure S1.

| Amplicon resequencing
Several characterized alleles showed different patterns of allele frequencies between the pre-and post-Green Revolution cultivars, or when compared across different Australian states indicating differential selection on them ( Table 2). The frequency of the resistance allele at the Cre8 locus which confers resistance to CNN was sig-    Another interesting finding was 17 unlinked selective sweeps and characterized genes (Glu-B1u, Cre8_R, Psy-A1b&e, Ppd-D1a, Rht-B1b, and Vrn-B1a) with moderate-to-high differentiation across different subpopulations that exhibited moderate-to-high LD with each other. Many of the sweeps could be potentially affecting grain yield and/or its components as we will discuss later. The average r 2 among these genomic regions was 0.35, which is equivalent to the 99.97th percentile of the background LD for unlinked SNPs. This result suggests simultaneous and strong selection for specific alleles across the wheat genome (Table 3). Of the 17 sweeps, the regions separating the sweeps located on chromosomes 3B and 6A had no or weak LD with the regions under artificial selection.   To assess the accuracy of the simulation, we looked at how well the simulated genomic architecture (in terms of LD and He) matched that observed in the population of 482 Australian cultivars.

| Simulations to determine significance thresholds for selection analyses
Matching the genomic architecture (usually LD and He) between the simulated and empirical datasets is reported to be the best determinant for measuring the precision of simulated populations used for different purposes (Jighly et al., 2018). The differences between the LD decay smoothing lines of the empirical and simulated populations were minor (Figure 1) and fell within the standard deviation limits of the simulated populations, indicating high similarity for LD decay between the simulated and empirical datasets. A direct assessment of He between the simulated and empirical datasets could not be made due to limitations in calling heterozygous genotypes for hexaploid wheat using the iSelect 90K SNP genotyping assay (Wang et al., 2014). This is because of the large number of clusters produced by the genotype calling software (given the duplicated targets for the hybridization probes in polyploid species) and the low density of the clusters representing heterozygote genotypes in self-pollinated crops. However, it is possible to infer the accuracy for He, given that wheat breeders usually release cultivars after several generations (six or more) of self-pollination to ensure a stable cultivar with high homozygosity. This leads to an expected empirical He equal or less than 3.125% (i.e., 1/2 6−1 ) for a polymorphic locus, where 6 is the number of selfed generations. Therefore, the simulated He should be acceptable given that its 95% confidence interval was between 3.2% and 0.8%, which is almost equivalent to six to eight generations of selfing. When simulating ascertainment bias, including durum genotypes in the ascertained panel did not have any significant effect on the significance threshold possibly because durum is a common ancestor for both Pre70 and Post70 populations. TA B L E 2 Allele frequency and F st (between brackets) results for major alleles of the 18 characterized genes. Yellow-highlighted values represent alleles that got significantly increased, while orange-highlighted values represent alleles that got significantly decreased. Best LD describes the position (in centimorgans) on the same chromosome that has the highest LD with the characterized allele  . Sixty-one regions had no overlap with previously reported genes/QTL and thus represent potential novel regions under selection, and it will be of interest to further understand their contribution to the evolution and adaptation of the Australian wheat population.

| Genomic regions under artificial selection
Cultivars that are less affected by nearby plant competition were reported to have higher yield potential (Reynolds, Acevedo, Sayre, & Fischer, 1994). It was previously proposed that a large proportion of CIMMYT post-Green Revolution cultivars' potential is due to their adaptability to planting density (Reynolds et al., 1994). The two QTL with the highest significance and r 2 from the three QTL reported in Sukumaran, Reynolds, Lopes, and Crossa (2015) that were associated with adaptation to agronomic planting density on chromosomes 1B and 3B exhibited very strong selective sweeps in the germplasm used in this study. Both sweeps were found in multiple populations, and the 1B sweep was detected with both differentiation and haplotype-based methods indicating their robustness. The 1B sweep region also included Glu-B1 and a QTL for spike ethylene production under heat stress, while the 3B had multiple yield and yield component-related QTL, dough softening and stability, bake mixing time, spike ethylene, and adaptation to density for grain number QTL Maphosa et al., 2013;Sukumaran, Dreisigacker, Lopes, Chavez, & Reynolds, 2015;Valluru, Reynolds, Davies, & Sukumaran, 2017). Ethylene production can be induced under heat stress, which limits grain yield (Hays, Do, Mason, Morgan, & Finlayson, 2007). Given the large number of potential genes targeted by artificial selection, these sweeps were the largest two sweeps in this study spanning 34.3 cM and 31.8 cM for the 3B and 1B sweeps, respectively. The third largest sweep identified here was the region 7A-371.3:399.2 spanning 27.9 cM and was detected in all subpopulations using all statistical tests. This region encompasses the genes TaSus1-7A and TaSAP1-7A as well as a number of yield, different yield components, and quality QTL (Bennett, Izanloo, Edwards, et al., 2012;Maphosa et al., 2013;Mohler et al., 2016;Sukumaran, Lopes, Dreisigacker, & Reynolds, 2018). The average size of LD blocks in this germplasm was previously reported to be ~19 cM . Thus, it is expected that selection may cause such large LD blocks in our germplasm.
In addition to the selective sweeps described earlier, six genomic regions under selection on chromosomes 2A, 2B, 3B, 5A, 6A, and 6B were colocated with genes or QTL that directly affect grain yield and some other traits. These six regions were detected in the Post70 population as well as in some states (Table 1). The 6A region involved the well-documented grain yield gene TaGw2-6A, while the 6B region could potentially be the homoeologous gene TaGw2-6B (Mohler et al., 2016). The 6A sweep was previously reported in multiple studies to be associated with different yield and yield component traits as well as some quality traits such as grain protein content and water-soluble carbohydrates Maphosa et al., 2013;Sukumaran, Dreisigacker, et al., 2015;Sukumaran et al., 2018). The 9-cis-epoxycarotenoid dioxygenase (CCD4 or NCED4) gene, which plays an important role during heat stress and thermoinhibition of seeds (Huo, Dahal, Kunusoth, McCallum, & Bradford, 2013), was also in this region (Colasuonno et al., 2017). The region 5A-460.6:466.8 had LD with Vrn-A1, and Lopes, Dreisigacker, Peña, Sukumaran, and Reynolds (2015) reported a grain yield QTL in this region. The yield QTL located within the sweep 2A-355.6:363 is most probably related to the cell wall invertase gene TaCwi-A1 Mohler et al., 2016;Sukumaran et al., 2018).
The selective sweep 2B-311.4:320.4 involved the sucrose synthase 2 gene Sus-2B which is colocated with QTL for tillers per square meter, thousand kernel weights, grain number, and flag leaf width ; ; Sukumaran et al., 2018). The characterization results of the Sus-2B gene also showed significant differentiation for its causal variant in SA similar to this sweep (Table 2).
Interestingly, all major genes and yield-related QTL were detected with F st /XPCLR indicating hard selection on yield genes and QTL.
As discussed before, many of the previous grain yield or yieldrelated sweeps involved quality-related traits such as the sweep 1B-195.4:227.2, which involved the glutenin gene Glu-B1. The flour color genes Psy-A1 (Jayatilake et al., 2015) and Psy-A2 (Colasuonno et al., 2017) that are important for flour quality were also under artificial selection in our germplasm. Previous reports on Australian wheat (Crawford et al., 2011) showed that Psy-A1a and Psy-A1p may not be responsible for the variation in flour color and the frequency of both alleles did not have any significant change in any comparison in the present study. The Psy-A1e allele was previously reported to be common in Australia and responsible for white flour color, while the Psy-A1c, Psy-A1r, and Psy-A1s,t were associated with cream-toyellow color (Howitt et al. 2009;Crawford et al., 2011). Although the white flour color allele Psy-A1e has become significantly more common after 1970, alleles responsible for cream-to-yellow color had significantly higher frequency in SA and WA. The white and bright flour is preferable for most end products, but the creamy colored flour, which is produced mainly in WA, is preferable for some products like the Japanese Udon noodles. The grain hardness genes (PinA and PinB) did not significantly differ after the Green Revolution, but the frequency of the presence of a single hardness allele showed a significant decrease in VIC. The Glu-B1e allele that produces weak and extensible dough (Eagles et al., 2006) as well as Glu-B3c that produces weak dough (Eagles, Hollamby, Gororo, & Eastwood, 2002) also showed significant increased frequency in VIC. Victoria is a major producer for the Australian soft wheat which is used for cakes, pastries, and some types of biscuits that require weak and extensible dough (Eagles et al., 2006;Simmonds, 1989 This delay was larger when combined with the spring alleles of the three Vrn1 homoeologs than when it was combined with Vrn-B1a only indicating epistatic interactions between these genes (Cane et al., 2013). This may explain the significant increase in both Ppd-D1d and Vrn-B1a in SA.
An interesting finding was the moderate-to-high linkage disequilibrium among 17 unlinked genomic regions and characterized genes that were the subjects of artificial selection (Table 3). All but one of these regions were detected by the differentiation-based methods indicating hard selective sweeps. All were detected in the Post70 subpopulation, and some overlapped with the different Australian states (Table 1). These regions involved the two sweeps associated with yield adaptation to agronomic planting density, and one which involved Glu-B1, a known major driver for bread making quality.
Another three regions encompassed QTL affecting grain yield, and two regions were associated with QTL affecting different yield components. Of the characterized genes, Cre8, Psy-A1, Ppd-D1, Rht-B1, and Vrn-B1 were involved in this LD cluster. Several allele combinations for different unlinked height, vernalization, and photoperiod genes have previously been reported to be prevalent in Australian wheat germplasm (Cane et al., 2013;Eagles et al., 2009). Similarly, we found that some of these unlinked genes exhibited moderateto-high LD with one other, as well as with other genes controlling grain quality, disease resistance, and yield potential, indicating the ongoing and simultaneous selection on specific alleles.

| CON CLUS ION
The results presented here provide a greater understanding of the selection events that shaped current Australian wheat germplasm.
Defining the genes targeted by artificial selection has the potential to further guide the grain industry to adapt new genetic resources with novel genetic variation that are differentiated from the present Australian gene pool to sustain long-term genetic gain. The TA B L E 3 r 2 values for between the 17 unlinked genomic regions and characterized alleles that showed moderate-to-high correlation with each other should be applied to maintain higher levels of genetic diversity and avoid the severe reduction in the germplasm effective population size. These could include adapting new genetic resources to reduce the dependency on the same genes; improving the diversity of the genomic region by recombining multiple diverse haplotypes flanking the desired genes targeted by selection; and exploiting the natural variation of these genomic regions after editing the gene targeted by selection to produce a desired allele.

ACK N OWLED G EM ENT
The authors would like to thank the Australian Grains Genebank (AGG) for providing the seed materials, La Trobe University for providing a scholarship to R.J., and the Department of Economic Development, Jobs, Transport and Resources (DEDJTR). The authors would also like to thank Dr. Abdulqader Jighly for providing the second version of PolySim and assisting with the simulation.

CO N FLI C T O F I NTE R E S T
The authors declared no competing interests.

DATA A R C H I V I N G S TAT E M E N T
Data for this study are available at the Dryad Digital Repository: https ://doi.org/10.5061/dryad.06c67 .