Implementing within‐cross genomic prediction to reduce oat breeding costs

Abstract A barrier to the adoption of genomic prediction in small breeding programs is the initial cost of genotyping material. Although decreasing, marker costs are usually higher than field trial costs. In this study we demonstrate the utility of stratifying a narrow‐base biparental oat population genotyped with a modest number of markers to employ genomic prediction at early and later generations. We also show that early generation genotyping data can reduce the number of lines for later phenotyping based on selections of siblings to progress. Using sets of small families selected at an early generation could enable the use of genomic prediction for adaptation to multiple target environments at an early stage in the breeding program. In addition, we demonstrate that mixed marker data can be effectively integrated to combine cheap dominant marker data (including legacy data) with more expensive but higher density codominant marker data in order to make within generation and between lineage predictions based on genotypic information. Taken together, our results indicate that small programs can test and initiate genomic predictions using sets of stratified, narrow‐base populations and incorporating low density legacy genotyping data. This can then be scaled to include higher density markers and a broadened population base.

of genotyping existing germplasm. Recent work has shown that a modest number of markers can be sufficient to achieve accurate predictions in small populations with high linkage disequilibrium (LD; Gonen et al., 2018;Norman et al., 2018). It is necessary to consider how to gain value from the upfront cost of genotyping material that may not be progressed within an active breeding program. Additionally, training sets should logically be developed from breeding lines or populations (Akdemir & Isidro, 2019;Akdemir, Sanchez, & Jannink, 2015;Asoro, Newell, Beavis, Scott, & Jannink, 2011;Isidro et al., 2015;Ou & Liao, 2019;Rincent et al., 2012). Therefore, in small programs, the gradual generation and use of genotypic data in narrow-based populations can support the longer-term adoption of GS.
In a biparental crossing scheme, high levels of LD can be exploited to minimize genotyping cost. In cultivated oat (Avena sativa L.), high levels of long-range LD (Esvelt Klos et al., 2016) and large haplotype blocks have been reported (Bekele, Wight, Chai, Howarth, & Tinker, 2018), along with clustering of Diversity Array Technology (DArT) markers (Tinker et al., 2009). Selfing limits the amount of recombination per generation, reducing LD dissipation, and increasing genetic variance between the resulting recombinant inbred lines (RILs), thus promoting the emergence of superior transgressive segregants (McClosky, LaCombe, & Tanksley, 2013). These factors make within-cross GS feasible to assess performance relative to the other lines in the same (rather than different) subpopulations (Asoro et al., 2013;Gonen et al., 2018;Gorjanc et al., 2017aGorjanc et al., , 2017b. Previous work has tested biparental prediction approaches via simulation in a maize (Zea mays) genome (McClosky et al., 2013), concluding that gains attributable to selfing are achievable across different population sizes, trait heritabilities, and selection intensities. Lorenzana and Bernardo (2009) evaluated GS in two double haploid biparental barley (Hordeum vulgare) populations using historical trial data on production and quality traits and 223 polymorphic markers. They reported that the simple and computationally efficient best linear unbiased prediction (BLUP) approach was ideally suited to biparental GS. Additionally, extensive LD and large linkage blocks meant that fewer markers were needed for accurate predictions (Lorenzana & Bernardo, 2009).
Cultivated hexaploid oat is a cereal crop used to produce grain in temperate regions and forage in the subtropics (Hoffman, 1995). The allopolyploid oat genome is large (12.5 gigabases) and highly repetitive, making the large-scale adoption of genomics-based breeding methods difficult (Yan et al., 2016). Recent advances in genomic resources (e.g., Chaffin et al., 2016;Huang, Poland, Wight, & Jackson, 2014) mean GS is now more tractable for uptake within oat breeding programs. Previous work to evaluate the application of GS in elite-cultivated North American oat lines for both production and quality traits demonstrated that GS could Core Ideas • Predictions based on low coverage genotyping can recover missing phenotypes in early generations. • Mixed data types can be effectively integrated to improve prediction accuracy in oat. • Differentially penalized regression can optimally weight mixed data.
be effective even at modest marker density (∼every 2cM; Asoro et al., 2011), although no plateau was reached with low density DArT marker numbers. Comparison of GS to traditional phenotypic and marker-assisted selection for the complex quality trait β-glucan showed that the benefits of GS could be realized based on a per cycle basis via the scaling of selection to two cycles per year (Asoro et al., 2013). More recently, Bekele et al. (2018) described the prediction of heading date in a large cultivated oat panel, reporting a minimal increase in accuracy from increasing marker density. However, their results showed that prediction from genotyping-by-sequencing (GBS) derived single nucleotide polymorphisms (SNPs) gave higher prediction accuracy than using tag-level haplotype markers (Bekele et al., 2018).
Here we report the implementation of genomic prediction within a biparental cross between two cultivated winter oat varieties, 'Buffalo' and 'Tardis'. The population has been previously used to update the oat consensus map based on GBSderived, tag-level haplotypes (Bekele et al., 2018). In this study the population was stratified for both genotyping (at the F 2 and F 7 generation) and phenotyping (segregated at the F 3 generation with one stream of material progressed to field assessment and the other undergoing rapid single seed descent [SSD] to the F 7 generation). Using low-coverage genotypic information in the early generation, we investigate the recovery of missing phenotypes via genomic prediction, which is required for accurate representation of true phenotypic value and variance. We also extend this to the F 7 generation to test prediction of missing yield data. We demonstrate that using mixed marker data is feasible-with both low cost dominant markers and more expensive co-dominant markers integrated to improve accuracy.

Plant material, genotyping and phenotyping
An F 2 mapping population of 194 individuals was produced from a cross between the two winter oat varieties 'Buffalo' and 'Tardis' at Aberystwyth University, United The Plant Genome F I G U R E 1 The stratification of within-population advance of material in the 'Buffalo' × 'Tardis' population, including derivation of phenotyping and genotyping data used in this study Kingdom. The population was created to capture key differences between the parents; 'Buffalo' is a dwarf variety with low kernel content and small grains and 'Tardis' is a conventional-height variety with high kernel content and large grains. The DNA was extracted from the seedling leaves of F 2 plants and the parents using a QIAGEN DNeasy 96 Plant Kit (QIAGEN, Crawley, United Kingdom) and genotyped using 121 polymorphic microsatellites (Dumlupinar et al., 2016;Jannink & Gardner, 2005;Li, Rossnagel, & Scoles, 2000;Pal, Sandhu, Domier, & Kolb, 2002;Wight, Yan, Fetch, Deyl, & Tinker, 2010;Wu, Zhang, Chen, & He, 2012) and with the oat DArT array (Tinker et al., 2009; Diversity Arrays Technology Pty, Canberra, Australia) which identified a further 424 polymorphic (dominant) loci.
From each F 2 plant, families of F 3 seed were harvested and multiplied in the field to produce F 4 bulks to enable sufficient seed for replicated field trials. Each F 2 -origin plant therefore defines a lineage with the resulting progeny forming a family and F 3 and F 4 genotypes are inferred from the F 2 . A RIL mapping population of 227 individuals was derived by SSD from individual seeds of the F 3 plants, giving a slightly larger number of individuals than the initial 194-line F 2 population. The population size was increased by selecting single seeds from individual F 3 dwarf and tall plants. Progeny were advanced through SSD to the F 7 generation where leaf material was sampled for DNA extraction as previously described (Figure 1). In addition to microsatellites and DArT markers, GBS libraries were constructed following the oat protocol developed and described by Huang et al. (2014) and processed as reported in Bekele et al. (2018). In this analysis, 1,046 markers were used for the RILs, and between the F 2 and RIL datasets there were 401 common markers, of which 100 were codominant and 301 were dominant. Genotype calls and map locations are integrated into The Triticeae Toolbox oat platform (http://triticeaetoolbox.org/oat/genotyping) as reported in Bekele et al., 2018. Stratification of the population for both phenotyping and genotyping is summarized in Figure 1. Phenotypic assessment for production-related traits (maturity, ear emergence, Internode 1 length, kernel content, panicle length, panicle extrusion, winter hardiness, height, grain yield, mildew, hullability, grain length, grain width, and grain area) was conducted in either the field or polytunnel at the F 2 (2005), F 3 (2006), and F 4 (2007-2010) generation. In addition, the F 7 RILs were phenotyped (2010-2014) for both the production characteristics (as previously) and the quality trait grain β-glucan content at the RIL (F 7 ; 2010-2014) generation (Table 1). All field trials were conducted in Aberystwyth, United Kingdom (52.43 lat, 4.02 long) and used standard pre-emergence and early spring weed control with no fungicides or growth regulators applied. Nitrogen fertilizer (70 kg ha −1 ) was applied in a split dose at GS31 and GS35 (Zadoks, Chang, & Konzak, 1974). The traits were assessed using a range of standard phenotyping methods, summarized in Supplemental Table S1. The number of individuals phenotyped for each trait varied, and data was averaged across trial entries to derive phenotypic means (Table 1).

Genomic prediction models
Two genomic prediction methods were used: ridge regression-BLUP (RR-BLUP; Piepho, 2009) and differentially penalized regression (DiPR; Bentley et al., 2014;Ward, Rakszegi, Bedő, Shewry, & Mackay, 2015). The use of two methods allowed T A B L E 1 Within generation predictions of traits assessed on the 'Buffalo' × 'Tardis' population with different generations of phenotyping (Gen P ) and genotyping (Gen G ) assessed in a range of years in field (F) or polytunnel (PT) trials. The ridge regression best linear unbiased predictor (RR-BLUP) predictions are made across the full set of available lines (All), and within Dw6 classes (Tall, Dwarf). SD, standard deviation for validation of models, including testing the use of marker information in a single matrix against differential weighting of the dominant (DArT) and codominant (microsatellite and GBS) marker data combined using DiPR. The RR-BLUP analysis used the package rrBLUP v4.6 (Endelman, 2011) in R v3.3.3 for Windows (R Core Team, 2016). Predictions were compared within a generation between lineages and between generations using an integrated data matrix. For the integrated data matrix, genotype data for dominant markers were attributed half scores to account for their uncertainty, akin to an imputed marker (i.e., AA or AB: 0.5; AB or BB: −0.5), whereas whole value scores were used for codominant marker data (i.e., AA: 1; AB: 0; BB: −1). Imputation of further missing marker data was performed using the random forest algorithm implemented with the R package missForest v1.4 (Stekhoven & Buhlmann, 2012) with 1,000 trees and using Chi-squared tests for parameterization of the missForest model with artificially removed data. Five-fold cross-validation (CV) within generations was performed with 100 replications via Monte Carlo cross-validation (MCCV; Xu & Liang, 2001). Cross-validation between generations, between and within lineages, was performed with k-fold CV (k = 2, 3, 4, and 5) to examine differential sampling depths from the available population (i.e., simulating a breeder having genotyped and phenotyped 0.50, 0.33, 0.25, or 0.20 of the early generation, respectively). Within generation predictions were made independently on Tall and Dwarf classes (as determined by F 2 genotyping) to account for the known segregating Dw6 gene (Molnar et al., 2012) when the training set size was greater than 30 individuals. In the full dataset, 27% of lines were classified as Tall.
All prediction accuracies are reported as pairwise Pearson correlations. In the within-generation models, Fisher's Z-transformation was used to convert Pearson correlations to a normal distribution (as Z is normally distributed whereas r is not) before averaging and back-conversion. In the between-generation models, where all available marker and phenotype data common to both generations were used to train and predict from the early generation (F 2 , F 3 and F 4 ) to RILs, accuracy is reported as the pairwise Pearson correlation within a family. In the between-generation, between-lineage models, genotypes were randomly sampled without replacement 100 times according to that k-fold CV analysis (where k = 2-5). Accuracy is reported for both within and between family Fisher's Z-transformed mean Pearson correlations with back-conversion across the 100 per-k iterations.
To implement DiPR, the common marker data from F 2 to RIL genotypes were divided into dominant (DArT) and codominant (microsatellite and GBS) marker types. Markers were thinned at an r 2 value of .90 to prevent oversampling and an additive relationship matrix was derived for each marker type. These were linearly combined into a single matrix with separate weighting factors (w and 1−w), between w = 0 and w = 1 in 0.01 steps, to produce a single input to RR-BLUP, as previously described (Bentley et al., 2014;Ward et al., 2015). Model fitting used the R package 'RR-BLUP' (Endelman, 2011) and the optimal w-value was determined as the maximum cross-validation correlation. At w = 0, only the codominant markers contributed to the prediction and at w = 1, only the dominant markers contributed. The intervening weights use differential penalization consistent between matrices but with the two marker sets contributing to the additive relationship matrix proportional to their weighting (Bentley et al., 2014).

RESULTS
Across the early generation (F 2 ) genotypes, there were 545 genotyped markers, of which 424 were dominant and 121 codominant. For the RIL (F 7 ) population there were 1,046 codominant genotyped markers. Between the two datasets there were 401 common markers, of which 100 were codominant and 301 were dominant.

Within-generation predictions
In order to determine the added value of early generation genotyping, within-generation models were tested. Predictions were made using F 2 genotype data to predict phenotypes at the F 2 , F 3 , and F 4 level, while RIL genotype data (F 7 ) was used to predict phenotypes at the RIL level (F 7 ). A total of 15 phenotypes were predicted with 12 predicted at both the early and later RIL generation, and all data are presented in Table 1. At the F 2 genotype level, all the phenotypes were compared across all lines as well as within Dw6 genotypic Tall and Dwarf classes. The accuracy of prediction varied across traits and generations. The traits that were predicted to the highest levels of accuracy across generations were height (range .81-.89), ear emergence (.64-.79), and kernel content (.60-.66). For the majority of traits, the accuracy of prediction was higher when using F 7 genotypic and phenotypic data compared to predicting in early generations (kernel content, maturity, mildew, panicle extrusion, grain yield, grain length, grain width, and hullability). Variation was observed for the accuracy of trait prediction when using different phenotyping generations or trial years for some traits including Internode 1 length (.34 from F 4 compared to .79 from F 2 and .83 from F 7 phenotypes) and winter hardiness (.09 from F 3 , .47 in 2011 F 7 trials to .70 from F 4 , and .77 from F 7 in 2012 phenotypes). Predicting within Dw6 classes gave generally low predictions for all traits when compared to predicting across the full dataset, with the exception of height, grain length, and width in the F 4 and the overall low prediction traits (maturity, mildew, winter hardiness, and panicle length).

Between generation predictions
To examine whether between-lineage predictions were possible from early to late generations, predictions were made within and between lineages as well as across all available data. The phenotypic correlation between early generation and RIL phenotypes was used as a proxy for the accuracy of imposing selection on phenotype alone at the F 4 generation for comparative purposes. Nine traits were selected for comparison to assess differences in predictive accuracy between the early and late generations and all data are presented in Table 2. In general, this showed that low-level genotyping (combined with phenotyping) in the early generation was sufficient to allow relatively accurate predictions to be made for later generation (F 7 ) phenotypes with the exception of kernel content (.66 for all markers, dropping to .39 with 50% of individuals). High prediction accuracies were maintained for Internode 1 length for predictions from the F 2 to F 7 (.58-.76) and F 4 to F 7 (.54-.77) across genotyping coverage as well as for panicle extrusion, grain yield, and height (Table 2). Where predictions overall were low (maturity, mildew, and winter hardiness), accuracies were maintained or slightly reduced with coverage. For ear emergence, the predictions from early generation to field grown F 7 lines were high overall (.62) and showed a slow pattern of reduction with genotyping density, but predictions from early generations to F 7 polytunnel phenotypes were low (.29 for F 2 , .12 for F 3 ). This was not the case for height, with predictions stable across both field and polytunnel trials (Figure 2).

Comparing methods for handling mixed marker data
Dominant markers provide less information than codominant markers but are cheaper to generate, meaning that a greater number are likely to be available (or required) to generate accurate predictions. The DiPR was implemented across nine traits (Table 3) to assess the predictive advantage of proportionally combining marker types in a single additive relationship matrix. When compared to predictions based on a single marker type using RR-BLUP, the DiPR method performed as well or better than a RR-BLUP model using a single matrix with results summarized in Table 3. Low RR-BLUP predictions for maturity (.36), mildew (.49), and winter hardiness (F 3 predictions .43) were all improved through the implementation of DiPR (.47, .52, and .48, respectively) although their optimal weighting factors varied. Differential weighting for these low-prediction traits showed that only using codominant markers (DiPR w opt = 0.00) improved maturity and winter hardiness predictions whereas using only dominant markers (w opt = 1.00) optimized prediction of mildew.

DISCUSSION
As genotyping costs fall, there is an opportunity to use genomic prediction to reduce the number of individuals phenotyped in within-cross breeding populations. We demonstrate that it is feasible to use the genotypic information from a full set of biparental lines to make within generation, betweenlineage genomic predictions. This can recover information on missing phenotypic data to improve selection resilience representing an added value to early generation genotyping beyond deselection of unfavorable alleles, as previously described for wheat (Triticum aestivum; He et al., 2016) and soybean (Glycine max; Ma et al., 2016). Early generation genotypic data can also be used to reduce the number of lines required in later generation phenotyping based on siblings progressed to generate stable, genotyped homozygous lines. Our results demonstrate that early generation genotyping need not cover the full population in order to attain accuracies in line with true trait correlation between early and late generation phenotypes (a proxy for selecting on early generation phenotypes alone), as has been previously shown in small populations (Wong & Bernardo, 2008). We therefore propose that strong within-cross selection could be imposed early in a breeding cycle whilst retaining accuracy of selection. Prior simulations of within-cross genomic prediction have been reported in maize, suggesting that gains plateau with selfing rounds, with the F 4 capturing 90% of the F 8 gains due to an increase in the maximal breeding value of the population (McClosky et al., 2013). If these gains can be identified within lineages in the early stages, then accurate selection could be imposed before phenotypic selection.
In this study we performed predictions with F 2 and F 7 (RIL) genotype data. In the first instance, F 2 genotypes were employed to make models with early generation (F 2 , F 3 , and F 4 ) phenotypes with 80% of available phenotype data as the training set and 20% as the test set. This simulates lost data in early generation phenotyping when an accurate representation of the cross' phenotypic value is required for selection. A major limitation to the implementation of GS within small breeding programs is the high upfront genotyping cost (Varshney et al., 2012). Our results indicate that there is an advantage to early generation genotyping, and that this need not be at high coverage in order to provide value to between-generation RIL performance prediction.
Prediction within the F 7 RILs had generally high accuracy and demonstrates potential savings in later stage phenotyping costs. Where seed is generated for RIL phenotyping in a shuttle breeding framework (Borlaug, 1968;Forster et al., 2015), there is a requirement to transfer substantial quantities of seed between environments. Our data indicates that an alternative to the movement of large amounts of seed could be to use separate sets of families to be tested in multiple target environments and to use within-generation prediction for the missing environment performance. However, this would need to be empirically tested as the effect of environmental variability on robustness of prediction are well documented (Burgueño, de los Campos, Weigel, & Crossa, 2012;Jarquín et al., 2014). This would be particularly attractive in Europe where out-of-season multiplication takes place in climatically matched environments in the southern hemisphere, representing a major cost. Using sets of small families could also enable the use of GS for adaptation to multiple target environments at an early stage in the breeding program. This is currently limited by seed availability and would have both cost and logistical advantages in using sibling predictions to avoid phytopathological quarantine requirements.
Our data indicate that between family predictions across generations could allow for earlier lineage selection. Early selection is currently limited due to high levels of heterozygosity and uncertain phenotypic value of lineages. However, if a portion of the F 2 lineages are genotyped (as single plants), and their derived F 4 field phenotypes (based on siblings from F 3 rows) are used in conjunction with low-coverage F 2 genotyping, a genomic prediction model could be developed. Following subsequent production and genotyping of fixed RILs, a prediction can be used to select which of the cross' lineages are likely to perform best and reduce the number of entries into fully replicated field trials, therefore accelerating the breeding cycle (Jannink, Lorenz, & Iwata, 2010). This offers the ability to use F 2:4 families to predict F 7 s derived from different F 2 s and to rapidly generate F 7 lines while producing a prediction equation over one or two seasons of yield testing. Selection among the F 7 is then made on the predicted trait values. This theoretical program design is summarized in Supplemental Figure S1.
We also compared different proportions of F 2 genotyping as an approximation for a breeder varying the level of financial investment in F 2 genotyping, with all derived RILs then being genotyped. There was a reduction in predictive accuracy as the proportion of F 2 genotyping declined although, even at low representation, some traits could still be predicted to the same levels as for phenotypic selection at the F 4 generation. Similar results have previously been shown in biparental maize population simulations (Bernardo & Yu, 2007).
The employment of within-cross predictions reported here must be tailored to the existing breeding program, particularly with respect to number of crosses per cycle and selection intensity in order to ensure financial viability. The evaluation of economic aspects of GS implementation are essential for wider application (Abed, Pérez-Rodríguez, Crossa, & Belzile, 2018). However, given the ability to achieve rapid generation time (Watson et al., 2018), our accuracy results suggest that selections could be made much earlier, although this remains to be empirically tested within breeding programs. Given that between-lineage accuracies are similar to withinlineage accuracies, our data suggest that independent families can be used to predict across lineages. In addition to showing that between-lineage prediction is possible, we also show that F 3 and F 4 segregated material (as used in shuttle breeding or remote testing) can be used to reduce the costs associated with multi-environment testing. Bekele et al. (2018) recently demonstrated heading date prediction accuracies of up to .67 in independent training and test populations. The accumulation of data from many crosses also represents a first step to the full implementation of GS within a program (Gorjanc et al., 2017a;Sverrisdóttir et al., 2017;Edwards et al., 2019). However, we note that further work is required to compare the T A B L E 3 Comparison of methods to handle mixed dominant and codominant marker types for predictions in the 'Buffalo' × ' Tardis' population. The training population consists of early generation phenotyped (Gen P ) individuals (Phen n ) for common traits assessed in the test set composed of F 7 recombinant inbred lines (RILs) in either the field (F) or polytunnel (PT). Prediction is compared between standard ridge regression best linear unbiased predictor (RR-BLUP) and differentially penalized ridge regression (DiPR), with the associated optimal weight value (DiPR w opt ) given within-cross predictions reported here to wider performance across a breeding program with analysis of all crosses jointly (Jannink et al., 2010). The longer-term adoption and implementation of a multi-subpopulation training population (de Roos, Hayes, & Goddard, 2009) offers an attractive gradual adoption model for GS in small programs if LD can be maintained with higher marker densities (Asoro et al., 2011). Finally, we demonstrate that the use of mixed marker data can be optimized using DiPR. Although dominant marker use is declining, they still represent the cheapest genotyping method for low-resource crops and much legacy data exists. Dominant markers are less informative than codominant markers and their use can be problematic for GS across generations because of varying levels of heterozygosity that cannot be accounted for. We considered an alternative to a linear combination of dominant and codominant markers that separately weighted marker types as components of a single additive relationship matrix. Implemented as DiPR (Bentley et al., 2014;Ward et al., 2015), this showed that for some traits an optimized weighted combination of the two marker types improved prediction accuracy compared to a combined matrix using all available data. When the weight factor (w) was zero, only codominant data was used in the prediction. As the weight tends toward one, more weight is applied to the dominant marker data. For example, for kernel content (training: F 4 2007, 2008, 2010; test: RIL 2012) an intermedi-ate optimal solution (w opt = 0.22) was found. This compares to mildew (training: F 3 2006; test: RIL 2012) which had a dominant marker optimum (w opt = 1.00) and winter hardiness (training: F 3 2006 and F 4 2007; test: RIL 2012) which had a codominant marker optimum (w opt = 0.00). Although dominant markers have been largely superseded by SNP-based methods of genotyping, our results indicate that for some traits they provide useful information. The low frequency or uneven distribution of SNP markers across the oat genome (Bekele et al., 2018) may explain why the dominant markers used here made higher, or complete contributions to optimal predictions for disease (typically a dominant genetic effect, controlled by a limited number of loci; Okoń & Ociepa, 2018). Conversely, winter hardiness was optimally predicted from codominant markers and it is a documented complex, quantitative trait that has limited tractability in oat breeding programs (Chawade et al., 2012). Therefore, we propose that the genetic architecture of a trait combined with marker coverage are determinants of optimal DiPR weighting.
Our findings are potentially useful for other studies looking to combine data types in predictions. Asoro et al. (2013) previously proposed the use of selection criteria to weight low-frequency favorable alleles in GS to avoid loss of diversity with increasing gains for β-glucan in oat breeding. We also demonstrate the utility of within-and between-generation predictions in a narrow-base oat population. The predictions reported here would have benefits to a breeding program where genotyping costs are less than field trial costs. In this study we use a modest number of individuals and markers, but scaling to higher density markers, larger numbers of individuals, and broadening the population base are all opportunities for achieving future breeding gains.