Optimizing genomic selection for blight resistance in American chestnut backcross populations: A trade‐off with American chestnut ancestry implies resistance is polygenic

Abstract American chestnut was once a foundation species of eastern North American forests, but was rendered functionally extinct in the early 20th century by an exotic fungal blight (Cryphonectria parasitica). Over the past 30 years, the American Chestnut Foundation (TACF) has pursued backcross breeding to generate hybrids that combine the timber‐type form of American chestnut with the blight resistance of Chinese chestnut based on a hypothesis of major gene resistance. To accelerate selection within two backcross populations that descended from two Chinese chestnuts, we developed genomic prediction models for five presence/absence blight phenotypes of 1,230 BC3F2 selection candidates and average canker severity of their BC3F3 progeny. We also genotyped pure Chinese and American chestnut reference panels to estimate the proportion of BC3F2 genomes inherited from parent species. We found that genomic prediction from a method that assumes an infinitesimal model of inheritance (HBLUP) has similar accuracy to a method that tends to perform well for traits controlled by major genes (Bayes C). Furthermore, the proportion of BC3F2 trees' genomes inherited from American chestnut was negatively correlated with the blight resistance of these trees and their progeny. On average, selected BC3F2 trees inherited 83% of their genome from American chestnut and have blight resistance that is intermediate between F1 hybrids and American chestnut. Results suggest polygenic inheritance of blight resistance. The blight resistance of restoration populations will be enhanced through recurrent selection, by advancing additional sources of resistance through fewer backcross generations, and potentially by breeding with transgenic blight‐tolerant trees.


| Historical background
Efforts to restore the American chestnut (Castanea dentata) have been ongoing for nearly 100 years. The chestnut blight fungus (Cryphonectria parasitica), first introduced into North America from Asia in the early 1900s, killed approximately 4.2 billion C. dentata stems from northern Mississippi to coastal Maine by the 1950s (Gravatt, 1949;Hepting, 1974;Little, 1977;Newhouse, 1990). The extirpation of C. dentata reduced wildlife carrying capacity and altered nutrient cycling in forests throughout its native range (Dalgleish & Swihart, 2012;Ellison et al., 2005). Today, an estimated 431 million American chestnut stems survive as seedlings and collar sprouts, but their stems rarely flower and almost never produce viable seed before being re-infected with the blight (Dalgleish, Nelson, Scrivani, & Jacobs, 2016). Publicly funded breeding programs, initiated in the 1920s by the U.S. Department of Agriculture and the Brooklyn Botanical Garden, hybridized C. dentata with Asian Castanea species that are tolerant of chestnut blight (Anagnostakis, 2012;Burnham, Rutter, & French, 1986). However, these F 1 hybrids were not sufficiently competitive in the mixed hardwood forests typical of the historical C. dentata range (Schlarbaum, Hebard, Spaine, & Kamalay, 1998), and these early chestnut breeding programs were largely discontinued by the 1960s (Jaynes, 1978).
In 1983, the American Chestnut Foundation (TACF) was founded and backcross breeding was proposed to generate hybrids that combined the blight resistance of Chinese chestnut (Castanea mollissima) with the timber-type form of American chestnut (Burnham, 1981, 1988; Burnham et al., 1986). Backcrossing C. mollissima × C. dentata hybrids to C. dentata over three generations was expected to generate BC 3 F 1 hybrids that inherited an average of 15/16ths (93.75%) of their genome from C. dentata. The BC 3 F 1 trees were intercrossed to generate BC 3 F 2 populations from which a subset of trees was predicted to be homozygous for blight resistance alleles from C. mollissima. Large quantities of blight-tolerant BC 3 F 3 seed for restoration would then be generated through open pollination among the selected homozygous blight-tolerant BC 3 F 2 trees.
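The 15/16ths expectation follows from halving the donor share of the genome at each backcross. A minimal sketch of that arithmetic, assuming no selection and an F 1 that starts at exactly one half donor genome (the function name is illustrative, not from the original study):

```python
from fractions import Fraction


def expected_recurrent_fraction(n_backcrosses: int) -> Fraction:
    """Expected genome share from the recurrent parent after n backcrosses of an F1."""
    # The F1 carries 1/2 donor genome; each backcross halves the expected donor share.
    donor_share = Fraction(1, 2) ** (n_backcrosses + 1)
    return 1 - donor_share


bc3 = expected_recurrent_fraction(3)
print(bc3, float(bc3))  # expected C. dentata share of a BC3 tree
```

Intercrossing BC 3 F 1 trees does not change this expectation on average, although individual BC 3 F 2 trees will vary around it.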
The backcross method was initially implemented based on two hypotheses. First, alleles for blight resistance segregate at a few loci with incomplete dominance. Second, trees that are heterozygous for blight resistance at all loci can be reliably selected in each backcross generation. Incomplete dominance of blight resistance was surmised from the observation that F 1 hybrids develop blight cankers that are intermediate in size and severity between C. mollissima and C. dentata (Graves, 1950). Burnham et al. (1986) hypothesized that blight resistance segregates at two loci based on observations of Clapper (1952) that F 1 hybrids backcrossed to C. mollissima segregate at a ratio of three small cankered trees (blight-tolerant) to one large cankered tree (susceptible). Later, Kubisiak et al. (1997, 2013) found that three QTLs on three linkage groups (B, F, and G) explained 40% of the variation in canker severity in a full-sib (C. dentata × C. mollissima) × (C. dentata × C. mollissima) F 2 family.
The American Chestnut Foundation began backcross breeding in 1989 by pollinating two (C. dentata × C. mollissima) × C. dentata BC 1 hybrids (the "Clapper" and "Graves" trees) with C. dentata pollen from multiple trees in southwest Virginia (Hebard, 2006; Steiner et al., 2017). These BC 1 trees were chosen as sources of blight resistance to reduce the number of additional generations of breeding and selection required to reach the BC 3 F 3 generation. The "Clapper" and "Graves" trees have different C. mollissima grandparents (Clapper, 1963; Hebard, 2006) and were bred as distinct sources of resistance based on the possibility that blight resistance would segregate at different loci among the progeny of these trees. Phenotypic selection was performed in the BC 2 F 1 and BC 3 F 1 generations at TACF's Research Farms in Meadowview, Virginia, by artificially inoculating stems with C. parasitica and selecting trees with subjective canker severity ratings that were indistinguishable from F 1 hybrids (Steiner et al., 2017). Additional selection was made for leaf and twig characteristics that resembled those of C. dentata (Diskin, Steiner, & Hebard, 2006; Hebard, 1994).
FIGURE 1 Map of the American Chestnut Foundation orchard locations across the native range of Castanea dentata. Legend categories: Backcross, Demonstration, Progeny Test, Seed Orchard, Transgenic, Meadowview.
Citizen scientists affiliated with TACF have subsequently pollinated wild-type trees ranging from Alabama to Maine with pollen from selected BC 2 F 1 and BC 3 F 1 trees from the Meadowview breeding program to increase the genetic diversity and adaptive capacity of backcross populations (Westbrook, 2018; Figure 1).

| Current state of the Meadowview breeding program
The Meadowview backcross breeding program is now reaching the final stages of selection for blight resistance. Large segregating BC 3 F 2 populations have been generated by open pollination among selected BC 3 F 1 descendants of the "Clapper" and "Graves" trees.
Between 2002 and 2018, approximately 36,000 BC 3 F 2 progeny of 83 "Clapper" BC 3 F 1 selections and 28,000 BC 3 F 2 progeny of 68 "Graves" BC 3 F 1 selections were planted in two seed orchards. Each seed orchard contains nine blocks, and each block contains family plots of 150 BC 3 F 2 half-sib progeny from each C. dentata backcross line. Backcross lines correspond to the C. dentata grandparent of the BC 3 F 1 selections, and there are 25 backcross lines in the "Graves" seed orchard and 29 backcross lines in the "Clapper" orchard (Steiner et al., 2017). Assuming that blight resistance segregates at three unlinked loci (Kubisiak et al., 1997), that all BC 3 F 1 selections were heterozygous for C. mollissima alleles at these loci, and that 80% of BC 3 F 2 seeds planted would survive to inoculation, Hebard (1994, 2002) surmised that there is a 99% probability of generating nine homozygous blight-tolerant BC 3 F 2 trees from each backcross line. Between 60% and 80% of BC 3 F 2 trees were culled on the basis of significant canker expansion 6 months after inoculation. Additional culling was performed based on late-developing blight phenotypes that tend to be expressed in trees 5 years and older such as the survival of the main inoculated stem and the severity of additional cankers that developed as a result of natural infection by C. parasitica (Hebard, 2006). As of 2018, approximately 3,300 "Clapper" and 4,300 "Graves" BC 3 F 2 trees remain. To accurately estimate the genetic resistance of the remaining BC 3 F 2 selection candidates, TACF has planted randomized field trials of their open-pollinated BC 3 F 3 progeny. After inoculating these trials with C. parasitica, average canker severity of the most blight-tolerant BC 3 F 3 families was intermediate between Chinese chestnut and American chestnut. This finding led Steiner et al. (2017) to hypothesize that blight resistance segregates at more loci than previously assumed and that phenotypic selection has not been sufficiently accurate to select for the complete set of resistance alleles from C. mollissima founders in all backcross lines.
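The three-locus expectation can be checked with a binomial calculation. The sketch below is illustrative only: it assumes that each backcross line contributes 150 seeds to each of nine blocks (one reading of the orchard design described above), that 80% survive to inoculation, and that each survivor is independently homozygous at all three unlinked loci with probability (1/4)^3. It is not a reproduction of Hebard's own calculation.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed from the lower tail."""
    lower_tail = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - lower_tail


n_loci = 3
p_homozygous = 0.25**n_loci                  # 1/64 under Mendelian segregation
planted_per_line = 150 * 9                   # ASSUMPTION: 150 seeds x 9 blocks per line
survivors = round(planted_per_line * 0.80)   # 80% survival to inoculation

p_nine = prob_at_least(9, survivors, p_homozygous)
print(f"expected homozygotes per line: {survivors * p_homozygous:.1f}")
print(f"P(at least 9 homozygous trees): {p_nine:.3f}")
```

Under these assumptions the probability comes out high, in line with the stated expectation; if resistance segregates at more than three loci, `p_homozygous` shrinks by a factor of four per additional locus and the probability falls quickly.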

| Rationale for genomic selection
The final objective for the Meadowview seed orchards is to retain only the most blight-resistant 1% of the BC 3 F 2 trees planted (i.e., select approximately 600 trees). The aim of selection is to enhance the average blight resistance of the BC 3 F 3 progeny derived from intercrossing the BC 3 F 2 selections, while also representing genetic diversity from most C. dentata parents among the selections. Progeny testing all 7,600 remaining BC 3 F 2 selection candidates is not feasible as it would require planting hundreds of thousands of progeny and waiting many years for all BC 3 F 2 selection candidates to flower.
In this study, we developed genomic prediction models for blight resistance to accelerate final selections. Model development entailed genotyping a training population of BC 3 F 2 mothers whose progeny have been inoculated with C. parasitica. The blight resistance of the unknown fathers of the BC 3 F 3 progeny may have biased the progeny canker severity breeding value estimates, especially for small BC 3 F 3 families. To control for this potential bias, we also used genomic relationships among BC 3 F 2 trees to estimate breeding values for five late-developing blight phenotypes for individual selection candidates. We summed the breeding values for the late-developing phenotypes of the BC 3 F 2 trees with breeding values for average canker severity of their BC 3 F 3 progeny to estimate a Blight Selection Index for each selection candidate. The selection index enabled simultaneous ranking of a large number of BC 3 F 2 selection candidates, including trees <5 years old that have not flowered and were too young to reliably express blight resistance phenotypes.

| Study objectives
Our first aim was to optimize an analytical pipeline for genomic selection for blight resistance in American chestnut backcross populations. Toward this end, we generated a draft reference genome for C. dentata and performed genotyping by sequencing on 1,230 BC 3 F 2 selection candidates from the Meadowview breeding program. We optimized the single-step (HBLUP) method, in which breeding values were predicted from a blend of pedigree and genomic relationships with trees in the training population.
Our second aim was to investigate the genetic architecture of blight resistance by estimating the correlation between the proportion of BC 3 F 2 trees' genomes inherited from C. dentata and breeding values for blight resistance. Hypothetically, a major gene architecture would be implied if a subset of BC 3 F 2 trees demonstrated high blight resistance and inherited a high percentage (>90%) of their genome from C. dentata. By contrast, a strong negative correlation between blight resistance and genome inheritance from C. dentata would suggest polygenic control. We also compared the predictive ability of HBLUP to Bayes C regression. Bayes C, which includes only the largest effect markers in the prediction model, has been found to have greater predictive ability than HBLUP for traits that are controlled by few major effect loci. In contrast, HBLUP and Bayes C have similar predictive ability for polygenic traits (Chen, Li, Sargolzaei, Schenkel, 2014;Resende et al., 2012;Yoshida et al., 2018).

| Phenotyping BC 3 F 3 progeny
Between 2011 and 2016, 7,173 BC 3 F 3 progeny from 346 "Clapper" and 198 "Graves" open-pollinated BC 3 F 2 mothers were evaluated for blight resistance. Between 27 and 33 BC 3 F 3 progeny from each BC 3 F 2 mother were planted at TACF's Meadowview Research Farms in a completely randomized design (2011-2013 tests) or an alpha-lattice incomplete block design (2014-2016 tests; Patterson & Williams, 1976). In their third growing season, the main stems of BC 3 F 3 trees were inoculated with the SG2,3 (weakly pathogenic) and Ep155 (highly pathogenic) strains of C. parasitica at two stem heights approximately 25 cm apart using the cork borer agar disk method (TACF, 2016). The SG2,3 and Ep155 strains were originally isolated from American chestnut trees in Virginia and Maryland, respectively (M. Double, personal communication). The BC 3 F 3 family rankings for average canker severity using these two strains were strongly genetically correlated (r genetic > 0.95), suggesting generalized rather than strain-specific mechanisms of host blight resistance (Steiner et al., 2017; Westbrook & Jarrett, 2018).
Canker lengths and subjective ratings were phenotyped 5-6 months after inoculation. Cankers were rated as 1 = minimal expansion beyond initial lesion, 2 = some expansion, but canker partially contained by callus formation, or 3 = canker large, sunken, and sporulating ( Figure S1). The trait "canker severity" was calculated separately for each strain of C. parasitica (SG2,3 & Ep155) by scaling the variation in canker lengths and canker ratings to mean 0 and standard deviation 1, and summing the standardized rating and length. The canker severities for each strain of C. parasitica were then summed to obtain a single canker severity value for each tree.
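The canker severity calculation described above can be sketched as follows. This is a minimal illustration with made-up measurements for four trees; the function names are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)  # ASSUMPTION: population SD
    return [(v - m) / s for v in values]


def canker_severity(lengths_by_strain, ratings_by_strain):
    """Sum standardized canker length and rating within each strain, then across strains."""
    n_trees = len(next(iter(lengths_by_strain.values())))
    severity = [0.0] * n_trees
    for strain in lengths_by_strain:
        z_length = standardize(lengths_by_strain[strain])
        z_rating = standardize(ratings_by_strain[strain])
        for i in range(n_trees):
            severity[i] += z_length[i] + z_rating[i]
    return severity


# Hypothetical canker lengths (cm) and 1-3 ratings for four trees
lengths = {"SG2,3": [2.0, 4.0, 9.0, 5.0], "Ep155": [3.0, 8.0, 15.0, 6.0]}
ratings = {"SG2,3": [1, 2, 3, 2], "Ep155": [1, 2, 3, 2]}
severity = canker_severity(lengths, ratings)
print([round(s, 2) for s in severity])
```

Because every component is standardized before summing, canker length and rating contribute equally within a strain, and the two strains contribute equally to the final value.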
Canker severity phenotypes were obtained for 48% of the BC 3 F 3 seeds that were planted, and 2-40 BC 3 F 3 progeny (median = 13) were phenotyped per BC 3 F 2 mother. Canker severity phenotypes of BC 3 F 3 trees were continuously distributed, and there was no difference in the average canker severity in the Clapper and Graves BC 3 F 3 populations ( Figure S2).

| Phenotyping BC 3 F 2 selection candidates
Trees remaining in Meadowview seed orchards that were between 5 and 16 years old were phenotyped for five binary traits hypothesized to be indicative of blight resistance or susceptibility. All trees were phenotyped for main stem survival. Trees with a living main stem were then phenotyped for four additional traits on the main stem, namely the presence or absence of any canker longer than 15 cm; the presence or absence of exposed wood; the presence or absence of sporulation of C. parasitica conidia from cankers; and the presence or absence of sunken cankers. In total, 1,134 "Clapper" and 1,042 "Graves" BC 3 F 2 selection candidates were phenotyped for these traits.

| Generation of a draft reference genome for Castanea dentata
We generated a draft reference genome sequence for the immediate purpose of detecting SNP variants in backcross populations.
We sequenced the "Ellis1" clone of C. dentata by whole-genome shotgun sequencing using the PacBio Sequel sequencing platform at the HudsonAlpha Institute in Huntsville, Alabama. A total of 16 cells using chemistry 2.1 were sequenced with a p-read yield of 88.69 Gb (8,327,003 reads), for a total coverage of 98.54× (median read size 7,745 bp). The reads were assembled using MECAT (Xiao et al., 2017) and subsequently polished using ARROW (Chin et al., 2013). This produced 2,959 contigs with an N 50 of 4.4 Mb and a total genome size of 967.1 Mb. Contigs were then collapsed to remove redundant alternative haplotype sequence and screened against bacterial proteins, organelle sequences, and the GenBank nonredundant database to detect and remove contaminants. Version 0.5 of the C. dentata genome contains 793.5 Mb of sequence, consisting of 950 contigs with a contig N 50 of 8.1 Mb.

| Library preparation for genotyping by sequencing
Newly expanded leaves were collected from BC 3 F 2 selection candidates in June 2017. The leaf tissue was ground in liquid nitrogen, and genomic DNA was extracted using a Qiagen DNeasy

| Bioinformatics for SNP calling
Raw reads were filtered for quality, filtered for adapter contamination, and demultiplexed using STACKS software (Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013). Filtered reads were then aligned to v. 0.5 of the C. dentata reference genome using the Burrows-Wheeler Aligner (BWA) mem algorithm and subsequently converted to BAM format, sorted, and indexed with SAMtools (Li & Durbin, 2010;Li et al., 2009). GVCF files for each sample were generated using the GATK HaplotypeCaller algorithm (McKenna et al., 2010;Poplin et al., 2017), and these GVCFs were then merged using the GenotypeGVCFs function to create a candidate polymorphism set. Variants were flagged and removed as low quality if they had the following characteristics: low map quality (MQ < 40); high strand bias (FS > 40); differential map quality between reads supporting the reference and alternative alleles (MQRankSum < −12.5); bias between the reference and alternate alleles in the position of alleles within the reads (ReadPosRankSum < −8.0); and low depth of coverage (DP < 5). The resulting VCF file was filtered to retain only biallelic SNPs with <10% missing data and minor allele frequencies >0.01, leaving 71,507 SNPs. Missing SNP genotypes were imputed with Beagle v 4.1 (Browning & Browning, 2016). A total of 1,230 (865 "Clapper" and 365 "Graves") BC 3 F 2 individuals were genotyped.
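The hard-filter thresholds above can be expressed as two small predicates. This sketch operates on already-parsed annotations rather than on a VCF file, and it is not the GATK implementation: the function names are hypothetical, genotypes are assumed to be coded as 0/1/2 alternate-allele counts (which implies a biallelic site) with `None` for missing calls, and treating an absent annotation as passing is an assumption.

```python
def is_low_quality(info: dict) -> bool:
    """Flag a variant using the thresholds listed in the text."""
    return (
        info.get("MQ", float("inf")) < 40         # low mapping quality
        or info.get("FS", 0.0) > 40               # high strand bias
        or info.get("MQRankSum", 0.0) < -12.5     # ref/alt mapping-quality difference
        or info.get("ReadPosRankSum", 0.0) < -8.0 # ref/alt read-position bias
        or info.get("DP", float("inf")) < 5       # low depth of coverage
    )


def passes_site_filters(genotypes, max_missing=0.10, min_maf=0.01) -> bool:
    """Keep SNPs with <10% missing genotypes and minor allele frequency >0.01."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = 1 - len(called) / len(genotypes)
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return missing_rate < max_missing and maf > min_maf


print(is_low_quality({"MQ": 35, "FS": 1.0, "DP": 20}))
print(passes_site_filters([0, 1, 2, 0, 1, 0, 0, 1, 2, 0]))
```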

| Single-step prediction of progeny canker severity
FIGURE 2 Numbers of BC 3 F 2 selection candidates from the "Clapper" and "Graves" populations that were phenotyped for five traits indicative of blight resistance (Mother phenotyped), whose BC 3 F 3 progeny were phenotyped for canker severity (Progeny phenotyped), and/or were genotyped for genomic selection (Mother genotyped)
Breeding values for average progeny canker severity were obtained for all open-pollinated BC 3 F 2 mothers that were genotyped and/or whose BC 3 F 3 progeny were phenotyped using the single-step HBLUP method (Legarra et al., 2009). This method blends the pedigree and genomic relationship matrix into a single matrix H so that phenotypic and genotypic data for both genotyped and nongenotyped individuals can be used to estimate breeding values. Breeding values were estimated from blended pedigree and genomic relationships and progeny canker severity phenotypes for 211 "Clapper" and 154 "Graves" BC 3 F 2 mothers; from pedigree relationships and progeny phenotypes for 135 "Clapper" and 44 "Graves" BC 3 F 2 mothers that died prior to genotyping; and from pedigree and genomic relationships alone for 654 "Clapper" and 211 "Graves" BC 3 F 2 mothers whose progeny had not yet been phenotyped (Figure 2). Single-step prediction of average progeny canker severity was first performed separately for "Clapper" and "Graves" BC 3 F 2 populations based on the assumption that these populations are unrelated. Then, the data from both populations were combined into a single analysis to determine whether realized genomic relatedness between populations enhanced predictive ability. Martini et al. (2018) found that the parameters τ and ω, which scale the inverses of the genomic and pedigree relationships, respectively, in $H^{-1}$, influence predictive ability and bias in prediction of breeding values.
We therefore performed single-step prediction with H-matrices parameterized with nine pairwise combinations of τ and ω involving τ = 1, 2, or 3 and ω = 1, 0, or −1 and a tenth combination in which τ = ω = 0, which is equivalent to the pedigree relationship matrix. We sought the combination of τ and ω that maximized predictive ability while minimizing prediction bias. The inverse of the parameterized H-matrix (hereafter referred to as $H^{-1}_{\tau,\omega}$) was calculated following Martini et al. (2018):
$$H^{-1}_{\tau,\omega} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & \tau G^{-1} - \omega A^{-1}_{22} \end{bmatrix} \qquad (1)$$
where $A^{-1}$ is the inverse of the pedigree relationship matrix, $G^{-1}$ is the inverse genomic relationship matrix, and $A^{-1}_{22}$ is the inverse pedigree relationship matrix among genotyped individuals. Genomic relationships in G were estimated following VanRaden (2008):
$$G = \frac{ZZ'}{2\sum_{j} p_j (1 - p_j)} \qquad (2)$$
where Z is the centered genotypic matrix, and $p_j$ is the reference allele frequency at locus j.
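The two matrix operations in this section can be sketched in plain Python. The sketch assumes the blended inverse takes the form given by Martini et al. (2018), in which τG⁻¹ − ωA₂₂⁻¹ is added to the genotyped block of A⁻¹, and the VanRaden (2008) form G = ZZ′ / (2Σp_j(1 − p_j)). Function names and toy inputs are hypothetical, genotypes are assumed to be coded as reference-allele counts (0/1/2), and allele frequencies are estimated from the genotyped sample itself, which is also an assumption. Matrix inverses are taken as given inputs rather than computed here.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix from 0/1/2 reference-allele counts (rows = trees)."""
    n_ind, n_loci = len(genotypes), len(genotypes[0])
    # ASSUMPTION: allele frequencies are estimated from the same sample
    p = [sum(ind[j] for ind in genotypes) / (2 * n_ind) for j in range(n_loci)]
    z = [[ind[j] - 2 * p[j] for j in range(n_loci)] for ind in genotypes]  # centered
    denom = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(a * b for a, b in zip(zi, zk)) / denom for zk in z] for zi in z]


def h_inverse(a_inv, g_inv, a22_inv, genotyped_idx, tau, omega):
    """Add tau*G^-1 - omega*A22^-1 to the genotyped block of A^-1."""
    h = [row[:] for row in a_inv]  # copy so the pedigree inverse is not modified
    for r, i in enumerate(genotyped_idx):
        for c, k in enumerate(genotyped_idx):
            h[i][k] += tau * g_inv[r][c] - omega * a22_inv[r][c]
    return h


# Toy example: three trees, of which trees 1 and 2 are genotyped
a_inv = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
g_inv = [[2.0, 0.0], [0.0, 2.0]]
a22_inv = [[1.0, 0.0], [0.0, 1.0]]
h_blend = h_inverse(a_inv, g_inv, a22_inv, [1, 2], tau=3, omega=1)
h_pedigree = h_inverse(a_inv, g_inv, a22_inv, [1, 2], tau=0, omega=0)
g = vanraden_g([[0, 1, 2], [1, 1, 0], [2, 0, 1]])
print(h_blend)
```

Setting τ = ω = 0 leaves A⁻¹ unchanged, which matches the statement that this combination is equivalent to the pedigree relationship matrix.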
Breeding values of average progeny canker severity for BC 3 F 2 mothers were estimated with different parameterizations of H where n = 13.5 is the mean number of BC 3 F 3 progeny evaluated per BC 3 F 2 mother tree (Isik, Holland, & Maltecca, 2017).
The accuracy of single-step prediction of breeding values for progeny canker severity was estimated with 10-fold cross-validation. The cross-validation was performed in ASReml-R by randomly subdividing the phenotyped BC 3 F 3 families into ten subsets and using phenotypic data from 9/10ths of the families to predict breeding values for the remaining 1/10th of the families via $H^{-1}_{\tau,\omega}$. This procedure was repeated for each subset of families to obtain predicted breeding values for all BC 3 F 2 mothers. Predictive ability ($r_{y\hat{y}}$) of family mean canker severity was estimated as follows:
$$r_{y\hat{y}} = r_{g\hat{g}} \sqrt{h^2_{family}}$$
where $r_{g\hat{g}}$ is the Pearson correlation between predicted breeding values and breeding values estimated with phenotypes from all families, and $h^2_{family}$ is the family mean heritability (Resende et al., 2012). The entire 10-fold cross-validation procedure was repeated ten times for each parameterization H

| Comparing HBLUP to Bayes C
The accuracy of the optimized HBLUP procedure was compared to that of Bayes C and prediction from pedigree relationships (ABLUP); Bayes C first estimates the parameter π, which is the proportion of SNPs with nonzero effects, and then estimates allelic substitution effects of these SNPs assuming that the effects are normally distributed (Habier, Fernando, Kizilkaya, & Garrick, 2011). Bayes C was implemented with the R package BGLR (Perez & de los Campos, 2014). Marker effects were estimated over 50,000 iterations of a Gibbs sampler after 10,000 burn-in iterations. Residual plots were inspected to confirm that there was no autocorrelation between iterations (Perez & de los Campos, 2014).
To perform 10-fold cross-validation with Bayes C, allelic substitution effects were estimated on adjusted family mean canker severity for 9/10ths of the training population. Adjusted family means were estimated in ASReml-R by treating BC 3 F 2 mothers as fixed factors and year, block, and incomplete block as random factors as in Equation 3. Breeding values ($\hat{g}$) for the remaining 1/10th of the population were predicted with the formula:
$$\hat{g} = Zm$$
where Z is the centered and imputed genotypic matrix, N is the number of SNPs with minor allele frequency >0.01 (the columns of Z), and m is a vector of allelic substitution effects. Family mean prediction accuracy ($r_{y\hat{g}}$), or the Pearson correlation between predicted breeding values and adjusted family mean canker severity, was used to compare the accuracy of Bayes C, HBLUP, and ABLUP. The 10-fold cross-validation was repeated ten times with different random partitions of the training population to estimate variation in predictive ability due to training population composition.
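The prediction and scoring steps of this cross-validation can be sketched in plain Python. The sketch does not estimate marker effects (that was done with Bayes C in BGLR); it only shows how breeding values follow from marker effects as the product of the centered genotype matrix and the effect vector, how a random 10-fold partition might be drawn, and how accuracy is scored with a Pearson correlation. All names and numbers are hypothetical.

```python
import random
from math import sqrt


def k_fold_partitions(ids, k=10, seed=0):
    """Randomly split ids into k disjoint folds of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def predict_breeding_values(z, m):
    """g_hat = Z m: rows of z are trees, m holds allelic substitution effects."""
    return [sum(zij * mj for zij, mj in zip(row, m)) for row in z]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical centered genotypes (3 trees x 2 SNPs), effects, and family means
z = [[1.0, -0.5], [0.0, 0.5], [-1.0, 0.0]]
m = [0.2, -0.4]
adjusted_family_means = [0.5, -0.1, -0.3]

g_hat = predict_breeding_values(z, m)
accuracy = pearson(g_hat, adjusted_family_means)
folds = k_fold_partitions(range(25), k=10, seed=1)
print(g_hat, round(accuracy, 3), [len(f) for f in folds])
```

Repeating the partition with different seeds gives the replicate-to-replicate variation in accuracy that the text attributes to training population composition.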

| Genomic prediction of binary blight phenotypes of BC 3 F 2 parents
Single-step analysis of the blight phenotypes of BC 3 F 2 selection candidates was performed (a) to estimate the heritability of these phenotypes and (b) to predict breeding values for genotyped trees aged five or less that were too young to reliably express these phenotypes. Breeding values for these traits were predicted for 324 "Clapper" and 115 "Graves" BC 3 F 2 trees that were aged five or younger from genomic or pedigree relationships with 1,134 Clapper and 1,042 Graves BC 3 F 2 trees that were phenotyped (Figure 2).
Breeding values and heritability of presence/absence blight phenotypes of individual BC 3 F 2 trees were estimated with the binomial mixed model:
$$\mathrm{logit}\left[P(y_{ijk} = 1)\right] = \mu + t_i + b_j + g_k + \varepsilon_{ijk}$$
where y is a binary phenotype (i.e., main stem alive/dead, presence/absence of large cankers, exposed wood, sporulation, or sunken cankers); $t_i \sim N[0, I\sigma^2_t]$ are the random effects of years that the BC 3 F 2 trees were planted (2002-2014); $b_j \sim N[0, I\sigma^2_b]$ are the random effects of seed orchard block (1-9); $g_k \sim N[0, H_{\tau,\omega}\sigma^2_g]$ are the random additive genetic effects of individual BC 3 F 2 trees; and $\varepsilon_{ijk} \sim N[0, I\sigma^2_e]$ are the residuals. Phenotypic classes indicative of blight resistance were coded as 1, whereas classes indicative of susceptibility were coded as 0 (e.g., main stem alive = 1 or dead = 0; large cankers absent = 1 or present = 0; exposed wood absent = 1 or present = 0; sporulation absent = 1 or present = 0; and sunken cankers absent = 1 or present = 0). Individual-tree heritability ($h^2_{ind}$) was estimated on the logit scale, where $\pi^2/3$ is the variance of the standard logistic distribution (Davies, Scarpino, Pongwarin, Scott, & Matz, 2015). Breeding values for binary blight traits were estimated as the probability of having a trait value of 1 given the individual tree's genotype. This probability was calculated as follows:
$$P(y = 1 \mid g) = \frac{e^{\mu + g}}{1 + e^{\mu + g}}$$
where μ is the model intercept and g is a vector of random genetic effects in units of logit scores (Gezan & Munoz, 2014).
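Converting a genetic effect on the logit scale to a probability can be sketched as below. The sketch assumes the conversion is the standard inverse logit applied to the intercept plus the tree's genetic effect; the function name and the numeric values are hypothetical.

```python
from math import exp


def resistance_probability(mu: float, g: float) -> float:
    """Inverse logit of (intercept + genetic effect): P(trait value = 1 | genotype)."""
    return 1.0 / (1.0 + exp(-(mu + g)))


mu = 0.4  # hypothetical model intercept on the logit scale
for g in (-1.0, 0.0, 1.0):
    print(g, round(resistance_probability(mu, g), 3))
```

Because the inverse logit is monotonic, ranking trees by this probability gives the same order as ranking them by their genetic effects; the transformation only changes the scale on which traits are later summed.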
Breeding value prediction accuracy (r gĝ ) was estimated for the "Clapper" and "Graves" populations separately with 10-fold cross-validation. Breeding value prediction accuracy was defined as the Pearson correlation between predicted breeding values when the trees' phenotypes were left out of the model versus when all trees' phenotypes were included. The 10-fold cross-validation was repeated ten times for each trait with different random partitions of each population.

| Estimation of selection indices for blight resistance
A selection index called "Parent Condition Index" was created by summing the breeding values estimated for each of the five blight traits that were phenotyped in the BC 3 F 2 population. The variance in breeding values for each trait is proportional to the trait's heritability; thus, each trait was weighted in proportion to h 2 ind . The breeding values for progeny canker severity were multiplied by −1 to obtain the variable "Progeny Blight Resistance." Both Parent Condition Index and Progeny Blight Resistance were standardized to mean = 0 and standard deviation = 1 so that they would be equally weighted.
The standardized variables were then summed to create the "Blight Selection Index."
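The index construction can be sketched as follows, using made-up breeding values for three trees. The function names are hypothetical, and the use of the population standard deviation for standardization is an assumption. The five trait breeding values are summed without explicit weights, on the reading that their differing variances already weight traits in proportion to heritability.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)  # ASSUMPTION: population SD
    return [(v - m) / s for v in values]


def blight_selection_index(trait_bvs, progeny_canker_bvs):
    """Standardized Parent Condition Index plus standardized Progeny Blight Resistance."""
    parent_condition = standardize([sum(tree) for tree in trait_bvs])
    # Multiply by -1 so that larger values mean more resistant progeny
    progeny_resistance = standardize([-bv for bv in progeny_canker_bvs])
    return [p + q for p, q in zip(parent_condition, progeny_resistance)]


# Hypothetical breeding values: five binary-trait probabilities per tree,
# and one progeny canker severity breeding value per tree
trait_bvs = [
    [0.9, 0.8, 0.7, 0.9, 0.8],
    [0.5, 0.5, 0.5, 0.5, 0.5],
    [0.2, 0.3, 0.1, 0.2, 0.3],
]
progeny_canker_bvs = [-1.0, 0.1, 0.9]

index = blight_selection_index(trait_bvs, progeny_canker_bvs)
ranking = sorted(range(len(index)), key=lambda i: index[i], reverse=True)
print([round(v, 2) for v in index], ranking)
```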

| Estimation of hybrid indices
Hybrid indices were estimated to determine whether blight resistance is correlated with the proportion of the backcross trees' genomes inherited from C. dentata. Hybrid indices were estimated for BC 3 F 2 trees with the R package introgress (Gompert & Buerkle, 2010). To generate the required parental data, genotyping by sequencing was performed as described above on 56 C. dentata individuals and 47 C. mollissima individuals. Bioinformatic processing of these data was the same as for the BC 3 F 2 samples, and after merging data from the pure species and BC 3 F 2 samples, 27,306 SNPs were retained. The VCF file was converted to STRUCTURE format with PLINK software (http://zzz.bwh.harvard.edu/plink/) and subsequently to introgress format using the prepare.data function in introgress. Hybrid indices and their confidence limits were then estimated using the est.h function.

| Accuracy of single-step prediction of progeny canker severity
TABLE 1 Optimizing genomic prediction of BC 3 F 3 family mean canker severity with different parameterizations of the blended pedigree and genomic relationship matrix ($H^{-1}_{\tau,\omega}$)
Note: The pedigree relationship matrix is equivalent to τ = ω = 0. The parameters τ and ω scale the inverses of the genomic and pedigree relationships among genotyped individuals, respectively. The family mean heritability (h 2 family ), predictive ability of family means r yŷ , and percent prediction bias were compared when prediction models were trained on the "Clapper" and "Graves" populations separately and when both populations were combined.
The first step in developing genomic prediction models for blight resistance was to optimize prediction of average progeny canker severity of the open-pollinated BC 3 F 2 selection candidates. We compared parameterizations of $H^{-1}_{\tau,\omega}$ (Equation 1). When prediction models were trained on either "Clapper" or "Graves" BC 3 F 3 progeny separately, average predictive ability was maximized (r yŷ = 0.64 for "Clapper" and r yŷ = 0.46 for "Graves") while percent bias intersected zero across cross-validation replicates when G −1 was multiplied by τ = 3, and A −1 22 was multiplied by ω = 1 (Table 1). Training genomic prediction models on the "Clapper" and "Graves" BC 3 F 3 populations combined decreased maximum predictive ability for the "Clapper" population (r yŷ = 0.52) and marginally increased maximum predictive ability for the "Graves" population (r yŷ = 0.48) at the parameterization of $H^{-1}_{\tau,\omega}$ that minimized prediction bias (τ = 3, ω = 0). Genotyping revealed that some BC 3 F 2 selection candidates were more closely related than expected from pedigree relationships (Figure S3). Average predictive ability was lower when predicting from pedigree relationships (r yŷ = 0.25 for "Clapper"; r yŷ = 0.26 for "Graves") as compared with most parameterizations of $H^{-1}_{\tau,\omega}$ (Table 1).
Considering these results, we trained the genomic prediction model for the "Clapper" population separately after multiplying G −1 by 3 and A −1 22 by 1. We trained the genomic prediction model for "Graves" on the "Clapper" and "Graves" populations combined after multiplying G −1 by 3 and

| Prediction of progeny canker severity using Bayes C
The Bayes C method, which sets a proportion of the marker effects to zero, predicted average progeny canker severity with accuracy similar to that of the single-step (HBLUP) method, which incorporates all markers into the prediction (Table 2). Both the HBLUP and Bayes C methods were more accurate than prediction from the pedigree (ABLUP; Table 2). Considering these results, we used the optimized single-step method rather than Bayes C or the pedigree to predict breeding values for progeny canker severity.

| Prediction of blight phenotypes of BC 3 F 2 selection candidates
To correct for unknown paternal bias in ranking open-pollinated BC 3 F 2 selection candidates based on progeny canker severity, we also ranked the selection candidates based on the "Parent Condition Index" or the sum of breeding values for five late-developing blight phenotypes of the selection candidates themselves.
TABLE 2 Comparing the optimal blend of pedigree and genomic relationships (HBLUP), pedigree relationships (ABLUP), and the Bayes C method based on accuracy to predict observed BC 3 F 3 family mean canker severity ($r_{y\hat{g}}$) in 10-fold cross-validation repeated ten times
Note: Accuracy was assessed with 10-fold cross-validation repeated 10 times. Accuracy using the H-matrix was not assessed for presence/absence of large cankers in the "Graves" population because the trait heritability was zero.
The blight phenotypes of individual BC 3 F 2 selection candidates were weakly heritable, with h 2 ind values varying from 0 to 0.25 depending on the trait (Table 3). On average, trees with an observed phenotype indicative of blight resistance (i.e., main stem alive) also had a greater breeding value or probability of expressing a resistance phenotype (Figure 3). However, due to the low heritability of the blight phenotypes of BC 3 F 2 selection candidates, there was substantial overlap in the distributions of breeding values between resistant versus susceptible phenotypic classes. For example, in the "Graves" population, there was no differentiation in breeding values for trees with presence or absence of large cankers and presence or absence of sporulation, indicating that these traits did not contribute to the Parent Condition Index in this population (Figure 3). Cross-validation was performed to estimate breeding value prediction accuracy for trees younger than 5 years.
Averaged across cross-validation replicates, breeding value prediction accuracy using the H-matrix varied from 0.88 to 0.92 for "Clapper" selection candidates and from 0.5 to 0.94 for "Graves" selection candidates. Averaged across traits, accuracy was 1.3 times greater in the "Clapper" population and 2.5 times greater in the "Graves" population when using the H-matrix as compared with prediction from the pedigree (Table 3). Therefore, the H-matrix was used to estimate breeding values for the Parent Condition Index.

| Estimation of hybrid indices
We estimated hybrid indices, or the proportion of backcross genomes inherited from C. dentata versus C. mollissima, to quantify the relationship between species ancestry and blight resistance. Hybrid indices varied from 0.99 (99% C. dentata) to 0.42 (58% C. mollissima) for 865 BC3F2 descendants of "Clapper," and from 0.99 to 0.35 for 365 BC3F2 descendants of "Graves" (Figure 4). There were 24 "Clapper" and 10 "Graves" BC3F2 trees with hybrid indices less than or equal to 0.55. These trees were inferred to be "pseudo-F1" progeny of BC3 mother trees that were pollinated by C. mollissima trees on the same property. The average hybrid index of "Clapper" and "Graves" BC3F2 trees, excluding pseudo-F1s, was 0.89.
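As a point of reference for the hybrid indices reported above, the expected C. dentata proportion under neutral inheritance follows from halving the donor-species share with each backcross to C. dentata. Intercrossing within a generation (for example, BC3 to BC3F2) leaves that expectation unchanged in the absence of selection. The sketch below is illustrative arithmetic under those assumptions. It ignores variation among individual trees, and the function names are hypothetical. The 0.55 cutoff is the one stated in the text.

```python
def expected_dentata_fraction(n_backcrosses):
    """Expected C. dentata genome fraction after an F1 cross followed by
    n backcrosses to C. dentata, assuming neutral inheritance."""
    return 1.0 - 0.5 ** (n_backcrosses + 1)


def is_pseudo_f1(hybrid_index, threshold=0.55):
    """Flag trees at or below the hybrid-index cutoff used in the text."""
    return hybrid_index <= threshold


# Expected fractions for F1 (no backcross) through BC3.
expected = {
    ("F1" if n == 0 else f"BC{n}"): expected_dentata_fraction(n)
    for n in range(4)
}
# expected == {"F1": 0.5, "BC1": 0.75, "BC2": 0.875, "BC3": 0.9375}
```

Under these assumptions the neutral expectation for BC3-derived trees is 0.9375, against which the observed average hybrid index of 0.89 can be read.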

| Comparison of different selection scenarios
The final step in genomic selection was ranking BC3F2 selection candidates based on the "Blight Selection Index," which is the sum of the Parent Condition Index and the Progeny Blight Resistance Indices. The pseudo-F1 trees were excluded from consideration for selection; however, Blight Selection Indices of the selected trees were compared to that of the pseudo-F1s.
For both the "Clapper" and "Graves" populations, all selection scenarios were predicted to increase the mean Blight Selection Index. However, selected trees were, on average, significantly less blight-resistant than pseudo-F1s (Figure 5). The average Blight Selection Index of selected BC3F2 trees was significantly greater when selecting trees with the maximum Blight Selection Index (Scenario 2) or selecting up to three trees per plot (Scenario 3) as compared with selecting one tree per plot (Scenario 1).
FIGURE 3 Probabilities that BC3F2 trees will have a phenotype indicative of blight resistance given trees' genotypes versus trees' observed phenotypes for "Clapper" and "Graves" BC3F2 trees
The trade-off when relaxing the constraint of selecting one tree per plot was a reduction of the number of C. dentata backcross lineages represented among the selections. For example, the "Clapper" BC3F2 selection candidates had 41 C. dentata grandparents and 28 C. dentata great-grandparents in their maternal line. By selecting 160 trees with the maximum Blight Selection Indices regardless of plot (Scenario 2), selections included descendants from 31 C. dentata grandparents and 24 great-grandparents. By selecting a maximum of three trees per plot (Scenario 3), selections included descendants of 33 grandparents and 25 great-grandparents. We decided to proceed with up to three selections per plot because this scenario resulted in selections with an average Blight Selection Index similar to that of Scenario 2 (Figure 5), but retained a larger proportion of the maternal C. dentata lineages.
FIGURE 4 Distribution of hybrid index values for BC3F2 descendants of "Clapper" and "Graves." Hybrid index values indicate the proportion of hybrid genomes inherited from Castanea dentata versus Castanea mollissima (1 = 100% C. dentata)
FIGURE 5 Comparison of average Blight Selection Indices for selected "Clapper" and "Graves" BC3F2 trees under different selection scenarios. Selection scenarios included making one selection per 150 half-sibs planted in each seed orchard subplot (one selection per plot); selecting an equal number of trees with the maximum Blight Selection Index (max Blight Selection Index); and making up to three selections per plot (up to three selections per plot). The average Blight Selection Index for the selected BC3F2 trees was compared to that of the current population and pseudo-F1 trees (i.e., progeny of BC3 trees outcrossed to Castanea mollissima). Letters above the bars indicate the significance of differences in average Blight Selection Index (Tukey's test, p < .05)
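The three selection scenarios compared above can be sketched as simple ranking rules over a per-tree Blight Selection Index. This is an illustrative reading of the scenarios, not the authors' code. The record fields (`plot`, `lineage`, `bsi`), the greedy rule used for Scenario 3, and the toy data are assumptions.

```python
from collections import defaultdict


def one_per_plot(trees):
    """Scenario 1: keep the tree with the highest index in every plot."""
    best = {}
    for t in trees:
        if t["plot"] not in best or t["bsi"] > best[t["plot"]]["bsi"]:
            best[t["plot"]] = t
    return list(best.values())


def top_n(trees, n):
    """Scenario 2: the n highest indices, ignoring plots."""
    return sorted(trees, key=lambda t: t["bsi"], reverse=True)[:n]


def up_to_k_per_plot(trees, n, k=3):
    """Scenario 3 (assumed greedy rule): walk down the ranking and skip
    plots that already hold k selections, until n trees are chosen."""
    counts = defaultdict(int)
    chosen = []
    for t in sorted(trees, key=lambda t: t["bsi"], reverse=True):
        if counts[t["plot"]] < k:
            chosen.append(t)
            counts[t["plot"]] += 1
        if len(chosen) == n:
            break
    return chosen


def summarize(selection):
    """Mean index and number of distinct lineages among the selections."""
    mean_bsi = sum(t["bsi"] for t in selection) / len(selection)
    return mean_bsi, len({t["lineage"] for t in selection})


# Toy orchard (hypothetical values): plot A holds the four best trees.
trees = [
    {"id": "a1", "plot": "A", "lineage": "L1", "bsi": 2.0},
    {"id": "a2", "plot": "A", "lineage": "L1", "bsi": 1.8},
    {"id": "a3", "plot": "A", "lineage": "L1", "bsi": 1.7},
    {"id": "a4", "plot": "A", "lineage": "L1", "bsi": 1.6},
    {"id": "b1", "plot": "B", "lineage": "L2", "bsi": 1.0},
    {"id": "b2", "plot": "B", "lineage": "L2", "bsi": 0.4},
    {"id": "c1", "plot": "C", "lineage": "L3", "bsi": 0.9},
    {"id": "c2", "plot": "C", "lineage": "L3", "bsi": 0.2},
]
s1 = one_per_plot(trees)
s2 = top_n(trees, len(s1))
# k=2 only because the toy plots are tiny; the cap in the text is three.
s3 = up_to_k_per_plot(trees, len(s1), k=2)
```

In this toy orchard, capping selections per plot gives a mean index between those of the other two scenarios while retaining more lineages than unconstrained ranking, which mirrors the trade-off described in the text.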
For both the "Clapper" and "Graves" populations, blight resistance as assessed with the Parent Condition Index, Progeny Blight Resistance, and Blight Selection Index was negatively correlated with the proportion of alleles inherited from C. dentata (Figure 6). These negative correlations were observed when genomic prediction models were developed with and without including pseudo-F1s in the training population, suggesting that the pseudo-F1s are not driving this result (not shown). Selected BC3F2 trees were estimated to have inherited an average (max, min) of 83% (99%, 61%) of their genome from C. dentata. Parent Condition Index was positively correlated with Progeny Blight Resistance ("Clapper" $r_{parent\text{-}progeny}$ = 0.67, "Graves" $r_{parent\text{-}progeny}$ = 0.40; Figure 7). A total of 121 of 161 "Clapper" and 70 of 116 "Graves" selections had above-average Parent Condition Index and above-average Progeny Blight Resistance (Figure 7). A representative BC3F2 selection, a pseudo-F1, and a pure C. dentata are pictured in Figure S4.

| Outlook for genomic selection for blight resistance in BC3F2 seed orchards
Our first aim in this study was to optimize genomic selection to increase the speed and accuracy of making final selections for blight resistance in two American chestnut BC3F2 seed orchards at TACF's Meadowview Research Farms. Previously, approximately 90% of the BC3F2 trees in these seed orchards were culled based on blight susceptibility phenotypes after artificial inoculation with C. parasitica (Steiner et al., 2017).
The increased accuracy of genomic prediction may be attributed to more accurate and higher estimates of relatedness between some selection candidates than expected from the pedigree. Genomic prediction accuracy generally increases with increased relatedness between the training and prediction populations (Makowsky et al., 2011; Märtens, Hallin, Warringer, Liti, & Parts, 2016). The genotypic data detected hidden relatedness between BC3F2 selection candidates, including between the "Clapper" and "Graves" populations.
The relatedness between the BC3F2 selection candidates may be attributed to relatedness among American chestnut founders and full-sibling relatedness among BC3F2 trees that were assumed to be half-sibs in the pedigree analysis. Maximum predictive abilities for average canker severity of BC3F3 progeny ($r_{y\hat{y}}$ = 0.64 for "Clapper" and $r_{y\hat{y}}$ = 0.48 for "Graves") were similar to or greater than predictive abilities in other forest tree species (e.g., Chen et al., 2018; Isik et al., 2016; Lenz et al., 2019; Resende et al., 2012). The increased predictive ability in this study may be due to increased precision in predicting family means as compared to predicting the phenotypes of individual trees as has been attempted in other studies.
Despite the expanded training population size by combining the "Clapper" and "Graves" populations, the lower or similar predictive ability compared with single-population analyses suggests that blight resistance segregates at unique loci in the "Clapper" and "Graves" BC3F2 populations. The "Clapper" and "Graves" BC1 sources of resistance may have inherited different portions of their Chinese chestnut grandparents' genomes. Furthermore, the two different Chinese chestnut grandparents may have unique loci for blight resistance.
Additional association and QTL analyses of blight resistance are required to test these hypotheses.
We plan to finish selection in the Meadowview seed orchards in the next few years with additional phenotypic selection, progeny testing, and genomic selection. An additional 184 "Clapper" and 216 "Graves" BC3F3 families will be inoculated in field trials in 2019, 2020, and 2021. Furthermore, genotyping of approximately 1,000 additional "Clapper" and "Graves" BC3F2 selection candidates is currently ongoing. We anticipate that the accuracy of genomic selection will increase by expanding the training populations as has been predicted from simulation studies (Grattapaglia & Resende, 2011) and observed for other species and traits (Asoro, Newell, Beavis, Scott, & Jannink, 2011; Chen et al., 2018; Zhang et al., 2017).

| Evaluating the genetic architecture of blight resistance
Two observations suggest that chestnut blight resistance is inherited as a polygenic trait. First, we observed a trade-off between blight resistance and the proportion of BC3F2 trees' genomes inherited from C. dentata. Second, HBLUP, which assumes an infinitesimal model of inheritance, was just as accurate at predicting progeny canker severity as Bayes C, which includes only the markers with the largest effects in the prediction model. Previous QTL mapping studies of blight resistance were conducted in a small C. dentata × C. mollissima F2 family (<100 full-sib progeny; Kubisiak et al., 1997, 2013); therefore, it is likely that the effects of individual loci were inflated and these studies were underpowered to comprehensively detect all loci associated with blight resistance (Beavis, 1994; Slate, 2013).
resistance similar to C. mollissima × C. dentata F1 hybrids. We observed that average blight resistance of BC3F2 selections that inherited approximately 90% of their genome from C. dentata was less than that of pseudo-F1 trees, which inherited approximately 50% of their genome from C. dentata. Previous studies have found that BC3F3 progeny from partially selected seed orchards have improved blight resistance relative to C. dentata in orchard and greenhouse trials (Steiner et al., 2017; Westbrook & Jarrett, 2018). Therefore, we predict that the average blight resistance of BC3F3 progeny from fully selected BC3F2 seed orchards will be between that of F1 hybrids and C. dentata.

| Where does breeding for American chestnut restoration go from here?

| Restoration trials
It is not known what combination of blight resistance and C. dentata inheritance will be sufficient for American chestnut restoration. The American Chestnut Foundation has planted field trials composed of BC3F3 progeny from Meadowview seed orchards at over 35 sites across the eastern United States (Figure 1). Many of these trials are between 5 and 10 years old: too young to reliably assess for blight resistance following natural infection by C. parasitica. Encouragingly, in the oldest field trials, blight incidence and severity on 8-year-old BC3F3 trees were lower than on pure American chestnut and similar to Chinese chestnut (Clark, Schlarbaum, Saxton, & Baird, 2019).

| Breeding strategies to improve blight resistance
Blight resistance may be improved with additional generations of intercrossing and recurrent selection (Westbrook, 2018). Once resistance is sufficient for backcross trees to compete and reproduce in forests, natural selection may continue to improve resistance and competitive ability. Furthermore, the American Chestnut Foundation is currently generating and selecting C. dentata backcross progeny from additional C. mollissima sources of blight resistance (Steiner et al., 2017; Westbrook, 2018). Based on the finding of a trade-off between blight resistance and C. dentata inheritance, we will advance these additional sources only to the BC1 or BC2 generations rather than BC3 before intercrossing the selections. Backcross trees will be selected for blight resistance not only with phenotypic selection, but also by inoculating progeny derived from controlled pollinations of these trees to ensure that selection is accurate.

| Incorporating transgenic blight resistance
Lower than expected blight resistance within BC3F3 populations highlights potential advantages of using transgenic American chestnut trees for restoration. Transgenic C. dentata founder lines that constitutively overexpress an oxalate oxidase (OxO) gene from wheat have high levels of blight resistance in seedling trials (Newhouse et al., 2014; Powell, Newhouse, & Coffey, 2019). Transgenic varieties have the C. dentata genetic background except for one gene that confers blight tolerance.
When transgenic C. dentata are crossed with wild-type trees, 50% of their progeny are expected to inherit the resistance gene, which can be detected inexpensively with an enzymatic assay or with PCR (Zhang et al., 2013). Federal regulatory review in the United States is ongoing to release transgenic American chestnut founder trees for breeding and restoration trials outside of a few confined, permitted field trials. If federal regulatory approval is granted, TACF plans to outcross transgenic founder clone(s) to wild-type trees over five generations to increase the effective population size to >500 and to maximize genome inheritance from wild-type trees with marker-assisted introgression (Westbrook, Holliday, Newhouse, & Powell, 2019a). Transgenic trees may also be crossed with backcross trees to potentially enhance blight resistance.
Public acceptance of transgenic American chestnut trees for restoration is mixed (Delborne et al., 2018), and the long-term blight resistance of transgenic trees in forest conditions is not currently known. Therefore, it is prudent to continue traditional breeding approaches that are informed by genomics separately from breeding with transgenic trees.

| CONCLUSIONS AND FUTURE DIRECTIONS
In developing genomic prediction models and estimating hybrid indices for BC3F2 American chestnuts, we discovered a trade-off between blight resistance and the proportion of the genome inherited from C. dentata. Results suggest that the genetic architecture underlying the inheritance of blight resistance is more complex than previously assumed. A chromosome-scale genome assembly for C. dentata is forthcoming, which will be combined with genotyping of thousands of backcross individuals to enable mapping the inheritance of C. mollissima haplotypes and discovery of genomic regions associated with variation in blight resistance.

ACKNOWLEDGEMENTS
We would like to thank the donors and volunteers with the American Chestnut Foundation who have supported the breeding effort for the past 35 years. We also thank Advanced Research Computing at Virginia Tech for providing computational resources and technical support related to the analyses described here. Initial funding for proof of concept for Funding for genotyping to predict resistance of remaining trees in Meadowview seed orchards was provided by the Allegheny Foundation and an anonymous foundation. The Colcom Foundation provided funding to generate the draft reference genome for American chestnut. We thank the Virginia Tech Open Access Subvention Fund for paying for the publication fees for this article. We also thank three anonymous reviewers for helpful suggestions on earlier versions of this article.

CONFLICT OF INTEREST
None declared.

ORCID
Jared W. Westbrook https://orcid.org/0000-0001-8996-1614
Once the annotation is finalized, the genome will be publicly available at the Phytozome, comparative plant genomics portal.