A high‐quality reference genome for Fraxinus pennsylvanica for ash species restoration and research

Abstract Green ash (Fraxinus pennsylvanica) is the most widely distributed ash tree in North America. Once common, it has experienced high mortality from the non‐native invasive emerald ash borer (EAB; Agrilus planipennis). A small percentage of native green ash trees that remain healthy in long‐infested areas, termed “lingering ash,” display partial resistance to the insect, indicating that breeding and propagating populations with higher resistance to EAB may be possible. To assist in ash breeding, ecology and evolution studies, we report the first chromosome‐level assembly from the genus Fraxinus for F. pennsylvanica with over 99% of bases anchored to 23 haploid chromosomes, spanning 757 Mb in total, composed of 49.43% repetitive DNA, and containing 35,470 high‐confidence gene models assigned to 22,976 Asterid orthogroups. We also present results of range‐wide genetic variation studies, the identification of candidate genes for important traits including potential EAB‐resistance genes, and an investigation of comparative genome organization among Asterids based on this reference genome platform. Residual duplicated regions within the genome probably resulting from a recent whole genome duplication event in Oleaceae were visualized in relation to wild olive (Olea europaea var. sylvestris). We used our F. pennsylvanica chromosome assembly to construct reference‐guided assemblies of 27 previously sequenced Fraxinus taxa, including F. excelsior. Thus, we present a significant step forward in genomic resources for research and protection of Fraxinus species.


| INTRODUCTION
Fraxinus pennsylvanica Marsh., commonly known as green ash (also as red ash, swamp ash or water ash), is the most widely distributed ash species in North America with a broad range across the east and midwest of the continent (Kennedy, 1990). Green ash has economic importance (Kovacs et al., 2010) particularly as a landscaping tree.
Due to green ash's adaptability to urban environments, it became a popular ornamental tree following the large-scale loss of American elm trees due to Dutch Elm Disease (Burns & Honkala, 1990). Its lumber is popular for woodworking, and it is used for many specialty products. Ecologically, green ash is adapted to a wide variety of environmental conditions and is considered an important source of food and/or cover for wildlife across much of its range (Gucker, 2005).
The emerald ash borer (EAB; Agrilus planipennis) is an invasive species of jewel beetle that feeds on ash trees and is a critical threat to the native ash populations of North America, including green ash. Native to northeast Asia, the EAB is believed to have arrived in America through wood packing materials. While it was first found in the USA in 2002, evidence suggests the invasive population can be traced back to southeast Michigan in the early 1990s (Siegert et al., 2014). Females lay their eggs on the bark of an ash tree and, after hatching, the larvae chew through the bark and feed on the phloem and vascular cambium, disrupting the transport of sugars and water (Poland & McCullough, 2006).
Within 6 years, the presence of the EAB can reduce a healthy population of North American ash trees to near complete mortality (Knight et al., 2013).
Worldwide there are over forty Fraxinus species, distributed throughout the temperate forests of North America, Europe and Asia. While some ash species native to Asia have resistance to EAB, species outside of the beetle's native range, including those of North America and Europe, are for the most part highly susceptible to the beetle's effects (Rebek et al., 2008). Economic impact from EAB in the USA has been estimated to be as much as $10.7 billion, including the replacement of 17 million ornamental ash trees (Kovacs et al., 2010). Despite the overall high mortality, a small percentage of ash trees remain healthy. Further controlled EAB screening trials have confirmed some ash genotypes as having reproducibly higher resistance to the pest. Identifying the genetic basis for this trait could benefit an ongoing US Forest Service breeding programme to develop ash with enhanced resistance to EAB (Koch et al., 2012).
The Fraxinus species of Europe face an additional threat in the form of the Ascomycete fungus Hymenoscyphus fraxineus. H. fraxineus is the causative agent of a fatal disease in ash known as ash dieback, named for the crown dieback caused by the disease (Bakys et al., 2009). Symptoms of the infection include necrotic spots on stems that eventually enlarge into cankers and wilting of leaves.
European ash trees infected with H. fraxineus have an estimated mortality rate of 85% in plantations and 69% in woodland populations (Coker et al., 2019). It has primarily impacted European ash (Fraxinus excelsior) populations, though other species have been affected as well.
While not yet detected in North America, experiments indicate that American species, including green ash, are mildly susceptible (Nielsen et al., 2017).
Genomics can be an important tool to combat threats posed by invasive pests and pathogens. Annotated genomes have contributed to resistance breeding programmes in a number of crop species by allowing researchers to identify genes for resistance (Babu et al., 2020; Pérez-de-Castro et al., 2012) and recent studies show promising results for tree species (Grattapaglia et al., 2018; Neale & Kremer, 2011; Plomion et al., 2016). European ash (F. excelsior) has the most contiguous and well-annotated ash genome to date, with 89,514 nuclear scaffolds and 38,852 protein-coding genes (Sollars et al., 2017). Less contiguous scaffold- or contig-level genomes are also available for 26 other Fraxinus taxa, including three green ash individuals, though at present these do not have de novo gene annotations available. A green ash transcriptome was published in 2016, exploring the effects of EAB feeding and other stresses on gene expression in green ash trees (Lane et al., 2016), and a genetic linkage map for green ash was reported in 2019 (Wu et al., 2019). This linkage map contained a total of 23 linkage groups spanning about 2009 centimorgans (cM), with a total of 1201 markers and an average inter-marker distance of 1.67 cM (Wu et al., 2019).
To support efforts to combat EAB and other threats to ash species, we produced a chromosome-scale reference genome for green ash scaffolded with an improved genetic linkage map. This genome provides a foundation for exploring population diversity across the native range, discovering trait-associated loci (including loci for survival after EAB infestation), and comparative genomics across the Asterids. Genome-guided scaffolding of the draft genomes from an additional 27 Fraxinus taxa representing 23 species extends these resources to support research and breeding efforts in other threatened ash species.

KEYWORDS
comparative genomics, emerald ash borer, Fraxinus, genome annotation, genome assembly, green ash, whole genome duplication

2 | MATERIALS AND METHODS
Additional methodology details are available in the Supporting Information (Methods).

| Linkage map and genome construction
Additional single nucleotide polymorphism (SNP) markers were added to the green ash linkage map following previous methods (Wu et al., 2019). Briefly, DNA was extracted from 160 additional individuals from the pseudo-testcross pedigree (Grattapaglia & Sederoff, 1994) and underwent genotyping by sequencing (Elshire et al., 2011).
The reference genome is from tree PE00248, a male tree with partial EAB resistance. To improve the previously published assembly containing 555,484 scaffolds, Illumina 800-bp insert size reads were produced from DNA from the same tree and a new assembly was completed following the methods in Kelly et al. (2020). Hi-C library construction, sequencing and genome scaffolding to chromosome level were completed by Dovetail Genomics. Rearrangements were identified and corrected in the assembly using the improved linkage map. In addition to the chromosome-scale sequences, 87 unplaced scaffolds containing 10,000 bp or more were kept for analysis.
Statistical analysis of the pseudomolecules and scaffolds of the input genome was performed using quast version 5.0.2 (Gurevich et al., 2013).
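Contiguity statistics of the kind reported by quast include the N50. As a rough illustration of what that statistic measures (a sketch only, not a substitute for quast, and using made-up lengths rather than values from the assembly), N50 can be computed from a list of sequence lengths as follows:

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in bp (illustrative only).
example_lengths = [50, 40, 30, 20, 10]
print(n50(example_lengths))
```

A higher N50 indicates that more of the assembly sits in long sequences, which is why the statistic rises sharply once scaffolds are joined into pseudomolecules.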

| Cytology
Samples for green ash cytology were collected from a 3-year-old green ash seedling grown at Texas A&M Forest Service Facility. Root tip digestion followed previously published protocols (Islam-Faridi et al., 2009; Islam-Faridi, Sakhanokho, et al., 2020; Jewell & Islam-Faridi, 1994). Prelabelled rDNA oligonucleotide probes were used in fluorescence in situ hybridization (FISH) to characterize rDNA sites in the green ash genome.

| Characterizing the whole genome duplication event
Four species from within the Lamiids subclade of the Asterid clade were selected for comparative genomics: wild olive (Olea europaea var. sylvestris; Unver et al., 2017), common yellow monkeyflower (Erythranthe guttata, formerly known as Mimulus guttatus; Hellsten et al., 2013), tomato (Solanum lycopersicum; Tomato Genome Consortium, 2012) and coffee (Coffea canephora; Denoeud et al., 2014). Carrot (Daucus carota) was selected as an outgroup (Iorizzo et al., 2016). The rate of synonymous substitutions per synonymous site (Ks) in each species was determined by running reciprocal blastp analyses and passing the results to the custom KsPlotter python script as previously described (Sollars et al., 2017). After generating the Ks plot, we isolated gene clusters in F. pennsylvanica that represent the most recent whole genome duplication (WGD) event predicted in the plot, indicated by a Ks of 0.25 or less. A custom python script (https://github.com/MattHuff/Green_Ash_Annotation/blob/master/get_coordinates_ash_ash.py) matched genes in each cluster with their coordinates along the genome, and these coordinates were used to create a Circos plot visualizing the clusters (Krzywinski et al., 2009).
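The filtering and coordinate-matching step can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' script: the input formats (a list of gene-pair tuples with a Ks value, and a dictionary of gene coordinates), the function name and the example genes are all hypothetical. Only the Ks threshold of 0.25 comes from the text, and the output follows the plain six-column layout used for Circos link files.

```python
KS_CUTOFF = 0.25  # threshold given in the text for the most recent WGD


def recent_wgd_links(pairs, coords, ks_cutoff=KS_CUTOFF):
    """Return Circos-style link lines for gene pairs with Ks <= cutoff.

    pairs  : iterable of (gene_a, gene_b, ks) tuples (hypothetical format)
    coords : dict mapping gene ID -> (chromosome, start, end)
    """
    links = []
    for gene_a, gene_b, ks in pairs:
        if ks > ks_cutoff:
            continue  # older duplication or background pair
        if gene_a not in coords or gene_b not in coords:
            continue  # skip genes without known coordinates
        chrom_a, start_a, end_a = coords[gene_a]
        chrom_b, start_b, end_b = coords[gene_b]
        links.append(f"{chrom_a} {start_a} {end_a} {chrom_b} {start_b} {end_b}")
    return links


# Made-up example data (not from the green ash annotation).
pairs = [("g1", "g2", 0.21), ("g3", "g4", 0.80), ("g5", "g6", 0.25)]
coords = {
    "g1": ("Chr01", 1000, 2500),
    "g2": ("Chr05", 40000, 41800),
    "g3": ("Chr02", 100, 900),
    "g4": ("Chr07", 5000, 6000),
    "g5": ("Chr03", 700, 1900),
    # g6 deliberately has no coordinates and is therefore skipped
}
for line in recent_wgd_links(pairs, coords):
    print(line)
```

In this toy input only the g1/g2 pair is written out: g3/g4 exceeds the Ks cutoff and g6 has no coordinates.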

| Population genetics and trait association
Genomic DNA was extracted from tissue of 93 accessions collected from a range-wide provenance trial (Steiner et al., 1988, 2019), as well as from the parent trees of the genetic linkage map. Restriction site-associated DNA sequencing (RADseq) was performed (Baird et al., 2008; Clarke, 2009; Peterson et al., 2012).

| Fraxinus spp. reference-guided genome scaffolding
busco and repeatmasker were run on the BAT0.5 F. excelsior assembly (Sollars et al., 2017) using the same parameters previously described (Seppey et al., 2019; Smit et al., 2015a). We downloaded 28 other publicly available Fraxinus scaffold- and contig-level assemblies. We used ragtag version 1.0.1 to align the scaffolds of each assembly to the chromosomes of F. pennsylvanica and join them to produce chromosome-scale assemblies (Alonge et al., 2019). Following the same methods as above, we masked repetitive elements in the genomes with repeatmasker (Smit et al., 2015a) and produced gene annotations with braker2 (Hoff et al., 2016, 2019).

| Comparative genomics with Asterids
orthofinder version 2.3.12 was used to identify orthologues in F. pennsylvanica and the five other species of Asterids used for WGD analysis. The multiple sequence alignment (MSA) mode was selected, which used mafft version 7.467 to produce the sequence alignments from which gene trees were inferred (Emms & Kelly, 2019; Katoh & Standley, 2013).
The output of orthofinder was used to identify blocks of synteny between F. pennsylvanica and the other asterid species. Orthologues among F. pennsylvanica, O. europaea and C. canephora were selected by identifying orthogroups with a single gene member from each species (Staton et al., 2020). The command line version of circos, version 0.69-6, used these orthologous links to visualize synteny between the genomes (Krzywinski et al., 2009).
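The selection of orthogroups with a single gene member from each species can be sketched as follows. This is an illustrative sketch only: the in-memory layout (orthogroup ID mapped to per-species gene lists), the function name and the example IDs are assumptions, and genes from species outside the requested set are simply ignored here, which may differ from the authors' exact criterion.

```python
def single_copy_orthogroups(orthogroups, species):
    """Return {orthogroup_id: {species: gene}} for groups that contain
    exactly one gene from every species listed in `species`."""
    selected = {}
    for og_id, members in orthogroups.items():
        if all(len(members.get(sp, [])) == 1 for sp in species):
            selected[og_id] = {sp: members[sp][0] for sp in species}
    return selected


# Made-up orthogroups (IDs and gene names are hypothetical).
orthogroups = {
    "OG_A": {"ash": ["ash1"], "olive": ["oli1"], "coffee": ["cof1"]},
    "OG_B": {"ash": ["ash2", "ash3"], "olive": ["oli2"], "coffee": ["cof2"]},
    "OG_C": {"ash": ["ash4"], "olive": ["oli3"]},  # no coffee member
}
links = single_copy_orthogroups(orthogroups, ["ash", "olive", "coffee"])
print(sorted(links))
```

Each retained orthogroup yields one gene per species, so every group contributes exactly one link per pair of genomes in the synteny plot.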

| Genetic linkage map
The Fraxinus pennsylvanica genetic linkage map based on 95 progeny (Wu et al., 2019) from the PE0028 × PE0248 cross was expanded by genotyping a further 160 F2 individuals. This resulted in a consensus linkage map representing both parents composed of 4193 SNPs organized in 23 linkage groups representing the 23 haploid chromosomes (Figure 1a; Table 1; Table S1; Löve, 1982). The total map length was 1675.9 cM with linkage groups ranging from 49.6 cM (LG16) to 104.5 cM (LG2), yielding an average marker density of 0.4 cM per SNP.
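The reported marker density follows directly from the map length and marker count. The short check below uses only the figures quoted above; the mean linkage-group length is a derived value not stated in the text, and the variable names are illustrative.

```python
total_map_length_cm = 1675.9  # total consensus map length (cM)
n_snps = 4193                 # SNP markers on the consensus map
n_linkage_groups = 23         # one per haploid chromosome

# Average spacing between markers, in cM per SNP.
marker_density = total_map_length_cm / n_snps

# Mean linkage-group length (derived here, not reported in the text).
mean_group_length = total_map_length_cm / n_linkage_groups

print(f"{marker_density:.1f} cM per SNP")
print(f"{mean_group_length:.1f} cM per linkage group")
```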

| Assembly and scaffolding
The reference genome was generated from PE00248, a male tree with EAB resistance and a parent of the cross used for genetic mapping (Wu et al., 2019). The assembly utilized Illumina paired-end and mate pair reads, then underwent additional scaffolding with Hi-C data and the new genetic linkage map. This yielded a high-quality, chromosome-scale reference genome (version 1.4) with 23 primary sequences corresponding to the expected 23 haploid chromosomes and spanning 755 Mb (Table 2). An additional 87 scaffolds of 10 kb or more in length remain unplaced (totaling 2 Mb). The genome assembly has a total length of 757 Mb, representing 81.4%-85.1% of the total predicted length of 890-930 Mb estimated from flow cytometry. The 23 chromosome sequences range from 22.2 to 56.5 Mb in length and make up 99.7% of the total sequence content.
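The percentages in this paragraph can be reproduced from the rounded megabase figures given in the text. The check below is a simple arithmetic sketch with illustrative variable names; it uses no values beyond those quoted above.

```python
assembly_mb = 757          # total assembly length
chromosomes_mb = 755       # length of the 23 chromosome sequences
flow_low_mb, flow_high_mb = 890, 930  # flow cytometry estimate range

# Fraction of the estimated genome captured by the assembly.
# The larger estimate gives the lower bound, the smaller one the upper bound.
captured_min = 100 * assembly_mb / flow_high_mb
captured_max = 100 * assembly_mb / flow_low_mb

# Fraction of assembled sequence anchored to chromosomes.
anchored = 100 * chromosomes_mb / assembly_mb

print(f"{captured_min:.1f}%-{captured_max:.1f}% of the estimated genome")
print(f"{anchored:.1f}% of the assembly on chromosomes")
```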
Genome quality was assessed with multiple methods. First, to assess the accuracy of the assembly of the original contigs and scaffolding into chromosome order, sequences for each marker from the high-resolution linkage map were aligned to the current assembly (Figure 1b). Of the 4117 markers that aligned to the assembly, 4010 (97.4%) aligned to their expected linkage group on the map. Next, the proportion of the genome captured in the assembly was evaluated by mapping the original paired-end short reads to the final assembly. While the reference genome has fewer bases than the estimated genome length and a significant proportion of uncalled bases (12.1%), we found that over 89% of short read pairs map to the genome sequence. This indicates that a large majority of the genome is represented in the assembly, and the reduced length and number of Ns may be due to collapsed assembly of repetitive areas.
Finally, completeness was evaluated with busco to confirm the presence of expected orthologues (Table 2). Of the 1614 BUSCO groups searched, 97% were complete and present at least once in the genome: 83.3% of the groups were complete and single copy, while 13.7% were complete and duplicated. All BUSCOs were located on the 23 chromosomes. Overall, all evaluation metrics support that the assembly is largely complete and correctly scaffolded.
Gene annotation yielded an initial set of 53,977 gene predictions that were further filtered by structural and functional annotation to a set of 35,470 high-confidence gene models (Hart et al., 2020).
All high-confidence genes are located on the 23 chromosome sequences. The majority of high-quality genes were annotated with a sequence similarity match to a protein database (29,501) and assignment to an eggNOG gene family (35,085). In total, 29,408 genes were assigned at least one gene ontology (GO) term and 8495 have at least one pathway assignment from KEGG. Annotation of tRNA identified 723 candidate loci located on all 23 pseudomolecules as well as in unplaced scaffolds 26, 41, 74 and 97.

| rRNA characterization
The rRNA genes were annotated to identify the nucleolus organizer regions and compare the patterns to previously analysed Fraxinus species. In the green ash chromosomal assembly, rDNA sequences were found only on chromosome 1 at 17.59 Mb. The region includes the 5S, 5.8S and 25S genes and part of the internal transcribed spacer but is missing the 18S subunit. An rDNA sequence with all eukaryotic subunits (5S, 18S, 5.8S and 25S) was also identified on Scaffold_24 of the green ash assembly, which has not been placed in a chromosomal location. Scaffold_24 includes two markers from the genetic linkage map: "15188_21" from linkage group 18 and "15186_147" from linkage group 20. In both chromosome 1 and Scaffold_24, the rDNA sequences are present in a single copy, indicating the tandemly repeated array of rDNA sequences was collapsed during assembly to a single copy. The tandem repeat nature of rDNA arrays makes them particularly difficult to assemble from short reads and accurately place along chromosomes (Tørresen et al., 2019). To gather more information about the location of the rDNA arrays, FISH using 18S/5.8S and 5S synthetic oligonucleotide probes was conducted on green ash chromosome spreads. We observed two 35S loci (one major and the other minor) and one 5S locus. These loci were located on two different chromosomes. The 5S site colocalized proximally to the major 35S but overlapped (or intermingled with the 35S) to a certain extent (Figure 2). The minor 35S site is located terminally on a different pair of chromosomes.
Additional FISH mapping with chromosome-specific markers would be needed to confirm the final chromosomal positions.

| Characterization of whole genome duplications
Previous studies have suggested the occurrence of two recent WGD events shared across the family Oleaceae (Sollars et al., 2017; Unver et al., 2017). A recent asterid-wide phylogeny and WGD analysis placed an Oleaceae-specific WGD at around 35 million years ago and an additional older WGD shared by the Oleaceae and Carlemanniaceae families at around 78 million years ago (Zhang et al., 2020). Evidence of these events was assessed in F. pennsylvanica through pairwise synonymous site divergence (Ks) plotting (Blanc & Wolfe, 2004). F. pennsylvanica shares a peak at Ks = 0.25 with F.

FIGURE 1
High-density, consensus genetic linkage map of Fraxinus pennsylvanica. (a) Consensus genetic map of the PE0048 × PE0248 F. pennsylvanica cross composed of 4193 SNPs. Black tickmarks represent markers segregating either in female or male parent. Vertical scale on the left reflects map genetic distances in centimorgans (cM). Coloured scale in the bottom shows variation in marker density (cM per locus) across the linkage groups. (b) Alignment of sequence-based genetic markers from the high-resolution linkage map (bottom) to the chromosomes of the F. pennsylvanica genome assembly (top).

The remaining chromosomes have a more complex synteny pattern encompassing multiple chromosomes but generally only a few large detectable rearrangements. The one exception is the distal ends of chromosomes 4 and 12, which are a mosaic of small blocks. The internal synteny pattern of the WGD is consistent with the locations of 182 of the duplicated BUSCOs, while 13 are locally duplicated (meaning genes located on the same chromosome within 10 genes of each other) and 33 did not appear to have been the result of WGD (Table S3).
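The "locally duplicated" criterion for BUSCO copies (same chromosome, within 10 genes of each other) can be expressed as a simple test on gene order. The sketch below is illustrative only: representing each copy as a (chromosome, gene index) pair, treating the 10-gene limit as inclusive, and the example entries are all assumptions rather than the authors' implementation.

```python
def is_local_duplicate(copy_a, copy_b, max_gene_distance=10):
    """Return True if two gene copies sit on the same chromosome and are
    no more than `max_gene_distance` positions apart in gene order.

    Each copy is a (chromosome, gene_index) tuple, where gene_index is the
    rank of the gene along its chromosome (hypothetical representation).
    """
    chrom_a, index_a = copy_a
    chrom_b, index_b = copy_b
    return chrom_a == chrom_b and abs(index_a - index_b) <= max_gene_distance


# Made-up duplicated BUSCOs (names and positions are not from Table S3).
examples = {
    "busco_1": (("Chr04", 120), ("Chr04", 126)),  # same chromosome, 6 genes apart
    "busco_2": (("Chr04", 120), ("Chr12", 118)),  # different chromosomes
    "busco_3": (("Chr07", 15), ("Chr07", 400)),   # same chromosome, far apart
}
local = [name for name, (a, b) in examples.items() if is_local_duplicate(a, b)]
print(local)
```

Pairs that fail this test would then be checked against the internal synteny blocks to decide whether they are consistent with the WGD.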

| Genomic analysis of a range-wide provenance trial
A reference genome assembly can facilitate population genetics studies, allowing all loci and genomic regions to be interrogated, inclusive of both neutral and adaptive alleles. To establish a baseline assessment of genetic diversity in green ash prior to the EAB infestation, we generated RADseq data for a total of 95 accessions. Of these, 93 were selected to represent all 60 green ash populations in a provenance trial of ~2000 green ash trees from across the species' natural distribution in North America, established in 1978 at Pennsylvania State University (Steiner et al., 1988). In addition, the two parent trees of the green ash genetic mapping family were included. From the RADseq data of the selected accessions, we identified 28,592 high-quality SNPs with

Plotting "pure" and "admixed" individuals on a map according to

As expected for a broad zone where hybridization has occurred between two widespread lineages, the "admixed" individuals had higher heterozygosity and population differentiation and lower inbreeding coefficients than did either the "pure northern" or the "pure southern" individuals (Table 4). Differentiation between all "pure northern" and "pure southern" individuals was greater than differentiation between either of these groups and the "admixed" individuals (Table 5).

| Association mapping for identification of candidate genes
A genome-wide association study (GWAS) was carried out for five traits in the selected accessions, taking both population structure (above) and relative kinship (Figure S6) into account. SNPs were considered significant markers if the false discovery rate (FDR) was <0.05. We performed marker-trait associations using the SUPER GWAS model and detected a total of 15 significant associations for selected traits (Figure 5). Genomic regions with two adjacent windows of LD decay centred on significant SNPs were used to identify candidate genes.
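An FDR threshold of this kind is commonly applied through Benjamini-Hochberg adjustment of the per-SNP p-values. The sketch below assumes that procedure for illustration; the text does not state which FDR method was used, and the p-values shown are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for five SNPs (not from the study).
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
fdr = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
print(significant)
```

In this toy example the third and fourth SNPs have raw p-values below 0.05 but adjusted values just above it, which is the kind of marginal association that the FDR cutoff filters out.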

FIGURE 4
Population structure of the accessions. (a) structure results for genetic variation at K = 2 for 85 individuals representing 56 provenances. The coloured plot represents the estimates of Q (the estimated proportion of an individual's ancestry from each subpopulation). From left to right, the first 18 provenances (MO_177-OH_141) we classed as "pure southern," provenances 19-29 (IL_169-NY_373) we classed as "admixed" and provenances 30-56 (MI_293-Man_513) we classed as "pure northern." (b) Scatter plot of principal component axis one (PC1) and axis two (PC2) based on genotype data of 85 samples. The x-axis is PC1 and explained 10.8% of the variation and the y-axis is PC2 and explained 1.6% of the variation. Samples were coloured according to structure results. Orange, blue and green represent the "pure northern," "pure southern" and "admixed" sets of individuals, respectively, from structure analysis (i.e., Q). (c) Geographical distribution of the seed source provenance locations of the trees, with each location labelled with the same colouring scheme as in (b).

A comparative analysis of these draft genomes between the resistant and susceptible species identified 53 candidate genes containing evidence of convergent evolution correlated to EAB resistance.

TABLE 4 Genetic (SNP) variation statistics for the
Based on sequence similarity and the reference-guided scaffolding of F. excelsior, 51 of the 53 candidate genes were located in the F. pennsylvanica annotation (Table S6); four of these were removed from the final annotation during filtering. Based on the results described in Kelly et al. (2020), OG11720 underwent a start codon loss mutation in another F. pennsylvanica individual, which might account for its absence from the annotation. Functionally, all seven candidate genes associated with the phenylpropanoid biosynthesis pathway have at least one orthologue in F. pennsylvanica, and 13 of the 15 candidate genes associated with herbivorous insect defence response were annotated as well. Candidate OG27080, which is involved in the phenylpropanoid pathway, was predicted to be nonfunctional in F. ornus, but the associated mutation is not present in the corresponding gene in F. pennsylvanica. OG11720, absent from this annotation, is predicted to play a role in defence response against herbivores. Two candidates, OG32176 and OG47560, appear to be present as two copies in green ash located within the same chromosome; in the case of OG47560, the genes in green ash are within 1000 bp of each other.

TABLE 7 Comparison of Fraxinus excelsior genome statistics before (version 0.5) and after (version 0.7) reference-guided assembly and reannotation

(Figure 6; Table S7). While this strategy is unable to fully anchor all bases or to identify structural variations among the genomes, similarly to F. excelsior, it could provide much higher gene model quality by joining neighbour scaffolds. To test this, following assembly, we masked repeats and produced new gene annotations for each new reference-guided genome version (Table S8).

FIGURE 6
Reference-guided assembly of Fraxinus taxa genomes. (a) Pairing of a phylogenetic tree of available genomes in the genus Fraxinus and a bar chart illustrating percentage placement of base pairs of the original genomes to the green ash genome. F. pennsylvanica's placement is in orange to denote phylogenetic location relative to other species. (b) Illustration of the total complete BUSCOs identified before and after RagTag scaffolding. Abbreviations: "Subsp. ang…" = subspecies angustifolia, "Subsp. syri…" = subspecies syriaca, and "Subsp. oxy…" = subspecies oxycarpa. The red bars indicate additional, complete BUSCOs detected after reference-guided scaffolding.

full set of 207,754 were placed into orthogroups based upon degree of sequence similarity. A total of 22,976 orthogroups were identified by orthofinder; of these orthogroups, 9882 had a member from all six query species, and 1376 of these were "single-copy," meaning that all six species had exactly one copy of the gene (Table S10). All species had 88%-91% of their genes assigned to an orthogroup, except for S. lycopersicum with only 84% (Table 8). The 5085 species-specific orthogroups (defined as any orthogroup containing only genes from a single species) varied in number between species, as did the total number of genes within these species-specific orthogroups (Table 8).
The links among single-copy orthologues enable structural synteny to be examined at a macroscale. Using only the orthogroups containing one gene from each of the species, we identified strong regions of synteny between the 23 chromosomes of green ash and the 23 chromosomes of wild olive (Figure S7) with most chromosomes showing one-to-one synteny. Only ash chromosomes 12 and 22 appear to have syntenic regions to two chromosomes in wild olive (Table 9). In contrast, green ash chromosomes tended to have syntenic blocks in more than one of the 11 chromosomes that comprise the genome of coffee, and these often occurred in 2-to-1 patterns (Figure S8). These patterns fit well with previous results suggesting a shared Oleaceae-specific WGD in green ash and wild olive (Unver et al., 2017) but a lack of recent WGD in coffee (Denoeud et al., 2014).

| DISCUSSION
Five ash tree species native to North America, including green ash, are now listed on the IUCN (The International Union for Conservation of Nature) Red List of Threatened Species as "Critically Endangered" due to the ongoing devastation of EAB (IUCN, 2017).
Rare individual green ash trees with moderate EAB resistance have been documented, and these individuals are being used as a critical foundation for conservation and restoration work (Koch et al., 2012).

Supporting previous findings from European ash and wild olive genomes (Sollars et al., 2017; Unver et al., 2017), we found evidence for a WGD event shared by species in the family Oleaceae and confirmed it through a Ks plot and internal synteny analysis. The two copies of each original chromosome are largely syntenic within the green ash genome with some major rearrangements detected. By delineating these conserved, internal blocks of synteny within the green ash genome, we provide a strong foundation for designing genetic markers unique between the syntenic regions and for future studies of gene loss and diversification across Fraxinus species after the WGD. In comparing the structure of the F. pennsylvanica genome to O. europaea by collinear order of orthologous genes, we observed a surprisingly high amount of structural conservation, with most chromosomes between the species having one-to-one synteny (Figure S7). In examining a more phylogenetically distant asterid, coffee (Coffea canephora), more extensive structural rearrangements were common, but large blocks of synteny were still identifiable (Figure S8). Both coffee and olive are interesting comparators to green ash, as both have ongoing international agricultural and genetic research focused on disease and insect resistance.
With EAB threatening North American ashes, ash dieback threatening European ash species, and resistance to both found in Asian ash species, there is a strong need to develop genomic and genetic resources for the entire genus Fraxinus. We have used the green ash genome to enhance 28 currently available ash genomes, spanning 23 species and six sections (Wallander, 2012). Guided scaffolding of contig- and scaffold-level genome assemblies, using the green ash genome as a reference, yielded partial chromosome-level assemblies. We have also provided updated repeat and gene annotations for the scaffolded genomes. These reference-scaffolded genomes have major limitations: they anchor only a portion of the contigs (ranging from 44% to 86%) and are unable to detect differences in genome architecture between species. However, with gene regions having a higher percentage identity across species and thus gene-rich regions being preferentially anchored, we were able to show significantly improved gene annotation after scaffolding. Until independently scaffolded genomes are available for these species, this new resource could improve analysis of transcriptome experiments and improve studies utilizing genetic markers by better contextualizing the marker's location relative to known genes and other genomic features.

An annotated, chromosome-level green ash genome offers new directions in the efforts to combat the threat of the emerald ash borer. The main barriers to tree breeding efforts in species restoration are discovery of resistance to exotic pests, the long generation times of most forest trees, and the reconstitution of genetic diversity that is so crucial for tree populations to adapt to future disturbances. We conducted an initial range-wide assessment of genetic variation at the SNP level in green ash enabled by the new genome assembly and a provenance trial that was in

Green ash    Wild olive
Chr18        Chr16
Chr19        Chr09
Chr20        Chr21
Chr21        Chr08
Chr22        Chr05 (RC), Chr16
Chr23        Chr23
Note: RC indicates the chromosome is in the reverse complemented orientation in the wild olive genome (Unver et al., 2017).

of the species' range. This does not mean, however, that adaptive variation may not differ greatly among populations based on latitude, altitude or other environmental differences. A recent publication on variation in the timing and severity of EAB attacks across the same green ash provenance trial at Penn State (Steiner et al., 2019) reported that severity of infestation (density of adult emergence holes per unit bark area at death) was structured spatially in a pattern similar to Figure 4c here, with trees from southern populations succumbing to a smaller population of successfully reared insects than northern populations. This spatial variation was similar to our results from the structure and PCA analyses of the SNP data, which also suggested Northern and Southern subgroups overlapping along the Appalachian mountain range. Steiner et al. (2019) also reported that, among the trees from across the 36 populations sampled, family-within-population variation for emergence hole density was statistically significant (p = .02). No persuasive evidence was found, however, for within-population variation in infestation severity being related to mother-tree effects on survival time after initial infestation. Thus, we also took a GWAS approach to test for alleles in candidate genes that might be related to adaptive traits, including delayed mortality ("tolerance") after exposure to EAB. Based on the EAB-severity phenotypes reported by Steiner et al. (2019), we included 12 trees that were surviving ("lingering") in the provenance trial in 2017 among the 93 trees sampled from the trial for RADseq data generation.
We detected significant SNPs for budburst, leaf coloration in autumn, height and the post-EAB infestation lingering phenotypes.
These preliminary candidates require validation with larger sample sizes and other genotypes but point towards a fruitful future of genome-enabled research related to restoration of green ash and other threatened forest tree species.

Mailander for their assistance in conducting the research. We thank Jill Hamilton for reviewing the genetic diversity data and assisting in interpretation.