Genome assembly provides insights into the genome evolution and flowering regulation of orchardgrass

Summary Orchardgrass (Dactylis glomerata L.) is an important forage grass for cultivating livestock worldwide. Here, we report an ~1.84‐Gb chromosome‐scale diploid genome assembly of orchardgrass, with a contig N50 of 0.93 Mb, a scaffold N50 of 6.08 Mb and a super‐scaffold N50 of 252.52 Mb, which is the first chromosome‐scale assembled genome of a cool‐season forage grass. The genome includes 40 088 protein‐coding genes, and 69% of the assembled sequences are transposable elements, with long terminal repeats (LTRs) being the most abundant. The LTRretrotransposons may have been activated and expanded in the grass genome in response to environmental changes during the Pleistocene between 0 and 1 million years ago. Phylogenetic analysis reveals that orchardgrass diverged after rice but before three Triticeae species, and evolutionarily conserved chromosomes were detected by analysing ancient chromosome rearrangements in these grass species. We also resequenced the whole genome of 76 orchardgrass accessions and found that germplasm from Northern Europe and East Asia clustered together, likely due to the exchange of plants along the ‘Silk Road’ or other ancient trade routes connecting the East and West. Last, a combined transcriptome, quantitative genetic and bulk segregant analysis provided insights into the genetic network regulating flowering time in orchardgrass and revealed four main candidate genes controlling this trait. This chromosome‐scale genome and the online database of orchardgrass developed here will facilitate the discovery of genes controlling agronomically important traits, stimulate genetic improvement of and functional genetic research on orchardgrass and provide comparative genetic resources for other forage grasses.


Introduction
Grasslands are an essential global resource for grazing and improving the environment and occupy over 25% of the land area of Earth (Afkhami et al., 2014;Jones and Pa sakinskien _ e, 2005;Shantz, 1954). Forage grasses are the most important constructive component of grasslands (Barnes et al., 1995). Orchardgrass (Dactylis glomerata L.) belongs to Pooideae in the Poaceae family and is one of the most important cool-season forage grasses cultivated worldwide. Indigenous to Eurasia and northern Africa, orchardgrass has been naturalized on nearly every continent and utilized as a pasture or hay grass (Hirata et al., 2011;Stewart and Ellison, 2010;Xie et al., 2015). As one of the top four economically important perennial forage grasses cultivated worldwide, orchardgrass is important for the production of foragebased meat and dairy throughout the temperate regions of the world (Wilkins and Humphreys, 2003). Orchardgrass is particularly attractive for these conditions because of its high biomass yields, high carbohydrate levels, shade tolerance and adaptability to abiotic stress (AnneMarteTronsmo, 1993;Turner et al., 2007;Volaire, 2003;Volaire et al., 2001). Heading date is a surrogate measure for flowering time and is strongly correlated with the yield and quality of forage grasses. Due to the widespread geographical distribution of orchardgrass, its heading date is quite variable, which makes it ideal for studying how flowering time is regulated (Bushman et al., 2012;Sheldrick et al., 1986).
In contrast to most other major crops, forage grasses are subjected to multiple harvests per growing season for herbage yield rather than a single harvest for grain yield, and they harbour extensive variation and valuable abiotic/biotic stress resistance genetic resources for crop improvement due to their good adaptability to the natural environment (Bertrand et al., 2010;Moore et al., 1962;Talukder and Saha, 2017). Molecular breeding is an important approach in improving the breeding efficiency of forage grasses, but advancements in this field are hampered by limited genetic resources (Moose and Mumm, 2008;Ribaut et al., 2010). Acquiring a high-quality reference genome for orchardgrass is paramount to strengthening the capabilities of molecular breeding and further promoting forage grass genetic and genomewide studies (Badouin et al., 2017;Brozynska et al., 2016;Huang et al., 2015;Nogu e et al., 2016;Schulman et al., 2017;Varshney et al., 2014;Yan et al., 2016). De novo assemblies of cool-season forage grasses have been limited by their large genome sizes (2-6Gb) with different ploidy levels (2-89), high heterozygosity and high repetitive sequence content (Hegde et al., 2000;Kawube et al., 2015). Currently, the only forage grass with an available and appreciable reference genome is perennial ryegrass (Lolium perenne L.), which was sequenced using a second-generation sequencing platform. However, its assembly quality (contig N50 = 16.37 kb; scaffold N50 = 70.06 kb) has limited its applications in functional genetic research on the species as well as on other forage grass species (Byrne et al., 2016).
Here, we report an assembly of the first chromosome-scale reference genome of diploid orchardgrass, representing the first publicly available genome assembly in a cool-season (C3) forage grass. Combining PacBio single-molecule real-time (SMRT) sequencing (Roberts et al., 2013), Hi-C chromosome-scale scaffolding, BioNano, 109 Genomics and Illumina short-read sequencing (Belton et al., 2012;Mascher et al., 2017), we show an orchardgrass reference genome of 1.84 Gb with a contig N50 of 0.93 Mb, a scaffold N50 of 6.08 Mb and a super-scaffold N50 of 252.52 Mb. Phylogenetic analysis reveals a common ancestor before~17.5-27.6 million years ago (Mya) between orchardgrass and three Triticeae species. One evolutionarily conserved chromosome was detected by analysing chromosome derivation in these grass species. A total of 76 orchardgrass germplasm accessions with different origins were resequenced to understand their population structure and genetic diversity. Their flowering mechanisms were analysed, and several key candidate genes in orchardgrass were identified by an integrative approach combining quantitative genetics, gene expression analysis, quantitative trait locus (QTL) analysis and bulked segregant analysis (BSA). Additionally, an online database for the orchardgrass reference genome with integrated annotations, gene blast results and transcriptomic data has been developed (https://orchardgrassge nome.sicau.edu.cn). The results of this study provide a chromosome-level reference genome assembly, an important resource with which to advance biological discovery and breeding efforts in orchardgrass, as well as comparative genetic resources for other forage grasses.

Genome assembly, quality validation and annotation
The genome of an orchardgrass genotype, '2006-1', was initially sequenced using the Illumina, 109 Genomics and PacBio platforms to generate the V1.0 assembly. This assembly comprised 1.78 Gb of sequences, with a contig N50 of 1.05 Mb and a scaffold N50 of 3.41 Mb, accounting for 91.75% of the estimated genome size (Table 1; Tables S1 and S2; Figures S1 and S2). Of the 1.78Gb of scaffold sequences, 1.67 Gb (93.82%) was anchored to seven super-scaffolds (chromosomes) using the Hi-C platform ( Figure S3; Tables S3 and S4; Figures S4 and S5; Appendix S1). The assembly was then elongated using BioNano to generate the V1.1 assembly with a contig N50 of 0.93 Mb and a scaffold N50 of 6.08 Mb, accounting for 94.84% (1.84/1.94) of the genome size. The chromosome anchoring to the seven superscaffolds was increased to 1.77 Gb (96.21%) by Hi-C assembly.
The completeness and base accuracy of the assembled orchardgrass genome were validated using BUSCO (Simão et al., 2017) and CEGMA . BUSCO showed that 96.7% of the 1440 single-copy plant orthologues were complete, and CEGMA showed that the assembled genome completely covered 231 (93.15%) of the 248 core eukaryotic genes (CEGs) and partially covered 13 of the CEGs. Less than 2% of the CEGs were not detected (Table S5). The draft assembly was further evaluated by mapping short high-quality reads to the genome assembly. The mapping rate was 99.62%, and the genome coverage was 99.66% (Table S6). A total of 53 836 publicly available expressed sequence tag (EST) sequences of D. glomerata were mapped to the genome with an identity >95%, and 49,017 (91.05%) of the sequences were mapped to the reference genome with more than 90% coverage (Table S7) (Bushman et al., 2011). High consistency between the Hi-C and BioNano results was also observed, suggesting a reliable assembly ( Figure S6). Collectively, these data indicated the high genome coverage of the orchardgrass assembly sequence.
A total of 40 088 protein-coding genes were identified, 91% of which had functional annotations and 32 577 (81.26%) of which had evidence of transcription (Tables S3 and S8-S11). We also identified 799 transfer RNAs, 17510 miRNAs, 633 small nuclear RNAs and 400 ribosomal RNAs (Table S12). The orchardgrass reference genome with integrated annotations, gene blast results and transcriptomic data has been uploaded to an online database (https://orchardgrassgenome.sicau.edu.cn/).

Evolution of transposable elements
In total, 68.56% of the assembled genome sequences were annotated as transposable elements (TEs), 63.64% of which were retrotransposons and 4.92% of which were DNA transposons (Table S13). Of the retrotransposons, long terminal repeats (LTRs) constituted the vast majority, accounting for 61.15% of the genome (96% of the LTRs). Gypsy and Copia were the two major LTR superfamilies, and the proportion of Gypsy LTRs (48.36%) was higher in orchardgrass than in eight other Poaceae species and Arabidopsis (Gordon et al., 2017;Initiative, 2000;Ling et al., 2018;Luo et al., 2017;Mascher et al., 2017;Paterson et al., 2009;Schnable et al., 2009;Yu et al., 2002;Zhang et al., 2012; Table 1 and Tables S13 and S14; Figure 1a). Similarly, compared to the other species, orchardgrass contained larger proportions of subfamilies Gypsy/Athila (9.32%) and Copia/Sire (2.06%) ( Table S15). Similar to the other species, orchardgrass contained LTR/TEs and DNA/TEs mainly distributed in gene flanking regions (3kb) ( Figure S7). The density of Gypsy family LTRs increased from the telomere to the centromere, while the Copia family was uniformly distributed along the seven chromosomes ( Figure 1c). In an analysis including eight Poaceae species, Arabidopsis and orchardgrass, we found a strong correlation between genome size and the proportion of TEs that were Gypsy and Copia LTRs ( Figure 1b). These two LTR families were predicted to be amplified 0-1.0 million years ago (Mya) in the orchardgrass genome (Figure 1d), and the amplification of LTR/Copia appeared to have happened before the amplification of LTR/Gypsy (Figure S8), which may have led to the large genome size of orchardgrass.
The LTR amplifications were estimated to have taken place during the Pleistocene epoch, including the most recent ice age, lasting from 2.58 Mya until 10 000 years ago (Figure 1d;Figure S8). During the Pleistocene epoch, freezing weather and limited global atmospheric CO 2 (180ppm) negatively impacted the growth of grasslands and other types of vegetation (Cerling, 1999). To survive during that time, most plants had to adapt to stressful abiotic conditions. As TEs become activated under stress, environmental stress likely led to the reorganization of plant genomes during this time period (Grandbastien, 1998), potentially facilitating adaptation to stressful environments in these species (Lisch, 2013;McClintock, 1983). We modelled the age of LTRs in six Poaceae species and found that the expansion of LTRs occurred earlier in orchardgrass than in rice but later than inBrachypodium distachyon and three Triticeae species, namely Hordeum vulgare (barley), Triticum urartu and Aegilops tauschii ( Figure 1d). Interestingly, the peak in LTR insertions corresponded to the order of species divergence, where orchardgrass diverged after rice from its common ancestor but before the three Triticeae species (Chen and Craven, 2007). Collectively, the LTR content and expansion time corresponded to the genome size and divergence time of grass species, suggesting that LTRs are involved in grass speciation.
The orchardgrass genome size, LTR insertion peak and divergence times were in between to those in rice and the Triticeae species included in the analysis (Table S14; Figures 1d and 2a). The chromosome synteny and derivation among these species showed interesting patterns. All seven chromosomes in orchardgrass corresponded strongly (~80%) to the 12 rice chromosomes (Table S16). Specifically, orchardgrass chromosome (CDgl) 4 and CDgl 7 were syntenic to rice chromosome (COsa) 1 and COsa 5 (Table S17), and two ends of CDgl 4 corresponded to the opposite ends in COsa 1 ( Figure S9). In A. tauschii chromosomes (CAta), over 50% of CDgl 3, 4, 6 and 7 had syntenic matches to CAta 2, 3, 7 and 1, respectively, indicating that these chromosome pairs were conserved after divergence of orchardgrass and A. tauschii. The results further suggested possible chromosome fusions in the species ancestral to orchardgrass or chromosome divergence in the species ancestral to rice.
To reveal chromosome rearrangements in orchardgrass, we used the approach describing grass karyotype (AGK) genes by Murat et al. (2017). A total of 11 401 orchardgrass AGK genes were identified, accounting for 28.44% of all genes, lower than the percentage in B. distachyon (47.47%) and rice (30.05%) and higher than that in A. tauschii (23.63%) and H. vulgare (16.37%) ( Table S18). The AGK gene composition of each CDgl wasmuch more complex than that in the other four species (Figure 3a). In particular, CDgl 4 and 6 contained AGK genes from two ancient chromosomes (AChrs), while the AGK genes in the other four CDgls were from more than two AChrs, suggesting possible extensive transposon accumulations or alterations of chromosomal localization during the speciation of orchardgrass. Specifically, each grass species comprised one evolutionarily conserved chromosome, of which almost all AGK genes came from ancient chromosome 1, such as AGK genes on COsa 1 and 5, B. distachyon chromosome (CBdi) 2, CDgl 4, H. vulgare chromosome (CHvu) 3 and CAta 3 ( Figure 3a). The conserved chromosomes from each grass species had a higher monocot-specific gene proportion than other chromosomes ( Figure 3b; Table S19), indicating that these evolutionarily conserved chromosomes contain genes that are essential for monocot species development and that these genes may have been protected from chromosome disturbance during the speciation of monocots.
To clarify when orchardgrass underwent whole-genome duplication, synonymous substitutions (ks) were characterized in rice, B. distachyon and orchardgrass. The peak ks was 0.5 for orthologous gene pairs between orchardgrass and rice and 0.3 between orchardgrass and B. distachyon (Figure 2b), indicating that a whole-genome duplication event occurred before the divergence of orchardgrass, rice and B. distachyon, with one duplication event approximately 64 Mya in orchardgrass (Figure 3c).

Gene family analysis
In the monophyletic group (orchardgrass, B. distachyon,barley, T. urartu, rice and A. tauschii), 8797 gene families were shared while 1170 gene families were specific to orchardgrass (Figure 2a,c). The gene families unique to orchardgrass were involved in starch, sucrose metabolism, fatty acid metabolism and nitrogen compound metabolic processes. This is not surprising, given the roles of these products in ruminant digestion of forage grass (Chamberlain et al., 1993;Daley et al., 2010;Tamminga et al., 1991). Hormone signal transduction, photosynthesis, plantpathogen interaction and ABC transport pathway gene families were also specifically detected in orchardgrass, which may contribute to development and resistance to biotic/abiotic stress (Kang et al., 2011;Tables S20 and S21).
Orchardgrass shared a common ancestor with three Triticeae species, and the lineages diverged between 17.5 and 27.6 Mya ( Figure 2a). Compared to the Triticeae species, orchardgrass contained 128 gene families that substantially expanded and 11 gene families that substantially contracted ( Figure 2a). The expanded families were enriched in four pathways: galactose metabolism, starch and sucrose metabolism, sesquiterpenoid and triterpenoid biosynthesis, and brassinosteroid biosynthesis (Tables S22 and S23). The families involved in galactose metabolism and starch and sucrose metabolism were the CELL WALL INVERTASE (CWINV) family (17 genes in orchardgrass versus seven genes in rice), ALDOSE 1-EPIMERASE (AEP) family (13 versus six) and GALACTINOL SYNTHASE (GOLS) family (10 versus two). The expansion of these families may contribute to the nutritional quality of orchardgrass and its development as a forage (Chamberlain et al., 1993;Tamminga et al., 1991;Table S24). Triterpenoids are a component of wax that are often related to drought resistance (Seo et al., 2011;Zhu and Xiong, 2013). In orchardgrass, there was a substantial expansion in sesquiterpenoid and triterpenoid biosynthesis genes (Table S24), where orchardgrass had more GERMACRENE D SYNTHASE (GDSY) genes than rice (eight vs. two). In addition, some families were enriched in the biosynthesis of brassinosteroids that may regulate lateral tiller formation in perennial forage grasses (Zaman et al., 2016). Among them, orchardgrass had more BRASSINOSTEROID INSENSITIVE (BRI) and BRASSINOS-TEROID-SIGNALLING KINASE (BSK) genes than rice (six vs two for BRI and six vs three for BSK; Table S24). Although there are The family members of TFs were compared among orchardgrass and five closely related Poaceae species (Table S25). The number of B3 family members was approximately three-to sevenfold higher in orchardgrass (385) than in other species, and most of them (90.39% or 348/385) were from the PRODUCTIVE MERISTEM (REM) family (Table S26). REM genes are related to vernalization, which is critical in perennial coolseason grasses such as orchardgrass (Mantegazza et al., 2014;Moser and Hoveland, 1996;Romanel et al., 2009). In orchardgrass, most REM genes were highly expressed specifically in flowers and spikes compared with other tissues, and all REM genes were expressed dynamically during the flowering process ( Figure S10a,b). Additionally, the expansion peak of the REM genes that occurred between 2 and 3Mya overlapped with the Pleistocene epoch beginning 2.58 Mya ( Figure S10c), indicating that the ice age conditions during the Pleistocene epoch might have contributed to REM gene expansion to optimize reproduction, allowing orchardgrass to adapt to stressful conditions. A higher density of TE/LTRs was detected in the downstream region of REM genes than in the other genes in orchardgrass, suggesting potential regulation of REM genes by transposons ( Figure S10d).

Population structure and diversity
To understand the genetic diversity and population structure of orchardgrass, we resequenced 76 diploid and autotetraploid accessions collected worldwide (Tables S27-S30). Three main clusters were generated in the phylogenetic tree based on the resequencing data ( Figure S11). The three clusters containing wild accessions corresponded to three geographical regions: Western Mediterranean (Cluster 1), Eastern Mediterranean/Central Asia (Cluster 2) and East Asia/Northern Europe (Cluster 3). As accessions from East Asia/Northern Europe were grouped into one cluster, they may have intercrossed historically despite a large geographical separation, possibly through trade routes between Asia and Europe, such as the Silk Road (Li et al., 2015). The group containing both wild and cultivated orchardgrass populations had a complex subpopulation structure ( Figure S12), which was likely a result of the outcrossing nature of orchardgrass (Xie et al., 2014). To eliminate biases in single nucleotide polymorphism (SNP) calling caused by mixed polyploids, only 43 autotetraploid genotypes were selected to accurately characterize the structure and diversity of the cultivars and wild materials. The autotetraploid cultivars and wild genotypes were not separated via principal component analysis (PCA) and phylogenetic analyses, and their genetic diversities were similar (Figures S13 and S14; Table S31), suggesting a short history of domestication and that domestication did not have a strong impact on the genetic diversity of orchardgrass (Casler et al., 2001;Xie et al., 2014).

The genomic basis of flowering regulation
Floweringtime is a critical trait related to environmental adaptation in higher plants (Simpson and Dean, 2002;Zhang et al., 2009). Heading date is a surrogate measure of flowering time and is strongly correlated with the yield and quality of forage grasses (Sheldrick et al., 1986;Bushman et al., 2012). In this study, 603 orthologues and paralogues in the orchardgrass genome were identified, corresponding to 210 flowering-related genes in the Arabidopsis thaliana flowering-time gene data set (Table S32; Bouch e et al., 2016). Of these, 85 orchardgrass orthologues and paralogues corresponding to 53 flowering-related genes were differentially expressed between early-and late-flowering lines, and 25 and five were detected in the vernalization and photoperiod pathways, respectively (Table S33). Several key flowering regulators such as the photoperiod gene CO1, vernalization genes VRN1 and VRN2, circadian clock gene LUX1 and flowering integrator FT paralogue were differentially expressed between early-and lateflowering lines, potentially contributing to the difference in heading date ( Figure S15a). Additionally, five FT orthologues might have undergone expansion during orchardgrass evolution, suggesting their essential roles in floweringtime ( Figure S15b). Based on these findings, we constructed a simplified flowering pathway in orchardgrass ( Figure 4; Drosse et al., 2014).
To identify candidate genetic regions and key regulators associated with heading date, we integrated QTL analysis and BSA with transcriptome expression-profiling data. The peak value for the transformed Δ(SNP index) localized to two regions spanning from 154.344 to 156.231 Mb and from 157.05 to 159.599 Mb on chromosome 6. Based on the QTL results, we also identified a major locus at 157.639 Mb (np6325) on chromosome 6 that overlapped with the BSA candidate regions (Figure 5a). Fine-mapping analysis identified a 4.426-Mb overlapped region on chromosome 6 that may harbour the major locus contributing to orchardgrass heading date. We scanned for nucleotide diversity, differentiation and linkage disequilibrium (LD) to determine whether this region was under selection. No significant difference in nucleotide diversity(p), F ST or LD was observed between wild and cultivated accessions, implying that this candidate region was not under selection ( Figure S16). The artificial domestication history of orchardgrass is relatively short in comparison with that of other forages, and extensive variation in flowering time may be attributed to adaptation to complex environments. After removing genes that were not expressed among the prevernalization, vernalization, postvernalization, preheading and heading stages, 30 candidate genes were predicted within this region (Figure 5b, Table S34). Polymorphism detection identified 6 nonsynonymous SNPs corresponding to 4 candidates, including one FT-like gene and three MADS-box genes, in the early-and late-flowering populations (Figure 5c). In previous reports, the MADS-box family was revealed to be a highly conserved gene family involved in flowering time, floral organ formation and inflorescence architecture (Gramzow and Theißen, 2015;SchilLing et al., 2018). In the orchardgrass reference sequence, we identified 94 MADS-box genes, including 58 type I and 36 type II genes (Gramzow and Theissen, 2010; Table S35). The MADS-box gene family was markedly expanded in the orchardgrass genome (Table S35) compared with other grass genomes, which likely drives the extensive variation in heading date and strong adaptability to environmental conditions of orchardgrass. To investigate the gene expression of these four candidates, comparative transcriptome analysis was performed between the early-flowering and late-flowering orchardgrass lines. Gene model DG6G02970.1 was the only significantly differentially expressed gene; this gene encodes the MADS-box gene AGL61like, which plays an essential role in pollen tube guidance and the initiation of endosperm development (Steffen et al., 2008). Mutants of the A. thaliana homologue AT2G24840.1 (AGA-MOUS-LIKE 61, AGL61) have a phenotype associated with female fertility reduction and defective central cells with abnormal morphology. AGL61-like showed higher expression among five critical flowering stages in the early-flowering line than in the late-flowering line ( Figure S17). Three nonsynonymous SNPs were identified in the AGL61-like gene, resulting in changes from alanine to valine, alanine to threonine and glycine to valine (Figure 4c). Thus, DG6G02970.1 might participate in flowering regulation of orchardgrass.
Weighted gene co-expression network analysis (WGCNA) was used to search for candidate genes that were associated with flowering regulators. A total of 8629 differentially expressed genes (DEGs) between early-and late-flowering lines were chosen as probes for WGCN construction, of which genes in three modules (pink, purple and green modules) were related to the vernalization response ( Figure S18, Table S36), including 5 CONSTANS-LIKE and 3 FT-LIKE genes. In cereal crops, VRN2 is a flowering repressor that is down-regulated by VRN1 (Andrew and Jorge, 2012). VRN2 is associated with a set of 176 genes in orchardgrass (magenta module) (Table S37). In this module, several known flowering genes were detected, including ARR9/3/ 1, CONSTANS/CONSTANS-LIKE, LHY and PRR37, which are involved in the circadian clock and photoperiod signalling pathways (Su arezl opez et al., 2001). The gibberellic acid (GA) and abscisic acid (ABA) pathway-related genes GA20ox1D, GA20ox2, PYL5 and ABI5were also identified, which have been shown to play critical functions in flowering modulation in A. thaliana Kim et al., 2014;Wang et al., 2013).
Remarkably, when analysing the gene expression in early-and late-flowering lines, many genes in this magenta module showed different expression profiles at the postvernalization stage (Figure S19). We further identified 38 DEGs between early-and lateflowering lines (Table S38), including genes involved in photosynthesis, chlorophyll catabolic process, sodium ion transport and hormone signal transduction. WGCNA revealed that DG6G02970.1 (AGL61-like) is associated with a set of 114 genes in the early-flowering line (Table S39). Gene Ontology (GO) term enrichment indicated that carbohydrate metabolic process genes were particularly enriched, and glycolysis/gluconeogenesis pathway genes were enriched in the Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. Among the biological processes, four terms related to carbohydrate metabolic process and two terms related to response to oxidative stress were highly enriched. The need for a high level of carbohydrates for enhanced flowering has been demonstrated. Carbohydrate accumulation is related to the transition from vegetative growth to flowering (Kozłowska et al., 2007). Assuming a conserved function of AGL61-like in flowering regulation, we annotated genes that were differentially expressed inprevernalization stage versus postvernalization stage or preheading stage versus heading stage comparisons in the early-flowering line. This analysis identified a potential relationship between AGL61-like and the carbohydrate metabolic process. However, transgenic evidence needs to be provided to further confirm that the difference in heading date is caused by AGL61-like alone or the cooperation of AGL61-like and other co-expressed genes.

Discussion
Forage grasses are very important for feeding livestock. However, genetic research on these grasses is currently hampered by the lack of a reference genome, which is due to the very large size, high heterozygosity and repetitive sequences of the genomes of these species (Hegde et al., 2000;Kawube et al., 2015). Here, we assembled a high-quality reference genome sequence for orchardgrass with a contig N50 value of 0.93 Mb, a scaffold N50 of 6.08 Mb and a super-scaffold N50 of 252.52 Mb, which covered 94.85% of the estimated genome size. The quality of this reference genome was much higher than that of the latest published forage grass genome for perennial ryegrass in terms of the contig N50 (1637 kb) and scaffold N50 (7006 kb; Byrne et al., 2016) and is better than some recently sequenced genomes of crops such as pearl millet (Pennisetum glaucum L. Varshney et al., 2017), barley (Mascher et al., 2017) and T. urartu (Ling et al., 2018). The high quality of our assembly can be attributed to the use of the unique combination of PacBio SMRT sequencing (Roberts et al., 2013), new library construction with the 109 Genomics method (Goodwin et al., 2016), and BioNano (Sta nkov a et al., 2016) with chromosome-scale scaffolding via Hi-C (Belton et al., 2012). The latter two technologies were key to resolving the linear order of scaffolds on the chromosomes (Belton et al., 2012;Sta nkov a et al., 2016;Zhang et al., 2018). The orchardgrass genome sequence provides an important resource for future molecular breeding and evolutionary studies.
Forage grass is a principal group of Poaceae grasses (Gibson, 2009), but the performance of forage grass in the evolutionary history of Poaceae is still obscure. In this study, orchardgrass was found to have diverged after rice and before three Triticeae species (H. vulgare, T. urartu and A. tauschii) that seem to have common ancestors with orchardgrass. This phylogenetic relation potentially corresponds to the genome size and LTR expansion time of orchardgrass, which were intermediate to those of rice and the three Triticeae species (Table S14; Figures 1d and 2a). Evolutionarily conserved chromosomes were also detected by analysing ancient chromosome rearrangements in these grass species, such as AGK genes on CDgl 4 corresponding to COsa 1, COsa 5, CHvu 3 and CAta 3 (Figure 3a). Thus, orchardgrass genome information will help clarify the evolutionary processes in Poaceae species, and it provides primary knowledge of the evolutionary status of forage grass among major crops.
Orchardgrass has a widespread distribution and good adaptation to many natural environments, which can provide important abiotic/biotic stress resistance genetic resources, aiding in the genetic improvement of rice and Triticeae species.
In all of the plants investigated, TEs comprised the vast majority of all DNA. The activation of TEs frequently causes their duplication and insertion, leading to an increase in genome size (Levin and Moran, 2011). Most contributions to genome size were made by a class of mobile DNA sequences called retroelements, primarily LTR retrotransposons (LTR-RTs;SanMiguel et al., 1996;Vicient et al., 1999). Waves of expansion and contraction in numbers of TEs can induce deletions, inversions, translocations and other rearrangements in chromosomes (Yu et al., 2011). In addition to these gross effects on the overall architecture of genomes, genome restructuration mediated by TE activity is also essential for the stress response of hosts, facilitating the adaptation of species to changing environments (McClintock, 1983). Evidence from rice suggests that the overall number of stressinduced genes can be increased by TE activity to help rice adapt to stress (Lisch, 2013). In the present study, LTR-RTs accounted for 59.42% of the orchardgrass genome (Table S13; Figure 1a). The insertion number of LTR-RTs reached a peak between 0 and 1 Mya in the Pleistocene (or ice) age, lasting from 2.58 Mya until 10 000 years ago. During the Pleistocene epoch, the large grasslands and savannas of North America expanded and contracted many times. However, during periods of maximum glacial extent, the freezing weather and limited global atmospheric CO 2 (180 ppm) seriously affected the growth and development of grasslands as well as trees, shrubs and other types of vegetation (Cerling, 1999). To survive during this cold period, plants had to adjust to the novel conditions through molecular or phenotypic plasticity (Nicotra et al., 2010). Therefore, the expansion of LTR-RTs in orchardgrass might be a strategy to confront extreme environmental conditions.
Flowering is a key event in the plant life cycle. Variation in flowering time is a salient feature in the evolution, adaptation and domestication of the grass family (Poaceae).The high-quality orchardgrass reference genome helps identify flowering-related homologous genes and additional candidates underlying flowering regulation. This orchardgrass genome and its companion resources will provide resources for Poaceae evolution and diversity studies and allow diploid orchardgrass to serve as a model for studying other forage grass species. The reference genome and large set of SNP markers will accelerate markerfacilitated trait mapping through genomewide association studies and genomic selection of orchardgrass. The orchardgrass genome sequence and online database will support crop improvement efforts and help identify additional candidate genes underlying biotic and abiotic stress resistance and regulatory pathways controlling growth, flowering, seed production and regeneration in tissue culture-all of which are important traits for sustained agricultural production and meeting the demands for human consumption.

Experimental procedures
Sample collection for genome sequencing

DNA extraction and library preparation
Genomic DNA was extracted from young 2006-1 leaves using a DNAsecure Plant Kit (TIANGEN, Beijing, China). For PacBio Sequel sequencing, a 20-kb-insert-size SMRTbell library was prepared following the manufacturer's protocol (PacBio, CA). For Illumina (San Diego, CA) short-read sequencing, libraries were sizeselected for PE150 sequencing. Sequencing libraries with insert sizes ranging from 250 to 350 bp were constructed and sequenced using an Illumina HiSeq X Ten platform at the Novogene Bioinformatics Institute, Beijing.
The GEM reaction and library preparation for 10X Genomics sequencing were conducted using 1ng of input DNA that was size-selected to have an approximately 50-kb length. Libraries were barcoded and paired-end-sequenced with the Rapid method on an Illumina HiSeq X Ten platform.

Genome assembly
We constructed a de novo assembly of the 2006-1 genome by combining sequences from four different technologies: Illumina PE150 short-read sequencing, PacBio Sequel long-read sequencing, 109 Genomics contig spanning and Hi-C conformational alignment ( Figure S1).
De novo assembly of the long reads from SMRT sequencing was first performed using FALCON (v3.0) (https://github.com/Pac ificBiosciences/FALCON/) and FALCON-Unzip (Chin et al., 2016). Initially, the 55 subreads with the greatest coverage were selected as seed reads to correct for error. The error-corrected reads were aligned to each other and assembled into genomic contigs using FALCON, with the length_cutoff_pr = 5000, max_diff = 120 and max_cov = 130 parameters. After the initial assembly, FALCON-Unzip was used to produce primary contigs (p-contigs), which were polished using Quiver (Chin et al., 2013). Subsequently, BWA-MEM was implemented to align the 109 Genomics data to the assembly using the default settings (Li, 2014). Scaffolding was performed by FragScaff with the barcoded sequencing reads (Adey et al., 2014;Appendix S1).
For construction of a BioNano genome map, healthy young leaves of D. glomerata were prepared, and high molecular weight DNA isolation, sequence-specific labelling of megabases of gDNA by nicking, labelling, repairing and staining (NLRS), and chip analysis were performed according to the manufacturer's instructions (BioNano Genomics). The enzyme Nt.BspQI with an appropriate label density (14.5 labels per 100 kb) was selected and applied to digest long-range DNA fragments. After filtering the molecules with a cut-off at a minimum length of 150kb, 212Gb of BioNano mapping molecules with an average length of 305.39kb was collected. Then, the RefAligner and Assembler programs in Solve tools (https://bionanogenomics.com/sup port/software-downloads?_sft_download-type=saphyr) were used to assemble these BioNano molecules, resulting in consensus maps with a total length of 2.58 Gb and an N50 length of 1.55 Mb. These consensus maps were then used to join the assembled scaffolds to form super-scaffolds.
Two Hi-C libraries were prepared as described previously (Lieberman-Aiden and Dekker, 2009). The de novo PacBio assembly and Hi-C library reads were used as input data for further assembly using HiRise, a pipeline designed specifically for assembling the scaffold genome using proximity ligation data (Putnam et al., 2016). Hi-C library sequences were aligned to the draft input assembly using a modified SNAP read mapper (http:// snap.cs.berkeley.edu; Zaharia et al., 2011). The separations of Hi-C read pairs that mapped within draft scaffolds were analysed by HiRise to generate a likelihood model for genomic distance between read pairs, and the model was used to identify and break putative mis-joins, to score prospective joins and to select joins above a threshold (Appendix S1).
To evaluate the quality of the V1 assembly, we compared the V1 assembly to BioNano super-scaffolds using NUCmer in the MUMmer package (Delcher et al., 2002). Then, we drew a dot plot using mummerplot in the same package with default parameters.

Constructing gene families
The protein sequences from A. thaliana, Populus trichocarpa, rice, S. bicolor, Z. mays, S. italica, B. distachyon, H. vulgare, T. urartu, A. tauschii, Elaeis guineensis and Musa acuminata were downloaded from Phytozome 12 (https://phytozome.jgi.d oe.gov/pz/portal.html) and the NCBI (https://www.ncbi.nlm.nih. gov/). Across the species that were included, when multiple transcripts were present in one gene, only the longest transcript in the coding region was included in further analysis. Additionally, genes encoding proteins with fewer than 50 amino acids were removed. The filtered blast results were obtained between all species' protein sequences through BLASTP with an E-value of 1e-5. Protein sequences from all 13 species were clustered into paralogous and orthologous groups using OrthoMCL (http:// orthomcl.org/orthomcl/) with an inflation parameter equal to 1.5.

Phylogenetic tree reconstruction
Protein sequences from single-copy gene families were aligned using MUSCLE (Edgar, 2004), and the alignments of each gene family were concatenated to a super-alignment matrix. A phylogenetic tree was constructed using RAxML (http://sco.h-its. org/exelixis/web/software/raxml/index.html) with the maximumlikelihood method and a bootstrap value of 100, where A. thaliana and P. trichocarpa were designated as outgroups. A Venn diagram was constructed to display the number of gene families that were shared among six Poaceae species (orchardgrass, B. distachyon, H. vulgare, T. urartu, rice and A. tauschii) clustered into one group of the phylogenetic tree.

Species divergence time estimation
The MCMCTree program (http://abacus.gene.ucl.ac.uk/softwa re/paml.html) was implemented in Phylogenetic Analysis with Maximum Likelihood (PAML) to infer the divergence time of the nodes on the phylogenetic tree. The MCMCTree parameters were as follows: a burn-in of 10 000 steps, sample number of 100 000 and sample frequency of 2. The following calibration times of divergence were obtained from the TimeTree database (http:// www.timetree.org/): 120.0-155.8 Mya for A. thaliana and rice, 105.0-124.7 Mya for riceand M. acuminata, Mya for S. italica and S. bicolor.

Expansion and contraction of gene families
The expansion and contraction of gene families were determined by comparing the cluster size differences between the ancestor and each species using the CAF E (v3.1) program (Han et al., 2013). A random birth-and-death model was used to evaluate changes in gene families along each lineage of the phylogenetic tree. A probabilistic graphical model (PGM) was used to calculate the probability of transitions in each gene family from parent to child nodes in the phylogeny. Using conditional likelihoods as the test statistics, we calculated the corresponding P-values of each lineage, and a P-value of or below 0.05 was considered significant.
To investigate the genes involved in the galactose metabolism, starch and sucrose metabolism, sesquiterpenoid and triterpenoid biosynthesis, and brassinosteroid biosynthesis pathways, genes involved in these processes in A. thaliana and B. distachyon were downloaded from the NCBI (https://www.ncbi.nlm.nih.gov/; Cao, 2015;Clouse, 2008;Gross and Pharr, 1982;Zheng et al., 2014).Using these homologues as queries, the candidate genes in D. glomerata were identified by BLASTP with an E-value cut-off of 1e-5. The aligned hits with at least 50% coverage of the seed protein sequences and >50% protein sequence identity were designated homologues. Protein domains of these homologues were predicted by Pfam (http://pfam.xfam.org/). Only the genes with the same protein domain were considered homologues.

Genome synteny and whole-genome duplication
A homologue search within the orchardgrass genome was performed using BLASTP (E-value < 1eÀ5), and MCScanX was used to identify syntenic blocks within the genome. For each gene pair in a syntenic block, ks values were calculated, and values of all gene pairs were plotted to identify putative whole-genome duplication events within D. glomerata. The molecular clock rate (r) was calculated to be 6.96 9 10 À9 substitutions per synonymous site per year. The duplication time was estimated using the formula ks/2r (Moniz de Sa and Drouin, 1996). The syntenic blocks between chromosomes were visualized using Circos (Krzywinski et al., 2009).

SNP calling
To identify SNPs found in different orchardgrass accessions, 76 accessions were used to generate high-quality paired-end reads, and the reads were mapped to the orchardgrass reference genome using the Burrows-Wheeler Aligner (BWA) (Li and Durbin, 2009). The alignment results were converted to BAM files using SAMtools (Li and Durbin, 2009 in the package SAMtools, and only high-quality SNPs (coverage depth ≥6, root mean square (RMS) mapping quality ≥20, minor allele frequency (maf) ≥ 0.01 and misses ≤0.2) were kept for subsequent analyses.
To eliminate biases in SNP calling caused by mixed polyploids, SNPs were called for the 43 autotetraploid genotypes at the population level by using GATK (Mckenna et al., 2010), and only high-quality SNPs (coverage depth ≥15, RMS mapping quality ≥20, maf ≥ 0.05 and misses = 0) were kept for subsequent analyses.

Phylogenetic tree and population structure
A method based on the diploid model was used to build a phylogenetic tree for wild and cultivated genotypes with a mixture of diploid and autotetraploid individuals, a method that has been successfully applied in other polyploid plants (Hirsch et al., 2013;Lu et al., 2013). An individual-based neighbour-joining (NJ) tree was constructed using TreeBeST v1.9.2 (Vilella et al., 2009) with 1000 bootstraps. The population genetic structure was examined via Admixture 1.23 (Alexander et al., 2009), and the number of assumed genetic clusters K ranged from 2 to 6, with 10 000 iterations for each run. To clarify the phylogenetic relationships of the 43 autotetraploid genotypes from a genomewide perspective, an individual-based NJ tree was constructed using TASSEL 5.0 (Bradbury et al., 2007). PCA and diversity (PiPerBP) estimation were performed in TASSEL 5.0.

Identification of genes that regulate flowering time
Genes that regulate flowering time are often conserved across divergent species (Bl€ umel et al., 2015). Genes that regulate flowering time in A. thaliana were retrieved from a recently developed database, FLOR-ID20 (FLOR-ID: an interactive database of flowering-time gene networks in A. thaliana), which includes 295 protein-coding genes. Using the A. thaliana homologues as queries, the putative orthologous candidate genes in orchardgrass were identified by BLASTP with an E-value cut-off of 1e-5. If these genes were in common families in OrthoMCL, then their protein domains were predicted by Pfam (http://pfam.xfam.org/). Only genes that had the same protein domain as X were considered orthologous to the A.thaliana genes.

Transcriptome analysis
Clean data were obtained by removing reads containing adapter and poly-N sequences and low-quality reads from the raw data. High-quality reads were then mapped to the draft reference genomes by TopHat2 (Kim et al., 2013) with the parametersmax-intron-length 500 000, -read-gap-length 10, -read-edit-dist 15, -max-insertion-length 5 and -max-deletion-length 5. The expression level (reads per kilobase of transcript per million mapped reads (RPKM) value) of each protein-coding gene was calculated by HTSeq (Anders et al., 2015) using default parameters. DESeq2 (Anders and Huber, 2010) was used to normalize gene expression (BaseMean) in each sample and to identify DEGs for each group that was compared, using 'P-adj (adjusted Pvalue) < 0.05' as the threshold. All DEGs were mapped to GO terms in the GO database (http://www.geneontology.org/). The significantly enriched GO terms were selected by using a hypergeometric test to develop hierarchical clusters of a sample tree by Euclidean distance using topGO (Young et al., 2010). To further clarify the biological functions of DEGs, a pathway-based analysis was conducted using the KEGG database (http://www. genome.jp/kegg). Pathways with q-values < 0.05 were considered significantly enriched. Log2-normalized RPKM values were used to generate co-expression networks using the WGCNA package in R (Langfelder and Horvath, 2008). Gene structure analysis was performed by using the TAPIS pipeline. Mapping of high-quality PacBio reads and identification of alternative splicing (AS) events were performed by GMAP with default settings (Abdelghany et al., 2016; Tables S40-S42).

Bulked segregant analysis
To identify SNPs of genes involved in flowering time, 29 full-sib individuals from an F 1 mapping population of 213 lines were used for QTL sequencing (Zhao et al., 2016). SNPs that were homozygous in one parent and heterozygous in the other parent were prioritized and extracted from the 'vcf' output files. The homozygous genotype of the parent was used as the reference to calculate the number of reads of this parent's genotype in the individuals in the offspring pools. The ratio of reads harbouring the SNP that was different from the reference sequence was calculated as the SNP index of the base site. Sliding-window methods were used to present SNP indexes across the whole genome. The SNP index for each window was calculated as the average of all SNP indexes in the selected window of the genome. The window size was set as 1 Mb, and the step size was set as 1 Kb. The difference in the SNP index of the two pools, namely one earlier flowering pool and one later flowering pool, was calculated as the transformed D(SNP index).

Data tax
The orchardgrass genome has been deposited under BioProject accession number PRJNA471014. PacBio and Illumina raw reads, resequencing sequence reads and Hi-C data have been deposited in the Sequence Read Archive (SRA) under study accession number SRP150286. Flowering RNA-seq data have been deposited under SRA accession numbers SRR5341102 and SRP131899.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.

Figure S1
The orchardgrass genome landscape. Figure S2 Workflow of the orchardgrass genome assembly. Figure S3 K-mer frequency distributions in orchardgrass. Figure S4 Scaffold Hi-C contact map data analysis.  Figure S6 Consistency between the Hi-C and BioNano results. Figure S7 The density of TEs surrounding genes. Figure S8 The distribution of divergence time for LTRs/Gypsy and LTRs/Copia. Figure S9 Synteny analysis of seven chromosomes from orchardgrass (Dgl) to twelve chromosomes from O. sativa (Osa) and seven chromosomes from A. tauschii (Ata). Figure S10 REM family in orchardgrass. Figure S11 Phylogenetic tree of 76 orchardgrass accessions. Figure S12 Structure analysis of 76 orchardgrass accessions with different K values. Figure S13 PCA plot of the first two components (PC1 and PC2) of 43 autotetraploid orchardgrass accessions. Figure S14 Phylogenetic tree of 43 autotetraploid orchardgrass accessions. Figure S15 Analysis of important flowering-related orthologues in orchardgrass. Figure S16 Nucleotide diversity (p) estimated in wild (red) and cultivated (green) orchardgrass (a) and the FST value (b) and patterns of LD in cultivated (c) and wild (d) orchardgrass in the 4.426-Mb region of orchardgrass chromosome 06. Figure S17 Comparison of AGL61 expression during the five developmental stages in orchardgrass. Figure S18 Module-sample relationship. Figure S19 Expression pattern of genes in green, pink and purple modules. Table S1 Estimation of genome size. Table S2 Sequencing libraries and statistics of the data used for the genome assembly. Table S3 Characteristics of orchardgrass assembly containing 7 chromosomes. Table S4 SNP location and annotation of assembled orchardgrass genome. Table S5 Evaluation of Benchmarking Universal Single-Copy Orthologs (BUSCO) and gene space coverage using core eukaryotic gene mapping approach (CEGMA) in orchardgrass genome.  Table S9 Summary for annotation of predicted protein-coding genes in the orchardgrass genome assembly. Table S10 The mapping information of transcriptome based on PacBio platform. Table S11 Mapping summary of RNA-seq data to the orchardgrass genes. Table S12 Noncoding RNAs in the assembly of orchardgrass. Table S13 The classification of transposons in orchardgrass genome.
Table S14 Plant genome size and proportion of TEs in the genome. Table S15 Statistics of subgroups in the Copia/Gypsy superfamily (genome ratio %). Table S16 The ratio of every seven chromosomes in orchardgrass (Dgl) genome corresponds to Aegilops tauschii (Ata) and Oryza sativa (Osa) genomes. Table S17 The ratio of orchardgrass (Dgl) genome corresponds to ratio of Aegilops tauschii (Ata) and Oryza sativa (Osa) genome. Table S18 The number of AGK genes and their proportion to all genes in five grass species. Table S19 The number of monocot-specific genes and their proportion of all genes in five grass species. Table S20 GO analysis for the unique gene families in orchardgrass. Table S21 KEGG pathway of unique families in orchardgrass. Table S22 GO analysis for the expanded gene families in orchardgrass. Table S23 KEGG pathway of expanded families in orchardgrass.