Genetic mapping of biomass yield in three interconnected Miscanthus populations

Improving biomass yield is a major goal of Miscanthus breeding. We conducted a study on one interspecific Miscanthus sinensis × Miscanthus sacchariflorus F1 population and two intraspecific M. sinensis F1 populations, each of which shared a common parent. A field trial was established at Urbana, IL during spring 2011, and phenotypic data were collected in 2012 and 2013 for fourteen yield traits. Six high‐density parental genetic maps, as well as a consensus genetic map integrating M. sinensis and M. sacchariflorus, were developed via the pseudotestcross strategy for noninbred parents with ≥1214 single‐nucleotide polymorphism markers generated from restriction site‐associated DNA sequencing. We confirmed for the first time a whole‐genome duplication in M. sacchariflorus relative to Sorghum bicolor, similar to that observed previously for M. sinensis. Four quantitative trait locus (QTL) analysis methods for detecting marker‐trait associations were compared: (1) individual parental map composite interval mapping analysis, (2) individual parental map stepwise analysis, (3) consensus map single‐population stepwise analysis and (4) consensus map joint‐population stepwise analysis. These four methods detected 288, 264, 133 and 109 total QTLs, which resolved into 157, 136, 106 and 86 meta‐QTLs based on QTL congruency, respectively, including a set of 59 meta‐QTLs common to all four analysis methods. Composite interval mapping and stepwise analysis co‐identified 118 meta‐QTLs across six parental maps, suggesting high reliability of stepwise regression in QTL detection. Joint‐population stepwise analysis yielded the highest resolution of QTLs compared to the other three methods across all meta‐QTLs. Strong, frequently advantageous transgressive segregation in the three populations indicated a promising future for breeding new higher‐yielding cultivars of Miscanthus.


Introduction
Miscanthus is a promising biomass crop for production of lignocellulosic ethanol and bioproducts in the USA and Europe (Somerville et al., 2010). Currently only a single sterile triploid genotype of M. 9 giganteus (M9g) (3n = 57), which was derived from an interspecific cross between Miscanthus sinensis (2n = 38) and Miscanthus sacchariflorus (4n = 76), is commercially grown for bioenegy (Hodkinson et al., 2002;Głowacka et al., 2014). We refer to this genotype as M9g '1993-1780' in reference to its accession number in the Kew Living Collection (Hodkinson & Renvoize, 2001). Because this commercial genotype of M9g '1993-1780' is a sterile triploid, it cannot be used as a parent to improve the yield potential of Miscanthus. New cultivars with better adaptation to target biomass-production environments are needed (Sacks et al., 2013;Clark et al., 2014). Our laboratory and others are currently working to breed new M9g cultivars that out-perform the current commercial genotype (Purdy et al., 2015), and such efforts would greatly benefit from marker-assisted selection.
Mapping the Miscanthus genome for quantitative trait loci (QTLs) associated with valuable traits will enable marker-assisted selection in this crop. As Miscanthus is an obligately outcrossing species due to self-incompatibility, genetic mapping has typically been conducted using the pseudotestcross strategy on F 1 progeny (Grattapaglia & Sederoff, 1994). The first M. sinensis genetic map was published by Atienza et al. (2002) using 257 randomized amplified polymorphic DNA (RAPD) markers, which resulted in 28 linkage groups (LGs), rather than the expected haploid chromosome number 19; four QTL studies focusing on yield, yield components and combustion quality were then conducted on this population (Atienza et al., 2003a,b,c,d). Kim et al. (2012) published two SSR-based parental maps using an interspecific cross population derived from M. sacchariflorus 'Robustus' and M. sinensis, but these maps were also incomplete due to insufficient genetic markers. In recent years, three complete genetic maps have been published for M. sinensis using next-generation sequencing technology. Ma et al. (2012) constructed a high-resolution complete genetic map of M. sinensis with 19 LGs for the first time. Swaminathan et al. (2012) constructed an integrated map for M. sinensis with 658 singlenucleotide polymorphism (SNP) and 210 SSR markers which resolved into the 19 expected LGs. Gifford et al. (2015) performed a QTL study for biomass productivity using the M. sinensis genetic map of Swaminathan et al. (2012), based on a cross between two ornamental M. sinensis cultivars of southern Japanese provenance , and identified 72 QTLs for thirteen traits. Using restriction site-associated DNA sequencing (RAD-seq), Liu et al. (2015) developed a high density composite genetic map as well as two parental maps for an M. sinensis F 1 mapping population and revealed inheritance of leaf zebra stripe. Previous genetic mapping studies have concentrated primarily on M. sinensis. An incomplete genetic map of M. sacchariflorus (40 LGs instead of the expected 19) was described by Kim et al. (2012) but to the best of our knowledge, no complete M. sacchariflorus genetic map has yet been published.
All prior QTL analyses in Miscanthus have relied on single biparental populations. Populations derived from a multicross design sharing one common parent, however, can have greater power for QTL detection through joint linkage analysis, and provide a broader genetic base for QTL discovery and allele-mining (Holland, 2007;Yu et al., 2008). The power of joint linkage analysis has been demonstrated in maize NAM populations for flowering time (Buckler et al., 2009), leaf architecture (Tian et al., 2011), inflorescence architecture , disease resistance (Negeri et al., 2011;Poland et al., 2011) and kernel colour (Chandler et al., 2012). Moreover, joint linkage analysis was proven to be effective even in a small-scale Arabidopsis NAM population (Li et al., 2011).
In this study, we used RAD-seq to develop high density parental maps for three mapping populations, as well as a high density consensus map encompassing M. sinensis and M. sacchariflorus. For fourteen biomass traits, four QTL analysis methods were evaluated as follows: individual parental map analysis with both composite interval mapping (CIM) and stepwise analyses, consensus map single-population stepwise analysis and consensus map joint-population stepwise analysis. The objectives of this study were to (1) develop a high-density consensus genetic map for M. sacchariflorus and M. sinensis, (2) identify QTLs associated with biomass traits and dissect allelic effects, (3) investigate repeatability of detected QTLs and (4) compare the relative merits of the four QTL analysis methods.

Plant populations, experimental design and field management
Three interconnected diploid F 1 populations were developed using M. sinensis ssp. condensatus 'Cosmopolitan' as a common parent (i.e. each full-sib family was part of a single half-sib family; Table 1). Two of the other parents, M. sinensis 'November Sunset' and M. sinensis 'Silberturm', were the same species as 'Cosmopolitan', but the fourth parent, M. sacchariflorus 'Robustus', was a different species. The M. sacchariflorus 'Robustus' parent in this study is identical to the genotype used by Kim et al. (2012). All of the parents were diploid ornamental cultivars. The provenance of the M. sinensis parents was previously determined to be from central and southern Japan . Although 'Cosmopolitan' has variegated leaves with white longitudinal stripes, all of its progeny in this study had completely green leaves. In total, 652 progeny were studied (Table 1). The crosses were made in isolated greenhouse bays at the University of Illinois Urbana-Champaign in 2010. A common set of thirteen Miscanthus genotypes, including the four parents and M9g '1993-1780', were included in the field trials of each population to test potential environment effects and genotype by environment interaction effects. Each progeny and control genotype were vegetatively Table 1 Diploid F 1 Miscanthus populations (A-C) evaluated in this study. Note that the parent 'Cosmopolitan' is common to all three full-sib families propagated by dividing rhizomes and establishing ramets in 36-cell plug trays (PL-36-STAR*; T.O. Plastics, Clearwater, MN, USA) and grown in a greenhouse.
Three field trials (one per population) were established on June 15-21, 2011 at the University of Illinois Energy Farm in Urbana, IL (40°3 0 57″ N, 88°11 0 43″ W; USDA hardiness zone 6, avg. min. temp.: À23.3°C). For each population and 13 in-common control genotypes, a field trial was arranged in a randomized complete block design with three replications. Because the three field trials were not conducted on a single contiguous area of land, we refer to them as locations when describing statistical analyses to compare the performance of the control genotypes over the different trials on the Energy Farm. Each plot contained one plant. Spacing between and within rows was 1.5 m. Irrigation was applied as needed during the first year to ensure good establishment and discontinued in subsequent years. Soil type was Flanagan silt loam. In the spring of each year, 100.8 kg N ha À1 was applied. To control weeds, Atrazine at 2.8 kg ha À1 , S-metolachlor at 1.5 L ha À1 and 2,4-D at 1.8 L ha À1 were applied. Hand-weeding was conducted as needed.

Phenotypic data analysis
The evaluation system to obtain phenotypic trait data is described in Table 2. In both 2012 (year 2) and 2013 (year 3), phenotypic data of 12 traits, including basal circumference (BC), compressed circumference (CC), ratio of compressed circumference to basal circumference (CC/BC), culm dry weight (CmDW), diameter of basal internode (DBI), diameter of topmost internode (DTI), culm volume (CmVol), culm density (CmDen), culm node number (CmNN), plant height (Ht), yield per plant (YPP) and yield per footprint (YPF) were obtained for all three populations. Axillary bud number was also recorded in 2012 and 2013, but only for MapA because only this population segregated for the trait. An additional six flowering time traits, including first and 50% flagging date, heading date and flowering date, were collected in 2013 for all three populations. Because the six flowering traits were highly correlated (≥0.80), principal component analysis (PCA) was conducted to reduce redundancy, and principal component 1 (PC1) was used as a single flowering time trait for QTL analysis. PC1 explained 96.38%, 81.94% and 84.91% of the variance in the six flowering time traits for MapA, MapB and MapC, respectively (Table S1). Therefore, a total of fourteen traits were included in the QTL analyses (Table 2). Because first flowering date is more readily interpreted than PC1, we present the former in subsequent tables and figures of means, heritabilities and correlations.
Initial analyses of variance (ANOVAs) were conducted for just the thirteen control genotypes across the three field plantings to investigate possible effects of environment and genotype by environment interactions. Because no significant environment or genotype by environment interactions was detected in the ANOVAs for the controls (Data S1), subsequent ANOVA analyses of the three progeny populations (i.e. field trials) were conducted independently and least squares means (LS means) estimated without adjustment for further between-population comparisons. To obtain LS means, mixed model ANOVAs were performed with SAS procedure MIXED, where genotype was a fixed effect and the remaining variables were random. Fisher's LSD test was conducted to compare trait means among three plantings at a = 0.05. Using Dunnett's test at a = 0.05, trait means of each progeny genotype were compared to the control, M9g '1993-1780', because it is the current gold standard in Miscanthus biomass production. To enable us to calculate heritability, variance components for each trait in each population were estimated with SAS procedure MIXED using a completely random model. Broad-sense heritabilities were calculated as: where H 2 is the broad-sense heritability, r 2 g is the genetic variance, r 2 gy is the genotype by year interaction variance, r 2 e is the error variance, Y is the number of years and R is the number of replications. Pearson correlation coefficients between traits were calculated using Proc Corr procedure. Genetic correlations were calculated from estimates of genetic covariance as described by Howe et al. (2000) using the following equation: where r 2 cðxÞ and r 2 cðyÞ are the genetic variance components for traits x and y, and r 2 cðxyÞ is the corresponding covariance component, calculated according to the formula ðr 2 cðxþyÞ Þ À r 2 cðxÞ À r 2 cðyÞ Þ=2. Statistical models used in phenotypic data analysis are provided in Appendix S1.

Marker development
RAD libraries were prepared using a PstI-MspI enzyme system as described previously . Progeny DNA samples of all three populations were replicated twice for sequencing, and parent DNA samples were replicated on all libraries within each population, so as to improve read depth and reduce missing data, which is especially important for correctly calling heterozygous loci in obligately outcrossing species such as Miscanthus. All libraries were sequenced on a HiSeq 2000 (Illumina, San Diego, CA, USA) with 100 bp single-end reads at the Roy J. Carver Biotechnology Center at the University of Illinois. SNP discovery was performed using the UNEAK pipeline in TASSEL (version 3.0 standalone; Lu et al., 2013) with a minimum call rate of 0.9 and minor allele frequency of 0.1. Because the UNEAK pipeline does not give a single name to the same SNPs across multiple projects, we used the Python program 'TagDigger' to rename all markers based on tag sequences (Clark & Sacks, 2016), which enabled a direct search of co-segregating markers across the three F 1 populations. Additional SNP filtering details are provided in Appendix S1.

Genetic map construction
Initially, six parental genetic maps were constructed using Join-Map 4.1 (Van Ooijen, 2011) at a minimum of independence logarithm of odds (LOD) score of 11. SNPs within each LG were ordered using the regression mapping algorithm, and intermarker distances were estimated using the Kosambi mapping function (Kosambi, 1944). To confirm the name and orientation of each LG, 64-nucleotide RAD tags of mapped SNP markers on each parental map were then aligned against the Sorghum bicolor reference genome (version 2.0; Paterson et al., 2009) using Bowtie2 (version 2.2.5;Langmead & Salzberg, 2012) with parameters set for high sensitivity (-D 20 -R 3 -N 1 -L 18 -i S,1,0.50 -local).
Consensus maps were constructed using linear programming algorithm LPmerge (Endelman & Plomion, 2014). An M. sinensis consensus map was developed by merging the five M. sinensis parental maps based on cosegregating markers. A consensus genetic map for all the Miscanthus parents, including M. sacchariflorus, was constructed in two steps using an F 2 population as a bridge, due to insufficient numbers of cosegregating markers (only 28) between the M. sacchariflorus 'Robustus' genetic map and the other five M. sinensis parental maps. The F 2 population was obtained by bulking seeds of 50 F 1 individuals . Initially, an F 2 population of 281 individuals derived from the MapA F 1 population (Table 1) was used to construct a genetic map based on recombination of the M. sacchariflorus and M. sinensis genomes, which was subsequently used to bridge between the M. sacchariflorus 'Robustus' map and the M. sinensis consensus map. Finally, a highdensity Miscanthus consensus genetic map encompassing one M. sacchariflorus genetic map and five M. sinensis genetic maps were successfully constructed using LPmerge. Missing data, which were not allowed in the joint-population analysis, were imputed to the mean of flanking informative marker genotypes and weighted by genetic distance ). An R script was written to automate this imputation process (Appendix S1).

QTL analyses
Four QTL analyses strategies were compared as follows: (1) individual parental map CIM analysis, (2) individual parental map stepwise analysis, (3) consensus map single-population stepwise analysis and (4) consensus map joint-population stepwise analysis. These analyses were conducted to determine their merits in QTL detection and validate identified QTLs across different methods. Individual parental map CIM analysis was performed using R/qtl (Broman et al., 2003). Individual parental map stepwise analysis, consensus map single-population stepwise analysis and consensus map joint-population stepwise analysis were performed using the stepwise function in TASSEL 5 (Bradbury et al., 2007). Following each of the aforementioned four analyses, meta-QTL analysis was performed for each trait using BioMercator V3 (Sosnowski et al., 2012) to examine QTL consistency across environments and populations, and also to refine QTL positions on the consensus map. To facilitate comparisons among QTLs identified from the four analysis methods, confidence intervals of QTLs were calculated using the formula of Darvasi & Soller (1997): where N is the population size, and R 2 is the proportion of the phenotypic variance explained by the QTL. Additional details of the QTL analyses are provided in Appendix S1. Subsequently, we compared our QTLs with those identified in previously published Miscanthus mapping studies (Slavov et al., 2014;Gifford et al., 2015;. Because the Miscanthus markers in Slavov et al. (2014) and Gifford et al. (2015) were aligned against S. bicolor genome v1.0, whereas the markers in  and this study were aligned against S. bicolor genome v2.0, we re-aligned the Miscanthus markers in Slavov et al. (2014) and Gifford et al. (2015) against S. bicolor genome v2.0 using blastn 2.2.31 + (Camacho et al., 2009). Detailed blast commands are listed in Appendix S1.

Phenotypic data analysis
Substantial and highly significant differences among progeny were observed for all traits within each population ( Fig. 1, Data S2). Transgressive segregation was observed for most traits in each population ( Fig. 1, Data S3). Among the three field trials, location effects and genotype by location interaction effects were not significant in any of the traits for the thirteen control genotypes (Data S1). However, Fisher's LSD test indicated that all pairwise comparisons of the three progeny population means were significantly different for culm dry weight, culm volume, culm basal internode diameter, culm topmost internode diameter, plant height and flowering time in both 2012 and 2013 (Data S4). For compressed circumference, ratio of compressed circumference to basal circumference, culm density, culm node number and yield per footprint, among the three pairwise comparisons of population means, at least one significant between-population comparison was observed.
Within year, the three population means did not differ significantly for basal circumference or yield per plant, although significant differences among years were observed. The most striking difference among the three populations was the date of first  Fig. 1), whereas no other combination of progeny genotype and year produced greater biomass per plant than the high-yield control. Plant height was similar to yield per plant in that only 24 progeny of MapB in 2012 were taller than M9g '1993-1780' (Data S3, Fig. 1), whereas all other combinations of progeny genotype and year were shorter than the control. In contrast, for each population, average yield per footprint was similar to that of M9g '1993-1780' in each year; moreover, a substantial percentage of progeny in each population had greater yield per footprint than M9g '1993-1780' (84%, 25% and 71% in 2013 for MapA, MapB and MapC, respectively; Data S3, Fig. 1). Consistent with the yield per footprint data, each population also had a high percentage of progeny with a greater ratio of compressed circumference to basal circumference than M9g '1993-1780', especially in 2013 (Data S3, Fig. 1).
Heritability estimates (Table 3) were generally higher in 2013 (year 3) than 2012 (year 2). Flowering time heritabilities were consistently high (≥0.76) across the three populations. Axillary bud number heritabilites in MapA were also high (≥0.75). For the other twelve traits, the 2 year heritability estimates ranged from 0.39 for ratio of compressed circumference to basal circumference to 0.90 for axillary bud number in MapA, 0.44 for culm density to 0.78 for plant height in MapB and 0.62 for culm density to 0.87 for culm node number in MapC (Table 3).
Genetic correlations (Table 4, Data S5) in year 3 between yield per footprint and ratio of compressed circumference to basal circumference were moderate to strong for each of the three populations (0.63-0.90); similarly, they were moderate for compressed circumference (0.56-0.64). Yield per footprint had moderate genetic correlations (0.33-0.49), and moderate to strong phenotypic correlations (0.42-0.71), with yield per plant; no strong genetic or phenotypic correlations were observed between yield per plant and any of the other traits. For each of the three populations, weak or moderate genetic correlations were observed between plant height and yield per plant (0.22-0.44) or yield per footprint (0.09-0.34), but the phenotypic correlations were where H 2 is the broad-sense heritability, r 2 g is the genetic variance, r 2 gy is the genotype by year interaction variance, r 2 e is the error variance, Y is the number of years, and R is the number of replications. H 2 1 is single year heritability whereas H 2 2 is heritability across 2 years.

Genetic maps
The female and male genetic maps within each population had the expected 19 LGs. For the six parental maps, the number of mapped markers ranged from 1214 to 2351, and map lengths ranged from 1699 to 2102 cM (Table 5, Figs 2 and S1). Alignment plots confirmed a complete whole-genome duplication in M. sinensis and M. sacchariflorus relative to Sorghum bicolor (n = 10), with a subsequent chromosome fusion to produce 19 chromosomes (Fig. S2). The numbers of cosegregating markers between pairs of the six parental maps ranged from 2 to 556 (Fig. 2). There were a substantial number of cosegregating markers between the five M. sinensis parental maps (91-556), especially among the three M. sinensis 'Cosmopolitan' maps (346-556). Moreover, there was high collinearity of cosegregating markers for the M. sinensis parental maps (Fig. S3). However, between the M. sacchariflorus 'Robustus' map and the other five M. sinensis maps, there were only 2-13 markers in common (Fig. 2). The M. sinensis consensus map consisted of 6014 markers assembled from five M. sinensis parental maps (Fig. S4a). High collinearity of marker orders was observed between five M. sinensis parental maps and the M. sinensis consensus genetic map (Fig. S4b). The MapA-derived F 2 population genetic map, which we used to integrate the M. sacchariflorus and M. sinensis maps, consisted of 810 markers across 19 LGs (Fig. S5a). Collinearity of the cosegregating markers among the F 2 genetic map, the M. sacchariflorus 'Robustus' map, and the M. sinensis consensus map was confirmed (Fig. S5b). The interspecific F 2 genetic map provided 35 cosegregating markers with the M. sacchariflorus 'Robustus' map and 25 cosegregating markers with the M. sinensis consensus map, which enabled analysis of synteny between M. sacchariflorus and M. sinensis and construction of a cross-species consensus genetic map. Thus, our Miscanthus consensus genetic map consisted of 7618 markers assembled from the six parental maps (Table 5

QTL analyses
Over 2 years and six parental maps, 288 QTLs were identified with CIM and 264 with stepwise regression (Figs 4, S6, S7 and Table 6). Meta-QTL analyses indicated that the CIM-derived QTLs and stepwise-derived QTLs resolved into 157 and 136 meta-QTLs, respectively (Figs 4 and 5), including a common set of 118 meta-QTLs. In addition, parental map CIM analysis and parental map stepwise analysis produced nearly identical QTL confidence intervals (Fig. 4). Consensus map single-population stepwise analysis and joint-population stepwise analysis identified 133 and 109 QTLs, respectively (Figs 4, S8 and Table 6), which resolved into 106 and 86 meta-QTLs, respectively (Figs 4 and 5). Axillary bud number was not included in joint-population analysis because it was phenotyped only in MapA; if we excluded the axillary bud number meta-QTLs, there were 130 and 101 meta-QTLs for single-population stepwise analysis and joint-population stepwise analysis, respectively. Among three stepwise analyses, the number of meta-QTLs in common between individual parental map analyses and single-population analyses, between individual parental map analyses and jointpopulation analyses and between single-population analyses and joint-population analyses were 89, 70 and 63, respectively (Fig. 4). These four analysis methods identified a common set of 59 meta-QTLs (Figs 4, S9 and Data S7). Of the 59 in-common meta-QTLs, the number identified for ratio of compressed circumference to basal circumference, diameter of basal internode, culm node number, flowering time, plant height, yield per plant and yield per footprint were 3, 9, 6, 7, 5, 2 and 5, respectively. Average confidence intervals around the 59 in-common meta-QTLs for the four analysis methods ranged from 17.4 to 23.2 cM, with the joint-population analysis providing the best resolution (Fig. 4). Similarly, overall meta-QTLs, the confidence intervals ranged from 19.2 to 26.8 cM, with the shortest interval again produced by the joint-population stepwise analysis (Fig. 4). More QTLs were detected in 2013 (third year growth) than 2012 (second year growth) for each of the four QTL analysis methods, with the difference greatest for the individual parental map analysis (180 vs. 104), and smallest for the joint-population analysis (66 vs. 43) ( Table 6). Of the QTLs identified in 2012, the percentage re-identified in 2013 was lowest for the single-population analyses (48%); for the individual parental map CIM analyses (54%), individual parental map stepwise analyses (56%) and the joint-population analyses (53%), they were similarly high (Table 5). In comparison, Gifford et al. (2015) reported that 22 of 36 QTLs (61%) identified in the first year were re-discovered in the second year analysis for a population of 221 M. sinensis F 1 s.
Many of the QTLs that we detected colocalize with QTLs for other traits (Figs 5, S6, S7, S8 and S9), which was consistent with the moderate to high genetic correlations observed. In MapA, a major QTL cluster on LG10 was found; LGs 3, 4, 7 and 14 also had QTL clusters, although with fewer traits having overlapping confidence intervals than the LG10 cluster. Interestingly, the QTL cluster on LG19 was a unique region in the M. sacchariflorus 'Robustus' map where no QTLs were identified in the other five M. sinensis genetic maps in this study (Fig. S6). In MapB, LGs 3, 6, 8, 9 and 13 had QTL clusters. In MapC, which was the smallest population, only LG 5 had a QTL cluster.
The four-way cross model in R/qtl together with inferred marker phase information from JoinMap provided a means to calculate maternal, paternal and interaction effects of QTL. Most QTLs exhibited nonzero interaction effects in each of the three populations (Data S7), which suggests some degree of nonadditive gene action (i.e. dominance or epistasis). The largest allelic main effects (i.e. additive) observed in this study were derived from the MapA female parent M. sacchariflorus 'Robustus' for basal circumference (68.4 cm) and first flowering date (À34.0 days) (Data S7). This was consistent with our field observations that M. sacchariflorus

Phenotypic analysis
Within each of the three populations, great variation among progeny was observed for most of the yield traits studied (Fig. 1). Especially notable was the high frequency of advantageous transgressive segregants, such as those with higher yield per plant or yield per footprint, taller stems, and greater ratio of compressed circumference to basal circumference (a proxy for stems per area) than both parents. Transgressive segregation is ubiquitous in plants and most often observed in intraspecific crosses of inbred domesticated plant populations, but less frequently observed in interspecific crosses between outbred species (Rieseberg et al., 1999). In this study, MapA was an interspecific population, and MapB and MapC were intraspecific populations, yet in all three, transgressive segregation was common.
These data demonstrate that there was adequate genetic variation for biomass yield traits to make substantial gains from selection in these diploid inter-and intraspecific Miscanthus populations.
Moreover, the data from this study validate the strategy of crossing southern-adapted selections with northern-adapted selections to breed for improved yield in the northern US. In the current study, the parent incommon to all three populations, M. sinensis ssp. condensatus 'Cosmopolitan', was the least cold hardy genotype and best adapted to hardiness zone 7 (avg. min. temp.: À17.8°C) and warmer; the subspecies condensatus is endemic to coastal southern Japan and adapted to a mild maritime environment. Thus, 'Cosmopolitan', which is variegated, and its green-leafed derivative 'Cosmopolitan Revert' are unable to reach their full yield potential in Urbana, IL (zone 6, avg. min. temp.: À23.3°C) because plantings are typically damaged during the winter; though in southern U.S. environments, their tall, thick stems and late flowering are advantageous. In contrast to 'Cosmopolitan', the other three parents of the population in this study typically perform well in central Illinois  but are    earlier flowering and have thinner stems (Fig. 1). In particular, M. sacchariflorus 'Robustus' was the most cold hardy genotype and had the earliest flowering time among these four parents (Fig. 1); additionally, M. sacchariflorus 'Robustus' had long rhizomes with relatively few culms per area, which is typical of this species and may contribute to their exceptional winter-hardiness, whereas the M. sinensis parents, had short rhizomes and a caespitose habit that is typical of that species. For Miscanthus and many other perennial temperate-adapted grasses (e.g. prairie cordgrass and switchgrass), accessions from lower latitudes typically grow taller, have thicker stems, flower later and yield more than accessions from higher latitudes when grown in a common garden, if the low-latitude accessions are sufficiently well-adapted to the test environment to survive without winter-injury (Yan et al., 2012;Guo et al., 2015;Grabowski et al., 2017). In addition to the high number of advantageous transgressive segregants observed, we found that the QTL alleles conferring the greatest gains in yield per footprint and yield per plant originated from the parents that were the most extreme in adaptation ('Cosmopolitan' and 'Robustus', respectively). The current study demonstrates that crosses between a southern-adapted Miscanthus genotype and three northern-adapted parents can result in many progeny that exceed both parents' yield potential in a northern coldtemperate environment (Fig. 1). Improving biomass yield is the primary goal in Miscanthus breeding. Previous studies reported Miscanthus dry biomass yield as 0.7-30 t ha À1 (Robson et al., 2013;. M9g '1993-1780', our high-yield control, produced 24.5 t ha À1 in 2013, which is consistent with previous studies for year 3 yields of this genotype in the Midwest . Although 23 progeny in MapB yielded more per plant than M9g '1993-1780' in 2012, none of the progeny outperformed M9g '1993-1780' in 2013. The highest yields per plant in MapA, MapB and MapC were 3749, 4165 and 5299 g in 2013 (Data S3), respectively, which were equivalent to 16.7, 18.5 and 23.6 t ha À1 based on the planting density in this study (1.5 m within and between rows). For yield per footprint, which sets the theoretical upper limit of biomass production per planting area, the three populations exhibited superior performance over M9g '1993-1780', which was partly because M9g '1993-1780' had significantly greater basal circumference and therefore required more growing space (Fig. 1, Data S3). Based on the strong positive genetic correlations between compressed circumference to basal circumference ratio (a proxy for number of stems per area) and yield per footprint (0.63-0.90, Table 4), optimizing planting density for each cultivar could be a way to improve Miscanthus biomass production. In comparison, continuing gains in grain yield of US maize during the last century have been mainly due to the development of hybrids adapted to higher plant density rather than more grain per plant (Duvick, 2005;Tian et al., 2011).
All three populations had considerable variation for culm dry weight and volume, but only the intraspecific MapB had substantial (and advantageous) transgressive segregation for these traits. Large thick stems can contribute to low rates of lodging and to high yields. However, there appeared to be relatively less variation for culm density (g cc À1 ) in these populations, in comparison with the great genotypic variation observed for the other stem traits we evaluated. If thin stems and thick stems are similarly dense, then there are two differing likely strategies to breeding for high yield: (1) tall stems that are sufficiently thick to avoid lodging, with moderate numbers of stems per area, or (2) many thin stems per area that are not so tall as to lodge.
Flowering time is an important adaptation trait, for which great diversity among genotypes of M. sinensis and M. sacchariflorus has been observed (Jensen et al., 2011). The distribution of flowering time in the MapA population, which had the earliest parent (M. sacchariflorus 'Robustus'; 1 st flowering date of 15 July 2013) and latest parent (M. sinensis 'Cosmopolitan'; 1 st flowering date of 18 September 2013), was skewed towards the early flowering parent (Fig. 1, Data S3), suggesting dominant gene action. In MapB, flowering time was similarly skewed towards the early parent, although in this case both parents were M. sinensis, and the early parent, 'Silberturm', flowered later (1 st flowering date of 2 Sep 2013) than the 'Robustus' parent in MapA (Fig. 1,  Data S3). Interestingly, the distribution of flowering times in MapA had a long tail, with some progeny flowering later than the late-flowering parent, and there were even some progeny that did not flower at all. Moreover, the latest-flowering progeny in MapA was among the shortest in the study. Jensen et al. (2012) determined that M. sacchariflorus is a quantitative short day plant, which could potentially account for differences observed between MapA and the two intraspecific M. sinensis populations. Although little is currently known about the regulation of flowering time in M. sinensis, evaluation of a large germplasm collection, primarily of Japanese and Korean provenance, in a UK common garden found a weaker negative association between flowering time and latitude of origin for M. sinensis than for M. sacchariflorus (Jensen et al., 2011). Progeny in MapC, which had two late-flowering M. sinensis parents, flowered later on average than the progeny of the other two populations; in fact, the earliest-flowering MapC progeny were typically later than average-flowering progeny of the other two populations. The population distribution of MapC was not skewed towards either parent, in contrast to the other two populations, suggesting that gene action for flowering time was population-specific. Overall the data from this study indicate that selection for later or earlier flowering within inter-or intraspecific Miscanthus populations is expected to be feasible.
In the Saccharinae, later flowering is typically associated with greater numbers of nodes due to later transition from vegetative to reproductive growth, and consequently taller stems, and higher biomass yields (e.g. tropical maize grown under long days, sweet sorghum and sugarcane) (Allison & Daynard, 1979;Zhao et al., 2009;Coelho et al., 2014). Similarly, some prior studies of Miscanthus germplasm panels have found that later flowering is correlated with higher biomass yield (Clifton- Brown & Lewandowski, 2002;Zub et al., 2011;Jensen et al., 2012;Yan et al., 2012), and some of these studies have also found positive correlations between later flowering and taller and thicker stems, which contribute to biomass yield. In contrast to these prior studies of germplasm panels, the current study of segregating biparental populations revealed weak and mostly negative genetic and phenotypic correlations between flowering time and yield. However, genetic correlations between culm node number and flowering time were moderate to strong among our three populations (0.51-0.85), as expected, with the strongest correlation for the interspecific MapA population, which is consistent with the findings of Jensen et al. (2011). After floral initiation, which determines the number of nodes, flowering time can be further influenced by temperature, availability of water and day length. The genetic correlations between culm node number and plant height were positive for the intraspecific populations (0.34-0.43) but unexpectedly negative for the interspecific MapA population (À0.63). Thus for MapA, we can deduce that the progeny with more nodes must have had on average shorter internodes than progeny with fewer nodes, thereby counter intuitively resulting in shorter plants in the genotypes with the most nodes. Such an unexpected outcome for the interspecific population may be the result of high sensitivity to day length contributed by the M. sacchariflorus parent. It is also possible that genes controlling culm node number and plant height have epistatic effects between M. sinensis and M. sacchariflorus. Genetic correlations between culm node number were weakly and positively associated with yield per plant for all three populations (0.05-0.25), whereas the correlations with yield per footprint were positive for MapA (0.03) and Map C (0.17) but negative for MapB (À0.17). Thus, for the three populations in the current study, the length of time to transition from vegetative to reproductive growth had a moderate effect on a genotype's height but a lesser effect on its biomass yield. In these populations, the number of stems per area, as indicated by the ratio of compressed circumference to basal circumference ratio, had a much greater effect on a genotype's biomass yield potential than plant height. Similarly, Matumura et al. (1986Matumura et al. ( , 1987 found that optimizing the number of culms per area was key for increasing biomass yield of new M9g progeny.

High-density genetic map
A consensus genetic map of Miscanthus with 7618 SNP markers, the most dense map developed for this crop to date, was assembled from five M. sinensis parental genetic maps and one M. sacchariflorus map. To the best of our knowledge, this is the first high-density genetic map that integrates M. sinensis and M. sacchariflorus. Moreover, we confirmed for the first time a whole-genome duplication in M. sacchariflorus relative to Sorghum bicolor, similar to that previously documented for M. sinensis (Kim et al., 2012;Ma et al., 2012;Swaminathan et al., 2012). All six parental genetic maps were also linked to our previously published M. sinensis map (Liu et al., 2015). Development of a consensus map that included both M. sinensis and M. sacchariflorus were challenging because there were very few biparental markers shared between M. sacchariflorus 'Robustus' and the M. sinensis parents, likely due to genetic differentiation since speciation. The phenomenon of few biparental markers in interspecific crosses was also reported for SSR-based maps of an M. sacchariflorus 9 M. sinensis F 1 population (Kim et al., 2012), and crosses between Eucalyptus grandis 9 E. urophylla (Grattapaglia & Sederoff, 1994), Populus adenopoda 9 P. alba (Yin et al., 2001) and Dendrobium officinale 9 D. hercoglossum (Xue et al., 2010). To circumvent the problem of few biparental markers in the interspecific MapA F 1 , we used an F 2 population derived from MapA, which significantly improved the number of markers cosegregating in the M. sacchariflorus 'Robustus' map and M. sinensis genetic maps, and thereby facilitated the assembly of the six maps. The parental genetic maps and the Miscanthus consensus genetic map in this study are valuable resources for marker-assisted selection and genomic studies in Miscanthus. In addition, the tag sequences of mapped markers in this study have been used to facilitate a whole-genome assembly of Miscanthus by resolving uncertainties in assembling large scaffolds caused by high heterozygosity and the recent genome duplication (K. Swaminathan, personal communication). The tag sequences of mapped markers could also be potentially converted into new specific markers for high-throughput screening of populations for trait-associated SNPs without doing RAD-seq.

QTL analysis
For Miscanthus, six genetic mapping studies have been conducted previously, and all of them have been on single biparental populations (Atienza et al., 2003a,b,c,d;Gifford et al., 2015;Liu et al., 2015). The current QTL study is the first to use a multiparent interconnected population of Miscanthus. Among the four methods for identifying QTL that we compared (individual parental map CIM analysis, individual parental map stepwise analysis, single-population stepwise analysis and jointpopulation stepwise analysis), we identified 59 meta-QTLs in common (Figs 4 and S9); it is these QTL for which we have the highest confidence.
We compared the CIM and stepwise methods for individual parental map QTL analysis. Identification of a large in-common set of 118 meta-QTLs confirmed the similarity in results of both methods. In addition, similar confidence interval estimates were observed for the two methods (17.8 AE 1.3 cM in CIM and 17.9 AE 1.4 cM in stepwise for the 59 meta-QTLs; 22.6 AE 0.9 cM in CIM and 22.4 AE 1.1 cM in stepwise for all meta-QTLs, Fig. 4). However, we still observed some different QTLs between the individual parental map CIM analysis (157 meta-QTLs) and the individual map stepwise analysis (136 meta-QTLs). Although CIM is powerful, the choice of covariates is critical. With user's input to determine the number of cofactors for forward selection, either too many or too few will reduce the power to detect additional QTL. Furthermore, the subsequent scan fails to account for the uncertainty of previously selected marker covariates and can give an overly optimistic estimation of the precision of QTL location (Broman et al., 2003). This may explain why we identified 21 more meta-QTLs using CIM (157 meta-QTLs) than with stepwise regression (136 meta-QTLs). Stepwise regression, however, is susceptible to collinearity errors. Because detection of a QTL and estimates of QTL effects are affected by other QTLs in multi-QTL models (Zeng, 1994), collinearity among genetic markers could complicate the detection of QTLs and estimation of QTL effects, and this problem is especially exacerbated when using high-density genetic maps . Thus, collinearity could have contributed to the identification of 18 meta-QTLs by stepwise regression that were not detected by CIM.
Though twenty more meta-QTLs were identified by the single-population analyses (106 meta-QTLs) than by the joint-population analysis (86 meta-QTLs), the higher rate of reidentifying meta-QTL over years for the latter analysis indicated higher confidence in the QTL identified by the joint analysis method. Joint-population analysis has been found to have higher power to detect QTL shared among multiple populations but may have lower power to detect rare QTL because adding more populations with nonsignificant QTL increases the sample variance and therefore cause some QTL signals to be weak (Li et al., 2011;Ogut et al., 2015). For example, in our study, few QTLs were detected on LGs 16 and 17 using single-population analysis but none were detected using joint-population analysis (Fig. S8). However, for jointpopulation analysis, the increased population size may have contributed to the detection of additional QTLs ; for example, no QTLs were detected on LG 18 using single-population analysis, whereas one culm node number QTL and one culm basal internode diameter QTL were detected using joint-population analysis (Fig. S8). Additionally, the joint-population analysis had modestly better resolution (i.e. shorter confidence intervals and smaller SEs) than the other three analyses (Fig. 4), which would be advantageous for marker-assisted selection.
Across three populations in this study, we detected several clusters of QTLs for multiple traits, especially on LGs 3,4,7,10,14,19 in MapA,on LGs 3,6,8,9,13 in MapB and on LG 5 in MapC (Figs S6 and S7). Traits that had QTLs clustered together also showed substantial genetic correlations with each other. For example, on LG 3 in MapA, basal circumference, compressed circumference and yield per plant both had one QTL located at the similar location; the genetic correlations between the three traits in 2013 ranged from 0.49 to 0.75. On LG 6 in MapB, culm basal internode diameter, culm topmost internode diameter, culm dry weight and culm volume all had QTLs located at a similar position and the genetic correlation coefficients ranged from 0.70 to 0.81. Likewise, on LG5 in MapC, culm basal internode diameter, culm dry weight, culm volume and culm node number had QTLs clustered together, and genetic correlations between the first three traits ranged from 0.48 to 0.87. Although culm node number showed low to medium genetic correlations with culm basal internode diameter and culm volume (0.08-0.31), it had a high genetic correlation with culm dry weight (0.77 in 2012 and 0.66 in 2013). In addition, colocalization of QTLs could also be due to aliasing of traits. For example, culm basal internode diameter and culm topmost internode diameter both evaluate culm diameter but at different positions, and these two traits also had high phenotypic correlations (0.64-0.81). A previous study of an M. sinensis F 1 population by Gifford et al. (2015) also found two major QTL clusters on LGs 3 and 6. It is possible that genes underlying these QTL clusters might have pleiotropic effects, or the QTL clusters are due to multiple linked genes. The species-specific QTL cluster on LG 19 in M. sacchariflorus 'Robustus' could be due to differentiation of M. sacchariflorus and M. sinensis during their evolution, since Miscanthus diverged from the Saccharum lineage 3.1 million years ago (Kim et al., 2014).
Nonzero interaction effects between maternal alleles and paternal alleles in this study indicated some degree of nonadditive (i.e. dominant or epistatic) effects. Previous QTL studies in Miscanthus also reported significant QTL allelic interactions for total yield, stem yield, tops yield, leaf yield (Atienza et al., 2003a), height, flag-leaf height, stem diameter (Atienza et al., 2003b) and combustion quality (Atienza et al., 2003c). Gifford et al. (2015) studied 13 biomass traits and also observed significant allelic interactions in the detected QTLs. These nonadditive effects could have contributed to the transgressive segregation that we observed (de Vicente & Tanksley, 1993), although additive gene models for transgressive segregation (de Vicente & Tanksley, 1993;Rieseberg et al., 1999;Birchler et al., 2010) cannot be ruled out for our data. In addition, some QTLs detected in this study were of large effect. It is important to keep in mind that these estimates could be biased upward due to limited population size (Beavis, 1998). Additionally, large effect QTLs might fractionate into multiple linked small effect QTLs upon closer examination (Noor et al., 2001;Studer & Doebley, 2011).
We compared our QTLs with those identified in previously published Miscanthus mapping studies (Data S7; Slavov et al., 2014;Gifford et al., 2015;. Based on S. bicolor chromosome positions to which Miscanthus markers were aligned, we were able to find QTL correspondence for similar traits between this study and previous studies. From a genome-wide association study (GWAS) of M. sacchariflorus based on phenotypic data taken at collection sites in eastern Russia , a SNP associated with number of stems per area (53 659 234 bp S. bicolor chromosome 2) was identified in an untranslated region of a protein kinase gene, and we found it to be <1 cM from the peak of a QTL we identified on M. sacchariflorus 'Robustus' LG 4 for the ratio of compressed circumference to basal circumference. Another SNP identified by , associated with basal stem diameter in the previous GWAS (57 761 452 bp on S. bicolor chromosome 6), was within the confidence interval of our basal internode diameter QTL on LG 12 (Data S7). Similarly, we identified twelve QTLs in our study that had overlapping confidence intervals with QTLs reported by Gifford et al. (2015) for a biparental M. sinensis population (Data S7). Additionally, two SNPs associated with stem length by Slavov et al. LGs 2 and 19, respectively (Data S7). About 10% of the QTLs detected in each of our four QTL analysis methods corresponded with QTLs identified in three previous studies of Miscanthus (Slavov et al., 2014;Gifford et al., 2015;. Detection of the same QTLs in multiple, independent studies increases our confidence in their validity and their usefulness for plant breeding. Genomic synteny and collinearity are a common feature in the Poaceae (Devos & Gale, 1997). Recent comparisons between Miscanthus and sorghum genomes were consistent with this finding (Kim et al., 2012;Ma et al., 2012;Swaminathan et al., 2012). Previous studies have shown that genes/QTLs for domestication traits often correspond across grass species (Paterson et al., 1995a,b;Ming et al., 2002;Hu et al., 2003). Intriguingly, using physical positions of Miscanthus markers on the sorghum genome and location of major maturity genes in sorghum (Mace & Jordan, 2010), we were able to find one flowering QTL on LG 2 in this study residing in the vicinity of sorghum maturity gene Ma3 (Table 7, Data S7). Similarly, Gifford et al. (2015) also found overlapping confidence intervals between Ma5 with a Miscanthus QTL. In addition, our flowering QTL on LG 1 (65.8-94.4 cM) contained ASYMMETRIC LEAVES2-like1 gene, which controls proximal-distal patterning in Arabidopsis petals (Chalfun-Junior et al., 2004). Although Nagano et al. (2015) indicated that Hd1 gene can affect flowering time in Miscanthus, especially differences among M. sinensis groups, no QTLs were detected near this gene in this study. Cinnamoyl Coa reductase 1 is the first committed enzyme of the lignin branch biosynthetic pathway (Lacombe et al., 1997) and was located within the confidence intervals of QTLs on LG 10 for basal circumference, ratio of compressed circumference to basal circumference, culm basal internode diameter, culm dry weight, culm volume and yield per plant. Mineral transporters are key players in mineral acquisition, transport and recycling in plants (Schwartz & Amasino, 2013;Palmer et al., 2014). In this study, mineral transporters such as potassium transporter 2, zinc transporter 3 precursor, high-affinity K + transporter 1, nitrate transporter 2.4, sulphate transporter 2 were found within QTL confidence intervals for basal circumference, culm node number, plant height and yield per footprint. In addition, cellulose synthase-like B4, D2, D5 enzymes were located within QTL confidence intervals for yield per footprint, culm topmost internode diameter, culm basal internode diameter and culm volume. The culm volume QTL on LG 2 contained the Expansin B2 (EXPB2) gene, which is predicted to cause loosening and extension of plant cell walls by disrupting noncovalent bonding between cellulose microfibrils and matrix glucans in rice (http://www.uniprot.org/uniprot/O24230). Thus, this study has resulted in many new hypotheses for future Table 7 Putative candidate genes that are within the confidence intervals of QTLs for basal circumference (BC), compressed circumference to basal circumference ratio (CC/BC), culm dry weight (CmDW), culm node number (CmNN), culm volume (CmVol), diameter of basal internode (DBI), diameter of topmost internode (DTI), flowering traits PC1 testing, regarding which genes influence the traits we observed (Table 7, Data S8). In summary, the advantageous transgressive segregation for yield and many of its component traits observed in each of the three populations we studied is encouraging for breeding improved cultivars of Miscanthus. Significant positive correlations between yield components provide opportunities for indirect selection and making concurrent genetic gains in multiple traits. A complete M. sacchariflorus genetic map, as well as a high-density consensus genetic map integrating M. sinensis and M. sacchariflorus, was reported for the first time. QTL analyses were conducted using four separate methods for each of 2 years' data. We found jointpopulation analysis could be a useful tool given its higher resolution, but we observed that it lacked power to detect rare QTLs, which can also be valuable in breeding.

Supporting Information
Additional Supporting Information may be found online in the supporting information tab for this article: Figure S1. The female and male genetic maps within each population.  Figure S3. Marker collinearity between Miscanthus sinensis parental maps: (a) genetic maps of M. sinensis 'Cosmopolitan', the common parent of three interconnected Miscanthus populations. Figure S4. Miscanthus sinensis consensus genetic map: (a) consensus genetic map of five M. sinensis parental maps for three diploid F 1 Miscanthus populations (one map per parent, developed via the psuedo-testcross method for obligately outcrossing species). Figure S5. MapA-derived F 2 genetic map and its collinearity with the Miscanthus sinensis consensus map and the M. sacchariflorus 'Robustus' map: (a) Genetic map of MapA-derived F 2 population (MapA: M. sacchariflorus 'Robustus' 9 M. sinensis 'Cosmopolitan'). Figure S6. Chromosomal locations of 288 QTLs identified on each of six parental genetic maps in 2012 and 2013 using composite interval mapping. Figure S7. Chromosomal locations of 264 QTLs identified on each of six parental genetic maps in 2012 and 2013 using stepwise regression. Figure S8. Chromosomal locations of 133 QTLs identified in the consensus map single-population stepwise analyses, and 109 QTLs identified in the joint-population stepwise analyses in 2012 and 2013. Figure S9. Chromosomal locations of 59 meta-QTLs identified in common among four QTL-identification strategies: (1) individual parental map composite interval map analysis, (2) individual parental map stepwise analysis, (3) consensus map single-population stepwise analysis, and 4) consensus map joint-population stepwise analysis. Table S1. Principal component analysis summary of six flowering traits (first flagging date, first heading date, first flowering date, 50% flagging date, 50% heading date, 50% flowering date) for three diploid F 1 Miscanthus populations established in a field trial with three clonal replications at Urbana, IL in 2011. Appendix S1. Detailed information on the methods used in the study including statistical models in phenotypic data analysis, RAD-seq libraries preparation, SNPs filtering, and data analysis scripts. Data S1. ANOVA of each trait for 13 in-common Miscanthus control genotypes across the three field plantings to investigate possible effects of environment, and genotype by environment interactions. Data S2. ANOVA of each trait within each of the three populations. Data S3. Trait summary statistics for each of the three populations.
Data S4. Population mean comparisons for each trait based on Fisher's LSD test.
Data S5. Genetic and phenotypic correlations between traits in the three populations.
Data S6. Detailed genetic map information including marker names, segregating alleles, tag sequences, Miscanthus genetic map positions and corresponding physical positions on the sorghum genome (version 2.0). Data S7. Summary of QTLs identified by each of four analysis methods.
Data S8. Complete list of candidate genes for meta-QTLs.
Data S9. Raw phenotypic data of progeny, parents and controls within each population.