High‐density linkage map reveals QTL underlying growth traits in AP13×VS16 biparental population of switchgrass

Switchgrass (Panicum virgatum L.), a native warm‐season perennial grass, is being considered as a feedstock for biofuel production in the United States. To expedite its genetic improvement and enhance genetic gain per selection cycle, application of marker‐assisted selection is indispensable. A high‐density linkage map was constructed in a pseudo‐F1 testcross mapping population of AP13×VS16, consisting of 349 progenies. A total of 8,757 single nucleotide polymorphism (SNP) markers generated through genotype‐by‐sequencing (GBS) were used to construct the linkage map. The total map length spans up to 2,540.2 cM with the marker density of one marker in every 0.25–0.34 cM. Spring green‐up (SG), days to flowering (FL), and the vegetative growth period (VP) data were analyzed and used for quantitative trait loci (QTL) mapping. The population showed significant variations and exhibited transgressive segregation for SG, FL, and VP. QTL analyses were performed using trait mean of each year and location along with BLUP (best linear unbiased prediction) values of the traits. A total of 35, 37, and 34 QTL for SG, FL, and VP, respectively, were identified. Phenotypic variability explained by each QTL ranged from 11.29% to 27.85%. The additive genetic effects of individual QTL ranged from −1.81 to 2.40, −6.12 to 7.58, and −16.01 to 6.38 for SG, FL, and VP, respectively. Comparing major QTL regions in the switchgrass genome, 20 candidate genes were identified which were reported to be involved in growth‐, development‐, and flowering‐related traits in switchgrass.


| 673
ALI et AL. 2006). Relatively high and reliable productivity across a wide geographical range, low input requirement, and suitability in marginal land that is too dry and uncultivable for other crops (Wright & Turhollow, 2010), makes switchgrass a preferred second-generation (lignocellulosic) feedstock for bioenergy production.
Highly interrelated traits, such as spring green-up (SG), flowering (FL), and vegetative growth period (VP), contribute directly to biomass production. In annual crop species, early planting enhances biomass production by extending VP (Danalatos & Archontoulis, 2010). However, in a perennial species like switchgrass, early spring green-up after long winter dormancy plays a significant role in biomass production.
Days to flower is an important agronomic trait, which plays a key role in the adaptation and geographic distribution of switchgrass ecotypes. The upland ecotypes are late in spring green-up and early in flowering which enables them to adapt to the short growing season in the higher latitude areas of the United States. On the other hand, the lowland ecotypes green-up early in the spring and are late in flowering. Flowering time plays a key role in biomass yield with one day delay increased biomass yield by 0.47 Mg/ha in lowland switchgrass ecotype (Casler, 2014). Negative correlations have been reported between biomass yield and maturity (Newell, 1968;Talbert, Timothy, Burns, Rawlings, & Moll, 1983), indicating that late flowering cultivars accumulate more biomass. Therefore, identification of the QTL underlying flowering in population of a lowland × upland cross would have a significant impact in switchgrass breeding as a tool for trait enhancement.
Vegetative growth period was reported to have an essential impact on biomass yield (Demura, Ye, & Ye, 2010). Biomass yield of switchgrass increased with the increase of VP duration (Van Esbroeck, Hussey, & Sanderson, 1998). The number of days between the SG and FL actually represents the VP, where nearly all of the biomass accumulation takes place. Longer generative and vegetative growth period, faster growth rate, and greater response to input contributed to greater biomass yield (Kusmenoglu & Muehlbauer, 1998).
Early SG and late FL in switchgrass are the likely characteristics to extend the VP that contribute to the biomass accumulation. The VG also substantially increases grazing period while maintaining the forage quality. The shift from the vegetative to the reproductive growth signifies the downhill of the biomass accumulation because the photosynthetic products are mostly used to complete the development of reproductive parts. Dry matter digestibility and concentrations of crude protein, ADF (acid detergent fiber), NDF (neutral detergent fiber), and cellulose decline with the maturity of the switchgrass hay (Burns, Pond, Fisher, & Luginbuhl, 1997;Griffin & Jung, 1983;Twidwell, Johnson, Cherney, & Volenec, 1988).
Marker-assisted selection (MAS) showed great importance to enhance genetic gain in crop improvement. Uniformly distributed and relatively high-density markers in the genome are necessary for effective QTL mapping and marker-trait association studies. Generation of large number of SNPs from genotyping-by-sequencing (GBS) and from whole genome and exome sequencing has proven to be powerful resources employed in the refinement of breeding practices, even for complex polyploid plant species. Genetic linkage map construction, QTL analysis, and genome-wide association studies (GWAS) are being implemented in MAS and genetic selection techniques in molecular plant breeding programs. GBS has been proven to be a very cost-effective and useful method in generating SNP markers that covers whole genome (Elshire et al., 2011;He et al., 2014;Poland, Brown, Sorrells, & Jannink, 2012;Poland & Rife, 2012;Sonah et al., 2013;Spindel et al., 2013). However, the challenges with GBS are the lack of uniform and enough coverage of the genome to call SNPs accurately. In addition, proportion of genotypes with missing markers and their trade-off, and the assignment of markers in proper order within the chromosomes are considered the major hurdles of GBS. Following the GBS method, 1.2 M SNP markers were generated in switchgrass (Lu et al., 2013), but only 3,000 of them could be used in constructing linkage map.
Construction of high-density linkage map and identification of genes linked to QTL would have a valuable role in increasing genetic selection gain by employing markerassisted breeding. The objective of this study was to construct linkage map using SNP markers and to detect the QTL underlying growth traits that contributes to biomass yield, determine their effects and breeding implications. In this study, a high-density linkage map was constructed in a pseudo-F 1 testcross population, AP13×VS16, using 8,757 haplotype SNP markers generated through GBS. It is expected this linkage map will be a valuable resource in resolving some of the bottlenecks of low-density linkage map, especially for finding genes associated with QTL accurately. However, major challenges are how to integrate large number of markers in constructing linkage map and subsequently in analyzing QTL. This study reports important modifications in the procedures for generating linkage map and QTL analysis for switchgrass growth traits, such as SG, FL, and VP.

| The mapping population
A pseudo-F 1 testcross mapping population was developed by crossing two heterozygous genotypes, AP13 and VS16. AP13, the female parent, was a selection from a lowland cultivar 'Alamo' (Casler, 2014;Serba et al., 2013). AP13 is characterized by early regrowth in the spring, late heading, tall plants up to 270 cm, strong shoot that withstands lodging, and high biomass yield. VS16, the male parent, was a selection from the upland cultivar 'summer.' It is characterized by late regrowth, early heading, short plant height with slender stem, and low biomass yield. The population was originally developed at the University of Georgia (Missaoui, Paterson, & Bouton, 2005) and further expanded at the Noble Research Institute (Serba et al., 2013). A total of 349 pseudo-F 1 progenies of the population were used for linkage map construction of which 251 were used for phenotyping. Genotypes with significant missing markers were excluded, and a final set of 120 genotypes were used for linkage map construction and QTL analysis. The selected genotypes had at least 50% of the 14,212 markers.

| Field experiments and phenotypic data
Field experiments were conducted at two Oklahoma locations: Ardmore (Ard) and Red River research farm near Burneyville (RR), which were planted on July 19, 2007, andMay 08, 2008, respectively. The experiments were laid out in an R-256 honeycomb design (Fasoulas & Fasoula, 2010) with four replications. Details of field establishment and management practices were described previously (Serba et al., 2015(Serba et al., , 2013.
Data on SG and FL were collected from Ardmore during 2008-2011 and from Red River during 2009-2011. Spring green-up dates were recorded for each plant at the time of new shoot emergence from the crown during spring. Days to flower was recorded when 50% tiller of a plant had inflorescence. Spring green-up and FL dates were converted to calendar days with the help of 'Days360' function in Excel using the first day of the year as a starting date. Vegetative growth period was calculated as days from SG to FL.
Analysis of variance was conducted using PROC GLM procedure in SAS 9.3 ® (SAS Institute, Cary, NC, USA). Genotype, genotype × location, genotype × year, and genotype × location × year were considered as random effects. Statistical significance was determined at p ≤ 0.05. Frequency distribution of each of the traits was plotted for year-location combinations to check normality of the data. Correlation coefficients among SG, FL, and VP were calculated across all location-year using PROC CORR procedure in SAS 9.3 ® (SAS Institute).

| Marker development, genetic linkage map construction, and QTL analysis
Young leaf tissues from parents and F 1 pseudotestcross progenies were collected in a 2-ml tube and froze in liquid nitrogen. Samples were ground to fine powder. DNA was extracted using a DNeasy ® Plant Mini-prep DNA Extraction Kit (QIAGEN Inc., Valencia, CA, USA) following manufacturer's instructions. DNA concentrations were measured using a NanoDrop ND 1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE). GBS library preparation and sequencing were performed at Joint Genome Institute facility, Walnut Creek, CA, with 2 × 100 bp pair end run set in HiSeq2000. The sequence coverage was ½x to 8x. A set of 349 genotypes (out of 367) had enough sequence data to call meaningful SNPs.
SNP calling used an initial set of AP13-specific (Switchgrass v4.0 reference genome) intrachromosomal specific 51-mer markers to segregate the 18 switchgrass chromosomes. The criteria for the 51-mers was that: (a) They were found only in AP13 and not in VS16 and (b) the 52nd base that immediately follows the 51-mer segregates appropriately in the F 1 progenies. The 52-mers were then counted in the progenies. The linkage map was then constructed representing the two subgenomes combining the data from both AP13 and VS16 parents. A set of 455,999 haplotype markers were selected based on their segregation ratios and number of uncalled progenies. The final marker calls were phased into separate subgenomes, and the phasing was tested against a set of ~5,000 BAC clones. The error rate in phasing was estimated to be 0.41%.
A marker set of 14,212 were used in constructing genetic linkage map for each of the 18 chromosomes initially. The maximum likelihood method of JoinMap 4.1 (www.kyazma. nl) was used to order the markers within the chromosomes and to calculate initial map distances using haplotype (HAP) model. Map distances were then adjusted to Kosambi map function according to the formula suggested in JoinMap 4.1. As WinQTLCartographer (Wang, Basten, & Zeng, 2012) has limitation of 500 markers for each chromosome, thus linkage map was constructed with 8,757 markers-500 markers from each chromosome with the exception of chromosome 7N that contains only 257 markers. In selecting markers, the distance of markers from neighboring ones was first calculated using map distance of original map. Then, markers that are very close were removed until a maximum of 500 markers were obtained for each chromosome. This procedure ensures marker sets that were almost evenly distributed within the chromosome.
QTL mapping was conducted using Windows QTL Cartographer v2.5  considering the means as well as the BLUP values of the traits. Identification and reporting of the QTL were done using the composite interval mapping (CIM) approach (Zeng, 1994). Genome-wide logarithm of odds (LOD) thresholds of 2.5 was used to call for QTL. The cofactor markers were determined as implemented in standard CIM model using the forward-backward regression method. A walking speed of 2 cM in windows size of 10 cM was used. The physical map of switchgrass genomes flanking 50 kb up-and downstreams of the major QTL peak markers was scanned, and annotated genes within these regions were identified using switchgrass v4.1 annotation information.

| Phenotypic variations
Analysis of variance (ANOVA) showed significant variations among the genotypes of F 1 pseudotestcross mapping population for SG, FL, and VP (Table 1). In addition to the genotypic variation, there was significant genotype × environment interaction of these traits. Frequency distribution of the data revealed nearly normal distribution for all the traits in all location-year combinations (Figures 1-3). At the same time, transgressive segregation was prominent in all location-year combinations. Transgressive segregation could not be called for FL or VP in RR2011 since parental data were not available for these two environments. The mean SG for the population ranged from 58.8 to 89.1 days (  Table S1). SG and VP were negatively correlated.

| Genetic linkage map construction
The linkage map used for QTL detection constructed of 8,757 markers arranged in 18 chromosomes with a maximum of 500 markers in each chromosome. Map length varies from 67.5 to 168.4 cM ( Figure 4, Table 3). One marker was present in 0.25 to 0.34 cM, which means 2.97 to 4.06 markers present in every cM of the map distance. List of markers with map position for original and new linkage map is provided in Supporting information Tables S2 and S3, respectively. Estimation of pairwise recombination fraction showed almost perfect positioning of the markers on the chromosomes. There was no marker displacement found for any chromosome in the recombination fractions plot ( Figure 5).

| QTL analyses
A total of 106 QTL were recorded for all the three growth traits with a LOD of ≥2.5. A higher number of QTL (72) were associated with Ardmore location as compared with RR (34) (Supporting information Table S4). Chromosome 3K harbored the highest number of QLT (15) followed by chromosomes 2K (12) and 1K (11) (Supporting information Thirty-five QTL were detected for SG in seven locationyear combinations (Table 4). Eight QTL showed negative additive effects, while 27 had positive effects, an indication of abundance of late-spring green-up QTL in the population. The proportion of phenotypic variability explained (PVE) by the QTL ranged from 11.29% to 27.85%. Among the negative effect QTL, SG_9N_57447010 showed the highest PVE of 19.37 with additive effect of −0.73. The positive additive effect of the QTL ranged from 0.63 to 2.40, while the negative effect from −0.73 to −1.81. Five QTL showed at least 15% PVE and all had negative additive effects.
A total of 37 QTL for FL were detected (Table 5). The PVE of the FL-related QTL ranged from 11.31% to 26.95%. Twenty-four QTL had PVE value more than 15%: Eight of them had positive additive effects, indicating their influence on late flowering. The QTL, FL_3K_11371679, showed the highest PVE of 26.95% with positive allele effects of 3.69. In general, the PVE values of the QTL detected at Ardmore, especially in 2008 and 2011, were higher than those for the other year-locations.
Thirty-four QTL were recorded for VP (Table 6) with PVE values ranging from 11.49% to 23.10%. Eighteen QTL showed PVE more than 15% and seven had positive additive effects, which indicates long vegetative growth period that can enhance biomass production. The highest positive additive effect was observed for the QTL VP_2K_67003274, whereas VP_8N_40657439 showed the highest negative additive effect. VP_9N_63132454 and VP_9N_53557827 were colocalized in Ardmore for the year 2009 and 2011.

| QTL for BLUP values
Based on the significant genotype × environmental interactions, BLUP values for all three traits were estimated for each of the genotypes from all replication, location, and year data. A total of 16 QTL were identified using the BLUP values of which six were for SG, seven for FL, and three for VP (Table  7). One BLUP QTL of each of the traits appeared at the same position of their corresponding year-location QTL. These are as follows: SG_BLUP_6N_22131630 and SG_6N_20163874 for RR2009 (Tables 4 and 7), FL_BLUP_1K64692294 and FL_1K64692294 for Ardmore 2011 (Tables 5 and 7), and VP_BLUP_1K_57091180 and VP_1K_57091180 for Ardmore 2011 (Tables 6 and 7). LOD score, PVE, and additivity of the BLUP QTL ranged from 2.50 to 9.45, 11.36 to 43.65, and −6.17 to 4.09, respectively (Table 7).

| Identification of colocalized and pleotropic QTL
QTL that appeared at the same or overlapping positions across year-locations were examined for each of the traits. For SG trait, QTL SG_2N_86467959 and SG_2N_89345610 appeared in both Ardmore 2008 and 2009 at the same position of the genome. QTL SG_5K_6039252 of RR2009 and SG_5K_6040501 of Ard2009 overlap (Table 4). Similarly, QTL for VP, VP_3K_3071522, were detected both in Ard2009 and in RR2011 ( Table 6). Detection of the same QTL in different environmental conditions indicates the reliability of this QTL in MAS. After scanning common QTL among all three traits, four QTL were identified that are associated with both FL and VP traits. These colocalized or pleiotropic QTL are FL/VP_1N_6206723, FL/VP_1K_60899297, FL/ VP_2K_7303895, and FL/VP_7K_12994974 and they are located on chromosomes 1N, 1K, 2K, and 7N (Tables 5 and  6), respectively. Two colocalized BLUP QTL were identified for FL and VP, FL/VP_BLUP_2N_23594225, and FL/ VP_BLUP_3K_14668834.

| Mapping and annotation of the major QTL
We mapped 13, 13, and 9 QTL which had at least 15% PVE and typical QTL peaks for SG, FL, and VP, respectively, on switchgrass chromosomes ( Figure 6). These were considered as major QTL. QTL box and whisker values were calculated as width of QTL at 1 and 2 LOD values down from peak marker LOD scores, respectively. Chromosome 2N possessed the highest number of major QTL (5 for SG and 1 for FL, Figure 6). Chromosomes 1K and 2K harbor 4 QTL each for FL and VP. Four QTL were also found on 3K (2 SG, 1 Fl, and 1 VP ( Figure 6).
We reported annotation of 363 genes identified within 50 kb upstream or downstream regions from the peak marker of 10, 12, and 8 QTL for SG, FL, and VP traits, respectively (Supporting information Table S6). These QTL had at least one annotated gene. QTL RG_3K_7236072  (Ard2010) had the highest 23 annotated genes followed by QTL FL_3K_11371679 (Ard2008) with 21 annotated genes and QTL FL_3K_8820134 (Ard2008) possesses 19 genes. QTL RG_6K_18323886 (Ard2010) had only one annotation. Twenty genes were identified that are involved directly or indirectly in growth and development, flowering, and a number of biosynthetic pathways (Table 8). Senescence regulator (REG), CELLULASE, and WRKY transcription factors were found associated with QTL for SG in chromosome 2N. The 5K harbors Photosystem II lipoprotein (Psb) and MADS box transcription factor that were also associated with SG QTL. No apical meristem (NAM) protein is localized with SG QTL in chromosome 9N. Flavonoid 3ʹ,5ʹ-hydroxylase (F3ʹ5ʹH) and ethylene-insensitive protein 3 (EIN3) were associated with FL QTL, located in 7K and 7N, respectively.

| Parental genotypes and population variability
The two parental genotypes crossed to generate the population for this study greatly varied for all the traits under study. AP13, the female parent, belongs to lowland ecotype and was early in SG, late in FL, and has longer VP. Contrastingly, VS16, the male parent, belongs to upland ecotype and was late in SG, early in FL, and has a short VP. Both the parents are heterozygous due to the obligate outcrossing behavior of the species. The ultimate bioenergy goal of switchgrass breeding is to improve lignocellulosic biomass feedstocks for biofuel production. Three interrelated traits, SG, FL, and VP, play significant contributions to increase biomass production by early green-up in spring and delay in flowering, which increase VP. In switchgrass, even one day delay in flowering increases biomass yield by ~0.47 Mg/ha (Casler, 2014). Early heading shifts the vegetative regeneration into reproductive growth and literally stops biomass accumulation. Thus, the genotypes with early green-up coupled with late heading get longer vegetative growth period to boost their biomass yield.
Frequency distributions of SG, FL, and VP values in the population showed a normal distribution which is characteristic of quantitative traits. The variance component analysis revealed that the effects of environment and genotype × environment interaction are important. The low level of genotypic effect implies that this set of traits is controlled by many genes that have small contribution to the phenotype and largely influenced by the environment and genotype × environment interaction. When parents with many heterozygous loci are crossed, the genetic segregation can be observed in a full-sib family, which is the outcome of genetic recombination of alleles in the two parents during meiosis (Butcher et al., 2002). As a result, number of different segregation types with unknown linkage phases arise in the progenies (Lu, Cui, & Wu, 2004). Genotype means of progenies were lower or higher than the parental mean values. Progeny with earlier SG and later FL than AP13 were observed, demonstrating F I G U R E 5 Plots of estimated pairwise recombination fractions (upper triangles on the diagonal) and LOD scores (lower triangles on the diagonal) for all the markers. Red indicates very low recombination with high LOD values, and blue indicates very high recombination (unlinked) with low LOD value transgressive segregation for these traits. Such transgressive segregations were observed in wheat when favorable alleles were contributed by both parents or by linkage between favorable and unfavorable loci breaks (Mengistu et al., 2012). In either case, the combination of favorable alleles brings about superior performance in a progeny derived from hybridization of genetically diverse parents.

| Linkage map and genomic regions underlying growth
Quality and density of linkage map are very important precondition for effective QTL mapping. QTL mapping using low-density map has been considered as a factor hindering the identification of precise number and locations of T A B L E 4 QTL for spring green-up in the AP13×VS16 full-sib population grown at two Oklahoma locations detected using composite the genes associated with QTL controlling a trait (Zou et al., 2012). Complex genetic bases of quantitative traits, accompanied by strong environmental influences, affect QTL locations and estimation of their effects (Robertson-Hoyt et al., 2006). Increase of number of markers and population size and integration of data from multiple environments overcomes some of these limitations ( . QTL and associated candidate genes for corn Fusarium air rot disease were identified after integration of GBS SNPs in linkage maps (Maschietto et al., 2017). Several linkage maps were reported in switchgrass using RFLP (Missaoui, 2003); SRDF (Missaoui et al., 2005); SSR (Liu, Wu, Wang, & Samuels, 2012;Okada et al., 2010;Serbaet al., 2015Serbaet al., , 2013Wang, Samuels, & Wu, 2011); STS (Okada et al., 2010;Serba et al., 2013); DArT (Serba et al., 2013); and SNP (Lu et al., 2013) markers. However, none of the map had a reasonably good marker coverage. Thus, the linkage map generated in this study, which was aligned with the published switchgrass genome, shows great importance for future endeavor in QTL mapping and application of MAS. Currently, a small obstacle in using such map for QTL mapping is the limitation of marker numbers in chromosomes (Maschietto et al., 2017;Serbaet al., 2015Serbaet al., , 2013. Thus, reduction of marker number in each chromosome was done systematically, taking uniform marker distribution into account, for this QTL mapping using QTL cartographer.

SL QTL/marker Chr. Map position (cM) LOD PVE (%) Additivity
Identification of pleiotropic QTL that controls more than one trait and markers linked to these QTL can be used to simultaneously select for two or more traits, which would have a practical implications in plant breeding. In this study, high correlation coefficients between FL and VP were identified, and the colocalization of several QTL for these two traits indicates that simultaneous selection of more than one growth traits is possible to genetically improve switchgrass for biomass production.
MYB108 and EIN3 were found associated with FL QTL. MYB type transcription factor genes regulate number of plant growth and developmental processes. In association with MYB24, MYB108 regulates the jasmonic acid-mediated stamen maturation (Mandaokar, 2009). EIN3 mutants showed cell growth inhibition and accelerated senescence in Arabidopsis (Chao et al., 1997). TSO1 (CXC domain-containing protein) and VQ (valine-glutamine repeat) motif protein were found associated with VP QTL. TSO1 was required for both male and female fertility, and TSO1 mutant caused defects in flower and ovule development (Andersen et al., 2007;Hauser, He, Park, & Gasser, 2000). VQ proteins play an important role in plant growth and development by acting as cofactor of WRKY transcription factors (Cheng et al., 2012). The findings of this research indicate that biomass yield can be improved by manipulating these genes through breeding or molecular approaches. Some of these genes controls multiple developmental stages of plants. Nevertheless, this finding is a step forward in implementing MAS in breeding for biomass yield improvement in switchgrass.