
Contrasting geographic patterns of genetic variation for molecular markers vs. phenotypic traits in the energy grass Miscanthus sinensis


Correspondence: Gancho Slavov, tel. +44 01970 823094, fax: +44 01970 622350, e-mail: gts@aber.ac.uk

Abstract


Species and hybrids of Miscanthus are promising energy crops, but their outcrossing mating systems and perennial life cycles are serious challenges for breeding programs. One approach to accelerating the domestication of Miscanthus is to harness the tremendous genetic variation that is present within this genus using phenotypic data from extensive field trials, high-density genotyping and sequencing technologies, and rapidly developing statistical methods of relating phenotype to genotype. The success of this approach, however, hinges on detailed knowledge about the population genetic structure of the germplasm used in the breeding program. We therefore used data for 120 single-nucleotide polymorphism and 52 simple sequence repeat markers to depict patterns of putatively neutral population structure among 244 Miscanthus genotypes grown in a field trial near Aberystwyth (UK) and delineate a population of 145 M. sinensis genotypes that will be used for association mapping and genomic selection. Comparative multivariate analyses of molecular marker and phenotypic data for 17 traits related to phenology, morphology/biomass, and cell wall composition revealed significant geographic patterns in this population. A longitudinal cline accounted for a substantial proportion of molecular marker variation (R2 = 0.60, P = 3.4 × 10⁻¹⁵). In contrast, genetic variation for phenotypic traits tended to follow latitudinal and altitudinal gradients, with several traits appearing to have been affected by divergent selection (i.e., QST >> FST). These contrasting geographic trends are unusual relative to other plants and provide opportunities for powerful studies of phenotype–genotype associations and the evolutionary history of M. sinensis.

Introduction


Because of their high productivity and low requirements for agricultural inputs, C4 grasses from the tropical genus Miscanthus are believed to have great potential as a bioenergy crop (Clifton-Brown et al., 2004, 2007; Hastings et al., 2009). However, Miscanthus species are essentially undomesticated, and their accelerated breeding is hampered by their primarily outcrossing mating systems and perennial life cycles (Clifton-Brown et al., 2011a; Robson et al., 2011). Furthermore, although various agronomic approaches are being evaluated, Miscanthus seed establishment is currently impractical in Northern Europe (Clifton-Brown et al., 2011b). To overcome these challenges, we have assembled a large (>1500 accessions) and genetically diverse germplasm collection of Miscanthus and obtained high-quality data for a large number of phenotypic traits from replicated field trials (Allison et al., 2011; Jensen et al., 2011; Robson et al., 2012). In addition, we are using genotyping-by-sequencing approaches to generate extensive molecular marker data for linkage mapping (Ma et al., 2012), genomewide association studies (GWAS), and genomic selection, with the ultimate aim of using marker-assisted breeding approaches for the accelerated creation of highly productive Miscanthus varieties.

Detailed knowledge about the distribution of genetic variation within and among populations is an important prerequisite for the effective use of germplasm in a breeding program. Furthermore, profound understanding of population genetic structure for both neutral markers and phenotypic traits is indispensable for the design of appropriate experimental populations for GWAS and genomic selection, as well as for the ability to statistically control for spurious phenotype–genotype associations (Balding, 2006; Mackay & Powell, 2007; Price et al., 2010). However, population genetic data are currently limited in Miscanthus. Most previous studies were focused on phylogenetic inference, taxonomic classification, and confirmation of the hybrid status of M. × giganteus (Glowacka, 2011), but these studies also detected extensive intraspecific genetic variation (e.g. Hodkinson et al., 2002), even at very small spatial scales (Iwata et al., 2005).

We used data for 120 single-nucleotide polymorphism (SNP) and 52 simple sequence repeat (SSR) markers to depict patterns of putatively neutral population structure among 244 Miscanthus genotypes grown in a replicated field trial, which was described in several previous studies (Allison et al., 2011; Jensen et al., 2011; Robson et al., 2012). Based on these analyses, we delineated a population of 145 M. sinensis genotypes that will be used for proof-of-concept GWAS and genomic selection. Finally, we compared patterns of putatively neutral genetic structure (i.e. based on molecular markers) in this population to those for 17 phenotypic traits related to (1) phenology, (2) morphology/biomass, and (3) cell wall composition, and detected contrasts that have important practical implications.

Materials and methods

Plant materials and DNA extraction

Leaf tissue samples and phenotypic data were collected from a replicated field trial located near Aberystwyth, Wales, UK (Allison et al., 2011; Jensen et al., 2011; Robson et al., 2012). Briefly, 244 genotypes of M. sinensis, M. sacchariflorus, and M. × giganteus, which had previously been brought into and grown across Europe, were cloned using rhizome division and planted in April 2005 following a Randomized Complete Block Design, with one replicate per genotype in each of four blocks planted at 1.5 × 1.5 m spacing. Developmentally young leaves were collected in 96-well plates and kept on ice until storage at −80 °C. DNA was extracted using the DNeasy 96 Plant Kit (Qiagen, Crawley, West Sussex, UK).

Molecular marker genotyping

Molecular marker data were obtained for 241 of the 244 genotypes in the field trial. Briefly, 120 bi-allelic SNP markers were detected using DNA sequence alignments and the genotyping-by-sequencing protocol used to build a high-resolution linkage map for M. sinensis (Ma et al., 2012). In addition, 52 SSRs were selected from publicly available Sorghum and Panicum DNA sequences and used in our Miscanthus population (primer sequences and PCR conditions available upon request). The amplification of these SSRs resulted in the detection of 691 polymorphic bands, which were scored as binary variables (i.e. band presence vs. absence) because (1) banding patterns for many of these markers were not consistent with those expected for single codominant loci and (2) flow cytometry assays (unpublished data) suggested that 32 of the 241 genotypes we genotyped could be polyploid. The SNP and SSR markers that we used have not been mapped, and we assumed that the majority of these markers corresponded to nuclear loci.


Seventeen phenotypic traits reflecting phenology, morphology/biomass, and cell wall composition were measured on plants in all four replicates of the trial in the period 2007–2009 (i.e., 2–4 years after establishment, Table 1).

Table 1. Phenotypic traits measured in our study population of M. sinensis (see Fig. 1)

| Trait (a) | Unit | VG (b) | Vε (c) | H2 (d) | QST (e) | Geo (f) | r (P-value) (g) |
|---|---|---|---|---|---|---|---|
| Phenology | | | | | | | |
| DOYFS1.7 | Day of year | 948.15 | 131.41 | 0.88 | 0.21 | Alt | −0.62 (5.1e-07)* |
| DOYFS1.8 | Day of year | 992.66 | 63.10 | 0.94 | 0.23* | Alt | −0.66 (5.6e-08)* |
| DOYFS1.9 | Day of year | 876.37 | 106.55 | 0.89 | 0.24* | Alt | −0.66 (2.5e-08)* |
| AvgeSen.7 | NA | 1.56 | 0.33 | 0.82 | 0.02 | Lat | 0.42 (0.0003)* |
| AvgeSen.8 | NA | 1.05 | 0.11 | 0.90 | 0.01 | Lat | 0.39 (0.0009)* |
| AvgeSen.9 | NA | 1.24 | 0.24 | 0.84 | 0.04 | Lat | 0.46 (6.5e-05)* |
| Average (Phenology) | | | | 0.88 | 0.13 | | |
| Morphology/biomass | | | | | | | |
| BaseDiameter.7 | mm | 2721.00 | 3966.60 | 0.41 | 0.02 | Long | 0.18 (0.1436) |
| BaseDiameter.8 | mm | 2391.40 | 4681.70 | 0.34 | 0.04 | Long | 0.20 (0.0880) |
| BaseDiameter.9 | mm | 3382.10 | 3115.60 | 0.52 | 0.03 | Long | 0.18 (0.1283) |
| DryMatter.7 | g | 36430.99 | 38697.72 | 0.48 | 0.00 | Lat | −0.24 (0.0485)* |
| DryMatter.8 | g | 128013.00 | 102673.00 | 0.55 | 0.00 | Alt | −0.26 (0.0365)* |
| DryMatter.9 | g | 152500.00 | 133604.00 | 0.53 | 0.01 | Alt | 0.20 (0.1041) |
| LeafLength.7 | cm | 150.88 | 86.34 | 0.64 | 0.34* | Alt | −0.59 (1.5e-07)* |
| LeafWidth.7 | cm | 0.11 | 0.06 | 0.64 | 0.00 | Lat | 0.21 (0.0723) |
| MaxCanopyHeight.7 | cm | 348.51 | 211.35 | 0.62 | 0.08 | Lat | −0.25 (0.0370)* |
| MaxCanopyHeight.8 | cm | 434.79 | 194.49 | 0.69 | 0.10 | Lat | −0.25 (0.0376)* |
| MaxCanopyHeight.9 | cm | 458.08 | 118.60 | 0.79 | 0.18 | Lat | 0.23 (0.0541) |
| Moisture.7 | % | 95.46 | 18.20 | 0.84 | 0.25* | Alt | −0.64 (1.3e-08)* |
| Moisture.8 | % | | | | 0.24* | Alt | −0.64 (9.5e-09)* |
| Moisture.9 | % | 36.41 | 8.85 | 0.80 | 0.34* | Alt | −0.66 (2.5e-09)* |
| StatureCategory.7 | NA | 0.63 | 0.68 | 0.48 | 0.02 | Long | 0.37 (0.0014)* |
| StatureLeafAngle.7 | NA | 0.03 | 0.03 | 0.50 | 0.06 | Alt | 0.19 (0.1240) |
| StatureStemAngle.7 | NA | 0.16 | 0.17 | 0.48 | 0.00 | Long | 0.36 (0.0021)* |
| StemDiameter.7 | mm | 0.99 | 1.54 | 0.39 | 0.04 | Lat | 0.19 (0.1146) |
| StemDiameter.8 | mm | 0.88 | 1.01 | 0.46 | 0.00 | Lat | −0.26 (0.0297)* |
| TallestStem.8 | cm | 1451.49 | 294.53 | 0.83 | 0.36* | Alt | 0.57 (6.7e-07)* |
| TallestStem.9 | cm | 1665.62 | 216.45 | 0.88 | 0.46* | Alt | 0.57 (6.2e-07)* |
| TransectCount.7 | NA | 18.94 | 18.73 | 0.50 | 0.00 | Long | 0.15 (0.2100) |
| TransectCount.8 | NA | 33.52 | 34.84 | 0.49 | 0.00 | Long | 0.21 (0.0815) |
| TransectCount.9 | NA | 40.13 | 40.33 | 0.50 | 0.00 | Lat | 0.19 (0.1063) |
| Average (Morphol./Biomass) | | | | 0.60 | 0.11 | | |
| Cell wall composition | | | | | | | |
| Cellulose.7 | %DW | 10.10 | 2.65 | 0.79 | 0.26* | Alt | 0.60 (1.1e-07)* |
| Cellulose.8 | %DW | 5.56 | 1.48 | 0.79 | 0.23* | Alt | 0.54 (3.2e-06)* |
| Hemicellulose.7 | %DW | 0.79 | 0.74 | 0.52 | 0.07 | Long | 0.19 (0.1217) |
| Hemicellulose.8 | %DW | 1.09 | 0.60 | 0.65 | 0.06 | Long | 0.22 (0.0696) |
| Lignin.7 | %DW | 0.54 | 0.37 | 0.59 | 0.00 | Lat | 0.13 (0.2933) |
| Lignin.8 | %DW | 0.43 | 0.23 | 0.65 | 0.00 | Long | 0.17 (0.1521) |
| Average (Cell wall) | | | | 0.67 | 0.10 | | |
| Overall Average | | | | 0.65 | 0.11 | | |

(a) Phenotypic traits measured in 2007 (.7), 2008 (.8), or 2009 (.9). See 'Materials and methods' for trait definitions.
(b) Genetic variance (see 'Materials and methods').
(c) Error variance (see 'Materials and methods').
(d) Broad-sense heritability (see 'Materials and methods').
(e) Differentiation for phenotypic traits (see 'Materials and methods'). Values exceeding the empirical 95th percentile of FST (i.e. 0.23) are marked with an asterisk.
(f) Geographic coordinate with strongest correlation.
(g) Pearson correlation coefficient and two-sided P-value for correlation of genotypic BLUP (see 'Materials and methods') with geographic coordinate. Significant correlations (α = 0.05) are marked with an asterisk.


Day of flowering stage 1 (DOYFS1), the first observable indication of flowering, was scored as described by Jensen et al. (2011). Briefly, DOYFS1 was recorded as the day of year when the first flag leaf of the plant emerged, but the panicle was still within the leaf sheath. Average senescence (AvgeSen) was scored as described by Robson et al. (2012). Briefly, AvgeSen scores (range = 0–10) were obtained by observation of the whole visible aerial parts of the plant. A value of zero represented no visible leaf senescence; a value of 1 represented approximately 10% loss of green leaf and so on, up to a value of 10, which represented a fully senesced plant with 100% loss of green leaf. Plants were assessed every 2–3 weeks through the majority of the growing season. AvgeSen was the average of all senescence scores for each plant over the approximately 4 months prior to harvest, from the end of September to the end of January.


BaseDiameter (mm) was the diameter measured at ground level across the visually determined widest part of the base of the plant. TallestStem (cm) was the length of the tallest stem, prior to harvest in February, from the base to the uppermost ligule. StemDiameter (mm) was measured approximately 10–15 cm from the base of the plant on one randomly chosen stem using M35 Callipers (Masser, Savcor Group Ltd Oy, Mikkeli, Finland). LeafLength (cm) was measured from the ligule to the tip along the central vein of the youngest leaf that had a ligule. LeafWidth (cm) was the width of the blade at half-leaf length for the leaf used to measure LeafLength. MaxCanopyHeight (cm) was measured from the ground to the point of ‘inflection’ of the majority of leaves at the top of the plant. This was done to capture the height at which the plant leaves intercept most of the available radiation. TransectCount was the number of stems across the middle of the plant. To measure this trait, we inserted a stick through the base of the plant, across its widest diameter, and counted the stems touching the stick and reaching 50% or more of the canopy height. The stature of each plant (StatureLeafAngle, StatureStemAngle, and StatureCategory) was scored independently three times on different days, and the consensus value was recorded. StatureLeafAngle was scored categorically as 0: upright (i.e. mostly vertically oriented leaves), 1: horizontal (i.e. mostly horizontally oriented leaves), or 0.5: intermediate relative to the previous two categories. StatureStemAngle was scored as 1: mostly upright stems, 2: stems mostly upright to inclined at up to 30° from the vertical, 3: stems mostly upright to inclined at up to 60°, 4: stems mostly upright to inclined at up to 90°. StatureCategory additively combined StatureLeafAngle and StatureStemAngle in a single value. To measure biomass-related traits (i.e. DryMatter and Moisture), each plant was harvested in the February following the growing season. 
Plants were harvested at a height of approximately 5 cm above the soil surface, and the whole above-ground biomass was passed through a silage chopper. The resulting plant material was collected in a plastic sack and weighed (i.e. FW_total). A subsample of approximately 200–300 g was removed, placed in a paper bag, and weighed to determine wet weight (i.e. FW_subsample). This subsample was then dried to a constant weight at 60 °C (i.e. DW_subsample) and the percentage moisture content was calculated as Moisture (%) = 100 × (FW_subsample − DW_subsample)/FW_subsample. The total dry weight of the bulk sample was estimated as DryMatter (g) = FW_total × (DW_subsample/FW_subsample).
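
The two biomass calculations described above can be sketched in a few lines of Python. This is an illustration only: the function names and the example weights are hypothetical and are not part of the original analysis.

```python
def moisture_percent(fw_subsample: float, dw_subsample: float) -> float:
    """Percentage moisture content from fresh and dry subsample weights (g)."""
    return 100.0 * (fw_subsample - dw_subsample) / fw_subsample


def dry_matter(fw_total: float, fw_subsample: float, dw_subsample: float) -> float:
    """Estimated dry weight (g) of the bulk sample, scaled by the subsample dry fraction."""
    return fw_total * (dw_subsample / fw_subsample)


# Hypothetical weights in grams, chosen only to illustrate the arithmetic.
fw_total, fw_sub, dw_sub = 2000.0, 200.0, 100.0
print(moisture_percent(fw_sub, dw_sub))      # 50.0
print(dry_matter(fw_total, fw_sub, dw_sub))  # 1000.0
```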

Cell wall composition

Cellulose, Hemicellulose, and Lignin contents of senesced, bulked plant tissue samples (described above) were estimated indirectly from gravimetric measurements of neutral detergent fiber, acid detergent fiber, and acid detergent lignin as described by Allison et al. (2011). All three measures were expressed as percentage dry weight of the sample (%DW).

Data analysis


Because we did not detect identical multilocus genotypes or observe a discontinuity in the right tail of the frequency distribution of pairwise allele sharing proportions (Fig. S1), we assumed that all 241 multilocus genotypes corresponded to different clones (Arnaud-Haond et al., 2007).

Population genetic structure for molecular markers

Molecular marker data were analyzed using two different approaches. First, we applied the model-based clustering algorithm implemented in v. 2.3.3 of the STRUCTURE program (Pritchard et al., 2000; Falush et al., 2003, 2007) to the SNP data, which were coded as codominant diploid genotypes, and used the empirical statistic ΔK (Evanno et al., 2005) and analyses of molecular variance (AMOVA) to characterize the hierarchical pattern of genetic structure that we expected to be present among the Miscanthus genotypes in the field trial (Fig. 1). We did not use SSR loci in this analysis because (1) we wanted to obtain measures of differentiation that were not potentially affected by the higher mutation rates that are characteristic of this class of markers (Excoffier & Hamilton, 2003) and (2) the majority of SSR markers could not be treated as single codominant loci (see 'Molecular marker genotyping'). For each hierarchical level, we ran the program using the default model parameters and varying the assumed number of genetic groups (K) from one to at least five. Each run consisted of 5000 burn-in iterations and 5000 data collection iterations, and the results from 10 independent runs were aligned using the CLUMPP program (Jakobsson & Rosenberg, 2007). Based on the results from these runs, we calculated the ad hoc statistic ΔK, which tends to peak at the value of K that corresponds to the highest hierarchical level of substructure (Evanno et al., 2005), using the online version of the Structure Harvester program (Earl & vonHoldt, 2012). Differentiation among groups at each level was quantified using Wright's FST (Wright, 1965) based on AMOVA, and its statistical significance was evaluated based on 1000 permutations using v. 3.11 of the Arlequin program (Excoffier et al., 2005).
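
As a rough illustration of how ΔK is obtained from replicate runs, the sketch below uses one common formulation: the absolute second difference of the mean log-likelihoods across K, divided by the standard deviation of the log-likelihoods at that K. The function name and the log-likelihood values are hypothetical, and this is not a reimplementation of Structure Harvester.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods: dict[int, list[float]]) -> dict[int, float]:
    """Return Delta K for every K whose neighbours K-1 and K+1 were also run.

    log_likelihoods maps each K to the estimated ln P(X|K) of its replicate runs.
    Replicates at a given K must not all be identical (zero standard deviation).
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(log_likelihoods[k])
    return result


# Hypothetical log-likelihoods for three replicate runs at K = 1..4.
runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4501.0, -4499.0],
    3: [-4480.0, -4490.0, -4470.0],
    4: [-4475.0, -4495.0, -4455.0],
}
result = delta_k(runs)
best_k = max(result, key=result.get)
print(best_k)  # 2
```

Note that ΔK is undefined for the smallest and largest K tested, which is one reason for varying K beyond the expected number of groups.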

Figure 1.

Hierarchical population genetic structure for 120 SNP markers of 241 Miscanthus genotypes grown in a replicated field trial near Aberystwyth, UK. Groups at each hierarchical level were delineated using model-based clustering and confirmed through AMOVA. All subsequent analyses were performed using only the 145 genotypes from the sin.1 group (encircled by a dashed line), which will also be used for proof-of-concept GWAS and genomic selection.

As expected for the highest hierarchical level, which included 241 of the 244 Miscanthus genotypes represented in the field trial, the ΔK statistic provided overwhelming support for K = 2. The two groups generally corresponded to the a priori morphological classification of genotypes as M. sinensis or M. sacchariflorus, although there were also many unambiguous discrepancies, which likely resulted from morphological misidentification (Fig. S2). As expected, all M. × giganteus genotypes had intermediate proportional memberships (Fig. S2). Differentiation between the two species (i.e. based on genotypes with proportional memberships ≥ 0.95 for the respective group based on the STRUCTURE results) was strong and highly significant (FST = 0.359, P < 0.001). An iteration of this approach within the M. sinensis group (n = 163 genotypes) led to the identification of a moderately differentiated (FST = 0.155, P < 0.001) subpopulation of 12 genotypes with unknown source locations (i.e. sin.2 in Fig. 1), which had been contributed by Plant Research International (PRI, Wageningen, Netherlands) and six genotypes with intermediate proportional memberships (0.10–0.90). The 145 M. sinensis genotypes from the other subpopulation identified at this hierarchical level (sin.1 in Fig. 1) were used in all subsequent analyses, but three putatively triploid genotypes were excluded from analyses of geographic patterns of genetic variation. Because of the relatively weak substructure within this set of 145 M. sinensis genotypes (FST = 0.058, Fig. 1), they will be used for proof-of-concept GWAS and genomic selection.
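
The membership-based assignment used for the species-level comparison can be illustrated as follows for the K = 2 case. The function name, group labels and membership values are hypothetical; only the 0.95 threshold comes from the text.

```python
def assign_groups(
    memberships: dict[str, float],
    threshold: float = 0.95,
    labels: tuple[str, str] = ("group_1", "group_2"),
) -> dict[str, str]:
    """Assign genotypes to one of two groups from their proportional membership.

    memberships maps a genotype name to its membership in the first group;
    membership in the second group is taken as 1 minus that value (K = 2).
    Genotypes below the threshold for both groups are labelled 'intermediate'.
    """
    assigned = {}
    for genotype, q in memberships.items():
        if q >= threshold:
            assigned[genotype] = labels[0]
        elif 1.0 - q >= threshold:
            assigned[genotype] = labels[1]
        else:
            assigned[genotype] = "intermediate"
    return assigned


# Hypothetical memberships; a hybrid would be expected to fall in between.
example = {"g1": 0.99, "g2": 0.02, "g3": 0.50}
print(assign_groups(example))
```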

Second, to confirm the patterns detected using model-based clustering (described above) and to obtain a continuous view of the geographic structure within our study population of 145 M. sinensis genotypes (see above), we also used v. 4.2 of the EIGENSOFT program to apply the individual-based principal components analysis (PCA) approach of Patterson et al. (2006). This was done on a data set that contained both SSR and SNP data by coding all genotypic data as presence vs. absence of individual alleles (i.e. without assuming diploidy and codominance) and omitting the normalization step by setting the ‘usenorm’ parameter of the smartpca program to ‘NO’ (Patterson et al., 2006). Results from these analyses were consistent with those from model-based clustering at all hierarchical levels shown in Fig. 1.
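
The presence/absence coding described above can be sketched as below. This only illustrates the data coding step, not the PCA itself; the function name, genotype names and allele labels are hypothetical.

```python
def presence_absence(genotypes: dict[str, set[str]]) -> tuple[list[str], dict[str, list[int]]]:
    """Code each genotype as 1/0 for presence/absence of every observed allele.

    genotypes maps a genotype name to the set of alleles (or SSR bands) observed
    in it, so no assumption about ploidy or codominance is needed.
    """
    alleles = sorted(set().union(*genotypes.values()))
    matrix = {
        name: [1 if allele in observed else 0 for allele in alleles]
        for name, observed in genotypes.items()
    }
    return alleles, matrix


# Hypothetical calls: one SNP (alleles A/G) and one SSR (bands 210/214).
calls = {
    "g1": {"snp1_A", "snp1_G", "ssr1_210"},
    "g2": {"snp1_A", "ssr1_210", "ssr1_214"},
}
alleles, matrix = presence_absence(calls)
print(alleles)  # ['snp1_A', 'snp1_G', 'ssr1_210', 'ssr1_214']
print(matrix)   # {'g1': [1, 1, 1, 0], 'g2': [1, 0, 1, 1]}
```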

Population genetic structure for phenotypic traits

We analyzed phenotypic data for our study population of 145 M. sinensis genotypes using the mixed linear model:

$$Y_{ij} = \mu + B_i + G_j + \varepsilon_{ij} \qquad (1)$$

where Yij is the phenotypic trait measurement for genotype j in block i, μ is the population mean, Bi is the fixed effect of block i, Gj is the random effect of genotype j, and εij is the experimental error. When significant (α = 0.05), the row and column positions of plants within blocks were also included in the model as random effects. We conducted mixed linear model analyses using the lme4 package in R (R Development Core Team, 2008) and used the resulting variance components to calculate broad-sense heritabilities (H2) as:

$$H^2 = \frac{V_G}{V_G + V_\varepsilon} \qquad (2)$$

where VG and Vε are the genetic and error variances, respectively (Falconer, 1989). Best linear unbiased predictors (BLUP) of genotypic values were extracted from the mixed linear model used to analyze each trait. To reduce the dimensionality of the phenotypic data, PCA was then applied to the genetic correlation matrix of BLUP values (Campbell, 1979, 1986) for all traits or by groups of traits (Table 1) using R.
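
As a quick check of how the variance components relate to heritability, the sketch below computes H2 as VG/(VG + Vε), which reproduces the values reported in Table 1. The function name is illustrative; the variance components are those listed for DOYFS1.7 and DOYFS1.8.

```python
def broad_sense_heritability(v_g: float, v_e: float) -> float:
    """Broad-sense heritability from genetic and error variance components."""
    return v_g / (v_g + v_e)


# Variance components reported in Table 1 for DOYFS1.7 and DOYFS1.8.
print(round(broad_sense_heritability(948.15, 131.41), 2))  # 0.88
print(round(broad_sense_heritability(992.66, 63.10), 2))   # 0.94
```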

To compare levels of genetic differentiation for phenotypic traits (i.e. QST) to those for putatively neutral markers (i.e. FST), we fitted mixed linear models that were similar to Eqn (1) but also included Subpopulation (Figs 1 and 2) as a random effect, with genotypes nested within subpopulations. An estimate of QST was then calculated for each trait as:

$$Q_{ST} = \frac{V_S}{V_S + 2V_{G(w)}} \qquad (3)$$

where VS was the variance among subpopulations and VG(w) was the variance among genotypes within a population (Whitlock, 2008). This approach is expected to result in underestimates of QST (i.e. because VG(w) includes nonadditive genetic variance and C effects; Howe et al., 2003) and is therefore conservative in identifying traits with unusually high genetic differentiation. QST values for individual traits were compared with the distribution of FST values among the 120 SNP markers and considered extreme when they exceeded the empirical 95th percentile of that distribution (i.e. 0.23).
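
The comparison of QST with the empirical FST distribution can be sketched as follows. The sketch assumes the commonly used form for outcrossing diploids, QST = VS/(VS + 2VG(w)), and a linear-interpolation percentile; both function names and all numerical values are hypothetical and are not taken from the study.

```python
def q_st(v_s: float, v_g_within: float) -> float:
    """QST from among-subpopulation and within-subpopulation genetic variances.

    Assumes the standard form for outcrossing diploids (factor of 2 on V_G(w)).
    """
    return v_s / (v_s + 2.0 * v_g_within)


def empirical_percentile(values: list[float], q: float) -> float:
    """Percentile (q between 0 and 100) using linear interpolation between ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


# Hypothetical per-marker FST values and variance components for one trait.
fst_values = [0.00, 0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.15, 0.20, 0.30, 0.40]
threshold = empirical_percentile(fst_values, 95)
trait_qst = q_st(v_s=2.0, v_g_within=1.0)
print(trait_qst > threshold)  # True: this hypothetical trait would be flagged as extreme
```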

Figure 2.

Sampling locations of M. sinensis genotypes from our study population (see Fig. 1) and discrete view of putatively neutral population genetic structure as revealed by 120 SNP markers. The two subpopulations identified using model-based clustering (squares vs. triangles) were weakly but significantly differentiated (FST = 0.058, P < 0.001).

Spatial representation of genetic variation

To illustrate geographic patterns of genetic variation, PC1 scores for molecular markers and phenotypic traits were kriged using the default options of the Geostatistical Analyst in ArcMap 10 (Esri Ltd., Aylesbury, UK).

Results


Population genetic structure for molecular markers

Model-based clustering analyses of SNP data indicated the presence of two subpopulations (Fig. S3) within our study population of 145 M. sinensis genotypes, but provided no evidence of further substructure within each of these subpopulations (data not shown). Spatially, the genetic discontinuity roughly corresponded to a subdivision between genotypes from the continent and Japan (Fig. 2). Putatively neutral differentiation between these two subpopulations was relatively weak, but statistically significant (FST = 0.058, P < 0.001). Consistent with this, individual-based PCA of the combined SNP and SSR data resulted in a highly significant primary axis of variation (P < 10⁻⁹¹) that explained 6.3% of the total variation and was strongly correlated with the source longitudes of the genotypes we analyzed (r = −0.78, P = 3.4 × 10⁻¹⁵, Fig. 3). Unlike model-based clustering, the PCA approach also resulted in the detection of significant (P < 10⁻⁵) and geographically meaningful axes of variation within each subpopulation.
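
The correlation between PC1 scores and source longitude is an ordinary Pearson correlation, which can be sketched as below. The function name and the data points are hypothetical and serve only to show the calculation; the P-value is not computed here.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical source longitudes (degrees East) and PC1 scores for four genotypes.
longitudes = [110.0, 120.0, 130.0, 140.0]
pc1_scores = [0.08, 0.03, -0.02, -0.09]
r = pearson_r(longitudes, pc1_scores)
print(r < 0)  # True: PC1 decreases with longitude in this made-up example
```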

Figure 3.

Continuous view of neutral population structure in our study population of M. sinensis (see Fig. 1). (a) Geographic variation for the primary axis of neutral genetic variation (PC1) revealed by PCA of 120 SNP and 52 SSR markers. (b) Correlation between PC1 scores and source longitudes.

Population genetic structure for phenotypic traits and evidence for divergent selection

All phenotypic traits measured in our field trial were moderately to highly heritable, with broad-sense heritabilities ranging from 0.34 to 0.94 (Table 1). As expected, phenological traits had the highest (average H2 = 0.88) and morphological/biomass-related traits (average H2 = 0.60) the lowest heritabilities. Multivariate analyses captured a substantial fraction of the genetic variation, with PC1 explaining 30–70% of the total variation, and PC1 and PC2 jointly explaining 50–94% of the total variation (Table 2). Geographic patterns of genetic variation for phenotypic traits (Fig. 4) were strikingly different from those for molecular markers (Fig. 3) and between the two subpopulations that we identified using molecular markers (Table 2). Joint examination of the kriged PC1 maps (Fig. 4) and the trait loadings for the different groups of traits we considered (Tables S1–S4) indicated that genotypes from lower latitudes in the ‘Continent’ subpopulation and lower altitudes in the ‘Japan’ subpopulation tended to have (1) later flowering and senescence, (2) taller and thicker stems and canopies (but not necessarily denser canopies), longer and wider leaves, and consequently higher biomass but also higher moisture, and (3) higher hemicellulose and lower cellulose and lignin contents.

Table 2. Multivariate analysis of genetic variation for 17 phenotypic traits in our study population of M. sinensis (see Fig. 1)

| Group of traits (a) | Explained % | Continent: Geo (b) | Continent: r (P-value) (c) | Japan: Geo (d) | Japan: r (P-value) (e) |
|---|---|---|---|---|---|
| All traits | | | | | |
| PC1 | 29.5 | Lat | 0.64 (0.0080)* | Alt | 0.49 (0.0049)* |
| PC2 | 20.2 | Alt | 0.39 (0.1314) | Long | 0.53 (0.0019)* |
| Phenology | | | | | |
| PC1 | 70.0 | Lat | 0.54 (0.0310)* | Long | 0.59 (0.0005)* |
| PC2 | 23.9 | Alt | 0.50 (0.0465)* | Alt | 0.12 (0.5060) |
| Morphology/biomass | | | | | |
| PC1 | 33.7 | Long | −0.59 (0.0169)* | Alt | 0.34 (0.0637) |
| PC2 | 17.5 | Alt | 0.54 (0.0305)* | Long | 0.68 (2.4e-05)* |
| Cell wall composition | | | | | |
| PC1 | 44.5 | Lat | 0.13 (0.6215) | Long | 0.07 (0.7163) |
| PC2 | 30.7 | Alt | −0.13 (0.6366) | Alt | −0.59 (0.0005)* |

(a) Traits were grouped as shown in Table 1.
(b) Geographic coordinate with strongest correlation in the Continent subpopulation (Fig. 2).
(c) Pearson correlation coefficient and two-sided P-value for correlation with geographic coordinate in the Continent subpopulation (Fig. 2). Significant correlations (α = 0.05) are marked with an asterisk.
(d) Geographic coordinate with strongest correlation in the Japan subpopulation (Fig. 2).
(e) Pearson correlation coefficient and two-sided P-value for correlation with geographic coordinate in the Japan subpopulation (Fig. 2). Significant correlations (α = 0.05) are marked with an asterisk.

Figure 4.

Spatial genetic variation for 17 phenotypic traits in our study population of M. sinensis (see Fig. 1). Geographic variation for the primary axes of genetic variation (PC1) for all 17 traits (a), phenological traits (b), morphometric and biomass-related traits (c), and cell wall composition traits (d).

As expected, the distributions of subpopulation differentiation for individual SNP markers (FST) and phenotypic traits (QST) both had high variances (Fig. 5) and were not significantly different (median FST = 0.025 vs. median QST = 0.041, two-sided P = 0.085 from a Wilcoxon rank sum test). Surprisingly, however, average QST values were similar among the three groups of traits (Table 1), with traits from each group (i.e., DOYFS1, LeafLength, Moisture, TallestStem, and Cellulose) having QST values that exceeded the empirical 95th percentile of FST (i.e. 0.23).

Figure 5.

Genetic differentiation for individual phenotypic traits (QST) relative to that for SNP markers (FST) within our study population of M. sinensis (see Fig. 1). Traits listed on the right of the dashed line (i.e. the empirical 95th percentile of FST, 0.23) have putatively been affected by divergent selection (see Table 1 and 'Materials and methods' for trait definitions).

Discussion


Population genetic structure for molecular markers

The extent of putatively neutral population structure that we detected in M. sinensis (Figs 1 and 2) was comparable to the average for outcrossing monocotyledonous plants (GST = 0.157; Hamrick & Godt, 1996) and similar to that observed in a smaller scale study of Miscanthus sinensis ssp. condensatus on Miyake Island, Japan (Iwata et al., 2005). As expected, genetic differentiation in M. sinensis was also similar to that observed among inbred lines of maize (Liu et al., 2003; Hamblin et al., 2007), a closely related outcrossing C4 plant, but substantially lower than among landraces of the predominantly selfing Sorghum (Sagnard et al., 2011; Bouchet et al., 2012).

The continuous pattern of genetic variation for molecular markers that we detected in M. sinensis (Fig. 3) is consistent with those observed in other widespread plants (Bakker et al., 2009; Platt et al., 2010; Slavov et al., 2012). Although PCA gradients do not necessarily reflect specific migration routes and can be expected under a range of demographic scenarios (Novembre & Stephens, 2008), their existence at multiple spatial scales indicates that Isolation By Distance models may be more appropriate for analyses of spatial genetic and genomic data in M. sinensis than the traditionally used Island Model (Rousset, 1997; Guillot et al., 2009). We are currently generating more extensive molecular marker data, which will allow us to also characterize genomewide patterns of allele frequency variation and linkage disequilibrium, estimate effective population sizes over different timescales, and eventually gain insights into the evolutionary history of M. sinensis.

Population genetic structure for phenotypic traits and evidence for divergent selection

All phenotypic traits that we analyzed were moderately to highly heritable, with broad-sense heritabilities (H2, Table 1) being comparable to those calculated for the highly diverse nested association mapping population in maize (Buckler et al., 2009; http://www.panzea.org/db/gateway?file_id=NAM_2006_trait_herit) and somewhat higher than those in experimental populations of sugarcane (Jackson, 2005) and forage grasses (England, 1975; Humphreys, 1995; De Araujo & Coulman, 2002; Majidi et al., 2009). This suggests that the genetic variation of the germplasm captured in our field trial and the phenotyping protocols we have developed should allow us to effectively dissect the genomic architecture of traits related to phenology, biomass productivity, and cell wall composition through GWAS, as well as to accelerate our Miscanthus breeding program through genomic selection empowered by next-generation sequencing (Meuwissen & Goddard, 2010).

Both univariate and multivariate analyses of genetic variation for phenotypic traits revealed clear geographic patterns (Tables 1 and 2, Fig. 4) and indicated the existence of intriguing contrasts between the two subpopulations delineated based on molecular markers and model-based clustering (Fig. 2). Phenological and morphological/biomass traits appeared to vary primarily with latitude in the Continent subpopulation and with altitude and longitude in the Japan subpopulation (Fig. 4b, c). Spatial patterns of genetic variation in both subpopulations provided strong hints for the presence of local adaptation. The evidence for this hypothesis was further strengthened by the facts that (1) geographic patterns for putatively adaptive traits were different from those presumably shaped by genetic drift and migration (i.e. Fig. 4 vs. Fig. 3) and (2) several phenotypic traits consistently (i.e. across multiple years) had subpopulation differentiation (QST) exceeding the empirical 95th percentile of differentiation values for putatively neutral molecular markers (FST) (Fig. 5, Table 1). However, a direct test of this hypothesis would require a more extensive study, which would need to be replicated in multiple environments (i.e. including the native environments of at least some genotypes) and include a set of genotypes that had not been preselected for their ability to overwinter and grow in Europe (i.e. which was the case with the genotypes included in this study).

Genetic clines for phenological and morphological/biomass traits tended to be collinear with presumed climatic gradients (Fig. 4b, c; Table 1), although a formal test of this hypothesis was beyond the scope of this study. In contrast, geographic patterns of genetic variation for cell wall composition traits were more intricate and heterogeneous (Fig. 4d, Table 1). Interestingly, genotypic BLUP for cellulose content were moderately and highly significantly correlated with altitude in both years for which cell wall component data were available (Table 1). Furthermore, cellulose content was among the traits that consistently had much stronger subpopulation differentiation than that observed for molecular markers (Table 1, Fig. 5). Meanwhile, consistent with the nearly independent variation of cellulose vs. hemicellulose contents detected previously in M. sinensis (Allison et al., 2011), genetic variation for hemicellulose content did not appear to follow any simple geographic trends (Table 1). The finding that cellulose content appears to have been affected by divergent selection is surprising, and is being pursued in more detailed ongoing studies at both the phenotypic and molecular levels.


Our study provides broadly applicable information about geographic patterns of genetic variation in M. sinensis both for putatively neutral molecular markers and for a comprehensive set of phenotypic traits. In addition, our results have several specific implications. First, the continuous pattern of putatively neutral genetic structure that we detected will need to be adequately reflected in GWAS (Price et al., 2010) and considered in the design of experimental populations for genomic selection. Second, geographic trends of genetic variation for neutral markers (i.e. longitudinal cline) are distinctly different from those for phenotypic traits (i.e. mostly latitudinal and altitudinal clines). This situation is unusual relative to that in widespread model plants (Nordborg et al., 2005; Platt et al., 2010; Hancock et al., 2011; Filiault & Maloof, 2012; Slavov et al., 2012) and provides an opportunity to detect phenotype–genotype associations and molecular signatures of natural selection (Novembre & Di Rienzo, 2009) with potentially greater statistical power. Finally, geographic patterns of genetic variation combined with the distributions of subpopulation differentiation for phenotypic traits vs. molecular markers suggested that traits from all three groups we analyzed (i.e. traits related to phenology, morphology/biomass, and cell wall composition) could have been affected by divergent selection. The geographic trends we detected provide clues that will guide the identification of the direct agents of natural selection (e.g. climatic factors) thereby bringing insights into the mechanisms of local adaptation in M. sinensis and informing breeding efforts for current and future climates.

Acknowledgements


GS was funded by the Biosciences, Environment and Agriculture Alliance while working on this manuscript. PR, GA, and ID were supported by the Biotechnology and Biological Sciences Research Council (BBSRC) Institute Strategic Programme grant BBSEG00003134. Funding was also provided by BBSRC grant BB/E014933/1, and a BBSRC Institute career path fellowship BB/E024319/2; and the Department for Environment Food and Rural Affairs grant NF0426. We thank Luke Evans and Chris Thomas for their comments on an earlier version of this manuscript and John Norris for providing Linux computing support.