Genetic diversity of Miscanthus sinensis in US naturalized populations

Miscanthus is gaining popularity as a bioenergy grass because of its extremely high biomass productivity. Many clones of this grass were introduced into the United States over the past century from East Asia, where it originated, and planted for ornamental and landscaping purposes. An understanding of the genetic diversity among these naturalized populations may help in the efficient selection of potential parents in Miscanthus breeding programs. Here, we report a study of the genetic diversity of 228 Miscanthus DNA samples selected from seven sites in six states (Ohio, North Carolina, Washington D.C., Kentucky, Pennsylvania, and Virginia) across the eastern United States. Because of the paucity of molecular markers in Miscanthus, ten transferable DNA markers from other plant species were employed to amplify Miscanthus genomic DNA. Significant genetic variation was observed within and among US naturalized populations. The highest genetic diversity (0.3738) was found among the North Carolina genotypes taken from Biltmore Deer Park and Biltmore, Madison County, Cody Rd. The lowest genetic diversity (0.2776) was observed among Virginia genotypes, which diverged from those from other states, suggesting that the Virginia genotypes might have been introduced into the United States independently from a different origin. In cluster and structure analyses, the 228 genotypes were categorized into two major groups that were further divided into six subgroups at the DNA level, and the groups were generally consistent with geographic region.


Introduction
Miscanthus (Gramineae) is an herbaceous perennial grass native to eastern Asia and found throughout China, Korea, and Japan. Due to its high biomass yield, high ligno-cellulose content, and high photosynthesis, Miscanthus is among the top potential bioenergy-producing plants in Europe and North America (Somerville et al., 2010; Glowacka, 2011). This 'green-energy' grass is widely cultivated as a prospective energy source for power production and liquid biofuel generation in Europe and North America. Miscanthus is not native to the United States and has no natural wild relatives with potential out-crossing risks. Thus, it represents an attractive candidate crop for future transgene modifications (e.g., to develop clones producing in-planta enzymes or with targeted modification of biomass genes) because of the reduced biosafety concern related to gene flow, with a consequently low regulatory burden (Yuan et al., 2008; Jakob et al., 2009; Bransby et al., 2010; Somerville et al., 2010).
The study of genetic diversity in Miscanthus is not only important for the effective conservation, management, and utilization of the genetic resource, but is also a prerequisite for selecting desirable plants in breeding this grass. Molecular markers are useful tools for studying genetic diversity and facilitating crop improvement programs, as well as evolutionary and conservation studies. Compared to earlier molecular marker technologies such as AFLPs, RFLPs, and RAPDs, simple sequence repeat (SSR) or microsatellite markers are more effective tools for plant genetics and genomics studies because of their abundance, hyper-variability, ease of PCR-based detection, codominant transmission, reproducibility, transferability among species and genera, ubiquitous distribution in the genome, and high frequency of polymorphisms (Brown et al., 1996; Cordeiro et al., 2000). Although the advantages of SSR markers are obvious, the identification of SSRs from genomic DNA is an expensive and lengthy process, requiring library construction as well as clone sequencing (Zhao et al., 2011). Because of the lack of SSR markers in Miscanthus, it may be prudent to explore the transferability of SSR markers from closely related species. A large number of such transferable markers have been identified from related species (Hernández et al., 2001; Zhou et al., 2011; Kim et al., 2012; Swaminathan et al., 2012; Dai et al., 2013; Yu et al., 2013; Tamura et al., 2015), opening up access to genetic variation studies and molecular breeding in Miscanthus. Kim et al. (2012) constructed two genetic linkage maps in Miscanthus, in which a total of 261 and 303 loci were mapped in populations of M. sacchariflorus and M. sinensis, respectively, using sugarcane EST-SSRs. Moreover, the many sugarcane unigenes deposited in GenBank provide potential for the development of gene-based markers in Miscanthus.
Cluster analysis and phylogenetic studies using molecular markers have been widely applied to many different crops to assess genetic identity, genetic diversity, genetic similarity and dissimilarity, and genetic distance. Miscanthus diversity and population structure have been studied in different geographic regions: in China (Selvi et al., 2003; Xu et al., 2013; Zhang et al., 2013), in Japan (Clark et al., 2015), in Korea (Yook et al., 2014), and across Asia (Clark et al., 2014). Miscanthus sinensis has been grown throughout the United States since shortly after it was introduced from eastern Asia in the early 1870s (Anonymous, 1876; Bailey & Miller, 1901). Naturalized populations of Miscanthus have become established in the United States (Clark et al., 2014). However, few studies have reported on the origin and genetic diversity of US naturalized populations or on the relationships between them.
To better understand genetic variation in naturalized populations of Miscanthus introduced into the United States as ornamental plants from East Asia over the past century, transferable DNA markers can be used to analyze the diversity among Miscanthus genotypes based on marker amplicon data. Analysis and comparison of gene sequences would further reveal genetic variation at the DNA level in Miscanthus naturalized populations. Therefore, the aim of this study was to gain insight into the genetic variation of US naturalized populations of Miscanthus, with the following objectives: (i) use heterologous sugarcane DNA markers to identify genetic variation within and among naturalized populations of Miscanthus; (ii) conduct phylogenetic analysis of Miscanthus in US naturalized populations using DNA sequences.

Experimental materials
Two hundred and twenty-eight seedlings of M. sinensis were collected from seven sites in six states (Ohio, North Carolina, Washington D.C., Kentucky, Pennsylvania, and Virginia) (Fig. 1, Table 1). M. sinensis is an obligate outcrossing species due to self-incompatibility; thus, each seedling is a unique genotype. Accessions collected from the same state were considered as a single naturalized population for this study. These six naturalized populations likely represent adaptation across complex topography and various climate conditions, especially cold tolerance, within the major distribution areas of

DNA extraction and PCR amplification
About 10-15 g of fresh young leaves was collected from each seedling, freeze-dried, and then ground to powder in sterile acid-washed sand. Genomic DNA (gDNA) was extracted from 500 mg of powdered leaf sample using a modified protocol (Egnin et al., 1998).
In total, 74 molecular markers were used for polymerase chain reaction (PCR) amplification of Miscanthus DNA (Table S1): seven genomic SSR markers and 29 EST-SSR markers from sugarcane, and 38 gene markers from Arabidopsis, Lolium perenne, strawberry, and pea. The PCR conditions included an initial step of 5 min at 94°C, followed by 35 cycles of denaturation at 94°C for 30 s, annealing at 50-55°C for 30 s depending on the primers used, and extension at 72°C for 1 min, with a final extension at 72°C for 7 min. The total volume of each PCR reaction was 10 µL, containing 25 ng template DNA, 1.0 µL of the 10× reaction buffer (MgSO4-free), 1.0 µL of 25 mM Mg2+, 0.2 µL of 10 mM dNTP, 1.0 µL of 5 µM primer (0.5 µL forward and 0.5 µL reverse), and 0.05 µL of 5 U µL⁻¹ Taq polymerase. The amplified products were separated on a 6.0% polyacrylamide gel using 0.5% Tris-borate-EDTA (TBE) buffer. After electrophoresis, the gel was stained with 1% ethidium bromide solution and the gel image was visualized under UV light. The PCR amplification was repeated once to ensure reproducibility.
The presence or absence of DNA bands for each primer-genotype combination was scored as 1 or 0, respectively. Several genetic parameters, including allele frequency, genetic diversity (He), and Nei's genetic distance (D) (Nei et al., 1983), were estimated using the software POWER-MARKER 3.25 (Liu & Muse, 2005). Cluster analysis was performed using the neighbor-joining method in the Molecular Evolutionary Genetics Analysis software (MEGA 6.06) (Tamura et al., 2013). Unique alleles were considered as those present in one accession or one group of accessions but absent in other accessions or groups of accessions. Rare alleles were those with a frequency of ≤5% in the investigated materials, while alleles with frequencies >20% were classified as frequent alleles.
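As an illustration of how diversity statistics of this kind can be derived from 1/0 band scores, the following minimal Python sketch computes a band frequency, Nei's gene diversity (He = 1 - Σp²), and the polymorphism information content. It assumes that band presence and absence are treated as two allelic states with frequencies p and 1 - p; the function names are hypothetical, and the exact computation performed by POWER-MARKER may differ.

```python
from typing import Sequence


def band_frequency(scores: Sequence[int]) -> float:
    """Fraction of genotypes in which the band is present (scored 1)."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


def gene_diversity(freqs: Sequence[float]) -> float:
    """Nei's gene diversity: He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: Sequence[float]) -> float:
    """Polymorphism information content:
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.
    """
    n = len(freqs)
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return gene_diversity(freqs) - cross


# Hypothetical scores for one band across eight genotypes (assumption).
scores = [1, 1, 0, 1, 0, 1, 1, 0]
p = band_frequency(scores)      # band present in 5 of 8 genotypes
states = [p, 1.0 - p]           # presence/absence treated as two states
print(gene_diversity(states), pic(states))
```

Under this two-state assumption, both statistics peak when the band is present in half of the genotypes (He = 0.5, PIC = 0.375), which is why values averaged over loci stay below these bounds.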
Population structure was determined and individuals were assigned to groups using the software STRUCTURE 2.3.4, Stanford University, CA (Pritchard et al., 2000). The admixture model with correlated allele frequencies was used. STRUCTURE was run eight times for each subpopulation value (K, ranging from 1 to 10) with a burn-in period of 10⁴ followed by 10⁵ iterations. Evanno's delta K (Evanno et al., 2005) was used to determine the optimum number of subpopulations. The run with the maximum likelihood value was chosen to assign accessions using the posterior membership coefficients (Q). A graphical bar plot representing the posterior membership coefficients of each accession was then generated.
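The delta K criterion can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: it assumes the common formulation ΔK = |L(K+1) - 2L(K) + L(K-1)| / sd[L(K)], where L(K) is the mean log-likelihood over replicate runs and sd is the sample standard deviation across those runs. The function name and the log-likelihood values are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp_by_k: Dict[int, List[float]]) -> Dict[int, float]:
    """Return delta K for every K that has both neighbors K-1 and K+1.

    lnp_by_k maps each K to the log-likelihoods of its replicate runs.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # second-order difference is undefined at the ends
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # delta K is undefined when replicate runs are identical
        second_diff = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical log-likelihoods from two replicate runs per K (assumption).
runs = {
    1: [-100.0, -102.0],
    2: [-80.0, -82.0],
    3: [-78.0, -80.0],
    4: [-77.0, -79.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because delta K depends on both neighbors, it cannot be evaluated at the smallest or largest K tested, so the tested range (here K = 1 to 10 in the study) should extend beyond the values of interest.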
Analysis of molecular variance (AMOVA) was performed to evaluate population differentiation using GENALEX 6.5 software (Peakall & Smouse, 2006). The calculation of pairwise genetic distances (GDs) for binary data followed the method of Huff et al. (1993).
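For binary band profiles, a distance of this kind can be illustrated with a simple tally of state differences between two genotypes: a comparison of identical states contributes 0 and a comparison of different states contributes 1. The Python sketch below is written under that assumption; the function names and profiles are hypothetical, and it is not a reimplementation of GENALEX.

```python
from typing import List, Sequence


def binary_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of loci at which two 1/0 band profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(profiles: Sequence[Sequence[int]]) -> List[List[int]]:
    """Symmetric matrix of pairwise distances with a zero diagonal."""
    n = len(profiles)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = binary_distance(profiles[i], profiles[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical band profiles for three genotypes at five loci (assumption).
profiles = [
    [1, 0, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
]
for row in distance_matrix(profiles):
    print(row)
```

A matrix like this is the input to a distance-based AMOVA, which partitions the total squared distance into within-population and among-population components.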

DNA sequence data analysis
Seventy-four DNA markers from other species were used to amplify Miscanthus genomic DNA obtained from genotypes in US naturalized populations. When the amplicons of a given marker were monomorphic among genotypes, the PCR products of that marker were sequenced to further reveal genetic variation within and between populations. On the basis of the DNA sequences of individuals, the GD among genotypes was calculated and the phylogenetic relationships were reconstructed using MEGA version 6.06 (Tamura et al., 2013).

Genetic diversity and population structure
Of 36 sugarcane SSR markers, 32 produced amplicons and the remaining four EST-SSR markers did not. Of the 38 gene-based markers from other species, however, only 20 were transferable to Miscanthus. Among the DNA markers tested in this study, eight SSR markers and two gene markers (actin, GA30x, PF00931, PF03856, SMC226CG, SMC248CG, SMC319CG, SMC1039CG, EST-SSR29, and EST-SSR38-2) that produced polymorphic bands were used for downstream analyses (Table 2). A total of 23 alleles were generated by these 10 markers, including 22 frequent alleles and one rare allele, with an average of 2.3 alleles per primer pair. The rate of polymorphic alleles, Nei's genetic diversity (He), and polymorphism information content (PIC) were 0.7054, 0.3833, and 0.3030, respectively, among the genotypes studied (Table 3). This study presented a sizeable molecular marker dataset in a diverse panel of M. sinensis. Genetic variation was significant both within and between US naturalized populations of M. sinensis; however, variation within populations (89%) was substantially greater than between populations (11%) in the distance-based AMOVA analysis (Table 4). The results indicated that genotypes within a naturalized population were genetically diverse, whereas genetic differences among populations were substantially smaller.
The estimated GD indicated divergence among the populations (Table 5). The VA population was generally the most diverged from the other five populations. The largest GD (0.245) was found between the VA and DC populations (Table 5), while the smallest distances were obtained between NC and OH (0.035) and between KY and PA (0.038). Using the software STRUCTURE 2.3.4 (Evanno et al., 2005), we identified six groups among the US M. sinensis naturalized populations (Fig. 2). Genotypes within each group identified by STRUCTURE are represented with different colors in Figure 2. To further evaluate how well the six groups identified by STRUCTURE corresponded to the six naturalized populations, cluster analysis was performed among all 228 genotypes using neighbor-joining with the software MEGA 6.06. The resulting radial tree is shown in Figure 3, where each genotype branch has the same color as that genotype had in the groups in Figure 2. Although the clades in the tree were not entirely the same as the groups from the STRUCTURE analysis (Fig. 2), genotypes with the same color were typically clustered together, with a few exceptions. Given the degree of admixture indicated in the STRUCTURE analysis, the groups identified were largely consistent with geographic regions, although some genotypes did not fit into major clades. Population composition of the structure groups was 88% VA for green group, 26% OH and 31% NC for blue group, 45%  The blue group was the biggest group, with 58 genotypes: 15 from OH, 18 from NC, 11 from KY, eight from DC, five from PA, and one from VA. The second biggest group, yellow, contained 44 genotypes, of which 20 were from DC, eight from KY, seven from OH, five from NC, and one from VA. The aquamarine group included 34 genotypes, of which 21 were from PA, eight from NC, three from KY, and two from OH. The red group had 34 genotypes, including 19 from NC, five from OH, four from PA, and two from KY.
The green group in Figure 3 included 28 genotypes from VA, two from PA, one from NC, and one from KY. The 'dark pink' group had the fewest genotypes (26).
Most of the genotypes from VA were included in a single clade, although a few were grouped in other clades, further suggesting that the VA naturalized population had the smallest genetic variation among the six populations and was the most genetically different from the other populations.

Genetic variation in DNA sequences
Because 29 of 36 sugarcane DNA markers were monomorphic among Miscanthus genotypes, the PCR products of one such marker, SSR38-1 (a cellulose synthase gene-related marker), were sequenced to investigate the genetic variation within DNA sequences among the genotypes. Four sequences were removed due to poor sequence quality. The genetic distance (GD) among individuals within a naturalized population and among populations was calculated from the remaining 224 sequences using the software MEGA 6.06. Genetic variation within a population was lowest for VA (0.006), followed by OH (0.008) (Table 6). The naturalized populations KY, PA, and DC had the same genetic variation (0.011), while NC had the greatest variation (0.012). Among populations, the lowest GD was between OH and VA (0.011) and between OH and PA (0.011), whereas the greatest GD was between NC and VA (0.015) (Table 7).
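The within- and between-population distances reported above can be illustrated with a short Python sketch. The distance model used in MEGA is not stated here, so the sketch assumes the simplest choice, the p-distance (the proportion of aligned sites that differ), applied to pre-aligned sequences of equal length; the function names and sequences are hypothetical.

```python
from itertools import combinations, product
from statistics import mean
from typing import Sequence


def p_distance(s1: str, s2: str) -> float:
    """Proportion of differing sites, skipping positions with a gap."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def mean_within(seqs: Sequence[str]) -> float:
    """Mean pairwise distance among sequences of one population."""
    return mean(p_distance(a, b) for a, b in combinations(seqs, 2))


def mean_between(group1: Sequence[str], group2: Sequence[str]) -> float:
    """Mean distance over all pairs drawn from two different populations."""
    return mean(p_distance(a, b) for a, b in product(group1, group2))


# Hypothetical aligned fragments for two populations (assumption).
pop_a = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
pop_b = ["ACGAACGT", "ACGAACGA"]
print(mean_within(pop_a), mean_between(pop_a, pop_b))
```

Comparing the two means for each pair of populations mirrors the layout of a within-population table and a between-population table such as Tables 6 and 7.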

Phylogenetic relationships
Genetic variation may exist in DNA sequences even when their PCR products appear monomorphic on gel electrophoresis, owing to size homoplasy. Phylogenetic relationships were reconstructed using the 224 DNA sequences amplified by the marker SSR38-1 (Fig. 4). Using the neighbor-joining method in MEGA 6.06 software, the 224 genotypes were categorized into two major groups on the basis of the genetic similarity matrix obtained using the software POWER-MARKER 3.25. One major group included four subgroups (I, II, III, and IV) and the other had two subgroups (V and VI) (Fig. 4). Genotypes from different naturalized populations were mixed within subgroups. However, genotypes from VA were clustered and concentrated in three subgroups (II, III, and IV) of one major group. Genotypes from OH were found in four subgroups (I, II, III, and VI) across both major groups.

Discussion
In this study, DNA markers from sugarcane proved the most useful tool for analyzing Miscanthus DNA. Ten markers (actin, GA30x, PF00931, PF03856, SMC226CG, SMC248CG, SMC319CG, SMC1039CG, EST-SSR29, and EST-SSR38-2) generated clear and unambiguous amplicons showing polymorphism at each locus in Miscanthus. Thus, these polymorphic markers can be used in Miscanthus genetic and genomic studies, such as genetic linkage mapping and genetic diversity studies.
M. sinensis is an important ornamental crop in the United States and Europe, and it is also one of the parental species of M. × giganteus, the most economically important Miscanthus species for bioenergy (Linde-Laursen, 1993; Lafferty & Lelley, 1994). M. sinensis can be used not only as a genetic resource for the development of new hybrids or the improvement of fertile lines, but also as an alternative biofuel crop to M. × giganteus, based on a series of studies on agronomy, productivity, and utilization (Christian & Haase, 2001; Jørgensen & Muhs, 2001; Lewandowski et al., 2003; Clifton-Brown et al., 2008). Therefore, it is important to understand the genetic diversity of US naturalized M. sinensis populations and their potential value for breeding improved Miscanthus cultivars that would benefit the ornamental horticulture and bioenergy industries. In the present study, we found high diversity among these genotypes, indicating their potential for use in the genetic improvement of Miscanthus. Our study shows that the genetic diversity within each US naturalized population ranged from 0.28 to 0.37 (Table 3), which was similar to the range of 0.25 to 0.32 observed within different provinces for some Chinese M. sinensis populations, although different markers were used. Clark et al. (2014, 2015) found that US naturalized M. sinensis were derived from ornamental cultivars that originated in portions of southern Japan. The present results suggest that although the US naturalized populations are the product of a genetic bottleneck, considerable genetic diversity remains and thus could be a useful resource for selecting parents in breeding to develop hybrids with high heterosis. In this study, 228 M. sinensis genotypes from six naturalized populations in the United States could be divided into six groups (Fig. 2) using two statistical methods, STRUCTURE and MEGA 6.06, applied to the genotyping dataset.
Although differentiation among the groups was incomplete, as indicated by admixture estimates and group geographic composition, the groups were largely consistent with geographic region (Figs 2 and 3). The result suggests that there has been insufficient time for greater differentiation to occur and/or that migration has allowed gene flow between geographically distant populations. Seed of Miscanthus is dispersed over long distances by wind, and many naturalized populations in the United States are found along highways, which would further facilitate seed dispersal. In contrast, Zhao et al. (2013) observed that Chinese M. sinensis were grouped according to province. The NC naturalized population was the most genetically diverse and was also present in many of the STRUCTURE groups (Fig. 3, Table 3), which is consistent with the historical record that the Biltmore Estate in Asheville, NC, was an early grower and distributor of M. sinensis in the late 1800s and early 1900s (Quinn et al., 2010). The largest GD, between the DC and VA naturalized populations, was shown by both DNA marker data and DNA sequence data. Moreover, most genotypes from the VA naturalized population clustered in the same structure group (Fig. 3) and had the lowest genetic diversity based on DNA marker data (0.2776 in Table 3) and DNA sequence data (0.006 in Table 6), indicating that they could have been introduced independently from a single place or a narrow region of origin that differs from the origin of genotypes in the other naturalized populations.
M. sinensis has great potential as a feedstock for producing bioenergy, including bioethanol (Heaton et al., 2008; Hastings et al., 2009a,b), and it has long been an important ornamental grass in American gardens. The information obtained in this study on the genetic structure and diversity of US naturalized populations of M. sinensis will be useful for breeders seeking to improve M. sinensis cultivars and hybrids. Both polymorphic markers and DNA sequences indicated large genetic variation in NC and PA genotypes and small variation in VA genotypes. These results suggest that higher genetic gain could be obtained within the NC and PA populations for Miscanthus breeding. While less genetic advance would be expected within the VA population, its genotypes could be valuable parental lines for new hybrids because they are genetically diverged from those in the other naturalized populations. The genetic information on Miscanthus genotypes in naturalized populations from this study could not only broaden the genetic knowledge of US M. sinensis germplasm, but also benefit future association mapping studies.
There is a concern that M. sinensis could escape from production and become an invasive species because it produces viable seeds (Raghu et al., 2006). To effectively control escape, strategies have been proposed to develop complete sterility in breeding programs by inducing triploidy and functional sterility, using a combination of genotype by environment interactions, to minimize seed-related invasiveness in M. sinensis (Quinn et al., 2010). In this study, we focused on the genetic diversity of naturalized genotypes of M. sinensis rather than its invasiveness. However, one rare allele with a frequency of ≤5% was observed among the genotypes studied. Further study is needed to understand whether this rare allele arose by mutation or by gene flow between Miscanthus and related species, given its open-pollinated nature.