Development of high transferability cpSSR markers for individual identification and genetic investigation in Cupressaceae species

Abstract Given the low substitution rate in plastomes, the polymorphic and codominant nature of chloroplast SSRs (cpSSRs) makes them ideal markers, complementing their nuclear counterparts. In Cupressaceae, cpSSRs are mostly paternally inherited; thus, they are useful in mating system and pollen flow studies. Using e‐PCR, 92 SSR loci were identified across six Cupressaceae plastomes, and primers were designed for 26 loci with potential interspecific transferability. The 26 developed cpSSRs were polymorphic in four genera (Platycladus, Sabina, Juniperus, and Cupressus) and are suitable for Cupressaceae molecular genetic studies and utilization. We genotyped 192 Platycladus orientalis samples from a core breeding population using 10 of the developed cpSSRs and 10 nuclear SSRs, and these individuals were identified with high confidence. The developed cpSSRs can be used in (1) marker‐assisted breeding schemes, specifically when paternity identification is required, (2) population genetics investigations, and (3) studies of the biogeography of Cupressaceae and of the genetic relationships between related species.

are also important in ecological conservation and horticulture.
Platycladus, which contains only one species, Platycladus orientalis, is native to China, Korea, and the Russian Far East (Cheng & Fu, 1978; Hu, Jin, Wang, Mao, & Li, 2015). As a pioneer species, P. orientalis is often used in ecological restoration projects (Jiang, Shi, Niu, & Yue, 2009; Li, He, & Ren, 2011). P. orientalis has a remarkable capability to absorb and accumulate atmospheric pollutants and heavy metal soil pollutants (Chu, 2012); thus, it is widely used for ecological remediation in densely populated cities in north China.
Despite its importance, little is known to date about the population genetic diversity of this species; only scant nuclear genetic marker information is available (allozymes: Xie, Dancik, & Yeh, 1992; AFLP: Wang, Xing, Tang, & Feng, 2011; SSRs: Jin et al., 2016), all revealing low genetic diversity. Understanding P. orientalis population genetics and the extent of diversity in its germplasm collections are critical for the development of sound conservation and breeding strategies. The implementation of marker-assisted breeding methods such as "Breeding without Breeding," which is based on paternity assignment and pedigree reconstruction, requires the availability of diagnostic genetic markers (El-Kassaby & Lstibůrek, 2009; El-Kassaby, Cappa, Liewlaksaneeyanawin, Klápště, & Lstibůrek, 2011).
The uniqueness of organelle genomes has been useful in plant population genetics and evolution due to their nonrecombinant nature and uniparental inheritance (Birky, 1995). In plants, the mitochondrial genome is highly variable and has high levels of intramolecular recombination (Olmstead & Palmer, 1994), rendering it a challenging research tool (Provan, Powell, & Hollingsworth, 2001).
Because nucleotide substitutions are rare in the plastome, cpSSRs offer a feasible means of revealing plastome diversity. CpSSRs were first developed as genetic markers by Powell, Morgante, Andre et al. (1995), who highlighted their high polymorphism and codominant inheritance; these properties make them attractive genetic markers, as does the fact that only a few loci are needed to identify unique genotypes (Schlotterer & Tautz, 1992). Plastome SSR loci are often distributed throughout the noncoding regions, show greater sequence variation than the coding regions, and are characterized by a low evolutionary rate and an almost nonexistent recombination rate (Powell, Morgante, Andre et al., 1995; Powell, Morgante, McDevitt, Vendramin, & Rafalski, 1995; Powell, Machray, & Provan, 1996), making them ideal molecular tools to complement their nuclear marker counterparts (Huang et al., 2015; Provan et al., 2001). CpSSRs have been found to be transferable among related species because the regions flanking the cpSSR loci are conserved (Pan et al., 2014); thus, they have the potential to evaluate both inter- and intraspecific variability (Powell, Morgante, Andre et al., 1995; Powell, Morgante, McDevitt et al., 1995).
Here, we identified 92 cpSSR loci in the plastomes of Cupressaceae species and developed 26 transferable SSR markers. The primers were screened for intraspecific polymorphism across different genera, and the discrimination rate of the polymorphic loci was evaluated by genetic investigation of a core breeding population of P. orientalis. This study presents a set of polymorphic cpSSR markers that are transferable across diverse genera of Cupressaceae and demonstrates their value for genetic discrimination and diversity studies in this family.

| Study species, sample collection, and DNA extraction
To evaluate the polymorphism and interspecific transferability of the cpSSR markers with the designed primers, we sampled 48 individual trees representing Platycladus orientalis (n = 24), Sabina chinensis Additionally, we characterized the genetic variation in the plastomes of a core breeding population (n = 192 parent trees) of P. orientalis growing in a seed orchard located in the National Tree

| SSR detection and primer design
The GMATA2.1     using constraints of more than five repeats and a motif length between 2 and 10 bp. Electronic PCR (e-PCR) refers to the process of recovering unique sequence-tagged sites in DNA sequences by searching for subsequences that closely match the PCR primers and have the correct order, orientation, and spacing, such that they could plausibly prime the amplification of a PCR product of the correct molecular weight (Schuler, 1997). Therefore, e-PCR is a good tool for evaluating the designed primer pairs for interspecific amplification. We performed e-PCR to select potentially interspecifically transferable SSRs. The selected primers, which showed interspecific transferability in e-PCR and potential polymorphism, were screened with M13 attached to the 5′ end of the forward primer.
Primer3 (Rozen & Skaletsky, 1999) was employed to design primers for the loci with the following constraints: amplicon size of 120–400 bp, annealing temperature of about 60°C, and flanking region size ≤2,000 bp. Comparisons of DNA sequences from the six species, up to one hundred kb long, and visualizations of the alignments with annotations were generated using VISTA (Frazer, Pachter, Poliakov, Rubin, & Dubchak, 2012; Mayor et al., 2000).
The feasibility of primer amplification of these 26 e-PCR-selected primer pairs was examined in other chloroplast genomes of seven Cupressaceae species from six genera by e-PCR.
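The e-PCR criterion described in this section (both primers found in the correct order and orientation, at a spacing that gives a plausible product size) can be illustrated with a minimal sketch. This is not the e-PCR program of Schuler (1997); it is a simplified stand-in that scans one strand of the template with a Hamming-distance match. The function names, the `max_mismatch` parameter, and the primer sequences in the usage note are hypothetical; the default size window only borrows the 120–400 bp amplicon constraint stated above.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(template: str, primer: str, max_mismatch: int) -> list[int]:
    """Start positions where primer matches template with <= max_mismatch."""
    k = len(primer)
    return [
        i
        for i in range(len(template) - k + 1)
        if hamming(template[i:i + k], primer) <= max_mismatch
    ]


def e_pcr(template: str, forward: str, reverse: str,
          min_size: int = 120, max_size: int = 400,
          max_mismatch: int = 0) -> list[tuple[int, int]]:
    """Return (start, product_size) for each plausible in silico amplicon.

    Assumption: only the given strand is scanned, so the forward primer must
    match the template as written and the reverse primer must match as its
    reverse complement, downstream of the forward site.
    """
    template = template.upper()
    forward = forward.upper()
    reverse_site = reverse_complement(reverse)

    forward_hits = find_sites(template, forward, max_mismatch)
    reverse_hits = find_sites(template, reverse_site, max_mismatch)

    products = []
    for f in forward_hits:
        for r in reverse_hits:
            size = r + len(reverse_site) - f  # primer-to-primer product length
            # Correct order (no overlap) and a product within the size window.
            if r >= f + len(forward) and min_size <= size <= max_size:
                products.append((f, size))
    return products
```

Running `e_pcr` with the same primer pair against each plastome in turn gives a simple yes/no prediction of transferability per species, and differences in predicted product size between species hint at potential polymorphism. Setting `max_mismatch` above zero tolerates imperfect primer sites, which is the usual situation in cross-species tests.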

| Primer screening and detection of cpSSR polymorphisms
In the initial screening, we amplified the designed markers from mixed DNA of each species across different genera. In a further screening, DNA from individual trees was used to amplify the cpSSR markers that showed clear amplification in the initial step, and polymorphism levels were determined by fluorescence-based capillary electrophoresis with an ABI 3730 sequencer.

| CpSSR evaluation with a core breeding population
We selected 24 primers for genotyping the 192 plastomes of P. orientalis, after excluding two primer pairs (N19 and N29) which displayed low polymorphism in the initial screening. Nuclear SSR polymorphism data for the same sample set were taken from a previous study (Jin et al., 2016). GeneCap version 1.4 (Wilberg & Dreher, 2004) was used to distinguish the different haplotypes by contrasting all alleles. Principal component analysis (PCA) was performed in R (Ihaka & Gentleman, 1996) using the "adegenet" package (Jombart, 2008) to examine the distribution of populations.

| SSR analysis and primer design
Ninety-two SSRs with more than five repeats and a motif length ranging between 2 and 10 bp were identified in the plastomes of six species of two genera, Cupressus and Juniperus. Dinucleotide repeats were the most abundant, with a count of 73 (79.35%), followed by 13 tri-, 4 nona-, 1 tetra-, and 1 hepta-nucleotide repeats (14.13%, 4.35%, 1.09%, and 1.09%, respectively) (Table S1). The locations of these 92 loci are presented in Figures S1–S6. In the plastomes of the six species, the most common dinucleotide repeat was (AT/TA)n and the most common trinucleotide repeats were (AGA/TCT)n and (TTC/GAA)n.
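The detection constraints restated above (motif length of 2–10 bp, more than five repeats) can be sketched with a regular expression. This is only an illustration of the search criterion, not the GMATA implementation: the function names are hypothetical, only perfect repeats on the given strand are found, and homopolymer runs are skipped as an assumption so that a run such as CCCCCCCCCCCC is not reported as a (CC)n dinucleotide repeat.

```python
import re
from collections import Counter


def find_ssrs(sequence: str, min_motif: int = 2, max_motif: int = 10,
              min_repeats: int = 6) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect SSRs in a sequence.

    min_repeats=6 encodes "more than five repeats". The lazy quantifier makes
    the regex try the shortest motif first, so (AT)n is reported as AT rather
    than ATAT.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # assumption: ignore homopolymer runs
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


def motif_class_counts(hits: list[tuple[int, str, int]]) -> dict[int, int]:
    """Count SSRs by motif length (2 = di-, 3 = tri-, ... nucleotide)."""
    return dict(Counter(len(motif) for _, motif, _ in hits))
```

Dividing each class count by the total number of loci gives percentages of the kind reported above; for example, 73 dinucleotide loci out of 92 corresponds to 79.35%.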

| Chloroplast SSR amplification and polymorphism detection
Initial screening via electrophoresis showed that amplicons were obtained for 24 of the 26 e-PCR-selected primer pairs, and amplification of two primer pairs (N14 and N23) was detected only in P. orientalis, S. chinensis, and J. formosana, and not in C. torulosa. Further analyses of the cpSSRs were carried out in these four species.
In the second screening step, the amplicons of most of the

| Application of cpSSR markers in genetic variation analysis
Among the 24 primer pairs characterized in the P. orientalis core breeding population (n = 192), ten pairs (N1, N2, N8, N9, N11, N13, N20, N27, N33, and N34) amplified well and showed polymorphism, while the remaining pairs were monomorphic and exhibited low amplification and specificity (Table 3). The performance of these ten primer pairs was uneven, and they displayed different degrees of polymorphism (Table 4). The genetic parameters of the core breeding population were as follows: the number of observed alleles (Na) ranged from 2 to 9, the effective number of alleles (Ne) ranged from 1.021 to 2.743, Shannon's information index (I) ranged from 0.058 to 1.294, the diversity index (h) ranged from 0.021 to 0.635, and the unbiased diversity (uh) ranged from 0.021 to 0.639 (Table 4). Interestingly, locus N8, from the rpoB gene coding region, displayed a fairly high degree of diversity. Although some of the loci showed low polymorphism, the loci collectively presented a wide range of polymorphism. The ten polymorphic cpSSR loci revealed 134 distinct genotypes in the 192 sampled trees. Statistical confidence for individual identification using these ten loci was moderate (pID = .0086).
Thus, not every tree had a unique multilocus genotype, and 28 of the 134 genotypes were assigned to 2-7 trees, with a total of 86 trees sharing genotypes.
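The per-locus statistics reported above (Na, Ne, I, h, and uh) can be sketched for a single haploid locus as follows. The text does not state which software or formulas were used, so the definitions here are an assumption: the commonly used haploid forms h = 1 − Σp², Ne = 1/Σp², I = −Σ p ln p, and uh = n/(n − 1) · h. They are at least consistent with the reported ranges, since Ne = 2.743 implies h = 1 − 1/2.743 ≈ 0.635. The function name and the allele sizes in the usage note are hypothetical.

```python
import math
from collections import Counter


def haploid_diversity(alleles: list) -> dict[str, float]:
    """Diversity statistics for one haploid locus.

    `alleles` holds one allele call (e.g., a fragment size in bp) per tree.
    Formulas are the assumed haploid definitions named in the lead-in.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two individuals are required")

    counts = Counter(alleles)
    freqs = [count / n for count in counts.values()]
    sum_sq = sum(p * p for p in freqs)  # expected homozygosity analogue

    h = 1.0 - sum_sq
    return {
        "Na": len(counts),                               # observed alleles
        "Ne": 1.0 / sum_sq,                              # effective alleles
        "I": -sum(p * math.log(p) for p in freqs),       # Shannon's index
        "h": h,                                          # haploid diversity
        "uh": n / (n - 1) * h,                           # unbiased diversity
    }
```

For a toy locus typed in four trees with allele sizes 150, 150, 150, and 152, the function reports two alleles, Ne = 1.6, h = 0.375, and uh = 0.5.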
According to the polymorphism information of the ten loci, PCA partitioned 72.74%, 4.33%, and 2.94% of the variance in the data along the first three axes, respectively, collectively accounting for 80.01% of the total variation. PCA did not show clear separation of the genotypes into discrete clusters (Figure S7). In the generated dendrogram, the 192 samples were grouped into seven main clades (Figure S8; note that the colors of individuals correspond to population origin).
We combined the ten polymorphic cpSSR loci with ten previously developed nuclear SSR loci (Jin et al., 2016)

| DISCUSSION
To establish a sound cpSSR genotyping platform for Cupressaceae species, we developed 26 polymorphic and interspecific transfer- plastomes are paternally inherited (Vendramin et al., 1996).

| E-PCR is a critical step to predict the transferable SSRs
The high quality, versatility, and applicability of our primers demonstrate the viability of the implemented method for developing SSR loci. We first identified and characterized SSRs along with gene features in each given genome/sequence. In the following step, simulated marker mapping (e-mapping) was performed across all genomes/sequences using a forward e-PCR algorithm, allowing evaluation of the transferability of the cpSSRs and calculation of the potential intergenomic/intersequence polymorphism of each developed SSR locus (Schuler, 1997 Hesperocyparis glabra) showed strong versatility of the primers and also provided important instructions for the use of molecular markers for these species.

| Variation across the chloroplast genome
Formed through a mutation process known as replication slippage, SSRs are believed to be a key source of genetic variation for plastomes (Provan et al., 2001). Coding sequences are highly conserved in chloroplasts, whereas intron and intergenic regions are variable. In the present study, we found that most of the polymorphic cpSSRs were located in intergenic or intron regions (23 loci in intergenic spacer regions and one in an intron), which is consistent with previous studies (Powell, Morgante, Andre et al., 1995; Powell, Morgante, McDevitt et al., 1995; Powell et al., 1996; Provan et al., 2001). The distribution of polymorphisms indicates that the genetic relationship between samples is the significant factor determining the variation in the number of repeats.
Successful amplification demonstrated the viability of the primers, which were designed using the plastomes of related species. Some primers may generate PCR products among unrelated species; however, because the loci are highly conserved, it is expected that there will be no polymorphism among unrelated species (Cheng et al., 2005). We also found that some of the primers failed to show polymorphism between species, which may result from the proximity of the cpSSR locus to a highly conserved coding sequence.
Most interestingly, we found that the two loci from coding regions (N6 and N8) displayed polymorphism at a level comparable to that of the other loci. N6 is located in the coding region of ycf1, one of the two largest ORFs in the plastome, which encodes the TIC214 protein involved in protein precursor import into chloroplasts (Drescher, Ruf, Calsa, Carrer, & Bock, 2000). Ycf1 products are essential for cell survival (Kikuchi et al., 2013), and ycf1 is among the four genes (together with matK, ndhF, and ccsA) in which single-nucleotide variations and insertion/deletion frequencies were clearly higher than average, showing a signature of positive selection (Daniell, Lin, Yu, & Chang, 2016). Ycf1 was also identified as the most promising plastid DNA barcode for land plants (Dong et al., 2015). N8 is located in the coding region of rpoB, which encodes the beta subunit of a multisubunit RNA polymerase (Little & Hallick, 1988). RpoB was also identified as a marker suitable for phylogenetic study due to the relatively high substitution rate of its sequence (Olmstead & Palmer, 1994). Our study thus provides more evidence of SSR-mediated insertion/deletion variation in two highly variable chloroplast genes (ycf1 and rpoB) in Cupressaceae species.

| CpSSR and its application in P. orientalis
Pollen dispersal is an important mechanism for long-distance gene flow in conifers (Adams, 1992) and potentially plays a major role in maintaining genetic diversity (El-Kassaby & Davidson, 1991; O'Connell, Mosseler, & Rajora, 2007) and, in turn, promoting adaptive evolution and positive response to rapid climate change (Kremer et al., 2012).
Paternity analysis and tracking the origin of pollen also have significant implications for studies of mating systems, conservation genetics, and breeding (Chaisurisri & El-Kassaby, 1994; El-Kassaby, 1995).
FIGURE 2 Principal component analysis (PCA) of Platycladus orientalis populations' plastomes based on the polymorphism information of 10 cpSSRs and 10 nuclear SSRs
SSRs have already been recognized as an efficient means for paternity analysis in conifer species (Cato & Richardson, 1996; El-Kassaby et al., 2011). To enrich the genetic toolkit, we developed a set of cpSSRs for the Cupressaceae and evaluated their application in population genetic studies by analyzing a core breeding population of P. orientalis. In the present study, 86 trees shared 28 genotypes over the 10 polymorphic cpSSR loci, but each individual had a unique fingerprint once the polymorphism information from the nuclear SSRs was added. Low levels of differentiation among the 192 sampled trees had already been revealed in a previous study in which 10 polymorphic nuclear SSRs were used (Jin et al., 2016); we obtained similar results with the cpSSRs, so the two marker types gave consistent results. It has been suggested that cpSSR markers represent ideal molecular tools to complement nuclear genetic markers (Huang et al., 2015) in investigations of population genetics and biogeography and to unravel the genetic relationships of closely related species. Further analyses can be carried out by combining the cpSSRs with nuclear SSRs, as together these markers provide a high discrimination rate for individual identification.
Finally, the utility of the developed cpSSRs along with the available nuclear SSRs in paternity assignment and pedigree reconstruction cannot be overstated, specifically when applied in marker-assisted breeding schemes aimed at simplifying the breeding process and speeding generation turnover for capturing greater gains (El-Kassaby & Lstibůrek, 2009; El-Kassaby et al., 2011).
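The individual-identification result discussed in this section amounts to counting distinct multilocus genotypes before and after the nuclear loci are added. The sketch below illustrates that bookkeeping only; it does not reproduce GeneCap, and the tree IDs, allele sizes, and function names are hypothetical. As a consistency check on the reported figures, 134 genotypes of which 28 are shared by 86 trees leaves 106 genotypes carried by a single tree each, and 106 + 86 = 192.

```python
from collections import Counter


def genotype_sharing(genotypes: dict[str, tuple]) -> dict[str, int]:
    """Summarize how many multilocus genotypes are shared between trees.

    `genotypes` maps a tree ID to a hashable multilocus genotype (a tuple
    with one entry per locus).
    """
    counts = Counter(genotypes.values())
    shared = {g: c for g, c in counts.items() if c > 1}
    return {
        "n_genotypes": len(counts),            # distinct genotypes
        "n_shared_genotypes": len(shared),     # genotypes seen in >1 tree
        "n_trees_sharing": sum(shared.values()),
    }


def combine_markers(cp: dict[str, tuple], nuclear: dict[str, tuple]) -> dict[str, tuple]:
    """Concatenate haploid cpSSR and diploid nuclear SSR genotypes per tree.

    Assumption: each nuclear locus is stored as a sorted (allele, allele)
    pair so that the same diploid genotype always compares equal.
    """
    return {tree: cp[tree] + nuclear[tree] for tree in cp}


# Hypothetical toy data: two cpSSR loci and one nuclear SSR locus.
cp_toy = {
    "T1": (150, 201), "T2": (150, 201), "T3": (152, 201),
    "T4": (152, 203), "T5": (152, 201),
}
nuclear_toy = {
    "T1": ((100, 104),), "T2": ((100, 100),), "T3": ((100, 104),),
    "T4": ((102, 104),), "T5": ((104, 104),),
}
```

With the toy data, the cpSSR loci alone leave four of the five trees in shared genotypes, whereas the combined genotypes are all distinct, which mirrors the pattern reported for the 192 trees.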

| CONCLUSION
We developed 26 polymorphic cpSSR markers for Cupressaceae species. Evaluation with 192 P. orientalis samples from a core breeding population demonstrated moderate confidence in individual identification with the cpSSRs alone, but a high discrimination rate was attained after combining them with 10 nuclear SSRs. The high degree of interspecific transferability and polymorphism of the developed cpSSRs shows that they are a powerful tool for population genetics and breeding of conifer species.

ACKNOWLEDGMENT
We thank Professors X-R Wang and X-Y Kang for their valuable suggestions. This study was supported by grants from the Fundamental Research Funds for the Central Universities (No. YX2013-41).

CONFLICT OF INTEREST
None declared.

AUTHOR CONTRIBUTIONS
Conceived and designed the experiments: JFM. Performed the experiments: LSH YJ. Analyzed the data: YQS YJ XGH. Contributed reagents/materials/analysis tools: QG FLG XLY JJZ. Wrote the paper: LSH JFM YAEK.

DATA ACCESSIBILITY
Primers designed and evaluated: main body of text (Table 2).