Development and characterization of 17 microsatellite markers for Sonchus oleraceus

Premise: The common sowthistle, Sonchus oleraceus (Asteraceae), is a globally invasive weedy species. To investigate its genetic diversity, population genetic structure, and evolutionary history, we developed and characterized nuclear simple sequence repeat markers (SSRs or microsatellites).

Methods and Results: Seventeen microsatellite primer pairs were developed based on Illumina sequence data. Ten of the developed SSR loci were polymorphic in four populations sampled from broad geographical regions. The number of alleles per locus ranged from one to 11, and the levels of observed and expected heterozygosity ranged from 0.000 to 1.000 and from 0.000 to 0.801, respectively. Up to 82% of the newly developed primer pairs amplified successfully in the congeneric taxa S. asper, S. asper subsp. glaucescens, S. canariensis, and S. palmensis.

Conclusions: The SSR markers developed in this study will be useful for future population genetic studies on S. oleraceus and other congeneric species.

(Korea, China, Germany, and Australia) were sampled in order to evaluate the polymorphisms of the target loci. Moreover, we used five individuals for each of four taxa of the genus Sonchus for cross-amplification: S. asper subsp. asper, S. asper subsp. glaucescens, S. canariensis, and S. palmensis (Appendix 1). Sonchus asper (subg. Sonchus L.) represents a purported parental species in the hybrid origin of S. oleraceus, whereas S. canariensis and S. palmensis (subg. Dendrosonchus Sch. Bip. ex Boulos) represent two woody perennials endemic to the Canary Islands, Spain; these latter two species were selected to test the broader applicability of the developed microsatellite markers.
Total genomic DNA was extracted from silica gel-dried leaf tissues using the DNeasy Plant Mini Kit (QIAGEN, Carlsbad, California, USA), following the manufacturer's instructions. An Illumina paired-end genomic library was constructed using a TruSeq DNA LT Sample Prep Kit (Illumina, San Diego, California, USA), and the library was sequenced on an Illumina HiSeq 4000 platform at the Macrogen Corporation (Seoul, Korea). All raw reads were submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (BioProject ID PRJNA577793). Altogether, 15.6 Gbp was sequenced, and a total of 50,229,522 and 53,108,408 paired-end reads were obtained for samples from Korea and Spain, respectively. Raw data were cleaned by trimming the adapters and low-quality reads with a custom script by the Macrogen Corporation, and the reads were assembled de novo using Velvet version 1.2.10 with default settings (Zerbino and Birney, 2008).
Microsatellite motifs with a repeat unit of two to six nucleotides and a minimum of six repeats were detected using MISA version 1.0.0 (Thiel et al., 2003) with the default parameters. A total of 685 regions were found. Primer pairs were designed using Primer3 version 2.3.6 (Koressaar and Remm, 2007) with the following settings: PCR product size of 100-300 bp, length from 12 to 48 bp, annealing temperature of 57-63°C, and GC content of 40-60%. The PCR amplifications were performed in a total volume of 20 μL, containing 1 μL of genomic template DNA, 0.5 μL of 10 mM forward primer with fluorescent dye (Table 1), 0.5 μL of 10 mM reverse primer, 2.5 μL of 10× PCR buffer (with 2.5 mM MgCl2), 0.5 μL of 10 mM dNTPs, and 0.1 μL (1.25 U/μL) of Taq DNA polymerase (Inclone Biotech, Gyeonggi-do, Korea). The PCR cycles were as follows: initial denaturation at 95°C for 1 min; 35 cycles of denaturation at 95°C for 20 s, annealing at 54-64°C for 40 s, and extension at 72°C for 60 s; and a final extension at 72°C for 5 min. Although most of the samples were amplified at a higher annealing temperature (Ta) setting, some samples failed to amplify, and thus lower annealing temperatures were used for those samples. The PCR products were detected electrophoretically on a 3% agarose gel.
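The motif criteria described above (repeat units of two to six nucleotides, at least six tandem repeats) can be illustrated with a minimal Python sketch. This is not MISA and does not reproduce its output or options; it only detects perfect repeats with a regular expression, and the function name `find_ssrs` and the example sequence are hypothetical.

```python
import re

MIN_REPEATS = 6            # minimum number of tandem repeats, as stated in the text
UNIT_SIZES = range(2, 7)   # repeat units of two to six nucleotides


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k in UNIT_SIZES:
        # One copy of a k-mer followed by at least MIN_REPEATS - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, MIN_REPEATS - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), so each region is reported once.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Hypothetical contig fragment containing (AT)7 and (GAA)6.
example = "GGG" + "AT" * 7 + "CCGT" + "GAA" * 6 + "T"
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

Compound and imperfect repeats, which dedicated tools can also report, are deliberately outside the scope of this sketch.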
Of the 685 regions identified, 107 randomly selected primer pairs were tested for amplification efficiency by PCR using eight representative individuals of S. oleraceus sampled from four populations. Of these 107 primer pairs, 33 failed to amplify and the remaining 74 displayed clear bands and were suitable for further analysis. The PCR amplicons were labeled with fluorescent dye (HEX and FAM) and separated with a GeneScan 400 LIZ Size Standard (Applied Biosystems, Foster City, California, USA) at the Macrogen Corporation using an ABI 3730XL DNA Sequencer (Applied Biosystems). The microsatellite marker profiles were scored using GeneMapper version 5.0 (Applied Biosystems). Of the 74 successfully amplified loci, 10 were found to be polymorphic and seven were monomorphic (Table 1). The remaining 57 loci were excluded due to difficulty in obtaining clean peaks and unsuccessful amplification after labeling the forward primer with fluorescent dye. Population genetic parameters, including the number of alleles, expected heterozygosity, and observed heterozygosity, were calculated using GenAlEx version 6.503 (Peakall and Smouse, 2012). Significant deviations from Hardy-Weinberg equilibrium and linkage disequilibrium were tested using GENEPOP 4.2 (Rousset, 2008) with the default parameters. Conventional chi-square goodness-of-fit tests were performed to compare observed and expected genotypic frequencies (α = 0.05). Polymorphism information content, a measure of the informativeness of a genetic marker for linkage studies, was estimated using CERVUS version 3.0.7 (Kalinowski et al., 2007).
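For readers unfamiliar with these summary statistics, the following Python sketch shows how the number of alleles (A), observed heterozygosity (Ho), expected heterozygosity (He), and polymorphism information content (PIC) can be computed for one diploid locus in one population. It uses the standard textbook definitions, He = 1 - Σp_i² and PIC = He - ΣΣ 2p_i²p_j² over allele pairs i < j, and is not the GenAlEx or CERVUS implementation; the function name `locus_stats` and the genotype data are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Summarize one locus; genotypes is a list of (allele1, allele2) tuples."""
    n = len(genotypes)
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [c / (2 * n) for c in counts.values()]

    # Observed heterozygosity: fraction of individuals with two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity under Hardy-Weinberg proportions.
    he = 1.0 - sum(p * p for p in freqs)
    # PIC: He minus the sum over allele pairs of 2 * p_i^2 * p_j^2.
    pic = he - sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return {"A": len(counts), "Ho": ho, "He": he, "PIC": pic}


# Hypothetical fragment sizes (bp) for four individuals at one locus.
stats = locus_stats([(192, 192), (192, 198), (198, 198), (192, 198)])
print(stats)
```

A monomorphic locus gives A = 1 and zero for Ho, He, and PIC, which corresponds to the lower bounds of the ranges reported for this marker set.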
In the four investigated populations of S. oleraceus, the number of alleles ranged from one to 11 (Table 1), polymorphism information content values ranged from 0.000 to 0.778, and levels of observed and expected heterozygosity ranged from 0.000 to 1.000 and from 0.000 to 0.801, respectively (Table 2). Significant deviation from Hardy-Weinberg equilibrium was detected in all four populations: Australia (eight loci), Germany (three loci), China (two loci), and Korea (one locus) (Table 2).
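As an illustration of how a deviation from Hardy-Weinberg equilibrium can be assessed with a conventional chi-square goodness-of-fit test, the sketch below compares observed genotype counts with those expected from the allele frequencies. It is a simplified, hypothetical example and not the procedure implemented in GENEPOP; the function name `hwe_chi_square` and the genotypes are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotypes):
    """Return (chi-square statistic, degrees of freedom) for one diploid locus."""
    n = len(genotypes)
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = {allele: c / (2 * n) for allele, c in allele_counts.items()}
    observed = Counter(tuple(sorted(genotype)) for genotype in genotypes)

    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Expected counts: n * p^2 for homozygotes, n * 2pq for heterozygotes.
        expected = n * (freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b])
        chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected

    k = len(freqs)
    # Genotype classes minus one, minus the (k - 1) estimated allele frequencies.
    df = k * (k - 1) // 2
    return chi2, df


# Hypothetical population with only homozygotes at a two-allele locus.
chi2, df = hwe_chi_square([(192, 192)] * 10 + [(198, 198)] * 10)
# For df = 1, the critical value at alpha = 0.05 is 3.841.
print(chi2, df, chi2 > 3.841)
```

A complete absence of heterozygotes, as in this example, produces a large statistic; with small samples or many rare alleles, expected counts become small and exact tests are generally preferred over the chi-square approximation.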
The results of cross-amplification in four congeneric taxa are shown in Table 3. Out of 17 successfully amplified loci, six and seven loci were found to be polymorphic in S. asper and S. asper subsp. glaucescens, respectively. Furthermore, out of 14 successfully amplified loci in two woody Sonchus species (S. canariensis and S. palmensis), seven and five loci were polymorphic, respectively (Table 3). The high level of cross-species transferability (>82%) suggests that the genus Sonchus has highly conserved sequences and that its species have recently diverged.
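The transferability figure follows directly from the counts given above: 14 of the 17 loci amplified in the two woody species and all 17 amplified in the two S. asper taxa. A short Python sketch of this arithmetic, with the hypothetical helper `transferability`, is:

```python
TOTAL_LOCI = 17  # microsatellite loci developed for S. oleraceus

# Number of loci that amplified in each congeneric taxon, as reported in the text.
amplified = {
    "S. asper": 17,
    "S. asper subsp. glaucescens": 17,
    "S. canariensis": 14,
    "S. palmensis": 14,
}


def transferability(n_amplified, n_total):
    """Percentage of loci that amplified successfully in a taxon."""
    return 100.0 * n_amplified / n_total


for taxon, n in amplified.items():
    print(f"{taxon}: {transferability(n, TOTAL_LOCI):.1f}%")
```

The lowest value, 14/17, is approximately 82.4%, which is the basis of the ">82%" statement.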

CONCLUSIONS
In this study, we developed the first set of microsatellite markers for the genus Sonchus. Ten polymorphic markers are suitable for evaluating the population genetics and evolutionary history of S. oleraceus. These newly developed microsatellite markers displayed a high rate of cross-amplification in congeneric species, demonstrating their applicability within the genus Sonchus.