Genetic relationships among Eriobotrya species revealed by genome‐wide RAD sequence data

Abstract Restriction site‐associated DNA sequencing (RAD‐seq) was used to illuminate the genetic relationships among Eriobotrya species. The raw data were filtered, and 221 million clean reads were used for further analysis. A total of 1,983,332 SNPs were obtained from 23 Eriobotrya species and two relative genera. Neighbor‐joining and maximum likelihood phylogenetic trees gave similar results. All Eriobotrya plants grouped together into one large clade, and the two out‐groups clustered together into a single or separate clade. Chinese and Vietnamese accessions were distributed throughout the dendrogram. There was no significant correlation between genotype and geographical distance. However, clustering results were correlated with leaf size to some extent. The Eriobotrya species could be divided into three groups based on leaf size and phylogenetic analysis: groups A and B comprised species with small leaves (<10 cm in length), except E. stipularis (16.76 cm), and group C could be further divided into two subgroups, one with medium‐size leaves (10 to 20 cm in length) and one with leaves longer than 20 cm.


| INTRODUCTION
High-throughput sequencing technologies have revolutionized genome research in recent years. The field of population genomics is rapidly expanding, and studies are now possible on unprecedented scales, even in nonmodel organisms. Restriction site-associated DNA (RAD-tag) sequencing can simultaneously detect and genotype thousands of genome-wide SNPs (Baird et al., 2008; Willing et al., 2011). It is a reduced-representation method that samples a shared set of sites across the genome in many individuals or populations, making population-scale sequencing possible at a fraction of the cost of whole-genome sequencing (Davey et al., 2011). RAD-seq is suitable for fine-scale linkage mapping (Scaglione et al., 2015; Wang, Fang, Xin, Wang, & Li, 2012), population genetics (Andersen et al., 2012), phylogenetics, and phylogeography (Cruaud et al., 2014; Rubin, Ree, & Moreau, 2012; Takahashi & Moreno, 2015;

| The investigation of leaf length
Thirty mature leaves (from five to 10 individuals) were sampled to measure leaf length. The leaf investigation was carried out over consecutive years. Differences were tested for significance in SPSS at the 0.01 level.

| RAD-seq library preparation
The RAD-seq library was prepared by digesting 1,000 ng of genomic DNA per sample with 5 units of NsiI and MseI (NEB, USA) at 37°C for 2 hr in a 50 μl reaction volume, followed by enzyme inactivation at 80°C for 20 min. The ligation reaction was performed with and 500 bp were isolated using a MinElute Gel Extraction kit (Qiagen, Germany) and diluted to 10 μmol/L for single-end sequencing on an Illumina HiSeq2000.

| Quality filtering and SNP calling
Low-quality reads (Q score < 20) and contaminated reads were filtered out; reads were trimmed to 84 nucleotides to remove flanking barcode sequences. All reads were pooled and used for de novo assembly and SNP calling in ustacks (STACKS pipeline; Catchen, Amores, Hohenlohe, Cresko, & Postlethwait, 2011). We set the minimum stack size (-m) to 5 reads and the maximum distance between stacks within a locus (-M) to 2. Population SNPs were filtered to retain only those with SNP information in more than half of the samples. The Illumina data set has been deposited in the NCBI Sequence Read Archive (SRA) under accession number PRJNA342569.
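The filtering rules in this paragraph can be illustrated with a minimal Python sketch. It is not the pipeline used in the study (that work was done with the STACKS tools), and several details are assumptions made only for illustration: that "Q score < 20" refers to the mean Phred score of a read, that quality strings use Phred+33 encoding, and that the barcode occupies a hypothetical number of leading bases given by `start`. The function names and the toy genotype table are likewise hypothetical.

```python
from typing import Dict, List, Optional, Tuple

MIN_Q = 20        # quality threshold stated in the text
READ_LENGTH = 84  # trimmed read length stated in the text


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def keep_read(qual: str, min_q: int = MIN_Q) -> bool:
    """Assumption: a read is 'low quality' when its mean Phred score is below min_q."""
    return mean_phred(qual) >= min_q


def trim_read(seq: str, qual: str, start: int = 0,
              length: int = READ_LENGTH) -> Tuple[str, str]:
    """Drop `start` leading bases (hypothetical barcode length), then keep `length` bases."""
    return seq[start:start + length], qual[start:start + length]


def filter_snps(calls: Dict[str, List[Optional[str]]]) -> Dict[str, List[Optional[str]]]:
    """Retain SNPs that have a genotype call (not None) in more than half of the samples."""
    return {
        snp: genotypes
        for snp, genotypes in calls.items()
        if sum(g is not None for g in genotypes) > len(genotypes) / 2
    }


# Toy genotype table: four hypothetical samples, None marks a missing call.
toy_calls = {
    "snp_1": ["A/A", "A/G", None, "G/G"],  # 3 of 4 called: kept
    "snp_2": ["C/C", None, None, "C/T"],   # 2 of 4 called: dropped (not more than half)
    "snp_3": [None, None, None, "T/T"],    # 1 of 4 called: dropped
}
print(sorted(filter_snps(toy_calls)))
```

The SNP filter uses a strict "more than half" comparison, so a SNP called in exactly half of the samples is discarded; a different reading of the text would change only that comparison.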
Neighbor-joining and maximum likelihood phylogenetic trees were constructed with TreeBeST software, using 1,000 bootstrap replicates.
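For readers unfamiliar with distance-based tree building, the sketch below shows the kind of input a neighbor-joining tree starts from and the standard criterion used to choose the first pair of taxa to join. It is only an illustration under stated assumptions, not TreeBeST's implementation: the distance used here is a simple p-distance (the proportion of differing sites among sites called in both samples), genotypes are coded as single bases, and the sample names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Genotypes = List[Optional[str]]  # one entry per SNP site; None marks missing data


def p_distance(a: Genotypes, b: Genotypes) -> float:
    """Proportion of differing sites among sites called in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("samples share no called sites")
    return sum(x != y for x, y in shared) / len(shared)


def distance_matrix(samples: Dict[str, Genotypes]) -> Dict[Tuple[str, str], float]:
    """Symmetric pairwise p-distance table keyed by (name, name)."""
    dist: Dict[Tuple[str, str], float] = {}
    for s, t in combinations(samples, 2):
        dist[(s, t)] = dist[(t, s)] = p_distance(samples[s], samples[t])
    return dist


def first_nj_pair(names: List[str], dist: Dict[Tuple[str, str], float]) -> Tuple[str, str]:
    """Pair minimizing the neighbor-joining criterion
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)."""
    n = len(names)
    row_sum = {s: sum(dist[(s, t)] for t in names if t != s) for s in names}

    def q(pair: Tuple[str, str]) -> float:
        s, t = pair
        return (n - 2) * dist[(s, t)] - row_sum[s] - row_sum[t]

    return min(combinations(names, 2), key=q)


# Hypothetical toy data: four samples, six SNP sites, one missing call.
toy = {
    "sp1": ["A", "A", "C", "G", "T", "A"],
    "sp2": ["A", "A", None, "G", "T", "C"],
    "sp3": ["G", "T", "C", "A", "T", "A"],
    "sp4": ["G", "T", "C", "A", "C", "A"],
}
toy_dist = distance_matrix(toy)
print(first_nj_pair(list(toy), toy_dist))
```

A full neighbor-joining run would repeat this step, replacing the joined pair with a new internal node until the tree is resolved; bootstrap support would then come from rebuilding the tree on SNP sites resampled with replacement.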

| The investigation of leaf length
Leaf length of the Eriobotrya plants ranged from 4.45 cm (E. seguinii) to 35.78 cm (E. malipoensis) (Table 1). We found three groups: (1) three species had leaf lengths <10 cm, namely E. seguinii, E. henryi, and E. angustissima, but only E. seguinii differed significantly from the other species; (2) 15 species had leaf lengths between 10 and 20 cm, with no significant difference among them; (3) five species had leaf lengths >20 cm and grouped together. E. elliptica var. petelotii, E. elliptica, and E. malipoensis differed significantly from the rest of the species at the 0.01 significance level.
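The three leaf-length classes can be written as a simple threshold rule. The sketch below is illustrative only: the class labels and the treatment of values falling exactly on 10 or 20 cm are assumptions, since the text does not specify them. The three example lengths are the ones reported in the paper.

```python
def leaf_size_class(length_cm: float) -> str:
    """Assign a leaf-length class using the 10 cm and 20 cm cut-offs from the text.

    Assumption: values of exactly 10 or 20 cm are placed in the middle class.
    """
    if length_cm < 10:
        return "small"
    if length_cm <= 20:
        return "medium"
    return "large"


# Leaf lengths reported in the paper (cm).
reported = {
    "E. seguinii": 4.45,
    "E. stipularis": 16.76,
    "E. malipoensis": 35.78,
}
for species, length in reported.items():
    print(species, leaf_size_class(length))
```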

| RAD-tag sequencing and SNP calling
We obtained 221 million clean reads from the Illumina HiSeq2000 after removing low-quality reads (Q score < 20) and ambiguous reads with incorrect barcodes. A quality score of 20 (Q20) represents an error rate of 1 in 100, corresponding to a base-call accuracy of 99%; the Q20 percentage of every sample exceeded 97.6%, indicating good sequencing quality. Among these high-quality reads, the largest number (37.71 million reads) was obtained for E. bengalensis f. angustifolia and the smallest (1.96 million reads) for E. japonica, with an average of 8.84 million reads per accession.
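The relationship between a Phred quality score and the base-call error rate, and the reported per-accession average, can be checked with a few lines of Python. This is a sketch under assumptions: quality strings are taken to be Phred+33 encoded, the function names are hypothetical, and the count of 25 accessions is inferred from the 23 Eriobotrya species plus two relative genera rather than stated alongside the average.

```python
def phred_error_probability(q: float) -> float:
    """Base-call error probability for Phred score q: P = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def q20_fraction(qual: str, offset: int = 33) -> float:
    """Fraction of bases in a quality string with Phred score >= 20 (Phred+33 assumed)."""
    return sum((ord(ch) - offset) >= 20 for ch in qual) / len(qual)


# Q20 corresponds to an error rate of 1 in 100, i.e. 99% accuracy.
error_q20 = phred_error_probability(20)
accuracy_q20 = 1 - error_q20

# Average reads per accession, assuming 25 accessions (23 species + 2 relative genera).
total_reads_million = 221
n_accessions = 25
mean_reads_million = total_reads_million / n_accessions

print(error_q20, accuracy_q20, round(mean_reads_million, 2))
```

Under the 25-accession assumption, the total of 221 million reads is consistent with the reported average of 8.84 million reads per accession.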
We obtained a total of 1,983,332 SNPs, of which 1,720,528 were homozygous and 262,804 were heterozygous (

| Phylogenetic relationship revealed by RAD-seq
Although two different approaches were used to construct the phylogenetic tree, similar results were obtained by both methods. All the other subgroup, and they all have medium-size leaves (Figure 1).
Although we detected variations in the phylogenetic analysis, some results were consistent with previous studies (Yang, Li, Liu, & Lin, 2009; Yang et al., 2011, 2012). For example, accessions belonging to the same species were classified into the same cluster, such as E. bengalensis and E. bengalensis f. angustifolia.
Notably, E. japonica and E. malipoensis were always grouped together before clustering with other species. The same situation was found between E. seguinii and E. henryi and between E. fragrans and E. cavaleriei.
However, E. deflexa, E. deflexa f. buisanensis, and E. deflexa var. koshunensis belong to the same species but clustered into different groups.

| DISCUSSION
Next-generation sequencing technologies have facilitated the study of organisms on a genome-wide scale. RAD-seq allows sampling sequence information at reduced complexity across a target genome using the Illumina platform. Paired-end RAD-seq provides a large number of informative genetic markers in reference as well as nonreference organisms (Willing et al., 2011
TABLE 2 The SNPs number and information by RAD-seq
FIGURE 1 Leaf size and phylogenetic trees of Eriobotrya species and two relative genera. (a) The leaf size of 23 Eriobotrya species and two relative genera. Bar: 2 cm. (b,c) are neighbor-joining and maximum likelihood phylogenetic trees of 23 Eriobotrya species and two relative genera by RAD-seq. Node support is given as the maximum parsimony bootstrap value. Group C consists of two subgroups and marked as green and pink, respectively
results were in accordance with the preliminary classification proposed by Yang and Lin (2007

| CONCLUSION
This study revealed the genetic relationships among Eriobotrya species by restriction site-associated DNA sequencing (RAD-seq). A total of 1,983,332 SNPs were obtained from 23 Eriobotrya species and two relative genera. Neighbor-joining and maximum likelihood phylogenetic trees gave similar results. Our results are reliable: all Eriobotrya plants grouped together into one large clade, and the two outgroups clustered together into a single or separate clade. Chinese and Vietnamese accessions were distributed throughout the dendrogram. The clustering results were correlated with leaf size, and the Eriobotrya species could be divided into three groups based on leaf size.