Integrated analysis of three newly sequenced fern chloroplast genomes: Genome structure and comparative analysis

Abstract Background Some ferns have medicinal properties and are used in therapeutic interventions. However, the classification and phylogenetic relationships of ferns remain incompletely reported. Considering that chloroplast genomes provide ideal information for species identification and evolution, in this study, three unpublished and one published ferns were sequenced and compared with other ferns to obtain comprehensive information on their classification and evolution. Materials and Methods The complete chloroplast genomes of Dryopteris goeringiana (Kunze) Koidz, D. crassirhizoma Nakai, Athyrium brevifrons Nakai ex Kitagawa, and Polystichum tripteron (Kunze) Presl were sequenced using the Illumina HiSeq 4,000 platform. Simple sequence repeats (SSRs), nucleotide diversity analysis, and RNA editing were investigated in all four species. Genome comparison and inverted repeats (IR) boundary expansion and contraction analyses were also performed. The relationships among the ferns were studied by phylogenetic analysis based on the whole chloroplast genomes. Results The whole chloroplast genomes ranged from 148,539 to 151,341 bp in size and exhibited typical quadripartite structures. Ten highly variable loci with parsimony informative (Pi) values of > 0.02 were identified. A total of 75–108 SSRs were identified, and only six SSRs were present in all four ferns. The SSRs contained a higher number of A + T than G + C bases. C‐to‐U conversion was the most common type of RNA editing event. Genome comparison analysis revealed that single‐copy regions were more highly conserved than IR regions. IR boundary expansion and contraction varied among the four ferns. Phylogenetic analysis showed that species in the same genus tended to cluster together with and had relatively close relationships. Conclusion The results provide valuable information on fern chloroplast genomes that will be useful to identify and classify ferns, and study their phylogenetic relationships and evolution.


| INTRODUC TI ON
Ferns are the most evolved of the spore-forming plants, and some have medicinal properties (Chen et al., 2007). The genus Dryopteris (Dryopteridaceae, comprising 225-300 species) is considered ideal for studying diversification, hybridization, and polyploidy in ferns (Sessa et al., 2012). D. goeringiana and D. crassirhizoma, which originated in the northeast region of China are distributed across Russia, Japan, and North Korea. The rhizome and petiole residues of D. crassirhizoma are used in traditional Chinese herbal medicine to eliminate heat and toxins, promote blood circulation, and treat blood stasis (Z. Zhao et al., 2007). Polystichum (Dryopteridaceae) is one of the most abundant genera of ferns and commonly occurs in lowlands and montane to alpine areas (Zhang, 2012); it contains 500 species, with 208 species known in China (Zhang & Barrington). Athyrium Roth (Athyriaceae), the lady-fern genus, contains approximately 220 described species (Ran Wei & Zhang, 2016). Athyrium brevifrons is often used as a wild vegetable in northeastern China because of its high nutritional value. Because only a small proportion of ferns have been identified and classified, additional studies are needed.
With the development of next-generation sequencing (NGS) technology, the details of the most subtle nuclear gene components in eukaryotic cells have become clearer, and the study of cytoplasmic organelle genomes has also been facilitated in a more straightforward and time-saving way (Ruiz-Ruano et al., 2018). This is especially true for chloroplasts, which are involved in many biochemical metabolism processes, including amino acid, sugar, lipid, vitamin, starch, and pigment synthesis; sulfate reduction; and nitrogen. Most chloroplast converts light energy into chemical energy via photosynthesis, making chloroplasts indispensable for plants (Bausher et al., 2006;Jarvis & Soll, 2001;Leister, 2003). Compared with nuclear genomes, chloroplast genomes are more highly conserved in terms of gene order, gene content, and substitution rate (Green, 2011;Helena, 2004;Ruhlman & Jansen, 2014;Wolfe et al., 1987;J. H. Xu et al., 2015). Chloroplast genomes have a typically circular structure with one large single-copy (LSC) region, one short single-copy (SSC) region, and two inverted repeat (IR) regions, ranging from 120 to 170 kilobases in length (Downie & Palmer, 1992). Owing to the absence of recombination and maternal transmission, the chloroplast genomes are helpful for tracing source populations (McCauley et al., 1996;Small et al., 2004). They have become a valuable and ideal resource for species identification, population genetics, plant phylogenetics, and genetic engineering considering their similar structures, highly conserved sequences, and stable maternal heredity (Nock et al., 2014). However, gain and loss of genes, gene content duplication, and gene order rearrangements appear to be phylogenetically and species informative (Bausher et al., 2006;Green, 2011;J. H. Xu et al., 2015).
An increasing number of chloroplast genomes have been reported in recent years, especially because NGS has become cheaper and faster. The chloroplast genomes of many plants have been sequenced, including those of bryophytes P. Wolf & Karol, 2012), lycophytes (Guo et al., 2016;Tsuji et al., 2007), monophytes (Logacheva et al., 2017;Lu et al.,.., 2015;Ruiz-Ruano et al., 2018;R. Wei et al., 2017;P. G. Wolf et al., 2011), and spermatophytes (Sun et al., 2016;J. H. Xu et al., 2015). As one of the largest group of vascular plants, approximately 2,129 species of ferns are present in China (Z. et al.,.., 2013), of which only 60 have been reported. D. crassirhizoma, D. goeringiana, A. brevifrons, and P. tripteron studied in the present study are relatively distributed in Heilongjiang Province, China, and all have certain antibacterial effects or edible value. These ferns are also the research hotspots of domestic ferns. Comparing the differences between the chloroplast genomes through the genetic relationship at different levels provides theoretical support for further development and utilization. In the present study, the complete chloroplast genomes of D. goeringiana, A. brevifrons, and P. tripteron were sequenced for the first time. We performed comparisons of the genomes and IR boundary expansion and contraction. Simple sequence repeats (SSRs), highly variable loci, and RNA editing events were also investigated in the four ferns. A phylogenetic tree was constructed based on the chloroplast genomes of almost all ferns reported thus far. The present study was conducted to achieve the following objectives: a) to sequence and report the chloroplast genomes of D. crassirhizoma, D. goeringiana, A. brevifrons, and P. tripteron; b) to compare the chloroplast genomes of eight fern species to identify useful DNA barcodes for plant identification and evolution analysis; and c) to identify a more comprehensive phylogenetic relationship among ferns.

| Chloroplast DNA extraction and sequencing
Fresh leaves were collected, immersed in liquid nitrogen immediately, and stored at − 80℃ prior to DNA extraction. We isolated chloroplast DNA using an improved extraction method (McPherson et al., 2013) and evaluated its quality and quantity

| Genome assembly and annotation
Prior to assembly, low-quality reads were removed using FastQC (ht tp://w w w.bioin forma tics.babra ham.ac.uk/proje ct s/fastq c/). Then, the chloroplast genomes were assembled in three steps as follows (Cronn et al., 2008): Clean reads were assembled into contigs using SOAPdenovo 2.04 (Luo et al., 2012), clean reads were mapped to the contigs for assembly and optimization using SOAPGapCloser Zhao et al., 2011), and redundant sequences were removed.

| Nucleotide diversity analysis
We measured the parsimony informative (Pi) characters persite values to identify the most variable chloroplast genes using Loci with a Pi value of > 0.20 were considered as highly variable regions.

| RNA editing
The RNA editing events were counted in four ferns.

| Phylogenetic analysis
MAFFT v7.149 was used to align the cpDNAs sequences under default parameters (Katoh et al., 2005), and the alignment was trimmed by Gblocks_0.91b to remove low-quality regions with the parameters: (Castresana, 2000). The maximum-likelihood (ML) methods were performed for the genome-wide phylogenetic analyses using PhyML 3.0 (Guindon et al., 2010), respectively.
Nucleotide substitution model selection was estimated with jModel-
Notably, matK in P. tripteron contained two introns, whereas matK in the other three species contained only one intron. rps12, with one exon in the LSC region and the other two in the IR regions, was considered a trans-spliced gene separated by two introns.

| SSR analysis
Five types of SSRs were identified, including mononucleotides, dinucleotides, trinucleotides, tetranucleotides, and pentanucleotides, with a total of 75-108 SSRs in the four species (Figure 3a and Table S1). There were 55-90 mononucleotides,  Table S1). Only six types of SSRs were simultaneously present in the four ferns: A, C, G, T, AT, and AGAT. The SSRs were composed of a higher number of A + T (63.26%) bases than G + C bases (36.74%; Figure 3g).

| Nucleotide diversity analysis
The chloroplast genome contains numerous variable nucleotides, which are usually recognized as valuable DNA barcoding regions for resolving closely related species or genera. In the present study, variable loci were identified in the four species, with Pi values ranging from 0.0000 to 0.2778 (rpl16) (Figure 4 and Table S2). Ten loci with  Note: tRNA varied in four species and are listed in Table 3.

| RNA editing
RNA editing is defined as the post-transcriptional modification of precursor RNAs to alter their nucleotide sequences through the insertion and deletion, or specific substitution of nucleotides to introduce or remove start or stop codons or yield functional RNA species (Tsudzuki et al., 2001). In the present study, a total of 268 RNA editing events were identified in the four chloroplast genomes: 85 in D. crassirhizoma, 55 in A. brevifrons, 50 in D. goeringiana, and 78 in P. tripteron. C-to-U conversion (120 events, 44.8%) was the most prevalent RNA editing event, followed by U-to-C (103 events, 38.4%), A-to-G (36 events, 13.4%), and G-to-A (9 events, 3.4%).  rpoC2, rpoB, psbC, pasA, rbcL, ycf2, ycf1, and ndhB were identified to be divergent among these chloroplast genomes ( Figure 5). The sequences in the IR regions were more highly conserved than those in the LSC and SSC regions. However, ndhB extended into the LSC region of L. clathratus.

| Phylogenetic analysis
The phylogenetic tree (Figure 7

| D ISCUSS I ON
The chloroplast genome provides information that is valuable for species identification, population genetics, plant phylogenetics, and genetic engineering. In the present study, three previously unpublished and one published fern were sequenced and compared with other species. The chloroplast genomes, which were 148,539-151,341 bp in length in the present study, were within the limit of fern chloroplast genomes (131,760-181,684 bp) reported by Gao et al., (2018) and Ruiz-Ruano et al., 2018). The ferns in the present study had a typical four-junction region structure (F. Liu & Pang, 2016;Mira Park et al., 2018). Variables were usually present in the LCS and SSC regions, and expansion and contraction were noted in the IR region (Asaf et al., 2017). Gao et al., (2018) andLu et al., (2015) reported that intergenic sequences were extended, but overlapping genes were reduced in fern chloroplast genomes. Thus,

F I G U R E 5
The sequence alignment of eight fern species. Gray arrows above the alignment indicate the orientation of genes. Purples, blue, and pink bars represent exons, introns and ncRNAs, and noncoding sequences, respectively. X-axis represents the genome coordinate positions; Y-axis represents the percent identify within 50%-100%. Dashed rectangles indicate highly divergent regions. Use D. goeringiana as the reference sequence utilization is more specific, and genes are more independent. In the present study, we noted the absence of some genes in the ferns (trnI-GAU was absent in P. tripteron), and that tRNA genes were more diverse than protein-coding and rRNA genes, which possibly play an essential role in fern evolution.
Other factors that may promote evolution are introns within genes, boundary divergence, mutations, SSRs, and RNA editing events. Genes are interrupted by introns in the major groups of organisms. One-intron genes vary among species, whereas clpP, rps12, and ycf3 have been found to be two-intron genes (Brouard et al., 2016;S. Liu et al., 2017;Ting Wang et al.,.., 2018); these findings were consistent with our observations, except for P. tripteron.
One additional gene, matK, contained two introns in P. tripteron.
matK is a useful biomarker for phylogenetic analysis in plant classification because its sequence evolution is faster than that of other chloroplast genes (Selvaraj et al., 2008). Notably, the two-intron gene matK and the absence of trnI-GAU may provide valuable evidence regarding molecular evolution of P. tripteron.
Although highly conserved, the expansion and contraction of IR regions are responsible for variations in chloroplast genome size and rearrangement (Raubeson et al., 2007;Yang et al., 2010), thereby promoting genomic evolution (Daniell et al., 2016;Logacheva et al., 2017). D. fragrans has a genome loss of 4,033 bp in the IR region, resulting in a longer SSC region and shorter IRs regions than those in D. crassirhizoma and D. goeringiana (Gao et al., 2018). The nucleotide diversity analysis also demonstrated that the IR regions contained fewer variable loci than the SC regions.
Additionally, genes with Pi values of > 0.20 were mainly located in the SC regions. None of the intron-containing genes (atpF, clpP, matK, ndhA, ndhB, petA, petB, petD, rpl16, rpl2, rpoC1, rps12, rps16, and ycf3) had a Pi value of > 0.20, except rpl16. Intron-containing genes are more highly conserved than exon-containing genes only in the chloroplast genome. In other words, higher variability was found F I G U R E 6 Comparison of the borders of LSC, SSC, and IR regions among the seven chloroplast genomes. The rectangular strips of each row represent a genome. The different colors represent different partitions. The black vertical line represents the boundary; the genes on both sides are indicated by small squares of different colors, the gene name is indicated on it, the gene in the forward chain is above; and the number represent the distant from gene and the boundary. Use D. goeringiana as the reference. LSC, large single copy. SSC, small single copy. IR, inverted repeat regions in exon-containing genes, which provided more valuable information for species evolution. This result was consistent with that in fern plastomes (R. Wei et al., 2017).
SSRs, which serve as valuable molecular genetic markers, are widely used for population genetics (Doorduin et al., 2011;He et al., 2012) (Kwon et al., 2020). The high G + C content of ferns helps them to survive in more environments than other plants.
The unparalleled G + C content might be explained by the high level of RNA editing in the organelles (Smith, 2009). It has been reported that fern and hornwort chloroplast genomes have evolved a higher number of RNA editing events than spermatophyte chloroplast genomes, in which only 30-40 RNA editing sites are typically present (Masanori et al., 2003;P. G. Wolf et al., 2004). In the present study, most editing events were C-to-U conversions, which was consistent with that reported in D. fragrans (Gao et al., 2018). It has been reported that the high number of C-to-U conversions developed in the early stages of vascular plant evolution (Koichiro et al., 2008).
We can conclude that C-to-U RNA editing events might be essential for the rapid evolution of ferns.
Chloroplast genome data are valuable for resolving species definitions because organelle-based "barcodes" can be established for certain species and then applied to reveal interspecies phylogenetic relationships (Jun-Bo Yang et al.,.., 2013). The phylogenetic relationships of A. brevifrons, D. goeringiana, and P. tripteron have been rarely studied before chloroplast genome data became available. This study evaluated the chloroplast genomes of 43 fern species, almost all the ferns reported thus far, in the phylogenetic analysis. We found that D. crassirhizoma and D. goeringiana were closely related to D. decipiens.
F I G U R E 7 Molecular phylogenetic tree on 43 fern species. The tree was constructed using maximum-likelihood algorithm and the general time-reversible (GTR)+I + G + G model. The species studied in the present study was colored with pink P. tripteron was identified as a sister species of C. devexiscapulae.
Interestingly, D. decipiens and C. devexiscapulae were found to be clustered into one branch in a study by Wei et al. (R. Wei et al., 2017). These two species were reported to be closely related to D. crassirhizoma by Xu et al. (L. Xu et al., 2019). Based on the results of the present study, we can speculate that D. crassirhizoma, D. goeringiana, D. decipiens, P.
tripteron, and C. devexiscapulae are closely related.

| CON CLUS IONS
The complete chloroplast genomes of D. goeringiana, A. brevifrons, and D. crassirhizoma were sequenced for the first time. We also demonstrated that ferns have a higher G + C content and a higher number of C-to-U RNA editing events than other plants. The genomic characteristics and variations provide valuable information for understanding the evolution and phylogeny of ferns.

E TH I C S S TATEM ENT
None of the species are endangered, protected, or personally owned.
This research was authorized by the Institute of Natural Resources and Ecology.

ACK N OWLED G M ENTS
None.

CO N FLI C T O F I NTE R E S T
We declare that we have no conflict of interest.