Molecular evolution and SSRs analysis based on the chloroplast genome of Callitropsis funebris

Abstract Chloroplast genome sequences have been used to understand evolutionary events and to efficiently infer phylogenetic relationships. Callitropsis funebris (Cupressaceae) is an endemic species in China. Its phylogenetic position is controversial because its morphological characters resemble those of Cupressus, Callitropsis, and Chamaecyparis. This study used next‐generation sequencing technology to sequence the complete chloroplast genome of Ca. funebris and then reconstructed the phylogenetic relationships between Ca. funebris and its related species based on a variety of data sets and methods. Simple sequence repeats (SSRs) and adaptive evolution analysis were also conducted. Our results showed that the monophyletic branch consisting of Ca. funebris and Cupressus tonkinensis is sister to Cupressus, while Callitropsis is not monophyletic; Ca. nootkatensis and Ca. vietnamensis are nested in turn at the base of the monophyletic group Hesperocyparis. The statistical results of SSRs supported the closest relationship between Ca. funebris and Cupressus. In the adaptive evolution analysis under the phylogenetic background of Cupressales, the Branch model detected three genes and the Site model detected 10 genes under positive selection, and the Branch‐Site model revealed that rpoA has experienced positive selection in the Ca. funebris branch. Molecular analysis of the chloroplast genome strongly supported that Ca. funebris is at the base of Cupressus. Of note, SSR features were found to shed some light on phylogenetic relationships. In short, this chloroplast genomic study has provided new insights into the phylogeny of Ca. funebris and revealed multiple chloroplast genes possibly undergoing adaptive evolution.

is generally 120-160 kb, with a quadripartite structure involving two inverted repeat (IR) sequences that separate the rest of the genome into large and small single-copy regions (Kwon et al., 2020;Wicke et al., 2011). In particular, conifers are one of the few groups that have lost the canonical IR region (Kwon et al., 2020;Li et al., 2016;Ping et al., 2020). Compared with the nuclear and mitochondrial genomes, the chloroplast genome is characterized by small size, high copy number, conserved structure, and a moderate rate of nucleotide evolution, and it has been widely used to understand evolutionary events and to efficiently infer phylogenetic relationships (Barrett et al., 2016;Saina et al., 2018). Similarly, the conservation of chloroplast genes and chloroplast genome structure, gene loss and pseudogenization, gene transfer events, plastid rearrangement, and RNA editing sites make it possible to characterize specific lineages (Bock, 2017;De Santana Lopes et al., 2019;Martin et al., 2014;Vieira et al., 2016).
The repeat sequences, gene sequences, gene content, and GC content of the chloroplast genome can provide additional information for phylogeny. Simple sequence repeats (SSRs), also known as microsatellites, are stretches of DNA composed of basic units of 1 to 6 nucleotides repeated many times, and they are polymorphic at the species level (Cavalier-Smith, 2002). They are widely distributed across the genome and are generally <200 bp in length. SSR molecular markers are widely used because of their codominance, high information content, wide coverage, and ease of use. The SSRs of plant chloroplast genomes contain rich information on genetic variation; they have been widely used in crop species identification, genetic diversity, and relationship studies (Chmielewski et al., 2015) and have broad prospects for application in plant genetics and taxonomy (Jiménez, 2010).
Chloroplasts, as organelles for plant photosynthesis, are closely related to the adaptation of plants to various environments. Some genes that play a key role in photosynthesis may undergo adaptive evolution along with species radiation (Hasegawa et al., 2009;Zhang et al., 2018). Studies have reported that clpP (Erixon & Oxelman, 2008), rbcL (Kapralov & Filatov, 2007), matK, and other chloroplast genes have undergone positive selection. Analysis of adaptive evolution at the genetic level and identification of the key functional sites and amino acid structure can provide evidence for exploring plant evolution, as well as a deeper understanding of variation in genetic structure and function (Nei & Kumar, 2000).
Gymnosperms originated more than 300 million years ago and comprise about 860 species. They include four of the five main lineages of seed plants: cycads, ginkgos, gnetophytes, and conifers. There are 227 gymnosperm species in China, covering 8 families and 37 genera. Due to the large genome size, high heterozygosity, and long generation time of populations, understanding the evolution of gymnosperms still faces great difficulties in the era of genomics. At the beginning of the 21st century, Cupressus was a hot topic in conifer systematics, and various suggestions have been proposed to subdivide it into multiple genera (Adams et al., 2009;de Laubenfels, 2009;Farjon et al., 2002;Little, 2006;Mao et al., 2010;Terry et al., 2012). Now cypresses are divided into two groups: Old World cypresses (only Cupressus) and New World cypresses (Cupressus, Callitropsis, and Hesperocyparis). There are about 25 species of Old World cypresses, among which Callitropsis funebris is endemic to China. It is widely distributed, grows rapidly, and is highly adaptable. As a precious wood species, it also has medicinal value and is of great economic value.
Ca. funebris has flat, drooping scaly branchlets, smaller cones, and fewer seeds per scale (only 5-6). These characteristics are similar to those of Chamaecyparis and Callitropsis, but the branchlets of Ca. funebris are drooping and homomorphic, unlike those of Chamaecyparis, which are flattened and heteromorphic. The cones of Ca. funebris mature in the second year and have 3-4 pairs of seed scales with 4 pollen sacs in each scale, similar to those of Cupressus. Zheng and Fu (1978) believed that the main morphological traits of Ca. funebris were consistent with those of Cupressus, and that it was more natural to place it in Cupressus. Similarly, Rushforth et al. (2003) placed Ca. funebris in the Old World cypresses based on molecular trait analysis. The relationship between Ca. funebris and Chamaecyparis has also been controversial (Farjon, 2005;Gadek & Quinn, 1987;Jagel & Stutzel, 2001). Because of the unique characteristics of Ca. funebris and its controversial phylogenetic position, we used the chloroplast genome to explore its phylogenetic position and to further understand related evolutionary events. In this study, the complete chloroplast genome of Ca. funebris was sequenced using next-generation sequencing technology. The phylogenetic relationships of Cupressales were reconstructed based on a variety of data sets and methods. We also analyzed and compared the chloroplast genome characteristics of cypresses (focusing on SSRs) and carried out adaptive evolution analysis under the phylogenetic background of Cupressales.

Sampling
Fresh leaves of Ca. funebris were sampled on the campus of South China Agricultural University (E113°35', N23°15'). The leaves were wrapped in tinfoil and stored in liquid nitrogen at −80℃ for later use.

DNA extraction and sequencing assembly
The chloroplast genomic DNA was extracted using a new DNAsecure plant genomic DNA extraction kit (TIANGEN), and the DNA library (300 bp) was constructed after fragmentation. The Illumina HiSeq2500 platform was used to carry out paired-end sequencing with a 150 bp read length. To ensure quality, the raw data were filtered to remove reads containing adapters and low-quality reads.
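The filtering step can be illustrated with a minimal Python sketch. It assumes Phred+33 quality encoding; the adapter string, the mean-quality cutoff, and all function names are illustrative placeholders and are not the settings or software used in this study.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_read(sequence, quality, adapter, min_mean_q=20):
    """Keep a read only if it has no adapter and its mean quality passes the cutoff."""
    if not sequence or adapter in sequence:
        return False
    return mean_phred(quality) >= min_mean_q


def filter_reads(reads, adapter, min_mean_q=20):
    """reads: iterable of (name, sequence, quality) tuples."""
    return [r for r in reads if keep_read(r[1], r[2], adapter, min_mean_q)]


# Placeholder adapter and toy reads, for illustration only.
ADAPTER = "AGATCGGAAGAGC"
contaminated = "ACGT" + ADAPTER + "AC"
reads = [
    ("read1", "ACGTACGTAC", "I" * 10),                     # Phred 40, no adapter
    ("read2", contaminated, "I" * len(contaminated)),      # contains the adapter
    ("read3", "ACGTACGTAC", "!" * 10),                     # Phred 0 throughout
]
kept = filter_reads(reads, ADAPTER)
```

In practice a dedicated read-trimming tool would normally handle partial adapter matches and per-base trimming, which this sketch deliberately omits.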

Construction of phylogenetic tree
The chloroplast genome sequences of 31 species of Cupressales were downloaded from the NCBI database (Table 1), and three sequence datasets were constructed: (a) 32 chloroplast genome sequences, (b) 32 sequences of rbcL and matK (rbcL + matK), and (c) complete chloroplast genome sequences of 14 cypresses and Juniperus monosperma. The Clustal W module in MEGA X (Kumar et al., 2018) was used for sequence alignment and correction, and jModelTest 2.1.7 was used to predict the optimal nucleotide substitution model. MEGA X was used to construct a neighbor-joining (NJ) tree. The maximum parsimony (MP) tree was constructed using PAUP4.0 (Swofford, 2002). RaxmlGUI2 (Stamatakis, 2014) was used to construct the maximum likelihood (ML) tree, and MrBayes3.2.6 (Huelsenbeck & Ronquist, 2001) was used to construct the Bayesian inference (BI) tree. Finally, FigTree v1.4.4 was used to draw the phylogenetic trees. The phylogenetic tree for evolutionary analysis was obtained after comparison and correction based on a comprehensive analysis of each tree file and the results of previous studies at the genus level (Gadek et al., 2000;Hao et al., 2016;Lu et al., 2014;Qu, Jin et al., 2017;Qu, Wu et al., 2017).
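Distance-based methods such as NJ start from a matrix of pairwise distances computed from the alignment. The Python sketch below shows one simple way to build such a matrix. It uses the Jukes-Cantor (JC69) correction purely as a stand-in, since the substitution model actually selected for this study is not stated here; the toy alignment and taxon labels are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping positions with gaps or ambiguity codes."""
    valid = "ACGT"
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in valid and b in valid]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor correction: d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment):
    """Upper-triangle JC69 distances for a dict of aligned sequences."""
    names = list(alignment)
    return {(x, y): jc69_distance(p_distance(alignment[x], alignment[y]))
            for i, x in enumerate(names) for y in names[i + 1:]}


# Hypothetical toy alignment; not data from this study.
toy = {
    "taxon_a": "ACGTACGTAC",
    "taxon_b": "ACGTACGTAT",
    "taxon_c": "ACG-ACGAAT",
}
dist = distance_matrix(toy)
```

The resulting dictionary could then be passed to any NJ implementation; the tree-building step itself is left out of this sketch.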

Adaptive evolutionary analysis
Geneious Prime 2020.1.2 was used to identify the genes shared among the chloroplast genomes of the 32 species and to extract their coding sequences. The sequences were aligned and corrected using the ClustalW (Codons) module in MEGA X. Sequence files in .fasta/.fas format were converted to .PML format with DAMBE 7.2.1 (Xia, 2017).
Using the codeml program in PAML 4.9 (Yang, 2007), adaptive evolutionary analysis of the common protein-coding genes was performed with the ML method under the phylogenetic background of Cupressales.
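Nested codon models fitted by codeml are conventionally compared with a likelihood ratio test (LRT), in which twice the difference in log-likelihood is compared against a chi-square distribution. The Python sketch below illustrates that general calculation only; the log-likelihood values and the degrees of freedom are hypothetical, and the specific model pairs and thresholds used in this study are not assumed.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if stat <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch supports df = 1 or df = 2 only")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the LRT statistic 2 * (lnL_alt - lnL_null) and its p-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a null and an alternative model.
stat, p_value = likelihood_ratio_test(-1000.0, -997.0, df=2)
```

Restricting the chi-square tail to one and two degrees of freedom keeps the sketch dependency-free; a statistics library would be needed for the general case.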

Phylogenetic relationship of cypresses
In order to further determine the phylogenetic relationship of cy-

SSR analysis of cypresses
A total of 721 SSRs were detected across all cypresses, comprising 5 repeat types and 8 repeat motifs (Figure 4c and Appendix S1).
The number of mononucleotide repeats is the largest, totaling 581   Specifically, there is only one located at ycf1 in Ca. nootkatensis, only one located at rpoB in Ca. vietnamensis, and there are 2-3 located at ycf1 and rpoB in the remaining species (Appendix S1).
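As an illustration of how SSRs of 1 to 6 nucleotides can be located in a sequence, the Python sketch below scans for perfect tandem repeats with regular expressions. The minimum repeat counts per motif length are hypothetical placeholders, not the thresholds used in this study, and the function names and toy sequence are likewise assumptions.

```python
import re

# Hypothetical minimum repeat counts per motif length (1-6 nt).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k)
                   for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_n in sorted(min_repeats.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_n - 1))
        pos = 0
        while True:
            match = pattern.search(seq, pos)
            if match is None:
                break
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
                pos = match.end()
            else:
                # e.g. 'AA' matched as a dinucleotide: step forward and retry
                pos = match.start() + 1
    return sorted(hits)


# Toy sequence: a 12-nt poly-A run and six AT repeats.
toy_seq = "GG" + "A" * 12 + "CCGG" + "AT" * 6 + "GGC"
ssrs = find_ssrs(toy_seq)
```

Counting the motifs in the returned list (for example, how many consist only of A and T) gives the kind of summary statistics compared among species.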

Analysis on adaptive evolution of Cupressales
A total of 63 common genes were selected (Table 4).

Callitropsis funebris is more closely related to Cupressus than to Callitropsis
Phylogenetic relationships among Juniperus and the cypresses of Cupressaceae are controversial. Our results showed that the 8 topologies constructed for the 32 species are incongruent. In particular, the phylogenetic relationships inferred from the complete chloroplast genomes were poorly resolved (Figure 2a-d): neither Hesperocyparis nor Chamaecyparis formed a monophyletic branch.
A possible reason is the multiple genome rearrangements (especially inversions [Wu & Chaw, 2016]) that occurred during the evolution of Cupressales. These make the sequences difficult to align, so that most informative sites are lost and the inferred relationships become unstable. In short, genome rearrangement affects sequence alignment analysis. However, in all four topologies, the monophyletic branch of Ca. funebris and Cu. tonkinensis is always sister to Cupressus.
rbcL and matK are used as DNA barcodes (Hollingsworth et al., 2011), and the phylogenetic relationships obtained from them discriminate taxa well. However, they cannot reliably resolve deeper branching relationships because there are too few sequence differences (Tonti-Filippini et al., 2017). As the results show, there is a parallel relationship among related species (Figure 2e-h). Nevertheless, the phylogenetic relationships between genera are very clear and consistent with those constructed from nuclear genes (Lu et al., 2014) or plastid genomes (Gadek et al., 2000;Hao et al., 2016;Qu, Jin, et al., 2017;Qu, Wu, et al., 2017). Also, all trees show that the monophyletic branch of Ca. funebris and Cu. tonkinensis is always sister to Cupressus. It should be noted that, except for the BI tree, the other three trees all show that cypresses are a monophyletic group, but the support is not high (NJ:67, MP:48, ML:59). This also reflects the complexity of cypresses and Juniperus (Zhu et al., 2018).
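Whether a set of taxa forms a monophyletic branch in a given rooted topology can be checked mechanically. The Python sketch below represents a rooted tree as nested tuples and tests whether a group equals the leaf set of some node. The tree and the labels sp1 to sp5 are hypothetical and do not correspond to any topology recovered in this study.

```python
def leaves(tree):
    """Leaf names under a node; a node is either a leaf name or a tuple of subtrees."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every node in a rooted tree."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if some node's descendants are exactly the members of group."""
    return frozenset(group) in set(clades(tree))


# Hypothetical rooted topology: ((sp1, sp2), sp3) and (sp4, sp5).
toy_tree = ((("sp1", "sp2"), "sp3"), ("sp4", "sp5"))
```

For example, `is_monophyletic(toy_tree, {"sp1", "sp2", "sp3"})` holds, whereas `{"sp2", "sp3"}` is not monophyletic because the smallest clade containing both also contains sp1. The check depends on the rooting, so it applies only to rooted trees.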

The complexity of cypresses (especially Callitropsis)
In this research, to further determine the phylogenetic relationships of the cypresses, the complete sequences of 15 chloroplast genomes with the same gene order were selected. All trees strongly support Hesperocyparis and Cupressus each forming a monophyletic branch (Figure 4a), which is consistent with the results of Mao et al. (2010) based on nuclear genome and chloroplast genome sequences. Consistent with Figure 2, the monophyletic branch formed by Ca. funebris and Cu. tonkinensis is sister to Cupressus.
Among them, the most controversial is Callitropsis. Farjon et al. (2002) classified a new cypress species from Vietnam into a new genus, Xanthocyparis, and a particular cypress species (Cupressus nootkatensis) into the new genus. Little (2006) (Mao et al., 2010;Terry et al., 2012). This is consistent with the results of this study, indicating that Callitropsis under the existing classification system is not a monophyletic group. Because of this, recent research supports the division of cypresses into 4 genera, namely Callitropsis, Cupressus, Hesperocyparis, and Xanthocyparis (Zhu et al., 2018).
A common factor that interferes with phylogenetic analysis is hybridization events, especially when using cytoplasmic loci. If recombination occurs between different plastid haplotypes, the impact will be more serious (Wolfe & Randle, 2004). In conifers, hybridization may lead to chloroplast capture, nuclear introgression, and phylogenetic inconsistencies between the nuclear and plastid genomes (Peng & Wang, 2008;Sullivan et al., 2017;Xiang et al., 2015).

Genes (Whittle & Johnston, 2002), and genetic leakage has been observed in many Cupressaceae species and other seed plants (Wagner et al., 1991;Weihe et al., 2009). Therefore, cypresses may have undergone some degree of network evolution. In general, the research has enriched the plastid genomics research of cypresses, and we need to combine more effective data sets in future exploration.
During the evolution from Chamaecyparis to cypresses, the chloroplast genome structure has at least 3 insertions and inversions (Figure 3)

number of inversions between each genera differs by 6 times (Wu & Chaw, 2016). Similarly, Ca. nootkatensis and Ca. vietnamensis should belong to different genera according to the inversion. This seems to be consistent with the views of previous studies (Zhu et al., 2018).
And Ca. funebris should be considered as an independent genus.
Moreover, from the analysis of chloroplast genome (GC content, gene number, and genome structure), the division of Ca. funebris and Chamaecyparis is not disputed.

The distribution pattern of SSRs is beneficial to the phylogenetic relationship study
SSRs are dominated by A/T bases, with very few G/C, and are mainly located in the IGS region. This is consistent with existing reports on chloroplast SSRs (Gui et al., 2020;Li et al., 2017

Adaptation changes occurred during the evolution of Cupressales
During the evolution of Cupressales, positively selected branches were detected in three genes, and positive selection on these three genes occurred in the early and middle stages of Cupressales evolution. Two genes showed an obvious difference in selection pressure between cypresses and other branches, indicating that these two genes have experienced strong negative selection in the cypress branch.
To better understand the evolutionary history of Cupressales, the analysis of its genetic diversity and adaptive evolution is essential. Positively selected genes play an important role in adaptation to various environments. Ten genes with positively selected sites were detected in the Site model: seven genetic system genes (rpo-, rps7, rps11, rpl20), one photosystem gene (psbM), and two other genes (cemA, ccsA). The rpo genes encode RNA polymerase, the most critical enzyme in transcription. The genes rps7, rps11, and rpl20 encode the ribosomal 30S small subunit proteins S7 and S11 and the 50S large subunit protein L20, respectively, all of which are involved in protein synthesis. PsbM is one of the components of the core complex of photosystem II. cemA encodes a chloroplast envelope membrane protein (Sasaki et al., 1993) and is inferred to indirectly influence CO2 uptake in plastids (Rolland et al., 1997). The ccsA gene encodes a protein required for heme attachment to c-type cytochromes (Merchant, 1996).
Understanding the patterns of divergence and adaptation among the members of a specific phylogenetic clade can offer important clues about the forces driving its evolution (Li et al., 2019).
In this study, we detected that only the rpoA gene had undergone positive selection in the Ca. funebris branch, and only one positively selected site was detected. This may be due to plants having multiple strategies to adapt to the environment, and the adaptive modification of other abiotic stress-targeted genes in the nucleus is sufficient to maintain photosynthesis homeostasis. As a result, adaptive evolution of chloroplast-encoded genes is not required (Li et al., 2019).
All these genes might play important roles when founder effects occur in populations, as both changes in selection pressure and genetic drift may result in a rapid shift of these genes to a new coadapted combination. The results provide insight into the adaptive evolution of Cupressales and a basis for further clarifying chloroplast genetic characteristics.

CONCLUSIONS
Morphologically, Ca. funebris and Chamaecyparis are two different groups. During the evolution of Cupressales, multiple genes were detected to have experienced positive selection, suggesting that they may have undergone adaptive changes. One limitation is that the data used to construct the topologies come only from the chloroplast genome, which is far from sufficient. It is necessary to combine nuclear or mitochondrial genome data, which will be more conducive to understanding their evolutionary process.

ACKNOWLEDGMENTS
We thank Ming Zhu for her suggestions on the manuscript and the National Natural Science Foundation for its support (31670200, 31770587, 31872670, and 32071781).

CONFLICT OF INTERESTS
The authors declare no conflict of interest.