Plastomes of nine hornbeams and phylogenetic implications

Abstract Poor phylogenetic resolution and inconsistency of gene trees are major complications when attempting to construct trees of life for various groups of organisms. In this study, we addressed these issues in analyses of the genus Carpinus (hornbeams) of the Betulaceae. We assembled and annotated the chloroplast (cp) genomes (plastomes) of nine hornbeams representing main clades previously distinguished in this genus. All nine plastomes are highly conserved, with four regions, and about 158–160 kb long, including 121–123 genes. Phylogenetic analyses of whole plastome sequences, noncoding sequences, and the well‐aligned coding genes resulted in high resolution of the sampled species in contrast to the failure based on a few cpDNA markers. Phylogenetic relationships in a few clades based only on the coding genes are slightly inconsistent with those based on the noncoding and total plastome datasets. Moreover, these plastome trees are highly incongruent with those based on bi‐parentally inherited internal transcribed spacer (ITS) sequence variations. Such high inconsistencies suggest widespread occurrence of incomplete lineage sorting and hybrid introgression during diversification of these hornbeams.

With the development of high-throughput sequencing technology, it is becoming much cheaper and easier to sequence whole plastomes of plants (Hu et al., 2016) and thus increase the resolution of previously ambiguous phylogenetic relationships based on several cpDNA markers (Hu et al., 2016; Jansen et al., 2007). For example, Zeng et al. (2017) used whole plastome sequences and coding genes to construct phylogenetic trees of Rehmannia, both of which indicated four nearly identical clades and had high levels of phylogenetic resolution. The noncoding regions in a plastome usually have higher variation rates than the coding genes (Hu et al., 2016). However, it is not known whether phylogenetic trees based on noncoding sequences and coding genes of plastomes of the genus Carpinus would be consistent. Thus, in this study, we sequenced plastomes of nine species representing four major clades of the genus identified in a previous study (Yoo & Wen, 2007). We examined structural variations of the plastomes among the species, extracted three sets of sequences (whole plastomes, noncoding sequences, and coding genes) for phylogenetic analyses, and compared the resulting trees with the ITS trees. We specifically addressed the following three questions. Does use of the three plastomic datasets covering more informative sites provide greater phylogenetic resolution of the sampled clades than use of a few cpDNA markers? Are phylogenies based on the three datasets consistent? Are phylogenies based on plastome datasets consistent with those based on nuclear ITS sequences?

| Plant materials, DNA extraction, and ITS sequencing
We chose nine species (i.e., C. fangiana, C. cordata, C. betulus, C. caroliniana, C. fargesiana, C. tschonoskii, C. putoensis, C. tientaiensis, and C. viminea) to represent the four clades based on ITS sequence variation (Yoo & Wen, 2007). According to Kuang and Li (1979), based on characters of bracts and nutlets, C. fangiana and C. cordata belong to section Distegocarpus, and the other seven species belong to section Carpinus. As C. betulus and C. caroliniana are distributed in Europe and North America, respectively, it was difficult for us to obtain fresh leaves of these species from the field. We therefore used a specimen of C. betulus collected in Dagestan in 1987 and a specimen of C. caroliniana collected in the USA in 1996. Fresh leaves of the remaining seven species were collected in the field and dried immediately in the presence of silica gel (Table S1). We could not obtain samples of the three species included in one of the ITS clades identified by Yoo and Wen (2007): C. monbeigiana, C. pubescens, and C. turczaninowii. However, our initial analysis of ITS sequences suggested that C. fargesiana is closely related to C. turczaninowii and thus could be used to represent this ITS clade. We selected Corylus fargesii as an outgroup. We used the modified CTAB method to extract total DNA from the dried leaves (Doyle & Doyle, 1987). ITS sequences of four Carpinus species (C. betulus, C. caroliniana, C. putoensis, and C. tientaiensis) and the outgroup species (Corylus fargesii) were downloaded from GenBank, while we sequenced samples from 5 to 10 individuals of each of the other species to obtain their ITS sequences (Table S2).

| Phylogenetic analyses
We aligned the plastome and ITS sequences of the nine selected Carpinus species and the outgroup using MAFFT v.7 (Katoh, Misawa, Kuma, & Miyata, 2002) and MEGA v.6 (Tamura, Stecher, Peterson, Filipski, & Kumar, 2013). The aligned sequence matrix was then manually examined and corrected. To assess the consistency of phylogenetic reconstructions based on different plastome regions, we extracted three datasets from the final aligned plastome matrix: (a) the whole plastomes, (b) noncoding regions, and (c) protein-coding genes (PCGs) present in all nine Carpinus species and the outgroup. We converted FASTA files to NEXUS or PHYLIP format using ClustalW v.2.1 (Larkin et al., 2007). All alignment positions containing gaps in one or more taxa were removed before phylogenetic analyses.
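The gap-removal step can be illustrated with a minimal Python sketch. It is not the procedure actually run for this study; the function name `strip_gap_columns`, the gap symbols, and the toy taxon names and sequences are all hypothetical and serve only to show what "removing positions with gaps in one or more taxa" means for an alignment.

```python
def strip_gap_columns(alignment, gap_chars="-?"):
    """Drop every alignment column that has a gap in at least one taxon.

    `alignment` maps taxon name -> aligned sequence (all of equal length).
    The gap symbols are an assumption; adjust them to the alignment format.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    # Indices of columns in which no taxon carries a gap symbol.
    keep = [
        i for i, column in enumerate(zip(*seqs))
        if not any(base in gap_chars for base in column)
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Toy alignment (hypothetical data, not sequences from this study).
toy = {"taxon_A": "ATG-CA", "taxon_B": "ATGTCA", "taxon_C": "A-GTCA"}
cleaned = strip_gap_columns(toy)
```

Complete deletion of this kind discards a whole column for a single missing base, which keeps the three datasets strictly comparable at the cost of some informative sites.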
We used Prank v. 6.864b (Loytynoja & Goldman, 2010) to align coding genes. We estimated constant sites, parsimony informative sites, and variable sites of the three plastome datasets and ITS matrix using MEGA v.6 (Tamura et al., 2013). For ITS sequences, we only retained one haplotype if multiple identical haplotypes existed within each species for the phylogenetic analyses.
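The site categories and the haplotype filtering described here can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a reimplementation of MEGA: the alignment is assumed to be gap-free, a site is treated as parsimony informative when at least two character states each occur in at least two sequences, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def classify_sites(seqs):
    """Count constant, variable, and parsimony informative columns."""
    counts = {"constant": 0, "variable": 0, "parsimony_informative": 0}
    for column in zip(*seqs):
        freq = Counter(column)
        if len(freq) == 1:
            counts["constant"] += 1
            continue
        counts["variable"] += 1
        # Informative: at least two states, each present in >= 2 sequences.
        if sum(1 for n in freq.values() if n >= 2) >= 2:
            counts["parsimony_informative"] += 1
    return counts


def collapse_haplotypes(seqs):
    """Keep one copy of each distinct sequence, preserving first-seen order."""
    return list(dict.fromkeys(seqs))


# Toy data (hypothetical): one constant site, two informative sites,
# and one variable site that is a singleton (not informative).
toy = ["AAGA", "AAGA", "ACTA", "ACTC"]
site_counts = classify_sites(toy)
unique_haplotypes = collapse_haplotypes(["ACGT", "ACGT", "ACGA"])
```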
MrBayes v.3.2.4 (Huelsenbeck & Ronquist, 2001) was used to reconstruct phylogenetic trees. We repeated the MrBayes analyses three times for each of the datasets (i.e., the whole plastomes, noncoding regions, coding genes, and ITS sequences); in each case running four chains (one cold and three hot) of 10,000,000 generations, sampling every 1,000 steps with the temperature parameter set to 0.1. We determined convergence by examining trace plots of the log likelihood values for each parameter in Tracer v.1.6 (Rambaut, Xie, & Drummond, 2014). Maximum-likelihood (ML) analyses were performed with RAxML v.8.1.17 (Stamatakis, 2014) using the GTR + G model of evolution and 1,000 bootstrap replicates to assess node support.
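To make the chain settings concrete, the short Python sketch below computes the heat of each chain and the number of samples retained per run. It assumes the incremental heating scheme described in the MrBayes manual, in which chain i (with i = 0 for the cold chain) has heat 1 / (1 + T·i). The function name is hypothetical, and the burn-in fraction is not stated in the text, so it is not modeled here.

```python
def chain_heats(n_chains, temp):
    """Heats under incremental heating: 1 / (1 + temp * i), i = 0 is the cold chain.

    The formula is an assumption taken from the MrBayes manual's description
    of the temperature parameter.
    """
    return [1.0 / (1.0 + temp * i) for i in range(n_chains)]


# Settings reported in the text: four chains, temperature 0.1,
# 10,000,000 generations sampled every 1,000 steps.
heats = chain_heats(n_chains=4, temp=0.1)
samples_per_run = 10_000_000 // 1_000  # excludes the starting state

for index, heat in enumerate(heats):
    label = "cold" if index == 0 else "heated"
    print(f"chain {index} ({label}): heat = {heat:.3f}")
print(f"samples per run: {samples_per_run}")
```

A smaller temperature keeps the heated chains closer to the cold chain, which generally raises the acceptance rate of swaps between chains.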
FIGURE 1 Gene map of the Carpinus betulus plastome, as an example of the nine investigated plastomes. Genes drawn outside of the circle are transcribed clockwise, while those inside the circle are transcribed counterclockwise. The typical small single copy (SSC), large single copy (LSC), and inverted repeats (IRa, IRb) are indicated (Tables 1, S3).
Most genes occurred in single copy, including 75–77 unique protein-coding genes and 18 unique tRNA genes, but there were two copies of all ribosomal RNA genes. Thirteen of the genes were duplicated in the IR regions: four rRNA genes (4.5S, 5S, 16S, and 23S rRNA), four PCGs (rpl2, ycf2, ndhB, rps7), and five tRNA genes (trnI-CAT, trnL-CAA, trnV-GAC, trnR-ACG, and trnN-GTT). There were also three copies of one gene: trnN-GTT. The rps12 gene was a unique trans-spliced gene with three exons. Of the annotated genes, 10 contained a single intron (e.g., atpF CDS, rpoC1 CDS, and trnN-GTT tRNA), and four protein-coding genes had two introns (clpP, ycf3, rpl2, and rps12). The rps19 gene was located at the LSC/IRb boundary. Two copies of the ycf1 gene were located at the IRb/SSC and SSC/IRa junctions.
In plastomes of each of the nine species, the overall GC content was about 36.5%, and 55% of the plastomes were coding regions (Table 1). All plastomes showed similar features in terms of gene content, gene order, introns, intergenic spacers, and AT content.
However, some coding genes were pseudogenized or lost.
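For readers who want to reproduce the GC figure for a given plastome sequence, a minimal Python sketch follows. The function name and the toy sequence are hypothetical, and the choice to exclude ambiguous bases (such as N) from the denominator is an assumption rather than the convention used to compile Table 1.

```python
def gc_content(sequence):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy sequence (hypothetical, not a plastome fragment from this study).
toy_gc = gc_content("atgcNNat")
toy_at = 100.0 - toy_gc
```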
The ML and Bayesian analyses of each chloroplast dataset resulted in similar topologies, but there were discrepancies between those obtained using the plastome and ITS datasets. Although the ITS phylogenetic tree also showed high resolution, the topology was mostly incongruent with the phylogenetic trees derived from the plastome datasets (Figure 3b). In the ITS tree, C. betulus, C. tientaiensis, C. putoensis, and C. tschonoskii clustered as one clade, while C. caroliniana grouped with C. viminea, but with low support in both analyses. This pattern of phylogenetic relationships among these seven species is completely incongruent with the patterns in the plastome phylogenetic trees (Figure 3b). Positions of the remaining two species were congruent with the phylogenetic trees based on the whole plastome and noncoding datasets, but not the coding genes tree.

| DISCUSSION
Our comparative analyses of plastomes of nine species representing clades identified by Yoo and Wen (2007) showed that these plastomes are highly conserved (Hu et al., 2016; Zhang, Ma, & Li, 2011). The conserved and well-aligned plastomes across different species therefore facilitate further phylogenetic analyses and comparisons based on the whole plastomes, their coding regions, and noncoding regions.
Previous studies of the genus Carpinus or related genera based on a few cpDNA markers have consistently failed to resolve phylogenetic relationships of the major clades (Lu et al., 2016; Yoo & Wen, 2007). In contrast, analysis of the whole plastome datasets, which contain more informative sites, yielded well-supported clades, and all interspecific relationships were well resolved except for those of C. betulus (Figure 3). It should be noted that we obtained identical topological relationships using the whole plastome and noncoding datasets. However, results based solely on the coding genes suggested different phylogenetic positions for C. betulus and C. caroliniana (Figure 3), presumably because the whole plastome and noncoding datasets provided more detailed signals for these two species (Hu et al., 2016; Zeng et al., 2017). These findings suggest that it is essential to assess the consistency of phylogenetic relationships based on whole plastomes and both their coding and noncoding regions, as well as their correspondence to phylogenies derived from analyses of nuclear genes or genomes.
The ITS alignment was much shorter than the coding genes in the plastomes (623 and 68,058 bp, respectively), but included a similar number of parsimony informative sites (53 and 66, respectively). This difference in mutation rates may influence estimates of interspecific relationships obtained from analyzing these sets of sequences. The relatively rapid mutation and lineage sorting of the ITS sequence may be helpful for discriminating interspecific relationships for genera such as Carpinus (e.g., Lu et al., 2017; Wang, Yu, & Liu, 2011), but in other genera, the ITS sequences may have lower discriminatory power than the chloroplast genes (Hu et al., 2015; Ren et al., 2015). It should be noted that both a single nuclear gene (e.g., ITS) and the plastome (which ultimately represents a single locus) have limited power for resolving a "true" species tree. Multiple, independent nuclear loci or whole genomes would be needed to identify phylogenetic relationships reflecting a "true" species tree, especially when reticulate evolution may have occurred (Hughes, Eastwood, & Bailey, 2006).
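The contrast between the two markers is easier to see as a density of parsimony informative sites. The Python sketch below uses only the four numbers reported in the preceding paragraph; the variable and function names are hypothetical, and the per-kilobase normalization is an illustrative choice rather than a statistic reported in this study.

```python
def informative_per_kb(informative_sites, length_bp):
    """Parsimony informative sites per 1,000 aligned base pairs."""
    return 1000.0 * informative_sites / length_bp


# Values taken from the text: ITS (53 sites in 623 bp) and
# plastome coding genes (66 sites in 68,058 bp).
its_density = informative_per_kb(53, 623)
coding_density = informative_per_kb(66, 68_058)
fold_difference = its_density / coding_density

print(f"ITS: {its_density:.1f} informative sites per kb")
print(f"Plastome coding genes: {coding_density:.2f} informative sites per kb")
print(f"Fold difference: {fold_difference:.0f}x")
```

On these figures the ITS region carries roughly 85 informative sites per kb against about 1 per kb for the plastome coding genes, a difference of nearly two orders of magnitude.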
The most surprising finding in this study is that the well-resolved phylogenetic relationships based on plastomes substantially differ from those inferred from the nuclear ITS sequences. In the ITS tree, interspecific relationships among all the species except the two members of the basal subclade, C. cordata and C. fangiana, are inconsistent with those inferred from the three plastome datasets. Such discordance of gene trees derived from nuclear and organelle markers is common and may be due to two nonexclusive factors (Stenz, Larget, Baum, & Ane, 2015; Suh, Smeds, & Ellegren, 2015; Zwickl, Stein, Wing, Ware, & Sanderson, 2014). First, hybridization and introgression are very common in numerous plants (Mallet, 2007), especially wind- Furthermore, C. putoensis, a 14-ploidy species (Meng, He, Li, & Xu, 2004), is clustered with C. viminea in the plastome trees (Figure 3), implying that C. viminea or a closely related species was the maternal progenitor during the formation of C. putoensis. Its paternal progenitor may be closely related to C. tschonoskii according to the interspecific relationships in the ITS tree, but further studies involving more samples and genetic data are needed to better understand the reticulate evolution of C. putoensis.
The other factor that could lead to inconsistency between gene trees derived from nuclear and organelle markers is incomplete lineage sorting (ILS) through retention of ancestral polymorphism in different species or populations. This may also lead to inconsistent phylogenies based on different markers with contrasting inheritance (Sousa & Hey, 2013; Suh et al., 2015). When the same ancestral allele is sampled from two distantly related species without complete lineage sorting, the resulting phylogeny will be inconsistent with that based on genes or other DNA sequences following speciation. It is likely that both ILS and hybrid introgression were common during the diversification of the hornbeams. In the future, population-level genetic evidence from the nuclear genome will be needed to elucidate the precise contributions of these two factors to the inconsistent phylogenies observed here.
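How readily ILS alone can produce discordant gene trees can be illustrated with the standard three-taxon result from neutral coalescent theory, in which a gene tree disagrees with the species tree with probability (2/3)·exp(−t), where t is the length of the internal branch in coalescent units. The Python sketch below is illustrative only: the function name is hypothetical, the branch lengths are arbitrary values rather than estimates for Carpinus, and the model assumes no gene flow, so it says nothing about the contribution of hybridization.

```python
import math


def discordance_probability(internal_branch):
    """P(gene tree differs from species tree) for three taxa under ILS alone.

    `internal_branch` is the internal branch length in coalescent units.
    Assumes a neutral coalescent with no gene flow.
    """
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    return (2.0 / 3.0) * math.exp(-internal_branch)


# Arbitrary illustrative branch lengths (not estimated from data).
for t in (0.1, 0.5, 1.0, 3.0):
    print(f"t = {t:>3}: P(discordant) = {discordance_probability(t):.3f}")
```

Short internal branches, as expected after rapid diversification, push the probability of discordance toward its upper limit of 2/3.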

ACKNOWLEDGMENTS
We would like to thank Dr. Zhiqiang Lu for his great help in collecting samples in the field and in generating and analyzing data. The manuscript benefited greatly from the comments received from International Collaboration 111 Projects of China.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

AUTHOR CONTRIBUTIONS
Y.L. and G.R. planned and designed the research. Y.L. carried out the laboratory work and performed the molecular analysis. Y.L. and G.R. wrote the manuscript with the help of Y.Y., Y.L., and X.D.

DATA ACCESSIBILITY
The GenBank accessions of the whole plastomes of the nine species are listed in Table 1, and GenBank accessions of the newly generated ITS sequences can be found in Table S2.