The complete chloroplast genome of Myriophyllum spicatum reveals a 4‐kb inversion and new insights regarding plastome evolution in Haloragaceae

Abstract Myriophyllum, among the most species‐rich genera of aquatic angiosperms with ca. 68 species, is an extensively distributed hydrophyte lineage in the cosmopolitan family Haloragaceae. The chloroplast (cp) genome is useful in the study of genetic evolution, phylogenetic analysis, and molecular dating of controversial taxa. Here, we sequenced and assembled the whole chloroplast genome of Myriophyllum spicatum L. and compared it to other species in the order Saxifragales. The complete chloroplast genome sequence of M. spicatum is 158,858 bp long and displays a quadripartite structure with two inverted repeats (IR) separating the large single copy (LSC) region from the small single copy (SSC) region. Based on sequence identification and the phylogenetic analysis, a 4‐kb phylogenetically informative inversion between trnE‐trnC in Myriophyllum was determined, and we have placed this inversion on a lineage specific to Myriophyllum and its close relatives. The divergence time estimation suggested that the trnE‐trnC inversion possibly occurred between the upper Cretaceous (72.54 MYA) and middle Eocene (47.28 MYA) before the divergence of Myriophyllum from its most recent common ancestor. The unique 4‐kb inversion might be caused by an occurrence of nonrandom recombination associated with climate changes around the K‐Pg boundary, making it interesting for future evolutionary investigations.

The genetic relationships also do not readily facilitate identification, as previously published molecular phylogenies are lacking (Moody & Les, 2010). Commonly used markers for determining phylogenetic relationships include the nuclear-encoded internal transcribed spacer (nrITS) and numerous chloroplast DNA markers (Moody & Les, 2007, 2010; Thum, Zuellig, Johnson, Moody, & Vossbrinck, 2011). Therefore, it is necessary to select more appropriate phylogenetically informative regions.
The sequencing of whole chloroplast genomes (cp genomes), which are haploid and maternally inherited, has the potential to significantly advance our ability to resolve evolutionary relationships in complex plant lineages, such as Myriophyllum (Doorduin et al., 2011; Philippe & Roure, 2011). The plant cp genome is generally conserved in content and structure. It is usually composed of two copies of inverted repeats (IR) that separate a large single copy region (LSC) from a small single copy region (SSC). Highly conserved genes (100-120) have been retained in the cp genome, including those for photosynthesis, self-reproduction, transcription of chloroplast expression-related genes, and some unknown genes (Wicke, Schneeweiss, Depamphilis, Müller, & Quandt, 2011). Despite being much more conserved than the nuclear and mitochondrial genomes, the cp genome still varies in size, contraction and expansion of IRs, and structure (Daniell, Lin, Yu, & Chang, 2016). Moreover, many mutation events have been detected in the cp genome, including indels, substitutions, and inversions (Chumley et al., 2006). These evolutionary hotspots can provide useful information to elucidate the phylogenetic relationships of taxonomically unresolved plant taxa. Kim, Choi, and Jansen (2005) confirmed the Barnadesioideae as the most basal lineage in the Asteraceae by using a 22-kb DNA inversion. The close relationship between the Poaceae and Joinvilleaceae was clarified by treating a nested set of three DNA inversions as a phylogenetic character (Doyle, Davis, Soreng, Garvin, & Anderson, 1992). Some variations in the cp genome, such as gene loss and transfer, have been used to determine the evolutionary history of some plant species. For example, the extreme loss of ndh genes observed in Najas flexilis was used to illustrate a modified character associated with photosynthetic efficiency (Peredo, King, & Les, 2013).
In this study, we sequenced the complete cp genome of M. spicatum (Figure 1). The cp genome was then compared with previously published cp genomes from related species, allowing the identification of a noteworthy inversion. Phylogenetic analyses were then performed on Saxifragales spp. to determine the point at which the inversion in the cp genome of Myriophyllum occurred. Finally, we evaluated the sequence divergence between Myriophyllum and other clades in Haloragaceae. We also investigated potentially useful plastid regions for future molecular phylogenetic analyses in Saxifragales by examining sequence variation across different marker types (exons, introns, and intergenic regions). These data provide insight into the evolutionary history of this cosmopolitan family and, in the future, will facilitate the identification of Myriophyllum spp.

| Plant materials and DNA extraction
The taxa sampled in this study are shown in Table 1. All specimens were deposited at the Wuhan Botanical Garden, Chinese Academy of Sciences, China. Total DNA of all samples was isolated from fresh leaves according to the mCTAB method (Li, Wang, Yu, & Wang, 2013).

| Chloroplast genome sequencing, mapping, and annotation for M. spicatum
The whole cp genome of M. spicatum was sequenced. The DNA sequencing library of M. spicatum was prepared following the method described by Dong, Xu, Cheng, Lin, and Zhou (2013), and fragments were amplified using universal primers. Specific primers were designed for regions, such as poly-A tails, that were insufficiently amplified using the universal primers. The inverted repeat regions (IRs) of the cpDNA were not amplified separately; instead, primers were designed to amplify the regions spanning the junctions of LSC/IRA, LSC/IRB, SSC/IRA, and SSC/IRB. Using these primers, we covered the entire cp genome of M. spicatum with PCR products ranging in size from 500 bp to 5 kb.
FIGURE 1 Myriophyllum spicatum L. (Haloragaceae, Myriophyllum), a perennial submerged aquatic plant widely distributed in Europe, Asia, and North Africa

The overlapping regions of each pair of adjacent PCR fragments exceeded 150 bp. The standard PCR amplification reactions were performed at 94°C for 4 min, followed by 35 cycles of 30 s denaturation at 94°C, 30 s annealing at 55°C, and 1.5 min extension at 72°C, and a final extension at 72°C for 10 min. PCR products were electrophoresed on a 1.0% agarose gel and purified with a gel extraction kit (Omega Bio-Tek). The amplified DNA fragments were then sent to Majorbio Bio-Pharm Technology Co. Ltd. (Shanghai, China) for Sanger sequencing in both the forward and reverse directions according to their standard protocols on an ABI 3730xl DNA Analyzer.
All fragments were sequenced 2-10 times (6-fold coverage of the M. spicatum cp genome on average). The chloroplast DNA sequences were manually assembled using the program Sequencher v4.1.4 (Gene Codes Corporation, USA). Since automated assembly methods cannot distinguish the two IRs, we input the reads as two groups and obtained two large contigs, with each contig including one IR and its adjacent partial large and small single copy (LSC and SSC) regions. The two large contigs were then manually assembled into the complete circular genome sequence.
The cp genome of M. spicatum was annotated using the online program Dual Organellar Genome Annotator (DOGMA; Wyman, Jansen, & Boore, 2004). All tRNA genes were further verified by the corresponding structures predicted by tRNAscan-SE 1.3.1 (Schattner, Brooks, & Lowe, 2005). The graphical map of the circular plastome was drawn by GenomeVx (Conant & Wolfe, 2008).

| Comparative genomic analysis
To determine structural variation of the cp genome, the newly se-

| Identification of the inversion by PCR screening and sequencing in Myriophyllum and close relative Gonocarpus
To determine the origin of the inversion observed in M.  M. dicoccum, M. heterophyllum, M. lophatum, M. oguraense, M. quitense, M. sibiricum, M. tenellum, M. ussuriense, M. variifolium, M. verrucosum, M. verticillatum), and Gonocarpus (G. micranthus; listed in Table 1). The standard PCR amplification reactions were performed at 94°C for 2 min, followed by 35 cycles of 1 min denaturation at 94°C, 1 min annealing at 55°C, and 2 min extension at 72°C, and a final extension at 72°C for 7 min. PCR-amplified DNA was purified using the QIAquick PCR purification kit and then checked on 2% agarose gels after staining with ethidium bromide. The purified products were sequenced by Sangon Biotech (Shanghai, China). Sequence assemblies and alignments followed the abovementioned methods.
In total, chains were run for 5,000,000 generations, with trees sampled every 1,000 generations. The first 25% of sampled generations were discarded as burn-in, and the remaining trees were used to calculate majority-rule consensus trees and posterior probabilities for nodes. The Akaike information criterion (AIC), as implemented in Modeltest v3.7 (Posada & Crandall, 1998), was used to determine the most appropriate model of nucleotide evolution, supporting the use of GTR + I + G.
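The sampling scheme above implies a fixed number of trees entering the consensus. The following is a minimal bookkeeping sketch, not part of the original analysis; the function name `retained_samples` is hypothetical, and the count ignores the initial (generation 0) state that some MCMC programs also record.

```python
def retained_samples(generations, sample_every, burnin_fraction):
    """Return (sampled, discarded, retained) tree counts for one MCMC run.

    Assumption: one tree is stored every `sample_every` generations and the
    starting state is not counted.
    """
    n_samples = generations // sample_every
    n_burnin = int(n_samples * burnin_fraction)
    return n_samples, n_burnin, n_samples - n_burnin


# Values taken from the text: 5,000,000 generations, sampling every 1,000,
# first 25% discarded as burn-in.
sampled, discarded, retained = retained_samples(5_000_000, 1_000, 0.25)
print(sampled, discarded, retained)
```

Under these assumptions, 5,000 trees are sampled per run, 1,250 are discarded, and 3,750 contribute to the majority-rule consensus.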

| Molecular dating
Molecular dating analyses were run in the BEAST package v1.7.5 (Drummond & Rambaut, 2007) using the combined ITS, matK, trnK, rpoB-trnE, and trnC-trnT matrix. The analysis followed the dating strategies in Chen et al. (2014). The GTR + I + G model was selected as the best fit for the data by Mrmodeltest v2.3 (Nylander, 2004).

| General characteristics of the M. spicatum cp genome
The genome contains 113 unique genes, including 30 tRNA genes, four rRNA genes, and 79 protein-coding genes (Table 2). Genes involved in photosynthesis and in transcription and translation were the two dominant families. There were six genes coding for the subunits of ATP synthase and 11 genes associated with the subunits of NADH dehydrogenase.
FIGURE 2 The whole assembly of the chloroplast genome of M. spicatum. The inverted repeats (IRa, IRb) are indicated by thick black lines on the inner circle, which separate the genome into the large (LSC) and small (SSC) single copy regions. The genes drawn outside of the circle are transcribed counterclockwise, while those inside are transcribed clockwise. Gene boxes are colored by functional group as shown in the key. The red arrows denote the location of the 4-kb inversion

The genome consists of 58% coding regions and 42% noncoding regions, including both intergenic spacers and introns. A total of 26,316 codons represent the coding capacity of 79 protein-coding genes in the genome. The frequency of codon usage was calculated based on the sequences of protein-coding genes and tRNA genes, which are summarized in Table 3. Codon usage frequency demonstrated that leucine is the most common amino acid with 2,812 codons (10.69%), while cysteine is the least common with 299 codons (1.14%).
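The amino acid frequencies reported above are simple proportions of codon counts. The sketch below shows one way such a tally could be computed from a list of in-frame coding sequences; it is illustrative only and is not the authors' pipeline. The function name `amino_acid_usage` and the toy sequence are hypothetical, the standard genetic code is assumed, and stop codons are excluded from the total (the source does not say whether they were counted).

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_usage(cds_list):
    """Map each amino acid to (codon count, percentage of all sense codons)."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        # Walk the sequence codon by codon, ignoring any trailing partial codon.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            aa = CODON_TABLE.get(cds[i:i + 3])  # None for ambiguous bases
            if aa is not None and aa != "*":
                counts[aa] += 1
    total = sum(counts.values())
    return {aa: (n, 100.0 * n / total) for aa, n in counts.items()}


# Toy CDS (hypothetical): ATG CTT TTA TGT TAA -> M, L, L, C, stop.
usage = amino_acid_usage(["ATGCTTTTATGTTAA"])
print(usage["L"])

# The percentages in the text follow from the reported counts.
print(round(100 * 2812 / 26316, 2), round(100 * 299 / 26316, 2))
```

The last line reproduces the reported 10.69% (leucine) and 1.14% (cysteine) from the counts of 2,812 and 299 out of 26,316 codons.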

| Repeat analysis
A total of 38 repeats were found, including 21 direct (forward) repeats, 15 inverted (palindrome) repeats, one reverse repeat, and one complement repeat (Table S1). The longest repeat is a 51-bp inverted repeat between rbcL and accD. Most of the repeats are distributed within the intergenic spacer regions, the intron sequences, and ycf1 and ycf2. Chloroplast microsatellites (cpSSRs) are potentially useful markers for the detection of polymorphisms (Provan, Powell, & Hollingsworth, 2001); therefore, the distribution of SSRs was also analyzed, and 260 SSRs were identified in total. Among the identi-  (Table S2).
The locations of repeat sequences and SSRs are shown in Figure 3.
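As an illustration of how SSRs of this kind can be located, the following sketch scans a sequence for perfect mono-, di-, and trinucleotide repeats with regular expressions. It is not the tool used in the study. The function name `find_ssrs` and the minimum repeat thresholds (8, 4, and 3 units) are assumptions chosen for the example, since the detection criteria are not given in this section.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, unit, copies) tuples for perfect SSRs in `seq`.

    `min_repeats` maps unit length to the minimum number of tandem copies.
    The default thresholds are hypothetical, not those used in the paper.
    """
    if min_repeats is None:
        min_repeats = {1: 8, 2: 4, 3: 3}
    seq = seq.upper()
    hits = []
    for unit_len, n in sorted(min_repeats.items()):
        # One unit followed by at least n-1 further copies of the same unit.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves tandem copies of a shorter motif
            # (e.g. "AA" or "AAA"), which are already reported at that length.
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence (hypothetical): a poly-A run of 10 and an (AT)5 repeat.
toy = "GC" + "A" * 10 + "GTC" + "AT" * 5 + "GG"
print(find_ssrs(toy))
```

Positions are 0-based. Real SSR surveys usually also handle longer motifs and compound repeats, which this sketch omits.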

| Comparison of genome organization in Saxifragales
To understand the structural characteristics of the cp genomes of M. spicatum, L. formosana, P. obovata, P. chinense, and S. sarmentosum and, more broadly, of Saxifragales, we compared the size, gene content, and organization of the cp genomes.
The characteristics of the genomes from the abovementioned species are listed in Table S3. The comparison (Figure 4) showed that the five species are generally conserved but contain some highly variable regions, with ycf1, rps16, ndhA, and accD being the most divergent coding genes.
The exact borders between the IR regions and the two single copy regions (LSC and SSC) were also compared to investigate the contraction or expansion of the IR regions (Figure 5). We found that the IR/SSC boundary regions varied slightly. The genes marking the beginning and end of the IR were only partially duplicated.
Specifically, 2-110 bp of rps19 (except in P. chinense, in which rps19 was located entirely in the LSC) and 1,065-1,164 bp of ycf1 were duplicated. The rps19 [...] (Zhao et al., 2018). Only the ycf1 pseudogene was detected across the SSC/IRa border in the five Saxifragales species, which might be caused by a duplication of the normally single copy gene ycf1.

TABLE 2 Genes present in the Myriophyllum spicatum chloroplast genome
Category    Group of genes    Genes
Photosynthesis-related genes (47)    Rubisco (1)

| Occurrence of the unique lineage-specific inversion
No stop codons were detected in the coding sequence of ycf1; thus, we hypothesize that the expansion of the IR was caused by a duplication of ycf1, which occurred in the common ancestor of these species in Saxifragales.  were detected in intergenic regions yet are also highly variable in coding regions such as ycf1, rps16, ndhA, and accD. These highly variable regions may be useful as specific DNA barcodes for species-level identification, as well as provide genetic markers for resolving relationships among Saxifragales. Over 260 SSRs were identified in this study, which could be candidates for future inferences on population genetics and help to trace the origin of invasive populations (Provan et al., 2001). Moreover, these SSR markers could be used for genetic diversity studies on closely related species in Haloragaceae.
Normally, plastomic rearrangements in flowering plants are rare. Most photosynthetic angiosperms have a highly conserved plastome organization, except for a small number of groups among major lineages, especially the Campanulaceae, Fabaceae, and Geraniaceae, which exhibit remarkable and extensive rearrangements (Jansen & Ruhlman, 2012; Mower & Vickrey, 2018). In this article, a 4-kb inversion was identified in all Myriophyllum species sampled; it therefore likely provides an informative marker and an additional synapomorphy supporting the monophyly of Myriophyllum.
Moreover, the activity of repetitive elements has often been considered to be associated with plastome rearrangement and recombination (Lu et al., 2017; Weng, Blazier, Govindu, & Jansen, 2013). Regarding the trnE-trnC inversion in Myriophyllum, a flip-flop recombination event might have contributed to its occurrence (Figure 8a). This detectable rearrangement of sequences has occurred during the evolution of Myriophyllum, possibly playing an important role in the maintenance of the structural stability of the chloroplast genome (Palmer & Thompson, 1982; Wolfe, Li, & Sharp, 1987).
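At the sequence level, an inversion replaces a segment with its reverse complement. The toy sketch below models this, assuming for illustration that the inverted segment is flanked by a pair of short inverted repeats, as would be expected for recombination between such repeats. The sequences, coordinates, and function names are hypothetical and do not represent the actual trnE-trnC region.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(genome, start, end):
    """Return `genome` with the half-open interval [start, end) inverted."""
    return genome[:start] + reverse_complement(genome[start:end]) + genome[end:]


# Toy molecule: two flanking bases, a left repeat arm, a unique segment,
# the right arm (reverse complement of the left arm), two flanking bases.
left_arm = "GATTC"
segment = "AAACCC"
right_arm = reverse_complement(left_arm)
genome = "TT" + left_arm + segment + right_arm + "GG"

inverted = invert_segment(genome, 7, 13)  # invert only the unique segment
print(genome)
print(inverted)
```

In this toy case, inverting the segment together with its repeat arms (`invert_segment(genome, 2, 18)`) gives the same molecule as inverting the segment alone, because each arm is the reverse complement of the other. Applying the inversion a second time restores the original orientation, which is the "flip-flop" behavior.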
The 4-kb inversion was detected in G. micranthus, a species in a genus closely related to Myriophyllum (Chen et al., 2014).
Our results are congruent with previous phylogenetic analyses among families of Saxifragales (Jian et al., 2008; Moody & Les, 2010; Dong, Xu, Cheng, Lin, et al., 2013). The 4-kb inversion was identified in all of the in- million years ago based on dated genome data (Vanneste, Baele, Maere, & Yves, 2014). Thus, we speculate that the 4-kb inversion might have been caused by nonrandom recombination associated with climate changes around the K-Pg boundary (Kaiho et al., 2016; Vellekoop et al., 2015). Additional whole chloroplast genome sequences from species in Haloragaceae should be obtained to construct larger phylogenetic trees and further test this hypothesis. In addition, more functional investigations are needed to provide a more comprehensive understanding of the divergence history and the influence of climate change on the novel 4-kb inversion.

This study was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB31010104) and the National Natural Science Foundation of China (31500457, 31870206 and 31670369).

CONFLICT OF INTEREST
The authors declare no conflict of interest.

AUTHOR CONTRIBUTIONS
TW and FL designed the study and revised the manuscript. FL and YYL conducted the sequence analyses and drafted the manuscript. YL and RWM performed the experiments and analyzed the data. XL and TFL collected the samples. All authors read and approved the final manuscript.

DATA AVAILABILITY STATEMENT
The complete chloroplast genome of Myriophyllum spicatum has been deposited in GenBank (Accession Number: MK250869).