Loss of a chloroplast encoded function could influence species range in kelp

Abstract Kelps are important providers and constituents of marine ecological niches, the coastal kelp forests. Kelp species have differing distribution ranges, but mainly thrive in temperate and arctic regions. Although the principal factors determining biogeographic distribution ranges are known, genomics could provide additional answers to this question. We sequenced DNA from two Laminaria species with contrasting distribution ranges, Laminaria digitata and Laminaria solidungula. Laminaria digitata is found in the Northern Atlantic with a southern boundary in Brittany (France) or Massachusetts (USA) and a northern boundary in the Arctic, whereas L. solidungula is endemic to the Arctic only. From the raw reads of DNA, we reconstructed both chloroplast genomes and annotated them. A concatenated data set of all available brown algae chloroplast sequences was used for the calculation of a robust phylogeny, and sequence variations were analyzed. The two Laminaria chloroplast genomes are collinear to previously analyzed kelp chloroplast genomes with important exceptions. Rearrangements at the inverted repeat regions led to the pseudogenization of ycf37 in L. solidungula, a gene possibly required under high light conditions. This defunct gene might be one of the reasons why the habitat range of L. solidungula is restricted to lowlight sublittoral sites in the Arctic. The inheritance pattern of single nucleotide polymorphisms suggests incomplete lineage sorting of chloroplast genomes in kelp species. Our analysis of kelp chloroplast genomes shows that not only evolutionary information could be gleaned from sequence data. Concomitantly, those sequences can also tell us something about the ecological conditions which are required for species well‐being.


| INTRODUC TI ON
Brown algae (Ochrophyta) have complex chloroplasts, that is, these organelles are surrounded by four membranes. Evolutionary, this has been explained by the occurrence of a secondary endosymbiosis, whereby a red alga was engulfed by a eukaryote host (Yoon, Hackett, Pinto, & Bhattacharya, 2002). Over time, the red alga was integrated into the host metabolism, thereby losing its complete nuclear genome. Kelps (Laminariales, Phaeophyceae) are large multicellular, highly differentiated marine brown algae.
They can form huge coastal forests, which provide a habitat for microbes, animals, and other algae (Steneck et al., 2002). Thus, they construct an ecological niche dependent on their presence.
Kelp forests are thriving along all temperate to polar rocky coastlines, but some forests also occur in deeper depth below the thermocline in tropical regions (Graham, Kinlan, Druehl, Garske, & Banks, 2007). Habitat ranges of different kelp species can overlap so that they can be present in a common forest. Laminaria species are found in the northern and southern Atlantic and northern Pacific but are not present in the southern Pacific and Antarctica (Lüning, 1990). Laminaria digitata thrives in the Northern Atlantic with a southern distribution boundary in Brittany (France) or Massachusetts (USA) and a northern limit in the Arctic whereas L. solidungula is restricted to the Arctic Ocean and often thrives at lower depths. Here, we wanted to analyze whether these contrasting distribution patterns might also find a reflection in their genomes.
Only a handful of brown algal nuclear genomes have so far been deciphered, namely Ectocarpus siliculosus (Cock et al., 2010), Saccharina japonica (Ye et al., 2015), and Cladosiphon okamuranus (Nishitsuji, Arimoto, & Iwai, 2016), S. japonica being the sole member of kelp species. Thus, it is currently not possible to comparatively examine complete nuclear genomes of kelp species for evolutionary changes and adaptations.
Chloroplast genomes generally have a quadripartite structure with a small and a large single copy region separated by inverted or direct repeats harboring at least the small and large ribosomal RNA subunits but exceptions are also known (Glöckner, Rosenthal, & Valentin, 2000).
The first completely deciphered and annotated chloroplast genome of a kelp species was that of Saccharina japonica (Wang et al., 2013). Two further kelp chloroplast genomes have also been published (Zhang, Wang, Liu, Wang, Chi, et al., 2015a;Zhang, Wang, Liu, Wang, Wang, et al., 2015b), resulting in only three available kelp chloroplast genomes so far. All the three kelp chloroplast genomes are conventionally quadripartite with inverted repeats restricted to the rRNA genes together with a few tRNA genes. Additionally, all three chloroplast genomes were collinear. We here present chloroplast genome data on two Laminaria species (L. digitata and L. solidungula) and compare all five chloroplast genomes. Our analysis reveals general trends of chloroplast genome evolution within kelp species.

| Algal material
Clonal male gametophytes of Laminaria digitata (AWI culture number 3157), originally isolated from Helgoland (North Sea), were cultivated at 8-15°C in sterilized filtered sea water under red light to avoid differentiation and to generate enough vegetative biomass for DNA extraction. Before DNA extraction, the gametophytes were washed three to six times with sterilized filtered seawater every second day to reduce the amount of bacteria in the culture.
Further isolates for the analysis of population differences came from Connecticut, USA (AWI culture number 3380), and Halifax, Canada (AWI culture number 3259), and non clonal vegetative gametophyte material (mixture of both sexes) which had been derived from spores collected in September 2018 at Roscoff and Quiberon (France) were used for DNA extraction and PCR and sequencing of chloroplast regions.
Sporophytes of L. solidungula were initiated from gametophytes (AWI culture number 3130, originally isolated from Kongsfjorden, Spitsbergen). After fertilization of the gametophytes in short day lengths (5:19 hr LD) at 0°C, they were transferred into 16:8 hr LD conditions, 5°C and a photon fluence rate of 40 µmol m -2 s -1 for further cultivation. Resulting sporophytes were sampled for DNA extraction when they had a size of approx. 5 cm. Gametophytes were sent to Cologne under cooled conditions within a working day before extraction. The sporophytes were cleaned with tissue paper and shock-frozen in liquid N 2 before freeze-drying and extraction.

| DNA extraction
After grinding, the tissue under liquid nitrogen DNA of L. digitata gametophytes was extracted from freshly drained material according to Doyle and Doyle modified cetyl trimethyl ammonium bromide method (CTAB; Doyle & Doyle 1990). The material from the freezedried sample of L. solidungula was submitted to the same extraction method.

| Sequencing, assembly, and chloroplast sequence extraction
Total DNA (5 µg) was converted to an Illumina sequencing library and analyzed on an Illumina Hiseq machine. Trimming and further processing were done with the Illumina software suit. Assembly was performed with abyss-pe (Simpson et al., 2009) using kmers 40, 45, and 55. These assemblies were searched for similarity to the S. japonica chloroplast nucleotide sequence (JQ405663). Resulting contigs were used to reconstruct the complete chloroplast genomes by closing gaps with Gapfiller (Boetzer & Pirovano 2012).
PCR on L. digitata isolates was done with forward primer TTCATCAATAAATAAAAGACCACCCATTGC at position 75,636 to 75,665 and reverse primer TTCATCAATAAATAAAAGACCACCCA TTGC at position 76,426 to 76,455. The resulting PCR products were ligated into pGem-T Easy vectors. To be able to discern between polymerase errors and true SNPs, three clones from each ligation were sequenced.

| Phylogenetic analysis
The chloroplast coding sequences of both Laminaria species were identified by blasting the CDS from S. japonica against the respective chloroplast sequences. Nucleotide sequences of the coding sequences were extracted and aligned gene-wise using muscle (Edgar, 2004). The single alignments were inspected by eye and corrected, if needed. Concatenation of all single alignments was done with SCaFoS (Roure, Rodriguez-Ezpeleta, & Philippe, 2007). The concatenated data set was used in a maximum-likelihood approach for phylogenetic reconstruction with a discrete gamma distribution and with 1,000 bootstrap replications in MEGA6 (Tamura, Stecher, Peterson, Filipski, & Kumar, 2013).

| Chloroplast genome analysis
Collinearity of the assembled kelp chloroplast genomes was tested with the nucmer tool of mummer (Kurtz, Phillippy, & Delcher, 2004), and a global alignment was done with MAFFT (Katoh & Standley 2013). The Laminaria chloroplast genomes were annotated using the available kelp chloroplast annotation as a BLAST query. Additionally, we detected tRNAs with the help of tRNA-scan-SE (Lowe & Eddy 1997) by searching all five kelp genomes using the organelle tRNA detection method. SNPs and small insertions/deletions can best be defined using software developed for the analysis of allelic differences in diploid eukaryote genomes. The raw sequence reads from L. digitata and L. solidungula were mapped to the S. japonica chloroplast genome as a reference. The Costaria costata and Undaria pinnatifida chloroplast genomes were downloaded from NCBI, and artificial raw reads were produced using the ArtificialFastqGenerator (Frampton & Houlston 2012). The reads of all chloroplast genomes were mapped to the reference genome using bowtie2 (Langmead & Salzberg 2012) resulting in a sorted bam file. The sequence variants were analyzed with The Genome Analysis Toolkit (Van der Auwera et al., 2013) and the resulting SNP library manually inspected for consistency.

| The chloroplast genomes of L. digitata and L. solidungula
The sequencing total DNA yielded 179 million reads for L. digitata and 150 million reads for L. solidungula amounting to 12.3 and 11.3 gigabases, respectively. After assembly of all reads, we extracted the chloroplast contigs from the total assembly using the Ectocarpus siliculosus chloroplast coding sequences as a bait. Since the coverage of the chloroplast genomes is much higher than that of the nuclear genomes (estimated ~3,000× each for L. solidungula and for L. digitata), the assembly of so many reads results in a very fragmented chloroplast genome. Thus, the extracted chloroplast contigs were extended, scaffolded and the gaps between them were filled by using the original raw read information with the help of Gapfiller (Boetzer & Pirovano 2012). Extensions into the inverted repeats from both sides of the final single contig of each Laminaria species indicated completeness of the chloroplast genomes. We annotated the genomes using the available annotations for the other three kelp genomes and included de novo detection of tRNAs. With this approach, we defined 139 coding sequences each in the genomes and 29 (L. digitata) and 30 (L. solidungula) tRNAs together with three rRNA species (16S, 23S, and 5S) located in the inverted repeats. Since the number of tRNAs thus seems to differ between the chloroplast genomes of kelp species, we further analyzed, which tRNAs were affected by potential evolutionary processes. In total, we defined 36 tRNA locations on the chloroplast genomes of which 27 are located on the same position in all five kelp chloroplast genomes (Table A1).
Of the remaining nine tRNAs, seven are present in only one species, one can be found in two species, and the remaining one is missing in C. costata only. Interestingly, six of the seven orphan tRNAs and the tRNA occurring in two genomes are predicted to contain type II introns.

| The phylogeny of kelp genomes
To be able to trace back the evolution of kelp species, we needed a robust phylogeny of the species analyzed. Thus, we extracted all coding sequences of the chloroplast genomes from Undaria pinnatifida (Zhang, Wang, Liu, Wang, Chi, et al., 2015a), Costaria costata (Zhang, Wang, Liu, Wang, Wang, et al., 2015b), Saccharina japonica (Wang et al., 2013), the two Laminaria species analyzed here, and Ectocarpus siliculosus and Fucus vesiculosus (Le Corguille et al., 2009). All these chloroplast genomes had 137 coding sequences in common, the two open reading frames (ORFs) with undefined functions being restricted to kelp species. After alignment of the coding sequences of the respective individual genes, we concatenated these to yield a combined alignment of 96,570 bases. For the phylogenetic analysis, we used E. siliculosus and F. vesiculosus as outgroups. A model test indicated that the GTR + Gamma model would be best fitting for the data. Using this model with 1,000 bootstrap replications, we generated a phylogeny of the kelp species ( Figure 1). Clearly, the Laminaria species group together, and the bootstrap values of the whole kelp tree indicate that the phylogenetic relationships of the species are well resolved. Sequence variations not following the species tree were also observed (see below) but the phylogenetic signal over the whole plastid genomes seems to be strong enough to be not influenced by them. This phylogeny was then the basis for further analysis of the observable trends in kelp chloroplast genome evolution.

| Alignment to other kelp genomes
We then asked whether the whole chloroplast genomes were alignable, that is, are completely collinear between each other. To this end, we first made a nucmer alignment with the U. pinnatifida genome as reference, which showed that large segments of all chloroplast genomes could indeed be aligned (Figure 2). Only a few regions appear to be rearranged or contain larger insertions or deletions so that the similarity dropped below the 90% threshold. Missing or additional tRNAs are too small to cause such similarity breakpoints as the comparison of tRNA positions (Table A1) and nucmer similarity breakpoint positions shows (Table A2). We then aligned the chloroplast genomes with MAFFT which proved that the nucmer segments aligned in the same order in all chloroplast genomes and that therefore all kelp chloroplast genomes are collinear. However, closer inspection revealed that small rearrangements occurred involving the inverted repeat (IR) regions (Table 1). In comparison with C. costata, S. japonica and U. pinnatifida both Laminaria species have a gene directly adjacent of the IRs translocated to the other copy of the IR (Table 1). In L. digitata rpl21 is affected and in L. solidungula ycf37. Interestingly, ycf37 was presumably pseudogenized during this process in L. solidungula since the N terminal part of the protein is no longer encoded in this gene (Table A3).

| Sequence variation across five chloroplast genomes
The collinearity of the chloroplast genomes allows alignment and definition of sequence variation irrespective of coding, noncoding, or intergenic regions. Since we, however, observed small rearrangements in the Laminaria species, we decided not to use the global alignment for single nucleotide polymorphism (SNP) and insertion or deletion (indel) detection. Instead, we analyzed the sequence variations locally using a 100× coverage of artificial reads each which we mapped to the S. japonica genome. In total, we found 9,218 SNPs and 164 indels. We counted all SNPs from all species in windows of 1,000 bases to examine the SNP distribution over the chloroplast genome ( Figure 3). The F I G U R E 1 Phylogeny of Laminariales species (kelp) in comparison with other brown algae with completely sequenced chloroplast genomes. The tree was rooted with Ectocarpus siliculosus and Fucus vesiculosus. The evolutionary history was inferred by using the maximumlikelihood method based on the general time reversible model (Nei & Kumar 2002;Tamura et al., 2012) with 1,000 bootstrap replications. The tree with the highest log likelihood (−249454.9341) is shown. The initial tree for the heuristic search was obtained by applying the neighbor-joining method to a matrix of pairwise distances estimated using the maximum composite likelihood (MCL) approach. A discrete Gamma distribution was used to model evolutionary rate differences among sites (five categories (+G, parameter = 0.2099)). The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 0.0000% sites). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. There were a total of 96,570 positions in the final dataset. Evolutionary analyses were conducted in MEGA6 (Tamura et al., 2013) F I G U R E 2 Synteny of the four kelp chloroplast genomes. The assembled genomes were mapped against the Undaria. pinnatifida genome using nucmer (Kurtz et al., 2004) and visualized with Bio:: Graphics (https ://metac pan.org/relea se/LDS/Bio-Graph ics-2.37). Colors for the different chloroplast genomes were chosen arbitrarily. The identity threshold for each segment was 90%, and small hits contained within a larger one were removed including the matches of the second repeat region. The scale represents the U. pinnatifida base positions in kb. The breaks indicate nucmer alignment breaks See Table A2. When gaps between alignments are small, the graphics software shifted the next alignment block to a lower position to emphasize the alignment gap positions SNPs are fairly equally distributed over the whole-genome sequence, only the inverted repeat regions are nearly devoid of sequence variation. This phenomenon was already observed in higher plants (Zhu, Guo, Gupta, Fan, & Mower, 2016). By far, the highest numbers of unique SNPs are present in the genomes of U. pinnatifida and C. costata ( Figure 4). Conversely, the Laminaria species have the largest set of SNPs in common (502)  We then examined the ratio of SNPs between intergenic and genic (i.e., coding regions including RNA genes; Table 2). The ratio of genic to intergenic SNPs ranges from 15% to 19%. The number of detectable SNPs per kb is, however, slightly lower in intergenic compared to genic regions. Since most larger indels reside in the intergenic regions the alignability of these regions is reduced and thus the potential to detect SNPs. Overall, the number of SNPs per kb is comparable between intergenic and genic regions in all species (Table 2).

The distribution of synonymous versus nonsynonymous SNPs
in coding regions is also of interest (Table 3). For this analysis, we calculated for each species the number of SNPs in the two categories and tested, whether those SNPs also occurred in another species. As expected, nonsynonymous SNPs are much rarer than synonymous SNPs indicating purifying selection on the coding sequences. Some codons contain different SNPs in different species, resulting sometimes in the encoding of different amino acids.
These 260 codons therefore seem to be less constrained in terms of exchangeability.
The ratio of nonsynonymous to synonymous SNPs ranges from 10.2% to 18.5% in species and from 0.7% to 18% in species pairs. The partly lower values for species pairs might be caused by a lower likeliness of maintenance of nonsynonymous SNPs in two independent species. Interestingly, S. japonica and U. pinnatifida have the highest ratio of nonsynonymous to synonymous SNPs in their species specific SNPs, which could be due to a less efficient purifying selection TA B L E 1 Chloroplast genome features of kelp species. The inverted repeat (IR) consists of the genes in the order 16S ribosomal RNA, tRNA-Ile, tRNA-Ala, 23S ribosomal RNA, 5S ribosomal RNA. The first row in each cell of the gene order column shows the neighboring genes of the forward repeat and the second row those of the reverse repeat for each species row Species Length (bp)  Table 2) from the aligned reads of the available four kelp species in windows of 1,000 bases were counted and plotted. X-axis: Base count in the S. japonica reference. Y-axis: number of SNPs. The red rectangles indicate the position of the inverted repeats or faster accumulation of mutations than in the other species. By calculating the dN/dS ration, we found no evidence for positive selection (i.e., dN/dS > 1) in any of the coding genes of the chloroplast genomes.

Gene order found at boundaries of the two IR regions Rearrangements
SNP pairs (i.e., mutations adjacent to each other or multinucleotide polymorphisms [MNPs]) are thought to be not always independent (Prendergast, Pugh, & Harris, 2018). We analyzed such pairs in the Kelp chloroplast genomes and found that they are generally rare, but are also partly shared between species (Table 4). Interestingly, these SNPs are equally distributed between genic and intergenic regions.
Since intergenic regions cover a far smaller area of the chloroplast genome, the propensity for this kind of SNPs is to reside in intergenic regions.
To exclude the possibility that population structure and sequence variation impact the SNP analyses, we retrieved L. digitata

| D ISCUSS I ON
The chloroplast genomes of photosynthetic eukaryotes are relatively stable and have a low substitution rate . We have analyzed two kelp species chloroplast genomes and compared them to available genomes of other kelp species. This analysis gives us deep insights into kelp evolution and may help to understand evolutionary processes in this phylogenetic branch.

| Collinearity and stability of the chloroplast genomes
Only one or two tRNA genes are additionally inserted in the otherwise nearly collinear kelp chloroplast genomes. These additionally inserted tRNAs mainly have introns and are only a second copy of a tRNA species. Thus, these tRNAs would be dispensable and might occur and disappear frequently in evolution without affecting the collinearity.

Only in the vicinity of the IRs, we observed translocations of genes in
Laminaria. Such translocations could be connected to double strand break repair and homologous recombination at IR sites as it was also observed in higher plants (Zhu et al., 2016). The translocation of ycf37 in L. solidungula probably led to its defunctionalization since the N terminal part including the start codon of the gene is missing as the alignment indicates (Table A3). No start codon in the 5′ vicinity was found which could be used as alternative start from the ribosome. Further work will have to show whether or not a protein can be produced by this truncated gene locus. Functional analysis of a knockout mutant of ycf37 in Synechococcus revealed its involvement in the building of a specific photosystem I complex, which seems to be required under high light conditions (Dühring, Irrgang, Lünser, Kehr, & Wilde, 2006).
It is possible that this protein is dispensable under the relatively lower light conditions in higher latitudes, for example (Pavlov et al., in press), where L. solidungula thrives exclusively (Roleda, 2016).

| SNP evolution
The evolutionary occurrence of the same mutation at a given location independently in different species is unlikely. Thus, if a SNP is found in two species, it should have the same origin, that is, one mutation event in the course of evolution. Our analysis shows that SNP presence and absence in kelp species chloroplast genomes does not follow the phylogeny; that is, we cannot trace back the first occurrence of a SNP in the phylogenetic tree. Thus, scattered occurrence of a SNP, for example, presence in U. pinnatifida and L. solidungula and absence in the other species does not mean that this SNP was lost in these lineages independently. Rather, this scattered occurrence can most easily be explained by the presence of heteroplasmic chloroplast genomes with homologous recombination between them. Thus, our study reveals for the first time incomplete lineage sorting in kelp species as it was shown in higher plants (Jakob & Blattner 2006;Sabir et al., 2014). The amount of 2018). Here, we could show that such substitutions are rarer in coding sequences than in intergenic regions. The lower amount of multinucleotide mutations per kb in genic regions of the chloroplast genomes is likely due to purifying selection. We observed a variation of Kelp chloroplast genomes in pairwise comparisons of 2.5%-3.3%. For Gossypium (cotton) species, the variation was determined to be at 0.6% (Xu et al., 2012) with a divergence time of roughly 12.5 mya (Wendel et al., 2010).
For Oryza (rice), the variation is 0.36% (Wambugu, Brozynska, Furtado, Waters, & Henry, 2015) with a divergence time of Oryza estimated to be at around 10 mya (Kellogg, 2009). The first Kelp forests occurred in the Miocene around 22 mya together with grass lands. Thus, their evolution started much earlier than the establishment of either rice or cotton families. We therefore Kelp chloroplast genomes seem to evolve at comparable rates as land plant families. TA B L E A 3 Alignment of chloroplast gene ycf37 from different brown algae. The nonhomologous sequence part of Laminaria solidungula is colored in red. The amino acid translation was made from the 70% consensus sequence, where a nucleotide was taken if at least seventy percent of the aligned sequences have the same letter. n in consensus denotes any nucleotide at that alignment position, and u stands for purin bases at that position