Metagenome-assembled genome of the glacier alga Ancylonema yields insights into the evolution of streptophyte life on ice and land

genomic basis for adaptations to ice and to land in streptophytes. Comparative genomics revealed that the reductive morphological evolution in the ancestor of Zygnematophyceae was accompanied by reductive genome evolution. These first genome-scale data for glacier algae suggest an Ancylonema-specific adaptation to the cryosphere, and shed light on the genome evolution of land plants and Zygnematophyceae.


Introduction
The evolution of terrestrial flora (Embryophyta) (de Vries & Archibald, 2018) transformed Earth's continents, atmosphere and climate (Lenton et al., 2016; Bengtson et al., 2017; Leebens-Mack et al., 2019; Delaux & Schornack, 2021; Bowles et al., 2023), and promoted the diversification of multiple lineages spanning the tree of life (Lutzoni et al., 2018; Cortona et al., 2020). The first land plants evolved from a streptophyte algal ancestor (Wickett et al., 2014; de Vries & Archibald, 2018), the last common ancestor of Anydrophyta. Timescale analyses infer a likely Neoproterozoic divergence between Embryophyta and its closest zygnematophycean algal relatives, with the youngest estimates falling within the Ediacaran period (Morris et al., 2018) and others dating this divergence as far back as the Mesoproterozoic (Su et al., 2021). The fundamental transition from an aquatic to a terrestrial environment required adaptations to tolerate extremes in temperature, desiccation and UV radiation (de Vries et al., 2016, 2018a,b, 2020; de Vries & Archibald, 2018; Bowles et al., 2020, 2021). It is likely that ancestral streptophytes possessed a suite of exaptations (preadaptations) available to be co-opted to these ends. Studying closely related extant streptophytes could provide significant insight into the genomic capacity of ancestral lineages and is therefore critical to unravelling the processes involved in land plant terrestrialisation (de Vries & Archibald, 2018; Donoghue et al., 2021).
Streptophyte glacier algae live on the surfaces of contemporary glaciers and ice sheets, and proliferate in widespread algal blooms during summer melt seasons when sunlight and liquid water are available to power photosynthesis (Yallop et al., 2012; Williamson et al., 2018, 2020; Holland et al., 2019; Cook et al., 2020). To inhabit icy environments, glacier algae must balance their requirements for photosynthesis and growth with tolerance of extremes in temperature, desiccation and UV radiation (Williamson et al., 2018, 2020), raising interesting parallels with the adaptation of plants to life on land. Glacier algae also belong to the Zygnematophyceae, the sister lineage to all embryophytes (Williamson et al., 2019). One currently untested hypothesis suggests that plants moved onto land in the aftermath of global glaciations in the Cryogenian, whereby ice provided an intermediate habitat between water and land (Williamson et al., 2019; Žárský et al., 2022). Considering the niche of modern-day glacier algae, and their phylogenetic proximity to embryophytes, these species thus represent an important model system to explore adaptations to extreme conditions and potential processes of plant terrestrialisation.
Here, we describe and analyse the metagenome-assembled genome (MAG) of the glacier alga Ancylonema nordenskiöldii to investigate the genomic basis of its adaptations to life on ice and the process of plant terrestrialisation more broadly. Specifically, we explore whether modern-day adaptations of glacier algae represent exaptations derived from an anydrophyte ancestor, which would support the hypothesis that Cryogenian glaciations were a major driver of land plant evolution, or whether these adaptations emerged more recently within the glacier algal lineage. The MAG also enables comparative analyses of genome evolution in Zygnematophyceae, to test the recent hypothesis (Hess et al., 2022; Bowles et al., 2024) that gene loss has dominated their evolutionary history. As sequence data for glacier algae have historically been limited, and successful culturing protocols were only very recently established (Jensen et al., 2023; Remias & Procházková, 2023), our metagenome-assembled genome represents a significant step forward in our genomic understanding of glacier algae and their evolutionary history.

Study site and sample collection
Communities of ice-inhabiting glacier algae (Ancylonema nordenskiöldii Berggren) were sampled from the surface ice of Morteratsch glacier, Switzerland, in August 2020. Surface ice was collected using a presterilised ice-saw directly into sterile Whirl-Pak bags and maintained frozen at −20°C during transport to the University of Bristol, UK, where samples were held at −80°C before genomic DNA extraction as described below.

Library construction and sequencing
For short-read sequencing, genomic DNA was extracted using the DNeasy PowerSoil Pro kit (Qiagen, Hilden, Germany) and assessed with Qubit (Thermo Fisher Scientific, Waltham, MA, USA), BioAnalyser (Agilent, Santa Clara, CA, USA) and Femto Pulse (Agilent). Samples were purified through bead-based clean up and polymerase chain reaction-free libraries were prepared by the Earlham Institute. Short-read libraries were sequenced with the Illumina NovaSeq 6000 system in paired-end mode. In total, 369.4 Gb raw data were generated, with 333.7 Gb remaining after filtering by Trimmomatic (Bolger et al., 2014).
For long-read sequencing, samples were ground in liquid nitrogen for 15 min and genomic DNA was extracted following a previously described protocol (Auber & Wisecaver, 2023) and assessed with Qubit (Thermo Fisher Scientific), BioAnalyser (Agilent) and Femto Pulse (Agilent). The NERC Environmental Omics Facility sequenced long-read libraries with PacBio, producing 22.2 Gb raw data.

Genome assembly
Short- (Illumina, San Diego, CA, USA) and long-read (PacBio, Menlo Park, CA, USA) datasets were assembled with the hybrid assembler OPERA-MS (Bertrand et al., 2019). A minimum read length of 3000 bp was specified. As the sequenced samples represented communities of organisms, a metagenomic approach was used to computationally extract streptophyte algal data, rigorously minimising the likelihood of contamination. Eukrep (West et al., 2018) was used to classify and filter eukaryote and prokaryote contigs. Within the eukaryotic reads, Kraken2 (Wood et al., 2019) was used to identify and filter 9979 streptophyte contigs, discarding data from other taxonomic groups (e.g. fungi). Multiple methods were then used for genome binning, including Maxbin (Wu et al., 2014), Metabat2 (Kang et al., 2019), BinSanity (Graham et al., 2017) and Concoct (Alneberg et al., 2014). The resulting bins were evaluated with the BUSCO Eukaryota dataset (Waterhouse et al., 2018), both to identify the best binning tool (Metabat2; Supporting Information Fig. S1) and to assess completeness.
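Assembly contiguity, such as the contig N50 reported for this MAG in the Results, can be computed directly from a list of contig lengths. The following is a minimal sketch of that statistic only; the function name and the toy lengths are illustrative assumptions and are not taken from the assembly pipeline used here.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp (not data from this study).
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_contigs)
```

With the toy lengths above the total is 300 bp, so the running sum first reaches half the assembly at the second-longest contig.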

RNA sequencing and assembly
RNA for Illumina RNA-Seq was extracted with the RNeasy PowerSoil Total RNA kit (Qiagen) following the manufacturer's protocol. The quality of RNA was assessed using an Agilent BioAnalyzer (Agilent). The Bristol Genomic Centre sequenced RNA-Seq libraries with the Illumina HiSeq 2500 system in paired-end mode. RNA sequencing data were mapped against our streptophyte algal MAG with HISAT2 (Kim et al., 2019), using default parameters. Mapped reads were then assembled with Trinity (Grabherr et al., 2011), with default parameters.

Gene prediction
The MAKER-P (Campbell et al., 2014) pipeline was used for gene annotation in two rounds, incorporating multiple annotation sources. First, assembled RNA reads were mapped to the genome using TopHat2 (Kim et al., 2013). Homology-based gene prediction was completed with the protein-coding genes of closely related species including Mesotaenium endlicherianum (Cheng et al., 2019), Penium margaritaceum (Jiao et al., 2020) and Spirogloea muscicola (Cheng et al., 2019). Gene models obtained from RNA-aided and homology-based pipelines were used to train the de novo prediction pipeline, SNAP (Korf, 2004). A final MAKER-P run, combining all these sources, was used to annotate genes for the Ancylonema nordenskiöldii MAG (Campbell et al., 2014). Genome completeness was assessed with BUSCO Eukaryota (Waterhouse et al., 2018). Functional annotation of protein-coding genes was completed with InterproScan (Jones et al., 2014).

Distinguishing species of microalgae
Morphological analysis has identified two common species of glacier algae, unicellular Ancylonema alaskanum and filamentous A. nordenskiöldii. Marker-based analysis of rbcL and 18S genes, the only published genetic data for glacier algae to date, demonstrated that these species of Ancylonema are very closely related (Procházková et al., 2021). BLAST comparison, using default parameters, of previous rbcL and 18S gene sequences from unicellular A. alaskanum and filamentous A. nordenskiöldii against the MAG found best hits against the same contig (Tables S1-S4). The best hits identified our MAG as A. nordenskiöldii (Tables S1-S4). Additional analyses were used to assess whether the MAG derived from a single species. GC content analysis, using SeqKit (Fig. S2; Shen et al., 2016), and kmer analysis, using Jellyfish (Fig. S3; Marcais & Kingsford, 2011), were conducted. The functions of duplicated BUSCO Viridiplantae genes were also assessed, using functional categories in OrthoDB (Table S5; Zdobnov et al., 2021).
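The logic of the GC content check can be illustrated with a short sketch: compute the GC content of each contig, bin the values, and look for more than one well-separated peak, which would hint at a mixed bin. This is an illustration in plain Python and not the SeqKit workflow used in this study; the function names, the 5% bin width and the toy sequences are assumptions.

```python
from collections import Counter


def gc_fraction(sequence):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def gc_bin(sequence, bin_percent=5):
    """Lower edge (in percent) of the GC bin a contig falls into.

    Integer arithmetic is used so that bin edges are exact.
    Returns None if the contig has no unambiguous bases.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        return None
    return (100 * gc // total) // bin_percent * bin_percent


def gc_histogram(contigs, bin_percent=5):
    """Count contigs per GC bin; `contigs` maps contig name -> sequence."""
    bins = (gc_bin(seq, bin_percent) for seq in contigs.values())
    return Counter(b for b in bins if b is not None)


# Hypothetical toy contigs (not data from this study).
toy = {"c1": "GGCC", "c2": "ATGC", "c3": "ATGCATGN"}
toy_hist = gc_histogram(toy)
```

A single dominant peak in such a histogram is consistent with, but does not prove, a single-species bin.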

Phylogenetic analysis
We utilised the latest transcriptome and genome data, as well as the predicted protein-coding genes from the MAG, to infer the evolutionary history of streptophytes. Specifically, these included Zygnematophyceae transcriptomes from the one thousand plant transcriptomes project (1KP) and genomes across the plant tree of life (Figs S4-S6). A threshold of 30% missing BUSCO genes was used to filter for high-quality data (Waterhouse et al., 2018). Two datasets were produced: the first was constructed from genome-only data from across the green plants, and the second from Zygnematophyceae data from the 1KP project and complementary Anydrophyta genome data.
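The BUSCO-based quality filter can be sketched as follows. The sketch assumes that datasets with at most 30% of BUSCO genes missing were retained (the text does not state whether the boundary is inclusive), and the function names and counts are hypothetical.

```python
def passes_busco_filter(missing, total, max_missing_percent=30):
    """True if the share of missing BUSCO genes does not exceed the threshold.

    Assumption: the 30% boundary is inclusive.
    """
    if total <= 0:
        raise ValueError("total BUSCO count must be positive")
    # Integer comparison avoids floating-point edge cases at the boundary.
    return 100 * missing <= max_missing_percent * total


def filter_datasets(busco_counts, max_missing_percent=30):
    """Return the sorted names of datasets that pass the filter.

    `busco_counts` maps dataset name -> (missing BUSCOs, total BUSCOs).
    """
    return sorted(
        name
        for name, (missing, total) in busco_counts.items()
        if passes_busco_filter(missing, total, max_missing_percent)
    )


# Hypothetical counts (not data from this study).
toy_counts = {"taxon_a": (20, 255), "taxon_b": (77, 255), "taxon_c": (76, 255)}
kept = filter_datasets(toy_counts)
```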
For both datasets, ORTHOFINDER (v.2.3.7) was used to cluster protein-coding genes into orthogroups (Emms & Kelly, 2019), based on sequence divergence, using default settings (orthofinder -f data_folder). Single copy orthologs were identified using a previously described python script (Harris et al., 2020), which removes paralogous genes from orthogroups. The script enables the user to specify a minimum taxonomic occupancy of each orthogroup, which was set at 70%. Single copy orthologs were aligned using Mafft (Katoh et al., 2002) with the --auto parameter and trimmed with Trimal (Capella-Gutiérrez et al., 2009) with the -automated1 parameter. Two complementary approaches were used to reconstruct phylogenies. In the first, concatenation approach, multiple sequence alignments were concatenated using Phyutility to create a supermatrix (Smith & Dunn, 2008). A bootstrapped maximum likelihood phylogeny was inferred using IQ-Tree (Nguyen et al., 2015), using the Bayesian Information Criterion (BIC) to select the best-fitting substitution model, including empirical profile mixture models (C10-C60). One thousand ultrafast bootstrap replicates were used. In the second, coalescent approach, individual bootstrapped maximum likelihood phylogenies were inferred using IQ-Tree as described above. These individual phylogenies were then summarised into a species tree using Astral (Zhang et al., 2018).
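The occupancy criterion can be illustrated with a simplified sketch. It is not the Harris et al. (2020) script, which additionally removes paralogous genes; here an orthogroup is simply kept if every taxon present contributes exactly one gene and at least 70% of taxa are represented. All names and the toy data are hypothetical.

```python
def is_single_copy(orthogroup):
    """True if every taxon present contributes exactly one gene.

    `orthogroup` maps taxon name -> list of gene identifiers.
    """
    return all(len(genes) == 1 for genes in orthogroup.values() if genes)


def occupancy_ok(orthogroup, all_taxa, min_percent=70):
    """True if at least `min_percent` of taxa have a gene in the orthogroup."""
    present = sum(1 for taxon in all_taxa if orthogroup.get(taxon))
    return 100 * present >= min_percent * len(all_taxa)


def select_orthogroups(orthogroups, all_taxa, min_percent=70):
    """Sorted names of single-copy orthogroups meeting the occupancy cutoff."""
    return sorted(
        name
        for name, members in orthogroups.items()
        if is_single_copy(members) and occupancy_ok(members, all_taxa, min_percent)
    )


# Hypothetical toy data: ten taxa, three orthogroups.
taxa = [f"t{i}" for i in range(10)]
toy_ogs = {
    "OG1": {t: [f"{t}_g1"] for t in taxa[:7]},   # 7/10 taxa, single copy
    "OG2": {t: [f"{t}_g2"] for t in taxa[:6]},   # 6/10 taxa, too sparse
    "OG3": {**{t: [f"{t}_g3"] for t in taxa}, "t0": ["t0_g3a", "t0_g3b"]},  # paralog
}
selected = select_orthogroups(toy_ogs, taxa)
```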

Gene family evolution analyses
The gain, loss, expansion and contraction of gene families across the green plant genomes were analysed (Tables S6-S11). Using the dataset containing green plant genomes, Count (Csűrös, 2010) was used to analyse the gain and loss of genes with Dollo parsimony, using the concatenation-based phylogenetic tree produced above and the orthogroup count table from Orthofinder as input. CAFE (Bie et al., 2006) was used to analyse gene family expansion and contraction, using the time-calibrated phylogeny described below and the orthogroup count table from Orthofinder as input, with default parameters. This approach is commonly used to understand gene family evolution in deeply diverging taxa (Bowman et al., 2017; De Clerck et al., 2018; Nishiyama et al., 2018; Cheng et al., 2019; Lemieux et al., 2019; Wang et al., 2019; Jiao et al., 2020; Li et al., 2020).
Protein domains were also assessed with Interproscan (Jones et al., 2014). The functions of gained, lost, expanded and contracted gene families were visualised with REVIGO (Figs S7-S16; Tables S12-S15; Supek et al., 2011) using Gene Ontology terms from Interproscan. While analyses of GO terms have limitations, they can provide some insight into gene function, particularly for genes found outside model organisms.
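Under Dollo parsimony a gene family is gained once, on the branch leading to the smallest clade that contains every taxon possessing it, and every absence inside that clade is explained by loss. The sketch below illustrates this rule for a single family on a toy tree; it is not the Count implementation, and the tree encoding (nested tuples with leaf names as strings) and function names are assumptions made for illustration.

```python
def leaves(tree):
    """Set of leaf names under a node; leaves are strings, clades are tuples."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def dollo(tree, present):
    """Return (leaves of the gain clade, number of loss events) for one family."""
    present = set(present) & leaves(tree)
    if not present:
        return None, 0
    # Descend to the smallest clade that still contains every present taxon.
    node = tree
    while not isinstance(node, str):
        containing = [c for c in node if present <= leaves(c)]
        if not containing:
            break
        node = containing[0]

    def count_losses(n):
        # Only called on nodes with at least one present descendant.
        if isinstance(n, str):
            return 0
        losses = 0
        for child in n:
            if leaves(child) & present:
                losses += count_losses(child)
            else:
                losses += 1  # whole subtree lacks the family: one loss event
        return losses

    return frozenset(leaves(node)), count_losses(node)


# Hypothetical five-taxon tree (not the phylogeny from this study).
toy_tree = ((("A", "B"), "C"), ("D", "E"))
gain_ac, losses_ac = dollo(toy_tree, {"A", "C"})
gain_ad, losses_ad = dollo(toy_tree, {"A", "D"})
```

In the first toy case the family is gained in the ancestor of A, B and C and lost once (in B); in the second it is gained at the root and lost three times.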

Evolution of key gene families
The occupancy of gene families involved in key biological processes was queried within the outputs of Orthofinder. Glacier algae are found across the cryosphere and tolerate extremes of temperature, desiccation and UV radiation. Thus, our investigation of the Orthofinder outputs focused on transcription factors, phytohormone biosynthesis and signalling, circadian rhythm, secondary metabolite biosynthesis and photosynthesis (Figs S17-S19; Tables S16-S23). Further inspection by inference from bootstrapped maximum likelihood phylogenies was conducted using IQ-Tree (Nguyen et al., 2015), with the Bayesian Information Criterion (BIC) used to select the best-fitting substitution model, including empirical profile mixture models (C10-C60). One thousand ultrafast bootstrap replicates were used.
For the analysis of ice-binding proteins, reciprocal BLASTs were conducted between the protein-coding genes of the A. nordenskiöldii MAG and Uniprot ice-binding proteins, using default parameters and specifying a single best hit. The reciprocal BLAST analyses were compared and matching hits were recorded (Table S20).
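The comparison of the two BLAST directions amounts to finding reciprocal best hits. A minimal sketch, assuming each search has already been reduced to a mapping from query to its single best subject (the identifiers and function name are hypothetical):

```python
def reciprocal_best_hits(forward_best, reverse_best):
    """Return sorted (query, subject) pairs that are each other's best hit.

    forward_best: MAG gene -> best-hit ice-binding protein
    reverse_best: ice-binding protein -> best-hit MAG gene
    """
    return sorted(
        (query, subject)
        for query, subject in forward_best.items()
        if reverse_best.get(subject) == query
    )


# Hypothetical best-hit tables (not data from this study).
forward = {"mag_g1": "ibp_A", "mag_g2": "ibp_B", "mag_g3": "ibp_A"}
reverse = {"ibp_A": "mag_g1", "ibp_B": "mag_g9"}
rbh = reciprocal_best_hits(forward, reverse)
```

Only the pair in which both searches agree is retained; one-directional hits are discarded.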

Divergence time estimation
For molecular clock analysis, we utilised the aligned and trimmed 472 gene dataset from the green plant genome phylogeny as input into MCMCtree (Yang, 2007). Node distributions using minimum and maximum constraints were specified, with full phylogenetic and age justifications listed in Notes S1. These calibrations derive from previous critical reviews of the fossil record (Harris et al., 2022; Bowles et al., 2024). To specify the prior distributions on node ages, all calibrated nodes were given a hard minimum age and a soft maximum age.
© 2024 The Authors. New Phytologist © 2024 New Phytologist Foundation. New Phytologist (2024) www.newphytologist.com

Initially, molecular clock analyses were run without sequence data to obtain effective time priors, to ensure that the calibration densities and time priors were appropriate. The single copy orthogroups were divided into 4 partitions according to their evolutionary rate, based on total tree length in IQ-Tree (Nguyen et al., 2015) and grouped using k-means clustering in R (R Core Team, 2014). A relaxed clock model was used (Uncorrelated; Independent Gamma Rates). Given the protein dataset, branch lengths were first estimated using codeml (Yang, 2007). The tree topology was fixed based on the focal maximum likelihood analysis above and was analysed using the normal approximation method in MCMCtree (Yang, 2007). After a burn-in of 10 000 generations, parameter values were saved every 20th generation until 20 000 cycles were saved (400 000 generations total). Trees were plotted using MCMCTreeR (Puttick, 2019).
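The chain length implied by these sampling settings can be checked with a few lines of arithmetic. The sketch assumes that the stated 400 000 generations refer to the sampling phase only, so that the burn-in is additional; the variable names are illustrative.

```python
def sampling_generations(sample_every, samples_kept):
    """Generations needed after burn-in to collect the requested samples."""
    return sample_every * samples_kept


def total_generations(burn_in, sample_every, samples_kept):
    """Full chain length including burn-in (assumption: burn-in is not
    counted in the sampling phase)."""
    return burn_in + sampling_generations(sample_every, samples_kept)


burn_in = 10_000        # generations discarded
sample_every = 20       # save every 20th generation
samples_kept = 20_000   # number of saved samples

post_burn_in = sampling_generations(sample_every, samples_kept)
full_chain = total_generations(burn_in, sample_every, samples_kept)
```

Under this reading, the sampling phase spans 20 × 20 000 = 400 000 generations, matching the figure given in the text, and the full chain including burn-in would span 410 000 generations.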

Results
The metagenome of the glacier alga Ancylonema nordenskiöldii
A single glacier algal MAG was assembled using a combination of PacBio High-Fidelity (HiFi) long reads and Illumina short reads; in total, these were assembled into 9979 contigs (contig N50 of 25 kb), generating an estimated genome size of 170 megabases (Mb). While the contig number is high compared to streptophyte algae sequenced from cultures (Hori et al., 2014; Cheng et al., 2019), this is the first genome-scale dataset for glacier algae and therefore an important resource for understanding their adaptations and evolution. We predicted 19 593 protein-coding genes in the glacier algal MAG, while BUSCO (Waterhouse et al., 2018) analysis (Eukaryota_odb10) suggested a genome completeness of 86.7% (Fig. S1). Analysis revealed that 46% of present BUSCO genes were duplicated, which is within the range of other Zygnematophyceae genomes sequenced from cultured samples, including Spirogloea muscicola (75%) and Penium margaritaceum (22%). This set of 19 593 protein-coding genes was used for downstream analyses.
Current knowledge based on morphology and limited amplicon sequencing data (18S and rbcL) recognises two species of glacier algae, the filamentous Ancylonema nordenskiöldii and the unicellular Ancylonema alaskanum (Williamson et al., 2019). Samples for the present study were collected in 2020 from Morteratsch glacier, Switzerland, a site known for the dominance (70% relative abundance) of A. nordenskiöldii (Mauro et al., 2020). Reciprocal BLAST analysis of glacier algal 18S and rbcL marker genes against our contigs identified our MAG as A. nordenskiöldii (Tables S1-S4). Additional analysis of GC content and kmers did not identify a bimodal distribution of contigs (Figs S2, S3). Duplicated BUSCO genes were mostly involved in enzymatic processes rather than housekeeping functions. This suggests proliferation of metabolic genes within a single organism, rather than multiple copies of housekeeping genes, which are less prone to duplication, deriving from multiple related organisms (Table S5). Therefore, we concluded that the MAG contained material from a single species, A. nordenskiöldii, and proceeded conservatively with downstream analyses on that basis. Phylogenetic analysis placed our A. nordenskiöldii MAG within the Zygnematophyceae, the sister group to land plants, with which they comprise the group Anydrophyta (Fig. 1). Within the Zygnematophyceae, A. nordenskiöldii is placed within the Zygnematales (Fig. 1). Further analysis placed Anydrophyta within the Streptophyta, with Chlorophyta sister to the latter group (Fig. S4).

Glacier algal adaptations to life in ice
Comparative genomic analysis combining the A. nordenskiöldii MAG with previously published data indicated that glacier algae adapted to the cryosphere through lineage-specific diversification of existing genetic pathways (Fig. 2a-c; Tables S6-S15). The 19 593 protein-coding genes from our A. nordenskiöldii MAG were clustered into 6242 orthogroups (OGs) using OrthoFinder (Emms & Kelly, 2019). Gene ontology (GO) analysis of 2195 expanded gene families, identified with CAFE (Bie et al., 2006), revealed functions associated with water transport (e.g. lipid localisation), protein repair (e.g. autophagy, PSII associated light-harvesting complex), response to abiotic stimulus (e.g. response to high light intensity, UV, radiation), wax metabolic process (e.g. lipid biosynthesis and modification) and plant-type cell wall modification (e.g. chloroplast organisation; Figs 2b, S7; Tables S6, S8, S12). Due to the identification of the wax metabolic process term, the evolution of the cuticle biosynthetic machinery (Kong et al., 2020) was investigated, which identified the DGAT family in streptophytes, including A. nordenskiöldii (Table S24).
Our analysis also identified extensive gene loss and gene family contraction during the evolution of glacier algae (Fig. 2). Gene losses and gene family contractions outweighed gains and expansions (Fig. 2), suggesting extensive gene turnover with the evolution of glacier algae. Gene ontology analysis of 1583 contracted gene families indicated a reduction in functions associated with phenylpropanoid metabolism, clathrin coat assembly and cellular catabolic process (Figs 2c, S9; Tables S10, S14). GO terms associated with the 2184 lost gene groups, identified with Count (Csűrös, 2010), mirrored the contracted gene families of A. nordenskiöldii. These lost genes were associated with phenol-containing compound metabolism, clathrin coat assembly and cellular catabolic assembly (e.g. cell wall organisation) as well as regulation of development (e.g. cell-cell signalling; Figs 2a, S10; Tables S11, S15). Together, these suggest a loss of intracellular trafficking, via clathrin coat assembly, and a reduction in chemical degradation pathways.
Despite the conspicuous phenolic pigmentation of A. nordenskiöldii, a known physiological adaptation to life in surface ice (Remias et al., 2012a,b; Williamson et al., 2020), genes involved in the biosynthesis of many phenylpropanoids and phenolic compounds were lost or became reduced in copy number in this MAG (Figs S9, S10), although several were retained (de Vries et al., 2021; Table S25). Comparative genomic and phylogenetic analysis instead suggested that lineage-specific gene diversification of a particular pathway underpinned the novel screening pigmentation of A. nordenskiöldii (Fig. 3). In glacier algae, purpurogallin pigments absorb ultraviolet and visible light, providing photoprotection against high levels of UV irradiance associated with life in supraglacial surface ice (Remias et , 2020). Purpurogallin accumulation has also been provisionally associated with protection against low temperatures, protection of the photosynthetic machinery and tolerance to desiccation (Williamson et al., 2019). While genes involved in the purpurogallin biosynthetic pathway were found across the green plants, one component, dehydroquinate dehydratase/shikimate dehydrogenase (DHQD/SD), was present at a high copy number (9) in A. nordenskiöldii, greater than in any other green plant species (Fig. 3; Table S16). DHQD/SD catalyses the dehydration of dehydroquinate (DHQ) to dehydroshikimate (DHS) and the reduction of DHS to shikimate (Bontpart et al., 2016; Lynch, 2022), leading to the spontaneous synthesis of gallic acid. Phylogenetic analysis demonstrated that the expansion of this gene family was specific to A. nordenskiöldii (Fig. 3c).
Aside from purpurogallin, comparative genomic analysis did not find any A. nordenskiöldii-specific gene radiations in relation to well-characterised light screening pathways, including those responding to UV-A or UV-B stress (Table S17). This is further supported by analysis of photosystem I and II genes, which showed no significant patterns of expansion or contraction in known components of the photosynthetic machinery (Table S18).
Analysis of ice-binding proteins suggested that cold adaptations of A. nordenskiöldii represent a unique adaptation of glacier algae, rather than an ancient exaptation derived from Anydrophyta (Tables S19, S20). Reciprocal BLAST of Uniprot ice-binding proteins from across the tree of life (e.g. fungi, bacteria) revealed 1194 OGs with hits against A. nordenskiöldii (Table S19). Based on the taxonomic occupancy of OGs, many were distributed across the green plant phylogeny (Table S20). These included a protein kinase superfamily, ATP-binding cassette protein family and heat shock protein family. Only 35 of these 1194 OGs were novel to Anydrophyta, indicating a relatively small evolutionary response to cold stress. Additionally, A. nordenskiöldii had the highest copy number amongst all green plants for some gene families, including a lipase, ATP-dependent Clp protease and Cytochrome C Oxidase.
Further analysis of land plant cold tolerance pathways did not find any gene family expansion in A. nordenskiöldii (Fig. S17; Table S21), instead indicating that a core element of land plant cold tolerance emerged after the split with Zygnematophyceae.
Cold stress signalling in land plants is coordinated by the CBF (C-REPEAT BINDING FACTOR)-COR (COLD REGULATED) signalling pathway (Ding et al., 2019). Comparative genomics revealed that the majority of this pathway predated the origin of Anydrophyta (Table S21). However, ICE (INDUCER OF CBF EXPRESSION), a key regulator of CBFs, emerged in the ancestor of land plants. ICE, also known as SCRM (SCREAM), is involved in stomatal development (Chater et al., 2017), and may have evolved in land plants as an adaptation to both cold and drought stresses. Due to their importance in abiotic stress responses, the evolution of the biosynthesis and signalling of phytohormones was also investigated at a broad phylogenetic level (e.g. presence in Anydrophyta and land plants). Similar to previous analyses (Bowman et al., 2017; Nishiyama et al., 2018; Bowles et al., 2020; Jiao et al., 2020), we demonstrated that these phytohormone genes are mostly estimated to originate before the transition of plants onto land (Table S22; Fig. S18). CBF/DREBs (DEHYDRATION RESPONSIVE ELEMENT BINDING) are transcription factors that bind to DRE/CRT cis-acting elements in response to abiotic stress (e.g. drought, low temperatures). As such, the evolution of DREBs and all other transcription factors was investigated (Table S23; Fig. S19), with all major families being found in the A. nordenskiöldii MAG, consistent with previous studies (Catarino et al., 2016; Lai et al., 2020; Bowles et al., 2022).

Timescale of glacier algal evolution
Timescale analysis based on 472 genes from the genomes of 24 green plant species suggested that Ancylonema nordenskiöldii split from its closest living algal relative 520-455 million years ago (Ma) during the Cambrian-Ordovician (early Phanerozoic). Consistent with previous estimates (Morris et al., 2018; Nie et al., 2020; Harris et al., 2022), our analyses also suggested that Anydrophyta emerged 703-623 Ma, likely during the Cryogenian-earliest Ediacaran, a period of dynamic environmental change characterised by two major global glaciations (Fig. 4; Stern et al., 2006; Hoffman et al., 2017). The above comparative genomic and phylogenetic analyses have suggested that adaptations to life in ice are lineage-specific to Ancylonema. As such, further comparative genomics (Tables S6, S7, S26-S49) was used to investigate patterns and functions of genome evolution in Anydrophyta, land plants and Zygnematophyceae, and potential exaptive Cryogenian evolution.

Genome and gene family evolution
Analysis of gene family evolution highlighted increasing genome complexity of Anydrophyta and land plants, contrasting with large-scale genome reduction in the Zygnematophyceae (Fig. 2; Tables S36, S37). Further reductive evolution was also prevalent within the Zygnematophyceae, in the ancestor of Zygnematales (500 lost and 683 contracted OGs) and the common ancestor of Ancylonema nordenskiöldii and Zygnema species (419 lost and 274 contracted OGs; Fig. 2a; Tables S6, S7).
Analysis of gene ontology (GO) terms and Pfam domains documented how ancestral anydrophytes and land plants evolved increasingly complex anatomies to establish in terrestrial environments (Fig. 2d,e; Tables S38, S42). Gained OGs in Anydrophyta were associated with responses to gravity, polysaccharide metabolic process (e.g. xyloglucan metabolism) and anatomical structure development (e.g. cell differentiation; Figs 2d, S12). In land plants, gained OGs were associated with hydrotropism, cutin biosynthesis (e.g. meristem development), plant-type secondary cell wall biogenesis and system process (e.g. vascular and phloem transport; Figs 2e, S14).
Gene family expansions resulted in an increased number of stress response genes in the first Anydrophyta and in land plants (Fig. 2a; Tables S39, S43). Expanded OGs in Anydrophyta were linked to water transport, response to abiotic stimulus (e.g. responses to salt, osmotic stress, heat, cold) and plastid organisation (Fig. S11). Land plant expanded OGs were associated with regulation of development, intercellular transport and response to abiotic stimulus (responses to salt, temperature, osmotic stress; Fig. S13).
Our GO analysis suggested that extensive loss and contraction of gene families in the first Zygnematophyceae may have contributed to the reductive evolution of their simple morphology from a more complex ancestor (Fig. 2; Tables S48, S49). The OGs lost from Zygnematophyceae were associated with regulation of molecular function (e.g. regulation of cell communication), lignan biosynthesis and plasma membrane fusion (e.g. extracellular matrix and structure organisation; Figs 2f, S15). Contracted OGs in Zygnematophyceae were linked with regulation of molecular function (e.g. auxin polar transport), responses to endogenous stimuli (e.g. salt stress, hypoxia, biotic stimulus), regeneration (e.g. multicellular organism development) and extracellular matrix organisation (Figs 2g, S16).

Glacier algal adaptation to ice
Several lines of evidence suggest here that glacier algal adaptations to life in ice are lineage-specific and do not derive from ancient exaptations of an ancestral Cryogenian anydrophyte. These include the lack of gained or expanded OGs in Anydrophyta linked to cold tolerance, the emergence of a core element of land plant cold tolerance after the split with Zygnematophyceae, and the absence of gene family expansion of land plant cold tolerance genes in A. nordenskiöldii (Fig. S17; Table S21). Our work thus argues against the previous hypothesis that ice may have provided an intermediate habitat between water and land during processes of plant terrestrialisation (Williamson et al., 2019; Žárský et al., 2022).
By contrast, analysis of our A. nordenskiöldii MAG highlighted low gene gain (276 genes gained), suggesting exaptive evolution of a zygnematophycean alga to glacial environments (Figs 2a, S8; Tables S7, S9, S13). Expanded gene families suggested that glacier algae gained adaptations to key abiotic stressors, principally high-light and UV stress, consistent with their high-light surface ice environment (Williamson et al., 2020), potentially achieved through duplication of genes involved in light sensing and photodamage repair. These expansions were not seen in known light screening pathways (e.g. UV-A, UV-B, PSI, PSII), suggesting that the genes underlying high light stress tolerance in A. nordenskiöldii derive from outside these well-characterised pathways. Importantly, gene duplication followed by neofunctionalisation may have enabled glacier algae to synthesise their novel purpurogallin pigment that underpins their dominance of surface ice environments (Williamson et al., 2018, 2019, 2020). The high production of purpurogallin within glacier algal cells, that is, up to 11 times the cellular content of Chl a (Williamson et al., 2020), may have made other phenolic and phenylpropanoid-based compounds functionally redundant, explaining the loss and contraction of genes important for secondary metabolite biosynthesis highlighted here (Fig. 2c). The lack of evolutionary signal in relation to alternate light screening mechanisms supports previous assertions that purpurogallin provides the bulk of photoprotection within glacier algal cells, while chloroplasts remain typically light-adapted for green algae (Williamson et al., 2020). Indeed, Ancylonema nordenskiöldii has been shown to be very well protected against high light, with oxygen production apparently unaffected up to 2000 μmol photons m−2 s−1 (Remias et al., 2012a,b). While purpurogallin is produced in high quantities in glacier algae, other members of the Zygnematophyceae synthesise sunscreens with similar chemical composition to gallic acid (Busch & Hess, 2022). For example, Zygogonium ericetorum contains an unusual phenolic compound, a glycosylated ferric Fe3+(gallate)2 complex, giving this alga a distinct purple colour (Aigner et al., 2013). Thus, more work is required to understand the evolution and genetics of phenolic biosynthesis in the Zygnematophyceae.

Timescale of glacier algal evolution
Our divergence time estimate of 520-455 Ma for the split of Ancylonema nordenskiöldii from its closest relatives emphasises the sparseness of genome-scale sampling in this important region of the streptophyte tree (Fig. 4). This estimate suggests either that no other close relatives have survived to the present day, or that close relatives have not yet been identified. While no long-term glaciations occurred during this period, this estimate is predated and postdated by a number of glacial episodes (e.g. the mid Ediacaran Gaskiers, Late Ordovician and the late Carboniferous glaciations) (Pohl et al., 2016), which are potential drivers of glacier algal evolution. Additionally, regardless of their exact origination point, this represents a long evolutionary time in which to assemble glacial adaptations. While several studies produce an older estimate for crown Anydrophyta (Su et al., 2021; Yang et al., 2023), they require that key fossils (e.g. Proterocladus antiquus, Bangiomorpha pubescens) are assigned to derived phylogenetic positions that, in our view, are not justifiable based on the phenotypic evidence preserved (Notes S1).

Genome and gene family evolution
The divergent genomic trajectories of land plants and Zygnematophyceae identified here mirror their morphological trajectories (Hess et al., 2022; Bowles et al., 2024) and suggest that genome streamlining underpins the reductive morphological evolution of Zygnematophyceae. Contrasting patterns of genome evolution between closely related groups have been seen elsewhere in the tree of life (Paps & Holland, 2018; Guijarro-Clarke et al., 2020; Harris et al., 2022; Ocaña-Pallarès et al., 2022), highlighting the role of gene gains as well as losses in driving phenotypic evolution (Clark, 2023). Our gene ontology analysis indicated that gene gains in Anydrophyta and land plants were associated with anatomical structural development. A greater repertoire of drought stress response pathways emerged in Anydrophyta and land plants, which potentially enabled them to tackle the temperature and osmotic stress of the Cryogenian through to the Cambrian. These gained genes could derive from de novo formation from noncoding sequences, from shuffling of existing domains, or from horizontal gene transfer, as recently identified in land plants (Bowles et al., 2020; Ma et al., 2022; Xue et al., 2023). In Zygnematophyceae, gene losses were linked to multicellular development. As extant Zygnematophyceae are unicellular or filamentous (Hess et al., 2022), the loss and contraction of gene families involved in extracellular matrix organisation suggest a clear molecular signature of the evolution of their simple morphology.
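The four categories used throughout this section (gain, loss, expansion and contraction) can be distinguished by comparing gene family copy numbers between an ancestral node and a descendant. The following is a minimal sketch of that distinction only; it is not the CAFE or COUNT procedure used in the study, and the function names and toy copy numbers are hypothetical.

```python
from collections import Counter


def classify_orthogroup(ancestral_copies, descendant_copies):
    """Label the change in one orthogroup between an ancestor and a descendant."""
    if ancestral_copies < 0 or descendant_copies < 0:
        raise ValueError("copy numbers must be non-negative")
    if ancestral_copies == 0 and descendant_copies == 0:
        return "absent"
    if ancestral_copies == 0:
        return "gained"        # family appears in the descendant
    if descendant_copies == 0:
        return "lost"          # family disappears entirely
    if descendant_copies > ancestral_copies:
        return "expanded"      # family persists with more copies
    if descendant_copies < ancestral_copies:
        return "contracted"    # family persists with fewer copies
    return "unchanged"


def summarise(copy_numbers):
    """Count categories over a mapping of orthogroup ID to (ancestral, descendant) copies."""
    return Counter(classify_orthogroup(a, d) for a, d in copy_numbers.values())


# Toy input with invented copy numbers, for illustration only.
toy = {"OG1": (0, 2), "OG2": (4, 0), "OG3": (1, 5), "OG4": (3, 1), "OG5": (2, 2)}
print(summarise(toy))
```

In practice the ancestral copy numbers are themselves inferred under a model of gene family evolution, so a sketch like this only captures the final bookkeeping step.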

Conclusions
Production of the first genome-scale dataset for ice-inhabiting glacier algae demonstrated their unique adaptation to life in ice and served to reject recent hypotheses that processes of terrestrialisation dating to the Cryogenian involved the adaptation of ancestral streptophytes to ice as an intermediate habitat between water and land. Instead, we highlight the more recent (Ordovician-Cambrian) divergence of glacier algae from their closest sequenced relatives and their exaptive evolution to life in ice surrounding novelties of high-light tolerance and specialised pigment production. Corresponding to their divergent morphology, we identify the divergent genomic trajectories of Zygnematophyceae and land plants, which functionally link to the loss of multicellularity and gain of abiotic stress tolerance, respectively. Our analysis, therefore, adds to the growing body of work demonstrating the stress tolerance capabilities of the common ancestor of land plants and Zygnematophyceae (de Vries et al., 2016, 2018a, 2020; Zhao et al., 2019; Becker et al., 2020), eventually leading to the establishment of plants on land. Indeed, this bears out the expectation that the freshwater algal relatives of embryophytes were confronted by much the same challenges as their land plant relatives, and the adaptive solutions that they established primed stem-embryophytes for life on land (Donoghue & Paps, 2020).

Table S1 BLAST hits against Ancylonema nordenskiöldii rbcL.
Table S5 Duplicated Benchmarking Single Copy Orthologs genes and their functions.
Table S6 Gene family expansion and contraction analysis with CAFE.
Table S7 Gene gain and loss analysis with COUNT.
Table S12 Gene Ontology Analysis for expanded genes in Ancylonema nordenskiöldii.
Table S13 Gene Ontology Analysis for gained genes in Ancylonema nordenskiöldii.
Table S15 Gene Ontology Analysis for lost genes in Ancylonema nordenskiöldii.
Table S16 Occupancy of purpurogallin biosynthesis genes.
Table S17 Occupancy of light adaptation genes.
Table S18 Occupancy of Photosystem I & II genes.
Table S19 Taxonomic occupancy of potential ice-binding proteins.
Table S20 BLAST hits against ice-binding proteins.
Table S21 Occupancy of cold stress genes.
Table S22 Occupancy of phytohormone biosynthesis and signalling genes.
Table S23 Occupancy of transcription factors.
Table S24 Occupancy of wax metabolic genes.
Table S25 Occupancy of phenylpropanoid biosynthesis genes.
Table S26 Expanded genes in Anydrophyta.
Table S27 Gained genes in Anydrophyta.
Table S29 Lost genes in Anydrophyta.
Table S30 Expanded genes in land plants.
Table S31 Gained genes in land plants.
Table S32 Contracted genes in land plants.
Table S33 Lost genes in land plants.
Table S34 Expanded genes in Zygnematophyceae.
Table S35 Gained genes in Zygnematophyceae.
Table S36 Contracted genes in Zygnematophyceae.
Table S38 Gene Ontology Analysis for expanded genes in Anydrophyta.
Table S39 Gene Ontology Analysis for gained genes in Anydrophyta.
Table S40 Gene Ontology Analysis for contracted genes in Anydrophyta.
Table S41 Gene Ontology Analysis for lost genes in Anydrophyta.
Table S42 Gene Ontology Analysis for expanded genes in land plants.
Table S43 Gene Ontology Analysis for gained genes in land plants.
Table S44 Gene Ontology Analysis for contracted genes in land plants.
Table S45 Gene Ontology Analysis for lost genes in land plants.

Fig. 2 Comparative genomics of anydrophyte evolution. (a) Gene family gain, loss, expansion and contraction across anydrophyte evolution. Key nodes are indicated on the tree with orange circles. (b-g) Gene Ontology analysis of (b) expanded Ancylonema (nordenskiöldii) orthogroups (OGs), (c) contracted Ancylonema (nordenskiöldii) OGs, (d) gained Anydrophyta OGs, (e) gained land plant OGs, (f) lost Zygnematophyceae OGs and (g) contracted Zygnematophyceae OGs. Terms are grouped into clusters of related terms. The size of the rectangles relates to the frequency of the GO term, whilst colours have no meaning.

Fig. 4 Timescale analysis of Ancylonema nordenskiöldii. Analysis of the timescale of the evolution of green plants, based on a concatenated alignment of 472 genes. The timescale in millions of years is shown in the upper panel of the figure. The nodes are positioned on the mean age, and the bars represent the 95% highest posterior density. Intervals of geologic time are highlighted in the lower panel, abbreviated as follows: C., Cambrian (green); C., Carboniferous (turquoise); C., Cretaceous (light green); D., Devonian; J., Jurassic; Me., Mesoproterozoic; P., Permian; T., Triassic. Ancylonema nordenskiöldii is highlighted in yellow. Taxonomic groups are denoted on the side, with Chlorophytes abbreviated to Chloro.

Fig. S11 Reduce and Visualise Gene Ontology plot for expanded gene families in Anydrophyta.
Fig. S12 Reduce and Visualise Gene Ontology plot for gained gene families in Anydrophyta.
Fig. S13 Reduce and Visualise Gene Ontology plot for expanded gene families in land plants.
Fig. S14 Reduce and Visualise Gene Ontology plot for gained gene families in land plants.
Fig. S15 Reduce and Visualise Gene Ontology plot for contracted gene families in Zygnematophyceae.
Fig. S16 Reduce and Visualise Gene Ontology plot for lost gene families in Zygnematophyceae.
Fig. S17 The evolution of land plant cold stress signalling pathways.
Fig. S19 The evolution of transcription factors.
Notes S1 Fossil calibrations for Molecular Clock Analysis.

Table S37 Lost genes in Zygnematophyceae.
Table S46 Gene Ontology Analysis for expanded genes in Zygnematophyceae.
Table S47 Gene Ontology Analysis for gained genes in Zygnematophyceae.
Table S48 Gene Ontology Analysis for contracted genes in Zygnematophyceae.
Table S49 Gene Ontology Analysis for lost genes in Zygnematophyceae.
Table S50 Phylopic links.

Please note: Wiley is not responsible for the content or functionality of any Supporting Information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.