Epipelagic microbiome of the Small Aral Sea: Metagenomic structure and ecological diversity

Abstract Studies of microbial diversity in aquatic communities that have experienced, or are still experiencing, environmental problems are essential for understanding remediation dynamics. In this pilot study, we present data on the phylogenetic and ecological structure of microorganisms from epipelagic water samples collected in the Small Aral Sea (SAS). The raw data were generated by massively parallel sequencing using the shotgun approach. As expected, most of the identified DNA sequences belonged to Terrabacteria and Actinobacteria (40% and 37% of the total reads, respectively). The occurrence of Deinococcus‐Thermus, Armatimonadetes, and Chloroflexi in the epipelagic SAS waters was less anticipated. Also surprising was the detection of sequences characteristic of strict anaerobes: Ignavibacteria, hydrogen‐oxidizing bacteria, and archaeal methanogenic species. We suppose that the very broad range of phylogenetic and ecological features displayed by the SAS reads reflects more intensive mixing of water masses originating from diverse ecological niches of the Aral‐Syr Darya River basin than previously presumed.


| INTRODUCTION
The Aral Sea, an inland sea in Central Asia, has a multi-level cyclical history dating back about thirty million years, as part of the Paratethys Ocean (Boomer et al., 2000). It was once the world's fourth-largest lake, with an area of 68,000 km². Since the 1960s, the Aral Sea has been shrinking remarkably and has recently been almost depleted by extensive use for irrigation. Nowadays, it represents a highly labile hydrological system consisting of the Small Aral Sea (northernmost part) and the Large Aral Sea (southern part, which is divided into the Eastern Large Aral and the Western Large Aral), with varying degrees of salinity and a maximum depth of about 40 m in the northern part (Izhitskiy et al., 2016; Létolle et al., 2005). Large spatial and temporal gradients of salinity, ranging from 9 g/L to 92 g/L (Izhitskiy et al., 2016), create unique and challenging living conditions for all inhabitants of the Aral Sea, including microorganisms.
In the last several years, there have been many studies documenting the level of salinity, temperature fluctuations, and other physicochemical properties of various parts of the Aral Sea basin (Gaybullaev et al., 2012; Izhitskiy et al., 2014; Rafikov & Gulnora, 2014). Izhitskiy et al. reported that the Aral Sea water bodies, depending on their location and the season, can exhibit very different vertical structures, ranging from a fully mixed to a strongly stratified one. The authors also underlined the dramatic differences in the physical and biological regimes among the different residual basins.
Microbial diversity and structure are key factors in ecological resilience. Analyzing the Aral Sea microbial communities and understanding their features and distribution in the context of environmental conditions is of paramount importance for environmental monitoring and long-term remediation strategies in the region (Izhitskiy et al., 2016; Namsaraev, 2018; Shurigin et al., 2019; Stulina et al., 2019). However, a significant part of the lake's microbial community remains unexplored, mainly due to the limited possibilities for the cultivation of microorganisms isolated from hypersaline water.
This study aimed to describe and analyze the microbial community from one location of the Small Aral Sea (SAS) using metagenomic approaches. It should complement and extend the data collected in the Large Aral Sea and reported by Shurigin et al. (2019). Starting from this pilot run, a group of different SAS locations can be sampled and analyzed in the future. Finally, this study should also contribute to our knowledge of the spectrum of microbial metabolic activities in the SAS marine ecosystem. The marine environment remains an immense and mostly untouched source of exclusive microbial metabolites that might be used in novel biotechnological and medical applications (Beygmoradi et al., 2018; Tian & Hua, 2010; Viesser et al., 2020).

| Water samples collection
Three water samples (20-22°C, salinity 8.1 g/L, approx. 10 L each) were taken from a depth of 1 m in the Small Aral Sea (SAS) in sterile containers at 1-week intervals during May 2019. The near-coastal sampling site (46°37′22.0″N 61°28′25.0″E) was chosen as representative, to better capture the epipelagic microbial diversity (Figure 1). This location is considered ecologically restored and is currently officially used for commercial fishing.

| Water samples processing
Immediately after collection, the seawater samples were filtered through a cellulose filter with a diameter of 300 mm and a pore size of 3 μm to remove zoo- and phytoplankton. A similar sample treatment approach has recently been used in the studies by Reddington and by Brumfield, who filtered the collected water samples through 1.2 μm and 0.6 μm pore size filters, respectively (Brumfield et al., 2020; Reddington et al., 2020). However, we realize that the removal of suspended solids inevitably depletes the samples of certain microorganisms. The filtrate was then concentrated to a final volume of 500 ml using the tangential flow filtration device Vivaflow 200 (Sartorius Co., Germany) with a 200 cm² polyethersulfone membrane. The samples were pooled and centrifuged at 100,000 × g for 2 h at 4°C using an Avanti J30I ultracentrifuge (Beckman Coulter, USA). The pellet was resuspended in a minimal volume of phosphate-buffered saline and used for DNA extraction.

| Isolation and quantification of nucleic acids
Total DNA was isolated from the sample using the PureLink genomic DNA extraction kit (ThermoFisher Scientific, USA) and stored at −80°C. Quantitative DNA measurements were performed using the Qubit dsDNA HS (high sensitivity) kit and Qubit 3.

| The preparation and purification of genomic libraries
DNA libraries were prepared from 1.0 ng of the obtained dsDNA using the Nextera XT DNA Sample Preparation Kit (Illumina, USA) following the manufacturer's instructions. In brief, the preparation of the libraries included enzymatic fragmentation of DNA, ligation of sequence adapters, preliminary amplification of the library, selection of fractions of the desired length, and clonal amplification of the selected library.
The step "selection of fractions of the desired length" was carried out using the Agencourt AMPure XP paramagnetic bead system (Beckman Coulter Corp., USA), capable of binding 100 bp and longer DNA fragments. Excess primers, nucleotides, salts, and enzymes were removed by washing with freshly prepared 80% (v/v) C₂H₅OH.

| Genomic library quality analysis and bioinformatics
The obtained libraries were analyzed using the 2100 Bioanalyzer system with the DNA 1000 Kit (both Agilent Technologies Inc., USA). For size-based separation, the nucleic acid fragments were electrophoretically driven through an interconnected set of specially designed gel-filled microchannels. The libraries were sequenced on the Illumina MiSeq platform (San Diego, California, USA) using the MiSeq Kit v3, which allows 300-bp paired-end reads.
The quality of the resulting sequences was tested using the FastQC tool (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Low-quality reads were excluded, and adapters were trimmed using the Trimmomatic tool (Babraham Bioinformatics, 2019; Bolger et al., 2014). The LCA (lowest common ancestor) algorithm was used for binning short reads onto the nodes of a given taxonomy (such as the NCBI taxonomy), based on alignments.
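The idea behind LCA binning can be illustrated with a minimal sketch. It assumes a toy taxonomy stored as child-to-parent links and hypothetical read names and alignment hits; it is not the implementation used by any particular tool, and the node names stand in for NCBI taxonomy nodes. A read whose alignments all point to one taxon is placed on that taxon, whereas a read with hits in several taxa is placed on the deepest node shared by all of them.

```python
# Toy taxonomy as child -> parent links (illustrative names, not NCBI taxon IDs).
PARENT = {
    "Bacteria": "root",
    "Archaea": "root",
    "Terrabacteria": "Bacteria",
    "Proteobacteria": "Bacteria",
    "Actinobacteria": "Terrabacteria",
    "Cyanobacteria": "Terrabacteria",
    "Gammaproteobacteria": "Proteobacteria",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = [taxon]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all given taxa."""
    taxa = list(taxa)
    if not taxa:
        return None  # the read had no alignments
    shared = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon))
    # Walking up from the first hit, the first shared node is the deepest one.
    for node in lineage(taxa[0]):
        if node in shared:
            return node


# Hypothetical alignment hits per read (assumed for illustration only).
hits = {
    "read_1": ["Actinobacteria"],
    "read_2": ["Actinobacteria", "Cyanobacteria"],
    "read_3": ["Actinobacteria", "Gammaproteobacteria"],
}
assignments = {read: lowest_common_ancestor(taxa) for read, taxa in hits.items()}
for read, node in assignments.items():
    print(read, "->", node)
```

In this toy case, the ambiguous reads are pushed up the tree: the more taxonomically distant the hits, the higher (less specific) the node to which the read is assigned.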
The presumptive functions and metabolic clustering of the epipelagic microbiota of SAS were analyzed using the KEGG (Kyoto Encyclopedia of Genes and Genomes) orthology database and the database of Clusters of Orthologous Groups of proteins (COGs).

| Species taxonomy and diversity analysis
Further bioinformatic processing of the obtained metagenomic data was performed using the Geneious Prime 2019 software (https://www.geneious.com/prime-features/) and the Kaiju program (http://kaiju.binf.ku.dk/), intended for precise taxonomic classification of reads from high-throughput metagenomic and metatranscriptomic sequencing. Each read was assigned to a node in the NCBI taxonomy and was labeled by a taxon and the number of matching reads. Due to its protein-level classification, the Kaiju algorithm usually achieves higher sensitivity compared to nucleotide-based methods (Breitwieser et al., 2019; Kearse et al., 2012; Menzel et al., 2016).
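As a rough illustration of how per-taxon read counts can be derived from such a classification, the sketch below tallies reads per taxon ID. It assumes a Kaiju-style tab-separated output in which the first three columns are a classification flag ("C" for classified, "U" for unclassified), the read name, and the NCBI taxon ID; the function name, the example records, and the taxon IDs are hypothetical and are not data from this study.

```python
from collections import Counter
import io


def count_reads_per_taxon(lines):
    """Count classified reads per taxon ID from Kaiju-style output lines."""
    counts = Counter()
    unclassified = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, _read_name, taxon_id = fields[:3]
        if status == "C":
            counts[taxon_id] += 1
        else:
            unclassified += 1
    return counts, unclassified


# Made-up records; the taxon IDs are placeholders, not results of this study.
example = io.StringIO(
    "C\tread_1\t1111\n"
    "C\tread_2\t1111\n"
    "C\tread_3\t2222\n"
    "U\tread_4\t0\n"
)
counts, unclassified = count_reads_per_taxon(example)
for taxon_id, n in counts.most_common():
    print(taxon_id, n)
print("unclassified:", unclassified)
```

The same function accepts an open file handle in place of the in-memory example, since it only iterates over lines.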
The taxonomy used in this study is mainly based on the List of

| Phylogenetic and ecological features of the major bacterial groups of the SAS
The reads assigned to Terrabacteria and Proteobacteria together comprised 84.49% of the identified major cluster sequences.
Terrabacteria is presumably the most ancient bacterial group of the SAS and includes both Gram-positive and Gram-negative strains.
Most of the identified sequences within the group belonged to the taxa Actinobacteria, Cyanobacteria, Thermi (Deinococcus-Thermus), Chloroflexi, Tenericutes, and Firmicutes (Figure 2b). These phyla share similarities in cellular membrane composition and have many common features in the oxidation/electron transport pathways.
Actinobacteria, which dominated the SAS Terrabacteria, are well known for their ability to decompose "problematic" nutritional substrates, such as chitin, chitosan, and cellulose, and thus play an important role in the trophic chains of the SAS (Souza et al., 2011).

The second-largest bacterial phylum of the Small Aral Sea, Proteobacteria, was represented in the samples by six taxa, among them the large family Enterobacteriaceae (belonging to Gammaproteobacteria), which includes many marine rhizospheric microorganisms (Figure 2c). This ecological group of microorganisms, dominated by Proteobacteria, inhabits marine sediments and forms distinct communities, usually consisting of Enterobacteria, Acidobacteria, Actinobacteria, Nitrospirae, Deltaproteobacteria, and Chloroflexi, similar to those found in terrestrial environments (Sogin et al., 2019).
According to the same authors, members of these taxonomic groups contribute to the core microbiome living in marine rhizospheres and are predictive of the presence of seagrasses.
Many rhizospheric microorganisms were previously mentioned as typical for the marine plastisphere (Zettler et al., 2013). In this study, we applied the contig-LCA algorithm on the MG-RAST server, which finds a single consensus taxonomic entity for all features on each sequence. Indeed, signatures of rhizospheric microorganisms appeared abundantly at the genus level (Figure 3).
Among the most ecologically interesting Proteobacteria found in the epipelagic Small Aral Sea waters were aerobic chemotrophs Acidithiobacillia and Zetaproteobacteria that use iron and sulfur compounds as their only energy source. Due to their narrow metabolic specialization, these organisms may play a pivotal role in the circulation of Fe/S compounds in the SAS ecosystem.
The FCB group (Bacteroidetes) unites heterotrophic Gram-negative rod-shaped bacteria capable of gliding locomotion. These bacteria possess multi-enzyme systems that help them utilize virtually any organic substrate as a carbon and energy source. The members of Bacteroidetes display a wide range of other physiological adaptations that allow them to succeed in very diverse aquatic ecosystems (Gupta, 2004). Also noteworthy was the finding, in the SAS samples, of sequences specific to the novel class Ignavibacteria, belonging to Bacteroidetes. These anaerobic, moderately thermophilic bacteria are typically isolated from microbial mats at terrestrial hot springs (Iino et al., 2010). Therefore, their identification in the epipelagic microbiome of the SAS was surprising.
However, Chlamydiae may be of great practical interest due to their importance as human and animal pathogens (Sachse et al., 2009).
Verrucomicrobia have a low population density but are widely distributed in various freshwater and marine habitats. Some Verrucomicrobia possess genes encoding nitrogen fixation and sulfate utilization pathways (Wertz et al., 2012). Their heterotrophic, mainly carbohydrate-decomposing metabolism and their predominantly epibiotic and symbiotic lifestyles imply that these bacteria play a significant ecophysiological and biogeochemical role in the SAS microbiome (Cardman et al., 2014).
Most Patescibacteria representatives, currently grouped into 14 classes, were first discovered by metagenomic analysis of samples from hardly accessible isolated habitats, such as permafrost and deep water trenches (Brown et al., 2015; León-Zayas et al., 2017; Parks et al., 2018). A particularly high Patescibacteria content was later reported for some groundwater microbiomes, reaching 38% of the total reads (Bruno et al., 2017; Schwab et al., 2017). Despite their relatively small number in the epipelagic SAS water (less than 2%) compared to the other "major" groups, they are considered important contributors to the ecological balance of aquatic ecosystems. Ca

FIGURE 3
The distribution of the most abundant microbial signatures from SAS at the genus level (or the nearest identifiable phylogenetic level), found using the contig LCA algorithm. The number of hits is displayed after the genus name

| Phylogenetic and ecological characteristics of the minor bacterial and archaeal epipelagic SAS groups
In total, 177,568 bacterial reads identified in this study were assigned to the "minor cluster" phyla. These 13 phyla include bacteria with very diverse biochemical characteristics, playing versatile roles in the ecological structure of the Small Aral Sea (Figure 4a).

The remarkable functional specialization and species diversity among representatives of the minor cluster (7.93% of total reads) of the Small Aral Sea sequences are evidence of the rich ecological history of this peculiar drainless salty lake. The distinct phylum Spirochaetes is known primarily for the highly peculiar double-membrane, helically coiled morphology of most of its representatives. These bacteria are also very diverse in their pathogenic capacity and in the ecological niches that they inhabit.
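The two figures reported for the minor cluster (177,568 reads and 7.93% of total reads) can be related by simple arithmetic. The sketch below assumes that both numbers refer to the same pool of reads, which the text does not state explicitly; the variable and function names are illustrative, and the implied total is an estimate rather than a value reported in this study.

```python
minor_cluster_reads = 177_568      # read count reported for the minor cluster
minor_cluster_share = 7.93 / 100   # reported share of total reads

# Assumption: the count and the percentage refer to the same pool of reads.
implied_total_reads = minor_cluster_reads / minor_cluster_share


def share_of_total(read_count, total=implied_total_reads):
    """Percentage of the implied total represented by a given read count."""
    return 100 * read_count / total


print(f"implied total reads: {implied_total_reads:,.0f}")
print(f"minor cluster share: {share_of_total(minor_cluster_reads):.2f}%")
```

Under this assumption the implied total is roughly 2.24 million reads; the same helper could convert any other per-group read count into a percentage on a consistent basis.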
The second-largest group Acidobacteria is both phenotypically and physiologically very heterogeneous. The members of this phylum are mostly uncultivated and typically very abundant in soil habitats representing up to 52% of the total bacterial community (Dunbar et al., 2002).
Surprisingly, the minor cluster samples contained numerous signatures of chemolithotrophic microorganisms. Among the most striking findings were sequences typical of hydrogen-oxidizing bacteria belonging to Desulfurobacteriales (Aquificae) (Eder & Huber, 2002; Reysenbach & Cady, 2001). Another unanticipated group was Aquificales, whose representatives prefer microaerophilic and thermophilic (>65°C) conditions (Huber et al., 1992).
The second-largest group of archaea in the Small Aral Sea was the TACK group (16%), combining chemolithoautotrophs and chemoorganotrophs capable of using elemental sulfur in their metabolism.
The number of other archaeal groups altogether did not exceed 10%.

| Functional view on the SAS epipelagic microbiome
The functional profile of the microbial community was as- The SAS samples were enriched with functional signatures related to transposable genetic elements (relative abundance 19%), carbohydrate metabolism, amino acid transport, and protein metabolism. In addition, numerous sequences encoding co-factors (nucleoside-diphosphate-sugar epimerase) were identified.

| CONCLUSIONS
Proper diversity and structure of aquatic microbial communities are of great importance for the sustainability and efficacy of global biogeological transformations. The main objective of our pilot metagenomic study was to shed more light on the still poorly explored epipelagic microbiome of the Small Aral Sea.
The focus was made not only on the prevalence of certain systematic groups but also on their ecological properties and functions.
An ecologically remediated near-coastal location (its well-being is indicated by the revival of industrial fishing) was chosen as the sampling site. The collected epipelagic samples were tested by massively parallel sequencing without preliminary 16S rRNA amplification, as described elsewhere (Bubnoff, 2008). In our opinion, this high-throughput method can efficiently provide detailed data on the diversity of the Small Aral Sea microorganisms.

FIGURE 5
(a) Main predicted protein functions in the SAS microbiota derived from the ortholog analysis (COGs; KEGG

As expected, the largest share (44% of the total reads) of the identified DNA sequences of the Small Aral Sea belonged to Terrabacteria.
This unranked supergroup contains approximately two-thirds of known prokaryotic species, typically found in aquatic ecosystems.
The second-largest SAS group, Actinobacteria, distinguishes itself by being virtually uncultivated and phenotypically very diverse.
The occurrence of some other groups (Deinococcus-Thermus, Armatimonadetes, Chloroflexi) was less anticipated in the epipelagic horizon. Peculiarly, many of the detected sequences belonged to strict anaerobes: Ignavibacteria, the hydrogen-oxidizing bacteria Desulfurobacteriales, and archaeal methanogenic species.
We found that due to the presence of a fairly large number  (Micklin, 2010).
Future research may focus on analyzing the seasonal and yearly dynamics of the bacterial community. As a labile product of a variety of environmental factors, such as salinity, pH, temperature, osmotic pressure, and solar irradiation, the behavior and the evolution of the SAS ecosystem still need to be better understood. Another future task will be to evaluate the exact involvement of different microbial groups in the regional nu-

CONFLICT OF INTEREST
None declared.

ETHICS STATEMENT
None required.

DATA AVAILABILITY STATEMENT
"Figure S1: SAS taxonomic structure," as well as a raw Kaiju data file "SAS kaiju_taxonpaths," are available online at https://doi.org/10.5281/zenodo.4057925.