Exploring biogeographic patterns of bacterioplankton communities across global estuaries

Abstract Estuaries provide an ideal niche to study structure and function of bacterioplankton communities owing to the presence of a multitude of environmental stressors. Bacterioplankton community structures from nine global estuaries were compared to understand their broad‐scale biogeographic patterns. Bacterioplankton community structure from four estuaries of Sundarbans, namely Mooriganga, Thakuran, Matla, and Harinbhanga, was elucidated using Illumina sequencing. Bacterioplankton communities from these estuaries were compared against available bacterioplankton sequence data from Columbia, Delaware, Jiulong, Pearl, and Hangzhou estuaries. All nine estuaries were dominated by Proteobacteria. Other abundant phyla included Bacteroidetes, Firmicutes, Acidobacteria, Actinobacteria, Cyanobacteria, Planctomycetes, and Verrucomicrobia. The abundant bacterial phyla showed a ubiquitous presence across the estuaries. At class level, the overwhelming abundance of Gammaproteobacteria in the estuaries of Sundarbans and Columbia estuary clearly stood out amidst high abundance of Alphaproteobacteria observed in the other estuaries. Abundant bacterial families including Rhodobacteriaceae, Shingomonadaceae, Acidobacteriaceae, Vibrionaceae, and Xanthomondaceae also showed ubiquitous presence in the studied estuaries. However, rare taxa including Chloroflexi, Tenericutes, Nitrospirae, and Deinococcus‐Thermus showed clear site‐specific distribution patterns. Such distribution patterns were also reinstated by nMDS ordination plots. Such clustering patterns could hint toward the potential role of environmental parameters and substrate specificity which could result in distinct bacterioplankton communities at specific sites. The ubiquitous presence of abundant bacterioplankton groups along with their strong correlation with surface water temperature and dissolved nutrient concentrations indicates the role of such environmental parameters in shaping bacterioplankton community structure in estuaries. Overall, studies on biogeographic patters of bacterioplankton communities can provide interesting insights into ecosystem functioning and health of global estuaries.


| INTRODUC TI ON
Bacterioplankton, a key component of the marine microbial loop, plays an important role in carbon and nitrogen cycles (Azam et al., 1983). Variations in hydrological parameters such as tidal inflow, salinity, and availability of organic matter may lead to strong environmental gradients for resident microbial communities (Herlemann et al., 2011;Hewson & Fuhrman, 2004;Kirchman, Dittel, Malmstrom, & Cottrell, 2005). Coastal ecosystems, such as estuaries, which are one of most productive ecosystems in the world (Kan, Suzuki, Wang, Evans, & Chen, 2007), represent an ideal niche to study bacterioplankton community structure and its associated functions.
Continuous mixing of freshwater from the riverine and marine water makes estuaries highly dynamic ecosystems. Short residence time of water coupled with slow growth rates deter free-living bacterial communities to efficiently develop into typical estuarine communities (Crump, Armbrust, & Baross, 1999). Previous studies have shown that estuarine bacterial communities harbor a mix of taxa that essentially belong to either freshwater or marine ecosystems (Crump et al., 1999;Herlemann et al., 2011). Functionally, bacteria in estuaries play an important role in the degradation of terrestrial and riverine organic carbon (Lee & Wakeham, 1988). In estuaries with low primary production, allochthonous carbon can be an important factor in supporting the food web by processes such as bacterial secondary production and through trophic transfers as part of the microbial loop (Murrell, Hollibaugh, Silver, & Wong, 1999). In coastal ecosystems such as estuarine mangroves, about 50% of the average litterfall is exported out of the ecosystem and eventually degraded by microbial communities including bacteria (Alongi, 2014). These organically rich mangroves form "specialized ecosystems" where resident bacterioplankton could be a key player in carbon cycling (Bano et al., 1997;Bianchi & Bauer, 2011).
Bacterioplankton community structure from various major estuaries such as the Chesapeake Bay (Bouvier & del Giorgio, 2002), Delaware estuary (Campbell & Kirchman, 2013;Kirchman et al., 2005), Columbia estuary (Fortunato & Crump, 2011), and Pearl estuary (Zhang, Jiao, Cottrell, & Kirchman, 2006) among others has been investigated using both Sanger and next-generation sequencing (NGS) approaches. These studies have shown the presence of diverse bacterial phyla such as Proteobacteria, Actinobacteria, Bacteroidetes, Deferribacteres, and Verrucomicrobia across these sites. Furthermore, these studies have also shown spatial variation in bacterial classes such as Betaproteobacteria and Gammaproteobacteria and bacterial orders such as Burkholderiales, Rhodocyclales, Hydrogenophilales, and Methylophilales. This has broadened our understanding of bacterioplankton community structure and at the same time hinted toward difficulties in understanding habitat dynamics. Bacterioplankton communities from estuaries such as those located within mangrove ecosystems have rarely been looked into using uncultured approaches (Angsupanich, Miyoshi, & Hata, 1989;Ghaderpour et al., 2014;Gonsalves, Nair, LokaBharathi, & Chandramohan, 2009), and most of the studies have been restricted to soil and sedimentary habitats (Alfaro-Espinoza & Ullrich, 2015; dos Santos et al., 2011;Ghosh et al., 2010;Gomes, Cleary, Calado, & Rodrigo, 2011) Due to the organic materials originating from litterfall, estuaries located within mangrove environments could potentially harbor novel bacterial groups. Therefore, undertaking survey of different estuaries to determine which microbial taxa are present where, and how and why they assemble into functional communities is integral to understanding the exact role of marine microbial communities in ecosystem processes.
The advent of NGS methods has allowed for increased sampling depth both in terms of number of sites covered and number of sequences generated per sample (Kircher & Kelso, 2010). Owing to restricted length of the sequenced amplicon, choice of efficient variable regions for taxonomic classification and phylogenetic analyses is still debated (Yang, Wang, & Qian, 2016). This could be circumvented by the introduction of Illumina sequencers such as MiSeq which can generate reads of up to 600 bp, thereby increasing the accuracy of downstream data processing (Degnan & Ochman, 2012).
Work undertaken by Hao and Chen (2012) has shown that fragments of 16S ribosomal ribonucleic acid (16S rRNA) with length >150 bp from different portions of the marker gene could provide taxonomic assignment as accurately as the entire 16S rRNA sequence.
Such sequencing technologies have improved elucidation of bacterioplankton community structures from different ecosystems.
Previous studies such as the ones highlighted below have shown spatial differences in many bacterial groups (Schauer, Massana, & Pedrόs-Aliό, 2000;Ma et al., 2016;references herein). At lower taxonomic levels, bacteria representing the classes such as Alphaand Betaproteobacteria have been found to be more abundant across marine and freshwater ecosystems, respectively (Cottrell & Kirchman, 2000;Glöckner, Fuchs, & Amann, 1999;Methe & Zehr, 1999;Tang et al., 2015). The abundance of members belonging to Alphaproteobacteria tends to be higher in environments with higher salinity (Cottrell & Kirchman, 2003). To date, reasons behind such observed biogeographic patterns of only certain groups of bacteria are not well understood. In light of such information, we hypothesize that bacterioplankton groups could exhibit distinct distribution patterns across estuaries under the influence of similar environmental parameters.
Coastal estuaries such as those located within mangroves ecosystems are particularly important from the viewpoint of organic carbon inputs (Dittmar, Hertkorn, Kattner, & Lara, 2006). Plant matter including litter fall contributes substantially to the pool of dissolved organic matter in such estuaries (Kristensen, Bouillon, Dittmar, & Marchand, 2008;Sardessai, 1993). Extensive studies have provided a comprehensive picture of the bacterioplankton communities from various global estuaries, while our knowledge of bacterioplankton communities of Sundarbans remains limited (Ghosh & Bhadury, 2017. Sundarbans is the largest contiguous stretch of mangrove forest globally that lies in the Ganga-Brahmaputra-Meghna (GBM) delta (Gopal & Chauhan, 2006). This ecosystem stretches over approximately 10,000 km 2 spanning across India and Bangladesh and is characterized by the presence of perennial and rain-fed rivers that open into the Bay of Bengal forming broad estuaries (WCMC, 2005).
Nutrient-rich coastal water flows in from the Bay of Bengal on a daily basis due to diurnal tidal action and mix with freshwater from these large rivers (Rahaman et al., 2013). High seasonal precipitation along with diurnal intrusion of tidal water rapidly changes the salinity profile of estuaries located in the Sundarbans Rahaman et al., 2013).
The goal of this study was to examine biogeographic patterns of bacterioplankton communities from geographically distant estuaries. The objectives of this study were (a) to explore biogeographic patterns of bacterioplankton communities across major estuaries in comparison with the Sundarbans and (b) to understand the role of environmental parameters that lead to observed biogeographic patterns at broad phylogenetic levels. has been considered in this study (Bhattacharjee, Samanta, Danda, & Bhadury, 2013;Choudhury, Das, Philip, & Bhadury, 2015;Samanta & Bhadury, 2014). Huge freshwater from the Mooriganga estuary and diurnal tidal influx of coastal water from the Bay of Bengal gives rise to typical estuarine conditions in this station. Moreover, this station is strongly influenced by local as well as regional precipitation caused by the southeastern monsoons (Gopal & Chauhan, 2006 namely Thakuran estuary, Matla estuary, and Harinbhanga estuary, respectively, was considered in this study. These three estuaries are located in the central sector of the Indian Sundarbans. Due to heavy siltation, these three rivers have lost their upstream connection with the River Ganga and are only tidally fed on a diurnal basis (Banerjee, 2013;Manna, Chaudhuri, Bhattacharyya, & Bhattacharyya, 2010).

| Sampling stations of estuaries in Sundarbans
Moreover, these three estuaries are located within the Sundarbans Biosphere Reserve (SBR) which is considered to be relatively pristine part of Sundarbans (Gopal & Chauhan, 2006). The SBR stations are located within the Sundarbans Biosphere Reserve, which is a heavily protected area and is a "high-risk zone" for undertaking sampling work. Out of many stations that are usually sampled along these estuaries, one station (as mentioned above) each was selected for this study based on the comparable environmental parameters recorded during sampling. Stations along these estuaries vary largely from one another in salinity and water depth.

| Sampling
One liter of surface water was collected from four estuaries (one station each) of Sundarbans following published protocol  and immediately fixed with molecular grade ethanol (Merck, Germany). Water was collected from Stn3 (Mooriganga) of SBOTS in July 2014 and from the other three estuaries in August 2015 representing monsoon period. The collected samples were immediately transferred to the laboratory for downstream molecular analyses.

| Measurement of in situ environmental parameters from Sundarbans estuaries
In situ measurement of environmental parameters such as surface water temperature (digital thermometer, Eurolab ST9269B, Belgium), salinity (Conductivity meter, Erma, Japan), pH (Eco testr,
The filter paper was further incubated at 37°C for 1 hr after addition of 10 µl of 10 mg/ml Lysozyme (ThermoScientific, USA). DNA was recovered from the aqueous phase following a phenol:chloroform step and precipitated overnight using 3 M sodium acetate (Merck, India) and molecular grade absolute ethanol. DNA was pelleted by centrifugation at 16,000 rcf for 12 min and washed using 70% molecular grade ethanol. The pellet was dissolved in 10 mM Tris-HCl and run on a 1% agarose gel.

| Sequence quality control and operational taxonomic unit (OTU) generation
The reads were quality filtered and trimmed by removing adaptor, barcode, and primer sequences. The pair-end reads were merged together by using Fast Length Adjustment of SHort reads (FLASH) (Magoč & Salzberg, 2011). Chimera sequences were identified by the default method of UCHIME in QIIME (Caporaso et al., 2010;Edgar, Haas, Clemente, Quince, & Knight, 2011). For each dataset, sequences were clustered into operational taxonomic units (OTUs) at 97% sequence identity using UCLUST (Edgar, 2010). The OTUs were classified using RDP Classifier 2.2 at a confidence level of 80% (Wang, Garrity, Tiedje, & Cole, 2007).

| Data retrieval from 16S rRNA databases
To compare biogeographic patterns of bacterioplankton communities in Sundarbans estuaries with other global estuaries, 16S rRNA sequences were retrieved from INSDC Sequence Read Archives from the National Centre for Biotechnology Information website (Leinonen, Akhtar, et al., 2011;Leinonen, Sugawara, et al., 2011). Our target was to find bacterioplankton datasets generated from only estuarine surface water. To retrieve bacterioplankton sequences from the Short Sequence Archive (SRA), Boolean search strings were designed and these are summarized in Supporting Information Table   S1. Selected datasets were then downloaded using the SRA Toolkit version 2.5.7. Details of sequence data generation are included in Supporting Information Table S2. Bacterioplankton datasets generated over different stations within an estuary were specifically targeted to minimize the effect of local environmental parameters on the overall bacterioplankton community structure representing a particular estuary. The locations of the selected estuaries have been shown in Supporting Information Figure S1.

| Comparison of bacterioplankton community structure of estuaries from Sundarbans with other estuaries
Downloaded datasets representing other estuaries were processed similarly to Sundarbans data. Details of data generated from other sites including primers used for amplification, region of the 16S rRNA amplified, and accession number details are summarized in Supporting Information Table S2. The downloaded datasets were generated across multiple studied stations representing the study site. The database contained sequence information from each study station. Such individual sequence files were processed as before.
The different data files from each station per estuary were then merged into a single file before further analyses were undertaken.
It was assumed that the final merged datasets would contain sequences from all study stations, thereby representing the estuary as a whole. This could essentially eliminated differences in bacterioplankton community composition arising due to local environmental factors. The datasets were normalized to account for the difference in sequence depth from different estuaries. This was performed by rarefying available sequence data to 33,760 sequences, which were the least count among samples based on random subsampling. The SILVAngs analysis pipeline (Quast et al., 2013) was used for taxonomic classification based on the SILVA Reference alignment. The query sequences were first aligned against SILVA SSU rRNA SEED to ensure that all sequences were that of 16S rRNA using SILVA Incremental Aligner v1.2.10 (SINA; Pruesse, Peplies, & Glöckner, 2012). Unclassified sequences, Archaea and chloroplast sequences were removed from downstream analyses. This step also allowed for the removal of low-quality reads (reads shorted than 50 aligned nucleotides) based on the presence of ambiguous bases (2% of the total length) or homopolymers (2% of the total length). This was followed by a dereplication step where 100% identical reads are marked a replicate. Dereplication was performed using CD-HIT (v 3.1.2; https://www.bioinformatics.org/cd-hit) applying identity criteria of 1.00 in accurate mode (Li & Godzik, 2006). Pairwise distance estimation and single linkage clustering were used to create clusters of sequences with 97% sequence identity to each other using CD-HIT. From among each cluster, the longest reads served as a reference sequence which was then taxonomically affiliated to known bacteria using nucleotide BLAST search version 2.2.30+ (Shiryev, Papadopoulos, Schäffer, & Agarwala, 2007)  The resulting classification of the reference sequence was mapped to all sequences of the respective cluster and its replicates which provided quantitative information on the number of individual reads per taxonomic assignment. Dominant bacterial phyla were defined as those with abundance of >2% in all samples. The rare bacterial phyla were defined as those that accounted for <0.1% of the total reads (Fuhrman, 2009;Pedrόs-Aliό, 2006;Sjöstedt et al., 2012). The abundance of bacterial families across different estuaries was plotted on a heat map in excel in order to observe variation in abundance and pattern across estuaries.

| Validation of comparison between different hypervariable regions of 16S rRNA
The datasets included in this study used different regions of the 16S rRNA to elucidate bacterioplankton community structure. To validate that different variable regions of the 16S rRNA provide similar taxonomic affiliation, the comparison was done across them.
Full-length 16S rRNA sequences from a previous study conducted in Mooriganga estuary were used for validation (Ghosh & Bhadury, 2018). The full-length sequences were trimmed to the V2, V3, V4, V1-V2, and V3-V4 variable region combinations corresponding to the variable regions of the metabarcoding datasets. Taxonomic affiliation of the 541 sequences of the individual variable regions and the full length of the 16S rRNA was performed in RDP classifier . The classification was considered up to the family level at 80% confidence as shown in Supporting Information Table S5.
Sequences used for this validation are deposited at GenBank under the accession numbers KX014028-KX014568.

| Statistical analyses
The Shannon-Weaver index, Simpson's diversity index, and rarefaction curves were generated at a bacterial family level in Visualization and Analysis of Microbial Population Structures (VAMPS) (Huse et al., 2014). The data were normalized using "Normalized to Percent" option, and values were generated using "Alpha diversity" option under Choose Visualization Method option. The taxonomic depth was set at "family level" for calculation of values. The abundance of bacterioplankton families across the estuaries was normalized and square root transformed. A nonmetric multidimensional scaling (nMDS) ordination was undertaking using Bray-Curtis dissimilarity in PRIMER v6.02 (Clarke & Gorley, 2006). Plots were generated separately for abundant and rare taxa to understand their biogeographic distribution patterns across the estuaries. One-way ANOVA was performed to compare the abundance of bacterioplankton groups between studied stations (Ståhle & Wold, 1989). Similarity percentage (SIMPER) was performed to identify phyla that contributed most to the dissimilarity between the bacterioplankton community compositions in studied estuaries of Sundarbans (Clarke, 1993). BIO-ENV was used to explore the effect of measured environmental parameters on changes in the bacterioplankton community structure (Clarke & Ainsworth, 1993). Five environmental parameters, namely surface water temperature, salinity, dissolved oxygen, dissolved nitrate, and dissolved ortho-phosphate available for studied estuaries, were included in the analysis. Environmental variables were normalized, and Euclidean distance between the samples was determined. The OTU abundance data were log (N + 1) transformed, and Bray-Curtis similarity coefficient was determined to quantity similarity between bacterioplankton community structures of studied estuaries. Pearson's correlation was undertaken to analyze the influence of environmental parameters on each bacterioplankton group.

| Overview of bacterioplankton community patterns in Sundarbans
Four estuaries located within the Sundarbans were selected, and bacterioplankton community patterns were assessed. The distribution patterns of dominant bacterioplankton phyla and Proteobacterial classes are summarized in Table 1  ANOVA, p < 0.1), whereas phyla such as OD1 (9 OTUs) and SR1 (8 OTUs) were higher compared to other stations (one-way ANOVA, p < 0.001). The community level variation was largely contributed by the rare taxa identified from these estuaries. Differences in the distribution pattern of the less abundant or "rare" taxa in these four estuaries are shown in Figure 2b. No significant difference in abundance was found for rare taxa such as OD11, Spirochaetes, and Deferribacteres among the four estuaries of Sundarbans.
Interestingly, other rare bacterial phyla such as WS3 (1 OTU) and

| Comparison between different variable regions of 16S rRNA
A total of 541 clones were analyzed to check for congruency in taxonomic affiliation based on the selection of different variable regions as indicated in Supporting Information

| Biogeographic patterns of bacterioplankton taxa across studied global estuaries
In order to identify bacterioplankton biogeographic patterns, we

Rarefaction analysis was undertaken to compare relative
OTUs richness among targeted estuaries as shown in Supporting Information Figure S2. The rarefaction curves showed saturation for all the estuaries indicating optimum NGS sequencing effort (Supporting Information Figure S2). The Shannon-Weaver values were found to be highest in Matla estuary (3.68) and lowest in Pearl estuary (2.57) (Supporting Information Table S4). Simpson's index value of studied estuaries also showed similar trends (Supporting Information Table S4).

| Influence of environmental parameters on bacterioplankton biogeography
The available data representing environmental parameters and dissolved nutrients present in estuaries have been summarized in

| D ISCUSS I ON
Transitional environments such as estuaries witness a gradient in salinity, temperature, and dissolved nutrients owing to continuous mixing of freshwater and saline water (Crump, Adams, Hobbie, & Kling, 2007;Fortunato, Herfort, Zuber, Baptista, & Crump, 2012). In coastal ecosystems such as mangroves, plant matter serves as an additional source of organic matter in nearby aquatic systems (Dittmar et al., 2006). Plant matter such as mangrove litterfall contributes to 30%-60% of total primary production (Bunt, Boto, & Boto, 1979). Litterfall into the water results in rapid release of dissolved organic matter (DOM) and tannins (Fell, Cefalu, Masters, & Statzell-Tallman, 1975;Newell, Fell, Statzell-Tallman, Miller, & Cefalu, 1984;Robertson, 1988;Steinke, Holland, & Singh, 1993). Decomposition of remaining particulate organic matter is a slow process that is largely driven by microbial communities (Robertson, Alongi, & Boto, 1992). Previous studies have shown that greater structural and functional versatility in diverse communities may facilitate exploitation of environmental resources to a further extent (Jolliffe, 1997). It has been suggested that under varying environmental conditions, greater species diversity could help to maintain the productivity of ecosystems by stabilization of environmental conditions (Yachi and Loreau, 1999).
Numerous studies have been undertaken to elucidate bacterial community structures from various estuaries globally. These extensive datasets of millions of reads using next-generation sequencing technologies are available in public databases such as GenBank/ DDBJ/ENA (Benson, Karsch-Mizrachi, Lipman, Ostell, & Wheeler, 2005;Leinonen, Akhtar, et al., 2011;Leinonen, Sugawara, et al., 2011;Mashima et al., 2017). The 16S rRNA is commonly used as a chronometer for this purpose (Brown & Fuhrman, 2005;Clarridge, 2004;Martínez-Murcia, Antόn, & Rodríguez-Valera, 1999). Owing to short read lengths generated in NGS such as those generated by Illumina platforms, studies focus on sequencing only one or two variable regions of the 16S rRNA. The accuracy of different hypervariable regions of the 16S rRNA in taxonomic assignment has been long debated as there is no clear consensus on the region of 16S rRNA to be used for sequencing. Huse et al. (2008) investigated the accuracy of V3 and V6 regions in comparison the with full-length 16S rRNA in determining taxonomy of generated sequences. They found the V3 region to be 99% accurate for assignment to genus level, whereas the V6 is 97% accurate. These regions have been found to be consistent for the elucidation of rare taxa (Huse et al., 2008). Bacterioplankton communities considered in this study were elucidated using a com- rRNA (Bartram, Lynch, Stearns, Moreno-Hagelsieb, & Neufeld, 2011;Kircher & Kelso, 2010;Vilo & Dong, 2012). Numerous studies have discussed the possible effect of use of different variable regions in addition to primer selection to be a critical factor in elucidating bacterioplankton community structure (Hamp, Jones, & Fodor, 2009;Liu, Lee, Vanlare, Kasper, & Mazmanian, 2008;Mizrahi-Man, Davenport, & Gilad, 2013;Soergel, Neelendu, Knight, & Brenner, 2012;Sundquist et al., 2007;Wu, Hartman, Ward, & Eisen, 2008).
However, the efficiency of different primer sets or use of different variable regions to accurately resolve the bacterioplankton communities of estuaries was beyond the scope of this study. The efficiency of different variable regions of the 16S rRNA in providing similar taxonomic resolution at least to the family level could be validated using a simple step. This has also been observed in a detailed study by Wear, Wilbanks, Nelson, and Carlson (2018) where the authors clearly show that the choice of primers does not greatly affect bacterioplankton community composition. Additionally, the authors also validated using large datasets that qualitative information of bacterioplankton communities generated by use of different primers is highly comparable. The abundance of Actinobacteria sequences was found to be significantly higher in Mooriganga estuary. This could be due to the availability of complex organic substances in estuarine water surrounding Mooriganga some of which could be anthropogenic in origin and thus promoting their cell abundance Vilo & Dong, 2012). It is already well known that this group can degrade a range of polymeric substances such as cellulose, lignin, chitin, starch, and laminarin (Linos et al., 2002;Pranamuda, Chollakup, & Tokiwa, 1999) which could also originate from mangrove plants.
Interestingly, high abundance of Actinobacteria has been reported from other mangrove environments such as those located in Brazil (Silveira et al., 2011). We also encountered higher abundance of Verrucomicrobia sequences in Matla estuary. However, this group was not encountered in Mooriganga estuary which highlights the spatial differences in bacterioplankton biogeography across Sundarbans within a localized geographic scale. Verrucomicrobia is known for dependence on organic matter and can be important in carbon cycling (Canfora et al., 2014). Incidentally, organic matter load in Matla estuary could be high due to litterfall as this part of the Sundarbans has relatively higher density and diversity of mangrove plants (Mukherjee & Ray, 2012). More recently, it has been reported that Verrucomicrobia play important role in the assimilation of dissolved inorganic carbon in coastal water (DeLorenzo et al., 2012).
Although Verrucomicrobia sequences did not constitute more than 3% in other global estuaries, nevertheless their presence in almost all the estuaries indicates functional significance. For the first time, we show that Verrucomicrobia sequences are ubiquitous in global estuarine water given that they are known to be widely present in soil environment and also have reported from other coastal ecosystems but not from estuaries (Freitas et al., 2012).
Other bacterial phyla, which can be considered "rare" in terms of the abundance of sequence reads in the studied estuaries, include Chloroflexi, Spirochaetes, Nitrospirae, Tenericutes, and Deinococcus-Thermus. Out of these rare phyla, Deinococcus-Thermus sequences were not encountered in Delaware and Hangzhou estuaries. These minor components of bacterioplankton communities could be functionally more relevant or may harbor an important repertoire of stress-related genes (Freitas et al., 2012). It is well established that some members of Nitrospirae, Chloroflexi, and Proteobacteria are nitrite-oxidizing bacteria (NOB) which catalyzes nitrification and thus play important roles in nitrogen cycling (Sorokin et al., 2012;Wang et al., 2015). Interestingly, all the four estuaries of Sundarbans and Columbia estuary showed a high abundance of Nitrospirae and Chloroflexi sequences compared to other studied estuaries although dissolved nitrate concentration was not higher in the former. However, the possibility of a "lag phase" be- is frequently reported from the hypersaline environment (Ma & Gong, 2013), but we did not find any correlation with available salinity datasets as part of our study. In fact, the highest abundance of Chloroflexi was recorded from Columbia estuary which also had the lowest salinity among all estuaries targeted as part of this study.
Such site specificity is further reinstated by the nMDS ordination plot of rare taxa that does not indicate distinct clusters.
Low abundance of Cyanobacteria was encountered in studied estuaries, except for Columbia estuary where it was relatively higher (relative abundance-3%). The observed low abundance of Cyanobacteria in studied estuaries of Sundarbans may be due to high suspended particulate matter in the water column which can affect light penetration Litchman, 2003 (Kirchman et al., 2005;Pommier et al., 2007;Zhang et al., 2007;Zhang, Lau, Ki, Thiyagarajan, & Qian, 2009). Additionally, it is known that the marine end of estuaries (higher salinity zones) are dominated by Alphaproteobacteria that otherwise normally exist in low abundance in low salinity water (Bouvier & del Giorgio, 2002;Cottrell & Kirchman, 2003 (Kersters et al., 2006). Such groups indicate the presence of DOM and also hint toward the pristine nature of the Sundarbans mangrove ecosystem. Similarity in abundance of such bacterial families could be resulting in the distinct clustering of the Sundarbans estuaries as observed in the nMDS ordination plot.
A common understanding of the "distance-decay" relationship is that bacterial community similarity decreases with increasing geographic distance. The distance-decay relationship directly challenges the more popular concept of "everything is everywhere" for bacteria. Our study shows that geographically distant estuaries share a high degree of similarity which seems to override temporal variations commonly observed in bacterioplankton communities. We observed clear biogeographic patterns of bacterioplankton communities which can be explained by prevailing environmental conditions in studied estuaries. The influence of environmental parameters on the bacterioplankton communities of the studied sites was in line with previously published literature (Fortunato et al., 2012;Lozupone & Knight, 2007). We found surface water temperature and dissolved nutrients such as dissolved nitrate and ortho-phosphate to strongly influence observed patterns of bacterioplankton communities (BIOENV ρ w = 0.907). Consistent distribution patterns of dominant members of the bacterioplankton communities could be also due to the similarity in surface water temperature prevailing across studied estuaries. At the same time, no single environmental parameter could account for the observed biogeographic pattern of rare bacterioplankton taxa across studied estuaries. At the bacterioplankton biogeography level, we find many bacterial classes showing ubiquitous distribution across estuaries globally. It is important to mention that although ubiquitous trends were observed, the studied estuaries were not sampled across same time point. While the estuaries of Sundarbans were sampled during the Indian monsoon, the NGS data for Pearl, Jiulong, and Hangzhou estuaries were also generated previously during the monsoon season of Southern China.
However, NGS datasets of Columbia and Delaware estuaries were obtained from samples collected during late winter and early summer of Northern Hemisphere. Therefore, it may be possible that some of the seasonal trends of bacterioplankton communities may not be reflected in the studied datasets. However, despite this seasonal variability, our study shows remarkable similarity in terms of structure and function of bacterioplankton communities.
Biogeography studies allow the retrieval of bacterioplankton groups widespread in all study sites along with site-specific groups.
It may be difficult to determine whether or not this estuary-specific bacterioplankton are related to particular stress prevalent in each estuary. The observed biogeographic patterns of bacterioplankton across global scale can help understand their role in ecosystem-level processes such as utilization of DOM pool, autotrophic, and heterotrophic production as well as carbon and nitrogen cycling (Cavigelli & Robertson, 2000;Horz, Barbrook, Field, & Bohannan, 2004;Naeem & Li, 1997;van der Heijden et al., 1998). Furthermore, such biogeographic patterns yield information on bacterioplankton co-occurrence which could ultimately help toward developing culture-based approaches. Information about substrate requirement and essential growth conditions can be understood as found in this study and applied for the establishment of bacterioplankton cultures, especially those within mangrove ecosystem. Given there is increasing anthropogenic pressure, nevertheless the ecosystem level health of studied estuaries was good as bacterioplankton indicative of eutrophication or of urban origin was found to be infrequent in terms of abundance based on the analyzed dataset. To conclude, this study which incorporated robust bioinformatics analysis of generated as well as available NGS datasets provided an interesting glimpse of bacterioplankton communities across local, regional, and global scales.

ACK N OWLED G M ENTS
Anwesha Ghosh is the recipient of IISER Kolkata Integrated Ph.D.
Fellowship. We thank Andrew Voorhis for his constant help with processing the data in VAMPS.

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

AUTH O R S CO NTR I B UTI O N
PB and AG conceived the idea; AG undertook experiments and data analysis; AG and PB wrote the manuscript. All authors read and approved the final manuscript.

E TH I C S S TATEM ENT
No human and/or experimental animal subjects were involved in this study.

DATA ACCE SS I B I LIT Y
The datasets supporting the conclusions of this article are available in the National Centre for Biotechnology Information (NCBI) Short Read Archive data under accession number SRP092508, https:// www.ncbi.nlm.nih.gov/sra. The datasets of the other estuaries used in this study are publicly available for download, and the corresponding accession numbers of the datasets are provided in the manuscript. All environmental data generated and analyzed in this