Hidden heterogeneity and co‐occurrence networks of soil prokaryotic communities revealed at the scale of individual soil aggregates

Sequencing PCR‐amplified gene fragments from metagenomic DNA is a widely applied method for studying the diversity and dynamics of soil microbial communities. Typically, DNA is extracted from 0.25 to 1 g of soil. These amounts, however, neglect the heterogeneity of soil present at the scale of soil aggregates and thus ignore a crucial scale for understanding the structure and functionality of soil microbial communities. Here, we show with a nitrogen‐depleted agricultural soil the impact of reducing the amount of soil used for DNA extraction from 250 mg to approx. 1 mg to access spatial information on the prokaryotic community structure, as indicated by 16S rRNA gene amplicon analyses. Furthermore, we demonstrate that individual aggregates from the same soil differ in their prokaryotic community compositions. The analysis of 16S rRNA gene amplicon sequences from individual soil aggregates allowed us, in contrast to 250 mg soil samples, to construct a co‐occurrence network that provides insight into the structure of microbial associations in the studied soil. Two dense clusters were apparent in the network, one dominated by Thaumarchaeota, known to be capable of ammonium oxidation at low N concentrations, and the other by Acidobacteria subgroup 6, representing an oligotrophic lifestyle to obtain energy from SOC. Overall this study demonstrates that DNA obtained from individual soil aggregates provides new insights into how microbial communities are assembled.


| INTRODUC TI ON
One of the major challenges in microbial ecology is to gain a predictive understanding of microbial diversity through elucidating the principles, patterns, and interactions that lead to the assembly of highly diverse microbial communities as found for example in soil (Green et al., 2008). To achieve this, it is essential to consider temporal and spatial variation in microhabitat conditions. In soil microbial ecology, the latter, however, is commonly ignored (Lombard et al., 2011;Vos et al., 2013). Instead, large composite samples are favored to obtain an overview of microbial diversity at the scale of a plot or a field. This neglects the fine-scale heterogeneity of soil structure, and thus loses much information on patterns of community assembly (Thakur et al., 2020).
Soil structure develops as primary particles of different sizes and mineral composition, that is, clay, silt, and sand interact with each other, and with organic material to build microaggregates and macroaggregates, that are below or above 250 µm in diameter respectively (Six et al., 2000). Most bacterial cells occur inside aggregates rather than on their surfaces (Ranjard, Poly, et al., 2000), and biogeochemical cycles, which are key ecosystem services driven by an interacting microbial community (Smith et al., 2015), are considered to mainly occur within aggregates (Wilpiszeski et al., 2019). Soil aggregates have been regarded as "massively concurrent evolutionary incubators" (Rillig et al., 2017) or as "microbial villages" (Wilpiszeski et al., 2019) that represent small communities separated by distance and physical barriers and connected only periodically, for example, during wetting events.
DNA-based methods to assess microbial diversity typically start with extracting 250 mg to 1 g of soil material (Young et al., 2014). This strategy has been useful to investigate the overall microbial diversity of soils (Fierer & Jackson, 2006;Roesch et al., 2007), its variation across geographical regions (Griffiths et al., 2011;Karimi et al., 2018), or its response to land-use change at a continental scale (Szoboszlay et al., 2017). However, to understand the processes and interactions occurring within soil microbial communities, it would be rewarding to increase the spatial resolution of the community analysis to individual aggregates and investigate microbial diversity in these spatial entities. Approaching the "calling distance" of microbial interactions (Nunan, 2017) would increase the likelihood of detecting interacting microbial partners. Correlation networks have increasingly been applied to reveal relationships between microbial community members as detected by PCR amplicon sequence analyses (Banerjee et al., 2016;Barberan et al., 2012;Karimi et al., 2020;Li et al., 2017). However, to interpret a positive correlation as a mutualistic and a negative correlation as an antagonistic relationship is possible only for community members sharing the same microhabitat (Weiss et al., 2016). Without distinction of microhabitats, it is hard to assign the presence of taxa to niche exclusion . Analyzing soil DNA extracted from 250 mg to 1 g represents mixed DNA from many microhabitats. In contrast, working with individual soil aggregates should strongly enhance the ecological significance of soil microbial network analyses.
The potential impact of the heterogeneous soil constituents on structuring microbial communities at a microscale has already been demonstrated with pooled soil primary particles, where the majority of abundant bacterial and fungal taxa exhibited particular preferences for clay, silt, or sand-sized fractions with particulate organic matter (Hemkemeyer et al., 2018(Hemkemeyer et al., , 2019. Furthermore, comparing pooled samples of micro-and macroaggregates revealed that these two size classes also differ in microbial community composition (Constancias et al., 2014;Davinic et al., 2012;Fox et al., 2018), diversity (Bach et al., 2018;Ivanova et al., 2015) and their response to stress . However, information on the heterogeneity of microbial communities of individual aggregates within specific aggregate fractions is still lacking. A major limitation of analyzing individual aggregates is the difficulty of obtaining a sufficient quantity of nucleic acids from small amounts of soil for molecular methods. Attempts made so far, therefore, either pooled several aggregates for DNA extraction (Bach et al., 2018;Ivanova et al., 2015), sampled very large aggregates weighing 20-70 mg (Kravchenko et al., 2014), or applied whole genome amplification (WGA) . These solutions, however, do not deliver data on individual aggregates, provide coarse spatial resolution, or generate substantial bias in the results (Direito et al., 2014;Wang et al., 2016), respectively. To our knowledge, the only study that reported the bacterial community composition in smaller, that is below 3 mm, individual aggregates without applying WGA utilized taxonomic microarrays; a method of relatively low resolution, and focused solely on linking enzyme activity profiles with community structure (Kim et al., 2015). Furthermore, applying molecular methods to small samples that yield very low amounts of nucleic acids require validation to prove the consistent performance of the methods and rule out the possibility of contamination and stochastic effects influencing the results.
The tremendous scientific potential that individual soil aggregate-based microbial community analysis should have for characterizing the heterogeneity of soil microbial communities at a biologically and ecologically more meaningful scale motivated us testing the following hypotheses in this study: 1. Metagenomic DNA of sufficient quantity and quality for PCRbased analyses can be extracted from soil samples in the mgrange, thus representing the scale of macroaggregates 2. Increasing spatial resolution reveals heterogeneity in soil bacterial and archaeal community structure and abundance 3. A higher heterogeneity seen among small soil samples is not a result of contamination or sub-optimal performance of molecular methods 4. Comparing individual aggregates from the same soil unveils patterns of microbial co-occurrence within the soil microbial community not seen with the commonly used 250 mg soil sample size.

| Overview of the experiments
Three experiments were conducted in this study. In the 1st experiment, samples decreasing in size from 250 mg to 1 mg taken from the same soil were subjected to DNA extraction. To address the first two hypotheses, qPCR and high-throughput amplicon sequencing targeting the 16S rRNA gene were conducted to characterize the bacterial, archaeal, and fungal abundance and the prokaryotic diversity in these samples. The 2nd experiment addressed the third hypothesis by comparing 250 mg soil samples and aliquots of a homogenized soil slurry. The volumes of the aliquots were chosen to contain the amount of DNA expected from 1, 5, and 25 mg soil samples. Since all aliquots were taken from the same thoroughly homogenized soil slurry, differences in prokaryotic community structure between these soil homogenate samples must be results of contamination, stochastic effects, or sub-optimal performance of the DNA extraction and PCR. In the 3rd experiment, DNA was extracted from individual aggregates and 250 mg soil samples taken from the same soil to address the fourth hypothesis. All experiments included several control samples to test for the presence of contamination.

| Soil sampling and DNA extraction
The soil used in all experiments was loam topsoil of a Haplic Chernozem (FAO classification) from the Bad Lauchstädt experimental research station of the Helmholtz Centre for Environmental Research in Germany (51°24'N 11°53'E) (Merbach & Schulz, 2013). It originated from the Static Fertilization Experiment, initiated in 1902, and samples were collected in December from a plot without any fertilization since 1903 (treatment NIL) (Ludwig et al., 2011), which was under long-term sugar beet-potato-winter wheat-barley rotation. Consequently, the soil was compared with its fertilized variants depleted in nitrogen (Blair et al., 2006). The soil samples had a pH value of 7.1 (in 0.01 M CaCl 2 ) and 17.7 mg kg −1 organic C. It was sieved (2 mm mesh size) and stored at 4°C until use.
Before sampling, approximately 100 g of soil was incubated at room T in the dark for 24 h. The soil was then spread out in a sterile Technologies, Eugene, OR) but accurate quantification was not possible for many of the small samples due to results close to the background fluorescence in the blank controls. In preparation for this study, several DNA extractions methods were tested but were not found suitable. This included the Fast DNA Spin kit for soil (MP Biomedicals, LLC, Illkrich, France) and variations of the phenol-chloroform protocol (Miller et al., 1999).
In the 1st experiment, five size-groups of soil samples were collected: 250, 125, 25, 5, and 1 mg, respectively. Eight samples were taken from each size-group along with six control samples. Sample weights from all experiments are listed in Table 1. In the 2nd experiment, ten samples of 250 mg soil and six control samples were taken. Eight of the soil samples were processed normally in the DNA extraction, while for the other two, the DNA extraction was interrupted after the centrifugation following the bead beating. By this point, the samples had been turned into homogenized slurry by the bead beating, and the soil debris had been separated from the supernatant that contained the metagenomic DNA. The supernatant from the two samples was pooled and homogenized by vortexing.
The mass of the resulting suspension was 1112 mg and it originated from 497 mg soil in total. Accordingly, a 55.9 mg aliquot of this suspension contained the amount of DNA extractable from 25 mg soil, an 11.2 mg aliquot the amount from 5 mg soil, and a 2.2 mg aliquot the amount from 1 mg soil. Eight aliquots from each of these sizes, hereafter 25, 5, and 1 mg soil homogenate samples, were taken and mixed with 350 µl BashingBead Buffer from the DNA extraction kit to continue the DNA extraction. In the 3rd experiment, 37 individual soil aggregates, weighing 5.3 mg on average and similar in size (ca. 2 mm), were taken for DNA extraction along with 35 samples of 250 mg soil and nine control samples.

| Abundances of microbial groups assessed by qPCR
The abundance of Bacteria, Archaea, and Fungi was assessed by qPCR targeting the 16S rRNA gene and the ITS region as described previously (Hemkemeyer et al., 2015). Archaeal and fungal abundance were investigated only in the 1st experiment. Reactions were run in a Bio-Rad CFX96 real-time PCR system in duplicates from different dilutions of the DNA extracts. In the case of the 250 and 125 mg soil samples, 50-and 100-fold dilutions were taken; from the 25 mg samples 10-and 20-fold dilutions; and from the 5, 1 mg, individual aggregate, and control samples undiluted DNA extracts and twofold Results were compared with Tukey's HSD tests or Welch's t-test in case of the data from the 3rd experiment. The analysis was carried out in R 3.4.4 (www.r-proje ct.org). One of the 250 mg samples from the 1st experiment yielded a magnitude higher copy number in the fungal ITS qPCR assay than the others. It was treated as an outlier and excluded from the analysis.

| High-throughput sequencing of 16S rRNA gene amplicons and data processing
To characterize the bacterial and archaeal communities in the samples, DNA extracts were subjected to high-throughput amplicon sequencing of the V4 region of the 16S rRNA gene following the protocol of Kozich et al. (2013) with primers updated to match the modified 515f and 806r sequences according to Walters et al. (2016). In the case of the small soil samples and control samples, due to the low DNA yield, 10 µl DNA extract was used as a template in the PCRs.

Paired-end sequencing was done on Illumina MiSeq instruments at
StarSEQ (Mainz, Germany). Samples from the same experiment were sequenced in the same run. For the availability of all data, see Data Availability Statement.
The sequencing data from the three experiments were analyzed separately. Raw reads were processed with the dada2 (version 1.6.0) pipeline (Callahan et al., 2016) in R 3.4.4. Forward and reverse reads were truncated at positions 240 and 90, respectively. Reads with any ambiguous bases were discarded as well as forward reads with over two and reverse reads with over one expected error. The data from the 2nd experiment had higher quality allowing the reverse reads to be truncated at position 130 and keeping those with two or less expected errors. Error models were constructed from 10 6 randomly selected reads. Sequence variants (SVs) were inferred using the pool option. Forward and reverse SVs were merged trimming overhangs, and the removeBimeraDenovo function was employed to detect chimeras. The SVs were classified according to the SILVA reference version 132 (Pruesse et al., 2007) accepting only results with ≥70% bootstrap support. SVs shorter than 220 nt or longer than 275 nt, or identified as chimeric, mitochondrial, or chloroplast sequences, or not classified into Bacteria or Archaea were deleted. Good's index was calculated to estimate the coverage of the SVs. SVs with ≥0.1% relative abundance in any of the control samples of an experiment were regarded as a potential contaminant and removed from the dataset of that experiment.

| Analysis of sequencing results
Principal component analysis (PCA) plots were created in R using the rda function of the vegan package version 2.5-2 (Oksanen et al., 2018). To decrease the sparsity of the data, SVs not reaching 0.1% relative abundance in any of the samples included in the PCA were removed. Zeroes were replaced with the count zero multiplicative (CZM) method using the zCompositions package version 1.1.1 (Palarea-Albaladejo & Martin-Fernandez, 2015), and centered logratio (CLR) transformation was applied to the data to correct for compositional effects and differences in sequencing depth (Gloor et al., 2016).
Aitchison distances between the samples were calculated as Euclidean distances in the CLR transformed dataset (Gloor et al., 2017). SVs that did not have at least 0.1% relative abundance in any of the compared samples were removed and zeroes were replaced with the CZM method to allow CLR transformation before calculating Euclidean distances with the "vegdist" function of the vegan package. Results were compared with Tukey's HSD tests.
Plots illustrating the prevalence of abundant SVs among the soil samples were prepared in Cytoscape 3.7.1 (www.cytos cape.org).
CoNet 1.1.1 beta  in Cytoscape was used to construct co-occurrence networks from the data from the 3rd experiment. To limit the number of parallel significance tests and the sparsity of the data, only SVs with ≥0.2% relative abundance in at least one of the samples were included. Separate networks were constructed for the 250 mg soil samples and the soil aggregates.
However, the selection of SVs was done on the joint data matrix to ensure that both networks include the same SVs. The data were rel- To assess whether the DNA extraction could recover microbial DNA with similar efficiency from small quantities of soil as from 250 mg samples, estimates of the abundances of Bacteria, Archaea, and Fungi in 1 g of soil were calculated from the qPCR results ( Figure 1; Table   S1: https://doi.org/10.5281/zenodo.4282475). Similar estimates were obtained from the 250 and 5 mg soil samples from the 1st experiment.
Estimates from the 1 mg samples tended to be lower but were not significantly different. In contrast, the estimates of fungal abundance in a F I G U R E 1 Estimates of (a) bacterial, (b) archaeal, and (c) fungal abundance in a gram of soil based on qPCR from the samples from the 1st experiment (gene copy numbers per g of soil wet weight). One of the 250 mg soil samples was an outlier in the fungal ITS qPCR results and is not included in the plot. Sample groups not labeled with the same letter were significantly different in Tukey's HSD tests. Thick lines indicate the median values, the upper and lower hinges the 75th and 25th percentile, whiskers extend to the data extremes gram of soil were significantly lower from the 1 and 5 mg than from the 250 mg samples. Estimates of bacterial abundance obtained from the 25 and 125 mg samples and archaeal abundance from the 25 mg samples were significantly higher than from the 250 mg samples. Bacterial abundance estimates from the single aggregates and 250 mg soil samples of the 3rd experiment covered the same range ( Figure A1), but the mean of the estimates from the single aggregates (2.27 × 10 9 copies/g soil) was lower (p < 0.001) than from the 250 mg samples (4.69 × 10 9 copies/g soil).
The control samples were amplified in the qPCR assay targeting the bacterial 16S rRNA gene but yielded only 3-356 copies per µl DNA extract. In comparison, the 1 mg soil samples had 8777-45,534 copies per µl DNA extract (

| Removing potentially contaminant SVs from the sequencing results
It was possible to generate sequencing results from all control samples (the complete dataset with the taxonomic classification of the  (Table A1). They were similar in their prokaryotic community structures but very different from the soil samples ( Figure A2). Every SV that reached 0.1% relative abundance in any of the control samples of an experiment was considered as a potential contaminant. There were 450, 77, and 591 such SVs in the datasets of the 1st, 2nd, and 3rd experiments, respectively. In the data from the 1st experiment,

| Increasing spatial resolution reveals heterogeneity in soil bacterial and archaeal community structure but not in their abundance
The yield of high-quality 16S rRNA gene amplicon sequences was lower from the 1 mg samples than from the 5 and 25 mg samples in the 1st experiment (Figure 2a). Apart from this, however, the sequencing yield did not differ between the sample groups within any of the experiments. Thus, it is possible to compare the number of SVs detected in the samples without rarefying the data. The number of SVs in the 1st experiment was not significantly different between the 250, 125, and 25 mg samples, but decreased significantly in the 5 mg and even further among the 1 mg samples (Figure 2b).
An opposite trend was clear in the Good's coverage index (Table   A1) In total, 5620 SVs were detected in the eight 1 mg soil samples of the 1st experiment (Table S2:  The qPCR results did not confirm our hypothesis that increasing spatial resolution would reveal heterogeneity in microbial abundance. Bacterial and archaeal 16S rRNA gene and fungal ITS copy numbers did not show a larger variation among the 1 and 5 mg samples than between the 250 mg samples from the 1st experiment ( Figure 1). Similarly, in the 3rd experiment, bacterial abundance did not vary more in the single aggregates than in the 250 mg samples ( Figure A1).

| Impact of stochastic effects and inconsistent performance of the methods
The soil homogenate samples from the 2nd experiment served to test the influence of stochastic effects and sub-optimal performance of the DNA extraction and PCR when extracting small amounts of soil. The estimated bacterial abundance in a gram of soil based on the qPCR results was in general higher in the samples from the 2nd experiment but followed the same pattern as in the samples from the 1st experiment with no differences between the 1, 5, and 250 mg samples but significantly higher values in the 25 mg samples ( Figure   A3)

| Bacterial and archaeal co-occurrence patterns in 250 mg soil samples and aggregates
Networks of prokaryotic co-occurrence were constructed using the (Actinobacteria), and SV36 (Acidobacteria subgroup 6). Their relative abundance in the aggregates was 0.46 ± 0.16%, 0.33 ± 0.13%, and 0.30 ± 0.11%, respectively. These SVs potentially serve a keystone function by connecting two clusters in the network. One of the clusters contains several SVs of Thaumarchaeota, Verrucomicrobia, and Actinobacteria. The other is dominated by Acidobacteria subgroup 6.
The hub of the latter cluster is SV58 (Acidobacteria subgroup 6) with a relative abundance of 0.23 ± 0.16% that has the highest degree in the network being connected to 19 nodes. SV399 (Chloroflexi) (0.06 ± 0.05%) and SV123 (Acidobacteria subgroup 6) (0.13 ± 0.06%) are linked with negative associations to several members of this cluster.

| DISCUSS ION
The large disparity between the scale in which the soil microbiota is usually studied with molecular methods (0.25-1 g of soil) and the distance over which microbial interactions occur, impede the detection of interacting partners (Nunan, 2017). To gain information on the soil microbial diversity at an increased spatial resolution that considers soil structure, in this study, we reduced the amount of soil used for DNA extraction from 250 to 1 mg and also extracted individual soil aggregates. Bacterial and archaeal DNA were recovered with not significantly different efficiencies from the 250 mg and the 5 and 1 mg samples as shown by the qPCR results. This was not true for fungal DNA. Either the DNA extraction kit was not efficient in isolating fungal DNA from samples below 25 mg, or fungi may preferentially colonize larger soil aggregates. Our results show that the DNA extraction kit was the most efficient in recovering bacterial and archaeal DNA from 25 to 125 mg soil, although the variation in the yield of 16S rRNA gene copies was large among these samples.
Since this increased variation was apparent among the 25 mg soil homogenate samples of the 2nd experiment as well, it is not an indication of an uneven distribution of bacterial cells at the scale of 25 mg samples but must be due to this particular DNA extraction method not working with consistent efficiency with this amount of soil.
As a consequence of sampling small amounts of soil, the DNA extracts had low template concentrations for the subsequent PCR analyses. Thereby, we had to anticipate a high risk of contamination affecting the results (Weiss et al., 2014). Quantifiable amounts of Bacteria and Fungi, but not Archaea, were detected in the control samples without soil. However, they reached no more than 4% of the number of bacterial rRNA gene copies in the smallest soil samples, and thus, the influence of contamination on our results is negligible.
The bacterial community found in the control samples was distinct F I G U R E 3 Principle component analyses (PCA) plots from the 16S rRNA gene sequencing data from the 1st (a) and 3rd (b) experiments from the soil communities suggesting that the contamination originated from the reagents of the DNA extraction and sequencing library preparation rather than cross-contamination between samples (Glassing et al., 2016;Salter et al., 2014). Another concern of working with small samples is that molecular methods applied to such small amounts of a template may perform inconsistently leading to artificial variation in the results. The 2nd experiment showed that DNA extraction, PCR, and sequencing did not artificially generate more variation in the results from 5 mg samples than the variation present among the 250 mg samples and the 1 mg samples showed only a slightly higher variation. Therefore, the large heterogeneity in prokaryotic community composition and structure among the 1 and 5 mg soil samples from the 1st experiment was not caused by stochastic effects or PCR bias.
The samples of 25 up to 250 mg of soil were close to identical in prokaryotic community composition, thus they provide a good representation of the overall prokaryotic diversity of our soil. This is also indicated by the fact that increasing the amount of soil extracted up to 25 mg increased the number of SVs detected in the samples, but larger soil samples did not yield more SVs. Thus, the 25 mg samples had good coverage of the total prokaryotic community. In contrast, the 1 and 5 mg samples and single aggregates were heterogeneous in community structure. We found that while small soil samples could recover some SVs not necessarily detected with the conventionally used 250 mg samples; these SVs were typically low in abundance. Very few exceeded the relative abundance threshold we applied to control the sparsity of the data in our analysis of community structure. Therefore, the large heterogeneity of F I G U R E 4 SVs arranged according to how many of the 1, 5, 25, 125, or 250 mg samples from the 1st experiment they were detected in. Only SVs that reached at least 0.1% relative abundance in any of the samples are included. Each node represents one SV colored based on its phylum-level classification and sized according to its average relative abundance across all samples excluding those in which it was not detected the prokaryotic community structure we observed among the small samples was not because they would have enabled the detection of more SVs. Instead, it appears that they contained different subsets of the total community present in the 25-250 mg samples. This could be explained by the fact that the smaller samples contain fewer microhabitats, each of which harbors a local community of fewer species (Leibold et al., 2004). Interestingly, the 1 and 5 mg samples did not significantly differ in the abundance of Bacteria, Archaea, and a patchy distribution at the scale of a few micrometers (Nunan et al., 2003) but, for the soil of this study, not at the scale of macroaggregates or 1-5 mg samples.
Network analyses based on microbial co-occurrence have been applied to soil samples as large as 10 grams (Khan et al., 2019) and are typically used with 250 mg-1 g samples (Barberan et al., 2012;Karimi et al., 2020). In this study, however, we could not detect stable and significant associations between SVs from 35 samples of 250 mg soil. These samples were taken from the same well-mixed batch of soil and were similar in their prokaryotic community composition. This is fundamentally different from the above-cited studies that compared soil samples taken from different ecosystems or across an entire country, thus soil samples that can greatly differ in microbial community composition. In contrast, our 250 mg samples, coming from the same soil, were similar. It is likely that each of them gave a good representation of the overall prokaryotic diversity in our soil, in which case, the variation of the relative abundance of SVs in these samples was mostly random.
It is not surprising if small, random differences do not yield stable and significant associations in network analysis. The value of using much smaller samples is shown by our result that the 1 and 5 mg samples contained subsets of the total soil microbial diversity captured by the 250 mg samples, and with increasing spatial resolution the heterogeneity in the bacterial and archaeal community structure increased among the samples. This results in detectable co-occurrence patterns. Furthermore, the smaller spatial scale increases the likelihood that the observed co-occurrences indicate interactions (Cordero & Datta, 2016).
From 37 soil aggregates, we obtained a complex network of bacterial and archaeal co-occurrence that contained two clus- which is known to be a strong contributor to ammonium oxidation in agricultural soils (Leininger et al., 2006). Compared to ammonium-oxidizing bacteria, Thaumarcheaota are thought to be adapted to lower nitrogen concentrations (Pester et al., 2011); thus, the nitrogen-depleted soil of this study is likely a favorable environment for them. The two clusters in our aggregate co-occurrence network could represent two distinct types of metabolism adapted to a nitrogen-depleted soil: a chemoorganotroph that oxidizes SOC, and a chemolithotroph that oxidizes ammonia produced for example by ammonification from crop residues. The presence of the less abundant Verrucomicrobia and Actinobacteria SVs within the Thaumarchaota dominated cluster is possibly linked to an oligotrophic lifestyle (Bergmann et al., 2011;Fierer et al., 2007), but considering the limited information that 16S rRNA gene analyses can provide for these phyla, this remains yet only a hypothesis.
Shotgun sequencing and metagenomic analysis of DNA extracted from individual soil aggregates could shed more light on the nature of the associations we detected in the co-occurrence network.
In general, such aggregate-level analyses of the soil microbiota, which we call "aggregatomics," could inspire new ways of linking structure to function in soil microbial communities.
While the spatial scale that we reached in this study is not yet fine enough to reveal most microbial interactions as they may occur in microaggregates (Raynaud & Nunan, 2014), it should be able to support the development of hypotheses and experiments to understand the F I G U R E 5 Aitchison distances in the bacterial and archaeal community structure (16SrRNA gene amplicons) within sample groups from the 1st and 2nd experiments. Thick lines indicate the median values, the upper and lower hinges the 75th and 25th percentile, whiskers extend to the data extremes. Sample groups from the same experiment not labeled with the same letter were significantly different in Tukey's HSD tests patterns and processes shaping the assembly of soil microbial communities and modeling their behavior Tecon & Or, 2017). Developing DNA extraction protocols from even smaller soil samples, approaching the microaggregate level, should be a way forward to fuel soil aggregate-oriented research ("Aggregatomics"; https://www.thuen en.de/en/bd/field s-of-activ ity/feld-und-labor studi en/micro biolo gy-and-molec ular-ecolo gy/soil-aggre gatomics) for unveiling hidden patterns of functions and ecological interactions.

F I G U R E 6
Co-occurrence networks from the (a) 250 mg samples and (b) the single aggregates from the 3rd experiment. The frames mark the two clusters discussed in the text. It should be noted that in (a) unstable edges were not removed and the Benjamini-Hochberg correction for multiple comparisons was not applied

E TH I C S S TATEM ENT
None required.

ACK N OWLED G M ENTS
We thank Ines Merbach, Department Biozönoseforschung, Helmholtz-Zentrum für Umweltforschung-UFZ, Bad Lauchstädt, Germany, for providing the soil of this study. We thank Naomie Oßwald and Jerome Rischke who supported us at the early stages of this work as bachelor students, and Karin Trescher for excellent technical assistance. This project was carried out in the framework of the Priority Programme "Rhizosphere spatiotemporal organization -a key to rhizosphere functions," (SPP2089), financially supported by the German Research Foundation (DFG; Project number 403668538). Open access funding enabled and organized by ProjektDEAL.

CO N FLI C T O F I NTE R E S T
None declared.

DATA AVA I L A B I L I T Y S TAT E M E N T
All data of this study are provided in the results section of this paper apart from (i) the data in Table S1 (Sample weights and qPCR data) and