Tackling Soil ARG‐Carrying Pathogens with Global‐Scale Metagenomics

Abstract Antibiotic overuse and the subsequent environmental contamination of residual antibiotics poses a public health crisis via an acceleration in the spread of antibiotic resistance genes (ARGs) through horizontal gene transfer. Although the occurrence, distribution, and driving factors of ARGs in soils have been widely investigated, little is known about the antibiotic resistance of soilborne pathogens at a global scale. To explore this gap, contigs from 1643 globally sourced metagnomes are assembled, yielding 407 ARG‐carrying pathogens (APs) with at least one ARG; APs are detected in 1443 samples (sample detection rate of 87.8%). The richness of APs is greater in agricultural soils (with a median of 20) than in non‐agricultural ecosystems. Agricultural soils possess a high prevalence of clinical APs affiliated with Escherichia, Enterobacter, Streptococcus, and Enterococcus. The APs detected in agricultural soils tend to coexist with multidrug resistance genes and bacA. A global map of soil AP richness is generated, where anthropogenic and climatic factors explained AP hot spots in East Asia, South Asia, and the eastern United States. The results herein advance this understanding of the global distribution of soil APs and determine regions prioritized to control soilborne APs worldwide.

This PDF file includes: a, To understand the distribution of soil environmental ARG-carrying pathogens via metagenome-assembled genomes (contigs), we collected the data from 1385 (public database) and 258 (in-house) samples that had been sequenced using shotgun metagenomics; b, The metagenomic sequencing data was processed by our in-house pipelines to generate high-quality reads using Trimmomatic (v2.3).The high-quality reads from each sample were individually assembled into contigs using MEGAHIT v1.2.9 with the parameters "kmers 27, 37, 47, 57, 67, 77, 87, 97, 107, 117,127, 141"; c, Annotation of pathogens, ARG and MGEs.Contigs were aligned against a self-constructed pathogen database using CLARK (v1.2.6).After extracting the pathogen contigs, the open reading frame was first predicted and then used on gene sequences using the recently published SARG2.0 which was implemented in ARG-OAP (v2.2).Plasmid, integrative and conjugative elements (ICEs) and integrons were detected in the ARG-carrying pathogens.
The presence of plasmid sequences was checked by PlasFlow (v1.1) with default parameters.The ICE-encoding ARGs were determined based on similarity alignment (>70%) against the ICEs database downloaded from ICEberg2.0.Integron Visualization and Identification Pipeline (I-VIP) is a well-organized pipeline to identify, classify, annotate, and visualize class 1 integrons.The mobility of ARGs was predicted based on either their location on the plasmid contig or co-occurrence with an MGE (i.e. they shared a contig with an MGE gene); d, Paired reads were mapped to the metagenomic contigs with Bowtie2 (v2.3.2) using the default parameters.CoverM was used to remove reads aligned for <90% of their length and <95% identity.Filtered bam files were passed to SAMtools to determine how many positions were covered by reads.CoverM was used to calculate the mean coverage of contigs across samples using the "trimmed_mean" mode with default parameters.The relative abundance table of pathogens was generated and normalized by the number of metagenomic reads in each sample using R.The normalization calculation was performed according to Emerson et

Figure
Figure S1 to S8

Figure S2 |
Figure S2 | Overview of sample sequencing depth, pathogen and ARG abundance (n=1643).a, Total clean reads in each sample; b, Pathogen reads in each sample; c, Copies of ARG per cell in each sample; d, Proportion of pathogen reads in each habitat.

Figure
Figure S3 | Overview of abundance and ARG profiles of ARG-carrying pathogens (n=1443).a, Reads of ARG-carrying pathogens at the global scale; b, Abundance of ARG-carrying pathogens in each habitat; c, Abundance (copies of ARG per cell) of ARGs in each habitat; d, Composition of antibiotic resistome in each habitat.

Figure S4 |
Figure S4 | Co-occurrence patterns among antibiotic resistance genes and their host pathogens across seven habitats.a, Overall presence profile of pathogen-ARGs across all habitats based on infection objects and ARG categories.The ten ARGs and pathogens with the highest degree are listed; b, The coexistence characteristics of ARG-pathogens from the perspective of pairs number.Number of pairs between different pathogens and ARGs within ARG-carrying pathogens (in blue) and ARG-carrying pathogens containing MGEs (in green).Line chart shows average links per pathogen (value in parentheses)

Figure S6 |
Figure S6 | Important drivers of richness and model validation.a, Feature selection for random forest algorithms to predict AP richness based on 10-fold cross-validation; b, Hyperparameter tuning for random forest algorithms to predict AP richness.In the best model, ntree and mtry are 1200 and 10, respectively; c, Random forest mean predictor importance (percentage of increase of mean square error) of environmental indices as drivers for soil AP richness.The accuracy importance measure was computed for each tree and averaged over the forest (1000 trees).Percentage increases in the MSE (mean squared error) of variables were used to estimate the importance of these predictors, and higher MSE% values imply more important predictors.Significance codes are as follows: '**' P < 0.01, '*' P < 0.05.MSE, mean squared error.(pd: Population density; hdi: Human development index; cfv: Coarse fragments volumetric; dp: precipitation of driest month; ph: Soil pH; rad: BIO21-Highest_Weekly_Radiation; clay: Clay content (0-2 micro meter); ref: MCD43A4.005-BRDF-AdjustedReflectance 16-Day Global 500m; soc: soil organic carbon; brdf: Exponentially weighted difference in EVI

Figure S7 |
Figure S7 | Global map of soil-borne ARG-carrying pathogen (AP) richness.AP richness with prediction results derived from random forest analysis.Map was drawn in ArcGIS at 0.01 degree resolution using projection WGS84.

Figure S8 |
Figure S8 | Model accuracy assessment across the globe.Bootstrapped (100 iterations) coefficient of variation (standard deviation divided by the mean predicted value) as a measure of prediction accuracy.Sampling was stratified by habitats.
al; e, Using Random Forest models together with environmental factors we detected global AP richness and predicted the progression of APs under future climate.