Integrated Human Skin Bacteria Genome Catalog Reveals Extensive Unexplored Habitat‐Specific Microbiome Diversity and Function

Abstract The skin is the largest organ in the human body. Various skin environments on its surface constitutes a complex ecosystem. One of the characteristics of the skin micro‐ecosystem is low biomass, which greatly limits a comprehensive identification of the microbial species through sequencing. In this study, deep‐shotgun sequencing (average 21.5 Gigabyte (Gb)) from 450 facial samples and publicly available skin metagenomic datasets of 2069 samples to assemble a Unified Human Skin Genome (UHSG) catalog is integrated. The UHSG encompasses 813 prokaryotic species derived from 5779 metagenome‐assembled genomes, among which 470 are novel species covering 20 phyla with 1385 novel assembled genomes. Based on the UHSG, the core functions of the skin microbiome are described and the differences in amino acid metabolism, carbohydrate metabolism, and drug resistance functions among different phyla are identified. Furthermore, analysis of secondary metabolites of the near‐complete genomes further find 1220 putative novel secondary metabolites, several of which are found in previously unknown genomes. Single nucleotide variant (SNV) reveals a possible skin protection mechanism: the negative selection process of the skin environment to conditional pathogens. UHSG offers a convenient reference database that will facilitate a more in‐depth understanding of the role of skin microorganisms in the skin.


Introduction
In humans, skin health and diseases are related to the skin microbiome. [1]1b,2] However, the lack of sufficient reference data on microbial diversity hinders metagenomic data analysis and the comprehension of the potential function of microbial species.Culture research on skin microorganisms has constantly revealed fresh insight into the impact of skin community biology on human skin health. [3]Although some unknown skin microorganisms may evade current culture methods for various reasons (such as lack of nutrients in the growth medium or low abundance of skin microorganisms), they may play an important biological role that has yet to be elucidated.Therefore, obtaining skin microbial genomes and establishing a comprehensive catalog is vital for novel discovery.
Several studies have demonstrated that culture-independent and reference-free methods are successful strategies for novel species discovery and characterization. [4]Metagenomic analysis can involve binning de novo assembled contigs into hypothetical genomes, known as metagenomic assembly genome (MAG).Some studies have applied this method to reconstruct various MAGs, [4,5] one of which restored thousands of genomes of skin microorganisms in North American samples [5a] and provided an exhaustive landscape of the diversity of the North American skin microbiome.
Here the Unified Human Skin Genome (UHSG) was generated from 2519 human skin metagenome samples.The UHSG includes 2111 high-quality (HQ) MAGs and 3668 mediumquality (MQ) MAGs.Next, 1385 novel assembled genomes were identified and studied for their unique potential functions.A further genomic clustering procedure resulted in 813 inferred prokaryotic species that included 549 near-complete prokaryotic species (NCPS).In addition, the horizontal transfer of genes between species, the potential secondary metabolite biosynthesis, and single-nucleotide variants was assessed based on the NCPS.These provided us with new insights into the species and functions of skin microorganisms and their crucial roles in the human skin.

Large-Scale Uncultured and Unassembled Bacterial Species Discovered in the Skin
To achieve exhaustive characterization of the human dermal microbiota, we performed deep shotgun sequencing of 450 healthy facial samples from the 4D-SZ cohort (totaling 9.7 Terabyte (Tb) of high-quality data, an average of 21.5 Gb of data) and retrieved 2069 human skin metagenomic datasets (totaling 6 Tb of highquality data, average 2.9 Gb data) from 4 different studies [1b,2,6] (Figure S1 and Table S1, Supporting Information).The available skin microbiome samples were mainly from the United States (n = 1099, 43.6%) or China (n = 1271, 50.5%); most samples were healthy samples, reflecting geographical and healthy sample favoritism in current human skin microbiome studies.
To obtain comprehensive, high-quality reference genomes of the skin microbiome, we restored 27406 genomes using 2519 metagenome samples (Figure S1, Supporting Information).CheckM software was used to determine the genomes of contamination and completeness, whether high-quality (HQ, completeness ≥ 90%, contamination ≤ 5%) or medium-quality (MQ, completeness ≥ 50%, contamination ≤ 10%).Based on these metrics, 2111 HQ MAGs and 3668 MQ MAGs were obtained (UHSG, Figure S1 and Table S2, Supporting Information), which MAGs were obtained mainly from the samples collected in the 4D-SZ cohort (Figure S1, Supporting Information, n = 3594, 62.2%).Afterward, each HQ and MQ MAG was classified using the Genome Taxonomy Database Toolkit. [7](GTDB-Tk, Figure 1A).However, 24% (n = 1385) of the MAGs could not be classified as a reported species, and 32.2% (n = 1862) of the MAGs were uncultured genomes of bacteria, establishing that the part of the UHSG is not currently represented in the reference databases (Figure 1A).Two of the 5779 MAGs were genomes of archaea, one of which could be classified as Natronococcus amylolyticus.The other could not be classified as an existing species, thus indicating the scarcity of archaea in the skin microbiome (Table S2, Supporting Information).UHSG adequately covered different taxonomic groups (Figure 1A), among the top 20 most abundant species in the skin (Figure S2 Next, we focused on "novel" genomes that were not assigned (95% average nucleotide identity (ANI)) to an existing species in GTDB (Figure 1B).The "novel" genomes were concentrated in Proteobacteria (682, 49.21%), Actinobacteriota (228, 16.45%), Bacteroidota (167, 12.05%), and Deinococcota (62, 4.47%), involving 82 orders and 139 families (Figure 1C and Figure S3, Supporting Information).Furthermore, the coding space (ranging from 57.1% to 95.0%), gene duplication rate (ranging from 0% to 5.9%), and "novel" potential function (ranging from 2.4% to 62.7%) of the genomes were highly diverse (Figure 1C, Figure S3 and Table S3, Supporting Information), supporting the important position of "novel" genomes in maintaining the functional diversity in the skin.In addition, novel species-level branches within 11 phyla were found, which belong to phylum www.advancedscience.commembers of Bdellovibrionota, Verrucomicrobiota, Myxococcota, Cyanobacteria, Eremiobacterota, Planctomycetota, Armatimonadota, Acidobacteriota, Bdellovibrionota, Verrucomicrobiota, and Chloroflexota (Figure 1A,B).These branches were mainly assembled in high-depth sequencing samples (Figure S4, Supporting Information), which suggested that high-depth sequencing is beneficial to the genome assembly of unknown skin bacteria.
4b,8] Based on genome information (minimal contamination and maximum completeness), the best quality genomes from each cluster were chosen as representative species-level genome bins (rSGBs).There were 813 rSGBs inferred from the clustering procedure, of which 549 were near-complete prokaryotic species (NCPS, completeness ≥ 90%, contamination ≤ 5%) and 470 were "novel" species (Table S4, Supporting Information).The number of rS-GBs was found to be far fewer than those of the reported gut microbiome [4b] or oral microbiome, [5b] discrepancy may be due to the number of rSGB has not reached a saturation point, indicating that more species remain to be discovered (Figure S5, Supporting Information).All skin metagenomic datasets were mapped to 813 rSGBs, confirming the representation of human skin microbial diversity in the UHSG.Using BWA-MEM, a mean classification rate of 77.81% was obtained.In comparison to the HMP database, this represented a 35.5% improvement (Table S1, Supporting Information).Samples from China and the United States showed the most remarkable improvements in classification rate, highlighting the potential of the UHSG catalog for studying microbiome diversity among understudied populations.

Genomic Variation in Metabolic Potential of Human Skin Bacteria
5a,6a] The 5779 genomes encode a total of 14590154 protein-coding genes (Table S2, Supporting Information), which can be clustered into 2657347 nonredundant protein families (NPFs).On average, 90.8% and 0.03% of genes showed sequence/domain homology in the eggNOG, and CARD, respectively (Table S2, Supporting Information).6a] In addition, principal coordinate analysis (PCoA) of the NPF profiles revealed a clear separation between the 17 bacterial phyla Actinobacteriota, Proteobacteria, Firmicutes, and Bacteroidota (Figure S6, Supporting Information).Unexpectedly, the Cutibacterium granulosum and other species taxonomy of Cutibacterium showed clear isolation, suggesting differences in their potential function of skin (Figure 2 and Figure S6, Supporting Information).Specifically, cofactor and vitamin biosynthesis, amino acid metabolism, phosphotransferase system, and lipid metabolism were significantly different among different species taxonomy of Cutibacterium (Table S5, Supporting Information).
The core functional pathways were reconstructed based on the completeness ratio of KEGG modules (Figure 2 and Figure S7, Supporting Information).All species had near-complete central metabolism (i.e., glycolysis, gluconeogenesis, pentose phosphate pathway, and citric acid cycle), amino acid metabolism (i.e., lysine, proline, arginine, tryptophan, leucine, isoleucine, histidine and threonine biosynthesis), and nucleotide metabolism pathways (Figure S7, Supporting Information).The Actinobacteriota and Firmicutes phylum, especially Cutibacterium and Staphylococcus genus, displayed a more diverse phosphotransferase system (PTS) diversity than the other bacteria (Figure 2), suggesting their vital ability in the uptake of sugar and its derivatives.While Armatimonadota, Bacteroidota, Bdellovibrionota, Myxococcota, Planctomycetota, Proteobacteria (Moraxella and Acinetobacter), and Verrucomicrobiota showed richer functional diversity in lipid metabolism (Figure 2).In our previous study, [6a] two skin cutotypes were found based on data-driven categorization: the C-cutotype comprised mainly Cutibacterium, and the Mcutotype comprised mainly Moraxella.Here, through the functional analysis of the genome, the potential reason for the formation of two skin cutotype was that the two skin microbiota groups have different "nutrient requirements."More importantly, antibiotic resistance genes (ARGs) showed significant heterogeneity in different phyla (Figure 2).To evaluate the changes of ARGs between different phyla, NCPS genome sequences for ARGs were analyzed using the CARD database [9] (v3.1.4)(Table S6, Supporting Information).The reservoir of ARGs in the skin was mainly from Proteobacteria and Firmicutes.The drug-resistant reservoirs in the skin were found to be mainly in Staphylococcus in Firmicutes and Pseudomonas, Acinetobacter, Klebsiella, Enterobacter himalayensis, Yokenella regensburgei, and Pluralibacter gergoviae in Proteobacteria, which are multidrug-resistant bacteria (Figure S8, Supporting Information).Most drug-resistant bacteria were resistant to tetracycline antibiotics and fluoroquinolone antibiotics (Figure S8, Supporting Information), as these are widely used. [10]Notably, the drug-resistant bacteria of Proteobacteria in the skin were not resistant to fusidic acid, glycopeptide antibiotic, oxazolidinone antibiotic, para aminosalicylic acid, pleuromutilin antibiotic and streptogramin antibiotic.The drug-resistant bacteria of Firmicutes were not resistant to benzalkonium chloride, bicyclomycin, carbapenem, cephamycin, elfamycin antibiotic, glycopeptide antibiotic, glycylcycline, isoniazid, monobactam, nitrofuran antibiotic, nitroimidazole antibiotic, and para aminosalicylic acid (Figure S8, Supporting Information).Our findings could guide antibiotics use in the treatment of skin diseases.
There were 238 horizontally transferred genes (10.5%) with a genetic divergence of less than 1%, indicating recent transfers (Figure S10A, Supporting Information).MetaCHIP also identified 2038 (89.5%) gene transfers with a genetic divergence higher than 1%, i.e., representing non-recent transfers (Figure S10B, Supporting Information).Then, the COG system was used to annotate recently and non-recently detected HGTs predicted by MetaCHIP software. [12]The results showed that COG categories enriched with recent HGTs differ from those enriched with nonrecent HGTs.For example, COG categories C (energy production and conversion), Q (secondary metabolites biosynthesis, transport, and catabolism), and I (lipid transport and metabolism) were only enriched in the non-recent HGTs, while categories K (transcription), L (replication, recombination, and repair), and U (intracellular trafficking, secretion, and vesicular transport) were enriched in recent HGTs (Figure S10B, Supporting Information).This varied from the previous results [12] of HGT in soil and gut genomes.It shows that species in different growth environments have different needs for functional genes in the process of evolution.
HGT is a major factor contributing to the rapid spread of resistance. [13]Six drug resistance genes (adeF, mexQ) were identified in 2276 candidate HGT events (Figure S11, Supporting Information).The multidrug resistance gene (mexQ) of Pseudomonas aeruginosa, considered a global health issue, [14] was found to be transferred to Serratia marcescens_1, suggesting the need to pay attention to the transmission of multidrug resistance genes.In addition, the adeF gene with tetracycline and fluoroquinolone resistance [15] may be widespread among skin bacteria, implying that the type of antibiotics used on the skin surface needs to be carefully considered.
Next, a cluster relational network was constructed based on the shared gene to compare the degrees of shared genes in biosynthetic clusters.This method revealed the diversity of potential secondary metabolites in the skin, and many diverse and sparsely connected BGCs in the skin microbiota were found (Figure 4C and Figure S12, Supporting Information).The relational network of clusters was mainly formed between species within Actinobacteriota, Proteobacteria, Firmicutes, and Bacteroidota (Figure 4C and Figure S12, Supporting Information).For example, A conserved type I PKS locus nearly ubiquitous in the Actinobacteriota formed three dense network clusters.At the same time, the conserved type III PKS locus was a dense network cluster in Proteobacteria (Figure 4C).Proteobacteria formed three aryl-polyene, and four beta-lactone, six terpene; Actinobacteria formed two beta-lactone and three redox−cofactor conserved clusters; Bacteroidota formed conserved clusters of four aryl-polyene and two terpenes (Figure S12, Supporting Information).The high conservatism of these BGCs in the taxa may indicate the presence of a novel group of metabolites.

Single-Nucleotide Variants (SNVs) of Skin Microbiome
To further explore the potential functions of skin bacteria, the single nucleotide variants of the NCPS were studied.The singlenucleotide variants (SNVs) of the NCPS were quantified by inStrain. [22]This yielded 168 076 666 SNVs from NCPS in 2519 samples, of which 57.4% were synonymous mutations (Ks), 35.1% were non-synonymous mutations (Ka), and 7.5% occurred in the intergenic region (Figure 5A).Most notably, the proportion of synonymous and non-synonymous mutations (i.e., selective pressure) varied significantly between different species and skin environments.All bacteria exhibited negative selection in dry skin environments (Figure 5A).The selective pressure of the common skin bacteria Cutibacterium acnes, Moraxella_A osloensis, Corynebacterium kefirresidentii, Cutibacterium humerusii, Staphylococcus epidermidis, and Cutibacterium granulosum were positively selective in sebaceous and moist environments.While the selective pressure of pathogenic Pseudomonas (including P. sp003428805, P. sp002742565, and P. putida), Bosea spp., Microbacterium spp., and Comamonas tsuruhatensis were negatively selective in sebaceous and moist environments (Figure 5A).Additionally, we observed differential selection processes for certain bacteria across different skin environments.For instance, Sphingomonas spp.showed positive selection in sebaceous environments but negative selection in moist environments, Massilia spp.showed positive selection in moist environments but negative selection in sebaceous environments (Figure 5A).This finding suggests that conditional pathogenic bacteria are more inclined toward stabilizing selection, thus reducing the possibility of producing more toxic strains.
Next, SNVs were mapped to skin microbial gene function.It was found that SNVs appeared in almost all COGs, accounting for about 0.1 of the bacterial genome size, and the selection pressure of all COGs was negative selection, with lower negative selective pressure in dry environments compared to sebaceous and moist environments (Figure 5B).More importantly, the selective pressure of various functions of a single species was evaluated based on the genus level.Differences were noted in the selective pressure among different species of Staphylococcus, Pseudomonas, Corynebacterium, Acinetobacter, Moraxella, and Cutibacterium (Figure 5C,D and Figure S13, Supporting Information).These differences likely seem to be related to the conditional pathogenicity of bacteria, especially the negative selection pressure of functional genes of most Pseudomonas and Staphylococcus aureus concerning various diseases. [23]Furthermore, they also display differences in selective pressure across different skin environments.The negative selective pressure observed in species of Staphylococcus, Pseudomonas, Corynebacterium, Acinetobacter, Moraxella, and Cutibacterium in dry environments (Figure 5C,D and Figure S13, Supporting Information) further explains the lower bacterial burden in dry environments of the skin compared to sebaceous and moist environments. [24]Similarly, there are differences in selective pressure between sebaceous and moist environments.For instance, in moist environments, Staphylococcus aureus, Staphylococcus hominis, Staphylococcus capitis, Acinetobacter radioresistens, and Moraxella_A atlantea exhibit higher selective pressure compared to sebaceous environments.On the other hand, Pseudomonas_E sp003428805, Corynebacterium macginleyi, Corynebacterium pseudodiphtheriticum, and Corynebacterium kroppenstedtii shows lower selective pressure in sebaceous environments.In addition, many microbial functions showed positive selection pressure across the three skin environments, including proteins related to transport (K07791, K03291, K07791, K09815, K03308), transferases (K21828, K03801, K00791, K00954, K00611, K00556, K18100), kinase (K00859, K00925), and proteins related to carbohydrate metabolism (K22135, K00024, K01193, K13929, K13877, K00064, K00075, K12972) (Figure 5C,D).Our results emphasize that the interaction between skin bacteria and hosts may influence the direction of bacterial evolution.

Discussion
In this study, deep whole-metagenomic shotgun sequencing of 450 Han skin samples was used in combination with 2069 skin samples to construct the UHSG comprising 2111 high- The color shows the number of SNVs in synonymous, non-synonymous, and intergenic.The two histograms above show the genome size of the top 25 NCPS and Ka/Ks ratio in different skin environments.The pie chart shows the total number and ratio of SNVs in synonymous, non-synonymous, and intergenic.B) The total gene size, SNVs number (synonymous and non-synonymous), Ka/Ks ratio of function in different skin environments, and quality genomes and 3668 medium-quality genomes.Of the 813 prokaryote species in the set, 57.81% were "novel" species representatives, meaning that the majority of microbial diversity in the catalog still needs experimental characterization.During the preparation of this manuscript, a new collection of 622 skin prokaryotic species from culture and assembly were released, [5a] of which 217 prokaryotic species were not in the UHSG set; these 217 species will be included in a future version.The UHSG is a resource that further expands and complements skin microbial studies, providing a comprehensive view of skin microbial diversity.
Skin microbial genomic functional analysis revealed the core functions of microbiota and the specific functions of different taxonomic levels.6a] More importantly, skin microorganisms displayed strong heterogeneity in potential drug resistance function.The drug-resistant bacteria in the skin were more concentrated in Proteobacteria, especially Pseudomonas aeruginosa, which involves almost all kinds of antibiotics, [25] and also showed strong resistance to various antibiotics in the skin.23a] Our findings revealed that Pseudomonas aeruginosa is not resistant to a few antibiotics, including elfamycin, fusidic acid, and glycopeptide.The use of these antibiotics can possibly reduce the mortality rates associated with Pseudomonas aeruginosa infections to a certain extent.
In recent years, it has become popular to mine secondary metabolites of microorganisms based on metagenomic, and numerous secondary metabolites have been identified from the gut, soil, and ocean microbiome. [26]As the largest organ of the human body, the skin is in direct contact with the external environment.The microorganisms on the surface of the skin and the secondary metabolites they produce could play crucial roles in the human body.Several secondary metabolite synthesis genes were found in the genome of skin bacteria.As most of these secondary metabolites 80% could not be identified in the known database, it greatly expanded the secondary metabolite database.In addition, NRPSs and PKSs are essential natural products. [27]because they have antimicrobial, antifungal, antiparasitic, anti-tumor, and immunosuppressive properties gents with clinical value. [28]It was found that the relationship network constructed by NRPS and PKS clusters can reach 60 independent subnets (Figure 4), each representing a bioactive molecule that may be developed and utilized.This finding can act as a reference for subsequent microbial drug development.
With the establishment of the genomic catalog of skin microorganisms, it is clear that more than half of the microbial species and functions have yet to be characterized.Through anal-ysis, we characterized the function of bacteria in the skin, established the horizontal gene transfer between them, predicted the secondary metabolites, and characterized the SNVs of each bacterium and its adaptive evolution.Having such a comprehensive resource can help guide future research and prioritize targets for further experimental validation.By leveraging the information available in the resource, researchers can focus on specific microbial species, genetic pathways, or bioactive compounds that show potential for further investigation.In addition, according to the UHSG catalog, samples from community cohorts or specific clinical environments can reflect the abundance of skin bacteria.Through specific disease classification groups, targeted skin bacteria can be found to deepen the understanding of the role of skin bacteria in disease classification.With regards to the large amounts of uncultured bacteria in the skin, the UHSG catalog can significantly improve the research accuracy and resolution based on metagenome.Therefore, the genome presented in this study and its functional analysis will help elucidate the interactions between human health and skin microorganisms.

Experimental Section
Ethics Statement: The study was approved by the BGI Review Board of Bioethics and Biosafety (BGI-IRB19121).This study complied with all applicable institutional regulations concerning the ethical use of human information and samples.Each participant was required to provide their signed informed consent before enrolling in the program.
Subject Recruitment and Sampling: In this study, 450 healthy volunteers were recruited from the 4D-SZ cohort.Medication and medical treatment history were obtained from each person through questionnaires.Subjects with a history of skin diseases and antibiotic intake in the last six months were excluded.Each subject was requested not to use skincare or cosmetic products on the day before and the day of sampling.To maximize microbial skin load, samples were collected from the left and right cheeks of each subject.The room where the samples were collected was maintained at 20 °C and 50% humidity.6a,29] The DNA concentrations from skin samples were estimated by Qubit (Invitrogen).Due to the low bioburden typical of skin samples, the BGI NGS Platforms library was created using MGIEasy Universal DNA Library Prep Kit (BGI, Cat.No. 1000017571).The libraries were constructed using 0.5-10 ng of extracted DNA as input according to the manufacturer's recommended protocols.Libraries were then sequenced with 2*150 bp paired-end reads on a DNBSEQ-T1.
Metagenome Binning and Quality Assessment: Using bowtie2 [31] (v2.3.5.1), high-quality reads were mapped to contigs in each sample, and MetaBAT2 [33] (v.2.12.1) was used to bin each contig.Next, 27 406 bins proportion of SNVs number to total gene size of different functional units is described through the categories of eggNOG.C) The Ka/Ks differences in potential functions between different species of Staphylococcus in different skin environments.D) The Ka/Ks differences in potential functions between different species of Pseudomonas in different skin environments.with a total length of 4.74*10 10 nt were generated through MetaBAT2.The "lineage_wf" workflow of CheckM [34] (1.1.2) was used to calculate the completeness and contamination of each bin.High-quality (HQ) and medium-quality (MQ) genomes were classified according to the criteria of completeness ≥ 90%, contamination ≤ 5% and completeness ≥ 50%, contamination ≤ 10%, respectively.In total 2111 HQ MAGs and 3668 MQ MAGs were obtained.Barrnap [35] (0.9) (default parameters) was used to search the 16S rRNA sequences in the MAG genomes.4c] To evaluate the number of skin species, dRep (v2.2.4) [36] was used to cluster 5779 MAGs (HQ and MQ).The parameters were as follows: "-pa 0.9 -sa 0.95 -nc 0.30 -cm large."A total of 813 inferred prokaryotic species were obtained from the clustering process, of which 549 were nearly complete prokaryotic species.The construction of species profiles was obtained by aligning high-quality reads to 813 inferred prokaryotic species using bowtie2 [31] (v2.3.5.1).Subsequently, contigs of 813 inferred prokaryotic species coverage depths were calculated using "jgi_summarize_bam_contig_depths" with default parameters. [33]Considering the different sequencing depths of different samples, normalized contig coverage depths were used to estimate contig relative abundance.Finally, the median of the relative abundance of contigs within the same species was utilized as a measure of the species relative abundance.
Taxonomic Classification: GTDB-Tk [7a] (v1.3.1,GTDB database R05-RS95) was used to classify workflow, and with "gtdbtk classify_wf " was used to classify each MAG.The GTDB database release 95 covers nearly 200 000 genomes grouped into 31 910 species clusters.Protein sequence alignments generated by GTDB-Tk were used to construct a maximumlikelihood tree de novo.A phylogenetic tree was created using IQ-TREE [37] (v2.1.2) to illustrate the evolutionary relationships for the 5777 bacteria (two archaea were excluded).The Bayesian information criterion score of "ModelFinder" was used to automatically select the best-fit model.Phylogenetic trees were constructed using LG+F+R9 models.Phylogenetic trees were visualized using the R package "ggtree." Functional Characterization: The protein-coding sequences (CDS) of 5779 genomes were predicted and annotated using prodigal [38] (v2.6.3) and EggNOG [39] (v5.0).For the pan-genome analyses, firstly, orthologous gene clustering of the MAGs was performed using the MMseqs2 algorithm with options "-min-seq-id 0.5 -c 0.9" (similarity 50% and minimum coverage threshold of 90% of the length of the shortest sequence) at the protein level.Then, the protein level sequences of each MAG were compared with orthologous gene clustering using blastp (E-value cutoff of 1*10 −6 and an alignment length of at least 30% [6a] ) function of the blast [40] (v2.9.0+) to identify the common and specific sequences of 5779 MAGs.
SNV Analyses: A total of 539 nearly complete prokaryotic species were used to generate a catalog of SNVs.The high-quality reads were aligned to the nearly complete prokaryotic species by bowtie2 [31] Each sample generated a separate BAM file.The BAM file and nearly complete prokaryotic species were used as inputs and inStrain [22] (v1.0.0) was used with default parameters (minimal coverage of a position of five reads, minimal frequency of an SNP of 0.05, and an FDR of 1*10 −6 ) to identify synonymous and non-synonymous SNVs.By the number of synonymous or nonsynonymous SNVs of species sequencing depth, the effect of sequencing depth between different samples was corrected for comparison between different groups.
Horizontal Gene Transfer Analysis: MetaCHIP (v1.10.4) [12] was used with default parameters to analyze HGTs of 539 nearly complete species levels.Candidate HGT genes used the results identified in the GTDB database [7a] in the defined taxonomic groups, and the potential functions of candidate HGT genes were based on the results annotated by EggNOG. [39].
Biosynthetic Gene Cluster Analysis: With default parameters, antiSMASH [18] (v6.0.0) and DeepBGC [19] (v0.1.29)was used to process nearly complete prokaryotic species genomes.All potential secondary metabolite biosynthesis genes clusters are summarized in Table S8 (Supporting Information).The gene cluster network based on shared genes was established using a BLASTP [40] search of predicted biosynthetic protein sequences.Shared proteins were defined as a protein alignment in which at least 50% of the query sequence was covered and the amino acid identity was >50%.
Statistical Analysis: Principle coordination analysis (PCOA) was conducted based on the Bray-Curtis dissimilarity on the nonredundant protein families profile using the ade4 package [41] to assess the differences in potential functional genes of MAGs at phylum, class, order, family, genus, and species levels.The effect sizes of bacterial phylogeny on potential functional were estimated based on permutational multivariate analysis of variance (PERMANOVA) via the R vegan package.The p-values were generated by the Wilcoxon rank-sum test unless specifically mentioned.

Figure 1 .
Figure 1.Phylogenetic diversity of the human skin microbiota genome catalog.A) Maximum-likelihood phylogenetic tree of 5777 MAGs assembled in human skin.The outer circle of the clades describes the GTDB phylum annotation.MAGs are annotated by their isolated genome availability (1st layer), where they are classified as known species level (layer 2), 16S rRNA sequence availability (layer 3), from research project information (layer 4), MAGs quality control information (layer 5) and the bars represent the genome size (the outermost layer).B) The proportion of unclassified MAGs at the phylum level is displayed.The number of unclassified species at the phylum level and their proportion at the phylum level are represented in brackets.Diversity of composition and potential functional genes families of unclassified skin microbiota.C) The phylogenetic tree shows 82 (order levels) bacterial clades of the unclassified MAGs.The outer circle of the clades shows the number of MAGs for each clade.D) The average genome size and coding density of all species in each branch are displayed.E) The histogram shows the average number of genes, the average number of gene families, and the number of unknown functional genes of all species in each branch.

Figure 2 .
Figure 2. The potential differences in functions of skin bacteria at phylum level.The heatmap indicates the completeness rate of the KEGG metabolic module of NCPS in different phyla.The color depth shows the completeness ratio of KEGG metabolic modules in NCPS; dark purple, genes with coding all enzymes involved in the module were present; light purple, partially present; white, none.

Figure 3 .
Figure 3. Horizontal gene transfer exists between species of bacteria.A) The innerring and band connecting donor and recipient are colored by the protein family of the gene being transferred, with the width of the band correlating to the number of HGTs.The inner ring is colored with microbial order.The outer ring is colored by microbial phylum.B) Sankey diagram depicts the horizontal transfer between phyla.Left: donor.Right: recipient.The percentage represents the horizontal transfer ratio between the same phyla.

Figure 4 .
Figure 4. Diversity of potential biosynthetic gene clusters in skin bacteria.A) The number of biosynthetic gene clusters found on NCPS in each phylum, colored by putative product types as assigned by antiSMASH.B) The number of NRPS, PKS, NRPS-like, and PKS-like gene clusters found on NCPS of each phylum.C) Network of NRPS and PKS biosynthetic gene clusters, edges connect clusters that share genes.The line thickness increases with the increase of genetic similarity.Only the shared gene network of NRPS and PKS biosynthetic gene clusters predicted at one or two phyla levels is circled in purple.D) Biosynthetic gene clusters and NRPS/ PKS found on each NCPS.Purple: NRPS/PKS, other colors: biosynthetic gene clusters.

Figure 5 .
Figure 5. SNV density analysis of skin microbes.A) Histogram diagram depicts the distribution of the top 25 NCPS ranked by the number of SNVs.The color shows the number of SNVs in synonymous, non-synonymous, and intergenic.The two histograms above show the genome size of the top 25 NCPS and Ka/Ks ratio in different skin environments.The pie chart shows the total number and ratio of SNVs in synonymous, non-synonymous, and intergenic.B) The total gene size, SNVs number (synonymous and non-synonymous), Ka/Ks ratio of function in different skin environments, and