Genome and transcriptome analysis of Bacillus velezensis BS‐37, an efficient surfactin producer from glycerol, in response to d‐/l‐leucine

Abstract Surfactin is one of the most widely studied biosurfactants due to its many potential applications in different fields. In the present study, Bacillus velezensis BS‐37, initially identified as a strain of Bacillus subtilis, was used to efficiently produce surfactin with the addition of glycerol, an inexpensive by‐product of biodiesel production. After 36 hr of growth in glycerol medium, the total surfactin concentration reached more than 1,000 mg/L, which was two times higher than that in sucrose medium. Moreover, the addition of l‐ and d‐Leu to the culture medium had opposite effects on surfactin production by BS‐37. While surfactin production increased significantly to nearly 2,000 mg/L with the addition of 10 mM l‐Leu, it was dramatically reduced to about 250 mg/L with the addition of 10 mM d‐Leu. To systematically elucidate the mechanisms influencing the efficiency of this biosynthesis process, we sequenced the genome of BS‐37 and analyzed changes of the transcriptome in glycerol medium in response to d‐/l‐leucine. The RPKM analysis of the transcriptome of BS‐37 showed that the transcription levels of genes encoding modular surfactin synthase, the glycerol utilization pathway, and branched‐chain amino acid (BCAA) synthesis pathways were all relatively high, which may explain why this strain can efficiently use glycerol to produce surfactin with a high yield. Neither l‐Leu nor d‐Leu had a significant effect on the expression of genes in these pathways, indicating that l‐Leu plays an important role as a precursor or substrate involved in surfactin production, while d‐Leu appears to act as a competitive inhibitor. The results of the present study provide new insights into the synthesis of surfactin and its regulation, and enrich the genomic and transcriptomic resources available for the construction of high‐producing strains.


| INTRODUCTION
Biosurfactants, for their excellent surface activity, temperature tolerance, and biodegradability, have received great attention in the past decade. However, apart from a few biosurfactants such as sophorolipids, rhamnolipids, and mannosyl erythritol lipids (Chen, Juang, & Wei, 2015;Desai & Banat, 1997), their industrial application is largely restricted. Surfactin, a lipopeptide biosurfactant produced by Bacillus strains, is one of the most surface-active biosurfactants, and it has potential applications in medicine, agriculture, and microbial enhanced oil recovery (MEOR) (Ashish & Debnath, 2018;Paraszkiewicz et al., 2017;Rodrigues, Banat, Teixeira, & Oliveira, 2006). However, due to its high production cost, its industrial applications are still underdeveloped. Therefore, improving its yield and reducing its production cost will be of great significance. Most of the efficient surfactin producers reported to date prefer sucrose as the carbon source (Jiao et al., 2017;Yan, Wu, & Xu, 2017). By contrast, glycerol, which is a byproduct of biodiesel production, is a widely available and inexpensive source of carbon, making its conversion into value-added products of great interest (Clomburg & Gonzalez, 2013;Faria et al., 2011). Thus, producing surfactin efficiently utilizing glycerol as carbon source can not only reduce the production cost, but also solve the problem of glycerol valorization in the production of biodiesel.
These enzymes preferentially use amino acid and fatty acid residues present in the cytoplasm of the cell as substrates. The hydrophobic tail of surfactin is a β-hydroxylated fatty acid chain with different lengths and isoforms, and the hydrophilic head is a cyclic heptapeptide containing l-glutamic acid (l-Glu), l-aspartic acid (l-Asp), l-valine (l-Val), l-leucine (l-Leu), and d-leucine (d-Leu) (Seydlova & Svobodova, 2008; Yang, Li, Li, Yu, & Shen, 2015). Therefore, amino acids and fatty acids are of great importance for the synthesis of surfactin, regardless of the synthetic mechanism or structural characteristics of the surfactin subtype. Liu, Yang, Yang, Ye, and Mu (2012) reported that the addition of different amino acids to the cultures had a significant influence on the proportion of surfactin variants with different fatty acids. The peptide moiety contains two l-Leu and two d-Leu residues (Figure 1), and the content of Leu in cells is crucial for the synthesis of surfactin (Etchegaray et al., 2017).
Moreover, the research of Coutte et al. indicated that leucine overproduction significantly improved the production of surfactin in recombinant Bacillus subtilis 168 derivatives (Coutte et al., 2015).
In this study, Bacillus velezensis BS-37 was found to efficiently produce surfactin at a very high level with the addition of glycerol.
Interestingly, the addition of l- and d-Leu to the culture medium as nitrogen source had opposite effects on surfactin production by BS-37.
To better understand the strain BS-37 and the effects of d-/l-Leu on surfactin synthesis, genome sequencing in combination with global transcriptome analysis was employed. The transcription levels of genes related to surfactin biosynthesis, including the modular surfactin synthetase, glycerol utilization pathway, branched-chain amino acid (BCAA) synthesis pathways, and the branched fatty acid (FA) metabolic pathway, were further studied. These results will hopefully provide a theoretical basis for later genetic manipulation of B. velezensis BS-37.

| Bacterial strains and growth conditions
Bacillus velezensis BS-37 is a mutant derivative of strain 723, which was isolated from petroleum-contaminated soil from the Shengli Oil Field, China (Zhu, Xu, Jiang, Huang, & Li, 2014). It was cultivated in 250-ml flasks containing 50 ml of minimal medium (0.2% NH4NO3, 1% with 2% glycerol (MMG) or 2% sucrose (MMS). Different concentrations (0-20 mM) of d-Leu or l-Leu were added to MMG medium, and a group without added amino acids served as the negative control.
The pH of all media was adjusted to 7.5 with 5 M HCl or 5 M NaOH.
Sterilization was conducted at 121°C for 20 min. The cultures were incubated at 37°C and 200 rpm. At various time points, cell density was determined by measuring the OD600 with a spectrometer.
FIGURE 1 Schematic representation of surfactin (Coutte et al., 2015)

| Surfactin production assay
Surfactin production was quantified according to a published HPLC method (Zhu et al., 2014). Briefly, 1 ml samples were withdrawn from the shake flasks, and a cell-free supernatant was obtained by centrifugation at 10,000 g for 10 min. A 300 µl aliquot of the supernatant was withdrawn, diluted fivefold with methanol, and shaken for 1 min. Then, the precipitate was removed by centrifugation for 2 hr at 10,000 g and 4°C.
Subsequently, the supernatant was filtered through a 0.22 μm nylon membrane to remove impurities. The final sample was analyzed by HPLC on a U-3000 system (Thermo Fisher Scientific, USA) equipped with a Synchronis C18 column (4.6 × 250 mm, 5 μm; Thermo Fisher Scientific) and a UV detector (Thermo Fisher Scientific). The analytes were detected at 214 nm, using 90% (v/v) methanol, 10% (v/v) water, and 0.05% trifluoroacetic acid as the mobile phase at a flow rate of 0.8 ml/min. The authentic surfactin reference standard (98%) was purchased from Sigma-Aldrich, USA. The proportion of each surfactin isoform was calculated as the ratio of its HPLC peak area to the sum of the areas of all surfactin peaks.
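The isoform calculation described above can be illustrated with a minimal Python sketch. The function name and the peak areas are hypothetical and serve only to show the arithmetic (area of one peak divided by the summed area of all surfactin peaks); they are not measured values from this study.

```python
def isoform_proportions(peak_areas):
    """Return each isoform's share of the total surfactin peak area.

    peak_areas maps an isoform label to its integrated HPLC peak area
    (arbitrary units).
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: area / total for name, area in peak_areas.items()}


# Hypothetical peak areas for illustration only (not data from the paper).
example_areas = {"C13": 120.0, "C14": 280.0, "C15": 600.0}
proportions = isoform_proportions(example_areas)

for name, share in proportions.items():
    print(f"{name}: {share:.1%}")
```

Because every share is divided by the same total, the proportions always sum to 1, which makes them comparable across samples with different absolute titers.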

| Genome sequencing and annotation
Before sequencing, BS-37 cells cultivated for 24 hr were harvested by centrifugation for 5 min at 10,000 g and 4°C. Genomic DNA was isolated using a Rapid Bacterial Genomic DNA Isolation Kit (GENEWIZ, China). Whole-genome sequencing was performed using a Pacific Biosciences Diamond (Buchfink, Xie, & Huson, 2015). Subsequently, the functions of genes were annotated using the GO database (http://www.geneontology.org/), and the pathways were annotated using the KEGG database (http://www.genome.ad.jp/kegg/). The proteins encoded by the annotated genes were classified based on phylogeny using the COG database (http://www.ncbi.nlm.nih.gov/COG/).

| RNA isolation, library construction, and Illumina sequencing
For RNA extraction, BS-37 cells cultivated for 12 hr, the period of the fermentation when the production rate was the highest, were harvested by centrifugation at 8,000 g for 5 min at 4°C. The resulting pellets were immediately frozen in liquid nitrogen. Total RNA was isolated using the Trizol Reagent (Invitrogen Life Technologies, USA), according to the manufacturer's instructions. RNA quality and integrity were determined using a NanoDrop spectrophotometer (Thermo Scientific) and a Bioanalyzer 2100 system (Agilent, USA).
To remove rRNA, the Ribo-Zero rRNA removal kit (Illumina, San Diego, CA, USA) was employed. First-strand cDNA was synthesized using random oligonucleotides and SuperScript III.
Subsequently, polymerase I and RNase H were used to synthesize the second-strand cDNA. Remaining overhangs were converted to blunt ends by polymerase. After adenylation of the 3′ ends of the DNA fragments, Illumina PE adapter oligonucleotides were ligated to prepare the fragments for hybridization. To select cDNA fragments of the preferred length of 300 bp, the library fragments were purified using the AMPure XP system (Beckman Coulter, Beverly, CA, USA). DNA fragments with ligated adaptor molecules on both ends were selectively enriched using Illumina PCR Primer Cocktail in a 15-cycle PCR. The products were then purified with the AMPure XP system (Beckman Coulter) and quantified using the Agilent high sensitivity DNA assay on a Bioanalyzer 2100 system (Agilent). Finally, the resulting library was sequenced on a NextSeq 500 platform (Illumina) by Shanghai Personal Biotechnology Co. Ltd.

| Mapping reads to the reference genome and normalized gene expression
The raw sequencing reads were preprocessed by removing rRNA reads, sequencing adapters, short-fragment reads, and other low-quality reads. The remaining clean reads were mapped to the reference genome of BS-37 using Bowtie 2 software based on the local alignment algorithm (Langmead & Salzberg, 2012). Expression levels were normalized for gene length and library size by calculating the RPKM value (reads per kilobase per million mapped reads) (Frazee et al., 2015). DESeq software was used to identify differentially expressed transcripts (Wang, Feng, Wang, Wang, & Zhang, 2010). Multiple hypothesis testing was corrected for by controlling the false discovery rate (FDR).
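As a minimal sketch of the RPKM normalization named above, the following Python function applies the standard definition (reads mapped to a gene, divided by gene length in kilobases and by total mapped reads in millions). The function name and the example numbers are illustrative assumptions, not values or code from this study.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads.

    RPKM = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 2,000 reads on a 1,000-bp gene
# in a library of 10 million mapped reads.
example_value = rpkm(read_count=2000, gene_length_bp=1000,
                     total_mapped_reads=10_000_000)
print(example_value)
```

Dividing by both gene length and library size lets expression values be compared between genes of different lengths and between samples sequenced to different depths.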

| Reconstruction of the KEGG pathway map
Functional annotation descriptions were assigned by BLASTP (Altschul et al., 1998) against the KEGG database (E-value cutoff of 1E-10). Subsequently, the KEGG map was manually redrawn using Adobe Illustrator CC 2014 (Adobe Systems, USA). The expression levels of genes involved in the biosynthesis of surfactin were visualized as heat maps using GraphPad Prism 7 (GraphPad Software, Inc., La Jolla, CA, USA).
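The E-value cutoff can be illustrated with a short Python sketch that filters BLASTP hits. It assumes the hits are available in the 12-column tab-separated format that BLAST+ writes with `-outfmt 6`, where the E-value is the 11th column; the paper does not state which output format was used, and the query and subject identifiers below are hypothetical. Treating a hit exactly at the cutoff as accepted is also an assumption.

```python
E_VALUE_CUTOFF = 1e-10  # cutoff stated in the text


def filter_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Keep (query, subject, evalue) for hits with evalue <= cutoff.

    Assumes BLAST tabular output columns:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue <= cutoff:
            hits.append((fields[0], fields[1], evalue))
    return hits


# Hypothetical hits for illustration only.
example_lines = [
    "query_gene_1\tsubject_A\t85.0\t400\t60\t0\t1\t400\t1\t400\t1e-120\t700",
    "query_gene_2\tsubject_B\t30.0\t90\t63\t2\t5\t94\t10\t99\t2e-05\t45.1",
]
kept = filter_hits(example_lines)
print(kept)
```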

| Genomic analysis of the strain BS-37
The optimal carbon source for strain BS-37 was found to be glycerol, and its fermentation products contained more than 50% of the desirable surfactin isoform C15 (Liu, Lin, Wang, Huang, & Li, 2015; Yi et al., 2017). In order to investigate the genomic organization of the metabolic pathways responsible for glycerol utilization and biosynthesis of surfactin, the complete genome of strain BS-37 was sequenced. This strain was initially incorrectly identified as B. subtilis. Subsequently, complete genome sequencing demonstrated that it is a strain of B. velezensis. The principal features of the BS-37 genome are shown in Figure (Table 1). Among the CDS, 3,395 (88.27%) were classified into 21 "clusters of orthologous groups" (COG) functional categories. The functions of most genes were associated with important transcription and metabolism pathways, such as amino acid, carbohydrate, inorganic ion and energy production and conversion pathways (Table 2).
The putative key genes of glycerol and sucrose metabolism in BS-37 were identified by whole-genome analysis (Figure 3). The amino acid sequences encoded by these genes were compared with those of B. subtilis 168 and MT45. In sucrose metabolism, five key enzymes catalyzing the steps from sucrose to glyceraldehyde-3-phosphate were compared, revealing that sucrose transport, hydrolysis, and fructokinase are represented by different isozymes. In fact, there were large differences between strain 168 and BS-37, with SacA, SacP, and GmuE having sequence similarities at the protein level of <80%. By contrast, the similarity with MT45 was higher than 90% in all cases. In glycerol metabolism, three key proteins were compared with those of 168 and MT45, and their protein sequence similarity exceeded 90% (Table 3).
FIGURE 2 Circular representation of the Bacillus velezensis BS-37 genome. The circular map consists of five circles. From the outermost circle inwards, the circles show the G + C ratio, reverse CDS, forward CDS, rRNA/tRNA, and long segment repeats, respectively
The results revealed that up to 1,000 mg/L of surfactin accumulated after 36 hr of fermentation in minimal medium with glycerol (MMG). By contrast, the yield only reached about 350 mg/L in minimal medium with sucrose (MMS; Figure 4a). In addition, this strain reached an OD600 of 3.5 in MMG, which was twice as high as in MMS (Figure 4b). These results indicate that BS-37 prefers glycerol over sucrose in terms of both growth and efficient surfactin production. Notably, the pH was constant during the 60 hr of cultivation.

| Effects of supplementing d- and l-Leu on surfactin production
To investigate the influence of d-/l-Leu on the growth of BS-37 and the yield of surfactin, a preliminary study of the addition of d-/l-Leu at a series of concentrations was conducted. All the production and biomass data were collected after 36 hr of fermentation.
With the increase in d-/l-Leu concentration, surfactin production differed greatly. The surfactin yield improved with increasing concentrations of l-Leu and, with the supplementation of 10 mM l-Leu, reached a maximum of about 2,000 mg/L, twice that of the control (Figure 5a). Adding l-Leu increased the percentage of the surfactin fatty acid chain length variants iso-C13 and iso-C15 (Figure 5b), as had already been reported (Liu, Yang, Yang, Ye, & Mu, 2012). However, the addition of d-Leu unexpectedly inhibited the production of surfactin. With the increase in d-Leu concentration, the surfactin yield gradually decreased, reaching a low titer of 250 mg/L with the supplementation of 10 mM d-Leu (Figure 5c). Interestingly, the inhibitory effect of 10 mM d-Leu on surfactin accumulation could be relieved by adding 5 mM l-Leu, which restored the surfactin titer to 1,000 mg/L.
The surfactin titer could even be increased to 1,600 mg/L by adding 10 mM l-Leu in addition to 10 mM d-Leu (Figure 5d).
FIGURE 3 Key genes of the pathways channeling sucrose and glycerol into the glycolytic pathway of Bacillus sp.
TABLE 3 Analysis of amino acid identity of the putative key genes for glycerol and sucrose metabolism

| Transcriptome analysis of the effects of adding d- or l-Leu
To reveal the transcriptomic correlates of the overproduction, we Three samples of each group were extracted and sequenced, generating over ten million reads for each sample with clean-read ratios >85%. The sequence data are summarized in Table 4. In each sample, over 98% of the clean reads could be mapped to the reference genome, which indicates that the transcriptome data provide sufficient support for further analysis.
Differentially expressed genes (DEGs) in response to l-/d-Leu are listed in Table 5. Compared with the control, five genes were upregulated and seven genes were downregulated when 10 mM l-Leu was added; when 10 mM d-Leu was added, 17 genes were upregulated and 25 were downregulated. Interestingly, the expression level of the azlBCD operon (comprising the genes azlB, azlC, and azlD), which is responsible for the transport of leucine, differed little between the control and the cultures with l-/d-Leu. Sporulation can be induced in B. subtilis by starvation of carbon, nitrogen, and phosphorus. In this study, four genes involved in sporulation (cotV, cotX, rpsR, and cgeB) were found to be downregulated after adding d-Leu. Furthermore, the key genes related to amino acid and purine metabolism, such as argD, argB, purC, pyrB, pyrR, and guaC, were upregulated in the cultures with d-Leu.
These products provide the energy required for cell proliferation, offering an explanation for the higher biomass in the presence of d-Leu compared to the control in the early stages (Table 6).

| Construction of the KEGG pathway map for surfactin biosynthesis
In order to better understand the biosynthesis of surfactin in glycerol medium, we reconstructed the whole metabolic pathway of  (Table 6).
The transcripts were assessed according to their transcription levels and classified into high expression (RPKM ≥ 500), medium expression (100 ≤ RPKM < 500), and low expression (RPKM < 100) groups. According to their function, the genes were grouped into a glycolysis metabolism module, a tricarboxylic acid cycle module, a branched-chain amino acid metabolism module, a branched fatty acid biosynthesis module, and a modular enzymatic synthesis of surfactin module.
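The three expression groups defined above can be expressed as a small Python helper. The function name is hypothetical; the thresholds (500 and 100) are the ones stated in the text, and the glpD value used in the example is the RPKM reported in this section.

```python
def expression_group(rpkm_value):
    """Classify a transcript by RPKM using the thresholds from the text:
    high (RPKM >= 500), medium (100 <= RPKM < 500), low (RPKM < 100)."""
    if rpkm_value < 0:
        raise ValueError("RPKM cannot be negative")
    if rpkm_value >= 500:
        return "high"
    if rpkm_value >= 100:
        return "medium"
    return "low"


# glpD is reported with an RPKM of 3,913.68 in this section.
glpd_group = expression_group(3913.68)
print(glpd_group)
```

Checking the upper threshold first keeps the boundary values unambiguous: an RPKM of exactly 500 is "high" and exactly 100 is "medium", matching the inequalities in the text.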
Modules 1, 2, and 3 of the pathway correspond to glycerol utilization, the subsequent glycolysis pathway, and TCA cycle, which provide the basis for the metabolism of the cell. The genes involved in the utilization of glycerol, including glpF, glpK, and glpD, were relatively highly expressed (Holmberg, Beijer, Rutberg, & Rutberg, 1990). Particularly, the transcription level (RPKM) of glpD, encoding glycerol-3-phosphate dehydrogenase, which is the key enzyme in the glycolysis pathway when cells are grown on glycerol, reached 3,913.68, which puts it into the top 1% of all transcription levels.
Most of the genes involved in glycolysis and the TCA cycle were also highly expressed to provide enough energy and carbon moieties for downstream metabolic pathways. These include pyruvate,
The precursor of straight-chain fatty acids is acetyl-CoA, while the precursors of branched-chain fatty acids are catabolic products of l-Val, l-Leu, and l-Ile (Kaneda, 1991). These are converted into three different branched-chain CoAs by the action of aminotransferase
Furthermore, many studies on replacing the promoter of the srfA operon to abrogate the effect of quorum sensing on the synthesis of surfactin have been reported (Jiao et al., 2017; Sun et al., 2009; Willenbacher et al., 2016). Unfortunately, the results of these attempts were far from satisfactory and did not lead to high surfactin yields, indicating that a complex regulatory network controls srfA expression, so that modifying the srfA operon is not a good choice (Jung et al., 2012). In module 5, the srfA operon (srfAA, srfAB, srfAC, srfAD) was expressed at a very high level in BS-37, implying that the native surfactin synthase (SrfA) promoter of BS-37 is a strong promoter (Guan et al., 2015).
synthesis pathway, were all at relatively high levels. Furthermore, adding d-/l-Leu to the cultures did not affect the expression of genes associated with these pathways and the competitive inhibition effects of d-Leu could be relieved by l-Leu supplementation, implying that l-Leu improved surfactin production by acting solely as a precursor or substrate, while d-Leu decreased the yield by acting as a competitive inhibitor. The high transcription levels of genes encoding modular surfactin synthase, the glycerol utilization pathway, and branched-chain amino acid (BCAA) synthesis pathways can provide great support for the establishment of a B. velezensis cell factory for the production of surfactin with high yield and at low cost.

ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (No. 21576133) and the Jiangsu Synergetic Innovation Center for Advanced Bio-Manufacture.

CONFLICT OF INTERESTS
None declared. Hu contributed equally.

ETHICS STATEMENT
None required.

DATA ACCESSIBILITY
Genome data are deposited in GenBank (accession number: CP023414.1).