Genomic features of lactic acid bacteria effecting bioprocessing and health


  • Todd R. Klaenhammer,

    Corresponding author
    1. Departments of Food Science and Microbiology, North Carolina State University, Box 7624, Raleigh, NC 27695-7624, United States
    2. Functional Genomics Program, Southeast Dairy Foods Research Center, Raleigh, NC 27695, United States
    • Corresponding author. Tel.: +1 919 515 2972; fax: +1 919 513 0014.

  • Rodolphe Barrangou,

    1. Departments of Food Science and Microbiology, North Carolina State University, Box 7624, Raleigh, NC 27695-7624, United States
    2. Functional Genomics Program, Southeast Dairy Foods Research Center, Raleigh, NC 27695, United States
  • B. Logan Buck,

    1. Departments of Food Science and Microbiology, North Carolina State University, Box 7624, Raleigh, NC 27695-7624, United States
  • M. Andrea Azcarate-Peril,

    1. Departments of Food Science and Microbiology, North Carolina State University, Box 7624, Raleigh, NC 27695-7624, United States
  • Eric Altermann

    1. Departments of Food Science and Microbiology, North Carolina State University, Box 7624, Raleigh, NC 27695-7624, United States


The lactic acid bacteria are a functionally related group of organisms known primarily for their bioprocessing roles in food and beverages. More recently, selected members of the lactic acid bacteria have been implicated in a number of probiotic roles that impact general health and well-being. Genomic analyses of multiple members of the lactic acid bacteria, at the genus, species, and strain level, have now elucidated many genetic features that direct their fermentative and probiotic roles. This information is providing an important platform for understanding core mechanisms that control and regulate bacterial growth, survival, signaling, and fermentative processes and, in some cases, potentially underlying probiotic activities within complex microbial and host ecosystems.


Lactic acid bacteria (LAB) are a heterogeneous family of microorganisms that can ferment a variety of nutrients [1] primarily into lactic acid. They are mainly Gram-positive, anaerobic bacteria, non-sporulating, and acid tolerant. Biochemically, LAB include both homofermenters and heterofermenters. The former produce primarily lactic acid, while the latter yield also a variety of fermentation by-products, including lactic acid, acetic acid, ethanol, carbon dioxide and formic acid [2,3]. Although their primary contribution centers on rapid acid production and acidification of food products, they also contribute to flavor, texture and nutrition [3]. LAB are found naturally in a variety of environmental habitats, including dairy, meat, vegetable, cereal and plant environments, where fermentation can occur. Historically, the traditional roles for many LAB have been as starter cultures to drive food and dairy fermentations, leading to their widespread human consumption and generally recognized as safe (GRAS) status.

It was nearly 100 years ago that the Russian scientist Elie Metchnikoff, then at the Pasteur Institute, proposed that lactic bacteria in fermented milk could promote the development of a healthy intestinal microbiota. Specifically, the Nobel laureate developed a theory that lactic acid bacteria in the digestive tract could prolong life by preventing putrefaction. Since that time, it has been recognized that some LAB and the high G + C content Gram-positive bifidobacteria are also found naturally within human and animal cavities, including the gastrointestinal tract (Lactobacillus acidophilus, Lactobacillus gasseri, Lactobacillus johnsonii, Lactobacillus plantarum, Streptococcus agalactiae, Enterococcus faecalis), the oral cavity (Streptococcus mutans, Bifidobacterium longum), and the vaginal cavity (B. longum, S. agalactiae, Lactobacillus crispatus) [4–6]. LAB are considered to be important components of the normal intestinal microbiota, which contribute to a variety of functions including intestinal integrity, immunomodulation, and pathogen resistance. Selected groups of Lactobacillus and Bifidobacterium are used widely as probiotics, primarily in dairy products and dietary supplements [5,7,8].

Defined most recently as “live microorganisms which, when administered in adequate amounts, confer a health benefit on the host” [9], probiotic cultures have been found useful in the maintenance of gastrointestinal (GI) health including the treatment of diarrheal diseases and preservation of intestinal integrity and mobility. Recent evidence from in vitro systems, animal models, and clinical studies suggests that LAB can enhance both specific and non-specific immune responses, possibly by activating macrophages, altering cytokine expression, increasing natural killer cell activity, and/or increasing levels of immunoglobulins [5,10,11]. However, the mechanisms through which these LAB function as immunomodulators are not characterized and specific reactions can be highly variable among different strains.

In the recent past, substantial progress has been achieved in microbial genomics, particularly in genome sequencing. To date, over 229 complete microbial genomes (prokaryotes and archaea) have been published (NCBI website), covering a wide diversity of taxonomic groups. Early microbial genome analyses indicate that genomic content reflects an organism's metabolism, physiology, biosynthetic capabilities, and adaptability to varying conditions and environments. In the case of the LAB, genome analysis is revolutionizing our view of their metabolic processes, bioprocessing capabilities and potential roles in health and well-being.

2 Genome characteristics

The published genome sequences of the lactic acid bacteria and bifidobacteria include Lactococcus lactis [12], S. mutans [13], S. pneumoniae [14], S. agalactiae [15], S. pyogenes [16], S. thermophilus [17], B. longum [18], L. plantarum [19], L. johnsonii [20] and L. acidophilus [21]. The 11 draft genomes represented by the Lactic Acid Bacteria Genomics Consortium [22,23] have recently been completed (unpublished results). The LAB analyzed were Lactobacillus brevis, L. casei, L. gasseri, Lc. cremoris, Leuconostoc mesenteroides, Oenococcus oeni, Pediococcus pentosaceus, and S. thermophilus. For the LAB and bifidobacteria, probiotic organisms, and other related industrial microbes, genome features are presented in Table 1.

Table 1. Genomes of lactic acid bacteria and other industrially used species

| Genus | Species | Strain | Size (Mbp) | %GC | Status | Reference |
|---|---|---|---|---|---|---|
| Bifidobacterium | longum | NCC2705 | 2.3 | 60.1 | C | [18] |
| | longum | DJ010A | 2.4 | 59 | IP | JGI |
| Brevibacterium | linens | BL2/ATCC9174 | 4.4 | 60.9 | IP | JGI |
| Enterococcus | faecalis | V583 | 3.2 | 37.5 | C | [24] |
| Lactobacillus | acidophilus | NCFM | 2.0 | 34.7 | C | [21] |
| | gasseri | ATCC33323 | 1.8 | 35.1 | IP | JGI |
| | johnsonii | NCC533 | 2.0 | 34.6 | C | [20] |
| | plantarum | WCFS1 | 3.3 | 44.5 | C | [19] |
| | casei | ATCC334 | 2.5 | 41.1 | IP | JGI |
| | casei | BL23 | 2.6 | 4.6 | IP | [23] |
| | rhamnosus | HN001 | 2.4 | 46.4 | IP | Lubbers et al. (unpublished) |
| | helveticus | CNRZ32 | 2.4 | 37.1 | IP | Steele et al. (unpublished) |
| | helveticus | CM4 | 2.0 | 37 | C | Shinoda (unpublished) |
| | sakei | 23K | 1.9 | 41.2 | C | [23] |
| | delbrueckii | ATCCBAA365 | 2.3 | 45.7 | IP | JGI |
| | delbrueckii | ATCC11842 | 2.3 | 50 | IP | [23] |
| | delbrueckii | DN-100107 | 2.1 | | IP | [23] |
| | reuteri | | | | IP | JGI |
| | salivarius | UCC118 | | | IP | [23] |
| | brevis | ATCC367 | 2.0 | 43.1 | IP | JGI |
| Lactococcus | lactis ssp. lactis | IL1403 | 2.3 | 35.4 | C | [12] |
| | lactis ssp. cremoris | SK11 | 2.3 | 30.9 | IP | JGI |
| | lactis ssp. cremoris | MG1363 | 2.6 | 37.1 | IP | [23] |
| Leuconostoc | mesenteroides | ATCC8293 | 2.0 | 37.4 | IP | JGI |
| Oenococcus | oeni | ATCCBAA331 | 1.8 | 37.5 | IP | JGI |
| | oeni | IOEB8413 | 1.8 | 37.9 | IP | [23] |
| Pediococcus | pentosaceus | ATCC25745 | 2.0 | 37.0 | IP | JGI |
| Propionibacterium | freudenreichii | ATCC6207 | 2.6 | 67.4 | IP | [23] |
| Streptococcus | agalactiae | 2603V/R | 2.2 | 35.7 | C | [15] |
| | mutans | UA159 | 2.0 | 36.8 | C | [13] |
| | pneumoniae | TIGR4 | 2.2 | 39.7 | C | [14] |
| | pyogenes | M1 | 1.9 | 38.5 | C | [16] |
| | thermophilus | LMD9 | 1.8 | 36.8 | IP | JGI |
| | thermophilus | LMG18311 | 1.9 | 39 | C | [17] |
| | thermophilus | CNRZ1066 | 1.8 | 39 | C | [17] |

Adapted from [22,23,25]. C, complete; IP, in progress; JGI, Joint Genome Institute.

LAB and bifidobacteria are Gram-positive bacteria with low and high GC content, respectively, with small genomes ranging in size between 1.8 and 3.3 Mb (Table 1). For those species where complete genomes are published and annotated, a broad picture emerges of conserved and varying biosynthetic and metabolic capabilities. Glycolysis enzymes are uniformly represented among members of the LAB. A recent transcriptional analysis of global gene expression by L. acidophilus during growth on eight different carbohydrates revealed that genes of the glycolytic pathway were among the most highly expressed genes within the genome [25] (Fig. 1). Since LAB recover their primary energy via glycolysis, it seems likely that this is a universal feature. Genome analyses have shown that lactobacilli, bifidobacteria, streptococci, and lactococci possess broad saccharolytic potentials, which reflect the nutrient diversity provided by the range of environments they inhabit [13,14,18–21]. Analysis of the L. plantarum genome revealed many transporters, particularly PTS (phosphotransferase system) transporters (25), correlating with the organism's broad capacity to metabolize varied carbohydrates from different environments [19]. In particular, a “lifestyle adaptation island” was defined over a 213 kb region that harbored genes involved in sugar transport and metabolism. Similarly, the diversity of transporters in S. mutans and S. pneumoniae has been associated with an increased ability to utilize nutrient sources present in their environments, namely the oral cavity and respiratory tract [13,14]. Analysis of the L. johnsonii [20], L. acidophilus [21], and L. gasseri genomes further substantiates these observations, showing a preponderance of PTS transporters and only 2 to 3 ABC (ATP-binding cassette) transporters identified for maltose and complex carbohydrates like fructooligosaccharide and raffinose (Table 2). An overview of the carbohydrate transporters found in L. acidophilus is shown in Fig. 2, where PTS systems were predominant and lactose and galactose were predicted to be transported by a galactoside permease [25]. A recent analysis of 9 LAB genomes for transporter capabilities revealed that 13–17% of their total genes encoded transport proteins (Lorca, G.L., Zlotopolski, V., Tran, C., Winnen, B., Hvorup, R.N., Nguyen, E., Huang, L.-W., and Saier, M.H., unpublished). This proportion was larger than observed for most bacteria. Interestingly, amino acid uptake systems predominated over sugar and peptide uptake systems.

Figure 1.

Hierarchical clustering analyses of gene expression patterns (left panel). The expression of 1889 genes (vertically) after growth on eight carbohydrates (horizontally) is shown colorimetrically. Least squares means, representing overall gene expression level corrected for systematic and random errors, are colored from low (blue) to high (red); hierarchical clustering of least squares means allows visualization of the relative expression levels of all genes within each treatment. Lanes from left to right are fructooligosaccharides; fructose; galactose; glucose; lactose; raffinose; sucrose; and trehalose. Microarrays were carried out using PCR products of predicted ORFs [26]. Expression of glycolysis genes (right panel). In a whole genome array, the global transcription responses during growth on eight different carbohydrates are denoted for d-lactate dehydrogenase (d-LDH, La55), phosphoglycerate mutase (PGM, La185), l-lactate dehydrogenase (l-LDH, La271), glyceraldehyde 3-phosphate dehydrogenase (GPDH, La698), phosphoglycerate kinase (PGK, La699), glucose 6-phosphate isomerase (GPI, La752), 2-phosphoglycerate dehydratase (PGDH, La889), phosphofructokinase (PFK, La956), pyruvate kinase (PK, La957), and fructose-bisphosphate aldolase (FBPA, La1599). Expression is shown on a color scale from low to high [25].

Table 2. Carbohydrate utilization profiles and predicted transporters for lactobacilli with complete genomes (from [25])
  1. a Lactobacillus acidophilus.

  2. b Lactobacillus plantarum.

  3. c Lactobacillus johnsonii.

  4. d Lactobacillus gasseri.

  5. eDetermined by fermentation patterns obtained from API50CHO (BioMerieux, Durham, NC).

  6. fPTS, phosphoenolpyruvate phosphotransferase system transporter.

  7. gGPH, galactoside pentose hexuronide permease.

  8. hABC, ATP-binding cassette transporter.

PentosesArabinose Yese  
Ribose Yes   
DisaccharidesCellobiosePTS PTSPTS
GentiobiosePTSPTS Yes 
MelibiosePTSYes Yes 
SucrosePTSPTS Yes 
Turanose Yes   
OligosaccharidesFOSABC  Yes
Melezitose Yes   
Sugar alcoholsGalactitol PTS  
Mannitol PTS   
Sorbitol PTS   
Rhamnose Yes   
Modified SugarsAmygdalinYesPTSYesYes
ArbutinPTSPTS Yes 
Esculin PTSYesYes 
Gluconate PTS   
Figure 2.

Transporters and pathways predicted for carbohydrate utilization by Lactobacillus acidophilus. The diagram shows transporters, hydrolases and glycolysis enzymes, as predicted by the putative genome annotation. Gene and enzyme numbers are indicated for each enzymatic reaction. For transporters, red indicates a putative PTS transporter; green, a putative ABC transporter; and yellow, a galactoside permease. For sugars, identical compounds share the same color (from [21] with permission).

Although amino acid biosynthetic pathways are complete in L. lactis, they are deficient in varying levels in most other LAB. L. plantarum is missing only a few synthetic pathways including those for branched chain amino acid synthesis [19], whereas species of the “L. acidophilus complex” (L. gasseri, L. johnsonii and L. acidophilus) [27] are largely deficient in amino acid biosynthetic capacity [20,21]. Compensating for these deficiencies, the lactobacilli generally encode a large number of peptidases, amino acid permeases, and multiple oligo-peptide transporters that could support efficient processing and recovery of amino acids from nutritionally rich environmental sources. However, of the intestinal lactobacilli (including comparisons with L. gasseri and L. plantarum), only L. acidophilus and L. johnsonii were found to encode the cell wall-associated proteinase, PrtP. Interestingly, the gene predicted to encode the maturation protein, PrtM, was found in all these Lactobacillus genomes.

For the LAB, their genomes are collectively of low GC content and relatively small. Those species with the smallest genomes can be highly auxotrophic and deficient in a number of biosynthetic pathways, corresponding to their apparent adaptation to nutritionally rich environments [20,21]. For the lactobacilli, a summary of biosynthetic pathways (amino acids, nucleotides, fatty acids and vitamins) illustrated that L. gasseri and L. johnsonii exhibit the fewest metabolic biosynthetic pathways (6–8), whereas L. acidophilus shows a higher number at 14 pathways (K. Makarova, E. Koonin and LABGC, unpublished data). In contrast, L. plantarum encodes a more complete complement of biosynthetic pathways (22), supporting its more diverse metabolic capabilities. In this regard, a direct comparison of metabolic pathways via Kyoto Encyclopedia of Genes and Genomes (KEGG) between L. plantarum and L. johnsonii by Boekhorst et al. [28] highlighted the biosynthetic deficiencies in L. johnsonii and expanded capacity of L. plantarum.

In the current analysis of the complete LAB genomes, it has been suggested that adaptation to nutritionally rich environments (e.g., milk, human GI tract) has promoted genome simplification and degradation for some species. Notably, in the recent genome analysis of two S. thermophilus strains, Bolotin et al. [17] found that 10% of the genes were pseudogenes, rendered non-functional by frameshifts, nonsense mutations, deletions, or truncations. Evidence for genome decay was particularly noted for genes involved in carbohydrate metabolism, uptake and fermentation. In contrast, a specific symporter for lactose was found in S. thermophilus that was absent from the pathogenic streptococci. It was suggested that adaptation of S. thermophilus to milk resulted in degradation of many genes (including any pathogenic genes) that were dispensable as this organism evolved to this specialized environment [17]. Interestingly, evidence was also presented for horizontal gene transfer between S. thermophilus and other organisms co-occupying the dairy environment. A 17 kb region was identified that contained multiple copies of IS1191 and a mosaic of fragments with over 90% identity to L. bulgaricus and L. lactis. Among them was a unique copy of metC, which allows biosynthesis of methionine, a rare amino acid in milk.

For L. lactis, considerable evidence has demonstrated that members of this species have acquired plasmid DNA elements encoding critical functions for growth and competition in a milk environment, such as lactose metabolism, proteolytic activity, bacteriocin production, exopolysaccharide production and resistance to bacteriophages [29]. In addition, it is also apparent that horizontal gene transfer has introduced important functions to the genomes of a number of LAB that are expected to promote their competition in these environments. Genes encoding sugar transporters and carbohydrate hydrolases can represent a large portion of strain-specific genes that have been acquired by horizontal gene transfer. It has been suggested previously that selected genes involved in sugar transport, catabolic properties, and exopolysaccharide synthesis in L. plantarum [19] have been acquired via horizontal gene transfer, as part of the adaptation process of this organism to a diverse number of environments (e.g., plants, cereals, GI tract). Evidence supporting horizontal gene transfer for these regions includes their grouped position near the origin of replication, lowered GC content (41.5% versus 44.5%; Fig. 5), and high variability as to the presence or absence of these genes among different strains of L. plantarum [19,23]. The accumulating evidence suggests that evolution of the LAB to nutritionally complex environments has been driven by two major processes: first, gene degradation and loss of dispensable functions from ancestral types [17], and second, gene acquisition via horizontal gene transfer and duplication of important capabilities [17,19–21, D.A. Mills, K. Makarova, and E. Koonin, and LABGC, unpublished data].
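The GC-content criterion mentioned above can be illustrated with a simple sliding-window calculation. The following is a minimal sketch and not the analysis performed in [19,23]: the function names, window size, step, and the threshold of 3 percentage points (chosen to mirror the 41.5% versus 44.5% difference quoted above) are all assumptions made for illustration.

```python
def gc_percent(seq):
    """Return the GC content of a DNA string as a percentage."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def low_gc_windows(genome, window=1000, step=500, min_drop=3.0):
    """Yield (start, end, gc) for windows whose GC content lies at least
    `min_drop` percentage points below the genome-wide average.
    All default values are illustrative assumptions, not published settings."""
    average = gc_percent(genome)
    for start in range(0, max(len(genome) - window + 1, 1), step):
        chunk = genome[start:start + window]
        gc = gc_percent(chunk)
        if average - gc >= min_drop:
            yield start, start + len(chunk), gc


# Toy example: a 1 kb low-GC "island" embedded in a 50% GC background.
background = "GCAT" * 500
island = "ATATATGC" * 125
toy_genome = background + island + background
flagged = [start for start, _, _ in low_gc_windows(toy_genome)]
```

Low GC content alone does not demonstrate horizontal transfer; as noted above, it is considered together with genomic position and strain-to-strain variability.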

Figure 5.

Circular plot of genome diversity found in 20 L. plantarum strains isolated from different environments, using the method of DNA–DNA hybridization to L. plantarum WCFS1 microarrays (D. Molenaar, R.J. Siezen, M. Kleerebezem, unpublished). From outside to inside: ring 1, base deviation index (BDI), from low (red) via intermediate (yellow) to high (green); ring 2, DNA variability, from low (present in all 20 strains) to high (absent in 1–19 strains compared to WCFS1); ring 3, gene clusters for plantaricin biosynthesis, non-ribosomal peptide biosynthesis (NRPS), prophages, polysaccharide biosynthesis, nitrate respiration and sugar metabolism; ring 4, GC%. This picture was generated with the Microbial Genome Viewer (from [23], reprinted with permission).

In addition, there are numerous examples of gene duplications and multiple copies of related genes predicted to direct important functions in the genomes of the sequenced LAB. Examples include PTS transporters, β- and phospho-β galactosidases, lactic dehydrogenases, peptidases, and oligopeptide and amino acid transporters [19–21]. Also notable are the multiple copies of homologs for mucus-binding (Mub) proteins found in L. gasseri (7), L. acidophilus (5), L. johnsonii (4), and L. plantarum (4). Mub was first discovered in L. reuteri, where Tuomola et al. [30] reported a 358-kDa surface protein able to bind to mucin glycoproteins. The predicted Mub proteins, now revealed in the genomes of many intestinal lactobacilli, are unusually large proteins ranging in size from 1000 to 4300 amino acids and often represent the largest open reading frames (ORFs) in the genome. While they are similar in their large size and the presence of multiple repeats (4–6) of 20 aa sequences [20], their amino acid identity is relatively low at 24–38%, indicating considerable sequence variability within surface proteins presumed to serve important and similar roles in mucus binding. In this regard, Altermann et al. [21] found that 6.6% of unclassified COGs (clusters of orthologous groups) in the L. acidophilus genome were represented in five distinct regions (Fig. 3, COG regions I, II, III, IV, V). All the genes within these regions were predicted to be involved with host recognition or epithelial adherence, including mucus binding, fibronectin binding, and other cell surface associated proteins. A region was identified in L. johnsonii that contained a mub gene within a predicted nine-gene operon that included a large serine-rich protein with homology to a Streptococcus fimbrial adhesin. This unique region in L. johnsonii [20] was positioned within a common locus that has now been found in L. acidophilus, L. gasseri, and L. plantarum (Fig. 4; E. Altermann and T.R. Klaenhammer, unpublished data).

Figure 3.

Genome atlas of L. acidophilus NCFM. The atlas represents a circular view of the complete genome sequence of L. acidophilus NCFM. The right-hand legend describes the single circles in the top-down-outermost-innermost direction. The circle was created using Genewiz [73] and software developed in house [21]. Circle 1, innermost, GC-skew. Circle 2, COG classification. Predicted ORFs were analyzed using the COG database and grouped into the four major categories. 1, Information storage and processing; 2, Cellular processes and signaling; 3, Metabolism; 4, Poorly characterized; and 5, ORFs with uncharacterized COGs or no COG assignment. Circle 3, ORF orientation. ORFs in sense orientation (ORF+) are shown in blue; ORFs oriented in anti-sense direction (ORF−) in red. Circle 4, Blast similarities. Deduced amino-acid sequences compared against the non-redundant (nr) database using gapped BlastP. Regions in blue represent unique proteins in NCFM, whereas highly conserved features are shown in red. The degree of color saturation corresponds to the level of similarity. Circle 5, G + C content deviation. Deviations from the average GC-content are shown in either green (low GC spike) or orange (high GC spike). A box filter was applied to visualize contiguous regions of low or high deviations. Circle 6, Ribosomal machinery. tRNAs, rRNAs and ribosomal proteins are shown as green, cyan, or red lines, respectively. Clusters thereof are represented as colored boxes to maintain readability. Circle 7, Mobile elements. Predicted transposases are shown as light purple dots, phage-related integrases as orange dots. Circle 8, Stress response. Genes involved in general stress response, including chaperones, and genes involved in heat shock, DNA repair, and pH regulation, are shown in dark purple. Circle 9, Peptide and amino acid utilization. Proteases and peptidases are shown in green, non-sugar-related transporters in light blue dots. Circle 10, outermost, two-component regulators (2CRS). Each 2CRS, consisting of a response regulator and a histidine kinase, is represented as brown dots. In circles 7–10 each full dot represents one predicted ORF and clusters of ORFs are represented by stacked dots. Selected features representing single ORFs and ORF clusters are shown outside of circle 10 with bars indicating their absolute size. Origin and terminus of DNA replication are identified in green and red, respectively. Other features are: SlpA and B (S-layer proteins), CdpA (Cell division protein), sugar utilization (Sucrose, FOS, Trehalose, Raffinose), LacE (PTS-sugar transporter), BshA and B (Bile salt hydrolases), Mub-909 to Mub-1709 (mucus-binding proteins, numbers correspond to the La-number scheme), FbpA (fibronectin-binding protein), Cfa (Cyclopropane fatty acid synthase), Fibronectin_binding (fibronectin-binding protein cluster), EPS_cluster (Exopolysaccharides), Lactacin_B (bacteriocin), pauLA-I to pauLA-III (potential autonomous units), and prLA-I and prLA-II (phage remnants).
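The GC-skew track of such an atlas rests on a simple quantity, (G − C)/(G + C), evaluated window by window along the chromosome. The sketch below shows that calculation only; it is not the Genewiz implementation, and the function names and the non-overlapping window scheme are assumptions made for illustration.

```python
def gc_skew(seq):
    """GC skew (G - C) / (G + C) of a DNA string; 0.0 if it contains no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(genome, window=1000):
    """Return (start, skew) for consecutive non-overlapping windows.
    The window size is an illustrative assumption."""
    return [(start, gc_skew(genome[start:start + window]))
            for start in range(0, len(genome), window)]


# Toy example: the skew changes sign between the two halves.
profile = windowed_gc_skew("GGGCCCCG", window=4)
```

In many bacterial chromosomes the sign of the skew changes near the origin and terminus of replication, which is why a skew track is often plotted alongside those two landmarks.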

Figure 4.

Map (not drawn to scale) of surrounding genes of the possible L. johnsonii fimbriae operon LJ0388 to LJ0394 (middle panel, from [20]) and its comparison to the syntenic regions in L. gasseri (upper panel) and L. acidophilus (lower panel). Predicted open reading frames (ORFs) are shown as arrows; the black line represents the genome. ORFs are grouped into colored clusters, according to their location, their functionality, or their degree of homology. Homologous genes are connected by light colored parallelograms between the genome lines. Genes with no syntenic homology are shown in grey. Gene clusters in different shades of green represent putative insertion events in L. johnsonii and L. acidophilus. The dark green cluster (L. johnsonii) represents an ABC transporter and a putative sugar phosphatase. The mobile element (transposase) adjacent to this insertion is shown in orange (L. johnsonii and L. gasseri). The light green cluster (L. johnsonii) shows the possible fimbriae operon and the green cluster in L. acidophilus indicates the potential autonomous unit pauLA-II [21], harboring a potential DNA modification system (methylase) and other phage-related proteins.

3 Comparative genomics of lactobacilli

In spite of the explosion of genomic information on microorganisms, complete genomes of beneficial commensals, symbionts, and probionts are just now becoming available [31]. Comparison of the similarities and differences within these groups is expected to provide an important view of gene content, organization, and regulation that contributes to both gut and probiotic functionality [22,31]. A recent comparative analysis between the complete genomes of L. plantarum and L. johnsonii revealed striking differences in gene content and synteny in the genome, prompting a conclusion that these two species are only marginally more related to each other than to other Gram-positive bacteria [28]. Nevertheless, 70% of the proteins in L. johnsonii still had homologs (defined by a Blast E-value of <1E−10) in the larger L. plantarum genome. Unique proteins found in these two genomes, when compared against the published and draft genomes of the LAB, were primarily unknown proteins and prophage-related ORFs [28]. While whole genome comparisons with draft and incomplete genomes for LAB-specific genes are useful, some inaccuracies should be expected because gaps in the draft sequences are likely to contain important information.
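A figure such as "70% of proteins had homologs" follows from counting query proteins with at least one hit below the stated cutoff. The sketch below is a minimal illustration of that count, not the pipeline used in [28]. It assumes the hits are supplied as tab-separated lines in the common 12-column tabular Blast layout (query identifier in the first column, E-value in the eleventh); the function name and the protein identifiers in the example are hypothetical.

```python
def fraction_with_homologs(blast_lines, n_query_proteins, max_evalue=1e-10):
    """Fraction of query proteins with at least one hit below `max_evalue`.

    `blast_lines` is an iterable of tab-separated lines with the query id in
    column 1 and the E-value in column 11 (an assumed input layout).
    """
    with_hit = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue < max_evalue:
            with_hit.add(query_id)
    return len(with_hit) / n_query_proteins


# Hypothetical hits for a proteome of four query proteins.
example_hits = [
    "Q1\tS10\t60.0\t200\t80\t0\t1\t200\t1\t200\t1e-50\t250",
    "Q2\tS11\t25.0\t50\t37\t1\t1\t50\t1\t50\t0.002\t30",
    "Q3\tS12\t45.0\t300\t165\t0\t1\t300\t1\t300\t3e-40\t180",
]
share = fraction_with_homologs(example_hits, n_query_proteins=4)
```

Here two of the four hypothetical proteins pass the cutoff, so `share` is 0.5. The denominator is the full proteome size, so proteins with no hits at all still count against the fraction.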

Complete genomes are now published or available for four Lactobacillus species (acidophilus, gasseri, johnsonii and plantarum). Whole genome comparison over these species (Fig. 6) substantiates the lack of synteny between L. plantarum and the other three lactobacilli. In contrast, L. johnsonii and L. acidophilus show extensive conservation of gene content and order over the length of the genome. L. gasseri and L. johnsonii are even more strikingly similar across the length of the genome, except for two apparent chromosomal inversion events in L. gasseri resulting in a reversal of gene order when compared to the other two closely related species. A comparison of ORFs between L. gasseri and L. johnsonii revealed that 83–85% of the proteins had homologs in both genomes [28]. Overall, the comparisons demonstrate a high degree of gene synteny in the three species that have been collectively referred to as members of the L. acidophilus complex. Differentiation of these three species, particularly L. gasseri and L. johnsonii, has been historically difficult using traditional or molecular taxonomic tools [27,32].

Figure 6.

Multiple whole genome comparison on protein level. The finished and annotated genomes of L. gasseri (top line), L. johnsonii (second line), L. acidophilus NCFM (third line), and L. plantarum (bottom line) were analyzed using a bidirectional BlastP algorithm. Deduced amino acid sequences from predicted open reading frames (ORFs) were compared to the respective partner-ORFeome using the standalone Blast provided by NCBI. Translating these results into a compatible format, visualization of the comparison was realized using the Artemis Comparison Tool (ACT, Sanger Center). Basepair positions are indicated for each genome in the white centerline. ORFs are shown above and below this line, indicating their length and orientation on the respective genome. Degrees of similarity and positional relationships are indicated by the red and blue bars. Red bars show the same chromosomal orientation, blue bars indicate opposite ones. The level of similarity is shown by color shading. The higher the similarity, the more intense the color of the bars. The overall cutoff-value was 1e − 70. Only BlastP hits with a more significant e-value are displayed.
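The caption does not spell out how the bidirectional comparison was resolved into pairs. One common reading is reciprocal best hits, where two proteins are paired only if each is the other's best-scoring match. The sketch below illustrates that idea under this assumption; it is not the procedure documented for Fig. 6, the function names and protein identifiers are hypothetical, and the default cutoff simply reuses the 1e−70 value quoted in the caption.

```python
def best_hits(hits, max_evalue=1e-70):
    """Map each query to its lowest-E-value subject.

    `hits` is an iterable of (query, subject, evalue) tuples; hits at or above
    `max_evalue` are ignored.
    """
    best = {}
    for query, subject, evalue in hits:
        if evalue >= max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-70):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits between two small proteomes.
a_vs_b = [("g1", "j1", 1e-100), ("g1", "j2", 1e-80),
          ("g2", "j2", 1e-90), ("g3", "j3", 1e-20)]
b_vs_a = [("j1", "g1", 1e-99), ("j2", "g1", 1e-85), ("j2", "g2", 1e-88)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy case `g3` is dropped because its only hit fails the cutoff, while `g1` and `g2` each pair with a single partner.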

In an effort to distinguish between L. gasseri and L. johnsonii strains, the translated genomic sequences of three Lactobacillus strains were compared to find genes unique to L. gasseri ATCC 33323, which do not occur in L. acidophilus NCFM or L. johnsonii NCC533. Fourteen unique genes were found in L. gasseri, which were also not found in the NCBI non-redundant database. Using specific primers designed for each of these unique ORFs, L. gasseri specific amplicons could be generated from all the L. gasseri strains evaluated (Table 3). Notably, primer pairs 4 and 11 appeared unique to the L. gasseri species. Although too few L. johnsonii strains were analyzed to support a definitive conclusion, this genome comparison suggests that species-specific genes are present and the two species can be differentiated (E. Altermann, E. Durmaz and T. Klaenhammer, unpublished).

Table 3. PCR amplicons generated in Lactobacillus species using L. gasseri-specific primers
Lactobacillus speciesPrimer set/strain1234567891011121314

  2. aDesignates strains that could not be distinguished by 16S rRNA sequencing and were thus initially classified as “L. gasseri/L. johnsonii”.

  3. *Indicates a reproducible PCR product.

  4. **The PCR product was 1.2 kb, all other products from this primer pair were 0.3 kb.

gasseri 11089** *      **  
 ATCC19992 (ADH)a** *      *****
 FR2** **     ** *
 JG141 * *   *  *** 
 AM1 * *      *** 
 SD10 * *      ****
 WD19 * *      ****
 SK12 * *      * ***
 RF81 * *      * **
 RF14 * *      * **
 ML3 * *      *  *
 ML1a*** *  **** **
johnsonii ATCC33200             *
acidophilus NCFM              

Conserved genes found between closely related species can also provide important clues to function and importance. Whole genome comparison between L. acidophilus, L. gasseri and L. johnsonii revealed a highly conserved region harboring a cluster of genes predicted to encode a cell surface exopolysaccharide (EPS) (Fig. 7). The cluster of genes was oriented similarly in L. acidophilus and L. johnsonii, and inverted in the L. gasseri genome as a result of the chromosomal rearrangement. Transcriptional analysis revealed that the eps genes were expressed by L. acidophilus during log phase growth on most of the eight carbohydrates examined [25]. Also apparent was a large region of inverted synteny between L. gasseri and L. johnsonii, at its downstream end (not shown). No gene synteny was found outside this low-GC region in L. acidophilus. Detailed in silico analyses revealed that for all three genomes, the low GC-content region harbors the EPS cluster. The variable EPS cores are embedded between the conserved regions and bear little or no similarities between each other. Each EPS cluster is bordered downstream by a transposase gene, followed by a low-GC spike (not shown). It is significant that this closely related group of intestinal lactobacilli have conserved this cluster and potentially the ability to produce an EPS layer.

Figure 7.

Organization of the exopolysaccharide (EPS) gene cluster conserved between L. gasseri, L. johnsonii, and L. acidophilus. The complete genomes of L. gasseri, L. johnsonii, and L. acidophilus were subjected to bidirectional BlastP analysis and results were visualized using ACT. Cut-off e-values for displayed degrees of similarity were 1e − 10. Also shown in the right panel are the expression profiles of the EPS gene cluster (vertical) during growth under eight different carbohydrates (horizontal). Low expression, green; high expression, red [25].

4 Functional genomic analysis

Whole genome sequencing, genome data mining, and comparative genomics provide insights into genetic content, differences and similarities, and offer important clues into possible gene functions, both essential and unique. Thus far, genomic analyses of LAB have revealed a number of interesting features that are generally considered to be important to the roles of these organisms in bioprocessing or health. Among those considered potentially important to probiotic functions in the LAB are: adherence/attachment factors such as fimbriae [18,20], mucus-binding proteins [20,21], fibronectin-binding proteins [21], EPS clusters [19–21], and mannose-specific adhesion proteins [33]; prophage-encoded proteins suspected of imparting probiotic properties via lysogenic conversion [34,35]; bacteriocins [36,19–21]; two-component regulatory systems and signaling pathways [21,26,37]; stress and acid tolerance factors [38,19–21]; and bile salt hydrolases [39,20]. As the list of genomic features in LAB expands, the need to characterize or confirm genes important in bioprocessing and health will increase exponentially. Within this list are also groups of unknown or unclassified proteins (25–40% of the ORFs defined), many of which are highly or differentially expressed [25] and are also likely to contribute important functions or features to LAB.

One powerful strategy to identify potentially significant genes impacting probiotic functionality is “in vivo expression technology” (IVET), which has been employed in both L. reuteri [40] and L. plantarum [41]. The approach allows the identification of promoter elements that are expressed during in vivo transit of probiotic cultures, and secondarily reveals the corresponding genes driven by these promoters. A total of 75 inducible genes have thus far been identified by these strategies, including groups encoding nutrient acquisition, intermediate or cofactor biosynthesis, and stress responses [31]. Two of these same genes were again recovered in screening an alr-complementation library in L. plantarum for bile-inducible promoter elements [42]. The bile-responsive elements identified in both studies [41,42] were linked to genes encoding an integral membrane protein and an argininosuccinate synthase. Both genes were subsequently shown to be induced, in situ, using reverse transcriptase-PCR of RNA isolated from the intestine of mice fed L. plantarum WCFS1. In addition, four extracellular proteins were induced in L. plantarum which were considered candidates for interaction with host tissues [42]. Among the genes induced, a substantial number were shared by both L. plantarum and various pathogenic bacteria, prompting de Vos et al. [31] to speculate that the common gene categories identified by the IVET strategies were important in survival of L. plantarum in the host GI-tract environment, rather than in virulence.

Functional genomic analysis to identify or confirm gene function is vital to our understanding of cellular physiology, metabolic pathways, sensing, signaling, and elucidating mechanisms that underlie probiotic functions. Instrumental in this process are genetic and molecular tools that can be used for gene cloning, expression, complementation and inactivation. It was 20 years ago when Kok et al. [43] constructed one of the first cloning and shuttle vectors for LAB, based on the lactococcal replicon pWV01. Genetic accessibility via electroporation was first reported in 1987 by Chassy and Flickinger [44] and then widely expanded to various Gram-positive bacteria in 1988 [45]. Since then a plethora of vectors, expression systems, and integration vectors have been constructed and used for genetic characterization of most LAB. In terms of functional genomics, notable among these have been the temperature-sensitive integration vectors pGhost [46], pSA3 [47], and the two-plasmid pORI28 system [48,49]. While effective in lactococci and some lactobacilli [50], their use in some thermophilic probiotic lactobacilli was hampered because the vectors were genetically unstable at optimal growth temperatures (e.g., 37–40 °C). The pORI28 system was expanded for use in thermophilic lactobacilli by using the pGK12 derivative, pTRK669, as the helper plasmid [51]. This helper plasmid was relatively stable at optimum growth/transformation temperatures from 37 to 40 °C, but could be readily destabilized at 42 °C in species of the L. acidophilus complex. This pORI28-pTRK669 system has been markedly effective in targeting integration events and creating gene replacements in a variety of probiotic lactobacilli. An alternative integration system, avoiding potential replication problems, was introduced in 1997 by van Kranenburg et al. [52].
Based on the well-described plasmids pUC18 and pUC19 [53], derivatives harboring desired antibiotic resistance cassettes and regions of chromosomal homology were constructed in the replicative host Escherichia coli. Subsequent transformation into non-replicative target hosts of lactococci [52] and lactobacilli [54] forced crossover events within the regions of homology. Double-crossover events occurring under non-selective growth conditions functionally inactivated selected target genes. As a result, a number of gene regions suspected to encode probiotic features have been characterized and functionally linked to important phenotypes.

At this point, few predicted gene functions have been confirmed by a functional genomic analysis. Table 4 lists those genes to date that have been functionally analyzed or modified by targeted insertional mutagenesis in probiotic lactobacilli. First among these was the ldhD gene in L. johnsonii [55]. Upon gene replacement of ldhD with a deleted version, the derivative produced exclusively L-LDH. This form is considered safer, albeit arguably [60], for use in infant probiotic applications. This was the first example of directed genetic engineering designed for an improvement of a health target in a probiotic culture. More recently, genes implicated in an acid tolerance response of L. acidophilus were investigated by a functional genomic approach. Azcarate-Peril et al. [38] identified four gene loci putatively involved in acid resistance by gene sequence similarity. Insertional mutagenesis in these regions (see Table 4) created acid-sensitive derivatives, confirming their participation in the acid tolerance of L. acidophilus. Notably, however, treatment of these mutants at pH 5.5 for 1 h at 37 °C, prior to challenge at pH 3.5, resulted in total recovery of acid tolerance to pH conditions that were previously lethal. Therefore, the organism is capable of orchestrating an acid tolerance response that appears capable of overcoming the loss of any single mechanism that may participate in acid resistance [38]. In addition, one two-component regulatory system (2CRS), a primary mechanism involved in environmental sensing and signal transduction, was also identified and insertionally inactivated in L. acidophilus [26]. This mutant exhibited lower resistance to acid and ethanol in log phase cells, and poor growth in milk. A whole-genome microarray revealed that the expression of approximately 80 genes was affected by the 2CRS mutation, including two oligopeptide-transport systems present in the L. acidophilus genome, other components of the proteolytic enzyme system, and a LuxS homolog suspected of participating in synthesis of the AI-2 signaling molecule.

Table 4. Genes and gene regions functionally analyzed from probiotic lactobacilli

| Organism | Gene | Predicted function | Mutant phenotype | Reference |
| --- | --- | --- | --- | --- |
| L. johnsonii | d-ldh | Lactate dehydrogenase | d-Lactate deficient | [55] |
| L. gasseri | gusA | Beta-glucuronidase | Gus-negative | [56] |
| L. acidophilus | lacL | Beta-galactosidase | Lactose-negative | [51] |
| | msmE | ABC-transporter | FOS-deficient | [57] |
| | brfA | Beta-fructosidase | FOS-negative | [57] |
| | treB | PTS-transporter | Trehalose-negative, cryosensitive | Duong et al. (unpublished) |
| | treC | Trehalase | Trehalose-negative, cryosensitive | Duong et al. (unpublished) |
| L. acidophilus | cdpA | S-layer/proteinase | Cell division deficient, filamentous cells | [58] |
| | slpA | S-layer | Sodium chloride sensitive, ethanol sensitive, bile resistant | Altermann et al. (unpublished) |
| | fbp | Fibronectin-binding protein | Reduced adherence to Caco-2 cells | Buck et al. (unpublished) |
| | mub | Mucin-binding protein | Reduced adherence to Caco-2 cells | Buck et al. (unpublished) |
| | LBA1633 | R-28 homology/adherence | No effect on adherence to Caco-2 cells | Buck et al. (unpublished) |
| | LBA1634 | R-28 homology/adherence | No effect on adherence to Caco-2 cells | Buck et al. (unpublished) |
| L. plantarum | lp-0373 | Mannose-specific adhesin | No effect on agglutination ability | [33] |
| | msa | Mannose-specific adhesin | Loss of agglutination ability | [33] |
| | lp-2018 | d-Alanylation of teichoic acid | Loss of polyglycerol phosphate polymers in LT; altered pro-inflammatory cytokines secreted by PBMCs, and protective to IBD in colitis model | [59] |
| L. acidophilus | gadC | Amino acid antiporter | Acid sensitivity | [38] |
| | LBA867 | Transcriptional regulator | Acid sensitivity | [38] |
| | LBA995 | Amino acid permease | Acid sensitivity | [38] |
| | LBA996 | Ornithine decarboxylase | Acid sensitivity | [38] |
| | LBA1796 | Bacteriocin ABC transporter | Loss of lactacin B production | Dobson et al. (unpublished) |
| | LBA1272 | Cyclopropane FA synthase | Loss of membrane dihydrosterculic acid, acid sensitivity | Courtney et al. (unpublished) |
| | LBA1524HK | Histidine kinase of 2CRS | Acid sensitivity, ethanol sensitivity, reduced growth in milk | [26] |
| | bshA | Bile salt hydrolase | Inability to hydrolyze bile salts conjugated to chenodeoxycholic acid | [39] |
| | bshB | Bile salt hydrolase | Inability to hydrolyze bile salts conjugated to taurine | [39] |
| | LBA1430HK | Histidine kinase, 2CRS | Bile sensitivity | Pfeiler et al. (unpublished) |

Nearly 30 genes (Table 4) have thus far been inactivated to reveal important phenotypic changes in properties suspected to be important to the functionality of probiotic lactobacilli. Among them are genetic loci linked to bile salt hydrolase activity [61,39], various potential adhesins (mucus-binding proteins, fibronectin-binding protein) that contribute to adherence to intestinal epithelial cells (Caco-2) in vitro (B.L. Buck, E. Altermann, and T.R. Klaenhammer, unpublished data), and an adhesin that binds mannose-specific receptors displayed on the surface of yeast cells [33]. In the latter case, the mannose-specific adhesin (Msa) exhibited a number of characteristic adhesion domains including a spacer region, a Mub protein-type domain, and an LPxTG cell wall binding motif. Various studies have implicated the roles of S-layer proteins and mucus-binding proteins in adherence of probiotic lactobacilli to intestinal tissues and mucin directly [62,63]. However, the recent comparative studies using parental and isogenic mutants are the first to seek clear evidence of the specific contributions of these cell surface proteins to adherence mechanisms, within the context of the whole bacterium.

In a recent study by B.L. Buck, E.A. Altermann, and T.R. Klaenhammer (unpublished), mutants of L. acidophilus were constructed by targeted inactivation of genes suspected to encode surface proteins potentially mediating adherence to mucus or the intestinal epithelium. Analysis of the adhesive properties of these mutants to intestinal Caco-2 epithelial cells, in vitro, revealed that two streptococcal R28 homolog mutants, LBA1633 and LBA1634, did not show reproducible decreases in adhesion. The surface-layer protein (SlpA) mutant showed the greatest decrease in adhesion, while the fibronectin-binding protein (FbpA) and mucus-binding protein (Mub) mutants showed significant decreases in adhesion compared to the control (Fig. 8). Interestingly, it was also observed that treatment of the bacterial cells under a specific set of environmental conditions could result in an explosive increase of adherence ability, for both the parent and the five individual gene knockout mutants (Fig. 8, inset panels; B.L. Buck and T.R. Klaenhammer, unpublished observations). Therefore, additional factors, inducible in both the parent and the adherence-deficient mutants, are involved that can significantly elevate the capacity of L. acidophilus to adhere in this model system. These genetic regions and their potential adhesion and signaling factors are currently being identified by microarray analysis and investigated by functional genomic approaches.

Figure 8.

Relative adhesion of L. acidophilus NCFM mutant strains to Caco-2 monolayers. Adhesion is expressed as total cells adhering to Caco-2 monolayers in 17 microscopic fields. Each experiment was done in duplicate and replicated at least three times. The following mutant strains were tested: W+, L. acidophilus containing a plasmid integration in the β-galactosidase gene, used as parental control; FbpA−, integration in the gene encoding a fibronectin-binding protein; SlpA−, integration in the gene encoding a surface layer protein; Mub−, integration in the gene encoding a mucus-binding protein; and 1633− and 1634−, integration into individual tandem genes homologous to a gene encoding an epithelial cell-binding protein in S. pyogenes [64]. Inset photos show Gram-stained L. acidophilus NCFM adhering to Caco-2 cells before (A) and after (B) cell treatment that promotes adherence ability (B.L. Buck and T.R. Klaenhammer, unpublished).
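The comparison to the parental control described in this caption can be expressed as a simple normalization. The sketch below is one plausible way to compute it, assuming each strain has a list of replicate totals (cells counted over the 17 fields); the function name and data layout are hypothetical, and no values from the unpublished study are reproduced.

```python
from statistics import mean


def relative_adhesion(totals_by_strain, control="W+"):
    """Express each strain's mean adherent-cell total as a percentage of the control.

    totals_by_strain maps a strain label to a list of replicate totals,
    each total being the cells counted across all microscopic fields.
    """
    control_mean = mean(totals_by_strain[control])
    if control_mean == 0:
        raise ValueError("control strain shows no adherent cells")
    return {
        strain: 100.0 * mean(totals) / control_mean
        for strain, totals in totals_by_strain.items()
    }
```

By construction the control scores 100%, so a mutant value well below 100% indicates reduced adherence relative to the parental control; whether a given reduction is reproducible still depends on the variation among replicates.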

Of significant interest is the recent report of Grangette et al. [59], showing that the presence or absence of teichoic acids on the cell surface of L. plantarum can affect the cytokine expression pattern by peripheral blood mononuclear cells (PBMCs) and monocytes. The dlt operon, responsible for d-alanylation of teichoic acids, was disrupted, leading to a substantial reduction in the concentration of polyglycerol phosphate polymers (with d-Ala) in the teichoic acids of the bacterial cell wall. Notably, this change in the chemical composition correlated with a reduced secretion of pro-inflammatory cytokines produced by PBMCs, and increased secretion of the anti-inflammatory cytokine IL-10, when exposed to the Dlt-mutant. Use of the Dlt-mutant in a murine colitis model was also found to be protective against TNBS-induced colitis. This result provides further evidence that LAB communicate with PBMCs and, for the first time, provides evidence that LAB may induce pro-inflammatory or anti-inflammatory responses based on their cell wall composition in teichoic acids [59], and perhaps in the display of cell surface bound proteins or polysaccharides, as well. In this regard, a number of reports have already shown that different strains and species of lactobacilli, and other commensal bacteria, can modulate cytokine expression by both human and murine antigen-presenting (dendritic) cells [65–68]. Overall, the results suggest that variations in bacterial strains and species can direct immunological responses toward pro- or anti-inflammatory responses. Based on the results of Grangette et al. [59], the direction of these responses could reflect the chemical composition and architecture of the Gram-positive cell wall. Considering the field's current position, having a number of complete probiotic genomes (L. acidophilus, L. gasseri, L. johnsonii, and L. plantarum) and the capability to carry out functional genomic analyses on these organisms, the course to understanding the interactions of signaling molecules with immune cells will be both challenging and exciting in the years ahead.

5 Concluding remarks

Today's exciting discoveries based on gene content and predicted function follow closely behind the explosion of DNA sequence information on microbial genomes. Genomic and comparative genomic analyses are revealing key gene regions in LAB worthy of continued investigation for their potential roles in both bioprocessing and health. Microarray analysis of LAB cultures, and various mutant derivatives thereof, promises to reveal genetic networks that orchestrate complex microbial responses to a variety of conditions that are critical to growth, metabolic activity, survival, communication, signaling, and probiotic functionality. Across the LAB, genome sequences have already provided information on genetic content that establishes platforms for metabolic [69] and nutrient engineering [70], understanding mechanisms of probiotic action [38,59,71], and providing platforms to engineer LAB for delivery of biotherapeutics [72]. The future, armed with genome information and genetic tools, is an exciting one. It is the first time in the history of this field that the promising potential of these beneficial organisms can be mechanistically investigated, understood, and inevitably expanded for the benefit of humankind.


The substantial contributions of many individuals participating in this work in our laboratory and as collaborators are gratefully acknowledged. Support for the LAB genetics and Functional Genomics programs has been provided through the North Carolina Agricultural Research Service, the North Carolina Dairy Foundation, Danisco, Inc., Dairy Management, Inc., the Southeast Dairy Foods Research Center, the California Dairy Research Foundation, the NIH Biotechnology Program, GAANN Fellowships in Biotechnology, IGERT Genomics Fellowships, and the USDA National Research Initiative Competitive Grants Program. Our special gratitude is expressed to Evelyn Durmaz, Rosemary Sanozky-Dawes, and Edwina Kleeman for many years of laboratory management and excellent research, to the US Department of Energy Joint Genome Institute for draft sequencing, the members of the U.S. Lactic Acid Bacteria Genomics Consortium (LABGC) for genome information and analysis, and to Fidelity Systems, Inc., for sequencing efforts to close and complete selected genomes used in these analyses. Thanks are also extended to A. Mercenier, W. de Vos, T. Shinoda, M. Saier, D.A. Mills, K. Makarova, and E. Koonin for providing data and/or preprints prior to publication.