Application of phylogenetic microarrays to interrogation of human microbiota


Correspondence: Oleg Paliy, 260 Diggs, Department of Biochemistry and Molecular Biology, Wright State University, 3640 Col. Glenn Hwy, Dayton, OH 45435, USA. Tel.: +1 937 775 3714; fax: +1 937 775 3730; e-mail:


Human-associated microbiota is recognized to play vital roles in maintaining host health, and it is implicated in many disease states. While the initial surge in the profiling of these microbial communities was achieved with Sanger and next-generation sequencing, many oligonucleotide microarrays have also been developed recently for this purpose. Containing probes complementary to small ribosomal subunit RNA gene sequences of community members, such phylogenetic arrays provide direct quantitative comparisons of microbiota composition among samples and between sample groups. Some of the developed microarrays including PhyloChip, Microbiota Array, and HITChip can simultaneously measure the presence and abundance of hundreds and thousands of phylotypes in a single sample. This review describes the currently available phylogenetic microarrays that can be used to analyze human microbiota, delineates the approaches for the optimization of microarray use, and provides examples of recent findings based on microarray interrogation of human-associated microbial communities.


Most microorganisms exist in nature not as populations of single species, but rather as multi-species assemblies – microbial communities. Such communities are found in ocean waters and terrestrial waterways, in soil, on the bark and leaves of plants, and on epithelial surfaces of animals (Kent & Triplett, 2002; Gao et al., 2007; McLellan et al., 2009; Olson & Kellogg, 2010; Grice & Segre, 2011). These microbial conglomerates play a central role in regulating the flow of energy and nutrients in the Earth's ecosphere and are responsible for most of the biomass production in the ocean. Members of microbial communities process metabolites and energy as parts of a large network of interacting cells, where intermediary products of the metabolism of one species are often utilized by other members (Duncan et al., 2004; Belenguer et al., 2006; Flint et al., 2008; De Vuyst & Leroy, 2011). This complex network of inter-species interactions also explains the relatively young age of the field of microbial population ecology, because previously many of the community members could not be easily grown as pure cultures in a laboratory setting. The advent and development of molecular tools provided an opportunity to bypass the culturing requirements and study community composition, dynamics, and functionality directly.

Considerable progress has been made recently in the study of microbial populations associated with humans. Called the human microbiota, this microbial community is believed to contain at least 10 times the number of cells in the human body, with the cumulative microbial gene count estimated to be 100-fold larger than the human genome (Gill et al., 2006). Renewed interest in the human microbiota is associated with the recognition of the important relationships these microorganisms form with our bodies. Human microbiota participate in the digestion of complex carbohydrates in the gut, in the protection of the host from pathogen invasion, in modulating proper immune function, and in the maintenance of epithelial homeostasis (Neish, 2009; Sekirov et al., 2010). At the same time, perturbation of the normal microbiota has been shown to be associated with a number of diseases including dental plaque, bacterial vaginosis, psoriasis, atopic dermatitis, inflammatory bowel disease (IBD), obesity, and colon cancer (Larsen & Monif, 2001; Gao et al., 2008; Sartor, 2008; Neish, 2009; Grice & Segre, 2011).

Different approaches can be taken to study complex microbial populations. We can look at the community membership where interrogation of small ribosomal subunit (SSU) RNA is used extensively to obtain the phylogenetic structure of the community (Suau, 2003; Sekirov et al., 2010). We can profile community functionality by cataloguing the pool of functional genes and proteins present in the population through meta-genomic, meta-transcriptomic, and meta-proteomic approaches (Gill et al., 2006; Klaassens et al., 2007; Booijink et al., 2010a; Qin et al., 2010; Arumugam et al., 2011). We can also study the metabolic processes in microbial conglomerates by measuring levels of produced and consumed metabolites with metabolomic and metabonomic techniques (Marchesi et al., 2007; de Graaf et al., 2009; Martin et al., 2010).

The majority of recent advances in our understanding of human microbiota structure and its dynamic changes in disease were made through phylogenetic interrogation of SSU rRNA. Whereas many different techniques have been successfully employed to provide novel findings, next-generation sequencing (NGS) and phylogenetic microarrays proved to be the most widely used. This review describes current advances in the use of phylogenetic microarrays in the study of human-associated microbiota.

Overview of currently available phylogenetic microarrays

While microarrays were originally developed (Schena et al., 1995) and are widely used to monitor gene expression in both prokaryotic and eukaryotic cells, they are now also employed for comparative genomics, for DNA sequencing and single-nucleotide polymorphism analysis, and for microbial detection (Wang et al., 2002; Loy & Bodrossy, 2006). Several types of microarrays have been used to date to characterize the composition and function of microbial communities, including community genome arrays, functional gene arrays, and phylogenetic microarrays (Zhou, 2003). Community genome arrays are constructed using whole-genomic DNA isolated from strains in pure culture and allow detection of individual species and strains in simple and complex communities (Wu et al., 2004). Functional gene arrays contain probes to genes encoding key enzymes involved in various biochemical processes and are useful for monitoring physiological changes in microbial communities (Wu et al., 2001; He et al., 2007). An excellent example of a functional gene array is the GeoChip, which currently contains 84 000 oligonucleotide probes for genes involved in biogeochemical cycling of carbon, nitrogen, phosphorus, and sulfur, for genes involved in metal and antibiotic resistance, and for genes coding for bioremediation of organic compounds (Zhou et al., 2010, 2011). Phylogenetic oligonucleotide arrays (phyloarrays) contain probes complementary to the small subunit (SSU) rRNA sequences and are thus well suited for the analysis of microbial community composition structure and variance. The choice of SSU rRNA for most phylogenetic studies is based on the ubiquity and sequence conservation of this molecule. Among different array types, phyloarrays are currently the most popular owing to the availability of a large set of near-full length SSU rRNA sequences deposited in NCBI, EMBL, RDP, and Greengenes databases. 
Currently, over 480 000 total 16S rRNA gene sequences of human microbiota origin are catalogued in NCBI GenBank (with length ≥ 1000 bp). Based on these sequence databases, many different types of phylogenetic microarrays have been recently developed for the interrogation of human-associated microbiota as shown in Table 1.

Table 1. Phylogenetic microarrays available to profile human microbiota

| Array name | Target community | Sensitivity level (a) | Platform | Detectable groups | No. of all probes (b) | No. of probes per group | No. of each probe (c) | Reference |
|---|---|---|---|---|---|---|---|---|
|  | Intestinal biota | Species | Glass slide | 40 species | 120 Phylo. | 3 | 1 | Wang et al. (2004) |
|  | All bacteria | Species | Agilent | 1629 species | 10 265 Tot. / 9121 Phylo. | N/A | 1 | Palmer et al. (2007) |
| PhyloChip (d) | All bacteria | Varied | Affymetrix | 842 subfamilies | 506 944 Tot. / 297 851 Phylo. | 11+ | 1 | Brodie et al. (2006) |
| HOMIM | Oral biota | Species | Glass slide | 272 species | 960 Tot. / 912 Phylo. | 1–2 | 2 | Preza et al. (2009b) |
| Microbiota Array | Intestinal biota | Species | Affymetrix | 775 species | 16 223 Tot. / 10 976 Phylo. | 5–11 | 1 | Paliy et al. (2009) |
| HITChip | Intestinal biota | Species | Agilent | 1140 species | 4809 Tot. | 4–6 | 1 | Rajilic-Stojanovic et al. (2009) |
| AUS-HIT Chip | Intestinal biota | Varied | CustomArray | 500 species | 2242 Tot. / 2203 Phylo. | 1–2 | 3 | Kang et al. (2010) |
|  | Intestinal biota | Genus | Glass slide | 310 genera | 1461 Tot. / 1412 Phylo. | 1–48 | 2 | Manges et al. (2010) |
| V-Chip | Vaginal biota | Varied | Glass slide | 350 groups | 459 Phylo. | 1–3 | 1 | Dols et al. (2011) |
| OC chip | Oral biota | Varied | Glass slide | 350 groups | 420 Phylo. | 1–3 | 1 | Crielaard et al. (2011) |

N/A, information was not available.

(a) Species sensitivity is usually described as ability to detect individual phylotypes defined at 98% similarity cutoff level; varied sensitivity describes arrays that contain probes for different taxonomic group levels from species to genera to families.

(b) Tot., number of total probes on the array; Phylo., number of probes targeting SSU RNA sequences.

(c) Represents the number of different locations on the array each probe was placed on.

(d) The numbers are given for the version of the array (G2) that has been used most widely. An updated version of PhyloChip has been recently developed (Hazen et al., 2010), the new format contains probes to 1464 families and 10 993 subfamilies of prokaryotes.

The first phylogenetic array was developed by Guschin et al. in 1997. In their design, oligonucleotide probes complementary to the 16S rRNA gene sequences of selected genera of nitrifying bacteria were immobilized in gel pads on a glass slide (Guschin et al., 1997). One of the first phyloarrays for human microbiota interrogation was described by Wang et al., who placed 40-mer probes to 40 predominant gut bacteria onto epoxy slides. Using this array, researchers were able to detect between 25 and 37 species in human fecal samples (Wang et al., 2004). The number of species and phylogenetic groups that can be detected by a single microarray has been expanded significantly in further studies. Palmer et al. queried the prokaryotic SSU rRNA database to develop oligonucleotide probes to over 1600 bacterial and archaeal species. The microarray was employed to profile fecal samples of 26 infants and showed that infant fecal microbiota displayed remarkable temporal and inter-personal variation (Palmer et al., 2007). PhyloChip, an Affymetrix-based microarray, was also intended to cover all available prokaryotic 16S rRNA gene sequences and contains probes to operational taxonomic units (OTUs, also called phylotypes) representing thousands of different subfamilies (Brodie et al., 2006; Hazen et al., 2010). This array has been used successfully in a number of studies, not only examining the human-associated microbiota (Cox et al., 2010a, b; Lemon et al., 2010) but also profiling microbial communities in other environments (Wu et al., 2010; Deangelis et al., 2011; Mendes et al., 2011). Another phylogenetic microarray based on Affymetrix photolithographic technology, Microbiota Array, was developed in our laboratory (Paliy et al., 2009). This microarray was designed to be specific to bacteria residing in the human gastrointestinal tract and contained multiple sets of probes to 775 different species. 
Microbiota Array has already been utilized to successfully profile distal gut populations of healthy adults and healthy children as well as children diagnosed with irritable bowel syndrome (IBS), IBD, and obesity (Agans et al., 2011; L. Rigsbee, R. Agans, H. Kenche, H.J. Khamis, S. Michail and O. Paliy, manuscript in submission; H. Kenche & O. Paliy, unpublished results). Another microarray specific to human gut microbiota, HITChip, was developed in the group of Willem de Vos (Rajilic-Stojanovic et al., 2009). This array can detect levels of 1140 different species and was employed in a number of recent studies (Biagi et al., 2010; Booijink et al., 2010b; Van den Abbeele et al., 2010).

Several new phylogenetic array designs have recently been added to the growing list as shown in Table 1. As can be surmised from the table, different array designs vary in the platform used, in the lowest taxonomic level interrogated, and in the choice of targeted communities. For example, both PhyloChip and Microbiota Array were developed utilizing the Affymetrix platform, which enables arrays to contain multiple probes for each tested OTU and also provides means to curtail potential cross-hybridization through the use of mismatch probes. At the same time, HITChip and the array developed by Palmer and colleagues are based on the Agilent system, which allows more cost-efficient update of the array design (as no expensive lithographic masks are used) as well as the profiling of multiple samples on a single slide. When many arrays are needed, the use of standard glass slide platform remains the most economical method owing to low costs of the glass slides; however, extensive tests are often required to validate the quality of printed slides.
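The role of mismatch probes mentioned above can be illustrated with a small sketch. It assumes that each perfect-match (PM) probe is paired with a mismatch (MM) probe and that an OTU is called present only when most of its probe pairs show a PM signal clearly above the MM signal. The function name `otu_detected` and both thresholds are hypothetical values chosen for illustration; they are not the published detection criteria of PhyloChip or Microbiota Array.

```python
def otu_detected(pm, mm, min_ratio=1.3, min_positive_fraction=0.9):
    """Call an OTU present from paired perfect-match / mismatch intensities.

    pm, mm: equal-length lists of probe intensities for one OTU.
    min_ratio and min_positive_fraction are illustrative assumptions.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    # A probe pair counts as positive when PM exceeds MM by the chosen ratio.
    positive = sum(1 for p, m in zip(pm, mm) if p > 0 and p >= min_ratio * m)
    return positive / len(pm) >= min_positive_fraction


# Hypothetical intensities for one OTU with four probe pairs.
pm_signal = [1000, 900, 1200, 800]
mm_signal = [200, 300, 250, 700]

strict_call = otu_detected(pm_signal, mm_signal)
relaxed_call = otu_detected(pm_signal, mm_signal, min_positive_fraction=0.75)
```

Here three of the four probe pairs pass the ratio test, so the call depends on how stringent the required fraction is; this is the trade-off between false positives and false negatives that stringent detection criteria are meant to control.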

Another important consideration distinguishing various microarray designs is the specificity level and the intended target community. Species (OTU, phylotype)-level specificity provides the deepest interrogation that is close to that achieved by Sanger sequencing and is currently better than what next-generation sequencing allows, and a number of phylogenetic arrays achieve this specificity (Table 1). Regarding the targeting breadth of the available arrays, both PhyloChip and the array developed by Palmer et al. were designed to contain probes to as many prokaryotic species as possible. Such a design allows the microarray to be used in a variety of studies focused on different types of microbial populations, and PhyloChip is an excellent example of this design strategy with recent reports of microarray interrogation of human stomach, intestine and mouth, watershed communities, tropical soil, and plant rhizosphere (Cox et al., 2010a; Lemon et al., 2010; Wu et al., 2010; Deangelis et al., 2011; Mendes et al., 2011). The drawback of such a broad microarray design is an expected higher number of false positives owing to the combination of the large number of probes present and probe cross-hybridization; this can be mitigated to some extent by a rigorous probe selection process and stringent criteria for positive detection calls. In contrast, phylogenetic arrays specific to one particular community, such as Microbiota Array and HITChip developed for the interrogation of human gastrointestinal biota, benefit from the reduced cross-hybridization potential but can only be applied to the analysis of that particular community.

Other reports were made available recently that described microarrays based on nontraditional technologies. Candela et al. (2010) developed a DNA microarray based on the use of fragment ligation reaction coupled with the interrogation of the ligated products on a ‘detection’ array. In this approach, successful ligation of two adjacent oligonucleotides is dependent on the presence of the complementary target sequence; this method relies on high selectivity of ligase and therefore can distinguish single-nucleotide differences. The ligated products are quantified on a specially designed ‘universal’ detection microarray that contains probes complementary to the oligo tag sequences (‘zip-codes’) incorporated into the ligated oligonucleotides. The use of this universal array creates uniform hybridization conditions for all zip-coded sequences, and the same array platform can be used with multiple ligation probe sets which provides great flexibility (Hultman et al., 2008). One of the pilot ligation microarrays was made to quantify levels of 30 groups of human intestinal microbiota and was used to profile fecal samples of several young adults (Candela et al., 2010). Another microarray platform, termed restriction site tagged microarray, was developed by Zabarovsky et al. (2003). In this method, a rare-cutting restriction enzyme is chosen, and a set of short tags (19 bases long) is developed that match sequences flanking recognition sites of that restriction endonuclease in the genome of a particular species. A collection of such tags for one species constitutes that species ‘passport’. After digesting genomic DNA with the chosen restriction enzyme, different species can be distinguished through DNA hybridization to the custom microarray containing probes complementary to the restriction site flanking regions (Zabarovsky et al., 2003). 
The use of such restriction enzyme passports allows efficient discrimination of even closely related strains of bacteria; however, this approach has not yet been used for the in-depth interrogation of human-associated microbiota. Indeed, because this method relies on the use of organism genome sequence to develop that species’ passport, it might not be well suited to profile microbial communities with many uncultured and yet unsequenced members. Finally, microarrays based on the interrogation of large subunit rRNA genes have also been recently developed (Mitterer et al., 2004; Yoo et al., 2009). For example, Yoo and colleagues designed a diagnostic microarray containing multiple probes to 23S ribosomal DNA and 16S–23S intergenic spacers of 39 pathogenic bacteria. The DNA microarray was shown to have 100% specificity (no false positives) and close to 100% sensitivity (almost no false negatives; both values were defined based on comparison with culture-based identification) (Yoo et al., 2009).
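The specificity and sensitivity figures quoted above follow the standard definitions based on false positives and false negatives relative to a reference method. A minimal sketch of that calculation is given below; the function name, the data layout, and the example calls are assumptions made for illustration and are not taken from Yoo et al. (2009).

```python
def sensitivity_specificity(array_calls, culture_calls):
    """Compare array detection calls with culture-based identification.

    Both arguments map a (sample, species) pair to True (detected) or False.
    Culture results are treated as the reference.
    """
    tp = fp = tn = fn = 0
    for key, truth in culture_calls.items():
        called = array_calls.get(key, False)
        if truth and called:
            tp += 1
        elif truth and not called:
            fn += 1
        elif not truth and called:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical calls for two samples and two species.
culture = {("s1", "E. coli"): True, ("s1", "S. aureus"): False,
           ("s2", "E. coli"): True, ("s2", "S. aureus"): True}
array = {("s1", "E. coli"): True, ("s1", "S. aureus"): False,
         ("s2", "E. coli"): True, ("s2", "S. aureus"): False}

sens, spec = sensitivity_specificity(array, culture)
```

In this toy example the array misses one culture-positive species (one false negative) and makes no false-positive calls, so specificity is 100% while sensitivity is two out of three.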

Optimizing the use of phylogenetic microarrays

Phylogenetic microarrays are one of a number of currently available molecular approaches for the interrogation of complex microbial communities. We provide a short comparison of the most widely used methods in Table 2; a more detailed comparison of these and other technologies is available in a comprehensive review of gut microbiota (Sekirov et al., 2010). The main advantages of phylogenetic microarrays compared with other methodologies include (1) ability to profile one sample at a time, which is useful in clinical studies and as a diagnostic tool; (2) quantitative nature of the acquired data allowing direct comparison of levels of each OTU between samples; (3) short processing and data acquisition times (only 2 days from sample to data using Microbiota Array); and (4) currently lower costs compared with NGS, especially if we take into account that microarrays typically provide greater overall level of coverage of sample mixture (Xu et al., 2011). The main limitation of microarrays is their inability to reveal novel species in any sample, because the arrays can only detect those sequences for which they contain probes. In addition, the design, use, and analysis of microarrays are technically demanding and require extensive testing, validation, and optimization (Hashsham et al., 2004). For example, cross-hybridization of DNA fragments to multiple probes needs to be controlled and adjusted for to avoid artificially over-estimating microbiota diversity (Rigsbee et al., 2011).

Table 2. Molecular methods for the analysis of human microbiota

|  | Phylogenetic microarrays | NGS | Sanger sequencing | FISH | qPCR |
|---|---|---|---|---|---|
| Taxonomic resolution | Good | Moderate | Very good | Moderate | Moderate |
| Throughput capability | High | High | Moderate | Low | Low |
| Sensitivity | High | High | Moderate | Low | Very high |
| Cost | Moderate to high | High | Very high | Low | Moderate |
| Main advantage | High-throughput quantitation | Novel sequence discovery | Taxonomic identification | Cell visualization | Sensitivity |

A number of approaches have been developed in our laboratory and by others to improve the robustness of the estimates provided by phylogenetic microarrays. The work in our group has focused on three aspects. We have developed a mathematical model of 16S rRNA gene amplification (an experimental strategy employed in most studies to selectively enrich DNA samples with 16S rRNA genes) linked to phylogenetic microarray detection based on the Microbiota Array design. The model aimed to determine optima for the amount of starting genomic DNA material and for the number of amplification cycles to be used in PCR to achieve detection of the maximum fraction of community members and at the same time maintain good accuracy of quantitative abundance measurements (Paliy & Foy, 2011). The model showed that the optimum experimental conditions included a combination of small amount of starting genomic DNA (up to 50 ng) and moderate number of PCR amplification cycles (15–20). We also developed two adjustment algorithms intended to improve the concordance between the actual bacterial numbers in the community and the distribution of measured DNA hybridization signals. The first algorithm accounts for the predicted cross-hybridization of 16S rRNA gene fragments among different species. It models the measured total signal as a combination of the true signal (binding of the complementary fragment to its target) and cross-hybridization signal owing to erroneous hybridizations. Levels of cross-hybridization signals were estimated from microarray validation experiments and were incorporated into the algorithm to calculate the true signal (Rigsbee et al., 2011). The goal of the second algorithm was to adjust the normalized microarray signal by an estimated number of 16S rRNA gene copies per species genome. 
Because different species of bacteria are known to possess a wide range of ribosomal RNA-encoding gene copies per genome [1–15, see (Lee et al., 2009)], measured total signal for each species is an indication not only of that species abundance but also of the number of 16S rRNA genes that species possesses. The use of the copy number adjustment algorithm allowed a better estimate of the actual species abundance in each sample (Rigsbee et al., 2011). Improvements in experimental and analytical procedures for other arrays were also described (Hamady et al., 2010; Salonen et al., 2010; Schatz et al., 2010; Holmes et al., 2011), and general optimizations of the phylogenetic microarray design and use are also available (Peplies et al., 2003; Letowski et al., 2004; Avarre et al., 2007). It has also been a good practice in most studies to confirm microarray findings with other molecular techniques. Good concordance of the microarray results was found for comparisons with NGS (Claesson et al., 2009; Manges et al., 2010; van den Bogert et al., 2011), quantitative real-time PCR (Paliy et al., 2009; Kang et al., 2010; Agans et al., 2011), and fluorescent in situ hybridization (FISH) (Rajilic-Stojanovic et al., 2009).
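The two signal adjustments described above can be sketched as follows. This is a simplified illustration, not the published algorithm of Rigsbee et al. (2011): it assumes that the measured signal of each species equals its true signal plus a fixed fraction of the true signals of other species, and that these fractions are known from validation experiments. The function names, the fixed-point iteration, and all numbers are hypothetical.

```python
def remove_cross_hybridization(measured, cross_fraction, n_iter=50):
    """Estimate true signals from measured = true + cross-hybridization.

    cross_fraction[i][j] is the assumed fraction of species j's true signal
    that appears on the probes of species i (diagonal entries are ignored).
    Solved by simple fixed-point iteration, adequate for small fractions.
    """
    n = len(measured)
    true_signal = list(measured)
    for _ in range(n_iter):
        true_signal = [
            max(0.0, measured[i] - sum(cross_fraction[i][j] * true_signal[j]
                                       for j in range(n) if j != i))
            for i in range(n)
        ]
    return true_signal


def adjust_for_copy_number(signal, copies_per_genome):
    """Divide each signal by 16S rRNA gene copies and renormalize to fractions."""
    per_genome = [s / c for s, c in zip(signal, copies_per_genome)]
    total = sum(per_genome)
    return [x / total for x in per_genome]


# Hypothetical three-species community.
cross = [[0.0, 0.1, 0.0],
         [0.05, 0.0, 0.0],
         [0.0, 0.2, 0.0]]
measured = [105.0, 55.0, 20.0]   # generated from true signals 100, 50, 10
corrected = remove_cross_hybridization(measured, cross)
abundance = adjust_for_copy_number(corrected, [10, 5, 1])
```

In this example the raw signals suggest a roughly fivefold difference between the first and third species, yet after both corrections the three species have equal estimated abundance, because the strongest signal comes from the species carrying the most 16S rRNA gene copies.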

Application of phylogenetic microarrays to human microbiota analysis

Many different studies have been successfully performed utilizing phylogenetic microarrays for the interrogation of human-associated microbiota in health and disease. In this section, we present examples of the use of three different microarrays for such high-throughput analysis.

Because each particular microarray measures the hybridization level of every interrogated sequence, it provides quantitative signal for each examined OTU in every sample. This feature allows not only direct comparisons of OTU abundances between samples, but it also offers an opportunity to assess whether a particular OTU is detected in all or most samples. A set of such OTUs can be considered to form a core microbiome of species present in every community of particular human microbiota, which can potentially be attributed to an important role of these OTUs in inter-species and host–microbial interactions. Using Microbiota Array, we have recently profiled 60 samples of human fecal microbiota in healthy adults and adolescents and in adolescents diagnosed with obesity and diarrhea-predominant IBS [(Agans et al., 2011) and (L. Rigsbee, R. Agans, H. Kenche, H.J. Khamis, S. Michail and O. Paliy, manuscript in submission)]. We have now defined a robust core of 44 microbial phylotypes that were reliably detected in at least 59 samples (Fig. 1). We allowed the core bacterial species to be missing from one sample because consideration of individual microbiomes revealed cases where members of a particular genus were completely absent in one individual. An example was provided by a patient with IBS who had no members of genus Faecalibacterium detected in her fecal microbiota (Rigsbee et al., manuscript in submission); this genus constituted between 4% and 15% of total bacterial abundance in all other samples. Among core species were members of the genera Anaerostipes, Bacteroides, Blautia, Coprococcus, Dorea, Eubacterium, Faecalibacterium, Peptostreptococcus, Roseburia, and Streptococcus. Most of the gut microbiota phylotypes belonged to the so-called ‘shared’ group, which we defined as those present in multiple but not all samples. As can be observed from Fig. 1, many individual microbiomes contained unique species that were detected only in that particular sample. 
Phylogenetic microarrays have also been used to obtain core microbiomes in other studies (Rajilic-Stojanovic et al., 2009; Jalanka-Tuovinen et al., 2011); for example, a core of 75 OTUs was detected in lung microbiota among eight patients diagnosed with chronic obstructive pulmonary disease (Huang et al., 2010).
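The partition into core, shared, and unique phylotypes used above can be expressed as a simple counting rule. The sketch below assumes the detection calls are available as a mapping from each OTU to the set of samples in which it was detected; the function name and the toy data are hypothetical, while the tolerance of one missing sample mirrors the criterion of at least 59 of 60 samples stated in the text.

```python
def classify_otus(detections, n_samples, max_missing=1):
    """Split OTUs into core, shared, and unique groups.

    detections: dict mapping OTU name -> set of sample ids with a positive call.
    core:   detected in at least n_samples - max_missing samples.
    unique: detected in exactly one sample.
    shared: everything in between.
    """
    groups = {"core": [], "shared": [], "unique": []}
    for otu, samples in detections.items():
        n_detected = len(samples)
        if n_detected == 0:
            continue  # never detected, not part of any group
        if n_detected >= n_samples - max_missing:
            groups["core"].append(otu)
        elif n_detected == 1:
            groups["unique"].append(otu)
        else:
            groups["shared"].append(otu)
    return groups


# Toy data for 60 samples labelled 0..59.
all_samples = set(range(60))
calls = {
    "OTU_A": all_samples,            # present everywhere
    "OTU_B": all_samples - {17},     # missing from a single sample
    "OTU_C": set(range(30)),         # present in half of the samples
    "OTU_D": {42},                   # present in one sample only
}
groups = classify_otus(calls, n_samples=60)
```

Setting `max_missing=0` would enforce a strict core; allowing one missing sample keeps an OTU such as `OTU_B` in the core even when a single individual lacks it.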

Figure 1.

Core fecal microbiome defined with Microbiota Array. The figure displays the distribution of detected OTUs among 60 samples of human fecal microbiota obtained from four groups of participants. Outermost bands indicate the assignment of samples to the four groups. Individual outer segments show phylotypes unique to each analysed sample; inner circle represents core species detected in at least 59 samples of each group; shared set (middle donut) enumerates OTUs detected in more than 1 but < 59 samples of the group. aHLT – healthy adults, kHLT – healthy adolescent children (kids), kOBE – obese children, kIBS – children diagnosed with IBS.

The diversity and temporal stability of microbiota in ileum of patients with ileostomy were studied with HITChip (Booijink et al., 2010b). Ileal contents had high amounts of Streptococcus, Veillonella, and Lactobacillus (Fig. 2). Overall, the ileal microbiota was less complex than that typically observed in the distal gut. Profiling ileal biota over a period of 28 days indicated that microbial communities were not only sufficiently different among participants but also unstable in the same individual even within 1 day, as substantial differences were detected in microbial profiles between morning and evening samples collected on the same day (Booijink et al., 2010b). Such temporal differences are in contrast with previous reports indicating relative stability of human distal gut microbiota over long periods of time (Zoetendal et al., 1998; Costello et al., 2009; Rajilic-Stojanovic et al., 2009; Claesson et al., 2011; Jalanka-Tuovinen et al., 2011). Relative instability of ileal biota was related by the authors to the more significant fluctuations of lumenal contents in the small intestine compared with those of the colon (Booijink et al., 2010b).

Figure 2.

Relative contribution of three bacterial genera to the ileal microbial communities in four patients with ileostomy as measured by HITChip. Each column represents a specific day as shown, each set of four columns corresponds to all samples collected from a single patient. Y axes (log2 scale) show relative abundance values of each genus. Figure is based on the data from (Booijink et al., 2010b).

The PhyloChip microarray was employed by Lemon et al. (2010) to examine bacterial microbiota of the nostril and oropharynx in seven healthy adults. Diversity and stability of oropharynx microbiota were higher than those of the nostril microorganisms. Four phyla – Firmicutes, Proteobacteria, Actinobacteria, and Bacteroidetes – accounted for the majority of the detected bacteria (Fig. 3). Interestingly, while Firmicutes and Actinobacteria predominated in the nostril, Firmicutes and Proteobacteria abounded in the oropharynx. Nostril microbiota was thus more similar to that found on the skin, whereas oropharynx communities resembled those of saliva. While Firmicutes were the most prevalent phylum in both regions, distinct families dominated numerically in each site. Moreover, a striking inverse relationship was observed between the relative abundances of the Firmicutes and another prevalent phylum in each sample as shown in Fig. 3 (Lemon et al., 2010).

Figure 3.

Relative distributions of bacterial phyla in the nostril and oropharyngeal samples as detected by PhyloChip. Pie charts show relative contribution of each phylum (average among seven profiled individuals) to overall microbiota abundance in each region. Bar graphs below each pie chart show relative amount (% total) of the two most abundant phyla in each individual sample. The figure is adapted from data in (Lemon et al., 2010) with permission.


Utilizing phylogenetic microarrays, NGS, and Sanger sequencing, initial large-scale studies of human microbiota focused on the compositional analysis of this complex microbial community, and we now have relatively good understanding of which community members are present at different sites and how the community structure fluctuates over time. Phylogenetic microarrays are now also actively used to obtain quantitative data on the changes experienced by human-associated microbial communities in different diseases of the gut, skin, airways, and vaginal canal. The examples provided in the section above illustrate a wide diversity of projects successfully employing phylogenetic microarrays for the analysis of human-associated microbial communities, and they highlight a variety of questions that can be answered with this technology. Phylogenetic arrays were also used to study gut microbiota development in infants (Palmer et al., 2007; Cox et al., 2010a); altered fecal microbiota in patients with IBD (Kang et al., 2010), IBS (Kajander et al., 2008) and Clostridium difficile infection (Manges et al., 2010); oral microbiota in children (Crielaard et al., 2011), adults (Olson et al., 2011) and the elderly (Preza et al., 2009a, b); differences in airway microbiota of pediatric and adult cystic fibrosis patients (Cox et al., 2010b); and microbiota associated with bacterial vaginosis (Dols et al., 2011). One potential barrier to the wider use of phyloarrays by many researchers is the complexity of the de novo array design and the substantial effort required to validate and test array performance. Even so, a panel of the phylogenetic microarrays already available (Table 1) makes it possible to analyze most human-associated microbial communities.

The combination of moderate costs and quantification capability of phyloarrays makes them an attractive high-throughput method for current and future studies of human-associated microbiota composition. Especially appealing is a simultaneous application of phylogenetic microarrays and next-generation or Sanger sequencing to the analysis of the same microbial population (Crielaard et al., 2011; van den Bogert et al., 2011). While SSU rRNA gene sequencing provides the ability to identify novel members of such communities, microarrays can be used to quantitatively compare phylotype abundances among samples and between sample groups.

While recent studies have employed microarrays and sequencing to answer ‘Who is there?’ type questions, the future directions of microbiota research will likely involve the use of a combination of novel molecular tools to (1) obtain a large-scale view of the interactions between microbiota members and between microbiota and human host, and to (2) link microbiota function and activity to different diseases. In this integrative approach, microarray analysis can be supplemented with other high-throughput systems biology methods including metabonomics, meta-transcriptomics, and meta-proteomics (Klaassens et al., 2007; Booijink et al., 2010b; Martin et al., 2010). Combining these techniques would allow us to simultaneously profile community composition (phylogenetic microarrays, SSU RNA gene sequencing), overall gene content (meta-genomics), and gene (meta-transcriptomics) and protein (meta-proteomics) expression, and to link these data sets to the metabolite levels measured in the same fecal or lumenal samples (metabonomics). Only such a truly integrative strategy can provide satisfactory understanding of the complex interplay among microbiota members and between microbiota and human host in both health and disease.

There are also several potential options for future improvements in phylogenetic microarray design and use. Availability of a large set of genome sequences of human-associated microbiota members through the Human Microbiome Project (Peterson et al., 2009) and the MetaHIT initiative (Qin et al., 2010) opens an opportunity to design phylogenetic detection arrays based on functionally conserved genes such as groEL, rpoB, gyrA, and tuf (Loy & Bodrossy, 2006). Some phylogenetic microarrays such as Microbiota Array can already measure not only the levels of SSU rRNA genes but also the abundances of the SSU rRNAs themselves, which provide estimates of the metabolic activity of community members (Rigsbee et al., 2011). Moreover, the arrays can eventually be expanded to include probes to microbial functional genes; this would allow us to combine on one array community structure interrogation based on phylogenetic probes with community function description based on the availability and abundance of metabolic genes (Louis & Flint, 2007). At the same time, phylogenetic microarrays can be used as microbial diagnostic arrays in clinical settings, where their ability to provide species-level detection of hundreds of human microbiota members in a short period of time can aid in disease diagnosis and the choice of best available treatment (Loy & Bodrossy, 2006).


We are grateful to Willem de Vos, Erwin Zoetendal, Seungha Kang, Bart Keijser, Frank Schuren, Amee Manges, Michael Markey, and Susan Lynch for valuable comments on the manuscript. The work in the Paliy laboratory is supported by the National Institutes of Health grants AT003423 and HD065575.