The microbial world represents the most important and diverse group of organisms living on Earth (Whitman et al., 1998; Curtis et al., 2002), comprising most of the diversity of the three domains of life defined by Woese and colleagues (1990): Archaea, Bacteria and Eucarya. Furthermore, these organisms are widely distributed across many environmental habitats, even the most extreme. Their numerous enzymatic machineries have allowed them to adapt to almost every ecological niche and take advantage of any environmental condition (Øvreås, 2000; Guerrero and Berlanga, 2006). Despite our increasing knowledge of the role of microorganisms in ecosystem functioning, our current vision of the microbial world is still incomplete and several issues remain unresolved. This is partially explained (i) by the tremendous diversity of the genes and metabolisms of existing species, as well as of the ecological niches they occupy, and (ii) by technological limits such as our inability to culture the majority of microorganisms (Amann et al., 1995; Pace, 1997).
Because of this huge microbial biocomplexity, high-throughput molecular tools allowing simultaneous analyses of existing populations are greatly needed (Torsvik and Øvreås, 2002; Xu, 2006). Massive sequencing based on next-generation sequencing (NGS) technologies and microarrays are currently the most promising and complementary approaches to address these tasks (Claesson et al., 2009; Roh et al., 2010; van den Bogert et al., 2011). Using NGS, two specific strategies can be applied: metagenomics, which refers to the study of the collective genomes in a given environmental community, and the 16S rDNA amplicon sequencing approach. In principle, these methods enable: (i) access to the wide diversity of microbial communities, (ii) identification of unknown microorganisms and (iii) the potential to link structure to functions (Simon and Daniel, 2009). Some limitations of metagenomics, however, have been demonstrated: for example, the difficulty of managing large amounts of sequence data, the short sequence read length (400–500 bases maximum with the 454 FLX Titanium instrument from Roche), which complicates contig assembly, and the sequencing errors introduced by NGS technologies (Roh et al., 2010). Furthermore, Quince and colleagues (2008) estimated that detecting 90% of the richness in some hyperdiverse environments could require tens of thousands of times the current sequencing effort, which is not feasible. Oligonucleotide microarray technologies, by contrast, have been widely used for gene detection and gene expression quantification and, more recently, were adapted to profiling environmental communities in a flexible and easy-to-use manner (Zhou, 2003; Wagner et al., 2007). These approaches can monitor the presence, or the expression, of thousands of genes, combining qualitative and quantitative aspects in a single experiment (Tiquia et al., 2004; Marcelino et al., 2006; Dugat-Bony et al., 2011).
Furthermore, this technology appears well adapted to multi-sample comparison. Although several whole-genome arrays have been developed in the last few years, phylogenetic oligonucleotide arrays (POAs), targeting the 16S rRNA genes, and functional gene arrays (FGAs), targeting key genes encoding enzymes involved in metabolic processes, are the two major approaches used to assess the diversity of microbial communities in the environment (Wagner et al., 2007). Currently, the most comprehensive tools developed are the high-density PhyloChip, with nearly 500 000 oligonucleotide probes targeting almost 9000 operational taxonomic units (Brodie et al., 2006), and the GeoChip 3.0, with ∼ 28 000 probes covering approximately 57 000 gene variants from 292 functional gene families (He et al., 2010). Microarrays have been shown to be sufficiently sensitive, detecting sequences that represent from 0.05% to 5% of the genomic material of the total environmental community (Bodrossy et al., 2003; Peplies et al., 2004; Loy et al., 2005; Gentry et al., 2006; Marcelino et al., 2006; Palmer et al., 2006; Huyghe et al., 2008). However, these methods require prior sequence knowledge to design probes and hence allow surveys only of microorganisms with sequences available in public databases (Chandler and Jarrell, 2005; Wagner et al., 2007).
The main problem that must be faced when constructing oligonucleotide microarrays dedicated to microbial ecology is the probe design step. Indeed, environmental microarrays often require this step to be performed manually. Although numerous general probe design programmes are currently freely accessible for academics [for a recent review, see Lemoine and colleagues (2009)], only a few may be useful for microbial ecology applications; these are listed in Table 1. This review aims to show how probe design strategies can avoid the limitation of sequence availability and make possible the detection of previously uncharacterized microbial populations present in nature. We emphasize various recent methods combining the use of both degenerate and non-degenerate oligonucleotide probes to target either 16S rRNA markers or new protein variants. In conclusion, we highlight other procedures, and the limitations that must be overcome, to improve microarray development in terms of specificity and sensitivity.
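To make the notion of a degenerate probe concrete, the following minimal sketch expands a probe written with standard IUPAC nucleotide ambiguity codes into the set of non-degenerate sequences it represents. It illustrates the general principle only and is not taken from any of the programmes discussed in this review; the function names `expand_degenerate` and `degeneracy` and the example probe are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_degenerate(probe):
    """Return every non-degenerate sequence encoded by a degenerate probe.

    Hypothetical helper for illustration; raises KeyError on symbols
    outside the IUPAC DNA alphabet.
    """
    pools = [IUPAC[base] for base in probe.upper()]
    return ["".join(combo) for combo in product(*pools)]


def degeneracy(probe):
    """Number of distinct sequences a degenerate probe stands for."""
    count = 1
    for base in probe.upper():
        count *= len(IUPAC[base])
    return count


if __name__ == "__main__":
    example = "ACGRTYN"  # hypothetical probe fragment
    print(degeneracy(example))
    for sequence in expand_degenerate(example):
        print(sequence)
```

Because degeneracy grows multiplicatively with each ambiguous position, a probe that tolerates many variants quickly corresponds to a large mixture of individual oligonucleotides, which is one reason why degenerate and non-degenerate probes are combined rather than used interchangeably.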