There is not one path anymore. Twenty years ago, you worked at the clean bench, you isolated new microbes able to grow on agar plates, and then you isolated single genes coding single enzymes for particular processes, you optimized them, and you wrote books or articles with conceptual and technical developments based on known biodiversity. If you were lucky, a small-to-medium company approached you and went further with the applicability of such findings. Today you can be a biogeochemist with bioinformatics knowledge who writes books or articles trying to reveal the mysteries of microbial and genetic adaptation and diversity. It all sounds so . . . uncomplicated, doesn't it? But, of course, this didn't happen overnight. It's been especially in the past fifteen years that a confluence of factors, mainly technical developments, has resulted in some young people turning their backs on sequencing. But above all, there is the sense that biodiversity is at the centre of a vital scientific universe, with microbes as its capital: we know the communities and how diverse they are, but we are far from understanding the individual members and functions, and how each of them can be helpful, for example, to improve the human condition. It is like our human society: the government knows how many we are, but it does not know how each individual lives, or how many consortia (friends and family, to name some) we form.
The co-founding editor of Microbial Microbiology wrote to me and asked, wouldn't I want entree into a crystal ball, to ‘predict’ the future and catch readers' attention? Of course, there's more than a little romanticism in doing this, and there is a discernible sense that, as a young researcher put it, ‘those kinds of jobs – people predicting the future – exist, but just not for scientists’. However, I agreed, because I don't know why anyone who wants to be involved in scientific understanding would not want to turn their attention to future ideas. Following on from this, how do I know I have a correct vision of the next few years? It's hard to get an accurate gauge of how you're doing or what you will do, but you just have to take it on faith that ‘there is a real biotech-market out there and an appreciation for what you're doing’.
I am a chemist-turned-enzymologist and now a microbiologist. Like all of you, I believe that microbes are important for the Earth System, playing a very important role in maintaining the well-being of our global environment. Despite the obvious importance of microbes, very little is known of their diversity, how many species are present in the environment, and what each individual species does – i.e. its ecological function. Until recently, there were no appropriate techniques available to answer these important questions. The vast majority of these organisms cannot be cultured in the laboratory and so are not amenable to study by the methods that have proven so successful with known microorganisms throughout the 20th century. It was only with the development of high-throughput technology to sequence DNA from the natural environment that information began to accumulate that demonstrated the exceptional diversity of microbes in Nature – in fact, most microbes are entirely novel and have not previously been described.
A non-exhaustive list of questions that should be addressed over the next few years includes: ‘is everything everywhere?’, ‘do microorganisms exhibit bio-geographical patterns of distribution?’, ‘is the relative abundance of a certain group of microorganisms necessarily linked to its importance in community functioning?’, ‘which organisms are of pivotal importance in the community?’, ‘how diverse are metabolic pathways and networks within a given ecosystem?’, ‘how do microbes and protein-coding genes interact with each other to produce the overall system function?’, ‘how many specific microbes are responsible for the metabolism of different substrates?’, ‘how do environmental stimuli impact ecosystem functioning and long-term system stability?’, and finally ‘how can we improve metagenomic technology to accommodate the needs of microbial biologists and enzymologists?’.
To answer such questions, it should be noted that conceptual advances in microbial science will rely not only on the availability of innovative sequencing platforms but also on sequence-independent tools for gaining insight into the functioning of microbial communities. I believe this is so because, over the last four years, the very same question has come up at every conference: ‘how can I get information about hypothetical genes and functions?’ The reasons are clear. First, every single-cell or environmental genomic project has added a huge number of putative genes, the function of which is often unknown and at best deduced from sequence comparison. Second, even the best annotations only create hypotheses about the functionality and substrate spectra of proteins, which require experimental testing by classical disciplines such as physiology and biochemistry. This highlights the difficulty of making sense of environmental sequence data: a significant proportion of the open reading frames cannot be characterized because there are no similar sequences in the databases.
As we are primarily concerned with establishing the function of microorganisms in their environments and identifying new enzymes with biotech-promiscuity, there is an urgent need to characterize protein functions from environmental DNA or proteomes. Once a function has been identified, it can be mapped to metabolic pathways or to proteins involved in a particular process (environmental or biotech-like) to determine the functional activity. To this end, I think that it will be helpful to develop visualization tools capable of generating functional and dynamic knowledge, if possible non-destructively and in real time. That is, tools that identify the connected poles of activities (the so-called microbial reactomes) that shape the internal structure of an ecological niche, without large-scale DNA sequence analysis. This is a straightforward concept since it constitutes the direct link from DNA and genes to proteins and functions, a major hurdle in both Systems Biology and Biotechnology studies. By doing this, it will be possible to unravel gene functions and add valuable information about how microorganisms adapt to changing environmental influences, and about how biotech processes can be designed around new microbial functions that can be checked by directly visualizing the reactomes.
I think that methods for identifying microbial reactomes at a global scale are partially available, but we still need to solve many problems. For example, bioinformatics methods exist for isolating microbial reactomes in silico, but they rely on sequence data. Metatranscriptomics has the potential to describe how metabolic activities will change, but it still does not reflect the protein level and does not predict microbial functions (only upregulation or downregulation). Metaproteomics gives valuable information on how microbes respond to stress, but it is limited by low resolution and gives no direct information on function. Metabolomics has become very popular recently by combining new analytical and isotope analyses but, in the environmental context, its use is very limited because of the difficulty of identifying and localizing metabolites. Finally, single-cell genomics is increasing in importance but, once again, the sequencing of such cells will predict functions based on sequence data or, in the best case, yield functional hypotheses about certain individual functions. If we ignore these problems, we will increasingly waste significant financial resources and staff effort in pursuit of the final goal: reconstructing experimentally based reactomes in single cells or complex communities.
So it does not take a crystal ball to see that the bottleneck in metagenomic technology, from both the microbial and the biotech point of view, will be not only the design of powerful assembler programs but rather the development of technologies that provide direct analysis of complex mixtures and can detect specific substrate-protein transformations among thousands of other endogenous metabolites and proteins, in order to get a clear picture of ‘who is doing what’. Some methods do exist for isolating single transformations from the natural environment, but these are not relevant for reactome coverage as they are not universal. Clearly, existing methods for enzymatic activity detection based on changes in spectroscopic properties should give rise to high-throughput chips that can be used to provide information on the chemistry of reactions and the identity of the product formed. This type of information will be extremely useful for ascribing functions to genetic sequences from environmental samples, thus minimizing annotation mistakes and suggesting biotechnological potential. I believe in a future where any single-genome or environmental sequence project is done in parallel with chip-based enzyme screening, so that annotations are experimentally documented at the time when the paper is written. Only through obtaining holistic information can holistic hypotheses about ecosystem characteristics be formulated. The question then is: ‘how many reaction and substrate types should one have in a single high-throughput chip to cover the whole microbial metabolism?’ As Shakespeare said in Hamlet, ‘that is the question’! I think that this is the time to think about it, as progress in managing sequence information per se accelerates.
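One way to make the chip-design question concrete, offered only as a sketch, is to treat it as a coverage problem: given a panel of candidate substrates and the reaction classes each one can report on, pick a small set of substrates that together touch every class of interest. The Python example below uses a simple greedy heuristic. The substrate names, the reaction classes and the function `greedy_chip_design` are all hypothetical placeholders and not data from any real chip, and the greedy choice is not guaranteed to give the smallest possible panel.

```python
from typing import Dict, List, Set


def greedy_chip_design(substrates: Dict[str, Set[str]],
                       targets: Set[str]) -> List[str]:
    """Greedily pick substrates until every target reaction class is covered.

    `substrates` maps a substrate name to the reaction classes it reports on.
    Classes that no substrate can report on are simply left uncovered.
    """
    uncovered = set(targets)
    chosen: List[str] = []
    while uncovered:
        # Take the substrate that covers the most still-uncovered classes;
        # names are sorted so that ties are broken deterministically.
        best = max(sorted(substrates),
                   key=lambda s: len(substrates[s] & uncovered))
        gained = substrates[best] & uncovered
        if not gained:
            break  # nothing left on the panel can cover the remaining classes
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical, made-up panel: substrate -> reaction classes it can detect.
PANEL = {
    "substrate_A": {"esterase", "lipase"},
    "substrate_B": {"lipase", "glycosidase", "protease"},
    "substrate_C": {"oxidoreductase"},
    "substrate_D": {"esterase", "oxidoreductase"},
}
TARGETS = {"esterase", "lipase", "glycosidase", "protease", "oxidoreductase"}

chip = greedy_chip_design(PANEL, TARGETS)
print(len(chip), chip)
```

With this toy panel, two of the four substrates are enough to touch all five classes. For real metabolism the interesting part is the input table itself, that is, which substrates report on which reactions, and that is exactly the experimental knowledge the question above asks for.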
All in all, it is clear that to access the microbes in their natural milieu and new enzymes from them, there is a strong need to elaborate a Systems Biology concept based on the combination of multiple strategies to understand the functioning of microbial communities as a whole, with metagenomic tools playing a pivotal role (Fig. 1).