The mantra of biology is more data: if possible, measuring everything, at high resolution and throughput. Everyone who reviews research or PhD proposals is bombarded with statements on the use of top-notch ‘-omics’ methods. Much rarer are clear visions of how the anticipated results will aid in finally understanding a particular phenomenon. When reviewing the upshots of the proposed research (e.g. publications), we often complain about descriptive data gathering. Now, one could argue that this will change once the current proposals start to spin off publications. Still, glancing at old proposals, including those of this crystal gazer, we were hopeful at the time that the anticipated results would indeed provide us with the missing insights into our various subjects. Do we live in an extremely lucky, and thus unlikely, time where it is finally all downhill from here, or is there a conceptual problem?

Of course there is a conceptual problem, because the sheer amount of data, let alone their non-linear dynamic relationships, challenges our intuition and logical reasoning far beyond their capabilities. It is computational analysis, stupid. Historically, the more-data mantra is perfectly understandable. In the molecular age, we were for decades only able to glimpse tiny fragments of the whole. With the emerging transcriptomics and proteomics methods, a dream came true. Blinded by the suddenly available potential, the initial flood of papers was primarily descriptive. Attracted by this potential, computer scientists later developed bioinformatics methods that now help to sift through piles of expression data, identifying sets of co-regulated genes, regulons and functional correlations.

Much of the bioinformatics success is related to the fact that this research is still, for the most part, in a discovery mode, aiming to identify the molecular components involved and the structures of genetic networks. Understanding, however, implies the ability to accurately predict non-linear dynamic behaviour. For this we need computational models that represent relevant biological mechanisms in a quantitative fashion, enabling what-if simulations and predictions of behaviour in not-yet-studied situations. While computational modelling has not been the pride of the biological toolbox so far, the last couple of years have brought forward a number of promising applications that make intelligent use of transcriptomics and proteomics data. Obviously, further technical developments are still to come, in particular for proteomics, but even with standard technology a single PhD student can generate piles of ‘-omics’ data today. The times they are a-changin' for biology. The development of models and computational methods to analyse and integrate such ‘-omics’ data, and to design key follow-up experiments for unravelling complex mechanisms, becomes the key challenge. One indication that resistance to the change is dwindling is the launch of a new section on computational biology in the traditional ASM flagship Journal of Bacteriology (Zhulin, 2009).

For the most recent addition to the ‘-omics’ arsenal – metabolomics – the technical challenges are perhaps even greater because of the chemical heterogeneity (and sometimes extreme similarity) of metabolites, their rapid turnover, chemical instability, wide dynamic range and often unknown structure. Nevertheless, I have no doubt that these problems will eventually be solved. The question is: will the experience from the above ‘-omics’ history promote faster intelligent use of metabolomics data? In one incarnation, metabolomics focuses on profiling as many metabolites as possible to identify functional biomarkers for biological traits, with medical and plant metabolomics at the forefront. This line of research is primarily in the discovery mode and has clearly learned its lesson – suitable bioinformatics methods are available and are routinely used.

An entirely different matter is the nascent field of quantitative metabolomics. As the catalytic interactions between metabolites and enzymes are known for major parts of metabolic networks, the focus is not discovery but the monitoring of sometimes only subtle responses of known system components to perturbations. Consequently, the data contain important functional and mechanistic information, but this is not immediately obvious. What does an increase in one metabolite signify, and what does it mean when changes occur in distant parts of the network? In sharp contrast to transcriptomics and proteomics, metabolite concentrations are not directly linked to genes. Instead, the concentration of a given metabolite is determined by the presence and in vivo activity of its cognate enzymes, their kinetic and regulatory parameters, the pathway flux and other factors. As the most informative metabolites are typically connected to many different enzymes, changes in their concentrations are extremely difficult to trace back to particular events.

An obvious approach to molecular interpretation is kinetic modelling of metabolism, and the lack of such data for modelling was a key motivation for metabolomics method development in the first place. Here is the prediction: although significant analytical and workflow problems remain to be solved for quantitative microbial metabolomics, I predict a dramatic increase in the availability of such data in the near future. Profiling metabolite data are already being generated in vast amounts, and there are no conceptual obstacles to high-throughput (semi)quantitative metabolomics. As cost, effort and time per single analysis are only a fraction of those for other ‘-omics’ techniques, metabolomics time-course and large-scale screening data sets will soon outnumber those from gene-based ‘-omics’ by far.
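To make concrete what a kinetic model of metabolism looks like at its very simplest, here is a minimal sketch: a hypothetical two-enzyme pathway S → A → P, with both steps following Michaelis-Menten kinetics, integrated by forward Euler. All parameter values and names are invented for illustration, not taken from any measured system.

```python
# Minimal kinetic-model sketch (illustrative parameters, not measured):
# a hypothetical pathway S -> A -> P, both steps Michaelis-Menten.

def mm_rate(vmax, km, conc):
    """Michaelis-Menten rate: v = Vmax * [S] / (Km + [S])."""
    return vmax * conc / (km + conc)

def simulate_intermediate(s=5.0, a0=0.0, t_end=50.0, dt=0.001,
                          vmax1=1.0, km1=0.5, vmax2=1.2, km2=0.3):
    """Forward-Euler integration of dA/dt = v1(S) - v2(A),
    with the substrate S held constant (assumed in excess)."""
    a = a0
    for _ in range(int(t_end / dt)):
        a += (mm_rate(vmax1, km1, s) - mm_rate(vmax2, km2, a)) * dt
    return a

# A what-if simulation: at steady state v1 = v2, which for these
# parameters gives A* = 0.9375. Lowering vmax2 (e.g. a partial
# knock-down of the consuming enzyme) raises A* - a prediction
# that a metabolomics measurement could test.
a_steady = simulate_intermediate()
```

Even this toy model makes the point of the paragraph above: the steady-state level of the intermediate depends jointly on both enzymes' Vmax and Km values, so a measured concentration change alone cannot be attributed to one of them without a model.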

This development will create a dilemma because we currently lack appropriate concepts, beyond simple correlation analyses, to obtain mechanistic insights from the expected metabolomics data. Unless the experiments are specifically designed for this purpose, kinetic modelling will not be able to exploit large-scale metabolomics data to a significant extent, simply because metabolite data alone are insufficient. In contrast to gene-based ‘-omics’, logical reasoning and the current bioinformatics/statistics methods will also not be overly useful. Unless my prediction is far off, the gap between our technical capacity for generating metabolomics data and our ability to digest them will soon become huge. Thus, the call is open for intelligent computational methods. My guess (and that is all it is) is that methods enabling identification of the most promising conditions or mutants from metabolomics screens for specific follow-up analyses, as well as the design of such further experiments, initially have the greatest potential for obtaining mechanistic insights.
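For readers unfamiliar with the baseline being criticized, the "simple correlation analyses" amount to little more than the following sketch: pairwise Pearson correlation between metabolite profiles across conditions. The metabolite names and numbers here are invented for illustration.

```python
# Sketch of the 'simple correlation analysis' baseline: pairwise
# Pearson correlation across metabolite profiles. All data below
# are synthetic; a real screen would span hundreds of metabolites.
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical metabolite levels across five conditions.
profiles = {
    "glucose6P":  [1.0, 1.4, 2.1, 2.9, 3.8],
    "fructose6P": [0.5, 0.7, 1.1, 1.4, 1.9],  # tracks glucose6P
    "citrate":    [2.0, 1.1, 2.4, 0.9, 1.6],  # unrelated
}

r = pearson(profiles["glucose6P"], profiles["fructose6P"])
# A high r flags co-variation, but says nothing about which enzyme
# or regulatory event caused it - precisely the mechanistic gap
# discussed above.
```

The limitation is built in: correlation treats metabolites as generic variables, ignoring the known enzyme connectivity that makes quantitative metabolomics data mechanistically informative in the first place.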


References

Zhulin, I.B. (2009) It is computation time for bacteriology. J Bacteriol 191: 2022.