An integrative, multi-scale, genome-wide model reveals the phenotypic landscape of Escherichia coli

Authors


Abstract

Given the vast behavioral repertoire and biological complexity of even the simplest organisms, accurately predicting phenotypes in novel environments and unveiling their biological organization is a challenging endeavor. Here, we present an integrative modeling methodology that unifies under a common framework the various biological processes and their interactions across multiple layers. We trained this methodology on an extensive normalized compendium for the Gram-negative bacterium Escherichia coli, which incorporates gene expression data for genetic and environmental perturbations, transcriptional regulation, signal transduction, and metabolic pathways, as well as growth measurements. Comparison with measured growth and high-throughput data demonstrates the enhanced ability of the integrative model to predict phenotypic outcomes in various environmental and genetic conditions, even in cases where their underlying functions are under-represented in the training set. This work paves the way toward integrative techniques that extract knowledge from a variety of biological data to achieve more than the sum of their parts in the context of prediction, analysis, and redesign of biological systems.

Synopsis

A data-driven, integrative modeling methodology is presented that unifies signal transduction, gene expression, and metabolic processes under a common framework. Training on an aggregated dataset results in improved prediction of regulatory connections and measured phenotypes.

  • A curated Escherichia coli dataset combining gene expression data for genetic and environmental perturbations, transcriptional regulation, signal transduction, metabolic pathways, and growth data is constructed.
  • Gene expression, signal transduction, and metabolic datasets are incorporated into a novel integrative framework for genome-scale modeling.
  • Training of the genome-scale model with the integrated dataset leads to high correlation between predicted and measured phenotypes and reveals new regulatory links.
  • A model enrichment technique identifies under-represented and highly variable knockouts to drive experimentation.