Genome-scale gene/reaction essentiality and synthetic lethality analysis

Authors

  • Patrick F Suthers,

    1. Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, USA
    • These authors contributed equally to this work
  • Alireza Zomorrodi,

    1. Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, USA
    • These authors contributed equally to this work
  • Costas D Maranas

    Corresponding author
    1. Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, USA
    • Corresponding author. Department of Chemical Engineering, The Pennsylvania State University, University Park, PA 16802, USA. Tel.: +1 814 863 9958; Fax: +1 814 865 7846; E-mail: costas@psu.edu


Abstract

Synthetic lethals are pairs of non-essential genes whose simultaneous deletion prohibits growth. One can extend the concept of synthetic lethality by considering gene groups of increasing size where only the simultaneous elimination of all genes is lethal, whereas individual gene deletions are not. We developed optimization-based procedures for the exhaustive and targeted enumeration of multi-gene (and by extension multi-reaction) lethals for genome-scale metabolic models. Specifically, these approaches are applied to iAF1260, the latest model of Escherichia coli, leading to the complete identification of all double and triple gene and reaction synthetic lethals as well as the targeted identification of quadruples and some higher-order ones. Graph representations of these synthetic lethals reveal a variety of motifs, ranging from hub-like to highly connected subgraphs, providing a bird's-eye view of the avenues available for redirecting metabolism and uncovering complex patterns of gene utilization and interdependence. The procedure also enables the use of falsely predicted synthetic lethals for metabolic model curation. By analyzing the functional classifications of the genes involved in synthetic lethals, we reveal surprising connections within and across clusters of orthologous group functional classifications.

Synopsis

Synthetic lethals (SLs) are pairs of non-essential genes whose simultaneous deletion is lethal (Novick et al, 1989; Guarente, 1993). The study of synthetic lethality plays a pivotal role in elucidating functional associations between genes and in predicting gene function (Ooi et al, 2006). One can extend the concept of synthetic lethality by considering gene groups of increasing size where only the simultaneous elimination of all genes is lethal, whereas individual gene deletions are not. The availability of genome-scale metabolic models of organisms has provided the foundation for the development of computational frameworks to rapidly predict the effect of multiple genetic manipulations on the strain growth phenotype under different media. The majority of in vivo and in silico studies have concentrated on perturbing or deleting a single gene or a gene pair at a time. Thus, these analyses might fail to assess the full range of robustness and functional organization of the metabolic networks afforded by higher-order interactions and redundancies. Extending the concept of lethality beyond gene pairs to triples, quadruples, etc. can capture multi-gene/reaction interdependencies. The challenge in exhaustively identifying higher-order SLs lies in the combinatorial complexity of the underlying mathematical problem. This computationally intensive goal was made possible by developing an efficient procedure relying on bilevel optimization.
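To give a rough sense of the combinatorial growth mentioned above, the following sketch counts the candidate deletion sets of each size. The gene count of 1260 is an assumption taken from the model name iAF1260, and the counts ignore non-gene-associated reactions; it illustrates the scale of the search space only and is not part of the authors' procedure.

```python
from math import comb

# Assumed number of genes, taken from the model name iAF1260 (an assumption).
N_GENES = 1260


def candidate_counts(n_genes: int, max_size: int) -> dict:
    """Number of distinct deletion sets of each size from 1 to max_size."""
    return {k: comb(n_genes, k) for k in range(1, max_size + 1)}


counts = candidate_counts(N_GENES, 4)
for size, n in counts.items():
    print(f"deletion sets of size {size}: {n:,}")
```

Each increase in set size multiplies the number of candidates by several hundred, which is why testing every combination one at a time quickly becomes impractical.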

This framework is applied to the iAF1260 model of E. coli K12 (Feist et al, 2007) for aerobic growth on minimal glucose medium. We contrast the predicted SLs against experimental data and provide a number of model refinement possibilities. We elucidate all SL gene and reaction triples. We also introduce the concept of degree of essentiality to unravel the contribution of each reaction in “buffering” cellular functionalities. This study provides a complete analysis of gene and reaction essentiality and lethality for the latest E. coli model iAF1260 and provides the computational means for performing similar analyses for other genome-scale models. Furthermore, by exhaustively elucidating all model growth predictions in response to multiple gene knockouts, it provides a many-fold increase in the number of genetic perturbations that can be used to assess the performance of in silico metabolic models. We identified 83 genes and 4 non-gene associated reactions involved in 86 SL pairs (∼0.01% of total possible pairs) as shown in Figure 1. All these SL pairs were analyzed in detail in terms of their phenotypic, topological and functional impact. Of the 86 predicted SL pairs, 53 (∼62%) were found to yield auxotrophic strains in silico whose growth can be restored through supplementation. By representing all genes forming SL pairs as nodes and connecting the two genes of each pair by an edge, a variety of different topological motifs emerge (see Figure 1). These include disjoint pairs, stars and highly connected subgraphs.

We investigated the membership of SL gene pairs in the clusters of orthologous groups (COGs) ontology (Tatusov et al, 2003) as illustrated using different colors in Figure 1. It has been previously noted that two functionally distant genes can cause synthetic lethality because a gene deletion not only causes the loss of the gene's primary function but also creates a cascade of compensatory cellular responses possibly affecting many pathways (Schoner et al, 2008). These inter-category connections are thus indicative of the need to bring to bear different parts of metabolism to enable the production of all biomass precursors. We searched for experimental evidence to examine the validity of the in silico predicted SL pairs. Explicit experimental evidence was found in the literature for eleven such SLs. All of these SLs could be rescued by nutrient supplementation: five with amino acids alone, five with other metabolites, and one with a combination of amino acids and other nutrients.

Comparisons of in silico predictions and in vivo observations for single gene essentiality data (Kumar and Maranas, 2009) have previously been used to drive the process of metabolic model refinement (Becker and Palsson, 2008). Extending this workflow to include SL pairs, triples, etc. provides additional layers of model validation and opportunities for correction. We identified 27 in silico SLs that are inconsistent with in vivo SL data in one of two ways. The first group includes predicted SLs that contain one or more essential genes, whereas the second contains predicted SLs that are in agreement with in vivo SL data but imply incorrect supplementation rescue (i.e. auxotrophy) scenarios. Using these results, we suggested 18 iAF1260 model modifications. The concept of synthetic (pair) lethality can be extended to SL triples where the simultaneous deletion of three genes is lethal. When searching for SL triples, all essential genes and SL pairs are excluded from consideration to eliminate trivial results. We identified 193 SL gene triples involving 114 genes and 15 non-gene associated reactions. Analyzing reactions, we found 96 SL reaction pairs and a total of 243 SL triples involving 163 reactions. The extent to which individual reactions participate in SLs varies widely. Notably, TPI (triose-phosphate isomerase) is the reaction that participates in the most triples, with membership in 35 different SL triples.
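The exclusion rule for triples (skip any candidate that already contains an essential gene or an SL pair) can be illustrated with a small brute-force sketch. This is not the bilevel optimization procedure used in the study; it simply tests every combination against a growth check. The gene names, the `ROUTES` table and the `grows` function are hypothetical stand-ins for a real genome-scale model, where the growth check would be a flux balance analysis calculation.

```python
from itertools import combinations

# Hypothetical toy network: growth is possible if at least one route
# still has all of its genes. These names are placeholders, not iAF1260 genes.
ROUTES = [
    frozenset({"a", "b"}),
    frozenset({"a", "c", "d"}),
    frozenset({"a", "c", "e"}),
]


def grows(deleted: frozenset) -> bool:
    """Toy growth check: True if some route contains no deleted gene."""
    return any(route.isdisjoint(deleted) for route in ROUTES)


def minimal_lethal_sets(genes, max_size):
    """Enumerate lethal deletion sets by increasing size, skipping any
    candidate that contains a smaller lethal set already found."""
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(genes), size):
            candidate = frozenset(combo)
            # Trivial result: a subset of this candidate is already lethal.
            if any(prev <= candidate for prev in found):
                continue
            if not grows(candidate):
                found.append(candidate)
    return found


all_genes = set().union(*ROUTES)
lethal_sets = minimal_lethal_sets(all_genes, max_size=3)
for s in lethal_sets:
    print(len(s), sorted(s))
```

In this toy network the search reports one essential gene, one SL pair and one SL triple; the triple survives the filter because it contains neither the essential gene nor the pair.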

To quantify the degree of dispensability of a gene or reaction in a metabolic network with respect to biomass formation, we introduce the concept of degree of essentiality (DOE). This metric is defined as the size of the smallest SL that the gene or reaction is a member of. Therefore, essential genes or reactions have a DOE of one, while genes or reactions that participate in SL pairs (and perhaps in higher-order SLs) have a DOE of two. We determined DOE values of up to three for all genes and reactions and of up to four for all reactions of central metabolism active under aerobic glucose conditions (see Figure 8). The majority of reactions in central metabolism have a DOE greater than one. This is most likely due to the presence of multiple diverging and converging branches in pathways of central metabolism. It is important to note that reactions operating in opposite directions can have different DOEs. Examples include the reaction pairs FBP and PFK as well as PPC and PPCK.
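The DOE definition above translates directly into a short calculation once the lethal sets are known. The sketch below assumes the lethal sets have already been enumerated; the function name and the gene labels are hypothetical placeholders, not results from the model.

```python
from typing import Dict, FrozenSet, Iterable


def degree_of_essentiality(lethal_sets: Iterable[FrozenSet[str]]) -> Dict[str, int]:
    """Map each gene or reaction to the size of the smallest lethal set
    it belongs to. Members of no listed set are left out of the result."""
    doe: Dict[str, int] = {}
    for lethal in lethal_sets:
        size = len(lethal)
        for member in lethal:
            # Keep the smallest set size seen so far for this member.
            if member not in doe or size < doe[member]:
                doe[member] = size
    return doe


# Hypothetical lethal sets: one essential gene, one pair, one triple.
example_sets = [
    frozenset({"g1"}),
    frozenset({"g2", "g3"}),
    frozenset({"g3", "g4", "g5"}),
]
doe = degree_of_essentiality(example_sets)
for name in sorted(doe):
    print(name, doe[name])
```

Here the placeholder `g3` appears in both a pair and a triple, so its DOE is two, matching the rule that membership in a smaller SL takes precedence.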
