Keywords:

  • synthetic biology;
  • metabolic engineering;
  • proteomics;
  • protein–protein interactions;
  • interactomics;
  • mass spectrometry

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Experimental Techniques for PPIs
  5. Bioinformatic Analysis of PPIs
  6. Publication and Trends in PPI Research
  7. Engineering on the Horizon of PPI Research
  8. Conclusions
  9. Acknowledgements
  10. References

As we move further into the postgenomics age, where the mountain of systems biology-generated data keeps growing, as does the number of sequenced genomes, we have the exciting opportunity to understand more deeply the biology of important systems, those that are amenable to genetic manipulation and metabolic engineering. This holds, of course, only if we can make ‘head or tail’ of what we have measured and use it for robust predictions. Modern mass spectrometry tools have greatly facilitated our understanding of which proteins are present in a particular phenotype, their relative and absolute abundances and their state of modification. Coupled with modern bioinformatics and systems biology modelling tools, this offers the opportunity not just to provide information and understanding but also to provide targets for engineering and to suggest new genetic/metabolic designs. Cellular engineering, whether via metabolic engineering, synthetic biology or a combination of both, offers exciting potential for biotechnological exploitation in fields as diverse as medicine and energy as well as fine and bulk chemicals production. At the heart of such effective designs, the interactions of proteins with other proteins or with DNA will become increasingly important. Here, we examine the work done to date on protein–protein interactions and how this network knowledge can be used to inform ambitious cellular engineering strategies. Some examples demonstrating small molecules/biofuels and biopharmaceuticals applications are presented. © 2012 IUBMB Life, 65(1):17–27, 2013


Introduction

As the omics world continues to move forwards, there is obviously an ever-increasing opportunity to leverage this technology to improve our understanding of how cells (and systems of cells) operate. Once obtained, this information should allow us to rationally design and control performance to suit our own purposes. This kind of information is consistent with the philosophy of the emerging area of synthetic biology (1, 2) and has already been successfully applied (3). This area seeks to apply rational engineering design principles, prototyping and modelling to design systems for practical use (4). Until now, the study of protein–protein interactions (PPI) has been dominated by basic biology or bioinformatic studies seeking to understand how cellular networks assemble and operate. This falls within the remit or philosophy of systems biology. Systems biology seeks (5) to understand a cell and predict performance; however, it is not aimed at tailoring or improving a cellular factory per se (but it can—and should—be done in that context (6)).

In this critical review, we analyse bibliometric data obtained from the PubMed database (http://www.ncbi.nlm.nih.gov/pubmed). We mined this data to understand how the field of PPIs has developed over the years. We also carried out similar bibliometric mining on RNA, metabolic engineering, synthetic biology and systems biology to place this PPI work in context. As it is clear that, for future prosperity, biotechnology is required to provide solutions to pressing problems, we have attempted to place this PPI data in the context of biotechnology. In particular, we assess the use of PPI data and models and their potential to aid in the engineering of cells to make useful products. These products may include such diverse items as (bio)pharmaceuticals (7, 8), biofuels (9), fine chemicals (10) and systems that might help remediate the environment (11, 12).

A key tool, of course, in PPIs is captured under the broad field of ‘proteomics’. The emergence of soft ionisation methods in mass spectrometry (MS) (see Wright et al. (13) for a deeper review on proteomics and MS) has allowed proteomics to grow into the extremely large and widespread field it is today. A classic weakness in the proteomics arena has been the inability of proteomics practitioners, or the technology itself, to obtain comprehensive coverage of any system of interest, whether a whole proteome or protein complexes, interacting partners from pulldowns or states of post-translational modification. This is in significant contrast to what one observes in transcriptional microarrays, where near 100% coverage is possible. The contrast is even starker when one examines the rocketing area of deep sequencing of messenger RNA (as we highlight later in this review). However, a number of technologies are becoming more important in the proteomics area that will dramatically improve the information content obtained from such studies. These include the growth in multiple reaction monitoring/selected reaction monitoring (see Picotti and Aebersold (14) for methodological discussions), the development of absolute quantitation or even protein turnover at the scale of the network (15), complex (16), pathway (17) or proteome (18), studies on single-cell proteomics (19) and the growth in quantitative proteomics observed more generally (20, 21). There are a number of reviews that have covered the proteomics field (e.g. (13, 22)) and we will not seek to replicate those. In this article, we review the efforts made in the PPI field from an experimental, a theoretical and an epistemological angle and see how the field can benefit from the momentum of synthetic biology (and metabolic engineering) (see elsewhere for extra background: e.g. (23, 24)).

Experimental Techniques for PPIs

This section broadly considers and discusses the major approaches used to determine PPIs: (i) yeast two hybrid (Y2H), (ii) tandem affinity purification (TAP-tag), (iii) proximity ligation assay (PLA) and (iv) chemical crosslinking. See Fig. 1 for an overview.

Figure 1. Schematic description of the major PPI experimental techniques: (A) TAP (modified and based on ref. 25), (B) Chemical crosslinking (modified and based on ref. 26), (C) The yeast two-hybrid system (modified and based on ref. 27) and (D) proximity ligation (modified and based on ref. 28).

Y2H Screening

The first publications for the use of a Y2H system provided information about the interaction of a single protein with its specific partners (29) simultaneously yielding the cloned interacting partner (30). This technique has seen noteworthy success over the past decade, allowing for some of the most complex and complete interactomics studies to be carried out (31, 32).

The technique makes use of two domains of a protein (GAL4) which, when brought together, provide a transcriptional activator for the expression of beta-galactosidase (29). The first step involves cloning the protein of interest into a plasmid that fuses it to the DNA binding domain of GAL4. A second plasmid codes for the activating domain of GAL4, which is fused to either preselected proteins of interest or protein sequences encoded by a library of genomic DNA fragments (30). Only when two interacting proteins come into contact, following their heterologous expression, is a functional GAL4 activator reconstituted, subsequently leading to expression of beta-galactosidase. Interacting proteins can therefore be detected by screening with a chromogenic beta-galactosidase substrate such as X-gal. In order for the system to function, the assay has to be carried out in a system where the screenable gene has been inactivated or does not natively exist. This is to ensure that a positive result is only noted when the two GAL4 domains carried by the fusion proteins come into contact.

Although initially designed for use in yeast, the system has been modified for use within Escherichia coli (33, 34), permitting greater ease of use without the need for yeast growth facilities. A more complete list of the Y2H variations can be found in ref. 27. Two-hybrid studies have been the method of choice for many of the ambitious, high-profile interaction studies involving Saccharomyces cerevisiae (35, 36), Drosophila melanogaster (37) and even the human protein interactome (38). This is mainly due to the simplicity of the technique, the ability to scale up and the vast amount of information that is retrieved. For all the benefits of the technique, two-hybrid screening does however suffer from a variety of flaws:

  • The system was designed to study binary interactions, whereby only a single protein is ever seen interacting with another at any point in time.

  • A large number of false positives, with reports of up to 70% noted previously (39).

  • If the system is plasmid based then the proteins may be overexpressed which can lead to inaccurate interaction (40).

  • The fusion portion of the protein has the potential to interfere with the function of the protein and could theoretically prevent proteins from interacting.

  • The protein may not function in the correct manner if expressed in a recombinant host.

Tandem Affinity Purification-Tag

TAP-tag has seen significant use since it was first published over a decade ago (25, 41). The method makes use of two affinity tags expressed in tandem at either the C- or N-terminus of the desired protein. The technique initially relied on dimeric protein A followed by a TEV protease cleavage site and then calmodulin binding peptide. The technique is very open to modification and many different combinations of tandem tags have been developed since (42), allowing it to be tailored specifically to the protein interactions under study. The method is similar to protein coimmunoprecipitation, although coimmunoprecipitation requires a greater number of antibodies to target each specific protein.

The DNA sequence for the protein tags is ideally inserted upstream or downstream of the DNA sequence for the protein of interest to allow for native expression of the tagged target. This allows for the most representative interaction with surrounding proteins, thereby limiting false positives that may otherwise occur due to overexpression. Following cell lysis, the protein sample is loaded onto a column containing either magnetic or agarose beads coated with anti-protein A antibodies. The column is washed to remove contaminating proteins whilst still retaining interacting ones and the target protein. The protein A tag is cleaved from the antibody through the use of TEV protease and the protein of interest is eluted onto a calmodulin column where the secondary tag binds. The sample is washed again to remove any residual contaminants and then the proteins are eluted using ethylene glycol tetraacetic acid (EGTA). Samples are run through 1D SDS-PAGE gels against a protein sample from the wild-type organism to allow for visualisation of proteins of interest. Bands present only in the positive sample are excised from the gel and enzymatically digested, typically with trypsin, before being loaded onto a mass spectrometer for identification of the proteins.

The technique is relatively simple, requiring the chromosomal tagging of proteins of interest but afterwards making use of the same antibody and column components. Only one antibody is required unless a Western blot of the target protein is desired following tandem purification. A large number of proteins can be successfully tagged and detected by MS, with reports of 86% successful tagging (confirmed by Western blot) and 65% successful identification through MS (43). A single tagged protein is able to pull down multiple interacting proteins (41). The method has been used in large-scale interactome studies in yeast (44) but has been shown to be susceptible to a high level of false positives, with one group reporting that the same method applied to the same samples by different researchers yielded less than 30% of the same interactions (45). It is highly recommended that any detected protein interaction be confirmed by tagging the detected interacting protein in turn and verifying that it pulls down the initial target.

Proximity Ligation Assay

PLA, commercially known as Duolink, is a novel technology for determining protein interactions in their native state (28). This allows for direct visualisation of the proteins within whole fixed cells or tissue sections, thereby preventing any erroneous data from cell lysis, cross-contamination or imperfect subcellular fractionation. The system does not suffer from bias due to overexpression or functional modification of the proteins from the addition of an affinity tag.

The technique makes use of two primary antibodies, ideally raised in different species, directed against the target proteins. These are subsequently targeted by species-specific secondary antibodies carrying DNA linkers. When the oligonucleotide on one secondary antibody comes closer than 30 nm (depending on oligonucleotide length) (28) to the oligonucleotide on the opposite secondary antibody, the DNA is linked and ligated using a connector oligo and ligase. This creates a circular DNA sequence between the two adjacent antibodies. Using rolling circle amplification, the circular DNA sequence is amplified up to 1,000 times (28, 46). Addition of fluorescently labelled detector oligos permits the binding of multiple reporter molecules for a single event, thereby allowing low level interactions to be identified. Results are analysed and quantified using microscopy and Duolink software.

PLA is a powerful technique that allows weak and transient interactions to be studied. The signal is amplified by rolling circle amplification (RCA), so low levels of protein can be detected. The major flaw in this method is that it can only be used when the proteins to be targeted are already known. It can provide useful information on interactions, but no previously unknown targets will be detected. It is therefore unsuited to large-scale interaction studies or complete interactomics. The technique is highly suited to being used as a confirmatory measure of interactions seen with any of the previously mentioned methods, where false positives are much more frequent.

Chemical Crosslinking

The use of chemical crosslinking allows for covalent clustering of nearby proteins through the use of synthetic linkers with the initial use of this technology being nearly 40 years ago (47). There are a wide variety of crosslinkers, allowing the technique to be used in a number of ways. A standard method is the introduction of the crosslinker molecule following the growth of the cell sample but before lysis. The crosslinker permeates the cell to link proteins and capture them in the configuration in which they act natively. The crosslinker will, in this case, be designed to have ends that are reactive to specific functional groups such as primary amines. Following lysis, the crosslinked proteins are purified and either detected using western blots or enzymatically digested and loaded onto a mass spectrometer.

The development of photo-crosslinking has provided a method for rapid connection of cellular proteins that ensures linkers are distributed throughout the cells. The method makes use of inert photoactivatable amino acids, such as photo-methionine and photo-leucine (48), in the growth media that are then incorporated into mature proteins without affecting function. Application of ultraviolet light activates the molecules resulting in covalent crosslinking and allows for subsequent analysis as mentioned above.

Recent developments have seen a rise in the use of crosslinking coupled with MS (49, 50) permitting the identification of protein complexes. The linker can be chosen to be of a specific length to provide information about distances between complexes, state the location of the crosslink and aid in structural understanding of multiprotein complexes (49).

The technique has many different available options that permit a wide range in the speed of processing and the quality of data that can be obtained. The linking of proteins in vivo increases the chances of detecting weakly interacting and transient partners (26).

Metalloprotein Interactions

A major challenge in modern interactomics studies is the identification of interactions between metalloproteins. These proteins perform vital roles in cellular survival, with involvement in energy metabolism (51) and signal transduction (52). Approximately 30% of proteins within sequenced genomes are predicted to contain a metal ion (53, 54). Determining the interacting partners of a particular metalloprotein, as with other proteins, can depend on the physiological function of the target. Interactions can range from stable complexes to extremely transient interactions in the microsecond range (55), often seen with redox proteins (56). The use of coupled techniques such as in vivo crosslinking and subsequent TAP (see above) helps to identify specific partners whilst also removing decoys thanks to the dual purification (53). For a more detailed insight into interactions of metalloproteins, the reader is referred to the following papers (53, 55, 56).

Going Forward with Experimental Techniques

A key component for the development of interactomics data is the use of mass spectrometers (57–59). Developments in this field have made a dramatic impact on all aspects of the life sciences, with interactomics being no exception. Mass spectrometers allow for a far greater wealth of information to be obtained compared to Western blots. De novo sequencing (60, 61), structural analysis (62) and identification of interacting residues (49) are all possible.

There are many variations of methods that can be applied to attempt to achieve the best results from an interactomics study. If the identification of a large number of interactions is required, then Y2H or TAP-tagging is well suited. However, these experiments are susceptible to high rates of false positives, so their results must be validated. Smaller scale studies involving weakly interacting proteins and low abundance proteins are better suited to methods such as PLA or chemical crosslinking. Techniques are liable to bias and the choice of method may determine the type of interaction that is found (45). The methods detailed above are not mutually exclusive, with examples of two being used in combination to improve detection (63). The high level of complexity observed in these studies has meant that bioinformatics must play a role in determining the reliability of the PPI identifications.

Bioinformatic Analysis of PPIs

One aim of PPI analysis is to evaluate the confidence of the experimentally observed interactions. This can be achieved by assigning scores to the observed interactions to evaluate their reliability. Various score calculation methods have been proposed. According to the techniques they use, the methods can be categorised into statistical (64–69) and nonstatistical methods (39, 70, 71). Based on the data types they use, the methods can be categorised into interaction data based (68, 69) and noninteraction data based methods (39, 72, 73). The interaction data-based methods can be further categorised into single dataset-based (68, 69) and meta-dataset-based methods (70, 74–76).

Many statistics-based PPI analysis methods assess PPI confidence in terms of a probability value. In the work by Collins et al. (65), a purification enrichment (PE) score of an interaction, which is inherited from Gavin et al. (31), is calculated. The confidence of each interaction is then calculated as the ratio of the likelihood of the PE score given the true PPI distribution model to the likelihood given the false PPI distribution model. In Hart et al. (66), the probability of an interaction being generated at random (i.e., the p-value) is used as a score, which is calculated by evaluating the observed number of interactions between a pair of proteins under a hypergeometric distribution model. In Breitkreutz et al. (64), the significance analysis of interactome (SAINT) score is proposed. SAINT assigns the number of peptides identified for each interaction to a premodelled probability distribution and calculates the likelihood as a score. The posterior probability of a bait protein given the prey protein, calculated using Bayes' rule, is used as a score in Sardiu et al. (67). The methods mentioned above use sophisticated probability models that usually give highly reliable results at the cost of increased complexity. Furthermore, the choice of candidate models and the setting of model parameters still present difficult challenges.
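To make the idea of a hypergeometric p-value concrete, a minimal Python sketch is given here. It is not the implementation of Hart et al. (66); the parameterisation (a pool of `total` observations, of which `n_a` involve the first protein and `n_b` the second, with `k` shared) and all names are assumptions chosen for illustration.

```python
from math import comb

def hypergeom_pvalue(k, total, n_a, n_b):
    """Probability of seeing at least k shared observations by chance.

    Assumed model (illustrative only): n_b observations are drawn without
    replacement from a pool of `total`, of which n_a involve protein A.
    """
    upper = min(n_a, n_b)
    denominator = comb(total, n_b)
    # Sum the upper tail of the hypergeometric distribution, P(X >= k).
    tail = sum(comb(n_a, i) * comb(total - n_a, n_b - i)
               for i in range(k, upper + 1))
    return tail / denominator

# Hypothetical numbers: 3 shared observations out of a pool of 10.
p_value = hypergeom_pvalue(3, total=10, n_a=3, n_b=4)
```

A small p-value indicates that the observed co-occurrence would be unlikely under random pairing, which is the sense in which it can serve as a confidence score.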

Some other methods use the Z-score (the number of observed interactions normalised by the distribution derived from a random interaction assumption) as a simple PPI confidence assessment, e.g. (68, 69). Furthermore, a Z-score-like interaction detection is proposed in ref. 77. However, it has been pointed out by Sowa et al. (78) that the Z-score normalises each unique prey protein equally regardless of the abundance of the prey proteins. To tackle this issue, a D-score was proposed (78). The Z-score-based and related methods avoid using a predefined probability density function to model the distribution of PPI occurrence, and they usually result in relatively simple solutions. However, the limitations of the available data may cause some bias in the identifications.
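The Z-score idea can be sketched in a few lines of Python. This is a generic illustration rather than the procedure of refs. 68 or 69; the null counts are hypothetical and stand in for counts obtained under a random interaction assumption (for example from control purifications).

```python
from statistics import mean, stdev

def z_score(observed, null_counts):
    """Observed count expressed in standard deviations above the null mean."""
    return (observed - mean(null_counts)) / stdev(null_counts)

# Hypothetical counts of a prey protein in three control experiments.
null = [2, 4, 6]
z = z_score(10, null)
```

Because the score depends entirely on the spread of the null counts, a small or biased set of controls translates directly into an unreliable score, which is the limitation noted above.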

Besides the methods mentioned above that are based on single datasets, meta-dataset-based methods are also proposed, e.g. (74–76). In Bader et al. (74), the analysis is based on two datasets obtained by using two different techniques. A positive training set includes the interactions determined by both of the techniques; a negative training set includes the interactions determined by one technique but not by the other. The two training sets are used to construct a logistic regression model that generates a probability score for each individual interaction. The logistic regression model is also applied to investigate the protein interaction similarity between various experimental techniques (75). The reproducibility and quality assessment of the available datasets have been proposed to be incorporated in the PPI score calculation to reduce the effect of noise (76). Such types of methods are expected to be able to remove the technology-specific artefacts and reduce the false positives caused by random noise.
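The training scheme described for Bader et al. (74) can be illustrated with a small logistic regression written in plain Python. This is a sketch under stated assumptions, not the published model: the two features per interaction, the example values and the gradient descent settings are all hypothetical.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(features, labels, learning_rate=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

def interaction_probability(x, weights, bias):
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical features: [enrichment over control, shared annotation flag].
# Label 1: interaction seen by both techniques; label 0: seen by only one.
features = [[2.0, 1.0], [1.5, 1.0], [1.0, 0.0],
            [-1.0, 0.0], [-0.5, 1.0], [-2.0, 0.0]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = fit_logistic(features, labels)
p_high = interaction_probability([2.0, 1.0], weights, bias)
p_low = interaction_probability([-2.0, 0.0], weights, bias)
```

The fitted model returns a probability for any new interaction described by the same features, which is what allows each individual interaction to receive its own score.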

PPI confidence scoring can also be performed based on noninteraction data (39, 72, 73, 79). In Li et al. (73), a novel PRINCESS (protein interaction confidence evaluation system with multiple data sources) method is proposed. The method integrates various pieces of biological evidence (e.g. interaction domain, functional annotation, network topological structure and gene expression) by using Bayes' rule to calculate a probability-based score. A similar Bayes' rule-based method was proposed by Patil and Nakamura (79), which incorporates both PPI databases (e.g. Database of Interacting Proteins and IntAct) and non-PPI data (e.g. genomic features). Based on the work presented in ref. 79, the same group developed a database of high-confidence PPIs, HitPredict (80). The reliability scores of the interactions contained in the database are calculated from a wider range of available data, for example, the IntAct, BIOGRID and HPRD databases and the sequence, structure and functional annotations of the interacting proteins. Deane et al. (39) proposed a method to evaluate the general reliability of a whole PPI dataset. The method is based on the hypothesis that the gene expression profiles of interacting proteins differ from those of noninteracting proteins; therefore, interacting proteins can be discriminated from noninteracting proteins in terms of their gene expression. This method is further improved in ref. 72 by applying a maximum likelihood method. STRING (81) calculates the confidence score by benchmarking the functional association of two putatively interacting proteins against the functional associations presented in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (82).
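How several pieces of evidence can be combined through Bayes' rule is sketched below in Python. The sketch assumes that the evidence sources are conditionally independent (a naive Bayes simplification that the methods cited above do not necessarily make), and the prior and likelihood ratios are hypothetical values.

```python
def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability of interaction with independent evidence.

    Each likelihood ratio is P(evidence | interaction) divided by
    P(evidence | no interaction); values above 1 support the interaction.
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)

# Hypothetical evidence for one protein pair.
evidence = {
    "shared interaction domain": 2.0,
    "correlated gene expression": 3.0,
}
score = posterior_probability(0.5, evidence.values())
```

Working in odds keeps the update a simple product, so adding a further data source only requires estimating one more likelihood ratio.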

Besides the statistics-based methods, there are a few methods that are based on nonstatistical techniques (39, 70, 71). In Deane et al. (39), the individual interaction between protein pairs is validated according to the existence of paralogous interactions. Several groups (70, 71) use cross-validation between different datasets. In Bjorkholm and Sonnhammer (70), each PPI dataset is scored by evaluating the interaction overlap with a reference dataset. Then the reliability score of each interaction is calculated as the sum of the scores of the datasets containing the interaction divided by the sum of the scores for all datasets. In Schaefer et al. (71), the scores for human PPIs are calculated by incorporating most of the available PPI databases, that is, over 70,000 interactions. The score of an interaction is calculated from the normalised number of databases containing the interaction, the normalised number of ortholog interactions and the normalised amount of experimental evidence.
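The dataset-overlap scoring described for Bjorkholm and Sonnhammer (70) lends itself to a short Python sketch. The ratio used for an interaction follows the description above; the way each dataset is scored here (the fraction of its interactions found in the reference set) is an assumption, as are the dataset names and protein labels.

```python
def dataset_scores(datasets, reference):
    """Score each dataset by the fraction of its pairs found in the reference."""
    return {name: len(pairs & reference) / len(pairs)
            for name, pairs in datasets.items()}

def interaction_score(pair, datasets, scores):
    """Sum of scores of datasets containing the pair over the sum of all scores."""
    total = sum(scores.values())
    if total == 0:
        return 0.0
    supporting = sum(scores[name]
                     for name, pairs in datasets.items() if pair in pairs)
    return supporting / total

def pair(a, b):
    # Interactions are unordered, so store each as a frozenset.
    return frozenset((a, b))

# Hypothetical datasets and reference set.
reference = {pair("A", "B"), pair("A", "C")}
datasets = {
    "set1": {pair("A", "B"), pair("A", "C")},
    "set2": {pair("A", "B"), pair("B", "D")},
    "set3": {pair("C", "D")},
}
scores = dataset_scores(datasets, reference)
ab_score = interaction_score(pair("A", "B"), datasets, scores)
bd_score = interaction_score(pair("B", "D"), datasets, scores)
```

An interaction reported only by a dataset that overlaps poorly with the reference therefore receives a low score, whereas one reported by all well-performing datasets approaches 1.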

Publication and Trends in PPI Research

PPIs started to attract attention in 1997 and the field grew quickly until 2005, reaching 150–250 publications per annum over the years 2005–2010 (see Fig. 2). Even though it has undergone mutations—perhaps, most notably the shift from microbial systems to mammalian and the greater emphasis on medical applications (more on this below)—and even though its literature still steadily expands, the PPI field has now somewhat slowed down in relative terms. Numerically, this can be characterised by the rate of growth in the publication volume, which stands at 5% (calculated as the slope of the curve in log-scale averaged over the 2008–2011 period).
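The growth rate quoted above can be reproduced in principle with the following Python sketch, which averages the year-to-year slope of the logarithm of the publication count. The use of the natural logarithm is an assumption, and the counts shown are invented for illustration; they are not the PubMed figures.

```python
from math import log

def average_log_growth(counts):
    """Mean year-to-year slope of ln(count) over consecutive years."""
    steps = [log(later) - log(earlier)
             for earlier, later in zip(counts, counts[1:])]
    return sum(steps) / len(steps)

# Made-up annual publication counts for four consecutive years.
counts = [200, 210, 221, 232]
rate = average_log_growth(counts)  # roughly 0.05, i.e. about 5% per year
```

For small values the log-scale slope is close to the fractional annual increase, which is why it can be read directly as a percentage growth rate.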

One can compare the publication dynamics of PPI research to that of more recent fields. For instance, the much younger field of deep sequencing-based research (RNA sequencing) is producing literature at a quick pace: even though publications about deep sequencing only started to take off in 2006, the output is now approaching 1,000 papers a year (with a staggering growth rate of 84%). Deep sequencing (83, 84) possesses all the attributes of a technical breakthrough that is reshaping how scientists can approach the world, akin in many respects to the improvement of microscopes by van Leeuwenhoek.

Figure 2. Number of publications per annum for the following research areas: PPI (‘metabolic engineering interaction network’, ‘protein-protein interaction network metabolic engineering’, ‘metabolic flux analysis interaction network’, ‘protein-protein interaction network metabolic flux analysis’, ‘synthetic biology interaction network’, ‘protein-protein interaction network synthetic biology’, ‘proteomic interaction network’, ‘protein-protein interaction network proteomic’, ‘(PPI OR “protein-protein interactions”) AND proteom*’, ‘(PPI OR “protein-protein interaction”) AND proteom*’), metabolic engineering (‘metabolic engineering’), synthetic biology (‘synthetic biology’), deep sequencing (‘deep sequencing’, ‘next generation sequencing’ and ‘RNAseq’) and systems biology (‘systems biology’). Data obtained from PubMed on 23 August 2012.

Or, one can compare the publication dynamics of PPI research to that of an older field. The systems biology field, fittingly, roughly started at the same time as the PPI field and, for about 5 years, both had similar growth patterns in publication volume. But systems biology has grown far beyond PPI research to reach 2,500 publications per annum and, more importantly, remains more active with a growth rate of 16%. The comparatively older field of metabolic engineering (which appeared in the mid-90s: see (85)), buoyed by the effervescence triggered by synthetic biology (healthy growth rate of 42%), enjoys renewed dynamism with a growth rate of 19%.

Of course, PPI research is different from other fields: deep sequencing research is powered by a technical breakthrough allowing for the exploration of a brand new world from a new angle; systems biology is itself an amorphous, all-encompassing field. These comparisons can however help us delineate the idiosyncrasies of PPI research and the new avenues of research it has to offer.

PPI research holds much promise but is failing to garner as much momentum as other fields, such as synthetic biology or deep sequencing. This is in spite of technological developments that have seen MS play a more prominent role in laboratories, simplifying the acquisition of new data and rationalising the identification of new PPIs.

The data itself has been the focus of much work by bioinformaticians, who responded quickly to the emergence of a new sort of data. An illustration of the diligence with which methods were developed in response to rapidly growing PPI data acquisition is the observation that popular outlets for PPI research have been BMC Bioinformatics, Bioinformatics, PLoS Comp Biol or Comp Funct Genomics to name but a few. Our PubMed search (see Fig. 2) returned the following top journals: Proteomics (169 papers), J Proteome Res (109 papers), BMC Bioinformatics (99 papers), Mol Cell Proteomics (78 papers) and Bioinformatics (72 papers). Computational papers in PPI research have consistently represented about a fifth of the publication output.

So what exactly is left in PPI research if one pushes aside computational papers? Looking at the titles in 2012, the most common active verbs are ‘analyse’ and ‘reveal’, the most common nouns are ‘proteome/proteomics’ and ‘cell’, and the most common adjectives are ‘human’ and ‘quantitative’. But the field has changed considerably. Indeed, contrasting these words with those in titles of articles published 5 years ago, one can start to appreciate the mutation that the field has undergone (see Table 1). Five years ago, researchers were working on yeast, focusing on the molecular and functional aspects of the detection of protein domains through which binding occurs via docking and on developing/improving the methods. Nowadays, researchers work on mammalian cells, revealing signalling pathways and mapping data in a quantitative fashion to gain insight into potential targets for applications in human pathology, more particularly cancer.
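A word-frequency comparison of this kind can be sketched in Python as follows. This is not the script used for Table 1; the stop-word list, the tokenisation rule and the example titles are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "the", "to", "with"}

def common_title_words(titles, top_n=3):
    """Return the top_n most frequent non-stop-words across the titles."""
    words = []
    for title in titles:
        words.extend(word for word in re.findall(r"[a-z]+", title.lower())
                     if word not in STOPWORDS)
    return Counter(words).most_common(top_n)

# Invented titles standing in for one year of search results.
titles = [
    "Quantitative proteomics of human cells",
    "Human interactome analysis",
    "Mapping the human proteome",
]
top_word = common_title_words(titles, top_n=1)
```

Running the same function on the titles from two different years and comparing the resulting lists gives the kind of contrast summarised in the text.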

Table 1. Comparison of common words in PPI paper abstracts

So, even though PPI research has become a much more applied field, its main area of application remains the medical sciences. Engineering does not as yet constitute a territory for applied PPI science. Computational papers will occasionally refer to a ‘retro-engineering’ approach to the reconstruction of PPI networks, but only a limited number of PPI publications mention the word ‘engineering’ in its traditional sense, mostly in the context of developing in vitro display technologies and, again, with a view to revealing new interactions and identifying pharmaceutical targets (86–90).

Because of its focus on applications to human pathology, one can understand that PPI research invests much effort in pushing the limit of detection to discover previously unknown interactions, be they associated with pathological or healthy conditions. The flip side, as we have just illustrated with the occurrences of ‘engineering’ in the PPI literature, is that little is made of the known interactions. There have been cases demonstrating the potential of such an engineering paradigm; these include the work from the Silver lab on circuit insulation in synthetic biology (91) and from the DeLisa lab on identifying and optimising PPIs (92).

The mainstream of PPI analysis relies on statistics-based methods, as they usually provide more reliable results than nonstatistical methods. Z-score-based and similar methods are currently widely used because they are simple to calculate, but Z-scores are easily affected by bias in the datasets from which they are calculated. In the future, probability-based methods are expected to become overwhelmingly the methods of choice, although they currently still suffer from several difficulties, such as the availability of data, the choice of probability distribution models, the setting of model parameters and the complexity of their implementation. As more experimental data become available and dedicated toolkits are developed, those obstacles will eventually be overcome. Furthermore, datasets collected using different techniques will more frequently be considered simultaneously to improve the reliability of PPI analysis.
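To make the simplicity of the Z-score calculation concrete, the following is a minimal sketch, assuming that the quantity being scored is the spectral count of one prey protein across several bait pull-downs. The function name `z_scores` and the counts are hypothetical, and published Z-score-based PPI scoring schemes differ in their details.

```python
from statistics import mean, pstdev

def z_scores(counts):
    """Return the Z-score of each count relative to the whole list."""
    mu = mean(counts)
    sigma = pstdev(counts)  # population standard deviation
    if sigma == 0:
        # All counts identical: no observation stands out.
        return [0.0 for _ in counts]
    return [(c - mu) / sigma for c in counts]

# Hypothetical spectral counts of one prey across five bait pull-downs.
prey_counts = [2, 3, 2, 40, 3]
scores = z_scores(prey_counts)

# The pull-down with the highest Z-score is the candidate specific interaction.
best = scores.index(max(scores))
print(best, round(scores[best], 2))
```

Because the mean and standard deviation are computed from the dataset itself, any systematic bias in the counts (for example, a prey that binds most baits nonspecifically) shifts every score, which is the sensitivity referred to above.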

Engineering on the Horizon of PPI Research


Bioinformatic methods have been developed to analyse, interpret and decipher the messages contained in noisy datasets. The validation of PPIs has also been a major driver in literature-mining approaches (93).

We have shown here that PPI research does not yet seem to have had the volume or impact of some other major ‘big biology’ fields. As an example of a competitor, deep sequencing has allowed the discovery of microbes and microbial interactions that were previously unknown or had remained elusive in the laboratory. One difference may lie in the cleanness of the data generated by next-generation sequencing, whereas PPI advances are fettered by the complex and noisy nature of the datasets.

As a consequence, while the further discovery of subtler interactions may be markedly hindered by the increasing difficulty of detecting them, a broader perspective may emerge where proteins are engineered and interactions rewired for applications outside the realm of the biomedical sciences. After all, the yeast two-hybrid system is itself the very product of such engineering approaches and has initiated research into high-throughput investigation of interaction networks (94). Synthetic biology and protein engineering are on the horizon of PPI research precisely because synthetic biology needs PPI knowledge for its own ends.

There are opposing views on what the exact definition of synthetic biology is and how this might impact on metabolic engineering. Recently, several papers (1, 2, 24, 95, 96) have appeared that explore the boundaries between synthetic biology and metabolic engineering and what the various philosophical differences and contributions might be. Readers are directed there to find more information.

For example, initial synthetic biology studies have often focused on genetic control or on the production of small molecules, such as the antimalarial drug artemisinin (3), biofuels (alcohols and compounds allied to ‘biodiesel’) and biosensors (97), with some emerging work on macromolecules and tissues in mammalian systems (98). In some of these areas, the use of proteomics approaches (shotgun or targeted) combined with pathway assembly and network approaches has opened promising avenues for improving the performance of synthetic biology/metabolic engineering platforms (7, 99, 100).

In terms of cellular engineering, a number of groups are taking this forward. Different types of products can be made with different recombinant cellular systems. In terms of microorganisms, both macromolecules (proteins and polypeptide chains) and small molecules attract most of the attention, owing to the feasibility and the commercial potential of such undertakings.

As it becomes more quantitative, MS lends itself to modelling and engineering design rather than serving as a mere characterisation method. For example, in synthetic biology/metabolic engineering programmes allied to terpene pathways for biofuels and medicines, a pathway was quantified in terms of the enzymes' abundances to locate the bottlenecks (99, 100). In another example, this time on protein production, we showed that quantitative proteomics combined with a priori metabolic network knowledge can lead to targets for forward engineering (7). Similar endeavours have yet to be attempted with PPI knowledge. Given the utility of this approach, we suspect that this area will grow and use not only KEGG-type metabolic assembly (101) but also the known interaction networks.

Conclusions


The existence of large-scale techniques has given PPI research a rich source of information; however, interpreting this information is still a thorny problem. PPI research is therefore still maturing.

More broadly, interactomics has been very much focused on the protein side of the equation, without also thoroughly taking into consideration the interactions between DNA, RNA and metabolites within the cell (cf. systems biology). Now that whole-cell protein interactions have begun to be deciphered, the multiome interactions need to be tackled to gain a more complete systems understanding. Omics approaches can also provide useful information, which can be integrated in nonstatistical analyses (e.g. correlation in expression patterns). Beyond the goal of determining the exact PPI network of a cell (a perhaps ill-defined problem in itself, as interactions are inherently dynamic and partly nonspecific), there is the rightful question of whether such knowledge, however incomplete, could be used for engineering purposes. Previous examples (e.g. ref. 91) show the potential for such applications. The synthetic biology and PPI communities only have to wake up to that fact and work together.

Acknowledgements


The authors would like to thank the EPSRC for funding (EP/E036252/1). Many thanks are given to Matej Martinkovic for providing the schematic diagrams representing the workflow of each of the interactomics methods.

References
