High-throughput microarray technologies are able to measure mRNA levels at a genomic scale. These measurements may provide valuable insights into the regulation of specific cell behaviour. Generally, computational approaches for the interpretation of mRNA data can be divided into: (a) approaches using clustering and (b) classical reverse engineering methods for network structure inference. Various approaches combine these strategies.
Clustering methods link genes whose expression profiles are close to each other. Commonly used measures include Euclidean distance, Pearson correlation and mutual information. Often, once the gene associations are established, additional knowledge of the studied organism (i.e. sequence or transcription factor list) is required to convert the clusters into a causal transcriptional regulatory network. Combining different clustering algorithms and distance measures has produced a wide range of methods in this category. The main concern with these methods is the validation of the inferred large-scale network topologies. Faith et al. provide a good example of causal network inference using this approach, which they validated in the well-studied bacterium Escherichia coli.
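To make the difference between two of these measures concrete, the following minimal sketch compares Euclidean distance and Pearson correlation on toy expression profiles. The profiles `gene_a`, `gene_b` and `gene_c` are invented for illustration and do not come from any dataset discussed here; mutual information is omitted because it additionally requires a choice of discretization.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    """Linear correlation between two profiles; +1 means identical shape, -1 opposite shape."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical profiles over four time points (toy values, not measured data).
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # same trend as gene_a, twice the amplitude
gene_c = [4.0, 3.0, 2.0, 1.0]   # opposite trend to gene_a

dist_ab = euclidean_distance(gene_a, gene_b)
corr_ab = pearson_correlation(gene_a, gene_b)
corr_ac = pearson_correlation(gene_a, gene_c)
```

In this toy case `gene_a` and `gene_b` are perfectly correlated yet lie far apart in Euclidean terms, which is one reason why the choice of measure changes which genes end up in the same cluster.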
Classical reverse engineering methods infer the topological structure of a regulatory model from transcriptomic data. Depending on the model assumptions, gene regulatory networks can be generated using Bayesian networks, ordinary differential equations or machine learning techniques. In reverse engineering, the dimension of the search space increases with the number of model parameters. Consequently, the inferred network is usually small, frequently limited to about a dozen molecules. The edges in these networks represent (possibly indirect) influences rather than physical interactions. In particular, the time-resolved reverse engineering of networks requires numerous data points and thus an enormous experimental effort.
The approaches mentioned so far share three fundamental problems: (a) the validity of the inferred network structure remains unclear until it is confirmed experimentally; (b) the inferred gene regulatory networks only cover the transcriptional level, and do not reflect the impact of the gene expression changes on the upstream signalling; (c) pure mRNA network inference is not able to link transcriptional or signalling networks to functional cell state changes. The resulting networks are therefore of limited use for functional models of cell behaviour.
An alternative for interpreting mRNA data is to use pathway mapping methods that query databases containing regulatory information. The two most important bioinformatics tools in this category are Ingenuity® (Ingenuity Systems Inc, Redwood City, CA, USA) and Explain™, a product of BIOBASE GmbH (Wolfenbuettel, Germany). Besides not being freely available, they offer no or inconsistent molecular data export capabilities, which renders them unsuitable for network modelling and exchange. Conceptually, they also rely on a very large database, with only distance-based algorithms available to cluster genes by calculating the shortest path or the most visited nodes. Finally, they do not offer the ability to generate networks comprising multiple regulatory levels.
In the present study, we therefore propose a novel mixed strategy that overlays curated networks from the Pathway Interaction Database (PID) with case-specific microarray data. A multi-layer network for a particular microarray experiment is obtained from PID in three steps: (a) automatic integration of the comprehensive set of all known cellular networks from the PID into a computable object (i.e. master structure); (b) retrieval of an active-network from the master structure, in which network edges that connect nodes with an absent mRNA level are excluded; and (c) reduction of the active-network complexity to a causal subnetwork derived from a set of seed nodes specific for the microarray experiment. The seed nodes comprise the receptors stimulated in the experiment, the consequently differentially expressed genes, and the expected functional cell states. The result of these steps is a multi-layer receptor-signalling-transcription-cell state (RSTC) network linking the growth-factor receptor (R), signalling (S), transcription (T) and functional cell state (C) levels. Finally, we compute the consistency of the RSTC network with respect to the microarray measurements, and determine the correctness of the obtained network. In this way, we leverage the power of microarrays but avoid relying solely on the data, whose validity and reproducibility have been criticized.
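Steps (a) to (c) can be sketched on a toy graph as follows. This is only an illustration under stated assumptions: pathways are represented as plain lists of directed edges, the node names (`R1`, `S1`, `T1`, `C1`, ...) are placeholders rather than PID entries, and step (c) uses one simple reduction rule (keep edges that lie on some path from a receptor seed to a target seed), which is an assumed stand-in for whichever subgraph algorithm is actually applied.

```python
from collections import defaultdict

def build_master(pathways):
    """Step (a): merge the edge lists of all pathways into one set of directed edges."""
    master = set()
    for edges in pathways:
        master.update(edges)
    return master

def active_network(master, absent):
    """Step (b): drop every edge touching a node whose mRNA level was called absent."""
    return {(u, v) for (u, v) in master if u not in absent and v not in absent}

def reachable(edges, starts):
    """Nodes reachable from `starts` by following directed edges (iterative DFS)."""
    successors = defaultdict(set)
    for u, v in edges:
        successors[u].add(v)
    seen, stack = set(starts), list(starts)
    while stack:
        node = stack.pop()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def seed_subnetwork(edges, sources, targets):
    """Step (c), assumed rule: keep edges lying on a path from a source seed to a target seed."""
    downstream = reachable(edges, sources)
    upstream = reachable({(v, u) for (u, v) in edges}, targets)
    keep = downstream & upstream
    return {(u, v) for (u, v) in edges if u in keep and v in keep}

# Toy input: three hypothetical pathways sharing some nodes.
pathways = [
    [("R1", "S1"), ("S1", "T1"), ("T1", "C1")],
    [("R1", "S2"), ("S2", "T1")],
    [("S3", "T2")],
]
absent = {"S2"}                      # mRNA not detected in the experiment
master = build_master(pathways)
active = active_network(master, absent)
rstc = seed_subnetwork(active, sources={"R1"}, targets={"T1", "C1"})
```

Here the receptor seed `R1` plays the role of the source, while the differentially expressed gene `T1` and the cell state `C1` act as targets; the edge `("S3", "T2")` is discarded because it connects to no seed, and the route through `S2` disappears in step (b).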
For reasoning over networks, we use discrete logical models, which have been shown to provide interesting insights into the study of signalling pathways. Specifically, we deployed bioquali, a computational framework that implements a logical constraint-based modelling approach. This framework is able to explore the combinatorics of a large-scale system to determine whether the up- or down-regulation of specific network nodes can be explained by at least one possible signalling pathway.
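The kind of question asked here can be illustrated with a brute-force toy check. The sketch below is not bioquali's interface or implementation; it only assumes a simple sign rule in which every node with at least one incoming edge must have its observed variation (+1 for up, -1 for down) explained by at least one predecessor, taking the edge sign (+1 activation, -1 inhibition) into account. Unobserved nodes are tried with both signs, and "no change" values are ignored for brevity. The function name `is_consistent` and the node names are hypothetical.

```python
from itertools import product

def is_consistent(edges, observed):
    """Check whether observed up/down variations fit a signed network.

    edges: list of (source, target, sign), sign +1 (activation) or -1 (inhibition).
    observed: dict mapping node -> +1 (up-regulated) or -1 (down-regulated).
    Returns True if some sign assignment for the unobserved nodes lets every
    node with incoming edges be explained by at least one predecessor.
    """
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    has_predecessor = {v for _, v, _ in edges}
    free = [n for n in nodes if n not in observed]
    for signs in product((+1, -1), repeat=len(free)):
        state = dict(observed)
        state.update(zip(free, signs))
        explained = all(
            any(state[u] * s == state[n] for u, v, s in edges if v == n)
            for n in nodes
            if n in has_predecessor
        )
        if explained:
            return True
    return False

# Toy network: receptor R activates S, and S inhibits T.
edges = [("R", "S", +1), ("S", "T", -1)]

ok = is_consistent(edges, {"R": +1, "T": -1})    # S up explains T down
bad = is_consistent(edges, {"R": +1, "T": +1})   # no sign for S explains T up
```

Enumerating all assignments is exponential in the number of unobserved nodes, so this form is only practical for very small examples; it is meant to show the rule being checked, not how a large-scale system would be explored.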
To illustrate our approach, we studied hepatocyte growth factor (HGF)-stimulated cell migration and proliferation in a keratinocyte-fibroblast co-culture. HGF exerts its function by binding to and activating MET [10–12], a receptor tyrosine kinase commonly mutated in metastasizing epithelial cells. HGF, a positive regulator of re-epithelialization, is thus a ligand of a receptor tyrosine kinase, and the activation of such receptors often stimulates keratinocytes to migrate, proliferate and survive. The mitogen-activated protein kinases also play a crucial role in mediating cell proliferation and apoptosis, and they are involved in many growth factor-related pathways. Although the role of isolated proteins or protein families has been well documented, the mechanisms by which these proteins and their corresponding pathways interact with each other (affecting processes such as cell migration and proliferation) are not yet well understood.
Supporting Information
Fig. S1. Network mapping RNA levels at all time points.
Fig. S2. Comparison of different methods that import PID.xml or BioPAX files into cytoscape.
Table S1. Networks generated with different subgraph algorithms.