Keywords:

  • Arabidopsis;
  • bioinformatics;
  • CORNET;
  • data integration;
  • maize;
  • networks

Summary

  • To enable easy access to and interpretation of heterogeneous and scattered data, we have developed a user-friendly tool for data mining and integration in Arabidopsis, named CORNET. This tool allows the browsing of microarray data, the construction of coexpression and protein–protein interaction (PPI) networks and the exploration of diverse functional annotations. Here, we present the new functionalities of CORNET 2.0 for data integration in plants.
  • First of all, CORNET allows the integration of regulatory interaction datasets accessible through the new transcription factor (TF) tool that can be used in combination with the coexpression tool or the PPI tool. In addition, we have extended the PPI tool to enable the analysis of gene–gene associations from AraNet as well as newly identified PPIs. Different search options are implemented to enable the construction of networks centered around multiple input genes or proteins. New functional annotation resources are included to retrieve relevant literature, phenotypes, plant ontology and biological pathways.
  • We have also extended CORNET to enable the construction of coexpression and PPI networks for the crop species maize.
  • Networks and associated evidence of the majority of currently available data types are visualized in Cytoscape. CORNET is available at https://bioinformatics.psb.ugent.be/cornet.

Introduction

In recent years, plant biology has witnessed a true data explosion. Not only have numerous small- and moderate-scale experiments been performed to assay the functions of and relations between plant genes and proteins, but a number of large-scale resources have also been built. In particular for the model plant Arabidopsis thaliana (Arabidopsis), but also for crops such as Zea mays (maize), data have been generated through transcriptome profiling, protein–protein interactome mapping and regulatory interaction experiments, as well as through computational approaches. These datasets can only be fully exploited through data integration, which can lead, for instance, to the identification of the temporal and spatial activity of protein complexes and to the prediction of putative functions for unknown genes (Brown et al., 2005; Gachon et al., 2005; Lisso et al., 2005; Rautengarten et al., 2005; Usadel et al., 2009).

To overcome problems in data formatting, data quality and data integration, webtools and databases have been developed (Brady & Provart, 2009; Usadel et al., 2009). Tools such as CORNET (De Bodt et al., 2010), ATTED-II (Obayashi et al., 2007, 2009), Bio-Array Resource (BAR) (Toufighi et al., 2005) and Virtual Plant (Katari et al., 2010) enable the mining and integration of different data types. Most other tools either calculate coexpression networks (CoP (Ogata et al., 2010), CressExpress (Srinivasasainagendra et al., 2008), CSB.DB (Steinhauser et al., 2004), GeneCAT (Mutwil et al., 2008), Planet (Mutwil et al., 2011), PlantArrayNet (Lee et al., 2009), Plant Gene Expression Database (PED) (Horan et al., 2008), PRIMe (Akiyama et al., 2008) and SeedNet (Bassel et al., 2011)) or search protein–protein interaction (PPI) data (Arabidopsis Interactome Mapping Database (Arabidopsis Interactome Mapping Consortium, 2011), AtPID (Cui et al., 2008), BIND (Bader et al., 2003), BioGRID (Stark et al., 2006), DIP (Salwinski et al., 2004), IntAct (Hermjakob et al., 2004), Membrane Interactome Database (MIND) (Lalonde et al., 2010) and MINT (Chatr-aryamontri et al., 2007)). The main resource for regulatory interactions in plants is the Arabidopsis Gene Regulatory Information Server (AGRIS) (Yilmaz et al., 2011). The AraNet probabilistic functional gene network of A. thaliana provides numerous gene–gene associations that were identified using a large number of data types and subjected to thorough computational and experimental validation (Lee et al., 2010).

We have developed CORNET (an acronym for CORrelation NETworks), a user-friendly tool for data mining and integration that is accessible through https://bioinformatics.psb.ugent.be/cornet. We collected microarray expression data, corresponding metadata describing sampled tissues, treatments and sampling time points, PPI data, localization data and functional information (Gene Ontology (GO), InterPro protein domains) in a central database. A user-friendly interface allows the database to be queried, enabling network construction through a multitude of search options that address different biological questions. Coexpression networks could be calculated using user-defined and multiple predefined expression datasets. Webtools such as ATTED-II and BAR similarly allow the use of different expression datasets for coexpression analysis; however, the search options in CORNET are more extensive and flexible. PPI networks could be constructed with both experimentally identified and computationally predicted data. Moreover, CORNET allows the integration of coexpression and PPI networks, and a comprehensive visualization of the networks is generated in Cytoscape (Shannon et al., 2003). This visualization provides a bird’s-eye view of the results and of the different degrees of reliability of the extracted information, an important feature that is not available with most other tools.

In this manuscript, we describe the development of new functionalities and the integration of new data types in CORNET 2.0 for Arabidopsis, as well as the development of CORNET for maize. Existing PPI data have been updated, and new data from the Arabidopsis Interactome Mapping Consortium and from the MIND database have been incorporated into the CORNET database (Lalonde et al., 2010; Arabidopsis Interactome Mapping Consortium, 2011). Available functional annotation data have been updated, and a number of new functional annotation resources, such as Plant Ontology (PO), PubMed IDs, phenotypes and MapMan pathways and processes, have been included (Rhee et al., 2003; Usadel et al., 2005; Avraham et al., 2008). Furthermore, computationally inferred gene–gene associations (AraNet) have been added as a new data type in the PPI tool (Lee et al., 2010). A new, so-called transcription factor (TF) tool has been developed to extract regulatory interactions identified through experimental methods such as chromatin immunoprecipitation (ChIP)-chip and ChIP-seq, as well as regulatory interactions computationally predicted from microarray data (Yilmaz et al., 2011). Finally, CORNET for maize has been set up, allowing coexpression analysis using different expression compendia and platforms and the construction of PPI networks identified through orthology with interacting proteins from Arabidopsis (interologs) (Yu et al., 2004; Sekhon et al., 2011). In summary, the extended CORNET 2.0 allows the user to construct integrated networks compiling the majority of the currently available data types for plants.

Materials and Methods

CORNET troubleshooting

CORNET 2.0 can be accessed through the following URL: https://bioinformatics.psb.ugent.be/cornet. The tool is fully functional in the Firefox and Safari browsers. First-time users might need to accept a certificate before accessing the website. The site is best viewed at a resolution of 1280 × 1024. Pop-ups must be allowed in your browser before you click the ‘GO’ button. After calculations and database queries, Cytoscape starts automatically from the web; in other words, Cytoscape does not have to be installed on your computer. However, an up-to-date version of Java is required to enable Cytoscape WebStart. Please refer to our Frequently Asked Questions (FAQ) page for further details. CORNET data are available upon request.

P-value calculation for expression correlation

To test the significance of a calculated correlation coefficient, a t-test statistic was used. In the first step, the t-value for Pearson’s correlation coefficient is calculated from the correlation coefficient and the number of conditions in the expression dataset. To obtain the P-value, this t-value is then used as the upper limit when integrating the t-distribution, that is, when evaluating its cumulative distribution function. This integral is solved using a regularized lower incomplete beta function with the t-value and the degrees of freedom as parameters (Heiman, 1996).
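
The P-value computation described above can be sketched in Python as follows. This is a minimal, dependency-free sketch, not CORNET's own code. It assumes the standard form t = r·√(df/(1 − r²)) with df = n − 2, where n is the number of conditions, and it assumes a two-sided test. Under those assumptions the P-value equals the regularized incomplete beta function I_x(df/2, 1/2) evaluated at x = df/(df + t²). The incomplete beta function is evaluated with the continued-fraction method popularized by Numerical Recipes. All function names are illustrative.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the recurrence
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the recurrence
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), the regularized lower incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # use I_x(a, b) = 1 - I_{1-x}(b, a) on the side where the fraction converges faster
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def correlation_p_value(r, n_conditions):
    """Two-sided P-value of a Pearson correlation r computed over n_conditions."""
    df = n_conditions - 2
    if df < 1:
        raise ValueError("at least three conditions are needed")
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt(df / (1.0 - r * r))
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))
```

For example, r = 0.5 over four conditions gives a two-sided P-value of 0.5, whereas r = 0.9 over 20 conditions gives a P-value far below 0.05. This is why the same correlation coefficient can be significant in one dataset and not in another.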

Preprocessing of microarray data

Arabidopsis microarray data were retrieved from the Gene Expression Omnibus (GEO) database for CORNET 1.0. For the compilation of the Affymetrix Maize Genome Array expression dataset (GPL4032), we downloaded 24 experiment series (GSE21070, GSE8188, GSE15048, GSE8278, GSE8275, GSE10023, GSE8194, GSE7030, GSE16567, GSE19501, GSE8308, GSE10237, GSE10236, GSE8320, GSE8179, GSE11531, GSE8174, GSE22479, GSE8176, GSE18491, GSE15371, GSE12892, GSE12770 and GSE10243) from the GEO database. These series contain 128 unique experiments, including studies of cis-transcriptional variation in different inbred lines, expression profiling of mutants, nonadaptive and imprinted gene expression, and expression profiling of different tissues and treatments. The Affymetrix Maize Genome Array (GPL4032) contains 17 555 probe sets for six different maize strains (including B73, Ohio43, W22, W23 and Black Mexican Sweet). Approximately 13 000 probe sets for maize B73 were considered for the construction of a custom Chip Description File (CDF). In this way, we found that 9846 genes can be measured reliably by the microarray. For the NimbleGen Maize Whole-Genome Microarray 385K expression dataset (GPL12620), data for 60 different experimental conditions were retrieved from the GEO database (Sekhon et al., 2011). This array (GPL12620) contains 80 301 probe sets representing 22 600 genes. This comprehensive atlas contains global transcription profiles across developmental stages and plant organs of maize: 60 distinct tissues representing 11 major organ systems of inbred line B73 (germinating seed, root, whole seedling, shoot apical meristem (SAM) and young stem, internodes, cob, tassel and anthers, silk, leaves, husk, and seed).

The microarray data are processed with the Robust Multi-array Average (RMA) procedure implemented in Bioconductor (Irizarry et al., 2003a,b; Gautier et al., 2004; Gentleman et al., 2004). In the previous version of CORNET, the CDF tinesath1cdf, based on the TAIR7 genome annotation, was used (Casneuf et al., 2007). In the current version, an up-to-date CDF based on TAIR10 and provided by Brainarray (TAIR10 genes – v14; brainarray.mbni.med.umich.edu) is used to define probe set–gene relations. For the maize Affymetrix microarray dataset, we used a preprocessing strategy similar to that for Arabidopsis, but constructed a custom CDF in three steps. First, all probes were mapped against the reference genome (B73, version 4.53) using the software Exonerate, allowing 0 or 1 mismatch and no gaps (Slater & Birney, 2005). Second, all probes that mapped to more than one gene were removed. Finally, only probe sets with more than nine probes were retained.
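
The probe-filtering steps used to build the maize custom CDF can be sketched in Python as follows. This is an illustrative sketch, not the script used for CORNET. It assumes that the Exonerate alignments (0 or 1 mismatch, no gaps) have already been parsed into a hypothetical dictionary `probe_hits` that maps each probe ID to the set of gene IDs it hits. The retained probes are grouped into gene-level probe sets. Writing the actual CDF file is outside the scope of this sketch.

```python
from collections import defaultdict


def build_custom_probe_sets(probe_hits, min_probes=10):
    """Group uniquely mapping probes into gene-level probe sets.

    probe_hits: dict probe ID -> set of gene IDs the probe aligns to
                (assumed to be parsed from Exonerate hits, <= 1 mismatch, no gaps)
    min_probes: smallest probe-set size kept; 10 encodes "more than nine probes"
    """
    gene_to_probes = defaultdict(list)
    for probe, genes in probe_hits.items():
        # drop probes that map to no gene or to more than one gene
        if len(genes) != 1:
            continue
        (gene,) = genes
        gene_to_probes[gene].append(probe)
    # retain only probe sets that are large enough
    return {
        gene: sorted(probes)
        for gene, probes in gene_to_probes.items()
        if len(probes) >= min_probes
    }


# hypothetical toy input: GeneA has 10 unique probes, GeneB only 9
hits = {f"pA{i}": {"GeneA"} for i in range(10)}
hits.update({f"pB{i}": {"GeneB"} for i in range(9)})
hits["shared"] = {"GeneA", "GeneB"}  # ambiguous probe, removed in the second step
print(sorted(build_custom_probe_sets(hits)))  # ['GeneA']
```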

Microarray data for gene–target relations

To identify possible regulatory interactions, microarray experiments annotated with ‘genetic_modification_design’, that is, experiments in which transgenic plants were profiled, are extracted from the CORNET database. Using the metadata stored in the CORNET database, we compared transgenic with wild-type expression profiles and identified differentially expressed genes with the Bioconductor package Limma (Smyth, 2004). Genes were considered differentially expressed when they showed a fold change > 2 and a false discovery rate (FDR) < 0.05.
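
The selection criteria above (fold change > 2 and FDR < 0.05) can be illustrated with a short Python sketch. The sketch does not reimplement Limma's moderated t-test. Instead, it assumes that per-gene log2 fold changes and raw P-values are already available, for example from Limma. It then applies the Benjamini–Hochberg FDR adjustment, which is the default adjustment method of Limma's `topTable`. The sketch also assumes that a fold change > 2 means |log2 fold change| > 1, so that both up- and down-regulated genes are kept. Function names and gene IDs are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values (FDR), returned in input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # walk from the largest P-value down so that adjusted values stay monotone
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(genes, log2_fold_changes, p_values, fold_change=2.0, fdr=0.05):
    """Genes with |log2 FC| > log2(fold_change) and BH-adjusted P-value < fdr."""
    q_values = benjamini_hochberg(p_values)
    cutoff = math.log2(fold_change)
    return [
        gene
        for gene, lfc, q in zip(genes, log2_fold_changes, q_values)
        if abs(lfc) > cutoff and q < fdr
    ]


# hypothetical transgenic-versus-wild-type comparison for four genes
print(differentially_expressed(
    ["g1", "g2", "g3", "g4"],
    [1.5, -2.0, 0.5, 3.0],
    [0.001, 0.002, 0.001, 0.9],
))  # ['g1', 'g2']
```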

Functional annotation data

The phenotype descriptions, as well as the links to the relevant literature (PubMed IDs), were kindly provided by TAIR (version 24 May 2011) (Rhee et al., 2003). These data can also be viewed on the TAIR gene pages but cannot be downloaded in bulk. MapMan pathway and process annotations were retrieved from the MapMan store (http://mapman.gabipd.org/web/guest/mapmanstore) for both Arabidopsis and maize. The maize GO annotation and InterPro protein domain information were taken from MaizeSequence.org (v4.53, July 2010).

Ortholog detection using OrthoMCL

Orthologs between Arabidopsis, rice and maize were retrieved from PLAZA v2.0, in which they were identified using OrthoMCL (Li et al., 2003). PLAZA v2.0 uses Arabidopsis protein sequences from TAIR9, rice protein sequences from TIGR6.1 and maize protein sequences from maizesequence.org (version 4.53) (Proost et al., 2009; Van Bel et al., 2012). An all-against-all protein similarity search was performed using BLAST (version 2.17) (Altschul et al., 1990). The results were fed into the OrthoMCL v1.4 program, which was run in mode 4 using default settings and an MCL inflation parameter of 2.0.

Predicted PPIs in maize

Protein–protein interactions in maize are inferred through interolog detection (Yu et al., 2004). First, orthologs between Arabidopsis and maize are identified using OrthoMCL (Li et al., 2003). Then, for each pair of interacting Arabidopsis proteins, the maize (co-)ortholog(s) of both partners are retrieved, and all pairwise combinations between the respective orthologs are designated as putative PPIs in maize.
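
The interolog transfer described above amounts to taking the Cartesian product of the two partners' maize ortholog sets for every Arabidopsis interaction. The following minimal Python sketch assumes a hypothetical dictionary `ath_to_maize` that maps each Arabidopsis protein to its maize (co-)orthologs, for example parsed from the PLAZA OrthoMCL families. The protein IDs in the example are placeholders.

```python
from itertools import product


def infer_interologs(ath_ppis, ath_to_maize):
    """Transfer Arabidopsis PPIs to maize through (co-)orthology.

    ath_ppis: iterable of (protein_a, protein_b) Arabidopsis interactions
    ath_to_maize: dict Arabidopsis protein -> set of maize (co-)orthologs
    Returns a set of undirected maize protein pairs.
    """
    maize_ppis = set()
    for a, b in ath_ppis:
        orthologs_a = ath_to_maize.get(a, set())
        orthologs_b = ath_to_maize.get(b, set())
        # every combination of the two partners' orthologs is a putative maize PPI
        for x, y in product(orthologs_a, orthologs_b):
            maize_ppis.add(tuple(sorted((x, y))))
    return maize_ppis


# placeholder identifiers; AT3 has no maize ortholog, so its interaction is not transferred
ppis = [("AT1", "AT2"), ("AT2", "AT3")]
orthologs = {"AT1": {"ZmA", "ZmB"}, "AT2": {"ZmC"}}
print(sorted(infer_interologs(ppis, orthologs)))  # [('ZmA', 'ZmC'), ('ZmB', 'ZmC')]
```

Storing each pair in sorted order treats maize PPIs as undirected, so the same interolog inferred from A–B and from B–A is counted once.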

Results and Discussion

As shown in Fig. 1, CORNET 2.0 is composed of three tools, namely the coexpression tool, the PPI tool and the TF tool, which construct coexpression networks, PPI and gene–gene association networks, and regulatory networks, respectively. These three tools can be used independently, but they can also be used consecutively to build integrated networks (see the middle panel of Fig. 1). Additionally, localization and functional information can be added to the constructed networks. The networks are automatically generated and displayed in Cytoscape for network exploration and analysis.

Figure 1. Functionalities of CORNET. Networks are constructed using the different data types (see middle panel). Coexpression links are represented by colored edges, protein–protein interactions (PPIs) are shown by black edges and regulatory interactions retrieved by the TF tool are displayed by arrows. Localization information is displayed in the pie charts. COE, coexpression; TF, transcription factor.

Coexpression tool

In the previous version of CORNET, we developed the coexpression tool to study the coexpression of genes using user-defined expression datasets or multiple predefined expression datasets, individually or simultaneously. Several predefined expression datasets are provided, such as global compendia representing diverse experimental conditions as well as tissue- or treatment-specific expression datasets. Depending on the nature of the studied genes and the interest of the user, different input expression datasets are appropriate. Global expression compendia are used when a general view on the coexpression of, for instance, unknown genes is required. By contrast, when looking for genes that are similar to a drought stress-responsive gene, an expression dataset representing abiotic stress conditions can be used to identify specific and relevant relations. Moreover, coexpression can be calculated using multiple expression datasets representing diverse conditions, which leads to the identification of the conditions in which the genes of interest show similar expression patterns.

These predefined expression datasets have now been updated according to the latest Arabidopsis genome annotation. An up-to-date CDF, retrieved from Brainarray (TAIR10 genes – v14; brainarray.mbni.med.umich.edu), is used to delineate the probe sets corresponding to specific genes. All expression datasets were preprocessed using this new CDF, and the scripts that preprocess user-defined datasets through CORNET have been adapted accordingly. Owing to the improved genome annotation and slightly different parameter settings, expression can now be measured for 21 428 genes, compared with 20 777 genes using the Casneuf CDF based on TAIR7 (brainarray.mbni.med.umich.edu) (Casneuf et al., 2007).

Coexpression is determined by calculating Pearson’s or Spearman’s correlation coefficient. The user can apply a threshold on the correlation coefficient and/or on the number of coexpressed genes. This functionality has been extended to take into account the significance of the correlation through a P-value threshold. The P-value calculation considers the correlation coefficient and the number of conditions in the expression dataset (see the Materials and Methods section). Because the number of conditions differs considerably between predefined expression datasets, the P-value is a better coexpression measure than the correlation coefficient when comparing results from different expression datasets. As in the previous version, when more than one expression dataset is selected, correlation coefficients and P-values are retrieved from the database (Pearson correlation coefficient (PCC) > 0.4 or PCC < −0.4, P-value < 0.05) and displayed as Cytoscape attributes. When only one predefined expression dataset is used, either Pearson’s or Spearman’s correlation coefficients are calculated on the fly, so that correlations between PCC = −0.4 and PCC = 0.4 can be retrieved as well.
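
The threshold options can be illustrated with a small Python sketch that ranks genes by their correlation with a query gene. It is not CORNET's implementation. The dictionary-of-profiles input, the function names and the default cut-off |r| > 0.4 (borrowed from the database retrieval threshold quoted above) are assumptions for illustration. Spearman's coefficient is computed as Pearson's coefficient on average ranks. Genes with constant expression (zero variance) are not handled and would raise a division error.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient as Pearson's coefficient on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def coexpressed_genes(query, profiles, method="pearson", min_abs_r=0.4, top_n=None):
    """Genes whose |r| with the query exceeds min_abs_r, strongest first."""
    corr = pearson if method == "pearson" else spearman
    hits = [
        (gene, corr(profiles[query], profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    hits = [(gene, r) for gene, r in hits if abs(r) > min_abs_r]
    hits.sort(key=lambda hit: abs(hit[1]), reverse=True)
    return hits[:top_n] if top_n is not None else hits


# hypothetical expression profiles over five conditions
profiles = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [5, 4, 3, 2, 1],
    "D": [1, 3, 2, 5, 4],
    "E": [2, 1, 2, 1, 2],
}
print([gene for gene, _ in coexpressed_genes("A", profiles)])  # ['B', 'C', 'D']
```

Ranking by absolute value keeps strongly negative correlations (gene C) alongside positive ones. This mirrors the two-sided PCC window quoted above.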

PPI tool

PPIs from existing and new PPI databases  In CORNET 1.0, we had assembled the available experimentally identified PPIs for Arabidopsis from BIND (Bader et al., 2003), IntAct (Hermjakob et al., 2004), BioGRID (Stark et al., 2006), DIP (Salwinski et al., 2004), MINT (Chatr-aryamontri et al., 2007) and TAIR (Rhee et al., 2003), as well as the predicted PPIs from BAR (Geisler-Lee et al., 2007), AtPID (Cui et al., 2008) and the filtered (high-stringency) and predicted (low-stringency) interactions identified in our own study (De Bodt et al., 2009). In CORNET 2.0, we have updated the data from each of these PPI databases to the latest versions of these resources. Information on the PPI databases and the timing of updates can be consulted on the FAQ page. As in the previous version, to help users assess the reliability of the PPIs, we distinguish between experimental and predicted PPIs and indicate the different data sources (database, experiment type, evidence code and PubMed ID) as edge attributes in Cytoscape (Shannon et al., 2003; De Bodt et al., 2010).

In addition to updating existing databases, we have added new PPI databases. The recently published Arabidopsis interactome mapping data have been integrated (Arabidopsis Interactome Mapping Consortium, 2011). These data consist of c. 6200 PPIs between 2700 proteins identified through yeast two-hybrid experiments using a collection of c. 8000 open reading frames (see Tables 1, 2). In addition, the MIND consortium kindly provided us with the preliminary MIND 0.5 PPI data, an extension of the published MIND 0.2 dataset (Lalonde et al., 2010). This dataset consists of c. 17 000 potential interaction pairs resulting from a large-scale interaction screen covering c. 6.4 million pairs (among 3700 proteins) (see Tables 1, 2). The independent verification of these interactions by the consortium is currently ongoing (http://www.associomics.org). Users should therefore be aware that some of these interactions may not hold in the final version of the MIND dataset. We will gradually update the database as new data are provided by the consortium. Lastly, we have included the G-protein interactome (Klopffleisch et al., 2011). Below, some case studies demonstrate the added value of the newly added and updated databases.

Table 1. Statistics on the number of protein–protein interactions (PPIs) in CORNET
Data source | Arabidopsis: number of PPIs | Arabidopsis: number of proteins | Maize: number of PPIs | Maize: number of proteins
Arabidopsis Interactome Mapping Consortium | 6205 | 2774 | 8443 | 2326
ArathReactome1111107
BIND907462
BioGRID | 5558 | 2667 | 7329 | 1431
DIP2512401044327
IntAct | 5592 | 2521 | 6385 | 1223
MINT16017259296
TAIR | 5415 | 2456 | 7641 | 1268
MIND0.5 | 17 833 | 2248 | |
AtPID | 24 418 | 11 706 | 16 599 | 4711
De Bodt et al. (2009) (filtered) | 18 674 | 1836 | 15 448 | 1744
De Bodt et al. (2009) (predicted) | 51 885 | 3014 | 16 664 | 2083
Geisler-Lee et al. (2007) | 19 979 | 3617 | 2555 | 1092
G-protein interactome | 543 | 433 | |
AraNet gene–gene associations | 1 062 222 | 19 647 | |
Table 2. Statistics on the CORNET database
Data type | Arabidopsis (CORNET 1.0) | Arabidopsis (CORNET 2.0) | Maize
Microarray
 Number of arrays | 3055 | 3055 | 340 + 180¹
 Number of experiments | 1209 | 1209 | 128 + 60¹
 Number of series | 200 | 200 | 24 + 1¹
 Number of experiments with two replicates | 634 | 634 | 42 + 0¹
 Number of experiments with > two replicates | 575 | 575 | 85 + 60¹
Protein–protein interactions (PPIs)
 Arabidopsis Interactome Mapping Consortium – number of experimentally identified PPIs | | 6205 |
 MIND0.5 – number of experimentally identified PPIs | | 17 856 |
 G-protein interactome – number of experimentally identified PPIs | | 543 |
 Other – number of experimentally identified PPIs | | 10 353 |
 Number of experimentally identified PPIs | 4302 | 34 519 | 10 624
 Number of computationally predicted PPIs | 89 181 | 88 642 | 35 156
 Total number of PPIs | 93 109 | 122 246 | 43 437
Regulatory interactions
 Number of confirmed AGRIS interactions | | 574 |
 Number of unconfirmed AGRIS interactions | | 12 464 |
 Number of gene–target relations (genetic_modification) | | 156 563 |
 Total number of regulatory interactions | | 168 651 |
Gene–gene associations
 AraNet – total number of associations | | 1 062 222 |
¹Affymetrix + Nimblegen statistics.
AGRIS, Arabidopsis Gene Regulatory Information Server.

An overview of the number of PPIs in each database is given in Table 1. When inspecting the PPI network and comparing the different data sources, we observe a relatively small overlap between the databases containing experimentally identified PPIs (Supporting Information, Fig. S1). The complete network of experimentally identified and computationally predicted PPIs shows scale-free behavior, with few proteins taking part in many interactions and many proteins taking part in few interactions (Fig. S2). The average degree (i.e. number of neighbors) of the proteins in the interaction network is 14.533. When experimentally identified and computationally predicted PPIs are considered separately, the average degree is 8.397 and 13.307, respectively, reflecting the difference in network size. The average shortest path length from a protein to all other proteins in the combined PPI network is 1.815. The coexpression network of the same genes has a lower degree (5.852) than the PPI network. This observation points to the relatively low coexpression of genes encoding interacting proteins.
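
The degree and path-length statistics quoted above can be computed with a breadth-first search. The following Python sketch is illustrative only and makes three assumptions: the network is undirected and unweighted and is given as an edge list; self-interactions are ignored when counting neighbors; and shortest path lengths are averaged over reachable protein pairs only. The node names in the example are placeholders.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets; self-interactions add the node but no neighbor."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def average_degree(adjacency):
    """Mean number of neighbors per protein."""
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)


def average_shortest_path_length(adjacency):
    """Mean BFS distance over all ordered pairs of mutually reachable proteins."""
    total = pairs = 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs if pairs else 0.0


star = build_adjacency([("hub", "a"), ("hub", "b"), ("hub", "c")])
print(average_degree(star), average_shortest_path_length(star))  # 1.5 1.5
```

The star example shows a hub-dominated topology, typical of scale-free networks: the hub shortens most paths, so the average path length stays small even though most nodes have a single neighbor.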

Integration of PPI and coexpression networks

We further investigated the extent of coexpression between genes encoding interacting proteins using the different predefined expression datasets available in CORNET (Table S1). Of the 89 540 PPIs for which coexpression can be assessed (predicted, 65 828; experimentally identified, 24 365), 20 676 (23%) show coexpression in at least one expression dataset (using a PCC threshold based on the 99% quantile of the PCC of random pairs; see the Materials and Methods section, Table S2). There can be multiple reasons for the lack of coexpression in this PPI dataset: the proteins may interact only under very specific conditions, a number of false-positive interactions may be included in the network, and/or post-transcriptional mechanisms may regulate the activity of the interacting proteins.
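
The analysis above can be sketched in Python. The exact procedure is not detailed in this section, so the following details are assumptions. The threshold for each expression dataset is taken as the 99% quantile (nearest-rank) of the PCC between randomly sampled gene pairs. Only positive correlations at or above that threshold count as coexpression. A PPI is counted as assessable when both genes are measured in at least one dataset. Function names and the toy data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def random_pair_threshold(profiles, n_pairs=10000, quantile=0.99, seed=0):
    """PCC at the given quantile of randomly sampled gene pairs (needs >= 2 genes)."""
    rng = random.Random(seed)
    genes = list(profiles)
    values = sorted(
        pearson(profiles[a], profiles[b])
        for a, b in (rng.sample(genes, 2) for _ in range(n_pairs))
    )
    # nearest-rank quantile
    k = min(len(values) - 1, max(0, math.ceil(quantile * len(values)) - 1))
    return values[k]


def count_coexpressed_ppis(ppis, datasets, thresholds):
    """Return (coexpressed, assessable) PPI counts over several expression datasets.

    datasets: dict dataset name -> dict gene -> expression profile
    thresholds: dict dataset name -> PCC threshold for that dataset
    """
    coexpressed = assessable = 0
    for a, b in ppis:
        usable = [name for name, prof in datasets.items() if a in prof and b in prof]
        if not usable:
            continue  # coexpression cannot be assessed for this PPI
        assessable += 1
        if any(pearson(datasets[n][a], datasets[n][b]) >= thresholds[n] for n in usable):
            coexpressed += 1
    return coexpressed, assessable
```

Calibrating a separate threshold per dataset accounts for datasets with few conditions, where high correlations arise more easily by chance.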

A total of 1518 (of 20 676 coexpressed) PPIs among 350 proteins show coexpression in all studied conditions. These PPIs are involved in housekeeping functions (e.g. ribosome biogenesis, translation, protein metabolic process – GO enrichment tested using hypergeometric distribution, Bonferroni-corrected P < 0.05). Contrary to these PPIs, specific coexpression in particular conditions can be observed in many cases. A total of 7246 PPIs among 4969 proteins, which show coexpression in one out of 14 expression datasets, probably have a function in specific conditions rather than a global function. The data integration functionalities of CORNET 2.0, namely the combined use of the PPI tool and the coexpression tool, allow nonexpert users to study the coexpression of genes encoding interacting proteins on a small scale. In this way, the user can investigate the spatial and temporal activity of protein complexes and discern stable and transient interactions.
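
The GO enrichment test mentioned above can be sketched with the upper tail of the hypergeometric distribution. This Python sketch is illustrative and rests on several assumptions: the annotation input format, the use of all annotated genes as the background, the use of the number of GO terms observed in the study set as the Bonferroni multiplier, and the placeholder term names in the example.

```python
from math import comb


def hypergeom_upper_tail(k, population, annotated, drawn):
    """P(X >= k) for X ~ Hypergeometric(population size, annotated genes, genes drawn)."""
    upper = min(annotated, drawn)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(population, drawn)


def enriched_terms(study_genes, annotation, alpha=0.05):
    """GO terms over-represented in study_genes, Bonferroni-corrected.

    annotation: dict gene -> set of GO terms for all annotated genes (the background)
    """
    study = [g for g in study_genes if g in annotation]
    background_counts, study_counts = {}, {}
    for terms in annotation.values():
        for term in terms:
            background_counts[term] = background_counts.get(term, 0) + 1
    for gene in study:
        for term in annotation[gene]:
            study_counts[term] = study_counts.get(term, 0) + 1
    n_tests = len(study_counts)  # assumed Bonferroni multiplier
    results = {}
    for term, k in study_counts.items():
        p = hypergeom_upper_tail(k, len(annotation), background_counts[term], len(study))
        p_corrected = min(1.0, p * n_tests)
        if p_corrected < alpha:
            results[term] = p_corrected
    return results


# hypothetical background: 5 genes annotated with term "T1", 15 with "T2"
annotation = {f"r{i}": {"T1"} for i in range(5)}
annotation.update({f"o{i}": {"T2"} for i in range(15)})
print(enriched_terms({f"r{i}" for i in range(5)}, annotation))
```

Python's arbitrary-precision integers keep `comb` exact even for genome-sized backgrounds; only the final division is done in floating point.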

Gene–gene association data from AraNet

Besides databases that contain physical interactions between two or more proteins, a number of resources currently provide gene–gene associations with high coverage. These associations, most often identified through computational approaches, represent possible functional links between genes or proteins that do not necessarily interact physically. One of these resources is the AraNet probabilistic functional gene network of Arabidopsis, which represents gene–gene associations inferred by integrating diverse functional genomics, proteomics and comparative genomics datasets (Lee et al., 2010). These datasets include mRNA coexpression patterns measured from DNA microarray datasets, known Arabidopsis PPIs, protein sequence features such as shared protein domains, similarity of phylogenetic profiles or genomic context of bacterial or archaebacterial homologs, and diverse gene–gene associations transferred from yeast, fly, worm and human genes based on orthology. Essential to this effort is that the benchmarking methods were customized for Arabidopsis genes. Each gene–gene association is weighted by the likelihood that the linked genes participate in the same biological process (Lee et al., 2010). AraNet contains 1 062 222 functional linkages among 19 647 genes (c. 73% of all Arabidopsis genes) (Tables 1, 2), comprising 24 distinct types of gene–gene association. The gene–gene associations can be retrieved with the PPI tool, which displays them as undirected black edges in Cytoscape, as for PPIs. The association or evidence type and the corresponding scores, as determined by AraNet, are stored in the CORNET database and can be explored in Cytoscape through the attribute browser.

New TF tool

As an increasing number of studies that assay regulatory interactions on a large scale (such as ChIP-chip and ChIP-seq experiments) are performed, we have extended CORNET to enable the retrieval and visualization of these data. The new TF tool not only covers experimentally identified regulatory interactions, but also integrates interactions that are computationally inferred from microarray data on transgenic plants (see the Materials and Methods section).

AGRIS

The AtRegNet data from AGRIS contain regulatory interactions between TFs and their target genes (Yilmaz et al., 2011). This dataset comprises direct as well as indirect interactions, and confirmed as well as unconfirmed interactions (Table 2). A confirmed direct target is defined as a gene that responds to a given TF according to the following criteria. First, either the TF binds directly to the regulatory region of the target gene, as shown by electrophoretic mobility shift assay, yeast one-hybrid analysis and/or ChIP, or the TF directly regulates the target gene, based on the use of transgenic plants expressing an inducible TF–GR (transcription factor–glucocorticoid receptor) fusion protein and the effect of CHX (cycloheximide) on the DEX (dexamethasone)-activated/repressed genes. Second, in vivo evidence shows that expression of the target gene is affected by either loss-of-function mutations in the TF or ectopic expression of that TF in the plant (Yilmaz et al., 2011). In some cases, information on the regulatory activity of the TF (repression or activation) is included. All these metadata can be inspected through the Cytoscape attribute browser (see FAQ).

Microarray-based gene–target relations

As the number of experimentally identified regulatory interactions in Arabidopsis is limited, computational prediction of putative interactions is valuable. Although the reliability of predicted regulatory interactions is relatively low, such interactions reveal putative associations between genes that can inform the functional interpretation of a network, and they allow us to investigate whether these associations are supported by other data types. A straightforward approach to predicting regulatory interactions is to analyze microarray data for transgenic Arabidopsis plants and link the transgene to the up- or down-regulated genes (see the Materials and Methods section). These microarray experiments were performed on plants in which one or more genes are overexpressed, inducibly overexpressed, knocked out or knocked down. The transgene often encodes a transcription factor, but other gene types are also included. It is assumed that these differentially expressed genes can include both direct and indirect targets of the transgene. A summary of the number of regulatory interactions of each type is given in Table 2. This regulatory network consists of 168 651 interactions between 224 regulators (68 from AGRIS and 169 from microarray data, with some regulators present in both) and 19 912 targets (9422 from AGRIS and 17 466 from microarray data).

As with the coexpression tool and the PPI tool, metadata describing the data source, reliability and interaction type (confirmed or unconfirmed, activation or repression, direct or indirect) of these regulatory interactions are given in the Cytoscape visualization or attributes (see FAQ). If multimers rather than single genes were assayed, the proteins taking part in the multimer are indicated in the Cytoscape edge attributes. Regulatory interactions are displayed as directed edges in the network, as opposed to PPIs, gene–gene associations and coexpression links, which are undirected.
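
Merging regulatory interactions from several sources into one directed network, while keeping the per-source metadata as edge attributes, can be sketched as follows. The record layout `(regulator, target, attributes)` and all identifiers in the example are hypothetical. Edges are keyed by the ordered pair (regulator, target), so A → B and B → A remain distinct. Regulators and targets are counted as the union over sources, which is why per-source counts such as those given above can sum to more than the network total.

```python
from collections import defaultdict


def merge_regulatory_sources(sources):
    """Merge directed regulatory interactions from several sources.

    sources: dict source name -> iterable of (regulator, target, attributes) records
    Returns dict (regulator, target) -> list of evidence dicts, one per record.
    """
    edges = defaultdict(list)
    for source, records in sources.items():
        for regulator, target, attributes in records:
            # the key is an ordered pair, so A -> B and B -> A stay separate edges
            edges[(regulator, target)].append({"source": source, **attributes})
    return dict(edges)


def network_size(edges):
    """(number of interactions, distinct regulators, distinct targets)."""
    regulators = {regulator for regulator, _ in edges}
    targets = {target for _, target in edges}
    return len(edges), len(regulators), len(targets)


# hypothetical records; TF1 -> G1 is supported by both sources
sources = {
    "AGRIS": [("TF1", "G1", {"confirmed": True}), ("TF1", "G2", {"confirmed": False})],
    "microarray": [("TF1", "G1", {"direction": "up"}), ("TF2", "G3", {"direction": "down"})],
}
edges = merge_regulatory_sources(sources)
print(network_size(edges))  # (3, 2, 3)
```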

Integration of regulatory networks and PPI or coexpression networks

The TF tool can be used as such or can be used in combination with the coexpression tool or the PPI tool to construct integrated networks of different data types. For instance, by combining the TF tool and the coexpression tool, the user can investigate if and when the targets of a particular TF are coexpressed; or by combining the TF tool and the PPI tool, the user will be able to show if the members of a protein complex are regulated by the same (combination of) TFs.

Overall, the integration of regulatory interactions and coexpression links shows that the targets of both experimentally identified and computationally predicted regulatory interactions are poorly coexpressed in each of the predefined expression datasets (Table S3). On average, 4% of the possible links between target genes show coexpression (using a PCC threshold based on the 99% quantile of the PCC of random pairs; see the Materials and Methods section). Examples of regulators with highly coexpressed targets are ARF10, AT-HSFB2A and TCP20, which show coexpression for more than 50% of the target pairs in several expression datasets. By contrast, the targets of many regulators show average or low coexpression (e.g. 4% of the possible links between ARF2 targets are coexpressed, compared with 38% for ARF10). This difference between ARF2 and ARF10 targets might result from the specific tissues used in the two microarray experiments: seedlings with and without auxin treatment were profiled in the ARF2 experiment, whereas germinating seeds were profiled in the ARF10 experiment.
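
The per-regulator statistic used above, the percentage of possible links between a regulator's targets that are coexpressed, can be sketched in Python as follows. The PCC threshold is passed in as a parameter because its derivation from random pairs is not reproduced here. The sketch assumes that only positive correlations at or above the threshold count as coexpression and that targets without an expression profile are ignored. Function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpressed_target_fraction(targets, profiles, threshold):
    """Share of target pairs with PCC >= threshold; unmeasured targets are skipped."""
    measured = sorted(t for t in set(targets) if t in profiles)
    pairs = list(combinations(measured, 2))
    if not pairs:
        return 0.0
    coexpressed = sum(
        1 for a, b in pairs if pearson(profiles[a], profiles[b]) >= threshold
    )
    return coexpressed / len(pairs)


# hypothetical targets of one regulator in one expression dataset (T4 is not measured)
profiles = {"T1": [1, 2, 3], "T2": [2, 4, 6], "T3": [3, 1, 2]}
print(coexpressed_target_fraction(["T1", "T2", "T3", "T4"], profiles, threshold=0.6))  # 0.3333333333333333
```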

Other improvements to CORNET for Arabidopsis

Functional annotation and Cytoscape LinkOut  In the previous version of CORNET, we included the functional annotation contained in the TAIR functional gene descriptions, InterPro protein domains, GO annotations and subcellular localization data to enhance the biological interpretation of the networks. To further facilitate interpretation as well as downstream analyses, we now provide PO annotations, MapMan pathways and processes, phenotypic descriptions of transgenic lines (TAIR) and PubMed IDs (TAIR) as node attributes in Cytoscape (Rhee et al., 2003; Harris et al., 2004; Heazlewood et al., 2007; Avraham et al., 2008). To take advantage of existing resources in the plant field, we have extended the default LinkOut options of Cytoscape with plant-specific links. The LinkOut option can be accessed by right-clicking any node of interest and selecting ‘Plants_Arabidopsis’. Links are provided to resources such as TAIR, PLAZA, AtCOECIS, Genevestigator, TextPresso, Plan2L and many others (Rhee et al., 2003; Muller et al., 2004; Zimmermann et al., 2004; Krallinger et al., 2009; Proost et al., 2009; Vandepoele et al., 2009; Van Bel et al., 2012).

Cytoscape 2.8  We have upgraded the Cytoscape Webstart from version 2.6 to 2.8. This version offers a considerable number of new functionalities and plugins, such as diverse visualization options (e.g. edge styles), Custom Node Graphics, Attribute Equations (spreadsheet-like functionality) and nested networks (http://www.cytoscape.org) (Smoot et al., 2011). The following plugins are included: Agilent Literature Search (with network representation of the text mining results), MCODE (connectivity-based clustering), AllegroMCODE (faster clustering using a high-performance GPU computing architecture), jActiveModules (visualizing network dynamics), Network Analysis (calculating network metrics such as node degree and shortest path length), Network Modifications (union, intersection and difference of networks) and CytoSQL (importing data from local MySQL databases). Other plugins can be installed upon request.
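To make concrete what the Network Modifications and Network Analysis plugins compute, the sketch below implements the same operations on plain undirected edge lists. It does not use the plugins' code or API. All function names are hypothetical. Self-loops are dropped, and shortest paths use breadth-first search on unweighted links.

```python
from collections import Counter, defaultdict, deque


def as_edge_set(edges):
    """Undirected links as unordered pairs; self-loops are dropped here."""
    return {frozenset(e) for e in edges if e[0] != e[1]}


def union(net1, net2):
    return as_edge_set(net1) | as_edge_set(net2)


def intersection(net1, net2):
    return as_edge_set(net1) & as_edge_set(net2)


def difference(net1, net2):
    return as_edge_set(net1) - as_edge_set(net2)


def degree(edge_set):
    """Number of links per node."""
    counts = Counter()
    for edge in edge_set:
        for node in edge:
            counts[node] += 1
    return counts


def shortest_path_length(edge_set, start, goal):
    """Breadth-first search; returns None if goal is unreachable."""
    neighbors = defaultdict(set)
    for edge in edge_set:
        a, b = tuple(edge)
        neighbors[a].add(b)
        neighbors[b].add(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return dist[node]
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


net1 = [("A", "B"), ("B", "C")]
net2 = [("C", "B"), ("C", "D")]
merged = union(net1, net2)
print(len(merged), len(intersection(net1, net2)), len(difference(net1, net2)))  # 3 1 1
print(degree(merged)["B"], shortest_path_length(merged, "A", "D"))  # 2 3
```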

CORNET for maize

Expression compendia and coexpression tool  Because a considerable amount of transcriptomics data is available for the crop species maize, we have extended CORNET to this species. An important difference between Arabidopsis and maize is that no single microarray platform is widely used by the maize community. Therefore, we have expanded the CORNET system to support different microarray platforms. Currently, we provide an Affymetrix and a Nimblegen expression dataset, but datasets from additional platforms can be integrated. We have compiled an expression dataset covering 9846 genes from Affymetrix microarray experiments retrieved from GEO, in the same way as the Arabidopsis expression datasets (De Bodt et al., 2010). Moreover, we have included the genome-wide atlas of transcription during maize development, generated on the Nimblegen platform and representing 22 600 genes (Sekhon et al., 2011).

By default, the input consists of the GRMZM codes of the genes of interest. Other maize identifiers, such as PLAZA v2.0 identifiers (e.g. ZM08G15930) or Affymetrix probe identifiers (e.g. ZM.17362.s1_at), are also accepted (Proost et al., 2009; Van Bel et al., 2012). Additionally, CORNET accepts orthologous gene identifiers from Arabidopsis (AGI codes, e.g. AT2G33610) or rice (TIGR locus identifiers without the LOC_ prefix, e.g. Os02g10060). Note that if the input is an Arabidopsis or rice gene that belongs to an orthologous group with a many-to-many relationship, all maize genes from this orthologous group will be taken into account. Arabidopsis and rice orthologs of the maize genes in the generated networks can be inspected through the Cytoscape attribute browser (see the Materials and Methods section).
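Input handling of this kind can be sketched in two steps. First, recognize the identifier type. Second, for an Arabidopsis or rice query, collect every maize member of every orthologous group that contains it. The regular expressions below are inferred only from the example identifiers in the text and may miss real variants. The `ortholog_groups` structure and the function names are hypothetical and not CORNET's code.

```python
import re

# Patterns inferred only from the example identifiers in the text; they are
# assumptions and may not match every identifier variant CORNET accepts.
ID_PATTERNS = [
    ("maize_grmzm", re.compile(r"^GRMZM\d+G\d+$", re.IGNORECASE)),
    ("maize_plaza", re.compile(r"^ZM\d{2}G\d{5}$", re.IGNORECASE)),
    ("maize_affy_probe", re.compile(r"^ZM\.\d+\.\S+_at$", re.IGNORECASE)),
    ("arabidopsis_agi", re.compile(r"^AT[1-5CM]G\d{5}$", re.IGNORECASE)),
    ("rice_locus", re.compile(r"^OS\d{2}G\d{5}$", re.IGNORECASE)),
]


def classify(identifier):
    """Return the identifier type, or None if no pattern matches."""
    for label, pattern in ID_PATTERNS:
        if pattern.match(identifier.strip()):
            return label
    return None


def maize_genes_for(identifier, ortholog_groups):
    """Expand an input ID to maize GRMZM genes.

    ortholog_groups is a hypothetical list of sets, each holding the
    Arabidopsis, rice and maize members of one orthologous group. For an
    Arabidopsis or rice query, every maize member of every group containing
    the query is returned, mirroring the many-to-many rule in the text.
    """
    kind = classify(identifier)
    if kind == "maize_grmzm":
        return {identifier}
    if kind in ("arabidopsis_agi", "rice_locus"):
        hits = set()
        for group in ortholog_groups:
            if identifier in group:
                hits |= {g for g in group if classify(g) == "maize_grmzm"}
        return hits
    return set()  # PLAZA and probe IDs would need their own mapping tables


# Toy grouping for illustration only; not real orthology data.
groups = [{"AT2G33610", "GRMZM2G098813", "GRMZM2G180190"}]
print(classify("Os02g10060"))                        # rice_locus
print(sorted(maize_genes_for("AT2G33610", groups)))  # ['GRMZM2G098813', 'GRMZM2G180190']
```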

Projected PPIs

All Arabidopsis PPIs were used as a source to computationally predict putative maize PPIs through interolog detection (Yu et al., 2004) (see the Materials and Methods section). Both experimentally identified and computationally predicted Arabidopsis interactions are used as sources. However, maize interactions inferred from predicted Arabidopsis interactions may be unreliable, and these data should be interpreted with care. Predicting interactions based on orthology between Arabidopsis and maize assumes that protein functions are conserved between the two species. Given the relatively large evolutionary distance between Arabidopsis and maize, this assumption might not always hold. For instance, when no one-to-one orthology relationships can be determined, but rather many-to-many relationships exist between Arabidopsis and maize genes, species-specific duplications might have been followed by independent functional divergence in both species, giving rise to different interaction partners. Coexpression between the genes encoding interacting proteins can be used as support for the predicted interactions (De Bodt et al., 2009). As for Arabidopsis, interactions predicted from experimental source PPIs are represented by solid edges, and interactions predicted from predicted source PPIs are represented by dashed edges.
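Interolog projection can be sketched as follows. For each Arabidopsis interaction A–B, every pairing of a maize ortholog of A with a maize ortholog of B becomes a putative maize interaction. The edge style records whether the source PPI was experimental (solid) or predicted (dashed). The data structures are hypothetical. So is the rule that one experimental source is enough for a solid edge. This is not CORNET's code.

```python
from itertools import product


def project_interologs(source_ppis, orthologs):
    """Project Arabidopsis PPIs onto maize through orthology (interologs).

    source_ppis: {(protein_a, protein_b): "experimental" | "predicted"}
    orthologs:   {arabidopsis_gene: set of maize genes}
    Returns {frozenset of maize genes: "solid" | "dashed"}; a one-element
    frozenset stands for a projected homodimer.
    """
    projected = {}
    for (a, b), evidence in source_ppis.items():
        style = "solid" if evidence == "experimental" else "dashed"
        for maize_a, maize_b in product(orthologs.get(a, ()), orthologs.get(b, ())):
            key = frozenset((maize_a, maize_b))
            # Assumption: one experimental source PPI is enough for a solid edge.
            if projected.get(key) != "solid":
                projected[key] = style
    return projected


# Toy data: A2 has two maize co-orthologs (a many-to-many case).
ppis = {("A1", "A2"): "experimental", ("A2", "A3"): "predicted",
        ("A1", "A3"): "predicted", ("A4", "A2"): "predicted"}
orth = {"A1": {"Z1"}, "A2": {"Z2a", "Z2b"}, "A3": {"Z3"}, "A4": {"Z1"}}
maize = project_interologs(ppis, orth)
print(len(maize))  # 5
```

As the text cautions, projected links from predicted sources, and links through many-to-many ortholog groups, deserve extra scrutiny. Coexpression of the maize genes is one way to add support.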

Visualization of CORNET networks

For network visualization, we chose the existing software Cytoscape (Shannon et al., 2003) (Fig. 1). Its functionalities allow browsing and zooming into the constructed networks, visual as well as textual representation of diverse attributes (e.g. correlation coefficient, subcellular localization) and further exploration and analysis of the networks. The network visualization legend is shown in Fig. 2, on the tool output page, on the FAQ page and in the VizMapper tool in Cytoscape. The textual representation of the attributes can be displayed in the lower data panel of Cytoscape by clicking the ‘select attributes’ button (see FAQ), and can be copied and/or exported in tabular format.


Figure 2. Legend of Cytoscape visualization. The degree of expression correlation (COR) is represented by the color of the edges in the coexpression network. Protein–protein interactions (PPIs) are depicted by black edges, whose reliability can be assessed through the width (number of data sources) and the style (detection method) of the edges. Regulatory interactions are represented as black arrows: solid edges represent confirmed interactions, whereas dotted edges represent unconfirmed interactions, and the shape and color of the arrowhead are determined by the nature of the interaction. The shape of the nodes (genes/proteins) indicates whether the gene/protein is a query or a neighbor in the coexpression tool, the PPI tool or the transcription factor (TF) tool.
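The legend amounts to a mapping from edge attributes to visual properties. The sketch below expresses that mapping as plain functions. Only two PCC color bins are documented in this article, in the Fig. 4 caption, so every other correlation value gets a placeholder. A width equal to the number of data sources is also an assumption. This is not the CORNET VizMapper style itself.

```python
# Only these two bins come from the Fig. 4 caption (strict inequalities,
# as written there); every other PCC gets a placeholder color.
PCC_BINS = [(0.8, 0.9, "blue"), (0.7, 0.8, "pink")]


def coexpression_color(pcc):
    """Edge color for a coexpression link, binned on Pearson's r."""
    for low, high, color in PCC_BINS:
        if low < pcc < high:
            return color
    return "unbinned"  # placeholder, not part of CORNET's legend


def ppi_style(n_sources, experimental):
    """PPI edges: black, width grows with the number of data sources
    (assumed linear here), solid if experimental, dashed if predicted."""
    return {"color": "black", "width": n_sources,
            "line": "solid" if experimental else "dashed"}


def regulatory_style(confirmed):
    """Regulatory edges: black arrows, solid if confirmed, dotted otherwise."""
    return {"color": "black", "arrow": True,
            "line": "solid" if confirmed else "dotted"}


print(coexpression_color(0.85), ppi_style(2, True)["line"], regulatory_style(False)["line"])
# blue solid dotted
```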


Case study 1: new PPIs and gene–gene associations

A protein–protein and gene–gene association network was built around KNOTTED-LIKE FROM ARABIDOPSIS THALIANA (KNAT1), also known as BREVIPEDICELLUS1 (BP1), a member of the class I knotted1-like homeobox gene family with a role in leaf and ovule development (Fig. 3). This network demonstrates the value of updating and expanding the PPI databases, as well as the advantage of adding computationally predicted gene–gene associations. The blue edges in Fig. 3 represent previously unreported PPIs identified through yeast two-hybrid experiments by the Arabidopsis Interactome Mapping Consortium (Arabidopsis Interactome Mapping Consortium, 2011). Because these interactions were identified by only one experimental technique, some false positives might be present, but most are probably true, novel interactions. Moreover, the network is expanded by a considerable number of gene–gene associations identified through AraNet (green edges in Fig. 3) (Lee et al., 2010). In this way, for instance, a number of homeobox transcription factors are added to the network through AraNet links with KNAT1 as well as with the experimentally identified BLH and KNAT interactors. However, no common functional role for KNAT1 and these homeobox genes has been reported to date. In summary, this new version of CORNET allows the extension of a well-investigated PPI network of KNAT and BLH proteins (black edges in Fig. 3) (for a global comparison between databases, see Table 2 and Fig. S1).
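A first-neighbor network like the one in Fig. 3 can be assembled from several edge sources. Collect the query's direct partners, keep every link among the query and those partners, and remember which sources report each link. The source labels, the color priority for links reported by more than one source and the function names below are hypothetical illustrations. They are not CORNET's implementation.

```python
def neighborhood(query, sources):
    """Collect links around a query gene from several edge sources.

    sources: {source_label: list of undirected (a, b) pairs}
    Returns {frozenset(pair): set of source labels} for every link that
    touches the query or connects two of its first neighbors.
    """
    first = set()
    for pairs in sources.values():
        for a, b in pairs:
            if query in (a, b):
                first.add(b if a == query else a)
    nodes = first | {query}
    links = {}
    for label, pairs in sources.items():
        for a, b in pairs:
            if a in nodes and b in nodes:
                links.setdefault(frozenset((a, b)), set()).add(label)
    return links


# Hypothetical source labels mapped to the Fig. 3 colors; the priority
# order for links reported by several sources is an assumption.
COLOR_PRIORITY = [("known", "black"), ("interactome_2011", "blue"), ("aranet", "green")]


def edge_color(labels):
    for label, color in COLOR_PRIORITY:
        if label in labels:
            return color
    return "gray"


# Toy data with hypothetical gene names.
sources = {
    "known": [("Q", "N1")],
    "interactome_2011": [("Q", "N2"), ("N1", "N2")],
    "aranet": [("Q", "N3"), ("X", "Y")],
}
links = neighborhood("Q", sources)
print(len(links), edge_color(links[frozenset(("Q", "N3"))]))  # 4 green
```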


Figure 3. Integration of protein–protein interactions and gene–gene associations. By integrating the Arabidopsis Interactome Mapping Consortium data and the AraNet data, a considerable number of new associations to KNAT1 are found. Blue edges represent protein–protein interactions newly identified by the Arabidopsis Interactome Mapping Consortium; green edges represent AraNet associations.


Case study 2: regulatory interactions

We have integrated the regulatory network with the coexpression network around MYB5 (Fig. 4). In a first step, all putative target genes as well as all putative regulators of MYB5 were retrieved using the TF tool. This search resulted in a regulatory network containing 31 genes and 55 regulatory links identified through microarray experiments. Because these targets are differentially expressed genes identified by comparing wild-type with transgenic microarray data, each target is assumed to be an indirect target, even though some direct targets could be among them. Consequently, all arrowheads are shown as diamonds (Fig. 4). Moreover, regulatory links are shown as dotted arrows, as these links are computationally inferred. In a second step, we determined the coexpression between the genes in the MYB5 regulatory network across all predefined expression datasets available in the CORNET coexpression tool. A group of targets that show high coexpression (lower part of Fig. 4) can be distinguished from a group of genes that show little coexpression (upper part of Fig. 4). Strikingly, the majority of the latter genes, including MYB5, are also up-regulated upon knockout of SWINGER (SWN) and CURLY LEAF (CLF).
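The two-step procedure can be sketched as follows. First, take a TF's outgoing (target) and incoming (regulator) edges from a directed edge list. Second, overlay coexpression links and separate genes with coexpression partners inside the set from those without. In the case study this split was read from the figure. The partner-count rule, the function names and the toy genes below are assumptions for illustration.

```python
def regulatory_neighborhood(tf, regulatory_edges):
    """Targets (out-edges) and regulators (in-edges) of one TF."""
    targets = {t for r, t in regulatory_edges if r == tf}
    regulators = {r for r, t in regulatory_edges if t == tf}
    return targets, regulators


def split_by_coexpression(genes, coexpressed_pairs, min_partners=1):
    """Split genes into those with at least `min_partners` coexpression
    links inside the set and the rest. Pairs are assumed deduplicated."""
    partners = {g: 0 for g in genes}
    for a, b in coexpressed_pairs:
        if a != b and a in partners and b in partners:
            partners[a] += 1
            partners[b] += 1
    high = {g for g, n in partners.items() if n >= min_partners}
    return high, set(genes) - high


# Toy data; T1-T3 and R1 are hypothetical gene names.
reg = [("MYB5", "T1"), ("MYB5", "T2"), ("MYB5", "T3"), ("R1", "MYB5")]
targets, regulators = regulatory_neighborhood("MYB5", reg)
high, low = split_by_coexpression(targets | regulators | {"MYB5"}, [("T1", "T2")])
print(sorted(high), sorted(low))  # ['T1', 'T2'] ['MYB5', 'R1', 'T3']
```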


Figure 4. Integration of regulatory interactions and coexpression. The regulatory network around ATMYB5, including (in)direct ATMYB5 targets that coexpress. Regulatory interactions are shown by black arrows (experimental interactions by solid arrows, and computationally predicted interactions by dotted arrows), and coexpression links are shown by colored edges where the color represents the degree of coexpression as measured by Pearson’s correlation coefficient (PCC; blue, 0.9 > PCC > 0.8; pink, 0.8 > PCC > 0.7).


MYB5 knockout mutants show reduced seed coat mucilage and irregularly shaped seed coat epidermal cells (Gonzalez et al., 2009; Li et al., 2009). When both MYB5 and its homolog MYB23 are mutated, the number and size of trichomes are reduced and trichomes have fewer branches (Li et al., 2009). SWN and CLF take part in a large protein complex together with VERNALIZATION 2 (VRN2), VERNALIZATION INSENSITIVE 3 (VIN3) and FERTILIZATION INDEPENDENT ENDOSPERM (FIE). This complex has a role in establishing FLOWERING LOCUS C (FLC) repression during vernalization. In addition, SWN has a role in controlling seed initiation. The integrated network could therefore suggest a common role for MYB5 and SWN in seed development.

Case study 3: Arabidopsis and maize coexpression network

The extension of CORNET to maize allows coexpression analysis in this crop species. In addition, it allows the user to construct corresponding coexpression networks in both Arabidopsis and maize and to compare them (Fig. 5). We have constructed a coexpression network around the LEAFY gene. LEAFY is a single-copy gene in many species and has a role in the transition from the vegetative to the flowering phase. Through orthology (OrthoMCL orthologs retrieved from PLAZA), we identified two co-orthologs of LEAFY in maize (Proost et al., 2009; Van Bel et al., 2012). These genes, described as ZFL1 (GRMZM2G098813) and ZFL2 (GRMZM2G180190), have pleiotropic functions in reproductive development, such as flower identity and patterning, similar to their Arabidopsis ortholog LEAFY (Bomblies et al., 2003). ZFL1 and ZFL2 are thought to be largely redundant in function, as only double mutant plants show severe morphological defects (Bomblies et al., 2003). However, a quantitative trait locus study suggested that the two genes may be evolving subtle differences in function through subfunctionalization (Force et al., 1999; Bomblies & Doebley, 2006). Similarly, the coexpression network of these two maize genes suggests that they may play different roles in maize development, as two coexpression clusters can be discerned. More detailed functional analyses are, however, necessary to explain the biological importance of this network structure. The comparison also separates conserved from nonconserved links. The coexpression of LEAFY with CUP-SHAPED COTYLEDON3 (CUC3, AT1G76420) is conserved in maize, where the CUC3 ortholog (GRMZM2G430522) is coexpressed with ZFL1. By contrast, the coexpression of LEAFY with CUC1 (AT3G15170) and CUC2 (AT5G53950) is not conserved. To our knowledge, the association between LEAFY and CUC genes has not been described before.
This case study illustrates the utility of constructing Arabidopsis and maize coexpression networks (and PPI networks) for gene functional and evolutionary analysis as well as for translational research.
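Identifying conserved coexpression like the LEAFY–CUC3 case means checking, for each Arabidopsis coexpression partner of a query, whether any maize ortholog of the partner is coexpressed with any maize ortholog of the query. The sketch below assumes that coexpression has already been thresholded into neighbor sets (Fig. 5 used PCC > 0.75). The dictionaries, function name and maize CUC identifiers are hypothetical placeholders.

```python
def conserved_coexpression(query, at_coexpr, zm_coexpr, orthologs):
    """Arabidopsis coexpression partners of `query` whose link is also seen in maize.

    at_coexpr / zm_coexpr: {gene: set of genes coexpressed above a chosen PCC cut-off}
    orthologs: {arabidopsis_gene: set of maize orthologs}
    Returns {arabidopsis_partner: {(maize_query_ortholog, maize_partner_ortholog), ...}}.
    """
    conserved = {}
    query_orthologs = orthologs.get(query, set())
    for partner in at_coexpr.get(query, set()):
        hits = {
            (zq, zp)
            for zq in query_orthologs
            for zp in orthologs.get(partner, set())
            if zp in zm_coexpr.get(zq, set())
        }
        if hits:
            conserved[partner] = hits
    return conserved


# Toy data loosely modeled on the LEAFY example; the maize CUC IDs here are
# placeholders, not real identifiers.
at = {"LFY": {"CUC1", "CUC2", "CUC3"}}
zm = {"ZFL1": {"zmCUC3"}, "ZFL2": set()}
orth = {"LFY": {"ZFL1", "ZFL2"}, "CUC1": {"zmCUC1"}, "CUC3": {"zmCUC3"}}
print(conserved_coexpression("LFY", at, zm, orth))  # {'CUC3': {('ZFL1', 'zmCUC3')}}
```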


Figure 5. Coexpression network in Arabidopsis and maize. A maize (green nodes) and Arabidopsis (gray nodes) coexpression network (Pearson’s correlation coefficient > 0.75) was constructed for the Arabidopsis LEAFY gene and its two maize co-orthologs (yellow diamonds). Orthology relationships between Arabidopsis and maize genes as identified in PLAZA are indicated by blue edges (Proost et al., 2009; Van Bel et al., 2012).


Conclusion

We have developed CORNET 2.0, a user-friendly tool for the construction and integration of coexpression, PPI, gene–gene association and regulatory interaction networks. The majority of interaction databases are covered, providing the user with regularly updated data that can be used in versatile searches with the three tools (coexpression tool, PPI tool and TF tool). Moreover, functional annotation data from numerous databases, such as InterPro, GO, PO, MapMan, TAIR phenotype data and PubMed IDs, are compiled in Cytoscape so that users can easily sift through the information available on the genes in the network. In addition, Cytoscape LinkOut allows users to return to the original resources for further details. The comprehensive interface and the intuitive visualization give nonexpert users the means to build hypotheses on the role of one or more genes of interest, grasp the biological relevance of a group of genes and pinpoint putatively novel genes or associations involved in a biological process of interest.

As new types of data become available and cross-species analysis becomes more important in translational research, we foresee a number of possible extensions of the CORNET platform (Mochida & Shinozaki, 2010; Moreno-Risueno et al., 2010; Mochida & Shinozaki, 2011; Sucaet & Deva, 2011). First, cis-regulatory elements, which can be tightly linked with the coexpression results and the regulatory interactions, can be incorporated. Second, through comparative genomics approaches, coexpression networks constructed for Arabidopsis, maize and possibly other plant species can be compared to detect conserved coexpression. Third, the CORNET system can be further extended to other types of transcriptomics data, such as tiling array or next-generation RNA sequencing (RNA-seq) data, once sufficient and diverse experiments have been performed to construct expression datasets for coexpression analysis. Finally, the CORNET platform can be further connected to other plant resources and software tools. For instance, numerous methodologies exist for the computational inference of gene regulatory networks, and these predicted networks could also be integrated in CORNET. We foresee that molecular biologists will use CORNET for hypothesis generation to guide their experiments. Results from such experiments can be used to validate the performance of the approaches taken in CORNET and will allow optimization of the methodologies (e.g. measures of coexpression, approaches for prediction of regulatory interactions) and parameter choices (e.g. correlation coefficient thresholds) used in CORNET. As such, CORNET can be continually improved and expanded.

Acknowledgements


We would like to thank Marijn Vandevoorde, Thomas Van Parys, Michiel Van Bel, Lieven Sterck, Klaas Vandepoele and Ken Heyndrickx for assistance and helpful suggestions. We are grateful to the MIND consortium, the Arabidopsis Interactome Mapping Consortium and Alan Jones for providing their interactome data. This work was supported by grants from Ghent University (‘Bijzonder Onderzoeksfonds Methusalem project’ no. BOF08/01M00408), the Interuniversity Attraction Poles Programme (IUAP VI/25 (BioMaGNet) and VI/33), initiated by the Belgian State, Science Policy Office, the European Union 6th Framework Programme (‘AGRON-OMICS’, LSHG-CT-2006-037704), and the Research Foundation-Flanders (postdoctoral fellowship to S.D.B.).

References

  • Akiyama K, Chikayama E, Yuasa H, Shimada Y, Tohge T, Shinozaki K, Hirai MY, Sakurai T, Kikuchi J, Saito K. 2008. Prime: a web site that assembles tools for metabolomics and transcriptomics. In Silico Biology 8: 339–345.
  • Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. Journal of Molecular Biology 215: 403–410.
  • Arabidopsis Interactome Mapping Consortium. 2011. Evidence for network evolution in an Arabidopsis interactome map. Science 333: 601–607.
  • Avraham S, Tung C-W, Ilic K, Jaiswal P, Kellogg EA, McCouch S, Pujar A, Reiser L, Rhee SY, Sachs MM et al. 2008. The plant ontology database: a community resource for plant structure and developmental stages controlled vocabulary and annotations. Nucleic Acids Research 36: D449–D454.
  • Bader GD, Betel D, Hogue CWV. 2003. BIND: the biomolecular interaction network database. Nucleic Acids Research 31: 248–250.
  • Bassel GW, Lan H, Glaab E, Gibbs DJ, Gerjets T, Krasnogor N, Bonner AJ, Holdsworth MJ, Provart NJ. 2011. Genome-wide network model capturing seed germination reveals coordinated regulation of plant cellular phase transitions. Proceedings of the National Academy of Sciences, USA 108: 9709–9714.
  • Bomblies K, Doebley JF. 2006. Pleiotropic effects of the duplicate maize FLORICAULA/LEAFY genes zfl1 and zfl2 on traits under selection during maize domestication. Genetics 172: 519–531.
  • Bomblies K, Wang RL, Ambrose BA, Schmidt RJ, Meeley RB, Doebley J. 2003. Duplicate floricaula/leafy homologs zfl1 and zfl2 control inflorescence architecture and flower patterning in maize. Development 130: 2385–2395.
  • Brady SM, Provart NJ. 2009. Web-queryable large-scale data sets for hypothesis generation in plant biology. Plant Cell 21: 1034–1051.
  • Brown DM, Zeef LAH, Ellis J, Goodacre R, Turner SR. 2005. Identification of novel genes in Arabidopsis involved in secondary cell wall formation using expression profiling and reverse genetics. Plant Cell 17: 2281–2295.
  • Casneuf T, Van de Peer Y, Huber W. 2007. In situ analysis of cross-hybridisation on microarrays and the inference of expression correlation. BMC Bioinformatics 8: 461.
  • Chatr-aryamontri A, Ceol A, Montecchi Palazzi L, Nardelli G, Schneider MV, Castagnoli L, Cesareni G. 2007. Mint: the molecular interaction database. Nucleic Acids Research 35: D572–D574.
  • Cui J, Li P, Li G, Xu F, Zhao C, Li Y, Yang Z, Wang G, Yu Q, Li Y et al. 2008. Atpid: Arabidopsis thaliana protein interactome database – an integrative platform for plant systems biology. Nucleic Acids Research 36: D999–D1008.
  • De Bodt S, Carvajal D, Hollunder J, Van den Cruyce J, Movahedi S, Inzé D. 2010. CORNET: a user-friendly tool for data mining and integration. Plant Physiology 152: 1167–1179.
  • De Bodt S, Proost S, Vandepoele K, Rouze P, Van de Peer Y. 2009. Predicting protein-protein interactions in Arabidopsis thaliana through integration of orthology, gene ontology and co-expression. BMC Genomics 10: 288.
  • Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J. 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151: 1531–1545.
  • Gachon CMM, Langlois-Meurinne M, Henry Y, Saindrenan P. 2005. Transcriptional co-regulation of secondary metabolism enzymes in Arabidopsis: functional and evolutionary implications. Plant Molecular Biology 58: 229–245.
  • Gautier L, Cope L, Bolstad BM, Irizarry RA. 2004. Affy – analysis of Affymetrix genechip data at the probe level. Bioinformatics 20: 307–315.
  • Geisler-Lee J, O’Toole N, Ammar R, Provart NJ, Millar AH, Geisler M. 2007. A predicted interactome for arabidopsis. Plant Physiology 145: 317–329.
  • Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J et al. 2004. Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 5: R80.
  • Gonzalez A, Mendenhall J, Huo Y, Lloyd A. 2009. TTG1 complex MYBs, MYB5 and TT2, control outer seed coat differentiation. Developmental Biology 325: 412–421.
  • Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C et al. 2004. The gene ontology (GO) database and informatics resource. Nucleic Acids Research 32: D258–D261.
  • Heazlewood JL, Verboom RE, Tonti-Filippini J, Small I, Millar AH. 2007. SUBA: the Arabidopsis subcellular database. Nucleic Acids Research 35: D213–D218.
  • Heiman GW, ed. 1996. Basic statistics for the behavioral sciences. Boston, MA, USA: Houghton Mifflin.
  • Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A et al. 2004. IntAct: an open source molecular interaction database. Nucleic Acids Research 32: D452–D455.
  • Horan K, Jang C, Bailey-Serres J, Mittler R, Shelton C, Harper JF, Zhu JK, Cushman JC, Gollery M, Girke T. 2008. Annotating genes of known and unknown function by large-scale coexpression analysis. Plant Physiology 147: 41–57.
  • Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP. 2003a. Summaries of Affymetrix genechip probe level data. Nucleic Acids Research 31: e15.
  • Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP. 2003b. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4: 249–264.
  • Katari MS, Nowicki SD, Aceituno FF, Nero D, Kelfer J, Thompson LP, Cabello JM, Davidson RS, Goldberg AP, Shasha DE et al. 2010. VirtualPlant: a software platform to support systems biology research. Plant Physiology 152: 500–515.
  • Klopffleisch K, Phan N, Augustin K, Bayne RS, Booker KS, Botella JR, Carpita NC, Carr T, Chen JG, Cooke TR et al. 2011. Arabidopsis G-protein interactome reveals connections to cell wall carbohydrates and morphogenesis. Molecular Systems Biology 7: 532.
  • Krallinger M, Rodriguez-Penagos C, Tendulkar A, Valencia A. 2009. Plan2l: a web tool for integrated text mining and literature-based bioentity relation extraction. Nucleic Acids Research 37: W160–W165.
  • Lalonde S, Sero A, Pratelli R, Pilot G, Chen J, Sardi MI, Parsa SA, Kim DY, Acharya BR, Stein EV et al. 2010. A membrane protein/signaling protein interaction network for Arabidopsis version AMPv2. Frontiers in Physiology 1: 24.
  • Lee I, Ambaru B, Thakkar P, Marcotte EM, Rhee SY. 2010. Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana. Nature Biotechnology 28: 149–156.
  • Lee TH, Kim YK, Pham TT, Song SI, Kim JK, Kang KY, An G, Jung KH, Galbraith DW, Kim M et al. 2009. RiceArrayNet: a database for correlating gene expression from transcriptome profiling, and its application to the analysis of coexpressed genes in rice. Plant Physiology 151: 16–33.
  • Li L, Stoeckert CJ Jr, Roos DS. 2003. Orthomcl: identification of ortholog groups for eukaryotic genomes. Genome Research 13: 2178–2189.
  • Li SF, Milliken ON, Pham H, Seyit R, Napoli R, Preston J, Koltunow AM, Parish RW. 2009. The Arabidopsis MYB5 transcription factor regulates mucilage synthesis, seed coat development, and trichome morphogenesis. Plant Cell 21: 72–89.
  • Lisso J, Steinhauser D, Altmann T, Kopka J, Müssig C. 2005. Identification of brassinosteroid-related genes by means of transcript co-response analyses. Nucleic Acids Research 33: 2685–2696.
  • Mochida K, Shinozaki K. 2010. Genomics and bioinformatics resources for crop improvement. Plant and Cell Physiology 51: 497–523.
  • Mochida K, Shinozaki K. 2011. Advances in omics and bioinformatics tools for systems analyses of plant functions. Plant and Cell Physiology 52: 2017–2038.
  • Moreno-Risueno MA, Busch W, Benfey PN. 2010. Omics meet networks – using systems approaches to infer regulatory networks in plants. Current Opinion in Plant Biology 13: 126–131.
  • Muller HM, Kenny EE, Sternberg PW. 2004. Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biology 2: e309.
  • Mutwil M, Klie S, Tohge T, Giorgi FM, Wilkins O, Campbell MM, Fernie AR, Usadel B, Nikoloski Z, Persson S. 2011. PlaNet: combined sequence and expression comparisons across plant networks derived from seven species. Plant Cell 23: 895–910.
  • Mutwil M, Øbro J, Willats WGT, Persson S. 2008. GeneCAT – novel webtools that combine blast and co-expression analyses. Nucleic Acids Research 36: W320–W326.
  • Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K. 2009. ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Research 37: D987–D991.
  • Obayashi T, Kinoshita K, Nakai K, Shibaoka M, Hayashi S, Saeki M, Shibata D, Saito K, Ohta H. 2007. ATTED-II: a database of co-expressed genes and cis elements for identifying co-regulated gene groups in Arabidopsis. Nucleic Acids Research 35: D863–D869.
  • Ogata Y, Suzuki H, Sakurai N, Shibata D. 2010. CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268.
  • Proost S, Van Bel M, Sterck L, Billiau K, Van Parys T, Van de Peer Y, Vandepoele K. 2009. PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell 21: 3718–3731.
  • Rautengarten C, Steinhauser D, Büssis D, Stintzi A, Schaller A, Kopka J, Altmann T. 2005. Inferring hypotheses on functional relationships of genes: analysis of the Arabidopsis thaliana subtilase gene family. PLoS Computational Biology 1: e40, 0297-0312.
  • Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M et al. 2003. The Arabidopsis information resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Research 31: 224–228.
  • Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D. 2004. The database of interacting proteins: 2004 update. Nucleic Acids Research 32: D449–D451.
  • Sekhon RS, Lin H, Childs KL, Hansey CN, Buell CR, de Leon N, Kaeppler SM. 2011. Genome-wide atlas of transcription during maize development. Plant Journal 66: 553–563.
  • Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research 13: 2498–2504.
  • Slater GS, Birney E. 2005. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6: 31.
  • Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. 2011. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27: 431–432.
  • Smyth GK. 2004. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3: Article 3.
  • Srinivasasainagendra V, Page GP, Mehta T, Coulibaly I, Loraine AE. 2008. CressexPress: a tool for large-scale mining of expression data from arabidopsis. Plant Physiology 147: 1004–1016.
  • Stark C, Breitkreutz B-J, Reguly T, Boucher L, Breitkreutz A, Tyers M. 2006. BioGRID: a general repository for interaction datasets. Nucleic Acids Research 34: D535–D539.
  • Steinhauser D, Usadel B, Luedemann A, Thimm O, Kopka J. 2004. CSB.DB: a comprehensive systems-biology database. Bioinformatics 20: 3647–3651.
  • Sucaet Y, Deva T. 2011. Evolution and applications of plant pathway resources and databases. Briefings in Bioinformatics 12: 530–544.
  • Toufighi K, Brady SM, Austin R, Ly E, Provart NJ. 2005. The botany array resource: e-northerns, expression angling, and promoter analyses. Plant Journal 43: 153–163.
  • Usadel B, Nagel A, Thimm O, Redestig H, Blaesing OE, Palacios-Rojas N, Selbig J, Hannemann J, Piques MC, Steinhauser D et al. 2005. Extension of the visualization tool MapMan to allow statistical analysis of arrays, display of corresponding genes, and comparison with known responses. Plant Physiology 138: 1195–1204.
  • Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ. 2009. Coexpression tools for plant biology: opportunities for hypothesis generation and caveats. Plant, Cell & Environment 32: 1633–1651.
  • Van Bel M, Proost S, Wischnitzki E, Movahedi S, Scheerlinck C, Van de Peer Y, Vandepoele K. 2012. Dissecting plant genomes with the PLAZA comparative genomics platform. Plant Physiology 158: 590–600.
  • Vandepoele K, Quimbaya M, Casneuf T, De Veylder L, Van de Peer Y. 2009. Unraveling transcriptional control in Arabidopsis using cis-regulatory elements and coexpression networks. Plant Physiology 150: 535–546.
  • Yilmaz A, Mejia-Guerra MK, Kurz K, Liang X, Welch L, Grotewold E. 2011. AGRIS: the Arabidopsis gene regulatory information server, an update. Nucleic Acids Research 39: D1118–D1122.
  • Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, Vidal M, Gerstein M. 2004. Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Research 14: 1107–1118.
  • Zimmermann P, Hirsch-Hoffmann M, Hennig L, Gruissem W. 2004. GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox. Plant Physiology 136: 2621–2632.

Supporting Information


Fig. S1 Comparison of Arabidopsis experimental protein-protein interaction (PPI) databases.

Fig. S2 Power-law distribution of the node degree as a function of the number of nodes.

Table S1 Number of coexpressed genes encoding interacting proteins

Table S2 Pearson correlation coefficient thresholds

Table S3 Number of coexpressed regulatory targets

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.

Filename: NPH_4184_sm_TableS1-S3-FigS1-S2.pdf (PDF, 174K). Description: Supporting info item.