•To enable easy access and interpretation of heterogeneous and scattered data, we have developed a user-friendly tool for data mining and integration in Arabidopsis, named CORNET. This tool allows the browsing of microarray data, the construction of coexpression and protein–protein interaction (PPI) networks and the exploration of diverse functional annotations. Here, we present the new functionalities of CORNET 2.0 for data integration in plants.
•First of all, CORNET allows the integration of regulatory interaction datasets accessible through the new transcription factor (TF) tool that can be used in combination with the coexpression tool or the PPI tool. In addition, we have extended the PPI tool to enable the analysis of gene–gene associations from AraNet as well as newly identified PPIs. Different search options are implemented to enable the construction of networks centered around multiple input genes or proteins. New functional annotation resources are included to retrieve relevant literature, phenotypes, plant ontology and biological pathways.
•We have also extended CORNET to enable the construction of coexpression and PPI networks in the crop species maize.
In recent years, plant biology has witnessed a true data explosion. Not only have numerous small and moderate-scale experiments been performed to assay the functions of and relations between plant genes and proteins, but a number of large-scale resources have also been built. In particular, for the model plant Arabidopsis thaliana (Arabidopsis), but also for crops such as Zea mays (maize), data have been generated through transcriptome, protein–protein interactome mapping and regulatory interaction experiments, as well as computational approaches. These datasets can only be exploited to their full use through data integration, thereby leading to, for instance, the identification of the temporal and spatial activity of protein complexes and the prediction of putative functions for unknown genes (Brown et al., 2005; Gachon et al., 2005; Lisso et al., 2005; Rautengarten et al., 2005; Usadel et al., 2009).
We have developed CORNET (acronym for CORrelation NETworks), a user-friendly tool for data mining and integration that is accessible through https://bioinformatics.psb.ugent.be/cornet. We collected microarray expression data, corresponding metadata describing sampled tissues, treatments and time points of sampling, PPI data, localization data, and functional information (Gene Ontology (GO), InterPro protein domains) in a central database. A user-friendly interface allows the database to be queried, enabling network construction through a multitude of search options addressing different biological questions. Coexpression networks could be calculated using user-defined and multiple predefined expression datasets. Similarly, webtools such as ATTED-II and BAR allow the use of different expression datasets for coexpression. However, the search options in CORNET are more extensive and flexible. PPI networks could be constructed with both experimentally identified and computationally predicted data. Moreover, CORNET allows the integration of coexpression and PPI networks, and a comprehensive visualization of the networks is generated in Cytoscape, providing a bird’s eye view of the results and the different degrees of reliability of the extracted information, an important feature that is not available with most other tools (Shannon et al., 2003).
In this manuscript, we describe the development of new functionalities and integration of new data types in CORNET 2.0 for Arabidopsis and the development of CORNET for maize. Existing PPI data are updated and new data from the Arabidopsis Interactome Mapping Consortium and from the MIND database are incorporated into the CORNET database (Lalonde et al., 2010; Arabidopsis Interactome Mapping Consortium, 2011). Available functional annotation data are updated and a number of new functional annotation resources such as Plant Ontology (PO), PubMed IDs, phenotypes and MapMan pathways and processes are included (Rhee et al., 2003; Usadel et al., 2005; Avraham et al., 2008). Furthermore, computationally inferred gene–gene associations (AraNet) have been added as a new data type in the PPI tool (Lee et al., 2010). A new, so-called transcription factor (TF) tool to extract regulatory interactions identified through experimental methods such as chromatin immunoprecipitation (ChIP)-chip and ChIP-seq as well as computationally predicted regulatory interactions based on microarray data has been developed (Yilmaz et al., 2011). Finally, CORNET for maize, allowing coexpression analysis using different expression compendia and platforms and the construction of PPI networks identified through orthology with interacting proteins from Arabidopsis (interologs) has been set up (Yu et al., 2004; Sekhon et al., 2011). In summary, this extended CORNET 2.0 allows the user to construct integrated networks compiling the majority of the currently available data types for plants.
Materials and Methods
CORNET 2.0 can be accessed through the following URL: https://bioinformatics.psb.ugent.be/cornet. The tool is fully functional in Firefox and Safari browsers. First-time users might need to accept a certificate before accessing the website. The site is ideally viewed at a 1280 × 1024 resolution. You need to allow pop-ups in your browser before clicking the ‘GO’ button. After calculations and database queries, Cytoscape will start automatically from the web. In other words, Cytoscape does not have to be installed on your computer. However, to enable the Cytoscape WebStart, an up-to-date version of Java is required. Please refer to our Frequently Asked Questions (FAQ) page for further details. CORNET data is available upon request.
P-value calculation for expression correlation
For testing the associated significance of a calculated correlation coefficient, the t-test statistic was used. In the first step, the t-value for Pearson’s correlation coefficient is calculated, taking into account the correlation coefficient and the number of conditions in the expression dataset. To calculate the P-value, the calculated t-value is then used as the upper limit in the integration of the cumulative t-distribution function. To solve this equation, a regularized lower incomplete beta function with the t-value and the degree of freedom as parameters is used (Heiman, 1996).
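The calculation described above can be sketched as follows. This is a minimal illustration rather than the CORNET implementation: it assumes a two-sided test with n − 2 degrees of freedom (n being the number of conditions) and uses the standard identity P = I_x(ν/2, 1/2) with x = ν/(ν + t²), where I is the regularized incomplete beta function. The function names are hypothetical, and the continued-fraction evaluation is a textbook method chosen for this sketch.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction of the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def pearson_p_value(r, n_conditions):
    """Two-sided P-value (assumption) for a Pearson correlation r over n conditions."""
    df = n_conditions - 2
    if df < 1:
        raise ValueError("at least three conditions are required")
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt(df / (1.0 - r * r))   # t-value of the correlation
    x = df / (df + t * t)
    return regularized_incomplete_beta(df / 2.0, 0.5, x)
```

Because the t-value grows with the number of conditions, the same correlation coefficient yields a smaller P-value in a larger expression dataset.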
Preprocessing of microarray data
Arabidopsis microarray data was retrieved from the Gene Expression Omnibus (GEO) database for CORNET 1.0. For the compilation of the Affymetrix Maize Genome Array expression dataset (GPL4032), we downloaded 24 experiment series (GSE21070, GSE8188, GSE15048, GSE8278, GSE8275, GSE10023, GSE8194, GSE7030, GSE16567, GSE19501, GSE8308, GSE10237, GSE10236, GSE8320, GSE8179, GSE11531, GSE8174, GSE22479, GSE8176, GSE18491, GSE15371, GSE12892, GSE12770, and GSE10243) from the GEO database. These series contain 128 unique experiments involving cis-transcriptional variation studies in different inbred lines, expression profiling of mutants, nonadaptive and imprinted gene expression, expression profiling of different tissues and different treatments. The Affymetrix Genome array (GPL4032) contains 17 555 probesets for six different maize strains (B73, Ohio43, W22, W23, and Black Mexican Sweet). Approximately 13 000 probe sets for maize B73 were considered for the construction of a custom Chip Description File (CDF). In this way, we found that 9846 genes can be measured reliably by the microarray. For the Nimblegen Maize Whole-Genome Microarray 385K expression dataset (GPL12620), data from the 60 different experimental conditions were retrieved from the GEO database (Sekhon et al., 2011). The Nimblegen Whole-Genome Microarray 385K (GPL12620) contains 80 301 probe sets representing 22 600 genes. This comprehensive atlas contains global transcription profiles across developmental stages and plant organs of maize (60 distinct tissues representing 11 major organ systems of inbred line B73 (germinating seed, root, whole seedling, shoot apical meristem (SAM) and young stem, internodes, cob, tassel and anthers, silk, leaves, husk, seed)).
The microarray data are processed with the Robust Multi-array Average (RMA) procedure implemented in BioConductor (Irizarry et al., 2003a,b; Gautier et al., 2004; Gentleman et al., 2004). In the previous version of CORNET, the CDF tinesath1cdf was used based on TAIR7 genome annotation (Casneuf et al., 2007). In the current version, an up-to-date CDF based on TAIR10 and provided by Brainarray (TAIR10 genes – v14; brainarray.mbni.med.umich.edu) is used to define probe set–gene relations. For the Affymetrix microarray dataset of maize, we have used a similar pre-processing strategy as for Arabidopsis. A custom CDF was constructed in this case. In a first step, all probes were mapped against the reference genome (B73, version 4.53) using the software Exonerate allowing 0 or 1 mismatch and no gaps (Slater & Birney, 2005). In a second step, all probes that mapped to more than one gene were removed. In a last step, only probe sets with more than nine probes were retained.
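The last two filtering steps of the custom CDF construction can be sketched as below. This is an illustration only: it assumes that the Exonerate hits have already been parsed into a mapping from probe to the set of genes hit, and that the retained probes are grouped per gene. The function and parameter names are hypothetical.

```python
from collections import defaultdict


def build_probe_sets(probe_hits, min_probes=10):
    """Group gene-specific probes per gene and keep sufficiently large probe sets.

    probe_hits: dict mapping probe id -> set of gene ids the probe maps to.
    min_probes: 10 corresponds to "more than nine probes".
    """
    gene_to_probes = defaultdict(list)
    for probe, genes in probe_hits.items():
        # Discard probes that map to more than one gene (or to none).
        if len(genes) == 1:
            (gene,) = genes
            gene_to_probes[gene].append(probe)
    # Keep only probe sets with enough probes.
    return {gene: sorted(probes)
            for gene, probes in gene_to_probes.items()
            if len(probes) >= min_probes}
```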
Microarray data for gene–target relations
To identify possible regulatory interactions, microarray data annotated with ‘genetic_modification_design’, that is, experiments in which transgenic plants were profiled, are extracted from the CORNET database. Using the metadata stored in the CORNET database, we have compared transgenic with wildtype expression profiles and identified differentially expressed genes through the BioConductor package Limma (Smyth, 2004). Differentially expressed genes show a fold change > 2 and a false discovery rate (FDR) < 0.05.
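The selection criteria can be illustrated with a short sketch. It is not the Limma workflow itself: it assumes that per-gene log2 fold changes and raw P-values are already available, that the fold-change cutoff applies in both directions (|log2 fold change| > 1), and that the FDR is controlled with the Benjamini-Hochberg procedure. All names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(records, min_fold_change=2.0, max_fdr=0.05):
    """records: list of (gene, log2_fold_change, raw_p_value) tuples."""
    import math
    min_log2fc = math.log2(min_fold_change)
    fdr = benjamini_hochberg([p for _, _, p in records])
    return [gene
            for (gene, log2fc, _), q in zip(records, fdr)
            if abs(log2fc) > min_log2fc and q < max_fdr]
```

Each gene returned by this filter would then be linked to the modified gene of the experiment as a putative target.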
Functional annotation data
The phenotype descriptions are kindly provided by TAIR, as well as the links to the relevant literature (PubMedIDs) (version 24 May 2011) (Rhee et al., 2003). These data can also be viewed on the TAIR gene pages, but cannot be downloaded in bulk. MapMan pathway and process annotation was retrieved from the MapMan store (http://mapman.gabipd.org/web/guest/mapmanstore) for both Arabidopsis and maize. The maize GO annotation and InterPro protein domain information were taken from MaizeSequence.Org (v4.53, July 2010).
Ortholog detection using OrthoMCL
Orthologs between Arabidopsis, rice and maize were retrieved from PLAZA v2.0 and were identified using OrthoMCL (Li et al., 2003). PLAZA v2.0 uses Arabidopsis protein sequences from TAIR9, rice protein sequences from TIGR6.1 and maize protein sequences from maizesequence.org (version 4.53) (Proost et al., 2009; Van Bel et al., 2012). An all proteins-against-all proteins similarity search was performed using BLAST (version 2.17) (Altschul et al., 1990). These results were fed into the OrthoMCL v1.4 program, which was run in mode 4 using default settings. An MCL inflation parameter of 2.0 was chosen.
Predicted PPIs in maize
Protein–protein interactions in maize are inferred based on interolog detection (Yu et al., 2004). In a first step, orthologs between Arabidopsis and maize are identified using OrthoMCL (Li et al., 2003). For two interacting proteins, one (or more) (co-)ortholog(s) are retrieved and all pairwise combinations between the respective orthologs are designated as putative PPIs in maize.
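The interolog transfer can be sketched as follows. This is a simplified illustration under stated assumptions: the ortholog mapping is given as a dictionary from an Arabidopsis protein to its maize (co-)orthologs, interactions are treated as undirected, and pairs in which both partners map to the same maize protein are kept. The function name and data layout are hypothetical.

```python
from itertools import product


def predict_interologs(source_ppis, orthologs):
    """Transfer Arabidopsis PPIs to maize through (co-)orthologs.

    source_ppis: iterable of (protein_a, protein_b) Arabidopsis interactions.
    orthologs:   dict mapping an Arabidopsis protein to a list of maize orthologs.
    """
    predicted = set()
    for protein_a, protein_b in source_ppis:
        maize_a = orthologs.get(protein_a, ())
        maize_b = orthologs.get(protein_b, ())
        # All pairwise combinations of the respective orthologs.
        for pair in product(maize_a, maize_b):
            predicted.add(tuple(sorted(pair)))  # undirected edge
    return predicted
```

An interaction is transferred only when both partners have at least one maize ortholog; a many-to-many orthology relationship multiplies the number of predicted pairs.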
Results and Discussion
As shown in Fig. 1, CORNET 2.0 is composed of three tools, namely the coexpression tool, the PPI tool and the TF tool constructing coexpression, PPI and gene–gene association and regulatory networks, respectively. These three tools can be used autonomously but can also be used consecutively to build integrated networks (see the middle panel of Fig. 1). Additionally, localization and functional information can be added to the constructed networks. The networks are automatically generated and displayed in Cytoscape for network exploration and analysis.
In the previous version of CORNET, we have developed the coexpression tool to study the coexpression of genes using user-defined expression datasets or multiple predefined expression datasets individually or simultaneously. Several predefined expression datasets, such as global compendia representing diverse experimental conditions as well as tissue or treatment-specific expression datasets, are provided. Depending on the nature of the studied genes and the interest of the user, different input expression datasets can be imagined. Global expression compendia will be used when a general view on the coexpression of, for instance, unknown genes is required. By contrast, when looking for genes that are similar to a drought stress-responsive gene, an expression dataset representing abiotic stress conditions can be used to identify specific and relevant relations. Moreover, coexpression can be calculated using multiple expression datasets, representing diverse conditions, and lead to the identification of those conditions in which the genes of interest show similar expression patterns.
These predefined expression datasets are now updated according to the latest Arabidopsis genome annotation. An up-to-date CDF is used to delineate probe sets corresponding to specific genes. This CDF was retrieved from Brainarray (TAIR10 genes – v14; brainarray.mbni.med.umich.edu). All expression datasets are preprocessed using this new CDF, and scripts to preprocess user-defined datasets through CORNET are adapted. Owing to the improved genome annotation and slightly different parameter settings, expression for 21 428 genes can now be measured, compared with 20 777 genes using the Casneuf CDF based on TAIR7 (brainarray.mbni.med.umich.edu) (Casneuf et al., 2007).
Coexpression is determined by the calculation of a Pearson’s or Spearman correlation coefficient. The user can apply a correlation coefficient threshold and/or a threshold on the number of coexpressed genes. This functionality is extended to take into account the significance of the correlation by using a P-value threshold. The P-value calculation considers the correlation coefficient and the number of conditions in the expression datasets (see the Materials and Methods section). As the number of conditions is considerably different between predefined expression datasets, the P-value is a better coexpression measure than the correlation coefficient when comparing results from different expression datasets. As with the previous version, when selecting more than one expression dataset, correlation coefficients and P-values will be retrieved from the database (Pearson correlation coefficient (PCC) > 0.4 or PCC < −0.4, P-value < 0.05) and displayed as Cytoscape attributes. When only one predefined expression dataset is used, either Pearson’s or Spearman correlation coefficients are calculated on the fly, allowing correlations between PCC = 0.4 and PCC = −0.4 to be retrieved as well.
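As an illustration of the two correlation measures and the threshold on the correlation coefficient, consider the sketch below. It is not the CORNET code: the helper names are hypothetical, Spearman correlation is computed as a Pearson correlation on average ranks, and the threshold is assumed to apply to the absolute value of the coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def coexpressed_partners(query, profiles, threshold=0.4, method=pearson):
    """Genes whose correlation with the query exceeds the threshold in absolute value."""
    hits = {}
    for gene, profile in profiles.items():
        if gene == query:
            continue
        r = method(profiles[query], profile)
        if abs(r) > threshold:  # NaN never passes this comparison
            hits[gene] = r
    return hits
```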
In addition to the updates of already existing databases, we have added new PPI databases. The recently published Arabidopsis interactome mapping data are integrated (Arabidopsis Interactome Mapping Consortium, 2011). These data consist of c. 6200 PPIs between 2700 proteins identified through yeast-two-hybrid experiments using a collection of c. 8000 open reading frames (see Tables 1, 2). In addition, the MIND consortium kindly provided us with the preliminary MIND 0.5 PPI data, an extension of the published MIND 0.2 dataset (Lalonde et al., 2010). This dataset consists of c. 17 000 potential interaction pairs resulting from a large-scale interaction screen covering c. 6.4 million pairs (among 3700 proteins) (see Tables 1, 2). The independent verification of these interactions by the consortium is currently ongoing (http://www.associomics.org). Therefore, users should be aware that some of these interactions will possibly not hold true in the final version of the MIND dataset. We will gradually update the database as new data are provided by the consortium. Below, some case studies demonstrate the added value of the newly added and updated databases. Lastly, we have included the G-protein interactome (Klopffleisch et al., 2011).
Table 1. Statistics on the number of protein–protein interactions (PPIs) in CORNET
AGRIS, Arabidopsis Gene Regulatory Information Server.
Number of arrays: 340 + 180¹
Number of experiments: 128 + 60¹
Number of series: 24 + 1¹
Number of experiments with two replicates: 42 + 0¹
Number of experiments with > two replicates: 85 + 60¹
Protein–protein interactions (PPIs)
Arabidopsis Interactome Mapping Consortium – number of experimentally identified PPIs
MIND0.5 – number of experimentally identified PPIs
G-protein interactome – number of experimentally identified PPIs
Other – number of experimentally identified PPIs
Number of experimentally identified PPIs
Number of computationally predicted PPIs
Total number of PPIs
Number of confirmed AGRIS interactions
Number of unconfirmed AGRIS interactions
Number of gene–target relations (genetic_modification)
Total number of regulatory interactions
AraNet – total number of associations: 1 062 222
An overview of the number of PPIs in each of the databases can be found in Table 1. When inspecting the PPI network and comparing the different data sources, we can observe that there is a relatively small overlap between databases containing experimentally identified PPIs (Supporting Information, Fig. S1). The complete network of experimentally identified and computationally predicted PPIs shows scale-free behavior, with few proteins taking part in many interactions and many proteins taking part in a few interactions (Fig. S2). The average degree (i.e. number of neighbors) of the proteins in the interaction network is 14.533. When distinguishing experimentally identified and computationally predicted PPIs, the average degree is 8.397 and 13.307, respectively, corresponding to the difference in network size. The average shortest path length of a protein to all other proteins in the combined PPI network is 1.815. When comparing the degree in this PPI network to the coexpression network of the same genes, we observe that the degree of coexpression networks is lower than the degree of the PPI network (degree of 5.852). This observation points to the relatively low coexpression of genes encoding the interacting proteins.
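The two network metrics used above can be computed from an edge list as in the following sketch. It is given for illustration only (the values reported here were not necessarily obtained this way) and assumes an undirected network without self-loops, in which unreachable protein pairs are ignored. The function names are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets from (node_a, node_b) pairs; self-loops are skipped."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def average_degree(adjacency):
    """Mean number of neighbors per node."""
    return sum(len(neighbors) for neighbors in adjacency.values()) / len(adjacency)


def average_shortest_path_length(adjacency):
    """Mean shortest path length over all reachable node pairs (breadth-first search)."""
    total = 0
    pairs = 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs
```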
Integration of PPI and coexpression networks
We have further investigated the extent of coexpression between the genes encoding interacting proteins using the different predefined expression datasets available in CORNET (Table S1). Of the 89 540 (predicted, 65 828; experimentally identified, 24 365) PPIs for which coexpression can be assessed, 20 676 (23%) show coexpression in at least one expression dataset (using a PCC threshold based on the 99% quantile PCC of random pairs; see the Materials and Methods section, Table S2). There can be multiple reasons for the lack of coexpression in this PPI dataset: the proteins may interact only in very specific conditions, a number of false-positive interactions are possibly included in the network, and/or post-transcriptional mechanisms can regulate the activity of the interacting proteins.
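A threshold based on the correlation of random gene pairs could be derived as in the sketch below. The sampling scheme, the number of sampled pairs and the function names are assumptions made for illustration; the procedure actually used may differ.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient (assumes non-constant profiles)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def random_pair_threshold(profiles, n_pairs=1000, percent=99, seed=0):
    """PCC value at the given percent quantile of randomly sampled gene pairs."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    genes = sorted(profiles)
    values = []
    for _ in range(n_pairs):
        gene_a, gene_b = rng.sample(genes, 2)
        values.append(pearson(profiles[gene_a], profiles[gene_b]))
    values.sort()
    # Index of the smallest value with at least `percent` % of the sample at or below it.
    index = (percent * len(values) + 99) // 100 - 1
    return values[max(index, 0)]
```

Gene pairs whose PCC exceeds this value in a given expression dataset would then be counted as coexpressed in that dataset.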
A total of 1518 (of 20 676 coexpressed) PPIs among 350 proteins show coexpression in all studied conditions. These PPIs are involved in housekeeping functions (e.g. ribosome biogenesis, translation, protein metabolic process – GO enrichment tested using hypergeometric distribution, Bonferroni-corrected P < 0.05). Contrary to these PPIs, specific coexpression in particular conditions can be observed in many cases. A total of 7246 PPIs among 4969 proteins, which show coexpression in one out of 14 expression datasets, probably have a function in specific conditions rather than a global function. The data integration functionalities of CORNET 2.0, namely the combined use of the PPI tool and the coexpression tool, allow nonexpert users to study the coexpression of genes encoding interacting proteins on a small scale. In this way, the user can investigate the spatial and temporal activity of protein complexes and discern stable and transient interactions.
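The enrichment test mentioned above (hypergeometric distribution with Bonferroni correction) can be sketched as follows. This is a generic illustration, not the code used for the analysis: the function names and the dictionary-based annotation layout are hypothetical, and the number of tests is assumed to equal the number of GO terms examined.

```python
from math import comb


def hypergeom_upper_tail(k, n_drawn, n_annotated, n_total):
    """P(X >= k) when drawing n_drawn genes from n_total, n_annotated of which carry the term."""
    upper = min(n_drawn, n_annotated)
    numerator = sum(comb(n_annotated, i) * comb(n_total - n_annotated, n_drawn - i)
                    for i in range(k, upper + 1))
    return numerator / comb(n_total, n_drawn)


def enriched_terms(study_genes, background_genes, annotations, alpha=0.05):
    """Return {term: Bonferroni-corrected P-value} for terms enriched in the study set.

    annotations: dict mapping a term to the collection of genes annotated with it.
    """
    background = set(background_genes)
    study = set(study_genes) & background
    n_tests = len(annotations)
    enriched = {}
    for term, genes in annotations.items():
        annotated = set(genes) & background
        overlap = len(annotated & study)
        p = hypergeom_upper_tail(overlap, len(study), len(annotated), len(background))
        p_corrected = min(1.0, p * n_tests)   # Bonferroni correction
        if p_corrected < alpha:
            enriched[term] = p_corrected
    return enriched
```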
Gene–gene association data from AraNet
Besides databases that contain physical interactions between two or more proteins, a number of resources currently provide gene–gene associations with high coverage. These gene–gene associations, most often identified through computational approaches, represent possible functional links between genes or proteins that do not necessarily interact physically. One of these resources is the AraNet Arabidopsis probabilistic functional gene network that represents gene–gene associations inferred based on the integration of diverse functional genomics, proteomics and comparative genomics data sets (Lee et al., 2010). The datasets include mRNA coexpression patterns measured from DNA microarray datasets, known Arabidopsis PPIs, protein sequence features including sharing of protein domains, similarity of phylogenetic profiles or genomic context of bacterial or archaebacterial homologs, and diverse gene–gene associations transferred from yeast, fly, worm and human genes based on orthology. It is essential to this effort that the benchmarking methods are customized for Arabidopsis genes. Each gene–gene association is weighted by the likelihood of the linked genes to participate in the same biological process (Lee et al., 2010). AraNet contains 1 062 222 functional linkages among 19 647 genes (c. 73% of the total Arabidopsis genes) (Tables 1, 2), consisting of 24 distinct types of gene–gene association. The gene–gene associations can be retrieved by using the PPI tool which displays the gene–gene associations as undirected black edges in Cytoscape, as for PPIs. The type of association or evidence and the according scores, as determined by AraNet, are stored in the CORNET database and can be explored in Cytoscape through the attribute browser.
New TF tool
As an increasing number of studies that assay regulatory interactions on a large scale (such as ChIP-chip and ChIP-seq experiments) are performed, we have extended CORNET to enable the retrieval and visualization of these data. The new TF tool not only covers experimentally identified regulatory interactions, but also integrates interactions that are computationally inferred from microarray data on transgenic plants (see the Materials and Methods section).
The AtRegNet data from AGRIS contains regulatory interactions between TFs and their target genes (Yilmaz et al., 2011). This dataset consists of direct as well as indirect interactions, and confirmed as well as unconfirmed interactions (Table 2). A confirmed direct target has been defined as a gene that responds to a given TF according to the following criteria: the TF binds directly to the regulatory region of the target gene, as shown by electrophoretic mobility shift assay, yeast-one-hybrid analysis, and/or ChIP; or the TF directly regulates the target gene, based on use of transgenic plants expressing an inducible TF–GR (transcription factor-glucocorticoid receptor) fusion protein, and the effect of CHX (cycloheximide) on the DEX (dexamethasone) activated/repressed genes; and in vivo evidence of regulation showing expression of the target gene is affected by either loss-of-function mutations in the TF or ectopic expression of that TF in the plant (Yilmaz et al., 2011). In some cases, information on the regulatory activity of the TF (repression or activation) is included. All these metadata can be inspected through the Cytoscape attribute browser (see FAQ).
Microarray-based gene–target relations
As the number of regulatory interactions that are experimentally identified in Arabidopsis is limited, computational prediction of putative interactions is valuable. Although the reliability of the predicted regulatory interactions is relatively low, such interactions allow us to identify putative associations between genes that can be informative in the functional interpretation of the network and to investigate whether these associations are supported by other data types. A straightforward approach to predicting regulatory interactions is by analyzing microarray data for transgenic Arabidopsis plants and identifying a link between the transgene and the up- or down-regulated genes (see the Materials and Methods section). The microarray experiments are performed on plants in which overexpression, induced overexpression, knockout or knockdown of one or more genes is present. The transgene itself often encodes a transcription factor. However, other gene types are also included. It is assumed that among these differentially expressed genes, both direct as well as indirect target genes of the transgene can be present. A summary of the number of regulatory interactions of each type is given in Table 2. This regulatory network consists of 168 651 interactions between 224 regulators (68 AGRIS + 169 microarray regulators) and 19 912 targets (9422 AGRIS + 17 466 microarray targets).
Similar to the coexpression tool and the PPI tool, metadata describing the data source, reliability and interaction type (confirmed or unconfirmed, activation or repression, direct or indirect) of these regulatory interactions is given in the Cytoscape visualization or attributes (see FAQ). If multimers rather than single genes are assayed, the proteins taking part in the multimer are indicated in the Cytoscape edge attributes. Regulatory interactions are displayed as directed edges in the network, as opposed to PPIs, gene–gene associations and coexpression links, which are undirected.
Integration of regulatory networks and PPI or coexpression networks
The TF tool can be used as such or can be used in combination with the coexpression tool or the PPI tool to construct integrated networks of different data types. For instance, by combining the TF tool and the coexpression tool, the user can investigate if and when the targets of a particular TF are coexpressed; or by combining the TF tool and the PPI tool, the user will be able to show if the members of a protein complex are regulated by the same (combination of) TFs.
Overall, the integration of regulatory interactions and coexpression links shows that the targets of both experimentally identified as well as computationally predicted regulatory interactions are poorly coexpressed in each of the predefined expression datasets (Table S3). On average, 4% of the possible links between gene targets show coexpression (using a PCC threshold based on the 99% quantile PCC of random pairs; see the Materials and Methods section). A few examples of regulators that have highly coexpressed targets are ARF10, AT-HSFB2A and TCP20, showing coexpression for more than 50% of the target pairs in several expression datasets. By contrast, many targets show average or low coexpression (e.g. 4% of the possible links between ARF2 targets are coexpressed, compared with 38% for ARF10). This difference between ARF2 and ARF10 targets might be the result of the specific tissues that were used in the two microarray experiments (seedlings with and without auxin treatment were profiled in the ARF2 experiment, while germinating seeds were profiled in the ARF10 experiment).
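The percentage of coexpressed target pairs for a given regulator can be computed as in this sketch. It assumes that the coexpression links passing the threshold are available as a collection of undirected gene pairs; the function name is hypothetical.

```python
from itertools import combinations


def coexpressed_fraction(targets, coexpressed_pairs):
    """Fraction of all possible target pairs that are connected by a coexpression link."""
    links = {frozenset(pair) for pair in coexpressed_pairs}
    possible = list(combinations(sorted(set(targets)), 2))
    if not possible:
        return 0.0  # fewer than two targets: no pairs to evaluate
    hits = sum(1 for pair in possible if frozenset(pair) in links)
    return hits / len(possible)
```

For a regulator with four targets there are six possible target pairs, so two coexpression links among them correspond to a fraction of one third.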
Other improvements to CORNET for Arabidopsis
Functional annotation and Cytoscape LinkOut
In the previous version of CORNET, we included the functional annotation contained in the TAIR functional gene descriptions, InterPro protein domains, GO annotations and localization to enhance the biological interpretation of the networks. To further enhance the interpretation as well as downstream analyses, we now provide PO annotations, MapMan pathways and processes, phenotypic descriptions of transgenic lines (TAIR) and PubMed IDs (TAIR) as node attributes in Cytoscape (Rhee et al., 2003; Harris et al., 2004; Heazlewood et al., 2007; Avraham et al., 2008). To take advantage of existing resources in the plant field, we have extended the default LinkOut options of Cytoscape with plant-specific links. The LinkOut option can be found by clicking the right mouse button on any node of interest and going to ‘Plants_Arabidopsis’. Links to resources such as TAIR, PLAZA, AtCOECIS, Genevestigator, TextPresso, Plan2L, and many others are provided (Rhee et al., 2003; Muller et al., 2004; Zimmermann et al., 2004; Krallinger et al., 2009; Proost et al., 2009; Vandepoele et al., 2009; Van Bel et al., 2012).
Cytoscape 2.8
We have upgraded the Cytoscape Webstart from version 2.6 to 2.8. This version allows a considerable number of new functionalities and plugins such as diverse visualization options (e.g. edge styles), Custom Node Graphics, Attribute Equations (spreadsheet-like functionality) and the use of nested networks (http://www.cytoscape.org) (Smoot et al., 2011). The plugins Agilent Literature Search (with network representation of the text mining results), MCODE (connectivity-based clustering) and AllegroMCODE (faster clustering using a high performance GPU computing architecture), jActiveModules (visualizing network dynamics), Network Analysis (calculating network metrics such as node degree and shortest path length), Network Modifications (union, intersection, difference of networks) and CytoSQL (importing data from local mySQL databases) are included. Other plugins can be installed upon request.
CORNET for maize
Expression compendia and coexpression tool
As a considerable amount of transcriptomics data is available for the crop species maize, we have extended CORNET to allow the integration of this new species. An important difference between Arabidopsis and maize is the absence of one main microarray platform that is widely used by the maize community. Therefore, we have expanded the CORNET system to enable the use of different microarray platforms. Currently, we provide an Affymetrix and a Nimblegen expression dataset, but datasets from additional platforms can be integrated. We have compiled an expression dataset on 9846 genes based on Affymetrix microarray experiments retrieved from the GEO similar to the Arabidopsis expression datasets (De Bodt et al., 2010). Moreover, we have included the genome-wide atlas of transcription during maize development generated using the Nimblegen platform representing 22 600 genes (Sekhon et al., 2011).
By default, the input data are GRMZM codes of your genes of interest. Other maize identifiers, such as PLAZA v2.0 identifiers (e.g. ZM08G15930) or Affymetrix probe identifiers (e.g. ZM.17362.s1_at), are also accepted as input (Proost et al., 2009; Van Bel et al., 2012). Additionally, CORNET allows orthologous gene identifiers from Arabidopsis using the AGI code (e.g. AT2G33610) or orthologous gene locus identifiers from rice (TIGR identifiers without the LOC_ prefix, e.g. Os02g10060). Note that, if an orthologous gene (from Arabidopsis and/or rice) is given and if it is part of a group of orthologous genes with a many-to-many relationship, all maize genes from this orthologous group will be taken into account. Arabidopsis and rice orthologs of the maize genes in the generated networks can be inspected through the Cytoscape attribute browser (see the Materials and Methods section).
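To make the accepted identifier types concrete, the sketch below classifies an input string by its format. The regular expressions are assumptions inferred solely from the example identifiers given above and are not the validation rules used by CORNET; the function name and type labels are hypothetical.

```python
import re

# Patterns inferred from the example identifiers in the text (assumptions).
IDENTIFIER_PATTERNS = [
    ("maize_gene_model", re.compile(r"GRMZM\w+", re.IGNORECASE)),
    ("plaza_maize", re.compile(r"ZM\d{2}G\d{5}", re.IGNORECASE)),
    ("affymetrix_probe", re.compile(r"ZM\.\S+_at", re.IGNORECASE)),
    ("arabidopsis_agi", re.compile(r"AT[1-5CM]G\d{5}", re.IGNORECASE)),
    ("rice_locus", re.compile(r"Os\d{2}g\d{5}", re.IGNORECASE)),
]


def classify_identifier(identifier):
    """Return the first identifier type whose pattern matches the whole string, else None."""
    cleaned = identifier.strip()
    for label, pattern in IDENTIFIER_PATTERNS:
        if pattern.fullmatch(cleaned):
            return label
    return None
```

Identifiers classified as Arabidopsis or rice loci would subsequently be expanded to all maize genes of the corresponding orthologous group.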
All Arabidopsis PPIs were used as a source to computationally predict putative maize PPIs through interolog detection (Yu et al., 2004) (see the Materials and Methods section). Both experimentally identified PPIs as well as computationally predicted interactions are used as sources. However, we would like to point out that maize interactions inferred from predicted Arabidopsis interactions can be untrustworthy and care should be taken in considering these data. In the prediction of interactions based on orthology between Arabidopsis and maize, we assume that protein functions are conserved between the two species. Owing to the relatively large evolutionary distance between Arabidopsis and maize, we acknowledge that this conservation might not always hold true. For instance, when no one-to-one orthology relationships can be determined, but rather many-to-many relationships between Arabidopsis and maize genes exist, species-specific duplications might result in independent functional divergence after duplication in both species, giving rise to a difference in interaction partners. Coexpression between the genes encoding interacting proteins can be used as support for the predicted interactions (De Bodt et al., 2009). Similar to Arabidopsis, interactions predicted using experimental source PPIs are represented by solid edges and interactions predicted using predicted source PPIs are represented by dashed edges.
Visualization of CORNET networks
For network visualization, the existing software Cytoscape was favored (Shannon et al., 2003) (Fig. 1) because its functionalities allow browsing and zooming into the constructed networks, a visual as well as textual representation of diverse attributes (e.g. correlation coefficient, localization databases) and further exploration and analysis of the networks. The network visualization legend can be viewed in Fig. 2, on the tool output page, the FAQ page and in the VizMapper tool in Cytoscape. The textual representation of the attributes can be displayed in the lower data panel of Cytoscape by clicking on the ‘select attributes’ button (see FAQ) and copied and/or exported in tabular format.
Case study 1: new PPIs and gene–gene associations
A protein–protein and gene–gene association network was built around KNOTTED-LIKE FROM ARABIDOPSIS THALIANA (KNAT1), also known as BREVIPEDICELLUS1 (BP1), a member of the class I knotted1-like homeobox gene family with a role in leaf and ovule development (see Fig. 3). This network demonstrates the value of updating and expanding the PPI databases as well as the advantage of adding the computationally predicted gene–gene associations. The blue edges in Fig. 3 represent PPIs identified through yeast two-hybrid experiments by the Arabidopsis Interactome Mapping Consortium that had not been identified previously (Arabidopsis Interactome Mapping Consortium, 2011). Although false-positive interactions might be present, as these interactions were identified by only one experimental technique, true novel interactions are most probably included. Moreover, the network is expanded by a considerable number of gene–gene associations (green edges in Fig. 3) that have been identified through AraNet (Lee et al., 2010). In this way, for instance, a number of homeobox transcription factors are added to the network through AraNet links with KNAT1 as well as with the experimentally identified BLH and KNAT interactors. However, no common functional role for KNAT1 and these homeobox genes is currently reported. In summary, this new version of CORNET allows the extension of a well-investigated PPI network of KNAT and BLH proteins (black edges in Fig. 3) (for a global comparison between databases, see Table 2 and Fig. S1).
Case study 2: regulatory interactions
We have integrated the regulatory network with the coexpression network around MYB5 (Fig. 4). In a first step, all putative target genes as well as all putative regulators of MYB5 were retrieved using the TF tool. This search resulted in a regulatory network containing 31 genes and 55 regulatory links identified through microarray experiments. Therefore, each target is assumed to be an indirect target, even though some direct targets could reside in the list of differentially expressed genes identified by comparing the wild-type with the transgenic microarray data. Consequently, all arrow points are shown as diamonds (Fig. 4). Moreover, regulatory links are shown as dotted arrows, as these links are computationally inferred. In a second step, we identified the coexpression between the genes in the MYB5 regulatory network by considering all predefined expression datasets available in the CORNET coexpression tool. A group of targets that show high coexpression (lower part of Fig. 4) can be distinguished from a group of genes that show little coexpression (upper part of Fig. 4). Strikingly, the majority of the latter genes, including MYB5, are also up-regulated upon knockout of SWINGER (SWN) and CURLY LEAF (CLF).
Knockout of MYB5 results in reduced seed coat mucilage and irregularly shaped seed coat epidermal cells (Gonzalez et al., 2009; Li et al., 2009). When both MYB5 and the homolog MYB23 are mutated, the number and size of trichomes are reduced and trichomes have fewer branches (Li et al., 2009). SWN and CLF take part in a large protein complex consisting of VERNALIZATION 2 (VRN2), VERNALIZATION INSENSITIVE 3 (VIN3) and FERTILIZATION INDEPENDENT ENDOSPERM (FIE). The complex has a role in establishing FLOWERING LOCUS C (FLC) repression during vernalization. In addition, SWN has a role in controlling seed initiation. Therefore, the integrated network could suggest a common role for MYB5 and SWN in seed development.
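The second step of this case study, the calculation of coexpression between the genes of a regulatory network, can be illustrated with a minimal Python sketch. The sketch assumes that coexpression is measured with the Pearson correlation coefficient over expression profiles and that links are retained when the absolute correlation reaches a threshold. The function names, the threshold of 0.8, the use of the absolute value and the toy expression values are assumptions for illustration and do not reproduce the settings of the CORNET coexpression tool.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.8):
    """Return gene pairs whose absolute correlation reaches the threshold.

    expression: dict mapping a gene name to its list of expression values,
                measured over the same experiments for every gene.
    """
    edges = {}
    for gene_1, gene_2 in combinations(sorted(expression), 2):
        r = pearson(expression[gene_1], expression[gene_2])
        if abs(r) >= threshold:
            edges[(gene_1, gene_2)] = r
    return edges


if __name__ == "__main__":
    # Toy profiles, not real microarray data.
    toy = {"gene_a": [1, 2, 3, 4], "gene_b": [2, 4, 6, 8], "gene_c": [4, 1, 3, 2]}
    print(coexpression_edges(toy))
```

Applied to the genes of a regulatory network, such a calculation separates gene pairs with high coexpression from pairs with little coexpression, which is the distinction drawn in Fig. 4.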
Case study 3: Arabidopsis and maize coexpression network
The development of CORNET for maize allows us to perform coexpression analysis in this crop species. In addition, it allows the user to construct corresponding coexpression networks in both Arabidopsis and maize. Fig. 5 displays the comparison of such coexpression networks. We have constructed a coexpression network around the LEAFY gene. This gene is a single-copy gene in many species and has a role in the transition from the vegetative to the flowering phase. Through orthology (OrthoMCL orthologs retrieved from PLAZA), we can identify two co-orthologs of the LEAFY gene in maize (Proost et al., 2009; Van Bel et al., 2012). These genes were described as ZFL1 (GRMZM2G098813) and ZFL2 (GRMZM2G180190) and have pleiotropic functions in reproductive development such as flower identity and patterning, similar to the Arabidopsis ortholog LEAFY (Bomblies et al., 2003). ZFL1 and ZFL2 are thought to be largely redundant in function, as only the double mutant plants show severe morphological defects (Bomblies et al., 2003). However, a quantitative trait locus study suggested that both genes are possibly evolving subtle differences in function through subfunctionalization (Force et al., 1999; Bomblies & Doebley, 2006). Similarly, the coexpression network of these two maize genes indicates possibly different roles for these genes in maize development, as two coexpression clusters can be discerned. However, more detailed functional analyses are necessary to explain the biological importance of the observed structure of the coexpression network. For instance, the coexpression of LEAFY with CUP-SHAPED COTYLEDON3 (CUC3, AT1G76420) is conserved in maize (coexpression of the CUC3 ortholog (GRMZM2G430522) with ZFL1), while coexpression of LEAFY with CUC1 (AT3G15170) and CUC2 (AT5G53950) is not. As far as we know, the association between LEAFY and CUC genes has not been described before.
This case study exemplifies the utility of the construction of Arabidopsis and maize coexpression networks (and PPI networks) in gene functional and evolutionary analysis as well as in translational research.
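The comparison made in this case study, checking whether a coexpression link in Arabidopsis is also found between the orthologs in maize, can be sketched as follows. This is an illustration under stated assumptions, not the procedure implemented in CORNET: the function name and data layout are hypothetical, the edge lists are written by hand to mirror the LEAFY example in the text rather than taken from CORNET output, and ZM_PLACEHOLDER is a hypothetical stand-in for a CUC1 ortholog.

```python
def conserved_coexpression(arabidopsis_edges, maize_edges, orthologs):
    """For each Arabidopsis coexpression link, list the supporting maize links.

    arabidopsis_edges: iterable of (gene_a, gene_b) pairs.
    maize_edges:       iterable of (gene_a, gene_b) pairs.
    orthologs:         dict mapping an Arabidopsis gene to a set of maize genes.
    An empty list means that the link is not conserved in the maize network.
    """
    # Unordered pairs, so that (a, b) and (b, a) are treated as the same link.
    maize_pairs = {frozenset(edge) for edge in maize_edges}
    result = {}
    for gene_a, gene_b in arabidopsis_edges:
        supporting = sorted(
            tuple(sorted((maize_a, maize_b)))
            for maize_a in orthologs.get(gene_a, ())
            for maize_b in orthologs.get(gene_b, ())
            if maize_a != maize_b and frozenset((maize_a, maize_b)) in maize_pairs
        )
        result[(gene_a, gene_b)] = supporting
    return result


if __name__ == "__main__":
    # Hand-written input mirroring the example in the text.
    ath_edges = [("LEAFY", "CUC3"), ("LEAFY", "CUC1")]
    zma_edges = [("GRMZM2G098813", "GRMZM2G430522")]  # ZFL1 with the CUC3 ortholog
    ortho = {
        "LEAFY": {"GRMZM2G098813", "GRMZM2G180190"},  # ZFL1 and ZFL2
        "CUC3": {"GRMZM2G430522"},
        "CUC1": {"ZM_PLACEHOLDER"},  # hypothetical identifier
    }
    for link, support in conserved_coexpression(ath_edges, zma_edges, ortho).items():
        print(link, support)
```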
We have developed CORNET 2.0, a user-friendly tool for the construction and integration of coexpression, PPI, gene–gene association and regulatory interaction networks. The majority of interaction databases are covered, thereby providing the user with regularly updated data that can be used in the versatile searches in the three tools (coexpression tool, PPI tool and TF tool). Moreover, functional annotation data from numerous databases such as InterPro, GO, PO, MapMan, TAIR phenotype data, and PubMed IDs are compiled in Cytoscape to easily sift through the information available on the genes in the network. In addition, Cytoscape LinkOut allows one to return to the original resources for further details. The comprehensive interface and the intuitive visualization provide nonexpert users with the means to build hypotheses on the role of one or more genes of interest, grasp the biological relevance of a group of genes and pinpoint putatively novel genes or associations involved in a biological process of interest.
As new types of data become available and cross-species analysis becomes more important in translational research, we foresee a number of possible extensions of the CORNET platform (Mochida & Shinozaki, 2010; Moreno-Risueno et al., 2010; Mochida & Shinozaki, 2011; Sucaet & Deva, 2011). First, cis-regulatory elements, which can be tightly linked with the coexpression results and the regulatory interactions, can be incorporated. Secondly, through comparative genomics approaches, coexpression networks constructed for Arabidopsis and maize, and possibly other plant species, can be compared to detect conserved coexpression. Thirdly, the CORNET system can be further extended to enable the use of other types of transcriptomics data such as tiling array or next-generation RNA sequencing (RNA-seq) data when sufficient and diverse experiments have been performed to construct expression datasets for coexpression analysis. Finally, the CORNET platform can be further connected to other plant resources and software tools. For instance, numerous methodologies for the computational inference of gene regulatory networks exist. These predicted networks can also be integrated in CORNET. We foresee that molecular biologists will make use of CORNET for hypothesis generation to guide their experiments. Results from such experiments can be used to validate the performance of the approaches taken in CORNET and will allow for optimization of the methodologies (e.g. measures of coexpression, approaches for prediction of regulatory interactions) and parameter choice (e.g. correlation coefficient thresholds) used in CORNET. As such, CORNET can be continually improved and expanded.
We would like to thank Marijn Vandevoorde, Thomas Van Parys, Michiel Van Bel, Lieven Sterck, Klaas Vandepoele and Ken Heyndrickx for assistance and helpful suggestions. We are grateful to the MIND consortium, the Arabidopsis Interactome Mapping Consortium and Alan Jones for providing their interactome data. This work was supported by grants from Ghent University (‘Bijzonder Onderzoeksfonds Methusalem project’ no. BOF08/01M00408), the Interuniversity Attraction Poles Programme (IUAP VI/25 (BioMaGNet) and VI/33), initiated by the Belgian State, Science Policy Office, the European Union 6th Framework Programme (‘AGRON-OMICS’, LSHG-CT-2006-037704), and the Research Foundation-Flanders (postdoctoral fellowship to S.D.B.).