Arabidopsis microarray data were retrieved from the Gene Expression Omnibus (GEO) database for CORNET 1.0. To compile the Affymetrix Maize Genome Array expression dataset (GPL4032), we downloaded 24 experiment series (GSE21070, GSE8188, GSE15048, GSE8278, GSE8275, GSE10023, GSE8194, GSE7030, GSE16567, GSE19501, GSE8308, GSE10237, GSE10236, GSE8320, GSE8179, GSE11531, GSE8174, GSE22479, GSE8176, GSE18491, GSE15371, GSE12892, GSE12770, and GSE10243) from the GEO database. These series contain 128 unique experiments covering cis-transcriptional variation in different inbred lines, expression profiling of mutants, nonadditive and imprinted gene expression, and expression profiling of different tissues and treatments. The Affymetrix Maize Genome Array (GPL4032) contains 17 555 probe sets for six different maize strains (including B73, Ohio43, W22, W23, and Black Mexican Sweet). Approximately 13 000 probe sets for maize B73 were considered for the construction of a custom Chip Description File (CDF). In this way, we found that 9846 genes can be measured reliably by the microarray. For the Nimblegen Maize Whole-Genome Microarray 385K expression dataset (GPL12620), data from 60 different experimental conditions were retrieved from the GEO database (Sekhon et al., 2011). The Nimblegen Maize Whole-Genome Microarray 385K (GPL12620) contains 80 301 probe sets representing 22 600 genes. This comprehensive atlas contains global transcription profiles across developmental stages and plant organs of maize: 60 distinct tissues of inbred line B73 representing 11 major organ systems, namely germinating seed, root, whole seedling, shoot apical meristem (SAM) and young stem, internodes, cob, tassel and anthers, silk, leaves, husk, and seed.
The microarray data were processed with the Robust Multi-array Average (RMA) procedure implemented in BioConductor (Irizarry et al., 2003a,b; Gautier et al., 2004; Gentleman et al., 2004). In the previous version of CORNET, the CDF tinesath1cdf, based on the TAIR7 genome annotation, was used (Casneuf et al., 2007). In the current version, an up-to-date CDF based on TAIR10 and provided by Brainarray (TAIR10 genes – v14; brainarray.mbni.med.umich.edu) is used to define probe set–gene relations. For the Affymetrix microarray dataset of maize, we used a pre-processing strategy similar to that for Arabidopsis, except that a custom CDF was constructed. In a first step, all probes were mapped against the reference genome (B73, version 4.53) using the software Exonerate (Slater & Birney, 2005), allowing at most one mismatch and no gaps. In a second step, all probes that mapped to more than one gene were removed. In a final step, only probe sets with more than nine probes were retained.
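The three filtering steps used to build the custom maize CDF can be illustrated with a minimal Python sketch. It is not the pipeline used for CORNET: the function name `build_custom_cdf`, the input layout (one tuple per probe-to-gene alignment, as might be parsed from Exonerate output), and the assumption that retained probes are regrouped into one probe set per gene are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# "More than nine probes" per probe set, i.e. at least ten.
MIN_PROBES = 10


def build_custom_cdf(hits, min_probes=MIN_PROBES, max_mismatches=1):
    """Return {gene_id: sorted list of probe_ids} for retained probe sets.

    `hits` is an iterable of (probe_id, gene_id, mismatches, gaps) tuples.
    This layout is an assumption, not an Exonerate output format.
    """
    # Step 1: keep alignments with at most one mismatch and no gaps.
    genes_per_probe = defaultdict(set)
    for probe_id, gene_id, mismatches, gaps in hits:
        if mismatches <= max_mismatches and gaps == 0:
            genes_per_probe[probe_id].add(gene_id)

    # Step 2: discard probes that map to more than one gene.
    probes_per_gene = defaultdict(set)
    for probe_id, genes in genes_per_probe.items():
        if len(genes) == 1:
            (gene_id,) = genes
            probes_per_gene[gene_id].add(probe_id)

    # Step 3: retain only probe sets with enough remaining probes.
    return {
        gene_id: sorted(probes)
        for gene_id, probes in probes_per_gene.items()
        if len(probes) >= min_probes
    }
```

Applying the probe-count threshold after the uniqueness filter means that a gene is kept only if enough unambiguous probes remain, which is one plausible reading of the order in which the steps are described above.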