• Crohn's disease;
  • ulcerative colitis;
  • autoimmune disease;
  • microarray;
  • microRNA




Inflammatory bowel disease (IBD) is a complex disorder involving pathogen infection, host immune response, and altered enterocyte physiology. The incidence of IBD is increasing at an alarming rate in developed countries, warranting a detailed molecular portrait of the disease.


We used large-scale data, bioinformatics tools, and high-throughput computations to obtain gene and microRNA signatures for Crohn's disease (CD) and ulcerative colitis (UC). These signatures were then integrated with a systematic literature review to draw a comprehensive portrait of IBD in relation to autoimmune diseases.


The top upregulated genes in IBD are associated with diabetogenesis (REG1A, REG1B), bacterial signals (TLRs, NLRs), innate immunity (DEFA6, IDO1, EXOSC1), inflammation (CXCLs), and matrix degradation (MMPs). The downregulated genes encode tight junction proteins (CLDN8), solute transporters (SLCs), and adhesion proteins. Genes more highly expressed in UC than in CD included the antiinflammatory ANXA1, the transporter ABCA12, the T-cell activator HSH2D, and the immunoglobulin IGHV4-34. Compromised metabolism of drugs, nitrogen, androgen and estrogen, and lipids in IBD correlated with an increase in specific microRNAs. Highly expressed IBD genes included targets of drugs used in gastrointestinal cancers, viral infections, and autoimmune disorders such as rheumatoid arthritis and asthma.


This study presents a clinically relevant gene-level portrait of IBD subtypes and their connectivity to autoimmune diseases. The study identified candidates for repositioning of existing drugs to manage IBD. Integration of mouse and human data points to an altered B-cell response as a cause of the upregulation in IBD of genes involved in other aspects of immune defense, such as interferon-inducible responses. (Inflamm Bowel Dis 2012;)

Inflammatory bowel disease (IBD) is a debilitating illness associated with altered regulation of the gastrointestinal mucosal immune system, leading to intestinal inflammation in the presence of native luminal flora. Nearly 1.5 million people in the United States are affected by IBD, and the incidence and prevalence of the disease continue to increase internationally.1, 2 IBD comprises two distinct phenotypes, ulcerative colitis (UC) and Crohn's disease (CD), each of which has unique clinical manifestations despite sharing many genetic and consequently mechanistic features.3–5 Although the exact pathophysiology of IBD is not yet fully understood, the etiology of the disease is known to be multifactorial, driven by a number of genetic and environmental factors including loss of regulation of the host's innate immune response and defects in mucosal barrier function.6

There appears to be a correlation between IBD and autoimmune disorders. Over a quarter of IBD patients have been known to experience musculoskeletal complications resulting from various subtypes of arthritis.7, 8 Moreover, microRNA (miRNA) expression changes have high similarity in autoimmune disorders, including IBD.9 The complex host–pathogen interactions within the colon are also known to play a vital role in the pathogenesis of IBD and drive differential gene expression. Recent studies of the intestinal microbiome have shown a shift in the composition of bacterial populations in patients with UC and CD and the involvement of bacterial–host interactions in the pathogenesis of IBD.10–17 However, it still remains unclear how resident bacteria interact with the host to activate and sustain chronic activation of the intestinal immune system through genetic regulation.

A systems approach has already emerged in the literature for studying IBD within the context of autoimmune disorders. A comprehensive first draft of the human disease network (HDN) was constructed recently using the disease gene information in the literature.18 This and other system models19–24 identified disease–disease associations not readily observed in the disease–phenotype-based Medical Subject Headings (MeSH) disease classification tree. Some of these models utilized text search-based data, whereas others based the draft network on gene lists identified via the literature and genome-wide association studies (GWAS).25, 26 A number of other disease network drafts utilized protein–protein interaction networks in addition to gene lists to identify gene–disease modules.27, 28 In all cases, however, the starting point is a set of gene lists specific for each disease. Hu and Agarwal29 obtained such lists through high-throughput microarray analysis of thousands of genomic expression profiles available in the literature. Their draft of the human disease–drug network raised new possibilities for further investigation, such as the use of malaria drugs in treating CD, and also showed how known fold changes in transcript copy numbers can guide the choice of drug type (agonist or antagonist) for targeting druggable proteins and pathways.

The present study benefits from the previous work on the HDN and studies of global gene and miRNA expression profiling of IBD. It presents gene signature analysis of IBD subtypes UC and CD in terms of altered pathways, biological processes, disease correlations, transcripts, and possible drug targets, all determined via bioinformatics analysis of integrated microarray and miRNA datasets with open-access datasets. Gene lists derived from microarray data have the advantage of having features such as statistical significance and fold changes to be used for additional annotation. However, previous disease and drug network modeling studies suggest potentially high levels of noise in predictions, the extent of which depends on the accuracy of gene lists used to represent the specific diseases.29 In that regard, our previous meta- and large-scale microarray analyses of cancer tissue datasets from hundreds of different laboratories point to accurate replication of nonmicroarray literature with microarray data.30 Our gene lists derived from integrated datasets intersected with the shorter gene lists produced by Noble et al31, 32 with highly significant P values in a hypergeometric test. Moreover, cellular pathways statistically enriched by Noble et al gene lists turned out to be a subset of our predicted pathways. Our predictions of IBD-associated genes captured nonmicroarray IBD research literature with significant P values. Taken together, these results illustrate the predictive potential of microarray datasets in identifying genes associated with CD and UC subtypes of IBD. The resulting gene lists come with statistical significance, rank, and fold changes, parameters that are invaluable in biomarker discovery and drug repositioning and have been included in the Supporting files.



Microarray Datasets, Normalization, and Probe to Gene Mapping

The microarray datasets used in this study are all in the public domain and were generated from RNA isolated from the inflamed colonic mucosa of UC and CD patients and hybridized onto the Affymetrix Human Genome U133 Plus 2.0 GeneChip Array platform for analysis. A total of 176 colonic mucosal microarray samples were obtained from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO)33, 34 in raw CEL format. Following the removal of duplicate samples, sample outliers were removed using principal component analysis (PCA).35 Two of the principal components accounted for 91% of the variance within the dataset. A total of 15 outliers were detected and removed from further analysis. The final sample population comprises 29 healthy control (HC) samples, 48 UC samples, and 41 CD samples from six independent studies (GSE9452,36 GSE9686,37 GSE10191,38 GSE10616,39 GSE13367,40 and GSE1687941). Gene expression datasets were normalized using robust multiarray averaging (RMA), consisting of background adjustment, quantile normalization, and median polish summarization with log2 transformation, within MATLAB42–44 and were mapped to Entrez gene IDs using a custom CDF file during the normalization process.45, 46 Following preprocessing, batch effect removal was performed using the COMBAT source code in R.47 Principal component analysis of the preprocessed samples was then performed in MATLAB in order to visualize sample clustering and variance (Fig. 1). Two principal components accounted for 52% of the variation within this dataset.
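The quantile normalization step of the preprocessing described above can be illustrated with a minimal Python sketch. This is not the MATLAB RMA routine used in the study: the function name `quantile_normalize`, the toy intensities, and the position-based handling of ties are assumptions made for illustration, and the background adjustment and median polish steps are omitted.

```python
from statistics import mean


def quantile_normalize(samples):
    """Map every sample onto one shared intensity distribution.

    `samples` is a list of equally long lists of log2 intensities, one list
    per array. Ties are broken by probe position (a simplification).
    """
    n_probes = len(samples[0])
    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_samples = [sorted(s) for s in samples]
    reference = [mean(s[k] for s in sorted_samples) for k in range(n_probes)]

    normalized = []
    for s in samples:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: two arrays, three probes each (illustrative values only).
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# → [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After this step every array has the same set of values, so differences between arrays reflect probe ranks rather than array-wide intensity shifts.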


Figure 1. Scatterplot of principal components for microarray data. The graph was obtained for UC, CD, and HC samples following removal of duplicates and outliers, Robust Multichip Average normalization, and subsequent removal of batch effects, as described in Materials and Methods. An ellipse was constructed around the centroid of each data cluster, with the semimajor and semiminor axes equal to two standard deviations of the data in x and y, respectively.


Significant Genes

Two distinct microarray data analysis methods, significance analysis of microarrays (SAM)48 and the rank product (RP) method49, 50 representing meta-analysis, were applied in MATLAB and R (2.10.0), respectively, in order to identify statistically significant upregulated and downregulated genes in UC/HC and CD/HC comparisons. In SAM, statistically significant genes were determined using a fold change (FC) cutoff of 1.5 and a P-value cutoff of 0.001 over 1000 permutations.48 For the RP method, only those genes with an enrichment P-value less than 10^-6 were identified as significant.
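The core statistic of the rank product method can be sketched in a few lines of Python. This is an illustration of the idea only, not the R implementation used in the study: the function name `rank_products`, the gene labels, and the fold-change values are hypothetical, and the permutation step that turns rank products into enrichment P-values is omitted.

```python
import math


def rank_products(fold_changes):
    """Geometric mean of each gene's rank across comparisons.

    `fold_changes` maps a gene to a list of log fold changes, one per
    comparison. Rank 1 is the most upregulated gene in a comparison, so a
    small rank product marks a gene that is consistently near the top.
    """
    genes = list(fold_changes)
    k = len(next(iter(fold_changes.values())))
    log_rank_sum = {g: 0.0 for g in genes}
    for j in range(k):
        ordered = sorted(genes, key=lambda g: fold_changes[g][j], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            log_rank_sum[g] += math.log(rank)  # sum logs to avoid overflow
    return {g: math.exp(log_rank_sum[g] / k) for g in genes}


# Hypothetical toy input: three genes, two comparisons.
toy = {"geneA": [3.0, 2.5], "geneB": [1.0, 2.9], "geneC": [0.1, -0.5]}
for gene, rp in sorted(rank_products(toy).items(), key=lambda kv: kv[1]):
    print(gene, round(rp, 3))
```

Ranking downregulated genes works the same way with the sort order reversed.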

Comparison with Gene Lists Obtained Using Other Microarray Platforms

The significant gene lists determined in this study were compared with corresponding lists obtained by Noble et al31, 32 comparing UC and CD biopsies to controls on the Agilent platform. In addition to the Agilent annotation file, NCBI, DAVID,51, 52 Clone/Gene ID Converter,53 and Source54 databases were used to map the Agilent probe IDs to their corresponding Entrez gene IDs. The Agilent platform mapped to 18,989 genes, of which 17,211 were also present on the Affymetrix array. The CD and UC gene lists identified from our analysis were then compared to the two gene lists obtained from Noble et al. A hypergeometric test was used to determine the significance of the intersections in terms of P-value.
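The hypergeometric test for the overlap of two gene lists can be written out as follows. This is a minimal sketch using only the Python standard library; the function name `hypergeom_overlap_pvalue` is an assumption, and the list sizes and overlap in the usage line are hypothetical placeholders, not results from the study. Only the universe size (17,211 genes shared by both platforms) comes from the text.

```python
from math import comb


def hypergeom_overlap_pvalue(universe, size_a, size_b, overlap):
    """Upper-tail probability P(X >= overlap).

    X is the number of genes shared with list A when `size_b` genes are
    drawn at random, without replacement, from `universe` genes of which
    `size_a` belong to list A.
    """
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, min(size_a, size_b) + 1)
    )
    return tail / total  # exact integer arithmetic until the final division


# Hypothetical example: two lists of 200 genes sharing 40 genes.
print(hypergeom_overlap_pvalue(17211, 200, 200, 40))
```

Exact integer binomials keep very small tail probabilities accurate, at the cost of speed for very long lists.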

Literature Citation Algorithm

Differentially expressed gene lists were used in an automated PubMed text search algorithm.55 Briefly, a query of the gene symbol and the applicable search term was conducted in PubMed abstracts for all genes available on the Affymetrix HG-U133 Plus 2.0 platform, limiting results to nonmicroarray literature. The six gene lists were then annotated with these results, identifying those genes that were cited in relation to the particular search terms. Random gene lists from the same platform, equal in size to the lists under consideration, were obtained and used as a control. The number of related genes in each of the random iterations was determined, and the mean and standard deviation were calculated from these values to obtain the parameters of a normal distribution. The expected value and the standard deviation were then used to compute the P-values for the significant association of each of our gene lists with the known nonmicroarray literature.
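The random-list control described above can be sketched as follows. This is an illustrative reading of the procedure, not the published algorithm: the function name `literature_pvalue`, the number of random iterations (`n_iter`), the fixed seed, and the one-sided normal tail are assumptions, and the PubMed query itself is replaced by a precomputed set of cited genes.

```python
import math
import random
import statistics


def literature_pvalue(gene_list, cited_genes, platform_genes, n_iter=1000, seed=0):
    """P-value for the number of literature-cited genes in `gene_list`.

    Random lists of the same size are drawn from `platform_genes`; their
    cited-gene counts give the mean and standard deviation of a normal
    distribution against which the observed count is compared.
    """
    rng = random.Random(seed)
    observed = sum(g in cited_genes for g in gene_list)
    counts = []
    for _ in range(n_iter):
        sample = rng.sample(platform_genes, len(gene_list))
        counts.append(sum(g in cited_genes for g in sample))
    mu = statistics.mean(counts)
    sigma = statistics.stdev(counts)
    if sigma == 0:
        raise ValueError("random lists show no variation; cannot fit a normal")
    z = (observed - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the normal


# Hypothetical toy platform: 1000 genes, the first 100 cited in the literature.
platform = [f"g{i}" for i in range(1000)]
cited = set(platform[:100])
print(literature_pvalue(platform[:50], cited, platform))
```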

Functional Enrichment

The physiological pathways and biological processes implicated in IBD were determined through the functional enrichment of significant gene lists using DAVID.51, 52 The Kyoto Encyclopedia of Genes and Genomes (KEGG) and BioCarta pathway enrichment of the differentially expressed gene lists, as well as of the gene lists identified by Noble et al, was determined using a P-value cutoff of 0.01. KEGG color was used to generate pathway images for a few selected enriched pathways.56 Gene ontology (GO) biological process enrichment was performed using GO level 3 and a P-value cutoff of 5 × 10^-5.

Identification of Known Drug Targets

In order to identify significant genes that encode known therapeutic targets, differentially expressed genes for both UC and CD (Union, Table 2) were first converted to UniProt accession numbers using DAVID51, 52 and then queried against the ChEMBL database.57 A histogram of the targeted differentially expressed genes is presented in Figure 5 along with their respective fold change in UC and CD and the number of therapeutics targeting each significant gene product.

Gene – miRNA Interactome

Previous research has identified significant differentially expressed colonic miRNAs for both active Crohn's colitis and active UC relative to healthy controls.58–62 Overall, 26 and 33 differentially expressed miRNAs were extracted from the relevant literature for UC and CD, respectively. The corresponding validated miRNA targets were found using miRWalk.63 Targeted messenger (m)RNAs were then intersected with our list of differentially expressed genes for IBD to find interactions in which miRNAs were upregulated and targeted mRNAs were downregulated, and vice versa.
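The pairing rule above (keep a miRNA–mRNA interaction only when the two change in opposite directions) can be expressed as a short Python sketch. The function name `anticorrelated_pairs` and all identifiers in the toy input are hypothetical; in practice the target lists would come from a resource such as miRWalk and the directions from the expression analysis.

```python
def anticorrelated_pairs(mirna_direction, mirna_targets, gene_direction):
    """Return (miRNA, gene) pairs regulated in opposite directions.

    `mirna_direction` and `gene_direction` map names to "up" or "down";
    `mirna_targets` maps each miRNA to its validated target genes. Targets
    that are not differentially expressed are ignored.
    """
    pairs = []
    for mirna, m_dir in mirna_direction.items():
        for gene in mirna_targets.get(mirna, ()):
            g_dir = gene_direction.get(gene)
            if g_dir is not None and g_dir != m_dir:
                pairs.append((mirna, gene))
    return sorted(pairs)


# Hypothetical toy input.
print(anticorrelated_pairs(
    {"miR-x": "up", "miR-y": "down"},
    {"miR-x": ["G1", "G2"], "miR-y": ["G2", "G3"]},
    {"G1": "down", "G2": "up", "G3": "up"},
))
# → [('miR-x', 'G1'), ('miR-y', 'G2'), ('miR-y', 'G3')]
```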

Disease Network Related to IBD

Statistically significant genes identified within this study were queried against the Genetic Association Database (GAD) to find associations of identified genes with other MeSH disease terms as annotated in published, peer-reviewed genetic association studies such as GWAS.26 The disease list was then further limited to include only those MeSH disease terms with the parent term “Autoimmune Diseases” in order to see linkages between differentially expressed genes within IBD and autoimmune disorders. The size of each node in the resulting network diagram is proportional to its degree, that is, the number of edges connected to it.
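Node degree in such a gene–disease network is simply a count of unique associations per node. A minimal sketch, assuming the associations are available as (gene, disease) pairs; the function name `node_degrees` and the toy labels are hypothetical, and no GAD query is performed here.

```python
from collections import Counter


def node_degrees(associations):
    """Count the edges attached to each gene node and each disease node.

    `associations` is an iterable of (gene, disease) pairs; repeated pairs
    are counted once, so the degree reflects unique associations.
    """
    degree = Counter()
    for gene, disease in set(associations):
        degree[("gene", gene)] += 1
        degree[("disease", disease)] += 1
    return degree


# Hypothetical toy associations (one pair is repeated on purpose).
edges = [("G1", "D1"), ("G1", "D2"), ("G2", "D1"), ("G1", "D1")]
for node, deg in node_degrees(edges).most_common():
    print(node, deg)
```

Keying nodes by type keeps a gene symbol and a disease term with the same spelling from being merged into one node.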



In this study we integrated microarray data obtained from multiple laboratories (Table 1a) along with other publicly available data and various knowledge bases to better define the transcriptional regulation of UC and CD and discover genetic commonalities with other diseases. Each unique patient sample was hybridized onto the Affymetrix Human Genome U133 Plus 2.0 GeneChip Array platform, which contains 54,675 oligonucleotide probes, covering over 47,000 human transcripts mapped onto 17,778 NCBI Entrez gene IDs.45 These samples were merged into one dataset and normalized together prior to subsequent analysis to reduce laboratory-specific noise and increase statistical power. Significant genes were identified using both SAM and RP (Table 1b) methods. The significant gene lists were then intersected, resulting in 1229 upregulated and 828 downregulated UC/HC genes, 1050 upregulated and 539 downregulated CD/HC genes (Table 1b). From these four lists, significant genes common to both UC and CD (common genes) and exclusive to either UC or CD (exclusive genes) were identified for each phenotype by finding the intersection and unique elements of each list. The resulting six gene lists were then used for subsequent analysis. The top 200 up- and downregulated gene lists for UC/HC and CD/HC comparisons are provided as Supporting File 1. Table 2 shows the top 40 genes present within each of the six comparisons, as ranked by fold change. Since mucosal tissue is a composite, multiple cell types (enterocytes and inflammatory cells) contribute to the altered regulation of genes in these lists and provide a system level perspective of transcriptional regulation.
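The derivation of the common and exclusive gene lists from the UC and CD lists reduces to set operations, sketched below. The function names `consensus` and `split_signatures` are assumptions; the integer ranges stand in for gene identifiers and are chosen only so that the list sizes reproduce the upregulated counts reported in Table 1b.

```python
def consensus(sam_genes, rp_genes):
    """Genes called significant by both SAM and the rank product method."""
    return set(sam_genes) & set(rp_genes)


def split_signatures(uc_genes, cd_genes):
    """Split two phenotype lists into common and exclusive gene sets."""
    uc, cd = set(uc_genes), set(cd_genes)
    return {
        "common": uc & cd,
        "uc_exclusive": uc - cd,
        "cd_exclusive": cd - uc,
    }


# Placeholder identifiers sized to match the upregulated lists:
# 1229 UC genes and 1050 CD genes, overlapping in 965.
parts = split_signatures(range(1229), range(264, 1314))
print({name: len(genes) for name, genes in parts.items()})
# → {'common': 965, 'uc_exclusive': 264, 'cd_exclusive': 85}
```

The same arithmetic holds for the downregulated lists: 828 and 539 genes with 479 in common leave 349 and 60 exclusive genes.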

Table 1. Overview of the Microarray Datasets Used in the Study

(a)
GEO Accession No.    PMID     HC     UC     CD    Total
Subtotal                      51     63     62      176
Duplicates                    19     13     11       43
Outliers                       3      2     10       15
Total                         29     48     41      118

(b) Microarray Data Analysis
                              UC/HC    CD/HC
SAM upregulated                1262     1063
SAM downregulated               828      539
RP upregulated                 1315     1237
RP downregulated               1044      867
SAM ∩ RP upregulated           1229     1050
SAM ∩ RP downregulated          828      539
Exclusive up                    264       85
Exclusive down                  349       60
Common up                           965
Common down                         479

a) Corresponding NCBI GEO accession numbers and the PubMed citation ID (PMID) as well as the numbers of HC, UC, and CD samples within each dataset. Duplicate samples and outliers were removed from the final sample set used for subsequent analysis. b) Summary of results of the microarray data analysis as the numbers of significant genes obtained for UC/HC, CD/HC, and UC/CD comparisons using merged SAM and Rank Product methods.

Table 2. IBD Signatures Obtained from Microarray Analysis

(a) Significant Genes Common to UC and CD
Gene    UC P-value    CD P-value    UC FC    CD FC

(b) Ulcerative Colitis and Crohn's Disease: Exclusive Upregulated and Exclusive Downregulated
Gene    P-value    FC

a) The top 40 significant genes common to the UC/HC and CD/HC comparisons, ranked by average fold change (FC). b) The top 40 genes uniquely expressed in either UC or CD.


Significant Genes in UC and CD

Table 2A shows nearly 40-fold upregulation of the pancreatic cancer-associated64 regenerating islet-derived growth factor genes REG1A and REG1B, the most highly upregulated genes in both phenotypes of IBD. REG1A is a biomarker for celiac disease65 and has also been linked to beta cell generation and damage in Type I diabetes.66 Moreover, REG1A and REG1B are known to induce CXCL6 expression in cancer.67 Consistent with this observation, CXCL6 is also highly upregulated in our list. Approximately 20% of the top 40 commonly upregulated genes in IBD, ranked by fold change, are CXC chemokines involved in inflammation, including CXCL13, interleukin (IL)-8, CXCL6, CXCL11, CXCL9, and CXCL5. The top chemokine in this list, CXCL13, promotes the migration of B lymphocytes to the tissue and is associated in the literature with systemic lupus,68 prostate cancer,69 and rheumatoid arthritis.70 The second most highly expressed chemokine in IBD is IL-8 (CXCL8), one of the major mediators of the inflammatory response. IL-8 functions as a chemoattractant and is also a potent angiogenic factor. The third most highly upregulated gene in our list, S100A8, codes for the soluble IBD biomarker protein calprotectin.71–74 Altered expression of this protein is also linked to cystic fibrosis.75 Other genes involved in inflammatory processes, such as CHI3L1, ANXA1, TNIP3, and BCL2A, are also highly upregulated in both phenotypes of IBD. The matrix metalloproteinase (MMP) family proteins are abundant among the top 40 most highly upregulated genes in IBD. MMP proteins take part in the breakdown of extracellular matrix in biological processes and in disease events such as arthritis and metastasis.74, 76, 77 Genes involved in host defense (the antimicrobial enzyme IDO1, the exosome component PI3, and the defensin DEFA6) were found to be highly expressed in both IBD phenotypes. Thus, the top upregulated genes in IBD relate to processes such as diabetogenesis, innate immunity, chemokine-induced inflammation and adaptive immunity, and breakdown of the extracellular matrix.

The gene in IBD with the most diminished expression is CLDN8, encoding a claudin tight junction protein. Its expression is downregulated in colorectal and renal carcinoma.78 Diminished expression of CLDN8 in IBD suggests a breakdown in the physical barrier preventing solutes and water from passing freely between epithelial cell sheets. Several other downregulated genes in the list are also associated with the control of water and ion flux in intestinal epithelia: the solute transporter genes SLC16A9, SLC16A1, SLC30A10, and SLC3A1; and the organic solute transporter OSTα. These genes are all markers of enterocyte differentiation and indicate that IBD is associated with alterations in differentiated cell function.

Differentially expressed genes unique to each phenotype were also identified for both UC and CD (Table 2b). The most upregulated gene exclusive to UC codes for the antimicrobial defensin protein DEFA5. Defensins are abundant in the epithelia of mucosal surfaces of the intestine, respiratory tract, urinary tract, and vagina.79 Other genes that are highly expressed in UC but not in CD include proteases and protease inhibitors (SERPINB3, SERPINB4, SERPINB7, ASRGL1, KLK10, CTSE, and ADAM9); transporter genes (SLC6A2, SLCO1B3, ABCA12); the antiinflammatory ANXA1; and the zinc-binding glycoprotein encoding gene AZGP1P1. Several immunoglobulin heavy and light chain variable region encoding genes, such as IGHV1-46, IGHV1-69, IGHV2-5, IGLV3-25, IGHV3-73, and IGHV4-34, are also highly expressed in UC but not in CD. Genes with high expression in CD but not in UC include the oxidoreductase genes DUOX2 and DUOXA2, the protease inhibitor TIMP1, and the abundant neutrophil granulocyte protein encoding gene TCN1. Whether such genes can be used as biomarkers differentiating UC from CD will require further investigation using standard assays in stool samples and rectal swabs.

The genes with diminished expression in CD but with typical expression in UC include the water channel protein AQP8, also in the bile secretion pathway. This protein facilitates the diffusion of GPX2, a radical scavenger. The downregulation of Aqp8 in IBD mouse models was related to defending against severe oxidative stress.80 Other genes downregulated exclusively in UC code for the cytosolic enzyme PCK1 involved in the regulation of gluconeogenesis; the basement membrane protein coding gene LAMA1; and the two solute transporter genes SLC26A2 and SLC39A5. Although downregulated genes are not optimal as biomarkers, nevertheless, when statistically enriched on signaling pathways, these genes can point to the molecular mechanisms of deregulation in IBD. The gene NR1H4 (FXR) is diminished in expression in UC (but not in CD). This gene codes a bile-dependent transcription factor in the bile secretion pathway. Its downregulation may be responsible for the presence of excessive lipids in the stools of individuals with UC. Other genes poorly expressed in UC include those coding TGFB receptor ACVR1C; tyrosine phosphatase PTPRO, a tumor suppressor; and IL1R2, a cytokine receptor that belongs to the interleukin 1 receptor family.

Validation of Microarray Predictions Using Independent Data

We evaluated the statistical significance of our gene lists in comparison to other published gene lists derived from microarray studies not used in our analysis since these datasets were generated on a different platform: Agilent-012391 Whole Human Genome Oligo Microarray G4112A. Comparison of our gene lists to previously published significant gene lists provided by Noble et al31, 32 showed a high degree of overlap with significant P-values for each comparison based on a hypergeometric test (Table 3a). The results shown in Table 3a point to a high degree of match between upregulated gene lists obtained using different platforms. The intersection for downregulated genes is smaller, although still statistically significant.

Table 3. Comparison of UC and CD Signatures to Independent Data

(a)
Top 50      1.30E-11    3.00E-03    6.95E-14    1.89E-04
Top 100     < E-50      8.32E-03    < E-50      9.80E-04
Top 200     4.44E-16    1.74E-02    < E-50      3.00E-03
Top 500     2.00E-14    3.70E-04    < E-50      3.00E-02
All         3.50E-14    1.00E-04    < E-50      3.00E-02

(b)
Search Terms                          List Name    P-value     List Articles / Search Articles / Article Overlap
Cancer NOT microarray                 UC_DOWN      2.62E-03    1939294678
Cancer NOT microarray                 UC_UP        2.62E-09    68801946279
Cancer NOT microarray                 CD_UP        2.61E-10    65790946298
Cancer NOT microarray                 CD_DOWN      3.11E-04    1544394669
Crohn's disease NOT microarray        UC_UP        2.62E-09    68801946279
Crohn's disease NOT microarray        CD_UP        2.61E-10    65790946298
Crohn's disease NOT microarray        UC_DOWN      2.62E-03    1939294678
Crohn's disease NOT microarray        CD_DOWN      3.11E-04    1544394669
Ulcerative colitis NOT microarray     CD_UP        2.61E-10    65790946298
Ulcerative colitis NOT microarray     UC_DOWN      2.62E-03    1939294678
Ulcerative colitis NOT microarray     UC_UP        2.62E-09    68801946279
Ulcerative colitis NOT microarray     CD_DOWN      3.11E-04    1544394669

a) Statistics for the intersection of the results from our study with Noble et al.31, 32 b) Literature search results using the top 200 genes from each of the six significant gene lists (up UC, down UC, up CD, down CD, common up, and common down).

We also determined the enrichment of our top 200 significantly altered gene lists (Supporting File 1) within the relevant nonmicroarray literature. Table 3b shows that our predicted genes are highly statistically enriched within the cancer literature, establishing a possible link between IBD and colorectal cancer. The match between our IBD upregulated gene lists and the nonmicroarray IBD literature is also good, with significant P-values. The match between the IBD research literature and our downregulated gene lists leads to higher but still significant P-values, a trend also observed when directly comparing our significant gene lists with those obtained by Noble et al.

Activated Pathways and Processes in IBD

Commonly upregulated pathways involved in the pathogenesis of IBD (CD and UC) include cytokine–cytokine receptor interaction, chemokine signaling, intestinal immune network for IgA production, adhesion and diapedesis of lymphocytes and granulocytes, cell adhesion molecules, and complement and coagulation cascades (Table 4). Other upregulated pathways included the NOD-like receptor signaling pathway and the Toll-like receptor pathway. The NOD-like receptor pathway shown in Figure 2a relates bacterial effects on the cell surface (bacterial peptidoglycans, pore-forming toxins, and other bacterial secretion systems) to expression changes of chemokines and cytokines, which subsequently activate innate and adaptive immunity. The nodes upregulated in both UC and CD are shown in yellow; those upregulated only in CD or only in UC are shown in red and brown, respectively. The Toll-like pathway shown in Figure 2b relates the recognition of conserved microbial components by Toll-like receptors to subsequent cellular events, which ultimately lead to inflammation and activation of sentinel immune cells. The genes CASP1, CASP5, and CARD6 are highly upregulated in both UC and CD along the NOD-like receptor signaling pathway. The genes TLR1 and TLR2 are highly expressed in lipopolysaccharide (LPS) interactions in both phenotypes in the Toll-like pathway. The biological processes associated with these pathways are also significantly enriched within GO biological processes, including immune defense response, cellular response to wounding, leukocyte activation, cell proliferation, and response to molecules of bacterial origin, among others (Fig. 3).


Figure 2. Examples of cellular pathways statistically enriched with significantly upregulated genes in IBD. (a) Nod-like receptor signaling pathway. (b) Toll-like receptor signaling pathway. (c) The PPAR signaling pathway. The nodes shown in orange, red, and yellow indicate significantly upregulated genes in UC/HC, CD/HC, and (UC U CD)/HC comparisons, respectively.


Figure 3. Gene ontology biological processes (level 3) statistically enriched by significantly upregulated and downregulated genes for UC and CD.


Table 4. KEGG and BioCarta Cellular Pathways Statistically Enriched by Differentially Expressed Genes in UC/HC and CD/HC Comparisons

(a)
Pathway                                                      UC UP      CD UP      Noble UC UP    Noble CD UP
Natural killer cell mediated cytotoxicity                    -          9.0E-03    -              3.5E-03
Lck and Fyn tyrosine kinases in TCR Activation               8.2E-03    -          -              -
Costimulatory signal during T-cell activation                6.9E-03    -          -              -
Prion diseases                                               4.5E-03    8.4E-03    -              -
Antigen Processing and Presentation                          2.1E-03    6.1E-03    -              1.8E-08
Antigen processing and presentation                          2.1E-03    6.1E-03    -              1.8E-08
NOD-like receptor signaling pathway                          4.7E-03    6.0E-04    -              -
IL-10 antiinflammatory signaling pathway                     -          1.8E-03    -              -
Toll-like receptor signaling pathway                         1.7E-03    4.9E-04    -              -
T cytotoxic cell surface molecules                           2.4E-04    1.3E-03    -              -
T helper cell surface molecules                              2.4E-04    1.3E-03    -              -
Autoimmune thyroid disease                                   5.1E-04    8.2E-04    -              6.6E-08
Focal adhesion                                               6.4E-04    -          -              -
B cell receptor signaling pathway                            2.1E-04    6.2E-04    -              -
Primary immunodeficiency                                     5.3E-05    1.1E-04    -              -
Systemic lupus erythematosus                                 1.5E-04    2.5E-06    -              4.8E-03
Viral myocarditis                                            1.2E-04    8.3E-06    7.3E-03        6.3E-08
Type I diabetes mellitus                                     2.7E-05    1.4E-06    1.6E-03        4.9E-10
Allograft rejection                                          2.6E-05    1.3E-06    1.0E-03        1.1E-10
Local acute inflammatory response                            2.1E-05    7.3E-07    -              -
ECM-receptor interaction                                     3.3E-07    1.4E-05    -              -
B lymphocyte cell surface molecules                          7.8E-06    3.9E-06    -              -
Chemokine signaling pathway                                  1.8E-06    8.7E-06    3.7E-03        2.0E-04
Adhesion molecules on lymphocyte                             3.9E-06    2.1E-06    -              -
Neutrophil and its surface molecules                         3.9E-06    2.1E-06    -              -
Adhesion and diapedesis of granulocytes                      4.1E-06    6.2E-09    3.1E-03        8.4E-04
Hematopoietic cell lineage                                   3.3E-06    2.7E-08    -              -
Leukocyte transendothelial migration                         4.3E-07    2.3E-06    -              -
Graft-versus-host disease                                    9.4E-07    3.2E-08    1.3E-03        2.4E-10
Complement and coagulation cascades                          1.8E-07    5.0E-09    -              -
Intestinal immune network for IgA production                 1.1E-07    2.3E-08    -              -
Monocyte and its surface molecules                           7.4E-08    3.3E-08    -              -
Adhesion and diapedesis of lymphocytes                       7.8E-09    8.7E-08    -              -
Cytokine-cytokine receptor interaction                       6.2E-10    1.1E-10    5.2E-04        5.6E-04
Cell adhesion molecules (CAMs)                               6.5E-12    3.0E-11    -              1.5E-06

(b)
Pathway                                                      UC DOWN    CD DOWN
Arginine and proline metabolism                              7.0E-03    -
Tryptophan metabolism                                        5.4E-03    -
Nuclear receptors in lipid metabolism and toxicity           5.1E-03    -
Drug metabolism                                              7.0E-03    8.2E-04
Mitochondrial carnitine palmitoyltransferase (CPT) system    3.7E-04    6.0E-03
Starch and sucrose metabolism                                -          2.2E-03
Selenoamino acid metabolism                                  1.7E-03    -
Metabolism of xenobiotics by cytochrome P450                 -          6.6E-04
Valine, leucine and isoleucine degradation                   3.3E-07    4.8E-04
Nitrogen metabolism                                          1.9E-04    7.0E-05
PPAR signaling pathway                                       4.9E-05    1.1E-05
Propanoate metabolism                                        4.1E-06    -
Butanoate metabolism                                         4.1E-06    -
Fatty acid metabolism                                        4.3E-09    3.8E-07

Pathways enriched by upregulated and downregulated genes are shown in a) and b), respectively, along with associated P-values.

The drug metabolism pathway is downregulated in both UC and CD (Table 4). The drug metabolism genes ADH1C, ADH6, CYP2B6, FMO5, GSTA1, GSTM4, MAOA, UGT1A7, and UGT2A3 are all downregulated in IBD compared to healthy controls. A closely related pathway, the metabolism of xenobiotics by cytochrome P450, is downregulated in CD. The detoxification enzymes with diminished expression in this pathway include the drug metabolism genes ADH1C, ADH6, UGT1A7, and UGT2A3; the genes CYP2B6 and CYP2S1, which code for enzymes that metabolize anticancer drugs; the gene EPHX1, which codes for epoxide hydrolase; and the genes GSTA1 and GSTM4, which code for enzymes involved in the detoxification of electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins, and products of oxidative stress.

Table 4 shows that peroxisome proliferator-activated receptor (PPAR) signaling as well as nitrogen metabolism are diminished in IBD. Genes downregulated in the nitrogen metabolism pathway code for zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide (CA1, CA2, CA4, CA7), the cytoplasmic enzyme CTH, and the K-type mitochondrial glutaminase GLS. The PPAR signaling pathway shows downregulation of the transcription factor PPARγ within adipocytes, leading to the downregulation of lipid metabolism through a variety of interactions (Fig. 2c). The attenuated metabolism of lipids is also represented within the enriched GO biological processes (Fig. 3).

Autoimmune Diseases Related to IBD

A number of KEGG pathways related to other autoimmune diseases are identified in Table 4 as being statistically enriched with IBD genes, including type 1 diabetes mellitus, systemic lupus erythematosus, autoimmune thyroid disease, and primary immunodeficiency. For this reason we used GAD to investigate the involvement of the identified significant genes in IBD and other autoimmune diseases. This database provides annotation of genes associated with specific diseases. We found 280 differentially expressed IBD genes in our gene list that are known to be associated with 35 different autoimmune and IBD-related MeSH disease terms through 918 unique associations (Fig. 4). The three largest disease nodes in Figure 4 (nodes with the highest connectivity to IBD genes) are type 1 diabetes mellitus, rheumatoid arthritis, and systemic lupus erythematosus.

Figure 4. Interactome map of autoimmune diseases. The map was obtained by projecting IBD significant gene list onto gene lists of autoimmune diseases (obtained from the Genetic Association Database). Diseases are colored in green and genes are colored in pink. The size of each node corresponds to the number of interactions with other nodes, with larger nodes having more interactions.

IBD genes found to be associated with autoimmune diseases include the upregulated leukocyte antigen genes HLA-DMB, HLA-DOB, HLA-DPB1, HLA-DQB1, HLA-DRB4, and HLA-F. In addition to these genes, lupus and IBD share central pattern recognition proteins such as C1QA, C1QB, C1QC, C1R, and C1S. Recent research has associated hereditary C1Q deficiency with systemic lupus erythematosus and increased susceptibility to bacterial infections.81 Other genes shared with lupus include ADA, which codes for an enzyme whose deficiency causes immunodeficiency disease, and FCGR2B and FCGR3B, which code for proteins regulating antibody production by B cells.

Identification of Known Drug Targets with Altered Gene Expression in IBD

The significant gene lists were projected onto genes with UniProt accessions in ChEMBL, the database of bioactive drug-like small molecules.57 Genes with a minimum fold change of 2.0 are presented in Figure 5, while the entire list of therapeutic targets is available in Supporting File 2. The figure shows gene targets along with their respective fold changes in UC and CD relative to HC. The majority of the genes targeted by small molecules encode cell surface receptors, oxidoreductases, G-protein-coupled receptors, and kinases. Among the drug targets with the largest fold changes in expression are MMP3, MMP1, IL-8, PLAU, PTGS2, and CXCR4. Drugs targeting MMP3 include nonspecific MMP inhibitors with poor performance in clinical trials.82 Doxycycline, a tetracycline antibiotic, is known to inhibit MMP activity and is used clinically for the treatment of periodontal disease. IL-8 is an antibody therapeutic target in inflammatory diseases83 but with mixed results stemming from side effects. The small-molecule drugs with activity on IL-8 include diclofenac, ibuprofen, indomethacin, and tolmetin. The drug target PTGS2 is known to play a role in promoting colon cancer.84 Nonsteroidal antiinflammatory drugs targeting PTGS2 appear to inhibit the proliferation of cultured hepatocellular cancer cells.85 CXCR4 is a receptor that plays a role in immune system signaling between cells and is involved in the growth of tumors. It is a drug target in metastatic lung cancer.86 These observations indicate the challenges of repositioning existing drugs that target IBD genes with highly elevated expression.
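The selection of drug targets by fold change can be sketched as follows. This is an illustrative sketch only: it assumes signed fold changes (negative values for downregulated genes) and treats the 2.0 cutoff as applying to the absolute fold change in either UC or CD, which the text does not spell out. The function name `select_targets`, the placeholder gene names, and the toy values are hypothetical.

```python
def select_targets(fold_changes, drug_targets, min_fc=2.0):
    """Return drug-target genes whose fold change meets the cutoff.

    fold_changes maps gene symbol -> (fold change in UC, fold change in CD),
    with negative values denoting downregulation (an assumption).
    A gene is kept when it is a known drug target and its absolute
    fold change reaches min_fc in at least one of the two comparisons.
    """
    selected = {}
    for gene, (fc_uc, fc_cd) in fold_changes.items():
        if gene not in drug_targets:
            continue
        if max(abs(fc_uc), abs(fc_cd)) >= min_fc:
            selected[gene] = (fc_uc, fc_cd)
    return selected


# Toy, made-up fold changes for placeholder genes (hypothetical data).
toy_fold_changes = {
    "GENE_A": (5.0, 3.5),    # strongly up in both, kept
    "GENE_B": (-2.4, -1.2),  # down in UC beyond the cutoff, kept
    "GENE_C": (1.5, 1.8),    # below the cutoff in both, dropped
    "GENE_D": (6.0, 4.0),    # not a drug target, dropped
}
toy_drug_targets = {"GENE_A", "GENE_B", "GENE_C"}

selected = select_targets(toy_fold_changes, toy_drug_targets)
```

Keeping both fold changes per gene makes it straightforward to plot UC and CD side by side, as a figure of this kind would.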

Figure 5. IBD genes targeted by bioactive small molecules. The number of therapeutics targeting each gene is presented in parentheses next to the gene symbol on the y-axis. The fold change of each differentially expressed gene is also shown for both UC and CD.

Drugs with activity on CXCR proteins include ibuprofen, naproxen, indoprofen, and ketoprofen. Drug targets with negative fold changes include the carbonic anhydrases CA1 and CA2 and PPARγ. Existing drugs such as acetazolamide (Diamox) and methazolamide (Neptazane) inhibit carbonic anhydrase and as such appear unsuitable for the treatment of IBD, in which these enzymes are already downregulated. PPARγ is another drug target with lower expression in IBD. This gene encodes a member of the PPAR subfamily of nuclear receptors regulating transcription of genes implicated in the pathology of numerous diseases including obesity, diabetes, atherosclerosis, and cancer. Small-molecule drugs with activity against PPARγ, including fenofibrate, pioglitazone, and others, have found use in treating Type II diabetes. The drug-targeted proteins VCAM1, ICAM1, ITGAL, TNF, and IL-8 map onto the acute inflammatory pathway, whereas the leukocyte migration pathway contains the drug targets VCAM1, ICAM1, ITK, ITGAL, CXCR4, MMP9, and MMP2. Combination therapies designed for IBD with existing drugs will be limited by the poor drug metabolism associated with this disease.

Gene – miRNA Interactome

A total of 26 and 33 miRNA are known to be differentially expressed within active UC and CD, respectively, as compared to healthy controls.58–62 Of these miRNA, 20 were shown to interact with 90 significant genes for UC (Fig. 6a) and 19 were shown to interact with 44 significant genes for CD (Fig. 6b). Gene interactions with miRNA in which both were either upregulated or downregulated were not considered for our analysis, since these interactions defy the understood paradigm of miRNA–mRNA interactions.87 We identified nine miRNA that are differentially expressed in both UC and CD: miR-126, miR-126*, miR-127-3p, miR-155, miR-21, miR-29b, miR-31, miR-324-3p, and miR-375. Each miRNA is upregulated in both phenotypes except miR-375, which is reported to be downregulated in UC59 and upregulated in CD.61 The network contains 17 and 8 genes that are targets for bioactive small molecules for UC and CD, respectively. The figure indicates a widespread downregulation of significant genes due to the upregulation of several miRNA.
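The filtering rule described above, keeping only miRNA-gene pairs whose expression changes run in opposite directions, can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `anticorrelated_pairs`, the "up"/"down" encoding, and the placeholder miRNA and gene names are hypothetical and do not reproduce the study's data.

```python
def anticorrelated_pairs(pairs, mirna_direction, gene_direction):
    """Keep miRNA-gene pairs with opposite directions of change.

    pairs is an iterable of (miRNA, gene) interactions from the literature.
    mirna_direction and gene_direction map names to "up" or "down".
    Pairs in which either partner is not differentially expressed, or in
    which both change in the same direction, are discarded.
    """
    kept = []
    for mirna, gene in pairs:
        m_dir = mirna_direction.get(mirna)
        g_dir = gene_direction.get(gene)
        if m_dir is None or g_dir is None:
            continue  # one partner is not in the significant lists
        if m_dir != g_dir:
            kept.append((mirna, gene))
    return kept


# Toy, made-up interactions with placeholder names (hypothetical data).
toy_pairs = [
    ("miR-X", "GENE_A"),  # miRNA up, gene down: kept
    ("miR-X", "GENE_B"),  # miRNA up, gene up: discarded
    ("miR-Y", "GENE_B"),  # miRNA down, gene up: kept
    ("miR-Z", "GENE_A"),  # miRNA not differentially expressed: discarded
]
toy_mirna_direction = {"miR-X": "up", "miR-Y": "down"}
toy_gene_direction = {"GENE_A": "down", "GENE_B": "up"}

kept = anticorrelated_pairs(toy_pairs, toy_mirna_direction, toy_gene_direction)
```

Running the same function once per phenotype, with the UC and CD direction tables kept separate, accommodates a miRNA such as miR-375 that is reported to change in opposite directions in the two conditions.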

Figure 6. Micro-RNA – Transcript Interactome Map. The figure illustrates the possible regulation patterns of a subset of IBD significant genes by miRNA, significantly increased or decreased in UC (a) and CD (b). The colors pink and blue indicate upregulated and downregulated nodes, respectively. The links shown in the figure indicate gene/miRNA pairings reported in the literature (not necessarily for IBD). The miRNA shown in the figure correspond to those significantly upregulated or downregulated in IBD.


IBD is a complex disorder involving commensal bacteria, multiple human cell types, and a variety of cellular networks including those related to adaptive and innate immunity. Our integrated microarray analysis of inflamed human colonic mucosa captures the widespread genetic perturbations of IBD phenotypes as associated with a variety of cell types, pathways, and biological processes, a system-level perspective of transcriptional regulation attributable to the heterogeneous nature of colonic biopsy samples. By combining raw microarray data from six independent studies, we were able to increase the statistical power of our analysis while minimizing laboratory-specific bias and experimental noise. A total of 118 unique patient samples were used including 48 UC, 41 CD, and 29 HCs. Differentially expressed miRNA obtained through literature curation were also incorporated into our analysis in order to investigate possible mechanisms of posttranscriptional regulation through validated interactions with differentially expressed genes related to IBD.63 Bioactive therapeutic small molecules targeting differentially expressed genes as well as gene involvement in other autoimmune diseases were also investigated to facilitate the identification of druggable pathways and therapeutic targets that may be candidates for a wide variety of related illnesses or existing therapeutics that may be repositioned for the treatment of IBD.

Our analysis of microarray data showed as much as a 40-fold increase in transcript number for some of the genes in IBD relative to HCs. The most upregulated gene in IBD is REG1A, which codes for a protein typically secreted by the exocrine pancreas and associated with islet cell regeneration. A survey of the literature shows that upregulated REG1A is associated with pancreatic cancer, celiac disease, and Type I diabetes.64–66 The top upregulated genes for IBD in our list were rich in CXC chemokines involved in inflammation, which were previously linked to such diseases as lupus, prostate cancer, and rheumatoid arthritis.68–70 The MMP family proteins are also abundant among the top 40 most highly upregulated genes in IBD. MMP proteins were previously shown to be involved in arthritis and metastasis.88 The third most highly upregulated gene in our list, S100A8, codes for the soluble IBD biomarker protein calprotectin.71–74 High levels of expression of this protein are also linked to cystic fibrosis.75 Commonly upregulated pathways in IBD included the previously recognized Toll-like and NOD-like receptor signaling pathways, cytokine-cytokine interaction, and chemokine signaling. Cellular processes activated by IBD include immune defense response, cellular response to wounding, leukocyte activation, and regulation of immune response and cell proliferation, as well as response to molecules of bacterial origin. Some of the pathways activated in IBD appear to be druggable, such as the calcium signaling pathway, endocytosis, cytokine–cytokine receptor interaction, and leukocyte transendothelial migration pathways. Each of these pathways contains at least eight drug targets that are also significant genes in our HC to IBD comparison. In our gene signatures, the genes UGT1A, HSD11B2, HSD17B2, HSD3B2, and LCMT1 are all significantly downregulated and are known to play a role in androgen and estrogen metabolism. The specific role the sex hormones play in the pathology of IBD is yet to be explored extensively in the literature.89, 90

KEGG disease pathways enriched by significantly upregulated IBD genes included those with an autoimmune component, namely, autoimmune thyroid disease, asthma, primary immunodeficiency, systemic lupus, viral myocarditis, Type I diabetes mellitus, allograft rejection, and graft-vs.-host disease. IBD was previously associated with a higher prevalence of rheumatoid arthritis, lupus, and hypothyroidism, and with increased prevalence of asthma, eczema, allergic rhinitis, and diabetes.91 Moreover, the literature also contains links between already recognized IBD-associated diseases (cancer and malaria) and the diseases statistically enriched by our IBD genes cited above. A polymorphism in FCGR2B, an IBD significant gene, is associated with protection against malaria but susceptibility to systemic lupus erythematosus.92 Histone deacetylase inhibitors are currently being harnessed as a potential treatment for malaria, systemic lupus erythematosus, a wide range of neurodegenerative conditions, and asthma.93 In addition, the beneficial effects of malaria drugs have been recognized in the management of systemic lupus erythematosus and rheumatoid arthritis.94 Taken together with the other examples listed in the Results section, it appears that many of the top upregulated genes in IBD are also upregulated in autoimmune disorders. For this reason, we investigated the upregulated genes shared by IBD and other autoimmune diseases as possible drug targets by intersecting our gene list with the genes associated with specific diseases in the GAD.26 The genes known as drug targets and upregulated in multiple autoimmune diseases provide clues to the multidimensional nature of IBD. For example, the ATP-binding transporter gene ABCC1, which is significantly upregulated in a number of diseases including IBD and rheumatoid arthritis and is known to bind the HIV Tat protein, is targeted by drugs and supplements such as abacavir, tenofovir, nevirapine, and quercetin, all with antiviral effects, as well as by the anticancer supplement apigenin and the transplant rejection medicine cyclosporine. Other significantly upregulated genes shared by rheumatoid arthritis and IBD include TLR8 (a target of the immune response modifier imiquimod); MMP3 (a target of the failed drug marimastat); ICAM-1 (targeted indirectly by cholesterol-lowering lovastatin); IL-8 (targeted by tolmetin and ibuprofen); and IL2RA (targeted by ascomycin and budesonide, used for treating CD and asthma). The intersection of IBD significant genes with known drug targets provides valuable information for possible repositioning of existing drugs in the treatment and management of IBD.

The data on significantly altered miRNA expression in IBD, when integrated with the literature and our significant gene lists, indicate a strong role for miRNA in posttranscriptional regulation of gene expression related to IBD. Increased expression of miR-16 has been shown in the literature to downregulate CNTN3, C1QTNF3, ACVR2A, and CYCS, while let-7f-1 has been shown to downregulate CNTN3, ANPEP, NAAA, PDCD, UGDH, HSDL2, and ACVR2A. These two miRNAs are highly expressed in UC and the genes shown are highly downregulated. Similarly, increased expression of miR-23b is linked to downregulation of PRAP1 in both CD and UC, and increased miR-21 is associated with downregulated CNTN3, PDK4, and PDCD genes in CD. The aforementioned proteins play roles in a multitude of events ranging from mitochondrial electron transport (CYCS), antiautoimmune processes (PDCD), regulation of glucose metabolism (PDK4), and small intestinal microvillar events (ANPEP) to neurite outgrowth-promoting activity (CNTN3), signaling through serine kinases (ACVR2A), and induction of glycosylation (UGDH). These results suggest the restorative potential of miRNA-based therapies on IBD-specific diminished biological processes.

The results of our study can be put in context with mouse studies, emphasizing the effect of B lymphocytes on immunity and metabolism in the gut. A recent study in mice95 showed that the lack of B-cell immunity was associated with upregulation of genes involved in other aspects of immune defense, inflammatory, and interferon-inducible responses. The gene list provided in supplemental table 1 of Shulzhenko et al.95 contains 14 significant genes from our CD/HC and UC/HC comparisons. Among these genes, DUOX2 and DUOXA2, involved in synthesis of hydrogen peroxide, are highly upregulated in CD. The genes significantly upregulated in both UC and CD with fold changes greater than 2.5 include CFI, which codes for complement factor I. This protein is involved in the destruction of foreign invaders (such as bacteria and viruses), triggers inflammation, and removes debris from cells and tissues. Also upregulated in both the B-cell-deficient mouse and human IBD are the interferon-induced proteins IFIT3, which potentiates antiviral signaling, and IFITM1, which has antiproliferative effects, as well as the chemokine CXCL9, thought to be involved in T-cell trafficking. The genes downregulated in B-cell-deficient mice and in CD and UC consist of CYP27A1, involved in the synthesis of cholesterol, steroids, and other lipids, and EDN3, which is essential for generation of enteric neurons and was previously linked to Hirschsprung disease and Waardenburg syndrome. NR1H4 is a gene significantly downregulated both in the B-cell-deficient mouse and in UC. It encodes a ligand-activated transcription factor involved in bile acid synthesis and transport. These observations suggest a role for ineffective B-cell-mediated immunity in IBD, possibly causing activation of innate immunity and a reduction in the ability to process lipids.

In conclusion, this study presents a portrayal of the molecular rewiring of the human gut mucosa altered by IBD. Our statistical analysis reveals cellular processes altered in IBD, with a focus on biomarker selection, and possible repositioning of currently available drugs for the treatment of IBD. Our results show gene signature commonalities between IBD and asthma, lupus, and rheumatoid arthritis, among others. The results raise the question of compromised B-cell immunity in IBD patients along with overcompensation of other components of the immune system, possibly facilitating research on new treatment protocols for this chronic disorder.


Author contributions: P.C. and A.T. designed the approach for microarray analysis of IBD. P.C. carried out the project with input from N.D. and W.D. R.G.P. and S.W.B. provided medical and biological insights into IBD and its link to autoimmune disorders and cancer. P.C. and A.T. wrote the article. All authors read and approved the final version. P.C. was supported by an unrestricted Calhoun endowment fellowship at Drexel Biomedical Engineering. We thank Dr. Jim Brown of GSK for useful discussions on infectious diseases and potential causes and treatments of IBD.


Supporting Information

Additional Supporting Information may be found in the online version of this article.

IBD_22958_sm_SuppFile1.txt (23K). Supporting Information File 1: The top 200 significant genes identified for both UC/HC and CD/HC comparisons. The columns in the table show gene symbols, fold changes, and SAM p-value.
IBD_22958_sm_SuppFile2.txt (2K). Supporting Information File 2: Differentially expressed genes targeted by a bioactive small molecule for UC/HC and CD/HC comparisons. Columns 1, 2, and 3 indicate the gene symbol, fold change in UC, and fold change in CD, respectively.

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.