
Development of a gene expression database and related analysis programs for evaluation of anticancer compounds


To whom correspondence should be addressed.

E-mails: akihiro.tomida@jfcr.or.jp; mmatsuura@jfcr.or.jp


Genome-wide transcriptional expression analysis is a powerful strategy for characterizing the biological activity of anticancer compounds. It is often instructive to identify gene sets involved in the activity of a given drug compound for comparison with different compounds. Currently, however, there is no comprehensive gene expression database and related application system that is (i) specialized in anticancer agents, (ii) easy to use, and (iii) open to the public. To develop a public gene expression database of antitumor agents, we first examined gene expression profiles in human cancer cells after exposure to 35 compounds, including 25 clinically used anticancer agents. Gene signatures were extracted and classified as upregulated or downregulated after exposure to the drug. Hierarchical clustering showed that drugs with similar mechanisms of action, such as genotoxic drugs, were clustered together. Connectivity map analysis further revealed that our gene signature data reflected the modes of action of the respective agents. Together with the database, we developed analysis programs that calculate scores for ranking changes in gene expression and that search for statistically significant pathways in the Kyoto Encyclopedia of Genes and Genomes database, so that the datasets can be analyzed more easily. Our database and the analysis programs are available online at our website (http://scads.jfcr.or.jp/db/cs/). Using these systems, we successfully showed that proteasome inhibitors are selectively classified as endoplasmic reticulum stress inducers and induce atypical endoplasmic reticulum stress. Thus, our public access database and related analysis programs constitute a set of efficient tools to evaluate the mode of action of novel compounds and identify promising anticancer lead compounds.

Cancer chemotherapy has gradually improved with the development of new anticancer agents. In particular, recent progress in the development of molecular cancer therapeutics has revealed new types of anticancer agents that directly target abnormal proteins in cancer cells.[1, 2] These agents are effective against certain types of cancer where the target protein plays a predominant role in the growth and survival of the cancer cells, but are generally less successful against other types of tumor.

To improve the present status of cancer chemotherapy, it is essential to search for novel compounds that selectively target new classes of molecular targets in cancer or induce cancer-specific cell death with new modes of action. It has been shown that compounds that selectively interfere with cellular biological targets abrogate specific signaling pathways and modulate the expression of individual subsets of signature genes. Therefore, gene signature-based analysis is a powerful strategy for characterizing the mechanism of action of drug candidates. Recently, Lamb et al.[3, 4] developed a systematic approach named the Connectivity map (C-map) to find connections among small molecules sharing a mechanism of action, chemicals and physiological processes, and diseases and drugs. They used a reference collection of gene expression profiles from cultured human cells treated with bioactive small molecules. However, the system was not specifically designed for anticancer drugs and lacks several standard agents.

Here, we obtained comprehensive gene expression datasets of anticancer drugs, consisting of standard anticancer agents, molecularly targeted drugs, and related inhibitors. The datasets include some compounds that are not contained in the C-map database. To develop a comprehensive database, we used Affymetrix GeneChip HG-U133 Plus 2.0 arrays (Affymetrix, Santa Clara, CA, USA), which contain more probe sets (54 675) than the arrays that were mainly used for the C-map database (HT_HG-U133A arrays; 22 283 probe sets). We further developed a calculation program that enables us to rapidly compare gene signatures of test compounds with those of antitumor agents to predict likely modes of action. Using our established systems, we successfully show that endoplasmic reticulum (ER) stress is differentially involved in the effect of antitumor agents.

Materials and Methods

Cell line and compounds

We used human colon adenocarcinoma HT-29 cells for the analysis. The cells were cultured in a humidified atmosphere of 5% CO2 and 95% air at 37°C. The anticancer compounds used in our analysis are listed in Table 1. These compounds were added to culture medium, with the solvent being <0.5% of the medium's volume.

Table 1. Treatment samples of anticancer compounds, their corresponding control sample, manufacturer, solvent, target/mode of action, and drug concentration for treatment

Treatment sample ID | Compound | Control sample ID | Manufacturer | Solvent | Target/mode of action | Concentration
GR001 | Cisplatin | | Bristol-Myers Squibb (New York, NY, USA) | DMSO | DNA cross-linker | 30 μM
GR002 | Trichostatin A | CS1 | Wako Pure Chemical Industries (Osaka, Japan) | | |
GR003 | Vorinostat | | Cayman Chemical Company (Ann Arbor, MI, USA) | | |
GR004 | Bortezomib | | Millennium Pharmaceuticals (Cambridge, MA, USA) | DMSO | Proteasome | 100 nM
GR005 | MG-132 | CS1 | Peptide Institute (Osaka, Japan) | DMSO | Proteasome | 1 μM
GR006 | Geldanamycin | CS1 | Sigma (St Louis, MO, USA) | DMSO | Hsp90 | 30 nM
GR007 | 17-AAG | CS1 | Wako Pure Chemical Industries | DMSO | Hsp90 | 100 nM
GR008 | Vincristine | CS2 | Eli Lilly (Indianapolis, IN, USA) | Distilled water | Tubulin | 30 nM
GR009 | Paclitaxel | CS2 | Bristol-Myers Squibb | DMSO | Tubulin | 30 nM
GR010 | Docetaxel | CS2 | Astra Zeneca (London, UK) | DMSO | Tubulin | 30 nM
GR011 | 5-FU | CS2 | Sigma | DMSO | Pyrimidine | 100 μM
GR012 | Gemcitabine | CS2 | Eli Lilly | Saline | Pyrimidine | 1 μM
GR013 | Melphalan | CS3 | Sigma | 75% DMSO | DNA cross-linker | 100 μM
GR014 | Mitomycin C | CS3 | Sigma | 50% DMSO | DNA alkylator | 10 μM
GR015 | Oxaliplatin | CS3 | Yakult (Tokyo, Japan) | DMSO | DNA cross-linker | 3 μM
GR016 | Bleomycin | CS3 | Sigma | Distilled water | DNA cleavage | 30 μg/mL
GR017 | Actinomycin D | CS3 | Sigma | DMSO | RNA synthesis | 30 nM
GR018 | Neocarzinostatin | CS3 | Sigma | – | DNA cleavage | 3 μg/mL
GR019 | Methotrexate | CS3 | Sigma | DMSO | DHFR | 1 μM
GR020 | 6-Mercaptopurine | CS3 | Sigma | DMSO | Purine | 100 μM
GR021 | Temsirolimus | | Santa Cruz Biotechnology (Santa Cruz, CA, USA) | | |
GR022 | Everolimus | CS3 | Sigma | DMSO | mTOR | 10 μM
GR023 | PP242 | CS3 | Sigma | DMSO | mTOR | 10 μM
GR024 | Nimustine | CS4 | Sigma | Distilled water | DNA alkylator | 1 mM
GR025 | SN38 (Irinotecan) | CS4 | Yakult | DMSO | Topo I | 3 μM
GR026 | Camptothecin | CS4 | Sigma | DMSO | Topo I | 3 μM
GR027 | Topotecan | CS4 | Sigma | DMSO | Topo I | 3 μM
GR028 | Doxorubicin | CS4 | Sigma | DMSO | DNA intercalator/Topo II | 3 μM
GR029 | Etoposide | CS4 | Bristol-Myers Squibb | DMSO | Topo II | 30 μM
GR030 | Mitoxantrone | CS4 | Sigma | DMSO | DNA intercalator/Topo II | 3 μM
GR031 | Pemetrexed | CS4 | Santa Cruz Biotechnology | DMSO | DNA/RNA synthesis | 1 μM
GR032 | 2-Deoxyglucose | CS1 | Sigma | Distilled water | ER stress (glycolysis) | 10 mM
GR033 | Tunicamycin | CS1 | Nacalai Tesque (Kyoto, Japan) | DMSO | ER stress (N-glycosylation) | 3 μg/mL
GR034 | Thapsigargin | CS1 | Wako Pure Chemical Industries | DMSO | ER stress (SERCA) | 10 nM
GR035 | A23187 | CS1 | Wako Pure Chemical Industries | DMSO | ER stress (Ca2+ ionophore) | 3 μM
GR036 | Vorinostat (16 h) | CS5 | Cayman Chemical Company | DMSO | HDAC | 10 μM
GR037 | Bortezomib (16 h) | CS5 | Millennium Pharmaceuticals | DMSO | Proteasome | 100 nM
GR038 | Vincristine (16 h) | CS5 | Eli Lilly | Distilled water | Tubulin | 30 nM
GR039 | Paclitaxel (16 h) | CS5 | Bristol-Myers Squibb | DMSO | Tubulin | 30 nM
GR040 | Docetaxel (16 h) | CS5 | Astra Zeneca | DMSO | Tubulin | 30 nM
GR041 | 5-FU (16 h) | CS5 | Sigma | DMSO | Pyrimidine | 100 μM
GR042 | Mitomycin C (16 h) | CS5 | Sigma | 50% DMSO | DNA alkylator | 10 μM
GR043 | Vorinostat | CS6 | Cayman Chemical Company | DMSO | HDAC | 10 μM
GR044 | Bortezomib | CS6 | Millennium Pharmaceuticals | DMSO | Proteasome | 100 nM
GR045 | Vorinostat (16 h) | CS7 | Cayman Chemical Company | DMSO | HDAC | 10 μM
GR046 | Bortezomib (16 h) | CS7 | Millennium Pharmaceuticals | DMSO | Proteasome | 100 nM
GR047 | Gemcitabine (16 h) | CS8 | Eli Lilly | Saline | Pyrimidine | 1 μM
GR048 | Oxaliplatin (16 h) | CS8 | Yakult | DMSO | DNA cross-linker | 3 μM
GR049 | Bleomycin (16 h) | CS8 | Sigma | Distilled water | DNA cleavage | 30 μg/mL
GR050 | Neocarzinostatin (16 h) | CS8 | Sigma | – | DNA cleavage | 3 μg/mL
GR051 | Methotrexate (16 h) | CS8 | Sigma | DMSO | DHFR | 1 μM
GR052 | 6-Mercaptopurine (16 h) | CS8 | Sigma | DMSO | Purine | 100 μM
GR053 | PP242 (16 h) | CS8 | Sigma | DMSO | mTOR | 10 μM
GR054 | Etoposide (16 h) | CS8 | Bristol-Myers Squibb | DMSO | Topo II | 30 μM
GR055 | Pemetrexed (16 h) | CS8 | Santa Cruz Biotechnology | DMSO | DNA/RNA synthesis | 1 μM

17-AAG, 17-N-allylamino-17-demethoxygeldanamycin; DHFR, dihydrofolate reductase; ER, endoplasmic reticulum; 5-FU, 5-fluorouracil; HDAC, histone deacetylase; Hsp90, heat shock protein 90; mTOR, mammalian target of rapamycin; SERCA, sarco/endoplasmic reticulum Ca2+-ATPase; Topo, topoisomerase. –, This product was provided as a solution (20 mM MES [2-morpholinoethanesulfonic acid, monohydrate] buffer, pH 5.5).

Drug treatment and GeneChip analysis

The HT-29 cells were seeded in 6-well plates in RPMI-1640 medium supplemented with 10% heat-inactivated FBS and 100 μg/mL kanamycin. After 20 h of culture, we added the anticancer compounds to the cells at various concentrations. Incubation was then continued for a further 6 or 16 h. In each batch, we also prepared untreated cells as a negative control (Table 1, CS1–CS8). Total RNAs were extracted from the drug-treated cells using an RNeasy Mini kit (Qiagen, Hilden, Germany). The quality of the RNAs was assessed using an Agilent 2100 Bioanalyzer and RNA 6000 Series II Pico Kit (Agilent Technologies, Palo Alto, CA, USA). We then carried out gene expression analyses using the GeneChip 3' IVT Expression Kit and GeneChip Human Genome U133 Plus 2.0 Array (both Affymetrix) according to the protocols provided by the manufacturer. Hybridization was carried out at 45°C for 16 h in a hybridization oven (Affymetrix). The GeneChips were then automatically washed and stained with streptavidin–phycoerythrin conjugate in an Affymetrix GeneChip Fluidics Station. Fluorescence intensities were scanned with a Gene-Array Scanner (Affymetrix). Affymetrix GeneChip Command Console (AGCC) version 3.1 was used for the data output.

Statistical analysis

All analyses except for the original C-map analysis were carried out using R version 2.15.0 (http://www.r-project.org/) and Bioconductor version 2.10 (http://bioconductor.org/).

Data preprocessing

Signal intensities for each HG-U133 Plus 2.0 array in the study were generated with an R implementation of the Affymetrix Microarray Suite 5.0 (MAS5) algorithm. Expression values were normalized to a mean target level of 100.
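
As a rough illustration of the scaling step only, the Python sketch below rescales the signals of one array so that a trimmed mean equals the target level of 100. The study itself used an R implementation of MAS5, which is not reproduced here; the function name `scale_to_target`, the 2% trimming fraction, and the toy values are assumptions made for this example.

```python
def scale_to_target(signals, target=100.0, trim=0.02):
    """Rescale one array's signals so that their trimmed mean equals `target`.

    Assumption: a fraction `trim` of values is dropped from each tail before
    averaging; the exact procedure used in the study is not described here.
    """
    ordered = sorted(signals)
    k = int(len(ordered) * trim)  # number of values dropped from each tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    trimmed_mean = sum(kept) / len(kept)
    factor = target / trimmed_mean
    return [s * factor for s in signals]


# Toy example: four hypothetical probe-set signals from one array.
raw = [50.0, 150.0, 250.0, 350.0]
scaled = scale_to_target(raw)
```

Because every array is scaled to the same target level, treatment-to-control ratios can be compared across arrays.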

Identifying gene signatures

We examined gene expression changes in HT-29 cells after exposure to 35 anticancer compounds (55 treatment samples). Gene sets were extracted and classified as upregulated or downregulated after exposure to the drug. For each treatment sample, we calculated treatment-to-control ratio statistics and selected upregulated and downregulated probe sets as gene signatures (see Doc. S1 for details).
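
The selection of up- and downregulated probe sets can be sketched as follows. This is a minimal Python illustration, not the authors' R code. The thresholds follow the footnote of Table 2 (ratio more than 3 or <1/3, with the larger signal of treatment or control at the 300 level); treating the intensity cut-off as "at least 300" is an assumption, and the probe names and values are hypothetical. The full procedure is described in Doc. S1.

```python
def gene_signature(treated, control, fold=3.0, min_signal=300.0):
    """Return (up, down) probe lists from treatment-to-control ratios.

    Assumptions: signals are positive (as for MAS5 output), and a probe set is
    considered only if the larger of its two signals is at least `min_signal`.
    """
    up, down = [], []
    for probe, t in treated.items():
        c = control[probe]
        if max(t, c) < min_signal:
            continue  # both signals too low to be trusted
        ratio = t / c
        if ratio > fold:
            up.append(probe)
        elif ratio < 1.0 / fold:
            down.append(probe)
    return up, down


# Hypothetical signals for four probe sets.
treated = {"p1": 900.0, "p2": 80.0, "p3": 120.0, "p4": 40.0}
control = {"p1": 200.0, "p2": 400.0, "p3": 100.0, "p4": 10.0}
up, down = gene_signature(treated, control)
```

Here `p4` has a fourfold ratio but is excluded because both signals fall below the intensity cut-off, which is the purpose of the second criterion.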

Hierarchical clustering

The probe sets used for hierarchical clustering were the collection of all gene signatures. We carried out hierarchical clustering on the logarithm of the ratio statistics of these probe sets across the 55 treatment samples, using Ward's method for linkage and Pearson's correlation as the distance metric.
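
To make the distance metric concrete, the Python sketch below computes a correlation-based distance between log-ratio profiles. Defining the distance as 1 minus Pearson's r is an assumption (the text states only that Pearson's correlation was used), the Ward linkage step is not implemented here, and the sample names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_distance(profiles):
    """Pairwise distance 1 - r between all named profiles (assumed form)."""
    names = list(profiles)
    return {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names}


# Hypothetical log-ratio profiles over four probe sets.
log_ratios = {
    "drug_A": [1.0, 2.0, 3.0, 4.0],
    "drug_B": [2.0, 4.0, 6.0, 8.0],   # same pattern as drug_A, larger effect
    "drug_C": [4.0, 3.0, 2.0, 1.0],   # opposite pattern
}
dist = correlation_distance(log_ratios)
```

With this form of distance, two treatments that change the same genes in the same direction are close even when the magnitude of change differs, which suits comparisons across drugs used at different effective doses.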

Connectivity map analysis

For the C-map analysis, we prepared up- and down-signatures as input queries, which consist of HG-U133A probe sets. For each treatment sample, the up-signature (ratio >3) and down-signature (ratio <1/3) were selected from the HG-U133 Plus 2.0 array data. The probe sets not included in the HG-U133A array were deleted. We used the online application on the C-map website developed by Lamb et al.[3] (http://www.broadinstitute.org/cmap/).
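
The preparation of a C-map query can be sketched as below, in Python rather than the R used in the study. The function name `cmap_query` and the probe identifiers are hypothetical; in practice the set of HG-U133A probe sets would come from the array annotation.

```python
def cmap_query(ratios, u133a_probes, fold=3.0):
    """Build up/down query lists restricted to probe sets present on HG-U133A."""
    up = [p for p, r in ratios.items() if r > fold and p in u133a_probes]
    down = [p for p, r in ratios.items() if r < 1.0 / fold and p in u133a_probes]
    return up, down


# Hypothetical treatment-to-control ratios from a HG-U133 Plus 2.0 array.
ratios = {"probe_1": 5.0, "probe_2": 0.2, "probe_3": 4.0, "probe_4": 1.1}
u133a_probes = {"probe_1", "probe_2", "probe_4"}  # probe_3 is Plus 2.0 only
query_up, query_down = cmap_query(ratios, u133a_probes)
```

This filtering is why the signature sizes used for the C-map queries are smaller than those obtained from the full HG-U133 Plus 2.0 data.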

Ranking gene expression changes

We adopted the connectivity score based on the Kolmogorov–Smirnov statistic, as developed by Lamb et al.,[3] to investigate the relationship between gene signature and compound. To calculate the Kolmogorov–Smirnov statistic quickly, we ranked the probe sets in descending order of the treatment-to-control ratio in advance and saved the rankings as a database. For each treatment sample, the ranking was calculated as described in the supporting online material for Lamb et al.[3] We developed a program for calculating connectivity scores. The source code of this program is available on our website (http://scads.jfcr.or.jp/db/cs/).
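
The scoring scheme can be sketched in Python as follows. This follows the Kolmogorov–Smirnov-based connectivity score as described by Lamb et al., to the best of our reading. It is not the source code distributed on the website, and the function names, gene labels, and the handling of an empty tag list are assumptions.

```python
def ks_score(tag_probes, ranked_probes):
    """Signed Kolmogorov-Smirnov-type statistic for one tag list.

    `ranked_probes` is ordered by descending treatment-to-control ratio.
    """
    n = len(ranked_probes)
    position = {p: i + 1 for i, p in enumerate(ranked_probes)}  # 1-based ranks
    v = sorted(position[p] for p in tag_probes if p in position)
    t = len(v)
    if t == 0:
        return 0.0  # assumption: no usable tags gives a neutral score
    a = max((j + 1) / t - v[j] / n for j in range(t))
    b = max(v[j] / n - j / t for j in range(t))
    return a if a > b else -b


def connectivity_score(up_tags, down_tags, ranked_probes):
    """Raw score: ks_up - ks_down, or 0 when both statistics share a sign."""
    ks_up = ks_score(up_tags, ranked_probes)
    ks_down = ks_score(down_tags, ranked_probes)
    if ks_up * ks_down > 0:
        return 0.0
    return ks_up - ks_down


def normalize(raw):
    """Scale positive scores by the maximum and negative scores by the minimum."""
    p, q = max(raw.values()), min(raw.values())
    return {name: (s / p if s > 0 else -(s / q) if s < 0 else 0.0)
            for name, s in raw.items()}


# Hypothetical reference rankings of ten probe sets for three treatments.
ranked = ["g%d" % i for i in range(1, 11)]
up_tags, down_tags = ["g1", "g2"], ["g9", "g10"]
raw_scores = {
    "same_direction": connectivity_score(up_tags, down_tags, ranked),
    "reversed": connectivity_score(up_tags, down_tags, ranked[::-1]),
    "mixed": connectivity_score(
        up_tags, down_tags,
        ["g1", "g9", "g2", "g10", "g3", "g4", "g5", "g6", "g7", "g8"]),
}
scaled_scores = normalize(raw_scores)
```

Storing the rankings once per treatment sample, as described above, means that only the positions of the query probe sets need to be looked up for each new query.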

Finding significant pathways from the KEGG PATHWAY database

The KEGG PATHWAY database is a service of KEGG,[5] a collection of manually drawn pathway maps that represent knowledge on biological networks. We carried out an analysis to identify significant pathways from the KEGG database. For each pathway, probe sets included in the pathway were extracted using the packages hgu133plus2.db and KEGG.db in Bioconductor. Next, gene signatures and probe sets in the pathway were converted to gene symbols. For each pathway, a 2 × 2 contingency table was created for statistical testing (see Doc. S1). We carried out Fisher's exact test for all pathways and extracted significant pathways. To account for multiple testing, significance was defined by a false discovery rate[6] criterion of 0.15; pathways were called significant when their q-values were <0.15. We developed an R program for the analysis in which the package qvalue in Bioconductor was used for the calculation of q-values.[7, 8] The source code of the program is available on our website.
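
The pathway test can be illustrated with the following Python sketch. Two points are assumptions rather than the authors' method: the test is written as a one-sided (enrichment) Fisher's exact test, and the Benjamini-Hochberg adjustment is used as a simple stand-in for the q-values that the study computed with the Bioconductor package qvalue. The pathway names and p-values are hypothetical.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test for a 2 x 2 table.

    a: signature genes in the pathway      b: signature genes outside it
    c: other genes in the pathway          d: other genes outside it
    Returns the probability of observing at least `a` signature genes in the
    pathway under the hypergeometric null.
    """
    n = a + b + c + d
    n_signature = a + b
    n_pathway = a + c
    upper = min(n_signature, n_pathway)
    tail = sum(comb(n_signature, k) * comb(n - n_signature, n_pathway - k)
               for k in range(a, upper + 1))
    return tail / comb(n, n_pathway)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (stand-in for q-values)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_example = fisher_enrichment_p(3, 1, 1, 3)

# Hypothetical per-pathway p-values, filtered at the 0.15 threshold.
pathways = ["pathway_a", "pathway_b", "pathway_c", "pathway_d"]
pvals = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(pvals)
significant = [name for name, q in zip(pathways, adj) if q < 0.15]
```

Testing every pathway produces many p-values at once, so a false discovery rate criterion such as the 0.15 threshold above is applied to the adjusted values rather than to the raw p-values.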


Results

Development of a gene expression database of antitumor agents

For our analyses, we chose 35 compounds consisting of clinically used standard anticancer agents and related drugs (Table 1). The p53-mutant human colon cancer HT-29 cell line was used in this study because it represents a typical type of solid tumor and is relatively resistant to cell cycle arrest and apoptosis. We examined the growth inhibitory effect of these agents on the cells and determined effective dosages of the drugs (Fig. S1). We used a drug concentration that was 3–10-fold greater than the GI50 value and caused >80% growth inhibition after 48 h of treatment (Table 1). Drug treatment conditions were carefully chosen so that primary changes in gene expression could be monitored before a secondary cellular response had emerged. Cells were therefore treated for a relatively short time (6 h) for acquisition of gene expression data. As shown in Table 2, the majority of agents caused significant gene expression changes after treatment. However, for the agents that did not show a dramatic effect on gene expression after 6 h of treatment, we also analyzed the gene expression data after a longer exposure time (16 h) (Table 2).

Table 2. Number of upregulated and downregulated gene signatures for 35 anticancer compounds (55 treatment samples)

Treatment sample ID | Compound | Upregulated ×3† (×2‡) | Downregulated ×3† (×2‡)
GR001 | Cisplatin | 19 (256) | 52 (697)
GR002 | Trichostatin A | 232 (1427) | 181 (1238)
GR003 | Vorinostat | 233 (1389) | 173 (1245)
GR004 | Bortezomib | 94 (494) | 68 (659)
GR005 | MG-132 | 63 (428) | 51 (583)
GR006 | Geldanamycin | 0 (29) | 0 (32)
GR007 | 17-AAG | 7 (68) | 0 (41)
GR008 | Vincristine | 0 (45) | 0 (6)
GR009 | Paclitaxel | 0 (41) | 0 (9)
GR010 | Docetaxel | 1 (37) | 0 (8)
GR011 | 5-FU | 3 (131) | 14 (154)
GR012 | Gemcitabine | 2 (26) | 5 (49)
GR013 | Melphalan | 85 (716) | 242 (1553)
GR014 | Mitomycin C | 3 (69) | 9 (252)
GR015 | Oxaliplatin | 0 (11) | 9 (81)
GR016 | Bleomycin | 0 (6) | 1 (5)
GR017 | Actinomycin D | 26 (294) | 188 (1384)
GR018 | Neocarzinostatin | 0 (57) | 2 (138)
GR019 | Methotrexate | 1 (28) | 8 (62)
GR020 | 6-Mercaptopurine | 2 (38) | 2 (52)
GR021 | Temsirolimus | 10 (132) | 0 (24)
GR022 | Everolimus | 16 (162) | 0 (13)
GR023 | PP242 | 98 (788) | 37 (530)
GR024 | Nimustine | 7 (131) | 13 (318)
GR025 | SN38 (Irinotecan) | 75 (602) | 512 (2445)
GR026 | Camptothecin | 102 (735) | 809 (3151)
GR027 | Topotecan | 28 (576) | 190 (1268)
GR028 | Doxorubicin | 49 (459) | 184 (1323)
GR029 | Etoposide | 2 (41) | 7 (133)
GR030 | Mitoxantrone | 47 (312) | 179 (1408)
GR031 | Pemetrexed | 2 (16) | 5 (34)
GR032 | 2-Deoxyglucose | 130 (586) | 16 (439)
GR033 | Tunicamycin | 209 (768) | 63 (673)
GR034 | Thapsigargin | 69 (323) | 3 (119)
GR035 | A23187 | 266 (986) | 86 (931)
GR036 | Vorinostat (16 h) | 434 (2142) | 478 (2057)
GR037 | Bortezomib (16 h) | 268 (1379) | 299 (1882)
GR038 | Vincristine (16 h) | 28 (293) | 77 (335)
GR039 | Paclitaxel (16 h) | 21 (263) | 60 (281)
GR040 | Docetaxel (16 h) | 18 (221) | 57 (270)
GR041 | 5-FU (16 h) | 26 (543) | 39 (556)
GR042 | Mitomycin C (16 h) | 25 (404) | 57 (605)
GR043 | Vorinostat (16 h) | 297 (1543) | 229 (1444)
GR044 | Bortezomib (16 h) | 118 (596) | 97 (904)
GR045 | Vorinostat (16 h) | 462 (2266) | 465 (2106)
GR046 | Bortezomib (16 h) | 307 (1551) | 373 (1893)
GR047 | Gemcitabine (16 h) | 15 (339) | 8 (186)
GR048 | Oxaliplatin (16 h) | 7 (167) | 53 (410)
GR049 | Bleomycin (16 h) | 3 (23) | 7 (29)
GR050 | Neocarzinostatin (16 h) | 13 (255) | 9 (180)
GR051 | Methotrexate (16 h) | 67 (692) | 53 (507)
GR052 | 6-Mercaptopurine (16 h) | 35 (388) | 9 (273)
GR053 | PP242 (16 h) | 192 (1206) | 106 (924)
GR054 | Etoposide (16 h) | 33 (456) | 31 (405)
GR055 | Pemetrexed (16 h) | 24 (400) | 8 (186)

†Number of probe sets such that the treatment-to-control ratio is more than 3 or <1/3 and the larger signal intensity of treatment or control is 300 is shown. ‡Results when the threshold values are changed (2 or 1/2 for ratio, 100 for signal intensity). 17-AAG, 17-N-allylamino-17-demethoxygeldanamycin; 5-FU, 5-fluorouracil.

Hierarchical clustering and C-map analysis

With the data that we obtained, we first carried out hierarchical cluster analysis. The analysis revealed that the drugs were divided into two major clusters, one containing conventional genotoxic drugs and the other including tubulin binding agents (paclitaxel, docetaxel, and vincristine), proteasome inhibitors (bortezomib and MG-132), and Hsp90 inhibitors (geldanamycin and 17-AAG) (Fig. 1). Moreover, we found that drugs with similar mechanisms of action were clustered together. As shown in Figure 1, DNA topoisomerase I inhibitors (camptothecin and SN-38), HDAC inhibitors (vorinostat and trichostatin A), mammalian target of rapamycin (mTOR) inhibitors (everolimus, temsirolimus, and PP242), the tubulin binding agents (paclitaxel, docetaxel, and vincristine), the proteasome inhibitors (bortezomib and MG-132), the Hsp90 inhibitors (geldanamycin and 17-AAG), and ER stress inducers (thapsigargin, 2-deoxyglucose, tunicamycin, and A23187) each formed a mechanism-specific cluster. Further analysis revealed that each cluster of compounds modulates a cluster-specific signature gene set (Table S1). However, the 16-h treatment data tended to cluster together.

Next, we carried out the C-map analyses for further validation. This analysis uses a collection of genome-wide transcriptional expression data from cells treated with chemical compounds and is useful in finding functional connections between compounds. When we used our gene signatures for HDAC inhibitors (trichostatin A and vorinostat) as “queries”, we were able to obtain output data that contained compounds with the same mode of action (Table 3). Similarly, when we entered the signature of the proteasome inhibitors, our output results contained the proteasome inhibitors MG-262 and MG-132 as hit compounds (Table S2).

Table 3. Results of the Connectivity map for HDAC inhibitors, showing list of ‘hit compounds’ as related compounds to the input gene signatures

Rank | C-map name | Dose | Cell | Score | Up | Down
(a) Trichostatin A (up, 164; down, 123)
1 | Trichostatin A | 100 nM | MCF7 | 1.000 | 0.829 | −0.731
2 | Trichostatin A | 1 μM | MCF7 | 0.990 | 0.818 | −0.727
3 | Trichostatin A | 100 nM | MCF7 | 0.989 | 0.811 | −0.732
4 | Trichostatin A | 1 μM | MCF7 | 0.987 | 0.810 | −0.731
5 | Trichostatin A | 100 nM | MCF7 | 0.983 | 0.814 | −0.720
6 | Trichostatin A | 100 nM | MCF7 | 0.977 | 0.820 | −0.705
7 | Trichostatin A | 1 μM | MCF7 | 0.977 | 0.802 | −0.722
8 | Trichostatin A | 100 nM | MCF7 | 0.974 | 0.826 | −0.695
9 | Vorinostat | 10 μM | MCF7 | 0.972 | 0.802 | −0.716
10 | Trichostatin A | 100 nM | MCF7 | 0.970 | 0.807 | −0.707
11 | Trichostatin A | 100 nM | MCF7 | 0.970 | 0.806 | −0.708
12 | Trichostatin A | 100 nM | MCF7 | 0.970 | 0.792 | −0.722
13 | Trichostatin A | 100 nM | MCF7 | 0.969 | 0.821 | −0.691
14 | Vorinostat | 10 μM | MCF7 | 0.969 | 0.803 | −0.709
15 | Trichostatin A | 1 μM | MCF7 | 0.968 | 0.799 | −0.711
16 | Vorinostat | 10 μM | MCF7 | 0.967 | 0.783 | −0.727
17 | Trichostatin A | 100 nM | MCF7 | 0.967 | 0.770 | −0.738
18 | Trichostatin A | 1 μM | MCF7 | 0.965 | 0.784 | −0.723
19 | Trichostatin A | 100 nM | MCF7 | 0.965 | 0.803 | −0.703
20 | Trichostatin A | 100 nM | MCF7 | 0.963 | 0.787 | −0.716
(b) Vorinostat (up, 157; down, 119)
1 | Trichostatin A | 1 μM | MCF7 | 1.000 | 0.818 | −0.729
2 | Trichostatin A | 100 nM | MCF7 | 0.997 | 0.807 | −0.735
3 | Trichostatin A | 100 nM | MCF7 | 0.992 | 0.817 | −0.716
4 | Trichostatin A | 1 μM | MCF7 | 0.986 | 0.800 | −0.725
5 | Trichostatin A | 100 nM | MCF7 | 0.981 | 0.811 | −0.707
6 | Trichostatin A | 1 μM | MCF7 | 0.980 | 0.799 | −0.716
7 | Trichostatin A | 100 nM | MCF7 | 0.976 | 0.812 | −0.697
8 | Trichostatin A | 100 nM | MCF7 | 0.975 | 0.801 | −0.706
9 | Trichostatin A | 1 μM | MCF7 | 0.970 | 0.778 | −0.721
10 | Trichostatin A | 100 nM | MCF7 | 0.969 | 0.814 | −0.684
11 | Trichostatin A | 100 nM | MCF7 | 0.968 | 0.816 | −0.681
12 | Trichostatin A | 100 nM | MCF7 | 0.967 | 0.767 | −0.729
13 | Vorinostat | 10 μM | MCF7 | 0.966 | 0.781 | −0.714
14 | Vorinostat | 10 μM | MCF7 | 0.964 | 0.803 | −0.688
15 | Vorinostat | 10 μM | MCF7 | 0.963 | 0.791 | −0.698
16 | Trichostatin A | 1 μM | MCF7 | 0.963 | 0.801 | −0.688
17 | Vorinostat | 10 μM | MCF7 | 0.963 | 0.789 | −0.700
18 | Trichostatin A | 1 μM | MCF7 | 0.963 | 0.784 | −0.705
19 | Trichostatin A | 100 nM | MCF7 | 0.962 | 0.795 | −0.693
20 | Vorinostat | 10 μM | MCF7 | 0.961 | 0.799 | −0.686

The numbers of up- and down-signatures are shown in parentheses.

Figure 1.

Hierarchical cluster analysis based on the collection of gene signatures of 35 anticancer compounds (55 treatment samples). In total, 3237 probe sets were used for clustering. The values in the heatmap are the logarithm of sample-to-control ratio of intensity values. Neither normalizing nor scaling was carried out. 17-AAG, 17-N-allylamino-17-demethoxygeldanamycin; 5-FU, 5-fluorouracil. Green, downregulated genes; red, upregulated genes.

To validate the applicability of our gene signature data to other types of cancer, we carried out an additional study on bortezomib. This agent is used for myeloma treatment. Therefore, we treated human myeloma RPMI8226 cells with bortezomib and obtained gene expression data. Our clustering analysis revealed that the bortezomib signature data of RPMI8226 cells clustered together with the proteasome inhibitors' data of HT-29 cells. Similarly, when RPMI8226 cells were treated with SN-38 or doxorubicin, the signature data clustered with the DNA damaging agents' data of HT-29 cells (Fig. S2). These results indicate that our signature data are applicable to other cancer cells.

Collectively, these analyses showed that our gene expression data were reliable enough to analyze modes of action of the anticancer drugs.

Application of our database for analysis of the mode of action

Currently, there is no gene expression database open to the public that is specialized in anticancer agents. Based on the obtained gene expression data of anticancer agents, we developed our calculation program (connectivity scoring analysis) to compare gene signatures of test compounds with those of antitumor agents in our database for prediction of their likely modes of action. Basically, we adapted the algorithms of C-map to our datasets. The C-map contains data for a large number of compounds, but lacks information for several anticancer agents such as bortezomib. This makes it difficult to focus simply on a comparison of gene expression signatures for test compounds and standard anticancer agents. When we entered the signature gene set of the HDAC inhibitors as a “query” in our established system, we were able to obtain output data containing the HDAC inhibitors trichostatin A and vorinostat (Table 4). Similarly, when we entered the signatures of the proteasome inhibitors, we obtained output data containing proteasome inhibitors (Table S3). These results indicate that our system can predict the mode of action of an anticancer compound. Using this system, we were able to generate simpler results because our database specifically focuses on anticancer agents.

Table 4. Results of the connectivity scoring analysis using our database

Rank | Compound name | Connectivity score | Up score | Down score
(a) Trichostatin A (up, 232; down, 181)
1 | Trichostatin A | 1.000 | 0.992 | −0.994
3 | Vorinostat | 0.976 | 0.976 | −0.964
4 | Vorinostat (16 h) | 0.921 | 0.924 | −0.906
5 | Vorinostat (16 h) | 0.906 | 0.905 | −0.893
8 | Etoposide (16 h) | 0.661 | 0.732 | −0.581
9 | Gemcitabine (16 h) | 0.658 | 0.717 | −0.591
10 | Neocarzinostatin (16 h) | 0.647 | 0.688 | −0.597
(b) Vorinostat (up, 233; down, 173)
2 | Trichostatin A | 0.988 | 0.981 | −0.982
4 | Vorinostat (16 h) | 0.925 | 0.925 | −0.912
5 | Vorinostat (16 h) | 0.916 | 0.918 | −0.900
7 | Etoposide (16 h) | 0.675 | 0.745 | −0.597
9 | Neocarzinostatin (16 h) | 0.658 | 0.715 | −0.591
10 | Gemcitabine (16 h) | 0.654 | 0.726 | −0.573

The numbers of up- and down-signatures are shown in parentheses.

Recently, several reports have shown that ER stress is induced by a wide variety of chemotherapeutic agents.[9-20] However, the extent to which ER stress is involved in the effect of these agents on cancer cells is still unclear. Because the proteasome inhibitors (bortezomib and MG-132) and the Hsp90 inhibitors (geldanamycin and 17-AAG), but not other agents, were closely clustered together with the four ER stress-inducing agents (thapsigargin, 2-deoxyglucose, tunicamycin, and A23187) (Fig. 1), we examined whether our connectivity scoring analysis could predict these agents as ER stress inducers. We first extracted the ER stress signature gene set, consisting of 58 probe sets whose expressions were all changed by treatment with the four ER stress-inducing agents. Then we entered the ER stress signature gene set in our calculation program. As expected, we obtained a result containing the proteasome inhibitors (bortezomib and MG-132) as “hit compounds” (Table 5). By contrast, the output result did not contain the Hsp90 inhibitors (geldanamycin and 17-AAG). We examined expression changes of the ER stress genes by these agents in more detail. Our analysis revealed that the proteasome inhibitors strongly induced the ER stress-related genes, but the Hsp90 inhibitors did not (Fig. 2). This finding is consistent with our connectivity scoring analysis.

We further found that the proteasome inhibitors preferentially induced a subset of the ER stress-related genes (class 1 genes in Fig. 2) and marginally induced the others (class 2 genes). We mapped the class 1 and 2 genes to the KEGG pathway of “protein processing in endoplasmic reticulum” and found that the class 2 genes were mainly involved in the core protein processing machinery in the ER (Fig. S3). By contrast, most of the class 1 genes were not included in the core protein processing machinery in the ER, and many of them were known downstream effectors of the main ER stress signaling pathways, including PERK-eIF2alpha-ATF4, ATF6, and IRE1-XBP1.[21-31] These results show that the proteasome inhibitors induce atypical ER stress. Thus, our database and its application program constitute an efficient tool for predicting the likely modes of action of anticancer agents.
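
The extraction of a shared signature, such as the 58-probe-set ER stress signature, can be sketched in Python as a set intersection. Treating "changed by all four agents" as membership in every agent's signature in the same direction is an assumption, and the probe names below are hypothetical placeholders.

```python
def common_signature(signatures):
    """Probe sets present in the signature of every listed agent."""
    sets = [set(probes) for probes in signatures.values()]
    return sorted(set.intersection(*sets))


# Hypothetical up-signatures for the four ER stress-inducing agents.
up_signatures = {
    "thapsigargin": ["probe_a", "probe_b", "probe_c"],
    "2-deoxyglucose": ["probe_a", "probe_b", "probe_d"],
    "tunicamycin": ["probe_b", "probe_a"],
    "A23187": ["probe_a", "probe_b", "probe_e"],
}
common_up = common_signature(up_signatures)
```

The resulting list can then be used as the query for the connectivity scoring analysis, in the same way as a single-compound signature.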

Table 5. Results of the connectivity scoring analysis using our database. Endoplasmic reticulum stress-related genes (58 probe sets) were used as input queries

Rank | Compound name | Connectivity score | Up score | Down score
10 | Bortezomib (16 h) | 0.700 | 0.729 | −0.661

Figure 2.

Heatmap of subcluster using endoplasmic reticulum stress-related genes (58 probe sets), whose expressions were all changed by treatment with the four stress-inducing agents. The row names are gene symbols of 58 probe sets, which were converted using the NetAffx database, NA32, supplied by Affymetrix. Two main clusters of upregulated genes were named “class 1” and “class 2” genes. 17-AAG, 17-N-allylamino-17-demethoxygeldanamycin.


Discussion

Novel free-access platform for evaluating antitumor agents

Several previous analyses, including the C-map, have shown that genome-wide gene expression analysis is effective in predicting modes of action of chemical compounds.[3, 4, 32-34] In the present report, we describe the development of a comprehensive gene expression dataset specializing in the analysis of standard antitumor agents. This open-to-the-public database is available for evaluating the likely mechanisms of action of new anticancer compounds.

In our clustering analysis, drugs with similar mechanisms of action such as genotoxic drugs, proteasome inhibitors, HDAC inhibitors, and ER stress inducers were clustered together (Fig. 1). These results strongly suggest that our gene expression data accurately reflect the mode of action of the agents. The KEGG pathway analysis confirmed that the ER stress-related gene set was actually induced by the ER stressors (Table S4), which is consistent with our clustering results.

We acquired our gene expression data using human colon cancer HT-29 cells, whereas the C-map data mainly consisted of data obtained using human breast cancer MCF7 cells or human prostate cancer PC3 cells. It is noteworthy that our signature data of compounds in HT-29 cells were closely related to those in the C-map (Table 3). Moreover, we obtained gene expression data in human myeloma RPMI8226 cells and found that the data were closely related to the data obtained in HT-29 cells (Fig. S2). These results indicate that our signature data could be applicable to data that are obtained in other types of cancer.

To obtain gene expression data that reflect the mode of action of agents, the exposure time of cells to drugs is an important issue. We chose a short exposure time (6 h) as the default; for most of the agents, significant gene expression changes were observed (Table 2) and the data were clustered in a mechanism of action-dependent manner (Fig. 1). These results indicate that this exposure time is suitable for many agents. By contrast, for some agents that did not show dramatic effects on gene expression in the 6-h treatment, we tested a longer exposure time (16 h) (Table 1). However, we found that the 16-h treatment data tended to cluster together, even though the agents had different modes of action (Fig. 1). These observations suggest that a longer exposure time might not necessarily be better than a shorter one, even though the gene expression changes become more dramatic. Thus, the drug treatment regime must be carefully chosen for gene signature acquisition and subsequent mechanism analysis.

Mining cryptic linkages and gaps between ER stress and related agents

As reported previously, ER stress is closely related to tumor microenvironment conditions as well as to the effect of several antitumor agents.[32] Therefore, we included well-known ER stress-inducing agents in our compound panel. We found that the tubulin binding agents, proteasome inhibitors, and Hsp90 inhibitors were clustered together with the ER stress inducers in a group different from classical genotoxic agents (Fig. 1). This observation indicates that these drugs could have unique modes of action.

It has been reported that proteasome inhibitors induce ER stress as well as suppressing nuclear factor-κB activation by interfering with the degradation of I-κB.[12] In our clustering analysis, the proteasome inhibitors formed a cluster with the ER stress inducers (Fig. 1) and the analysis with our program predicted that the inhibitors induce ER stress (Table S3). Moreover, KEGG pathway analysis revealed that these agents modulate the expression of ER stress-related genes (indicated as “protein processing in endoplasmic reticulum” in Table S4). These data support the notion that ER stress could play an essential role in the mode of action of proteasome inhibitors.

Our gene signature analysis further revealed that the proteasome inhibitors induce an atypical type of ER stress. In particular, we found that the inhibitors induce a subset of ER stress-related genes (class 1 genes) while only marginally inducing the other genes (class 2 genes) (Figs. 2, S3). It is still unclear what causes this atypical gene expression pattern. Presumably there is a negative feedback mechanism that selectively suppresses class 2 gene expression. It is noteworthy that the analysis with our program was able to detect the mechanistic difference between the proteasome inhibitors and the well-known ER stress-inducing agents. Namely, when we entered the ER signature gene set in our program, the ER stress inducers ranked significantly higher than the proteasome inhibitors (Table 5). Thus, our program could potentially predict detailed mechanisms of action of anticancer drugs.

The Hsp90 inhibitors did not typically induce the ER stress-related genes (Fig. 2, Table 5), although they were clustered with the ER stress inducers (Fig. 1). To determine the connection between the inhibitors and ER stress, we carried out our connectivity scoring analysis. We found that the signature of the Hsp90 inhibitors weakly correlated with those of some ER stress inducers, such as thapsigargin or tunicamycin (Table S5), suggesting that the inhibitors would not be typical ER stressors but could marginally induce ER stress. It was reported that some Hsp90 inhibitors induce ER stress, but the level of ER stress induction differs among the inhibitors.[20] These data suggest that ER stress induction by Hsp90 inhibitors could depend both on compound types and on cell types.

As described above, we developed in-house R programs for calculating scores for ranking gene expression changes, and for searching statistically significant pathways from the KEGG database. The program for searching pathways enables us to easily produce simple charts for each pathway analyzed and gives graphical information on the location of genes in each pathway. The programs and database developed in this study will be made available on our website. Our database includes some compounds that are not present in the C-map database. Therefore, unanticipated characteristics of a novel compound might be obtained by using our database.

In summary, we have developed a publicly available gene expression database of standard anticancer agents as well as some related application programs. Our gene expression database is specialized in antitumor agents, and our datasets include some anticancer agents not contained in other databases, such as C-map. To establish a more comprehensive database, we plan to add new antitumor agents and update our database. Thus, our database would be suitable for primary characterization of new candidate compounds in comparison with known anticancer agents. We have also acquired data concerning differential sensitivity of human cancer cell lines to anticancer agents and established a “sensitivity-based” signature database.[33-36] Further trials are planned to develop an integrated database of antitumor agents that include both gene expression-based and sensitivity-based signatures. Our public database and related programs will be helpful for evaluating candidate compounds as novel antitumor agents.


The present study is supported by the Scientific Support Programs for Cancer Research project/Screening Committee of Anticancer Drugs (SCADS), Grant-in-Aid for Scientific Research on Innovative Areas from The Ministry of Education, Culture, Sports, Science and Technology, Japan.

Disclosure Statement

The authors have no conflict of interest.






Abbreviations

C-map, Connectivity map; ER, endoplasmic reticulum; HDAC, histone deacetylase; Hsp90, heat shock protein 90; KEGG, Kyoto Encyclopedia of Genes and Genomes