Gene expression profiling identifies potential relevant genes in alveolar rhabdomyosarcoma pathogenesis and discriminates PAX3-FKHR positive and negative tumors



We analyzed the expression signatures of 14 tumor biopsies from children affected by alveolar rhabdomyosarcoma (ARMS) to identify genes correlating with biological features of this tumor. Seven of these patients were positive for the PAX3-FKHR fusion gene and 7 were negative. We used a cDNA platform in which the large majority of probes were derived from muscle tissues. Comparison of the transcription profiles of tumor samples with fetal skeletal muscle identified 171 differentially expressed genes common to all ARMS patients. Functional classification of the altered genes identified a group of transcripts (LGALS1, BIN1) that may be relevant to tumorigenesis. The muscle-specific microarray platform distinguished PAX3-FKHR positive and negative ARMS through the expression pattern of a limited number of genes (RAC1, CFL1, CCND1, IGFBP2) that might be biologically relevant to the different clinical behavior and aggressiveness of the 2 ARMS subtypes. Expression levels of selected candidate genes were validated by quantitative real-time reverse-transcription PCR. © 2005 Wiley-Liss, Inc.

Rhabdomyosarcomas (RMS) are rare but very aggressive tumors of childhood. These tumors are generally thought to arise from disrupted regulation of the growth and differentiation pathways of myogenic precursor cells, but the identity of this precursor is still a matter of debate and research.1, 2 On the basis of morphology, 2 major RMS subtypes can be identified: embryonal RMS (ERMS) and alveolar RMS (ARMS). Embryonal RMS also includes the botryoid, anaplastic and spindle-cell variants. An additional subtype, pleomorphic RMS, is found only exceptionally in childhood.3 ARMS represents approximately 25–30% of RMS and has a worse prognosis than ERMS. Cytogenetic and molecular analyses have demonstrated that ARMS frequently harbor the reciprocal chromosomal translocation t(2;13)(q35;q14), in which the PAX3 and FKHR genes are juxtaposed; the less frequent variant translocation t(1;13)(p36;q14) has also been associated with ARMS. The PAX3-FKHR chimeric protein binds regulatory sequences of genes involved in growth, differentiation, apoptosis and in vivo migration of rhabdomyoblasts, including c-met, IGF-1, PDGF-R, BCL-X, SDF-1 and CXCR4.4, 5, 6, 7, 8 Moreover, retrospective analyses of PAX3-FKHR positive and negative ARMS9, 10, 11 suggested that patients with translocation-positive ARMS fared worse than those with translocation-negative tumors. Thus, the t(2;13) reciprocal translocation should be considered one of the main features affecting the biology of ARMS cells and might represent an adverse prognostic factor.

Recently, genomic approaches based on global gene expression profiling with DNA microarrays have been applied extensively to cancer classification, prognosis and functional analysis of the gene networks implicated in cancer biology.12, 13, 14 Compared with other cancer types, few studies have addressed ARMS: some used ARMS cell lines to define the gene expression profile and the influence of the exogenously expressed fusion gene PAX3-FKHR15, 16 and, more recently, others analyzed ARMS and ERMS tumors.17, 18, 19

To better characterize the pattern of gene activation in ARMS, we analyzed a series of tumor specimens using a new version of the cDNA microarray platform previously established in our laboratory.20, 21 We also compared the gene expression profiles of PAX3-FKHR positive and negative ARMS. Our results suggest that PAX3-FKHR positive ARMS have a significantly different gene expression signature from PAX3-FKHR negative tumors and that a small set of genes can correctly classify the 2 ARMS subtypes. This candidate molecular signature was verified with a multiple resampling scheme implementing a complete validation protocol.22, 23 Furthermore, we identified several novel deregulated genes possibly implicated in ARMS pathogenesis.

Material and methods

Patient enrolment and clinical characterization

Tumor specimens were obtained from patients enrolled in the pediatric sarcoma protocol SPM96 of the Italian Association of Pediatric Hematology and Oncology (AIEOP). The study was approved by the Ethics Committee as part of the AIEOP clinical trial. Relevant clinical data for the patients enrolled in the study are reported in Table I. Diagnosis was based on standard histological and immunohistochemical studies and was reviewed by the central panel of pathologists in all cases. Expression of PAX3-FKHR, PAX7-FKHR, MyoD1 and myogenin was determined by RT-PCR24, 25 in each tumor sample. Staging was assessed according to IRS criteria.26

Table I. Patient Description

Patient | Sex | Age (years/months) | Fusion transcript: PAX3/FKHR | Fusion transcript: PAX7/FKHR | Primary site | Stage (IRS) | Outcome
1 | F | 2y 9m | + | − | Head/neck | II | DOD
2 | F | 13y 3m | + | − | Limbs | III | I CCR 24 m
3 | F | 14y 10m | + | − | Limbs | IV | DOD
4 | F | 6y 0m | + | − | Limbs | IV | I CCR 18 m
5 | F | 9y 8m | + | − | Limbs | IV | Alive, PD
6 | M | 9y 5m | − | − | Retroperitoneum | III | I CCR 28 m
7 | M | 8y 10m | − | − | Head/neck | III | I CCR 23 m
8 | M | 9y 7m | − | − | Pelvis | IV | DOD
9 | M | 3y 1m | − | − | Paratesticular | I | I CCR 60 m
10 | F | 0y 10m | − | − | Head/neck | II | DOD
11 | F | 1y 8m | − | − | Pelvis | IV | I CCR 21 m
12 | F | 2y 10m | + | − | GU not BP | II | On 1st line therapy
13 | M | 8y 9m | − | − | Retroperitoneum | III | I CCR 8 m
14 | M | 9y 10m | + | − | Retroperitoneum | IV | DOD

Clinical and cytogenetic data of 14 children affected by alveolar rhabdomyosarcoma enrolled in the pediatric sarcoma protocol AIEOP SPM96. Gene expression profiles of the last four patients (11–14) were analyzed in a blind protocol to confirm the predictive value of the discriminant genes identified with the PAM algorithm. The specific fusion transcripts PAX3/FKHR and PAX7/FKHR were identified by molecular studies (i.e., RT-PCR assays) according to the AIEOP SPM96 sarcoma protocol. All 14 tumors were negative for the PAX7-FKHR transcript. +, transcript detected; −, transcript not detected; F, female; M, male; DOD, dead of disease; I CCR, first continuous complete remission; m, months; y, years; PD, progressive disease; GU, genitourinary; BP, bladder-prostate.

Microarray fabrication

Microarrays (Human Array 2.0) were constructed by arraying on glass slides the cDNA inserts, corresponding to the 3′ portion of mRNAs, of a collection of 4,670 different bacterial clones. This collection was obtained by systematic sequencing of human skeletal muscle, heart and hematopoietic cell-specific libraries. Library bacterial clones were inoculated into 96-well, 2-ml Assay Blocks (Costar, Milpitas, CA) containing 600 μl LB/ampicillin (50 μg/ml) and incubated at 37°C for 16 hr. Approximately 1 μl of culture suspension was transferred to a 96-well plate (Costar) containing 100 μl of the following solution: 67 mM Tris–HCl (pH 8.8), 16 mM ammonium sulphate, 0.1% (v/v) Tween 20, 1.5 mM MgCl2, 0.15 mM each of the four dNTPs, 0.2 μM each of forward primer A (5′-TCCCGGCTCGTATGTTGTGTGGAAT-3′) and reverse primer B (5′-GTTGTAAAACGACGGCCAGTGAATTG-3′) and 1 unit of Taq DNA Polymerase (Invitrogen, Grand Island, NY). Reactions were amplified in MJ Research thermal cyclers using the following cycling program: 5 min initial denaturation and bacterial lysis at 95°C; then 30 cycles of 30 sec denaturation at 95°C, 30 sec annealing at 55°C and 40 sec extension at 72°C; and a final 10 min extension at 72°C. PCR buffer and unincorporated nucleotides were removed by filtration through 96-well multiscreen filter plates (Millipore, Bedford, MA). The purification protocol was automated using the 96-channel robotic workstation Multimek (Beckman Instruments, Fullerton, CA). Quality control and quantification of the amplified products were accomplished by separating 3 μl samples on agarose gels containing EtBr, followed by image analysis with a Chemi Doc UV transilluminator equipped with Quantity One software (Bio-Rad, Hercules, CA). PCR products were then lyophilized and stored at –20°C. For microarray printing, PCR products were dissolved in 15 μl of Micro Spotting Solution (ArrayIt; Telechem, Sunnyvale, CA) by vigorous shaking of the plates for 4–5 hr at 4°C and transferred to 384-well plates.
Spotting was performed using the Genpak Array 21 robotic station (Genetix, Hampshire, UK) equipped with 32 Stealth Micro Spotting Pins SMP 3B (ArrayIt), set to produce spots 120 μm in diameter. Samples were spotted in duplicate on derivatized glass slides (MICROMAX Glass Slides: SuperChip™ I, PerkinElmer, Wellesley, MA) at 50% relative humidity. The DNA was then bound to the slides in a UV cross-linker (Stratagene, La Jolla, CA) at a total energy of 300 mJ. To remove unbound DNA, slides were rinsed once in 1% SDS, 3× SSC for 1 min at room temperature and twice in distilled water for 5 min at room temperature. Processed microarrays were dried in a laminar flow chamber and stored in sealed boxes under vacuum at room temperature.

Tissue samples, RNA extraction, quality control and labeling

Each tumor biopsy was selected by the local pathologist and shipped on dry ice to our laboratory within 24 hr of surgery. Upon arrival, samples were minced and then mechanically homogenized for RNA extraction. Total RNA was isolated using RNA-zol reagent (Tel-Test, Friendswood, USA) following the manufacturer's instructions. Aliquots of 100 ng total RNA were used for quality control by capillary electrophoresis with the RNA 6000 Nano LabChip and the Agilent Bioanalyzer 2100 (Agilent Technologies, Palo Alto, CA). All RNA samples used in this study showed no genomic DNA contamination and no sign of degradation. As the reference RNA for competitive microarray experiments, we used total RNA prepared from male fetal skeletal muscle by Stratagene Europe (Amsterdam, the Netherlands). This RNA was prepared from 5 different donor fetuses ranging from 18 to 21 weeks of age.

Total RNA was reverse-transcribed and labeled using a Micromax TSA Labeling Kit (PerkinElmer). Two micrograms of total RNA were used in each reaction but only half of the labeled cDNA was hybridized to microarrays.

Microarray hybridization

Microarray hybridization was carried out in dual-slide chambers (HybChamber, Gene Machines, San Carlos, CA) humidified with 100 μl of 3× SSC. Labeled cDNA was dissolved in 40 μl of hybridization buffer, denatured at 90°C for 2 min and applied directly to the slides. Slides were covered with 22 × 40 mm cover slips, and hybridization was carried out overnight at 65°C by immersion in a high-precision water bath (W28, Grant, Cambridge, UK). Posthybridization washing was performed according to the Micromax TSA Detection kit (PerkinElmer). Each experiment was performed twice on different microarray slides, with sample and reference RNAs labeled with Cy3 or Cy5 in both combinations (dye-swap procedure).
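The dye-swap design can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the protocol, that each slide reports log2(Cy5/Cy3) and that any dye bias adds the same constant to both slides of a pair; `combine_dye_swap` is a hypothetical helper, not part of the software used in this study. Because the second slide reverses the labels, its ratio is negated before averaging, which cancels the shared bias.

```python
def combine_dye_swap(m_forward, m_reverse):
    """Combine one spot's log2 ratios from a dye-swapped pair of slides.

    m_forward: log2(Cy5/Cy3) with the tumor labeled Cy5 and the reference Cy3.
    m_reverse: log2(Cy5/Cy3) with the labels exchanged (tumor in Cy3).
    Returns the estimated log2(tumor/reference).
    """
    # Negating the reverse ratio puts both slides on the tumor/reference
    # scale; averaging then cancels a dye bias that is equal on both slides.
    return 0.5 * (m_forward - m_reverse)


# Toy numbers: a gene truly 2-fold up in the tumor (log2 ratio = 1),
# read with a constant +0.25 log2 Cy5 bias on both slides.
forward = 1.0 + 0.25   # tumor in Cy5
reverse = -1.0 + 0.25  # tumor in Cy3
print(combine_dye_swap(forward, reverse))  # → 1.0
```

If the bias were not the same on both slides (for example, intensity dependent), simple averaging would not remove it completely, which is one reason a normalization step is applied before replicates are combined.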

Statistical analysis of expression data

Arrays were scanned using a GSI Lumonics LITE dual confocal laser scanner with the ScanArray Microarray Analysis System (PerkinElmer). Raw scanner images were analyzed with QuantArray Analysis Software (GSI Lumonics, Ottawa, Canada). Expression levels27 of all spot replicates were normalized with MIDAS (TIGR Microarray Data Analysis System). The Lowess (Locfit) normalization function was applied to the expression data of all experiments, and the values of spot replicates within and among arrays were then averaged. Differentially expressed genes were identified with the one-class and two-class Significance Analysis of Microarrays (SAM) program28 with default settings. SAM uses a permutation-based multiple testing algorithm and identifies significant genes at a tunable false discovery rate (FDR); the threshold can be adjusted manually to include a reasonable number of candidate genes with an acceptable, well-defined error probability. In addition, a set of discriminant genes was selected using the Prediction Analysis of Microarrays (PAM) program.29 Principal component analysis, cluster analysis and profile similarity searching were performed with J-Express and R software.30 Hierarchical cluster analysis, in particular, was performed with complete linkage, using either Pearson correlation or Euclidean distance as the distance measure.
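Intensity-dependent (Lowess) normalization fits the log ratio M = log2(R/G) as a smooth function of the average log intensity A = 0.5 × log2(R × G) and subtracts the fitted trend. The sketch below is a simplified, hypothetical stand-in for the MIDAS Lowess (Locfit) function used in the study: it uses a plain locally weighted linear fit with tricube weights and a fixed span `frac`, without robustness iterations, so results will differ in detail from MIDAS. The function name and toy data are assumptions for illustration.

```python
import math


def loess_normalize(red, green, frac=0.5):
    """Remove an intensity-dependent trend from one array's log2 ratios.

    M = log2(R/G) is regressed on A = 0.5 * log2(R * G) with a locally
    weighted linear fit (tricube weights, no robustness iterations), and
    the fitted trend is subtracted from M. Returns the normalized M values.
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    k = min(len(a), max(2, math.ceil(frac * len(a))))  # spots per local window
    normalized = []
    for a0, m0 in zip(a, m):
        # Window radius: distance to the k-th nearest spot along A.
        h = sorted(abs(ai - a0) for ai in a)[k - 1] or 1e-12
        w = [(1 - min(abs(ai - a0) / h, 1.0) ** 3) ** 3 for ai in a]
        # Weighted least squares fit of M = b0 + b1 * A around a0.
        sw = sum(w)  # at least 1: the spot itself has weight 1
        a_bar = sum(wi * ai for wi, ai in zip(w, a)) / sw
        m_bar = sum(wi * mi for wi, mi in zip(w, m)) / sw
        s_aa = sum(wi * (ai - a_bar) ** 2 for wi, ai in zip(w, a))
        s_am = sum(wi * (ai - a_bar) * (mi - m_bar) for wi, ai, mi in zip(w, a, m))
        slope = s_am / s_aa if s_aa > 0 else 0.0
        normalized.append(m0 - (m_bar + slope * (a0 - a_bar)))
    return normalized


# Toy array: every spot carries the same linear dye bias M = 0.2 + 0.1 * A
# and no real differential expression, so normalized values should be ~0.
A = [6 + 0.5 * i for i in range(20)]
M = [0.2 + 0.1 * ai for ai in A]
red = [2 ** (ai + mi / 2) for ai, mi in zip(A, M)]
green = [2 ** (ai - mi / 2) for ai, mi in zip(A, M)]
print(max(abs(x) for x in loess_normalize(red, green)))
```

Because the toy trend is exactly linear, the local linear fit reproduces it and the printed maximum is at rounding-error level; real arrays usually show curved trends, which is where a local rather than global fit is useful.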

Complete validation analysis

Stability of gene signatures may depend significantly on the selection of cases and on the choice of training and validation sets.31 We applied a standard complete validation procedure designed to control selection bias in predictive classification expression studies.32, 33 The procedure was applied within the high performance BioDCV computing system.

Validation of relative gene expression by quantitative RT-PCR

Quantitative RT–PCR was used to validate the results of the microarray experiments. We prepared 2 total RNA pools, one for each group of ARMS (PAX3-FKHR positive and negative), using the same amount of RNA from each tumor sample. A 3-μg aliquot of total RNA from each pool was used for 3 independent cDNA syntheses in a final volume of 10 μl, using oligo-dT primer and SuperScript II reverse transcriptase (Invitrogen). One-microliter aliquots of diluted first-strand cDNA were PCR amplified in a 10 μl volume using SYBR Green chemistry, according to the manufacturer's recommendations (Applera, Norwalk, CT). Gene-specific primers were designed with Primer 3 software to amplify fragments 120–180 bp long close to the 3′ end of the transcript. To avoid amplification of contaminating genomic DNA, we selected primers lying on distinct exons separated by a long intron (more than 1,000 bp). The dissociation curve was used to confirm the specificity of the amplicon. PCR reactions were performed in a GeneAmp 9600 thermal cycler coupled with a GeneAmp 5700 Sequence Detection System (Applied Biosystems, Foster City, CA). Thermal cycling conditions were as follows: 15 min denaturation at 95°C, followed by 35 cycles of 25 sec denaturation at 95°C and 1 min each of annealing and elongation at 59°C, and a final 3 min elongation at 72°C. To evaluate differences in gene expression, we chose a relative quantification method in which expression of the target gene is standardized to a nonregulated reference gene (GAPDH). The relative expression ratio was calculated with the mathematical model implemented in the REST software.34 This method is based on the PCR efficiencies and on the deviation of the mean crossing points between the two sets of samples (PAX3-FKHR negative and PAX3-FKHR positive). The expression ratios of the investigated genes were then tested for significance by a nonparametric randomization test.
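The efficiency-corrected relative expression model used by REST divides the change of the target gene by that of the reference gene: ratio = E_target^ΔCP_target / E_ref^ΔCP_ref, where each ΔCP is the mean crossing point of the control group minus that of the sample group. A minimal sketch follows; the crossing points are invented, the PAX3-FKHR positive pool is assumed to be the control group, and the function names are hypothetical. REST assesses significance by randomly reallocating samples between the groups; with only 3 replicates per group this sketch simply enumerates all 20 possible reallocations, keeping each target/reference pair together.

```python
import itertools
import math
import statistics


def rest_ratio(e_target, e_ref, control, sample):
    """Efficiency-corrected expression ratio of sample versus control.

    control, sample: lists of (cp_target, cp_ref) crossing points, one pair
    per replicate reaction; e_target, e_ref: amplification efficiencies
    (2.0 means perfect doubling per cycle).
    """
    # Delta CP = mean CP in the control group minus mean CP in the sample group.
    d_target = (statistics.mean(t for t, _ in control)
                - statistics.mean(t for t, _ in sample))
    d_ref = (statistics.mean(r for _, r in control)
             - statistics.mean(r for _, r in sample))
    return e_target ** d_target / e_ref ** d_ref


def randomization_p(e_target, e_ref, control, sample):
    """Two-sided p-value over every reallocation of replicates to the groups.

    Each (target, reference) pair comes from the same cDNA, so pairs are
    moved between groups as units.
    """
    observed = abs(math.log2(rest_ratio(e_target, e_ref, control, sample)))
    pooled = control + sample
    hits = total = 0
    for chosen in itertools.combinations(range(len(pooled)), len(control)):
        new_control = [pooled[i] for i in chosen]
        new_sample = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        ratio = rest_ratio(e_target, e_ref, new_control, new_sample)
        total += 1
        hits += abs(math.log2(ratio)) >= observed - 1e-9  # tolerance for rounding
    return hits / total


# Hypothetical crossing points for one target gene and GAPDH: three replicate
# cDNA syntheses per pool, with the PAX3-FKHR positive pool as the control.
positive_pool = [(25.0, 18.0), (25.2, 18.1), (24.8, 17.9)]
negative_pool = [(23.0, 18.0), (23.1, 18.1), (22.9, 17.9)]
ratio = rest_ratio(2.0, 2.0, positive_pool, negative_pool)
p = randomization_p(2.0, 2.0, positive_pool, negative_pool)
print(round(ratio, 2), p)  # → 4.0 0.1
```

In this exhaustive two-sided version the observed split and its mirror image always tie, so with 3 versus 3 replicates no result can reach p below 2/20 = 0.1; the randomization test is only as fine-grained as the number of possible reallocations.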


The Human Array 2.0

In this study we used a new release of the muscle-specific microarray platform,35 named Human Array 2.0.20, 21 It consists of 4,670 different clones, the vast majority obtained from systematic sequencing of human skeletal muscle and heart cDNA libraries and a small percentage from hematopoietic cell cDNA libraries (GEO Platform No. GPL2011). These libraries were produced with a strategy that selects a short 3′-end region of mRNAs36 and tags each transcript with a unique probe. Moreover, because the 3′-end noncoding region is the least conserved part of a gene, our microarray greatly reduces cross-hybridization to the same probe of mRNAs from genes with high similarity in the coding region (e.g., gene families). The strategy was also designed to obtain 3′ tags of very uniform size (300–600 bp), ensuring comparable efficiency in the amplification and printing of all cDNA clones in the collection.37 Given its tissue specificity, the microarray platform can identify genes that might play a role in the pathogenesis of ARMS and genes that might determine the different biological and clinical characteristics of ARMS subtypes.

Expression profiling of ARMS biopsies

Previous studies6, 7, 8 suggested that PAX3-FKHR positive and negative ARMS, despite their histological and immunohistochemical similarities, represent at least 2 different clinical entities. We used the Human Array 2.0 to define the gene expression profile of ARMS relative to fetal skeletal muscle. In addition, we identified a small set of genes whose expression differentiates translocation-positive from translocation-negative ARMS.

Because our experimental design yields relative expression values, the choice of an appropriate reference sample is of great importance. Since RMS may originate from muscle satellite cells1 or from uncommitted myogenic precursor cells,2 we first compared fetal and adult skeletal muscle as reference samples in competitive hybridizations with RNA from the same ARMS specimen (patient No. 6). Expression data from the two parallel experiments were analyzed with SAM28 at the same false discovery rate (0.29%), yielding 209 genes differentially expressed in the patient sample relative to fetal skeletal muscle and 714 relative to adult skeletal muscle (data not shown). The divergence from adult muscle was considered too high to allow a statistically meaningful comparison between profiles of different patients, so fetal muscle RNA was chosen as the reference sample for our experiments. Furthermore, given the very early onset of RMS, we considered fetal muscle the most appropriate “normal” tissue counterpart against which to compare the tumor profile to identify genes possibly involved in tumor biology.

Ten patients were included in the initial analysis: 5 positive for the chromosomal translocation and 5 negative. Their clinical characteristics are detailed in Table I. We compared the transcription profiles of the rhabdomyosarcoma samples by competitive microarray hybridization against a total RNA reference prepared from a pool of 5 human fetal skeletal muscles (Stratagene, Garden Grove, CA). We applied an unsupervised hierarchical clustering algorithm to the tumor expression profiles, grouping them by the similarity of their expression patterns and using the fluorescence values of all 4,670 genes on the array. Expression profiling divided the 10 patients into 2 distinct groups that reflect their PAX3-FKHR status (Fig. 1). This result was confirmed with different distance measures (Euclidean and Pearson correlation) applied to the expression dataset (GEO Series No. GSE2787).
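Unsupervised hierarchical clustering with complete linkage can be sketched in a few lines. This is a naive illustration, not the R `agnes` or J-Express implementation used in the study; the profiles are invented, the function names are hypothetical, and the Pearson distance is taken as 1 − r (a common convention, assumed here). At each step, complete linkage merges the two clusters whose farthest members are closest.

```python
import math


def pearson_distance(x, y):
    """1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def complete_linkage(profiles, dist, n_clusters):
    """Agglomerative clustering of samples; returns n_clusters index groups."""
    clusters = [[i] for i in range(len(profiles))]
    pair = {}
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            pair[i, j] = pair[j, i] = dist(profiles[i], profiles[j])

    def linkage(c1, c2):
        # Complete linkage: clusters are as close as their farthest members.
        return max(pair[i, j] for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        # Find and merge the two closest clusters (q > p, so index p is stable).
        p, q = min(((p, q) for p in range(len(clusters))
                    for q in range(p + 1, len(clusters))),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        merged = clusters.pop(q)
        clusters[p] = clusters[p] + merged
    return sorted(sorted(c) for c in clusters)


# Toy log2 ratios (4 samples x 5 genes), invented for illustration.
profiles = [
    [2.0, 1.8, 0.1, -1.0, -1.2],
    [1.9, 2.1, 0.0, -0.9, -1.1],
    [-1.1, -0.8, 0.2, 1.7, 2.0],
    [-1.0, -1.2, -0.1, 2.2, 1.9],
]
for dist in (pearson_distance, euclidean_distance):
    print(complete_linkage(profiles, dist, n_clusters=2))  # → [[0, 1], [2, 3]]
```

Running the same data through two distance measures, as done here, mirrors the check described above: a partition that survives both Pearson and Euclidean distances is less likely to be an artifact of the metric.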

Figure 1.

Hierarchical cluster analysis (complete linkage) of the gene expression profiles of 10 alveolar rhabdomyosarcoma patients, PAX3-FKHR positive (No. 1–5) and negative (No. 6–10). The dendrogram has the same shape with Euclidean distance or Pearson correlation. The analysis is based on the complete set of probes on our microarray platform (Human Array 2.0). Patients are clearly divided into two subgroups, in agreement with their molecular genetic characteristics. The dendrogram was obtained with the “agnes” function of the “cluster” package in the R statistical software.

Identification of transcripts commonly deregulated in all ARMS in comparison to fetal skeletal muscle

Comparison of the transcription profiles of tumor samples with fetal skeletal muscle by SAM identified, at a 0.21% false discovery rate, 171 differentially expressed genes common to all ARMS patients (Supplementary Information, s-Table I). Of these, 53 (31%) were overexpressed and 118 (69%) underexpressed. These genes were classified with FATIGO into 9 functional classes (Fig. 2a). Genes involved in morphogenesis (fast skeletal myosin alkali light chain 1 (MYL1), tropomyosin 1α (TPM1), tropomyosin 2β (TPM2) and desmin (DES)) and in cell motility (actin α1 (ACTA1), myosin heavy polypeptide 2 (MYH2) and nebulin (NEB)), which code for the principal components of the sarcomeric contractile machinery, were significantly downregulated in ARMS compared with fetal skeletal muscle. The 3 subunits of the troponin complex (troponin C (TNNC2), troponin I (TNNI2) and troponin T (TNNT3)), a key regulator of contraction in fast-twitch skeletal muscle, were also downregulated.
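The idea behind a one-class SAM analysis (all ARMS log ratios tested against zero, i.e., against the fetal muscle reference) can be sketched as follows. This is a simplified, SAM-like illustration with invented data and hypothetical function names, not the SAM program itself: it uses a fixed fudge constant `s0` and a fixed `cutoff` on the statistic d = mean / (standard error + s0), builds the null distribution from sign flips of each gene's values, and estimates the FDR as the median number of genes called under the null divided by the number called in the real data. SAM additionally tunes `s0`, compares genes against expected order statistics and corrects for the proportion of truly null genes.

```python
import itertools
import math
import statistics


def relative_difference(values, s0):
    """One-class SAM-style statistic: mean / (standard error + s0)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean / (sd / math.sqrt(n) + s0)


def one_class_fdr(data, cutoff, s0):
    """Call genes with |d| >= cutoff and estimate the false discovery rate.

    data: one list per gene of log2(tumor/reference) values, one per patient.
    The null is built from all 2**n sign flips of each gene's values (under
    H0 a log ratio is equally likely to be positive or negative).
    """
    called = [g for g, values in enumerate(data)
              if abs(relative_difference(values, s0)) >= cutoff]
    if not called:
        return called, 0.0
    n = len(data[0])
    null_counts = []
    for signs in itertools.product((1, -1), repeat=n):
        null_counts.append(sum(
            abs(relative_difference([s * v for s, v in zip(signs, values)], s0)) >= cutoff
            for values in data))
    return called, min(1.0, statistics.median(null_counts) / len(called))


# Toy data for 10 patients: 5 genes clearly up in every tumor, 15 null genes.
up = [2.0, 2.1, 1.9, 2.2, 1.8, 2.0, 2.1, 1.9, 2.05, 1.95]
null = [0.5, -0.4, 0.3, -0.6, 0.2, -0.1, 0.4, -0.3, 0.1, -0.2]
data = [up] * 5 + [null] * 15
called, fdr = one_class_fdr(data, cutoff=5.0, s0=0.1)
print(called, fdr)  # → [0, 1, 2, 3, 4] 0.0
```

Lowering `cutoff` calls more genes at a higher estimated FDR; this is the trade-off referred to above when the FDR threshold is adjusted manually to obtain a reasonable number of candidates.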

Figure 2.

(a) Functional classification of genes altered in ARMS. The transcripts differentially expressed in the 10 ARMS samples compared with fetal skeletal muscle are grouped into 9 classes according to their biological function, following the criteria of the program FATIGO. The p-value for each category, calculated with the R statistical software, is reported (GO: gene ontology). (b) ARMS differentially expressed genes identified by SAM. Hierarchical clustering of ARMS patients with a limited set of genes belonging to two functional classes relevant to tumor biology: cell growth and/or maintenance and cell communication. The expression of these two groups of functionally related genes appears generally altered in all ARMS patients relative to normal fetal skeletal muscle. Each column represents an ARMS sample profile and each row an individual gene. Normalized expression values are color coded: red represents upregulation and green downregulation relative to the fetal skeletal muscle control. The complete list of differentially expressed genes identified by the SAM algorithm is provided in the Supplementary Information (s-Table I).

Many differentially expressed genes appear to be involved in cell communication and cell growth and/or maintenance and could therefore play an important role in ARMS biology. These genes are listed in Figure 2b together with their expression levels in the 10 ARMS samples. Two of them, involved in the control of cell proliferation and differentiation, are of particular interest: galectin 1 (LGALS1) and the bridging-integrator protein-1 isoform BIN1+12A (BIN1, also called AMPHL2) were underexpressed in all ARMS (bold in Figure 2b). This leads us to hypothesize that their reduced expression may contribute to the failure of tumor precursor cells to withdraw from the cell cycle and differentiate, leading to abnormal proliferation. We also found a large group of deregulated genes in the functional category of cell metabolism, including genes for protein biosynthesis, which were generally overexpressed: eukaryotic translation initiation factor 2, beta (EIF2S2), proteasome β3 subunit (PSMB3), eukaryotic translation initiation factor 4A, isoform 2 (EIF4A2), ribosomal protein L37a (RPL37) and eukaryotic translation initiation factor 4H (WBSCR1). Finally, a large group of transcripts differentially expressed in ARMS are still functionally uncharacterized.

Transcripts discriminating PAX3-FKHR positive and negative ARMS

The PAM algorithm29 has been successfully applied to cancer class prediction from gene expression data in a variety of studies.38 The PAM classifier is based on the nearest shrunken centroid method, which gives higher weight to genes whose expression level is stable within samples of the same class. We applied PAM to the 10 ARMS profiles (the training set) and obtained 103 transcripts perfectly discriminating between PAX3-FKHR positive and negative ARMS (s-Table II). In general, a larger number of genes were overexpressed (relative to fetal skeletal muscle) in PAX3-FKHR negative ARMS than in the positive counterpart (Fig. 3a). These genes were grouped into functional classes with the program FATIGO. We focused on discriminating genes belonging to the classes of cell growth and communication (Fig. 3b), since they might include new candidate molecular markers for ARMS characterization and classification. Quantitative RT-PCR was used to measure the expression levels of 6 selected genes overexpressed in PAX3-FKHR negative ARMS: ras-related C3 botulinum toxin substrate 1 (RAC1), cofilin 1 (CFL1), RAB7, member RAS oncogene family (RAB7), prothymosin alpha (PTMA), cyclin D1 (CCND1) and insulin-like growth factor binding protein 2 (IGFBP2) (Supplementary Information, s-Table IV). As shown in the Supplementary Information (s-Table III), the expression values obtained by quantitative RT-PCR for all tested transcripts agreed with the microarray results.
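A minimal sketch of the nearest shrunken centroid method underlying PAM, written from its published description rather than from the PAM program. Each class centroid is expressed as a standardized distance d from the overall centroid; d is soft-thresholded by `delta`, so genes whose distance shrinks to zero in every class drop out of the classifier. The toy data, the fixed `delta` (PAM chooses it by cross-validation) and the function names are assumptions for illustration. Class probabilities are computed from the discriminant scores, as in the published method.

```python
import math
import statistics


def train_shrunken_centroids(X, y, delta):
    """Nearest shrunken centroid model.

    X: samples x genes (list of lists); y: class label per sample;
    delta: shrinkage threshold applied to the standardized distances.
    """
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    members = {k: [row for row, lab in zip(X, y) if lab == k] for k in classes}
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    centroid = {k: [sum(r[i] for r in rows) / len(rows) for i in range(p)]
                for k, rows in members.items()}
    # Pooled within-class standard deviation of each gene, plus s0.
    s = [math.sqrt(sum((row[i] - centroid[lab][i]) ** 2 for row, lab in zip(X, y))
                   / (n - len(classes))) for i in range(p)]
    s0 = statistics.median(s)
    shrunk, active = {}, set()
    for k in classes:
        m_k = math.sqrt(1 / len(members[k]) - 1 / n)
        shrunk[k] = []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (centroid[k][i] - overall[i]) / scale
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)  # soft threshold
            if d_shrunk != 0.0:
                active.add(i)
            shrunk[k].append(overall[i] + scale * d_shrunk)
    priors = {k: len(members[k]) / n for k in classes}
    return {"shrunk": shrunk, "scale": [si + s0 for si in s],
            "priors": priors, "active": sorted(active)}


def class_probabilities(model, x):
    """Posterior class probabilities for a new profile x."""
    score = {k: sum((xi - ci) ** 2 / si ** 2
                    for xi, ci, si in zip(x, cent, model["scale"]))
             - 2 * math.log(model["priors"][k])
             for k, cent in model["shrunk"].items()}
    best = min(score.values())  # subtract the minimum for numerical stability
    weight = {k: math.exp(-0.5 * (v - best)) for k, v in score.items()}
    total = sum(weight.values())
    return {k: w / total for k, w in weight.items()}


# Invented toy data: genes 0 and 1 separate the classes, genes 2 and 3 are noise.
X = [[2.0, -1.0, 0.1, 0.0], [2.2, -1.2, -0.1, 0.1], [1.8, -0.8, 0.0, -0.1],
     [-2.0, 1.0, 0.0, 0.1], [-2.2, 1.2, 0.1, -0.1], [-1.8, 0.8, -0.1, 0.0]]
y = ["positive"] * 3 + ["negative"] * 3
model = train_shrunken_centroids(X, y, delta=5.0)
print(model["active"])  # → [0, 1]
probs = class_probabilities(model, [-1.9, 0.9, 0.05, 0.0])
print(max(probs, key=probs.get))  # → negative
```

Raising `delta` shrinks more centroids onto the overall mean and leaves fewer active genes, which is how PAM trades a smaller gene list against classification error.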

Figure 3.

(a) Percentage of overexpressed transcripts among the discriminant genes identified by PAM in the ARMS profiles. PAX3-FKHR negative tumors generally show a higher percentage of upregulated genes (relative to fetal skeletal muscle) than PAX3-FKHR positive ARMS. (b) Discriminant genes identified by PAM analysis. Shown are genes of the cell growth and/or maintenance and cell communication functional classes that discriminate translocation-positive from translocation-negative ARMS samples, as obtained by PAM analysis of expression values. The complete list of differentially expressed genes identified by the PAM algorithm is provided in the Supplementary Information (s-Table II). Genes in bold were validated by quantitative RT-PCR.

The accuracy of the discrimination (obtained by cross-validation) was 100% for all patients except a single PAX3-FKHR negative patient (No. 10), for whom the accuracy dropped to 80% (Fig. 4a). To confirm the predictive value of these discriminating genes, we used the PAM algorithm to analyze the expression profiles of 4 additional ARMS specimens in a single-blinded protocol (the test set). The new samples were correctly classified by discriminant analysis as 2 translocation-positive and 2 translocation-negative ARMS (Fig. 4b), in agreement with the results of the diagnostic RT-PCR (Table I). When searching for discriminant genes, the algorithm must be trained on a homogeneous set of expression profiles to ensure a consistent analysis. In our initial PAM study, the homogeneity of the ARMS training set was compromised by sample No. 10. This patient was the only infant in our series and, as reported, RMS in very young patients (especially within the ARMS subgroup) may represent a different biological entity.39, 40, 41 For this reason we performed a second round of PAM analysis excluding the profile of sample No. 10. With this new training set, a group of only 10 genes discriminated the nine ARMS patients into two classes with 100% accuracy. These genes, all included in the list of 103 discriminant transcripts from the initial PAM analysis, are shown in bold in Table II. The 4 additional patients (No. 11–14, Table I), introduced blindly, were also classified with this small set of genes, again with 100% accuracy.

Figure 4.

PAM classification of ARMS patients. (a) Training set. The nearest shrunken centroid method found a set of 103 genes whose differential expression distinguishes the two ARMS subtypes. With these 103 genes, the PAM algorithm assigned 9 patients to the correct class with 100% accuracy; patient No. 10 was also classified correctly, but with 80% accuracy. (b) Test set. In a blind protocol, we analyzed the expression profiles of 4 additional ARMS samples (Table I, patients No. 11–14) using the 103 discriminant genes obtained from the training set. PAM analysis confirmed the predictive value of the discriminant genes, assigning all 4 samples to the correct class with 100% accuracy: 2 were classified as translocation-positive and 2 as translocation-negative, in agreement with the molecular genetic classification. PAX3-FKHR positive patients are represented by diamonds and negative patients by squares.

Table II. Top 50 Highest Ranking Marker Genes Discriminating PAX3-FKHR Positive and Negative ARMS Identified by Complete Validation Procedure

ID | Archive pos. | Gene symbol | # Exts | Mean | SD | Average ARMS positive for t(2;13) | SD | Average ARMS negative for t(2;13) | SD

Genes are sorted by their multiplicity of occurrence in the 441 lists: genes with higher occurrence are considered strongly discriminant. Gene multiplicity (# Exts), mean position in the list (Mean) and standard deviation (SD) of this mean are reported, together with the mean and standard deviation of the log2 ratio in positive and negative ARMS. Genes in bold were also found by PAM analysis.


Complete validation analysis

In a very recent paper, Michiels et al.31 challenged the methodologies used to identify gene signatures in microarray data for the prediction of cancer patient outcome. They generated multiple random training- and validation-sets and found that the lists of genes discriminating different cancer features were highly unstable. The molecular signatures seemed to depend strongly on the selection of patients in the training sets.

For this reason, our full ARMS dataset was resampled into different pairs of stratified training and test sets, consisting of 5 training samples and 2 test samples for each class. This partitioning strategy extends the 10 vs. 4 analysis discussed in the previous subsection. There are exactly 441 different stratified partitions. For each partition, Entropy-based Recursive Feature Elimination was applied with Support Vector Machine (SVM) classifiers to the training set, obtaining a ranked list of all genes. Accuracy for the k best genes was then estimated as the Average Test Error (ATE). Results are summarized in Figure 5, which displays ATE versus the number of features (genes) with 95% confidence interval bars obtained from bootstrap replicates. The same procedure was repeated with randomized labels to exclude overfitting effects. In this randomized-label experiment, the ATE over the 1,000 runs was close to the 50% no-information error rate and significantly different from the results with true labels. Furthermore, to investigate the stability of the classifiers at the level of single samples, the 14 sample-tracking profiles (Supplementary Information, s-Figure 1) were computed,23 showing that 5 of the PAX3-FKHR positive ARMS samples (No. 1, 2, 4, 5, 14) can be correctly identified with fewer than 20 genes. PAX3-FKHR negative samples No. 8, 9, 10 and 11 require at least 50 genes for optimal classification. Only sample No. 10 does not reach perfect classification even with the full panel of n1 = 4,992 genes; however, the sample-tracking profiles do not indicate No. 10 or any other sample as an outlier.
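The figure of 441 follows from choosing 2 test samples out of the 7 in each class: C(7,2)² = 21² = 441. The sketch below enumerates these stratified splits and computes an Average Test Error for any classifier supplied as a callable; the entropy-based SVM-RFE itself is not reproduced, and `fit_predict` is a hypothetical interface. Class membership follows Table I. A data-blind classifier that always predicts one class illustrates the 50% no-information error rate mentioned above.

```python
import itertools
import math


def stratified_partitions(pos_ids, neg_ids, n_test=2):
    """All train/test splits holding out n_test samples from each class."""
    for test_pos in itertools.combinations(pos_ids, n_test):
        for test_neg in itertools.combinations(neg_ids, n_test):
            test = sorted(test_pos + test_neg)
            train = sorted(s for s in pos_ids + neg_ids if s not in test)
            yield train, test


def average_test_error(splits, fit_predict, labels):
    """Mean test error over splits; fit_predict(train, test) -> {id: label}."""
    errors = []
    for train, test in splits:
        predicted = fit_predict(train, test)
        errors.append(sum(predicted[s] != labels[s] for s in test) / len(test))
    return sum(errors) / len(errors)


positive = [1, 2, 3, 4, 5, 12, 14]    # PAX3-FKHR positive in Table I
negative = [6, 7, 8, 9, 10, 11, 13]   # PAX3-FKHR negative in Table I
splits = list(stratified_partitions(positive, negative))
print(len(splits), math.comb(7, 2) ** 2)  # → 441 441

labels = {s: ("positive" if s in positive else "negative")
          for s in positive + negative}


def always_positive(train, test):
    # Ignores the data entirely; every test set is 2 positive + 2 negative.
    return {s: "positive" for s in test}


print(average_test_error(splits, always_positive, labels))  # → 0.5
```

Because every test set is balanced, any classifier that ignores the expression data lands at 50% error on average, which is the baseline the randomized-label runs are compared against.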

Figure 5.

Complete validation procedure. Predictive classification accuracy (ATE) with 95% confidence interval versus increasing number of features using real (decreasing line) and random class (upper line) labels.

Finally, the gene signature was assessed by ranking genes according to their multiplicity of occurrence in the 441 lists generated by feature selection: genes with higher occurrence are considered strong discriminants between ARMS positive and negative for t(2;13). Table II lists these top 50 genes.
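The multiplicity ranking behind Table II can be sketched as a frequency count over the ranked lists. The sketch assumes each list is truncated to its top `top_k` genes before counting (the cutoff used in the study is not stated in this section) and reports, per gene, the number of lists it appears in and the mean and SD of its position, mirroring the # Exts, Mean and SD columns. The function name and gene names are placeholders.

```python
import statistics
from collections import defaultdict


def multiplicity_ranking(ranked_lists, top_k):
    """Rank genes by how often they appear in the top_k of each ranked list.

    Returns (gene, occurrences, mean position, SD of position) tuples, most
    frequent first, ties broken by the better (lower) mean position.
    """
    positions = defaultdict(list)
    for ranking in ranked_lists:
        for pos, gene in enumerate(ranking[:top_k], start=1):
            positions[gene].append(pos)
    table = [(gene, len(p), statistics.mean(p),
              statistics.stdev(p) if len(p) > 1 else 0.0)
             for gene, p in positions.items()]
    table.sort(key=lambda row: (-row[1], row[2]))
    return table


# Three toy ranked lists with placeholder gene names.
lists = [["g1", "g2", "g3"], ["g2", "g1", "g4"], ["g1", "g5", "g2"]]
for row in multiplicity_ranking(lists, top_k=2):
    print(row)  # g1 (in 3 lists), then g2 (2 lists), then g5 (1 list)
```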


Cytogenetic and molecular analyses have demonstrated the frequent association of ARMS with the reciprocal translocation t(2;13)(q35;q14), which juxtaposes the PAX3 and FKHR genes. Previous reports suggest that this specific genetic alteration might be associated with an increased risk of treatment failure among ARMS patients.9, 10 Microarray methodology has already been used successfully to study RMS,16, 17, 18 and we applied it to investigate the molecular basis of RMS pathogenesis. We used expression profiling with cDNA microarrays on 10 ARMS: 5 positive for the t(2;13) translocation and 5 negative. We employed a dedicated muscle-specific microarray composed of 4,670 EST sequences (Human Array 2.0) and compared the transcription profiles of tumor samples with that of human fetal skeletal muscle, used as the normal tissue control. The very early onset of RMS supports the use of fetal skeletal muscle as the most suitable tissue against which to compare the tumor profile to find genes possibly involved in tumor biology. In this respect, genes differentially expressed in all tumor samples relative to fetal skeletal muscle could be considered important factors for rhabdomyosarcoma progression, but they should not be considered markers of the origin of RMS from specific precursor cells. In particular, we propose that two novel RMS marker genes involved in cell communication and cell growth/maintenance, galectin 1 and BIN1, could have an important role in ARMS tumorigenesis. Galectin 1 (LGALS1) is a β-galactoside-binding protein implicated in modulating cell–cell and cell–matrix interactions.
LGALS1 has been shown to stimulate apoptosis42 and to induce cell-cycle arrest at the S-G2 transition in mammalian cell lines.43 LGALS1 enhances the fusion of myoblasts into myotubes, which explains why galectin-1 is expressed at high levels when maximal cell fusion occurs during muscle development.44 Bridging integrator 1 isoform BIN1+12A (BIN1) is a nucleocytoplasmic adaptor protein that interacts with the Myc box region at the amino terminus of the c-Myc protein. Myoblasts overexpressing human BIN1 grow more slowly than control myoblasts and differentiate more rapidly when deprived of growth factors.45 In addition, in vitro studies have shown that BIN1 may induce membrane curvature, thus contributing to the biogenesis of T-tubules.46 Moreover, there is evidence that BIN1 participates in a cell-death mechanism engaged by c-Myc in tumor cells, so that its loss may help these cells escape the death mechanism associated with c-Myc activation.47 Consistently, BIN1 is frequently downregulated or inactivated in breast and prostate cancers and in malignant melanoma.48 A substantial number of the differentially expressed genes are still uncharacterized; investigating their role in the pathogenesis of ARMS is warranted, since some of them may help clarify the molecular basis and behavior of this tumor.
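
Selecting genes that are deregulated in every tumor relative to fetal skeletal muscle, as in the comparison discussed above, can be illustrated with a simple consistency filter on log2 ratios. This is a hedged sketch only: the ±1 log2 cutoff (two-fold change) and the function name `common_deregulated` are hypothetical choices, not the statistical criteria actually used in the study.

```python
import numpy as np


def common_deregulated(log2_ratios, genes, threshold=1.0):
    """Return genes changed beyond `threshold` in the same direction in all samples.

    log2_ratios: array of shape (n_genes, n_samples), tumor vs. fetal muscle.
    threshold: hypothetical |log2 ratio| cutoff (1.0 corresponds to two-fold).
    """
    ratios = np.asarray(log2_ratios, dtype=float)
    up = np.all(ratios >= threshold, axis=1)     # up-regulated in every tumor
    down = np.all(ratios <= -threshold, axis=1)  # down-regulated in every tumor
    up_genes = [g for g, keep in zip(genes, up) if keep]
    down_genes = [g for g, keep in zip(genes, down) if keep]
    return up_genes, down_genes


if __name__ == "__main__":
    toy = [[1.5, 2.0, 1.2],     # consistently up
           [-1.1, -2.0, -1.3],  # consistently down
           [1.5, 0.2, 1.8]]     # not consistent across samples
    print(common_deregulated(toy, ["g1", "g2", "g3"]))
    # (['g1'], ['g2'])
```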

An interesting result of our study is that transcriptional profiles obtained with the muscle microarray platform correctly discriminate PAX3-FKHR positive and negative patients, and that this property essentially relies on a small set of discriminating genes. This was confirmed by the blind analysis of 4 additional ARMS biopsies, which were correctly assigned to their subgroups, and, notably, by a standard complete validation procedure implemented to control selection bias in predictive classification studies of expression data.22, 23
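
The key idea behind controlling selection bias is that gene selection must be repeated inside each training split rather than performed once on all samples, and that the resulting error should be compared with the error obtained on randomly permuted labels. The sketch below illustrates that idea under stated assumptions: the t-like ranking score and the nearest-centroid classifier are hypothetical stand-ins, since this section does not specify the ranking or classifier used in the authors' complete validation procedure, and the data are synthetic.

```python
import numpy as np


def t_scores(X, y):
    """Absolute t-like score per feature, computed on the given samples only."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0) / len(a) + b.var(axis=0) / len(b)) + 1e-12
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se


def stratified_split(y, test_frac, rng):
    """Hold out about `test_frac` of each class for testing."""
    test = []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        test.extend(idx[:max(1, int(round(test_frac * len(idx))))])
    test = np.array(test)
    train = np.setdiff1d(np.arange(len(y)), test)
    return train, test


def average_test_error(X, y, k, n_splits=100, test_frac=0.3, seed=None):
    """Average test error (ATE) with gene selection redone in every split."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_splits):
        tr, te = stratified_split(y, test_frac, rng)
        # Rank genes on the training samples only, so test samples never
        # influence which features are selected (no selection bias)
        top = np.argsort(t_scores(X[tr], y[tr]))[::-1][:k]
        Xtr, Xte = X[np.ix_(tr, top)], X[np.ix_(te, top)]
        c0 = Xtr[y[tr] == 0].mean(axis=0)
        c1 = Xtr[y[tr] == 1].mean(axis=0)
        # Nearest-centroid prediction: class 1 if closer to the class-1 mean
        pred = (np.linalg.norm(Xte - c1, axis=1)
                < np.linalg.norm(Xte - c0, axis=1)).astype(int)
        errors.append(np.mean(pred != y[te]))
    return float(np.mean(errors))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 200))   # 20 synthetic samples, 200 genes
    y = np.array([0] * 10 + [1] * 10)
    X[y == 1, :5] += 3.0             # 5 informative synthetic genes
    real = average_test_error(X, y, k=5, seed=1)
    rand = average_test_error(X, rng.permutation(y), k=5, seed=1)
    print(f"ATE real labels: {real:.2f}, random labels: {rand:.2f}")
```

With real labels the ATE should fall well below the random-label curve, which is the pattern the complete validation figure is meant to show; if the two curves overlapped, the apparent discrimination would be attributable to selection bias.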

It is difficult to establish whether, among the deregulated genes that characterize the translocation-positive ARMS samples, there are specific transcripts directly related to this primary genetic event. Further functional studies are needed to investigate this issue, and some candidate genes are presently under study. Our work has instead identified a series of transcripts that could be important for the biology of the less characterized class of PAX3-FKHR negative ARMS. We focused on 4 discriminating genes that were upregulated in t(2;13)-negative ARMS: small GTP-binding protein Rac1 (RAC1), cofilin 1 (CFL1), cyclin D1 (CCND1) and insulin-like growth factor binding protein 2 (IGFBP2). The small GTP-binding protein RAC1, a member of the Rho family, is involved in a wide range of biological processes, including cell motility, adhesion, morphology, proliferation and inhibition of skeletal myogenesis. Rac1 has been shown to induce focal complex/adhesion turnover both directly, through PAK1,49 and indirectly, by antagonizing Rho activation.50 Moreover, it is widely accepted that Rac1 activation in response to extracellular signals enhances cell spreading and migration, promoting lamellipodia formation or membrane ruffling. Lamellipodia formation depends mainly on several changes in cytoskeletal structures. 
Rac1 stimulates LIM-kinase activity through activation of its downstream target Pak1.51 LIM-kinase phosphorylates and inactivates cofilin (CFL1), a protein that promotes actin depolymerization.52 Moreover, Rac1 has a key role in cell-cycle regulation, since it controls downstream signaling by mitogen-activated protein kinase (MAPK) pathways, including the p38 pathway53 and the c-Jun N-terminal kinase (JNK) pathway,54 as well as nuclear factor κB (NF-κB) activity.55, 56 A critical role for Rac1 in rhabdomyosarcoma tumorigenesis is also supported by other studies reporting that it is constitutively activated in RMS-derived cell lines.57 In summary, we hypothesize that RAC1 overexpression in PAX3-FKHR negative ARMS might inhibit cell-cycle exit and myogenic differentiation,58 supporting myoblast proliferation and neoplastic transformation. RAC1 and cofilin could also be involved in cell migration and might therefore be implicated in tumor invasion and metastasis. IGFBP2 has been reported to be markedly overexpressed in many tumors and tumor cell lines.59, 60, 61 This elevated IGFBP2 expression might be promoted by autocrine/paracrine stimulation via IGF2, which is often secreted by tumors in large quantities.62, 63 In addition, IGFBP2 expression levels were found to be altered in some rhabdomyosarcoma cell lines15 and in the RMS SAGE library.18 IGFBP2 contains an RGD motif in its C-terminus and, since this domain could mediate binding to integrins,64 it has been hypothesized that IGFBP2 can act on tumor cells independently of IGF through the integrin signaling pathway. Recently, binding of IGFBP2 to integrins has been shown to cause phosphorylation of FAK and p42/44 MAPK, 2 important components of the integrin-mediated signaling cascade. This interaction seems to affect cell proliferation and adhesion, thus contributing to tumor cell dissemination and progression.65

As we mentioned in the Introduction, other expression studies have been published on RMS. We therefore compared our expression data with the datasets published by Khan et al.,15, 16 Wachtel et al.,17 Baer et al.19 and Schaaf et al.18 In the first paper, published in 1998, Khan et al. used a cDNA microarray containing 1,238 cDNAs to investigate the gene signatures of 7 cell lines characterized by the presence of the PAX3-FKHR fusion gene. Among the 37 genes overexpressed in ARMS cell lines, we found 4 with the same expression profile in our dataset: cyclin-dependent kinase 4 (CDK4), insulin-like growth factor binding protein 2 (IGFBP2), small nuclear ribonucleoprotein polypeptides B and B1 (SNRPB) and TNF receptor-associated protein 1 (TRAP1). By contrast, we cannot compare our data with the second work of Khan et al., because its experimental design was too different: in that paper, the PAX3 gene and the PAX3-FKHR chimeric gene were independently introduced into the fibroblast-like cell line 3T3, and the differential effects on the cell transcriptome were investigated to clarify the action of these transcription factors. For similar reasons, we cannot compare our data with those of Baer et al., who analyzed the gene expression profiles of 12 childhood RMS and 12 Ewing's sarcomas, whereas we focused on the transcriptional signatures of ARMS positive and negative for t(2;13).

Wachtel et al. described the gene signatures of different RMS subgroups obtained with the Affymetrix HG-U133A oligonucleotide platform. Using a different approach (cDNA microarrays), we confirmed that PAX3-FKHR positive and negative ARMS are characterized by significantly different gene expression profiles, but only 3 of the differentially expressed genes found in our experiments agree with the dataset published by these authors: endothelial differentiation, lysophosphatidic acid G-protein-coupled receptor, 2 (EDG2), nibrin (NBS1) and hypothetical protein FLJ10853 (FLJ10853). This result is not surprising, because relatively large differences have been demonstrated between studies addressing the same experimental question with different microarray platforms, particularly when Affymetrix chips and spotted (deposition) microarrays are compared.66, 67

Notably, our gene expression dataset shows good agreement with the RMS transcriptome analysis recently obtained with the alternative genomic technology of SAGE.68 SAGE does not require a priori knowledge of transcript sequences and therefore provides a comprehensive view of transcript abundances. By simultaneously comparing subsets of SAGE libraries from fetal skeletal muscle, embryonal RMS and alveolar RMS, Schaaf et al. identified 251 differentially expressed genes, 166 of which are represented on our cDNA platform. Comparing the genes differentially expressed in our 5 t(2;13)-positive ARMS (s-Table I) with the SAGE datasets, we found 23 common differentially expressed transcripts: 20 of the 166 genes (12.0%) showed the same expression trend, while only 3 (1.8%) showed opposite regulation (MYLK2, RPL37, RPL13). The results of this comparison are shown in Table III.
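
The direction-of-change comparison between the two technologies, and the percentages reported above, can be reproduced with a short sketch. It assumes that the SAGE fold change is taken as the log2 ratio of normalized tag counts (per 100,000 tags) between translocation-positive ARMS and fetal skeletal muscle; the pseudocount that handles zero tag counts and the function names are hypothetical choices, not the method of the SAGE study.

```python
import math


def direction_call(array_log2, sage_tumor, sage_muscle, pseudocount=1.0):
    """Compare the sign of the array log2 ratio with the SAGE fold change."""
    # Pseudocount (hypothetical) avoids log of zero when a tag is absent
    sage_log2 = math.log2((sage_tumor + pseudocount) / (sage_muscle + pseudocount))
    if array_log2 == 0 or sage_log2 == 0:
        return "undetermined"
    return "same" if (array_log2 > 0) == (sage_log2 > 0) else "opposite"


def summarize(calls, n_on_platform):
    """Count concordant/discordant genes and express them as % of shared genes."""
    same = calls.count("same")
    opposite = calls.count("opposite")
    return same, opposite, 100 * same / n_on_platform, 100 * opposite / n_on_platform


if __name__ == "__main__":
    print(direction_call(1.5, 80.0, 10.0))   # same: up on both platforms
    print(direction_call(-1.2, 50.0, 5.0))   # opposite
    # Counts from the text: 20 concordant, 3 discordant, 166 shared genes
    same, opp, pct_same, pct_opp = summarize(["same"] * 20 + ["opposite"] * 3, 166)
    print(f"{same} same ({pct_same:.1f}%), {opp} opposite ({pct_opp:.1f}%)")
    # 20 same (12.0%), 3 opposite (1.8%)
```

Note that the percentages use the 166 genes shared by both platforms as denominator, not the 23 overlapping differentially expressed transcripts.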

Table III. Correlation Between Microarray and SAGE Datasets
Array-ID | Ref. seq. | Gene symbol | Locus link | Log2 ratio, average in ARMS positive for t(2;13) | Log2 ratio, standard deviation | SAGE tag count × 100,000, ARMS36 positive for t(2;13) | SAGE tag count × 100,000, fetal skeletal muscle
List of the 20 differentially expressed transcripts in t(2;13)-positive ARMS that are in agreement with the SAGE data. For each gene, the log2 ratio expression value and the number of SAGE tags in translocation-positive ARMS and fetal skeletal muscle are reported. SAGE expression values are tag counts normalized to 100,000 tags for each library (ref. 18).


In conclusion, our results demonstrate that a muscle-specific cDNA platform was able to identify tumor-specific gene expression profiles of human ARMS compared with fetal skeletal muscle and to correctly classify tumors into distinct groups (PAX3-FKHR positive and negative). By applying discriminant analysis to our expression dataset, we identified a small set of genes whose expression levels discriminate translocation-positive and -negative ARMS with strong predictive value.

Supplementary information

Online supplementary information is available at …; it contains complete sets of expression data and lists or profiles of altered transcripts found in our experiments. MIAME standards were followed in microarray data analysis and presentation. Expression datasets have been submitted to the GEO database (…): the accession number for the Human Array 2.0 platform is GPL2011, and the expression datasets form the GSE2787 series.


The authors thank Nicola Vitulo for bioinformatic management of the muscle transcript database. We are also grateful to Beniamina Pacchioni, Stefano Cagnin and Stefano Campanaro for help with microarray fabrication and to Stefania Bortoluzzi for critical evaluation. Special thanks go to Stefano Merler for developing the Complete Validation Analysis.