DNA microarray analyses of melanoma gene expression: a decade in the mines


*Address correspondence to Keith S. Hoek, e-mail: keith.hoek@usz.ch


Gene expression profiling in melanoma is an exercise in prospecting for fundamental molecular biologies useful for formulating hypotheses to explain disease characteristics. Over the last 10 years DNA microarray technologies have been employed in scores of such melanoma studies. As soon as the technology became available, researchers grasped the new tool and began to hammer away at melanoma’s molecular strata. A small army of data miners toiled into the dross, believing that glittering seams of relevance had been struck and from great slag heaps of data they would extract biological gold. What exactly has a decade of ceaseless sappering brought us, what can be profitably refined from what has already been extracted and what remains undiscovered? This review is a critical analysis of the many research programs that have attempted to define and contextualize broad transcriptional changes relevant to melanoma biology.

Introduction – the data miner’s canary

There was a time when coal miners, working in atmospherically hostile environments and needing some way to monitor air quality, brought down into the darkness small caged birds. The coal miner’s canary, more sensitive to toxic gases than the miner, provided an essential warning if it suddenly fell silent and died. Such a mechanism ensured that the miner’s working environment was both safer and more productive. For the high-throughput data miner, reproducibility (the final arbiter of hypothesis testing) is his miner’s canary. If he does not want to get into trouble he should not go to work without reference to it. This is illustrated by a recent review, which cataloged 69 genes reported by others as being transcriptionally affected by DNA methylation in melanoma (Rothhammer and Bosserhoff, 2007). Three of the cited sources, which together identified 49 of these genes, used DNA microarrays to assess changes effected by treatment of cell lines with 5-aza-2′-deoxycytidine (Gallagher et al., 2005; Muthusamy et al., 2006; Van Der Velden et al., 2003). Two additional uncited studies also used gene expression profiling to examine DNA methylation effects in melanoma (Karpf et al., 2004; Mori et al., 2005). It is conspicuous that the results of five studies with identical aims and nearly identical strategies show no consistency. The conclusion that must be drawn is that gene expression profiling of melanoma has failed to reproducibly reveal genes whose expression is affected by DNA methylation – the data miner’s canary has died (Figure 1). That the cited review accepted such different outcomes uncritically shows the canary’s passing went unnoticed. This failure to recognize inconsistency is symptomatic of how our view of DNA microarray data analysis is frequently dimmed by our unfamiliarity with it.
Gene expression profiling failed to expose the role of DNA methylation in melanoma because neither the researchers nor the reviewers gave sufficient attention to the statistical criteria essential to high-throughput investigations. This review is an attempt to critically assess the bulk of gene expression profiling in melanoma research, to differentiate robust findings from weak, to point out why some strategies succeed and others do not, and why we must keep an eye on that bird.

Figure 1.

 The data miner’s canary is dead. Jean-Baptiste Greuze’s La jeune fille qui pleure son oiseau mort (1765) as a metaphor for the state of statistical rigor in most high-throughput analyses of melanoma.

DNA microarrays are small platforms of glass or silica hosting tens of thousands of single-stranded DNA sequences. The sequences are fully characterized and deposited onto their platforms such that the identity and location of each is standardized. For gene expression arrays, these sequences correspond to (or complement) the sequences specific to mRNA transcripts. From cell cultures or tissues mRNA is extracted, labeled (or used to derive labeled cDNA) and hybridized to DNA microarrays. Unbound or weakly bound material is washed off and the platform is scanned to detect the label’s signal. Signal intensity gives an approximation of the relative proportion for each labeled sequence and therefore an estimate of gene expression in the sample. With very large numbers of different probes on a platform, concomitantly large sets of data are generated with each experiment. Critical to this review is the understanding that, from a statistical viewpoint, each probe sequence on a platform represents a single experiment among perhaps tens of thousands. It follows that, with standard significance testing, DNA microarrays are virtually guaranteed to produce false positives in the data. Mining large datasets for biologically relevant information must be considered with this in mind.
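The point that each probe is its own experiment can be illustrated with a small simulation. What follows is a minimal sketch and is not taken from any study reviewed here. It assumes a hypothetical array of 20,000 probes, two groups of five samples drawn from the same distribution (so no probe is truly different), and a z-test with known unit variance standing in for the usual t-test. All names and parameter values are illustrative assumptions.

```python
import random
from statistics import NormalDist

def count_null_hits(n_probes=20000, n_per_group=5, alpha=0.05, seed=0):
    """Count probes passing the P-value cutoff when no real difference exists."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference between two group means (sigma = 1).
    se = (2.0 / n_per_group) ** 0.5
    hits = 0
    for _ in range(n_probes):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        z = (sum(group_a) / n_per_group - sum(group_b) / n_per_group) / se
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))  # two-sided test
        if p_value < alpha:
            hits += 1
    return hits

hits = count_null_hits()
print(f"{hits} of 20000 null probes pass P < 0.05")
```

Under these assumptions roughly one probe in twenty passes the cutoff although none is genuinely changed, which is the behavior the paragraph above describes.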

The term ‘data mining’ once had a pejorative connotation among the statistical community, where it was associated with the inappropriate search for statistical significance within large datasets. Conventional statistical approaches begin with a testable hypothesis, collect relevant data and then test the significance of that data in relation to the hypothesis. It is key that the sample used to test (or validate) the hypothesis is not the same as that used to derive it. Data mining, as it was once known, involved the mistaken idea that analysis of a dataset to derive a hypothesis also constitutes the test; the consequence being that chance patterns in the data can come to be held as significant. Although statisticians now call this practice data dredging, much that is today termed data mining can still be referred to in its originally negative sense (Ioannidis, 2005a). When considering what we have gained as a research community from high-throughput analyses of melanoma, and this review will attempt to cover scores of such works, it is important that we can differentiate dredging from mining.

In discriminating good work from bad, it is informative to consider the nature of the experiments conducted, the data obtained and the pitfalls peculiar to high-throughput analysis. Initially, experiments included few or no replicates and their results were qualified by arbitrarily chosen fold-change thresholds, a strategy that is blind to biological variation and now discredited (Allison et al., 2006). Over time significance calculations became broadly insisted upon and researchers transitioned from reporting only fold-change to increasing use of statistics. It is also now recognized that having sufficient sample numbers or replicates is vital, and a minimum of five samples per class has become a general recommendation (Allison et al., 2006; Pavlidis et al., 2003; Pawitan et al., 2005; Tsai et al., 2003). Despite this, an additional and necessary statistical consideration, the control of false discovery rates, is still frequently ignored. As already mentioned, each probe or probe-spot on an array is its own experiment. When an array is employed to interrogate gene expression in two different groups of samples, a t-statistic is calculated for each such experiment and the corresponding P-value is compared against an arbitrarily chosen cutoff to assess its significance. The critical issue is that, because DNA microarrays perform tens of thousands of these experiments simultaneously, an estimable number of individual results will pass the P-value cutoff by chance alone. The number of false positives that occur depends on a combination of the number of probes on the array and the P-value cutoff chosen. For example, if you perform experiments with an array comprising a thousand distinct probes and use a P-value cutoff of 0.05, you can expect approximately 50 probes to erroneously reject the null hypothesis (that there is no significant difference in expression between sample classes). If your test reveals 55 genes as being significant then it is likely that about 90% of them (50 of 55) are false positives.
To take such a list of genes and try to use it to explain the biology of the system you are investigating (e.g. through GO term analyses) would be an essentially pointless exercise. Controlling false discovery rates involves either reducing (by orders of magnitude) the P-value cutoff or adjusting each probe’s calculated P-value according to the number of tests conducted, a strategy commonly referred to as multiple testing correction and widely accepted among high-throughput analysts (Allison et al., 2006; Benjamini and Hochberg, 1995). Perhaps more bedeviling than failure to address the above-mentioned criteria is non-replication of research discoveries, where conclusions are drawn on the basis of single studies (Ioannidis, 2005b). This is inherently risky in high-throughput research because very large datasets virtually ensure chance connections between sample groupings. For example, most of the studies mentioned in this review employ some form of clustering. As it is known that sharply defined clusters can be derived from random data, it is not possible to discern from an individual study whether the clustering observed is characteristic of its sample set or if it is reproducible using another (Michiels et al., 2007; Miller et al., 2004). An answer to this is to perform an identical study on one or more additional test sets to validate the hypothesis generated. Melanoma researchers have produced high-throughput studies of varying rigor across a broad range of melanoma-related topics. That so many works are of dubious value today is partly linked to the fact that the beginnings of high-throughput analysis in melanoma coincided with the earliest stages of DNA microarray development.
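The thousand-probe arithmetic and the idea of multiple testing correction discussed above can be made concrete with a short sketch. The first function reproduces the expected count of chance hits. The second is a plain implementation of the step-up adjustment described by Benjamini and Hochberg (1995), offered as an illustration rather than as the procedure used by any particular study. The four P-values are invented toy numbers.

```python
def expected_false_positives(n_probes, p_cutoff):
    """Expected number of probes passing the cutoff by chance alone."""
    return n_probes * p_cutoff

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so each adjusted value is monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# The example from the text: 1000 probes, cutoff 0.05, 55 probes called significant.
expected = expected_false_positives(1000, 0.05)
false_fraction = min(1.0, expected / 55)

# Toy P-values (hypothetical) adjusted and tested at a 5% false discovery rate.
toy_pvalues = [0.5, 0.001, 0.03, 0.01]
adjusted = benjamini_hochberg(toy_pvalues)
significant = [p <= 0.05 for p in adjusted]
print(expected, round(false_fraction, 2), adjusted, significant)
```

With only four tests the adjustment is mild. On an array with tens of thousands of probes the same formula multiplies the smallest P-values by a far larger factor, which is why uncorrected gene lists shrink so sharply once the correction is applied.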

Mining melanoma

In the mid-1990s, soon after the advent of DNA microarrays (Schena et al., 1995), melanoma was one of the first cancers recognized as an obvious and amenable target for their practical application. Between 1996 and 2006, at least 129 separate reports were published detailing experiments using DNA microarrays to investigate various aspects of gene expression in melanoma biology (Figure 2). The earliest of these works, which interrogated relatively few genes on custom-made platforms, were as much validations of DNA microarray technology as they were investigations of melanoma (Derisi et al., 1996; Maniotis et al., 1999). When technological development stabilized to the point that arrays became commercially available, array-based research programs were able to focus more on the disease itself. Studies seeking to identify improved clinical markers or understand malignant transformation compared the gene expression patterns of diseased and healthy tissues. Melanoma progression was analyzed by considering transcriptional signatures derived from different clinical stages, looking to explain their physiological dynamism and find improved progression markers. Gene expression patterns related to patient survival were sought in an effort to provide clinicians with more powerful prognostic tools. Healthy and diseased cells were subjected to a battery of different treatments, exploring everything from the effects of UV radiation on gene expression in melanocytes to the response of melanoma cells to various small-molecule challenges. An intensely investigated aspect of melanoma gene expression has been the examination of metastatic potential. The comparison of transcription patterns of cells characterized as being tumorigenic, invasive or otherwise aggressive against those of cells which are less so has been a vigorously pursued stratum.
The overall situation can be summarized as follows: an industrious decade of gene expression profiling in melanoma research has generated an enormous quantity of data – what have we gained?

Figure 2.

 Expression profiling reports in melanoma science. Output of refereed papers documenting gene expression profiling experiments performed on melanoma cell lines or tissues between 1996 and 2006. The proportion of papers employing Affymetrix platforms is highlighted in grey.

Breaking new ground

In 1996, a paper was published detailing the results of a collaboration between Pat Brown’s DNA microarray group at Stanford and Jeffrey Trent’s well-established cancer genetics team at the NIH. This collaboration employed Brown’s DNA microarrays to hunt for tumorigenicity genes in human melanoma samples supplied by Trent. The experiment focused on the tumorigenic UACC-903 line, in which tumorigenicity is suppressed when a normal chromosome 6 is introduced. With this strategy of comparison, it was hoped that a tumor suppressor would be identified. Indeed, on chromosome 6 the gene for Waf1 (CDKN1A), a p53-regulated factor involved in regulating cell cycle progression at G1, was shown to be upregulated with tumor suppression. The approach was able to confirm other genes identified previously through traditional approaches and added further candidates for factors involved in the regulation of tumorigenicity (Derisi et al., 1996). As a technical document outlining the practical application of a new technique the work remains noteworthy. However, from the perspective of more than a decade of additional experimentation and methodological improvement the results themselves are of questionable value. The platform used could interrogate fewer than one thousand different transcripts, representing <3% of all mRNA species. While their specific approach used an element of technical replication, crucial to establish the technique’s utility, no biological replication was employed and calculations of significance could not be made. Since that report, however, there have been a large number of similarly targeted programs, and these document a steady improvement in the way gene expression analysis in melanoma research has been conducted. Platforms have been enlarged to include almost all known transcript species. The numbers of biological replicates and use of statistics have improved. Unfortunately, full observance of minimum criteria has been rare.
Despite this there are cases where we may yet draw meaningful conclusions.

Transformation from melanocyte to melanoma

Understanding the molecular differences between normal pigmented cells and melanoma is a useful starting point for both identifying practical clinical markers and comprehending the role of transcriptional change in neoplastic transformation. Accordingly, a number of different groups have employed gene expression arrays to document the up- and downregulation of genes in melanoma when compared with healthy cells and tissue. The initial studies used small arrays and fold-change filtering to identify a small number of genes with expression differences between samples (De Wit et al., 2005; Dooley et al., 2003; Mirmohammadsadegh et al., 2004; Seykora et al., 2003; Zuidervaart et al., 2003). Taken together, these small studies show almost no agreement with one another, very likely because of their use of small platforms, few replicates and insufficient statistical stringency. There have been three studies using larger array formats to compare melanoma against normal cells or tissues (Haqq et al., 2005; Hoek et al., 2004; Talantov et al., 2005). Each of these identified putative molecular markers but only one also described signaling changes underlying transformation (Hoek et al., 2004). Systematic analysis of their respective gene lists shows that only two genes (CITED1 and CDH3) were identified by all three groups as being differentially expressed. For programs interrogating the greater fraction of expressed genes this is a vanishingly small yield. The study which included an assessment of co-regulation patterns within changed gene sets identified the Notch signal pathway as a candidate driver of transformation (Hoek et al., 2004). The problem is that despite its sophisticated correlative analyses this work used only two control samples. Therefore, in the absence of comparison to a larger and better controlled analysis, no firm conclusions can be reached.

With the passage of time and a growing enthusiasm for public databases it has only recently become possible to combine different datasets and carry out larger multistudy analyses. Such an analysis performed for this review shows that DNA microarray programs of sufficient scope may return with high reproducibility a large number of genes whose expression in melanoma is significantly changed. The strategy involved combining data from experiments which employed the same platform (in this case the Affymetrix HG-U133A probe set library) and source material (cell lines). Four sources were drawn upon to assemble a 28-sample melanocyte control set (Hoek et al., 2004, 2006; Magnoni et al., 2007; Ryu et al., 2007). Five separate melanoma sets, with a minimum of 12 samples per set, were obtained from three different studies (Hoek et al., 2006; Johansson et al., 2007; Packer et al., 2007). Each melanoma set was separately compared with the melanocyte control set to identify significant differences in gene expression (full details in Appendix S1-A). Finally, genes which were not identified as being down- or upregulated in all cases were discarded, and the remaining genes were ranked based on their average performance. This approach, which analyses each institutional dataset in isolation before the final qualitative filter, avoids any institutional bias. Table 1 shows 86 genes that undergo significant and reproducible downregulation in melanoma cells. Table 2 shows the top 150 genes (from a total of 1610; see Appendix S1) that undergo significant and reproducible upregulation in melanoma cells. The results recapitulate a large number of factors well known to melanoma research. These include the downregulation of epithelial cadherin, dipeptidyl peptidase IV and c-Kit; and the upregulation of known melanoma antigens (MAGEA6, MAGEA3, MAGEA12, and PRAME), neuronal cadherin and osteopontin. Additional genes confirm earlier reports such as the upregulated Notch-2.
Less familiar factors include a downregulated putative tumor suppressor (WFDC1) and upregulated tumor protein D52 (TPD52), neither of which has been specifically associated with melanoma before. These gene lists, selected by employing separate multiple-testing controlled analyses of five large sample sets and rigorous selection of uniform pattern reproducibility, currently represent the strongest candidates for gene expression change in melanoma neogenesis as measured by transcription profiling. A detailed examination of the mechanisms for their expression would likely provide useful additional clues about the process of transformation.
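The filtering step described above (analyse each melanoma set separately against the controls, discard any gene not changed in the same direction in every comparison, then rank the survivors by average performance) can be sketched as follows. This is only an illustration of the logic; the actual procedure is detailed in Appendix S1-A. The signed score, the gene labels and all numbers are hypothetical placeholders.

```python
from statistics import mean

def reproducible_genes(comparisons):
    """Keep genes significant in every comparison with a consistent direction.

    comparisons: list of dicts, one per melanoma set, mapping each gene called
    significant in that set to a signed score (positive = up in melanoma).
    Returns (upregulated, downregulated), each ranked by average score.
    """
    shared = set(comparisons[0])
    for result in comparisons[1:]:
        shared &= set(result)  # gene must be significant in all sets
    averaged = {}
    for gene in shared:
        scores = [result[gene] for result in comparisons]
        if all(s > 0 for s in scores) or all(s < 0 for s in scores):
            averaged[gene] = mean(scores)  # direction agrees everywhere
    up = sorted((g for g in averaged if averaged[g] > 0), key=lambda g: -averaged[g])
    down = sorted((g for g in averaged if averaged[g] < 0), key=lambda g: averaged[g])
    return up, down

# Three toy comparisons with invented scores.
toy_comparisons = [
    {"gene_A": 3.1, "gene_B": -2.0, "gene_C": 1.2, "gene_D": 0.8},
    {"gene_A": 2.5, "gene_B": -1.6, "gene_C": -0.9},
    {"gene_A": 2.9, "gene_B": -2.4, "gene_C": 1.1, "gene_D": 1.0},
]
up_genes, down_genes = reproducible_genes(toy_comparisons)
print(up_genes, down_genes)
```

In this toy case gene_C is dropped because its direction flips in one set, and gene_D is dropped because it is not significant in every set. Because each set is analysed on its own before the intersection, no single dataset can dominate the final list.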

Table 1.   Genes downregulated in melanoma

| Rank | ID | Gene symbol | Gene title |
|---|---|---|---|
| 1 |  | DPP4 | Dipeptidyl-peptidase 4 |
| 2 |  | CDH1 | Cadherin 1, type 1, E-cadherin (epithelial) |
| 3 | 203256_at | CDH3 | Cadherin 3, type 1, P-cadherin (placental) |
| 4 |  | FBLN1 | Fibulin 1 |
| 5 | 206498_at | OCA2 | Oculocutaneous albinism II (pink-eye dilution homolog, mouse) |
| 6 | 205051_s_at | KIT | v-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homolog |
| 7 | 218723_s_at | C13orf15 | Chromosome 13 open reading frame 15 |
| 8 | 219478_at | WFDC1 | WAP four-disulfide core domain 1 |
| 9 | 214580_x_at | KRT6 | Keratin 6A, keratin 6B, keratin 6C |
| 10 |  | ADH1B | Alcohol dehydrogenase IB (Class I), beta polypeptide |
| 11 |  | EDNRA | Endothelin receptor type A |
| 12 |  | SLC7A8 | Solute carrier family 7, member 8 |
| 13 | 212935_at | MCF2L | MCF.2 cell line derived transforming sequence-like |
| 14 | 218638_s_at | SPON2 | Spondin 2, extracellular matrix protein |
| 15 | 214071_at | MPPE1 | Metallophosphoesterase 1 |
| 16 | 205551_at | SV2B | Synaptic vesicle glycoprotein 2B |
| 17 | 209959_at | NR4A3 | Nuclear receptor subfamily 4, group A, member 3 |
| 18 | 204777_s_at | MAL | Mal, T-cell differentiation protein |
| 20 | 219315_s_at | C16orf30 | Chromosome 16 open reading frame 30 |
| 21 | 202283_at | SERPINF1 | Serpin peptidase inhibitor, clade F, member 1 |
| 22 | 206030_at | ASPA | Aspartoacylase (Canavan disease) |
| 24 | 209924_at | CCL18 | Chemokine (C-C motif) ligand 18 |
| 25 | 204187_at | GMPR | Guanosine monophosphate reductase |
| 26 | 219040_at | CORO7 | Coronin 7 |
| 27 | 218831_s_at | FCGRT | Fc fragment of IgG, receptor, transporter, alpha |
| 28 | 205824_at | HSPB2 | Heat shock 27 kDa protein 2 |
| 29 | 202286_s_at | TACSTD2 | Tumor-associated calcium signal transducer 2 |
| 30 | 220813_at | CYSLTR2 | Cysteinyl leukotriene receptor 2 |
| 31 | 202450_s_at | CTSK | Cathepsin K |
| 32 | 207761_s_at | METTL7A | Methyltransferase like 7A |
| 33 | 205738_s_at | FABP3 | Fatty acid-binding protein 3, muscle and heart |
| 34 | 206484_s_at | XPNPEP2 | X-prolyl aminopeptidase 2, membrane-bound |
| 35 | 219855_at | NUDT11 | Nudix (nucleoside diphosphate linked moiety X)-type motif 11 |
| 36 | 208017_s_at | MCF2 | MCF.2 cell line derived transforming sequence |
| 37 | 220748_s_at | ZNF580 | Zinc finger protein 580 |
| 38 | 203402_at | KCNAB2 | Potassium voltage-gated channel, beta member 2 |
| 39 | 212912_at | RPS6KA2 | Ribosomal protein S6 kinase, 90 kDa, polypeptide 2 |
| 40 | 212338_at | MYO1D | Myosin ID |
| 41 | 216066_at | ABCA1 | ATP-binding cassette, sub-family A (ABC1), member 1 |
| 42 |  | GABBR2 | Gamma-aminobutyric acid (GABA) B receptor, 2 |
| 43 | 209235_at | CLCN7 | Chloride channel 7 |
| 44 | 205620_at | F10 | Coagulation factor X |
| 45 | 201525_at | APOD | Apolipoprotein D |
| 46 | 206763_at | FKBP6 | FK506-binding protein 6, 36 kDa |
| 47 | 206039_at | RAB33A | RAB33A, member RAS oncogene family |
| 48 | 211462_s_at | TBL1Y | Transducin (beta)-like 1Y-linked |
| 49 | 204454_at | LDOC1 | Leucine zipper, downregulated in cancer 1 |
| 50 | 214382_at | UNC93A | unc-93 homolog A (C. elegans) |
| 51 | 219902_at | BHMT2 | Betaine-homocysteine methyltransferase 2 |
| 52 | 207610_s_at | EMR2 | egf-like module containing, mucin-like, hormone receptor-like 2 |
| 53 | 214088_s_at | FUT3 | Fucosyltransferase 3 |
| 54 | 220027_s_at | RASIP1 | Ras interacting protein 1 |
| 56 | 222123_s_at | HIF3A | Hypoxia inducible factor 3, alpha subunit |
| 57 | 209343_at | EFHD1 | EF-hand domain family, member D1 |
| 58 | 203308_x_at | HPS1 | Hermansky-Pudlak syndrome 1 |
| 59 | 220571_at | PRDM11 | PR domain containing 11 |
| 60 | 219697_at | HS3ST2 | Heparan sulfate (glucosamine) 3-O-sulfotransferase 2 |
| 61 | 214000_s_at | RGS10 | Regulator of G-protein signaling 10 |
| 62 | 219559_at | C20orf59 | Chromosome 20 open reading frame 59 |
| 63 | 219764_at | FZD10 | Frizzled homolog 10 (Drosophila) |
| 64 | 222025_s_at | OPLAH | 5-oxoprolinase (ATP-hydrolysing) |
| 65 | 209253_at | SORBS3 | Sorbin and SH3 domain containing 3 |
| 66 | 218208_at | PQLC1 | PQ loop repeat containing 1 |
| 67 | 202455_at | HDAC5 | Histone deacetylase 5 |
| 68 | 207033_at | GIF | Gastric intrinsic factor (vitamin B synthesis) |
| 69 | 211320_s_at | PTPRU | Protein tyrosine phosphatase, receptor type, U |
| 70 | 221902_at | GPR153 | G protein-coupled receptor 153 |
| 71 | 216258_s_at | SERPINB13 | Serpin peptidase inhibitor, clade B (ovalbumin), member 13 |
| 72 | 207366_at | KCNS1 | Potassium voltage-gated channel, delayed-rectifier, S1 |
| 73 | 221188_s_at | CIDEB | Cell death-inducing DFFA-like effector b |
| 74 | 217207_s_at | BTNL3 | Butyrophilin-like 3 |
| 75 | 215788_at | LOC339457 | Hypothetical protein LOC339457 |
| 76 | 204858_s_at | ECGF1 | Endothelial cell growth factor 1 (platelet-derived) |
| 77 | 219620_x_at | C9orf167 | Chromosome 9 open reading frame 167 |
| 78 | 37022_at | PRELP | Proline/arginine-rich end leucine-rich repeat protein |
| 79 | 220193_at | C1orf113 | Chromosome 1 open reading frame 113 |
| 80 | 205125_at | PLCD1 | Phospholipase C, delta 1 |
| 81 | 221794_at | DOCK6 | Dedicator of cytokinesis 6 |
| 82 | 207159_x_at | CRTC1 | CREB regulated transcription coactivator 1 |
| 83 | 207184_at | SLC6A13 | Solute carrier family 6, member 13 |
| 84 | 37278_at | TAZ | Tafazzin (cardiomyopathy, dilated 3A (X-linked) |
| 85 | 204968_at | C6orf47 | Chromosome 6 open reading frame 47 |
| 86 | 36829_at | PER1 | Period homolog 1 (Drosophila) |

Table 2.   Genes upregulated in melanoma

| Rank | ID | Gene symbol | Gene title |
|---|---|---|---|
| 1 | 214612_x_at | MAGEA6 | Melanoma antigen family A, 6 |
| 2 | 209942_x_at | MAGEA3 | Melanoma antigen family A, 3 |
| 3 | 210467_x_at | MAGEA12 | Melanoma antigen family A, 12 |
| 4 | 204086_at | PRAME | Preferentially expressed antigen in melanoma |
| 5 |  | TPD52 | Tumor protein D52 |
| 6 |  | HLA-DRB1 | Major histocompatibility complex, class II, DR beta 1 |
| 7 | 209875_s_at | SPP1 | Secreted phosphoprotein 1 (osteopontin) |
| 8 | 201291_s_at | TOP2A | Topoisomerase (DNA) II alpha 170 kDa |
| 9 | 203440_at | CDH2 | Cadherin 2, type 1, N-cadherin (neuronal) |
| 10 |  | ITGA6 | Integrin, alpha 6 |
| 11 | 218782_s_at | ATAD2 | ATPase family, AAA domain containing 2 |
| 12 | 202720_at | TES | Testis derived transcript (3 LIM domains) |
| 13 | 204702_s_at | NFE2L3 | Nuclear factor (erythroid-derived 2)-like 3 |
| 14 | 202149_at | NEDD9 | Neural precursor cell expressed, developmentally downregulated 9 |
| 15 | 220238_s_at | KLHL7 | Kelch-like 7 (Drosophila) |
| 16 | 202350_s_at | MATN2 | Matrilin 2 |
| 17 |  | PMAIP1 | Phorbol-12-myristate-13-acetate-induced protein 1 |
| 18 | 208894_at | HLA-DRA | Major histocompatibility complex, class II, DR alpha |
| 19 | 205110_s_at | FGF13 | Fibroblast growth factor 13 |
| 20 | 202291_s_at | MGP | Matrix Gla protein |
| 21 |  | MCAM | Melanoma cell adhesion molecule |
| 22 | 206070_s_at | EPHA3 | EPH receptor A3 |
| 23 | 207147_at | DLX2 | Distal-less homeobox 2 |
| 24 | 210095_s_at | IGFBP3 | Insulin-like growth factor-binding protein 3 |
| 25 |  | ITGA4 | Integrin, alpha 4 (antigen CD49D, alpha 4 subunit of VLA-4 receptor) |
| 26 | 219872_at | C4orf18 | Chromosome 4 open reading frame 18 |
| 27 | 213786_at | TAX1BP1 | Tax1 (human T-cell leukemia virus type I) binding protein 1 |
| 28 | 214642_x_at | MAGEA5 | Melanoma antigen family A, 5 |
| 29 | 213906_at | MYBL1 | v-myb myeloblastosis viral oncogene homolog (avian)-like 1 |
| 30 | 205609_at | ANGPT1 | Angiopoietin 1 |
| 31 | 214581_x_at | TNFRSF21 | Tumor necrosis factor receptor superfamily, member 21 |
| 32 | 201151_s_at | MBNL1 | Muscleblind-like (Drosophila) |
| 33 | 202979_s_at | CREBZF | CREB/ATF bZIP transcription factor |
| 34 | 219787_s_at | ECT2 | Epithelial cell transforming sequence 2 oncogene |
| 35 | 206233_at | B4GALT6 | UDP-Gal:betaGlcNAc beta 1,4-galactosyltransferase, polypeptide 6 |
| 36 | 206364_at | KIF14 | Kinesin family member 14 |
| 37 | 203302_at | DCK | Deoxycytidine kinase |
| 38 | 207480_s_at | MEIS2 | Meis homeobox 2 |
| 39 | 205227_at | IL1RAP | Interleukin 1 receptor accessory protein |
| 40 | 219250_s_at | FLRT3 | Fibronectin leucine rich transmembrane protein 3 |
| 41 | 217851_s_at | SLMO2 | Slowmo homolog 2 (Drosophila) |
| 42 | 203989_x_at | F2R | Coagulation factor II (thrombin) receptor |
| 43 | 213222_at | PLCB1 | Phospholipase C, beta 1 (phosphoinositide-specific) |
| 44 | 210293_s_at | SEC23B | Sec23 homolog B (S. cerevisiae) |
| 45 | 221258_s_at | KIF18A | Kinesin family member 18A |
| 46 |  | TIA1 | TIA1 cytotoxic granule-associated RNA-binding protein |
| 47 |  | DKFZP686A01247 | Hypothetical protein |
| 48 | 209291_at | ID4 | Inhibitor of DNA-binding 4, dominant negative helix-loop-helix protein |
| 49 | 210002_at | GATA6 | GATA-binding protein 6 |
| 50 |  | SFRP1 | Secreted frizzled-related protein 1 |
| 51 |  | PLCB4 | Phospholipase C, beta 4 |
| 52 | 205018_s_at | MBNL2 | Muscleblind-like 2 (Drosophila) |
| 53 | 213510_x_at | LOC220594 | TL132 protein |
| 54 | 207329_at | MMP8 | Matrix metallopeptidase 8 (neutrophil collagenase) |
| 55 | 203987_at | FZD6 | Frizzled homolog 6 (Drosophila) |
| 56 | 217979_at | TSPAN13 | Tetraspanin 13 |
| 57 | 201596_x_at | KRT18 | Keratin 18 |
| 58 |  | NUP160 | Nucleoporin 160 kDa |
| 59 | 213226_at | CCNA2 | Cyclin A2 |
| 60 | 205194_at | PSPH | Phosphoserine phosphatase |
| 61 | 203629_s_at | COG5 | Component of oligomeric golgi complex 5 |
| 62 | 205016_at | TGFA | Transforming growth factor, alpha |
| 63 | 218197_s_at | OXR1 | Oxidation resistance 1 |
| 64 | 201860_s_at | PLAT | Plasminogen activator, tissue |
| 65 | 209135_at | ASPH | Aspartate beta-hydroxylase |
| 66 | 210135_s_at | SHOX2 | Short stature homeobox 2 |
| 67 | 201407_s_at | PPP1CB | Protein phosphatase 1, catalytic subunit, beta isoform |
| 69 | 214718_at | GATAD1 | GATA zinc finger domain containing 1 |
| 70 | 203429_s_at | C1orf9 | Chromosome 1 open reading frame 9 |
| 71 | 203203_s_at | KRR1 | KRR1, small subunit (SSU) processome component, homolog (yeast) |
| 72 |  | KRIT1 | KRIT1, ankyrin repeat containing |
| 73 | 210892_s_at | GTF2I | General transcription factor II, i |
| 74 | 215629_s_at | DLEU2 | Deleted in lymphocytic leukemia, 2 |
| 75 | 202412_s_at | USP1 | Ubiquitin specific peptidase 1 |
| 76 | 206055_s_at | SNRPA1 | Small nuclear ribonucleoprotein polypeptide A’ |
| 77 | 202516_s_at | DLG1 | Discs, large homolog 1 (Drosophila) |
| 78 | 212105_s_at | DHX9 | DEAH (Asp-Glu-Ala-His) box polypeptide 9 |
| 79 |  | LOC54103 | Hypothetical protein LOC54103 |
| 80 | 212927_at | SMC5 | Structural maintenance of chromosomes 5 |
| 81 | 201663_s_at | SMC4 | Structural maintenance of chromosomes 4 |
| 82 | 210567_s_at | SKP2 | S-phase kinase-associated protein 2 (p45) |
| 83 | 209649_at | STAM2 | Signal transducing adaptor molecule (SH3 domain and ITAM motif) 2 |
| 84 | 201242_s_at | ATP1B1 | ATPase, Na+/K+ transporting, beta 1 polypeptide |
| 85 | 219049_at | ChGn | Chondroitin beta1,4N-acetylgalactosaminyltransferase |
| 86 | 222288_at |  | Transcribed locus, moderately similar to XP_517655.1 |
| 87 | 218326_s_at | LGR4 | Leucine-rich repeat-containing G protein-coupled receptor 4 |
| 88 | 201617_x_at | CALD1 | Caldesmon 1 |
| 89 | 211959_at | IGFBP5 | Insulin-like growth factor-binding protein 5 |
| 90 | 204361_s_at | SKAP2 | src kinase associated phosphoprotein 2 |
| 91 | 202498_s_at | SLC2A3 | Solute carrier family 2 (facilitated glucose transporter), member 3 |
| 92 | 213469_at | PGAP1 | GPI deacylase |
| 93 | 215286_s_at | PHTF2 | Putative homeodomain transcription factor 2 |
| 94 | 202620_s_at | PLOD2 | Procollagen-lysine, 2-oxoglutarate 5-dioxygenase 2 |
| 95 | 221059_s_at | COTL1 | Coactosin-like 1 (Dictyostelium) |
| 96 | 209657_s_at | HSF2 | Heat shock transcription factor 2 |
| 97 | 220092_s_at | ANTXR1 | Anthrax toxin receptor 1 |
| 98 | 201925_s_at | CD55 | CD55 molecule, decay accelerating factor for complement |
| 99 | 212446_s_at | LASS6 | LAG1 homolog, ceramide synthase 6 (S. cerevisiae) |
| 100 | 218073_s_at | TMEM48 | Transmembrane protein 48 |
| 101 | 202693_s_at | STK17A | Serine/threonine kinase 17a |
|  |  | SMCHD1 | Structural maintenance of chromosomes flexible hinge domain containing 1 |
| 104 | 204030_s_at | SCHIP1 | Schwannomin interacting protein 1 |
| 105 | 215220_s_at | TPR | Translocated promoter region (to activated MET oncogene) |
| 106 | 212747_at | ANKS1A | Ankyrin repeat and sterile alpha motif domain containing 1A |
| 107 | 203642_s_at | COBLL1 | COBL-like 1 |
| 108 | 201506_at | TGFBI | Transforming growth factor, beta-induced, 68 kDa |
| 109 | 205443_at | SNAPC1 | Small nuclear RNA activating complex, polypeptide 1, 43 kDa |
| 110 | 215997_s_at | CUL4B | Cullin 4B |
| 111 | 210990_s_at | LAMA4 | Laminin, alpha 4 |
| 112 | 202820_at | AHR | Aryl hydrocarbon receptor |
| 113 |  | EDEM3 | ER degradation enhancer, mannosidase alpha-like 3 |
| 114 | 218179_s_at | FLJ12716 | FLJ12716 protein |
| 115 | 203491_s_at | CEP57 | Centrosomal protein 57 kDa |
| 116 | 201772_at | AZIN1 | Antizyme inhibitor 1 |
| 117 | 204569_at | ICK | Intestinal cell (MAK-like) kinase |
| 118 | 202976_s_at | RHOBTB3 | Rho-related BTB domain containing 3 |
| 119 | 205187_at | SMAD5 | SMAD family member 5 |
| 120 | 213761_at | MDM1 | Mdm4, transformed 3T3 cell double minute 1, p53-binding protein (mouse) |
| 121 | 218352_at | RCBTB1 | RCC1 and POZ domain containing protein 1 |
| 122 | 213743_at | CCNT2 | Cyclin T2 |
| 123 | 212867_at |  | CDNA clone IMAGE:5314178 |
| 124 | 205394_at | CHEK1 | CHK1 checkpoint homolog (S. pombe) |
| 125 | 216902_s_at | RRN3 | RRN3 RNA polymerase I transcription factor homolog (S. cerevisiae) |
| 126 | 200685_at | SFRS11 | Splicing factor, arginine/serine-rich 11 |
|  |  | TPM1 | Tropomyosin 1 (alpha) |
| 129 | M97935_5_at | STAT1 | Signal transducer and activator of transcription 1, 91 kDa |
| 130 | 220176_at | NUBPL | Nucleotide-binding protein-like |
| 131 | 216870_x_at | DLEU2 | Deleted in lymphocytic leukemia, 2 |
| 133 | 213024_at | TMF1 | TATA element modulatory factor 1 |
| 134 | 219571_s_at | ZNF12 | Zinc finger protein 12 |
| 135 | 212774_at | ZNF238 | Zinc finger protein 238 |
| 136 | 222266_at | C19orf2 | Chromosome 19 open reading frame 2 |
| 137 | 211506_s_at | IL8 | Interleukin 8 |
| 138 | 217445_s_at | GART | Phosphoribosylglycinamide formyltransferase |
| 139 |  | MMP16 | Matrix metallopeptidase 16 (membrane-inserted) |
| 140 | 220253_s_at | LRP12 | Low density lipoprotein-related protein 12 |
| 141 | 213094_at | GPR126 | G protein-coupled receptor 126 |
| 142 | 212847_at | FUBP1 | Far upstream element (FUSE) binding protein 1 |
| 143 | 216915_s_at | PTPN12 | Protein tyrosine phosphatase, non-receptor type 12 |
| 144 | 208047_s_at | NAB1 | NGFI-A-binding protein 1 (EGR1-binding protein 1) |
| 145 | 208310_s_at | C7orf28 | Chromosome 7 open reading frame 28 |
| 146 | 218637_at | IMPACT | Impact homolog (mouse) |
| 147 | 214708_at | SNTB1 | Syntrophin, beta 1 (dystrophin-associated protein A1, 59 kDa, basic component 1) |
| 148 | 202351_at | ITGAV | Integrin, alpha V (vitronectin receptor, alpha polypeptide, antigen CD51) |
| 149 | 204172_at | CPOX | Coproporphyrinogen oxidase |

Comparison of these results against the findings of the only previous large study to use cell lines to examine transformation (Hoek et al., 2004) shows down- and upregulated gene lists with significant (P < 10⁻¹⁵ for both) overlap. The high significance level of overlap indicates that this earlier study may be viewed with confidence despite having few control samples. The two other studies addressing transformation (Haqq et al., 2005; Talantov et al., 2005) extracted RNA directly from tissues rather than from cell lines. The overlap between their gene lists and those of the present study is of low or no significance (Table 3). It is likely that the strategy of sourcing tissues rather than cell lines underlies the observed disagreement. To assess this possibility a qualitative comparison with the results of Tables 1 and 2 was performed using reanalyzed data originally obtained with a different microarray platform. Kaufmann et al. (2007) used Agilent arrays to generate a large dataset including both melanocytes and melanomas to explore the effect of cell cycle checkpoint function in melanoma. While they did not address the question of differences between melanocytes and melanoma per se, they did make their data publicly available. A reanalysis of this dataset comparing melanocytes against melanoma (Appendix S1-B) yields down- and upregulated gene sets with significant (P < 10⁻¹⁸ and P < 10⁻¹⁰ respectively) overlap with the present multiple dataset study. This shows that the gene sets outlined in Tables 1 and 2 are results primarily relevant to cell line-specific analyses.

Table 3.   Extent of agreement between the present transformation analyses and others
ᵃ Genes or probe sets identified as being significantly changed between diseased and healthy cells or tissues.
ᵇ Genes or probe sets shared with the meta-analysis performed for this review (Appendix S1-A).
ᶜ Cumulative P-value for the intersection as calculated by hypergeometric distribution.
ᵈ Data were used for reanalysis (Appendix S1-B).
ᵉ Extracted RNA directly from tissue samples.
NS, not significant.

Hoek et al. (2004)   27420   <10⁻¹⁵   31688   <10⁻¹⁵
Kaufmann et al. (2007)ᵈ   70426   <10⁻¹⁸   25550   <10⁻¹⁰
Talantov et al. (2005)ᵉ   43931   NS
Haqq et al. (2005)ᵉ   4048   <0.003   67278   NS
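
The cumulative hypergeometric P-value used in Table 3 to score list overlap can be sketched in a few lines of Python. This is a minimal illustration and not the code used for this review: the function name, the size of the gene universe and the list sizes in the usage line are hypothetical values chosen only to show the calculation.

```python
from math import comb


def overlap_p_value(universe, list_a, list_b, shared):
    """Cumulative hypergeometric P-value for a gene list intersection.

    Returns the probability of seeing `shared` or more genes in common when
    a list of `list_b` genes is drawn at random from `universe` genes, of
    which `list_a` belong to the other list.
    """
    max_shared = min(list_a, list_b)
    total = comb(universe, list_b)
    # Sum the upper tail: every possible overlap from `shared` upwards.
    tail = sum(
        comb(list_a, k) * comb(universe - list_a, list_b - k)
        for k in range(shared, max_shared + 1)
    )
    return tail / total


# Hypothetical example: 20 000 interrogated genes, lists of 300 and 274 genes,
# 20 genes in common.
print(overlap_p_value(20000, 300, 274, 20))
```

The result depends strongly on the assumed universe, which should be the set of genes interrogated by both platforms rather than the whole genome.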

As unequivocal as the results appear, a closer analysis of expression patterns across samples shows that many genes have significantly higher variance than others. For example, two top-ranked genes are the well-known melanoma antigens MAGEA6 and PRAME. These both have large variance across melanoma samples. By comparison MDM1 and SPAST, also both among the top upregulated genes in melanoma, show much smaller variance across the samples (Figure 3). We frequently grant pre-eminence to genes by ranking them according to how much their expression changes between sample groups. This performance metric is often an averaged assessment of a gene’s expression among one set of samples as compared to the average of its expression among the samples of another. However, as shown in Figure 3, we find many genes given prominence in this way show significant variance. If a gene is given preference based only on average performance, how are we to account for the instances where its performance is nil (or nearly so) and yet the condition this performance is linked to – and perhaps inferred to explain – remains? Genes for which variance is small, whose performance is reproducibly maintained among samples, are more tightly linked to the condition in terms of either cause or consequence. On the other hand, genes for which variance is sufficiently large cannot be so tightly linked, but may instead be linked to a variable character of the condition (e.g. stage progression or metastatic potential). Therefore, care must be taken when interpreting the results of gene expression comparisons between groups, particularly when one or more of the groups are heterogeneous.

Figure 3.

 Differences in variance between genes with significantly changed expression in melanoma. Normalized gene expression data for PRAME, MAGEA6, MDM1 and SPAST are plotted against the Johansson melanoma sample set. All of these genes are upregulated in melanoma, and while PRAME and MAGEA6 have higher fold-change averages they have greater variance than MDM1 and SPAST.
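
The point about averages hiding variance can be made concrete with a small sketch. The Python below is an illustration only: the function name, the one-unit margin and the two expression profiles are invented (log2-scale values are assumed) and do not come from any dataset discussed here. It reports, alongside the mean difference, the spread within the disease group and the fraction of disease samples in which the gene is clearly raised above every control.

```python
from statistics import mean, stdev


def summarize(control, disease, margin=1.0):
    """Summarize one gene, assuming log2-scale expression values.

    `margin` is a hypothetical threshold: a disease sample counts as 'up'
    only if it exceeds the highest control value by more than `margin`.
    """
    threshold = max(control) + margin
    up = sum(1 for value in disease if value > threshold)
    return {
        "mean_difference": mean(disease) - mean(control),
        "disease_stdev": stdev(disease),
        "fraction_up": up / len(disease),
    }


# Invented profiles: one gene with a large but inconsistent change, and one
# with a smaller change that is present in every disease sample.
control = [5.0, 5.1, 4.9, 5.0]
profiles = {
    "variable_gene": [5.0, 11.0, 5.2, 10.8],
    "consistent_gene": [7.0, 7.1, 6.9, 7.0],
}

for name, disease in profiles.items():
    print(name, summarize(control, disease))
```

Ranking by mean difference alone would place the variable gene first, even though it is unchanged in half of the invented disease samples.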

Progression from primary lesion to distal metastasis

Early in a primary lesion’s development there is frequently a phase in which proliferation is restricted to the epidermis (melanoma in situ). Following this, there is a distinct change to the growth pattern such that cells invade the dermis and proliferate there. Early intra-epidermal invasion (radial growth phase, RGP) is distinguished from proliferation deeper into the dermis (vertical growth phase, VGP). The extent of this penetration is an accurate prognostic marker for patient survival (Breslow, 1978). These and other well-defined clinical stages of melanoma progression have long informed molecular studies of the disease. Lesions and cells isolated from different stages have been examined in various ways in order to understand progression. For example, extensive immunohistochemical analyses of lesions have identified more than a dozen ‘progression antigens’ (De Wit et al., 1992; Ferrier et al., 1998; Manten-Horst et al., 1995; Moretti et al., 1997; Niezabitowski et al., 1999; Silye et al., 1998; Vaisanen et al., 1998; Van Belle et al., 1999; Van Kempen et al., 2000).

Subsequently, DNA microarrays were used to look for better progression markers as well as to explore the underlying biology of stage progression. Leaving aside programs which used few or no replicates (Baldi et al., 2003; Gallagher et al., 2005; Xi et al., 2006), there were several studies which interrogated stages of progression with sufficient samples for some form of statistical assessment. From these, it is clear that hierarchical clustering of data obtained from tissues was an accurate method for distinguishing melanoma stage groups (Haqq et al., 2005; Smith et al., 2005). While the capacity to accurately distinguish between RGP and VGP primaries in uncertain cases may prove useful in the clinic, it also shows that there are specific molecular differences between these stages. Whether or not these differences are due to melanoma cells per se remains debatable because cell lines derived from different stages are not so clearly distinguishable by transcription profiling (Jaeger et al., 2007).

The first study considered skin, nevi, primaries, and metastases (Haqq et al., 2005). Controlling for false positives, the authors identified 2602 genes which could be used to distinguish between stages (Haqq et al., 2005). Another study used class discovery analysis to first cluster samples without reference to their clinical stage. The purpose of this was to see if unbiased data analysis could correctly identify stage groups. Two distinct groups of samples emerged, respectively comprising early stage (skin, nevi, and in situ melanoma) and advanced stage samples (vertical growth phase, lymph node metastases, and distal metastases). This suggested that the most significant change in gene expression during progression takes place between in situ and vertical growth phase melanomas. Multiple testing corrected ANOVA was used to pick out genes with significant differences between early and late stage groups. Prominent among these were CITED1, which was highlighted by the previously mentioned transformation studies, and SPP1 (osteopontin), identified by Haqq et al. (2005). Self-organized mapping (SOM) of this data performed the dual role of a fold-change filter and collator of co-regulated genes to identify many factors involved in mitotic cell cycle regulation and cell proliferation (Smith et al., 2005). A third study made superficial use of DNA microarrays to specifically assess, absent multiple testing controls, only the known pro-apoptotic genes. This showed that many of these are downregulated early in progression, confirming that the transition between thin and thick primaries is the point of greatest change in gene expression patterns (Jensen et al., 2006). To support this concept we can contrast these findings with a similar analysis which compared primary lesions against metastases without distinguishing thick primaries from thin. This study identified 308 genes discriminating between primary and metastatic forms and highlighting changes in cell cycle regulation, mitosis, cell communication and cell adhesion (Jaeger et al., 2007). A more rigorous analysis of their data using the methods employed for this review (Appendix S1-C) yields 127 genes with significant expression differences between primary and metastatic samples (Appendix S2). An identical analysis of the Smith dataset (considering only the probe sets that were interrogated in the Jaeger dataset), which compared early stage samples against late stage samples, yields 2932 genes. Intriguingly, there are significant overlaps in the down- (P < 10⁻²⁹) and upregulated (P < 10⁻⁶) gene lists. This difference in yield, and not content, is likely because both Smith and Jensen identified that VGP tumors are transcriptionally more similar to metastases than to RGP tumors, whereas the Jaeger dataset does not make the distinction.

What we can therefore say from these studies is that between the radial and vertical growth phases a significant change in gene expression occurs to encourage proliferation and suppress apoptosis. While there is not enough data to confidently assess differences between later stages it may be that these will be less dramatic than the changes which differentiate RGP and VGP lesions.

Patient survival

Closely related to disease progression is its recognized link with patient survival. Removal of primary melanomas before they exceed 1 mm in depth yields a high cure rate, but as depth increases patient outlook worsens in a proportional manner (Breslow, 1978). In melanoma, there are few prognostic tools as powerful as the Breslow index, although its use is restricted to primary lesions and there are no strong prognostic indicators for more advanced stages of the disease. For clinically minded researchers DNA microarray analyses offer an opportunity to extract independent prognostic information for metastatic patients (Kim et al., 2002). Several reports have since been published which attempt to assess the usefulness of this approach to clinicians.

An early analysis sought to characterize transcriptional differences between metastases which responded to immunotherapy and those which did not. Although the authors were unsuccessful, they nevertheless identified a small number of genes linked with immune regulation whose expression changed significantly upon treatment (Wang et al., 2002). A later study looking at survival in uveal melanoma patients was able to identify a DNA double-strand break repair gene (NBS1) with prognostic value, validating this by immunohistochemical analysis of an enlarged sample set (Ehlers et al., 2005).

The group of J. William Harbour has over recent years demonstrated that a critical key to successfully delineating the clinical relevance of transcription signatures lies in the comparison of isolated datasets. Working with uveal melanoma and performing experiments on fresh tumor samples they found that primary uveal melanomas cluster into two distinct subtypes. From this a gene signature was extracted which both defined subtype membership and correlated with metastatic death (Onken et al., 2004). While this study has previously been criticized for poor cross-validation (Dupuy and Simon, 2007), these workers would later go on to perform sufficient cross-validation by demonstrating the same result using fine needle biopsy specimens. Critically, they extracted their gene signature from a total of three different datasets treated in isolation and further validated their findings by comparison with a fourth dataset generated by another group (Onken et al., 2004, 2006; Tschentscher et al., 2003). Furthermore, their gene signature was later shown to be better at sample classification than monosomy 3 and other clinicopathologic prognostic factors (Worley et al., 2007). Even without formal multiple testing controls these works together demonstrate how the combination of independent datasets can be used to pare away false positives generated by individual experiments.

The most extensive examination of survival and gene expression in primary cutaneous melanoma patients was conducted on behalf of the Melanoma Group of the European Organization for Research and Treatment of Cancer (EORTC). Rigorous class comparison analyses of data obtained from 82 primary tumors identified a panel of 254 genes capable of separating samples on the basis of metastasis-free survival. The discriminatory power of this gene set was validated using a separate population of 17 samples. Among the genes identified are factors involved in DNA replication and cell division processes (Winnepenninckx et al., 2006). A smaller program by Mandruzzato and colleagues, which examined 43 stage III and IV metastases, identified 30 genes which could be used to distinguish between patients with longer or shorter survival times. No separate sample set was used for validation. The overlap between the discriminatory gene lists of Winnepenninckx and Mandruzzato amounts to only two genes (P < 0.11). There are two possible explanations for the lack of significant overlap between studies. The first is that the Winnepenninckx analysis considered only primary melanomas while the Mandruzzato analysis concentrated on later stage metastases, and it has already been shown that these may be quite different at the level of transcription (Haqq et al., 2005; Smith et al., 2005). The second is that the base methodologies used by these groups for identifying survival-associated genes are subtly different. The Winnepenninckx list was identified by clustering using a correlation coefficient which relies on measuring the angular separation of data vectors. The Mandruzzato set was picked out using Euclidean distance, which considers the distances between vector termini. In simpler terms, Euclidean distance takes into account the magnitude of the differences while correlation coefficients, being insensitive to magnitude, take into account trends of change. Because shared trends are considered biologically relevant, magnitude is thought to be less important. Nevertheless, as long as downstream users of these lists employ appropriate clustering metrics it is probable that they will perform well (Gibbons and Roth, 2002). By contrast, using cell lines (instead of lesion biopsies) has not generated gene lists that correlate with patient prognosis (Bittner et al., 2000; Hoek et al., 2006). This is likely due to the changes brought by cell culturing, and indicates that once cells are removed from their in vivo context they may also lose the information that grants them prognostic potential. From the biopsy studies, the development of clinically useful prognostic tools looks promising.
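
The difference between the two clustering metrics can be shown with a toy example. The sketch below is an illustration under stated assumptions: Pearson correlation is used as the correlation coefficient (the exact coefficient used by either group is not specified here), and the three expression profiles are invented. Profile `b` follows the same trend as `a` at twice the magnitude, while `c` runs in the opposite direction.

```python
from math import sqrt


def euclidean_distance(x, y):
    """Distance between vector termini; sensitive to magnitude."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_distance(x, y):
    """1 minus the Pearson correlation; sensitive to trend, not magnitude."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance = sum(p * q for p, q in zip(dx, dy))
    norm = sqrt(sum(p * p for p in dx)) * sqrt(sum(q * q for q in dy))
    return 1 - covariance / norm


# Invented profiles across four samples.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]  # same trend as a, twice the magnitude
c = [4.0, 3.0, 2.0, 1.0]  # opposite trend, same magnitude

print("Euclidean:  ", euclidean_distance(a, b), euclidean_distance(a, c))
print("Correlation:", correlation_distance(a, b), correlation_distance(a, c))
```

By Euclidean distance `a` sits closer to the opposing profile `c` than to `b`, whereas the correlation-based distance groups `a` with `b`. A gene list selected under one metric may therefore cluster poorly under the other.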

Treatment studies – a wealth of dross

Melanocytes and melanoma cells have been assaulted with a battery of treatments to observe changes in their transcription patterns. However, for the statistical purist nearly all of these comprise a litany of defeat as almost no workers controlled or estimated false discovery rates. These studies, noted here for interested readers to follow up independently, include examinations of the effects on melanoma cells of 5-aza-2′deoxycytidine demethylation (Gallagher et al., 2005; Karpf et al., 2004; Mori et al., 2005; Muthusamy et al., 2006; Van Der Velden et al., 2003), agouti signal protein (Voisey et al., 2003), aloe-emodin (Lin et al., 2005), activating enhancer-binding protein 2 (Suyama et al., 2002), arbutin (Cheng et al., 2007), diterpene ester (Cozzi et al., 2006), E2F transcription factor 1 (Jamshidi-Parsian et al., 2005), eugenol (Ghosh et al., 2005a), fascin (Vignjevic et al., 2006), homeobox D3 (Okubo et al., 2002), hypoxia (Kunz et al., 2003), integrins (Dome et al., 2005), interferon-induced ubiquitin-like modifier (Padovan et al., 2002), kojic acid (Cheng et al., 2006), monoclonal antibody therapy (Hafner et al., 2005), passaging (Vogl et al., 2005), promyelocytic leukemia zinc finger protein (Felicetti et al., 2004; Shiraishi et al., 2007), retinoid CD437 (Zhao et al., 2001), STAT3 activation (Schick et al., 2004), three-dimensional culturing (Ghosh et al., 2005b,c), tissue factor (Wang et al., 2004), transforming growth factor β (Berking et al., 2001; Foser et al., 2006), vascular-endothelial growth factor (Minami et al., 2004) and UV light (Jean et al., 2001; Valery et al., 2001; Yang et al., 2006). Some of the above factors were analyzed by multiple groups, and in each case there was no agreement in outcome. For all of these studies, the manifold failures to address specific requirements in high-throughput analysis mean the value of their respective findings is suspect. 
Therefore, these areas of investigation remain open questions ready to be answered by those equipped to conduct sufficiently controlled experiments.

One study that did control the false discovery rate in its analysis examined the effects of telomerase suppression on melanoma cells (Bagheri et al., 2006). Telomerase is a ribonucleoprotein complex that functions to maintain the protective telomeric repeat sequences capping eukaryotic chromosomes (Greider and Blackburn, 1987). In normal human somatic cells this activity is suppressed, but in cancers it is re-activated (Kim et al., 1994). As with most other cancers, melanoma has also been shown to overexpress this complex (Cheng et al., 1997). Bagheri et al. (2006) used ectopic expression of a ribozyme that targets the core RNA moiety of the telomerase complex to suppress its activity. They showed that this treatment significantly reduced metastatic progression of B16 melanoma cells in vivo. Performing transcription profiling on multiple replicates and controlling false positives allowed them to determine that telomerase suppression resulted in the downregulation of 134 genes. Many of these are involved in transcriptional regulation, cell proliferation and glycolysis. Commenting on the known importance of glycolysis to tumor cell growth and invasion the authors speculated that telomerase may stimulate glycolysis via Akt kinase activation (Bagheri et al., 2006). This was recently given support by a study which demonstrated increased glycolytic activity and in vivo proliferation in Akt kinase-transformed melanoma cells (Govindarajan et al., 2007).

Heterogeneity of form, behavior and response

Melanoma displays considerable variation at all levels of its biology. Clinically, there are four common forms of primary melanoma. Lesion structuring and pigmentation are not uniform. Metastases may develop in different organs. Multiple metastases may respond asymmetrically to treatment. Heterogeneity is also evident in vitro, where melanoma cells show variation in motility, proliferation and response to cytokines. There are genetic aberrations that, although not universal to melanomas, are common to subsets of the disease (Curtin et al., 2005). Some of these differences are likely due to transcriptional changes and many researchers have used DNA microarrays to try to find genes with correlating expression patterns.

Mc1r loss of function

The melanocortin 1 receptor (Mc1r), when bound by α-melanocyte stimulating hormone, drives a signal cascade which culminates in upregulated microphthalmia-associated transcription factor (Mitf). As Mitf regulates expression of tyrosinase, which is involved in eumelanin synthesis, Mc1r activity is critical to this aspect of melanogenesis. Loss of function mutations in the MC1R gene, which is expressed in the epidermis only in melanocytes (Roberts et al., 2006), contribute to a heightened risk for melanoma (Flanagan et al., 2000; Matichard et al., 2004). A recent expression profiling study looked at the effects of Mc1r loss of function in mouse skin to further characterize its role in epidermal biology. By employing multiple samples and controlling for false positives, April and Barsh (2006) identified several hundred new Mc1r signaling target genes, including those expressed by non-melanocytic cells, indicating that Mc1r may influence paracrine signaling. These authors noted the lack of overlap between their study and an earlier expression profiling experiment (which also looked at melanogenesis) and correctly surmised that at least part of this disagreement was likely due to experimental design. April and Barsh later validated their work and extended it by studying the effects of UV irradiation on Mc1r loss of function. This showed that Kit and Mc1r are independent contributors to the epidermal gene expression response, where Kit-dependent responses to UV irradiation involved genes with roles in antioxidant defenses, and Mc1r-dependent responses involved genes important for regulating cell cycle and oncogenesis (April and Barsh, 2007).

CDKN2A deletion

Bloethner et al. (2006) compared melanoma lines for which the CDKN2A locus was deleted against wild-type. Their study was complicated by a desire to account for possible interference from BRAF/NRAS mutations and thus their study sets were relatively small. The authors identified 30 genes with differential expression linked to CDKN2A deletion and independent of BRAF/NRAS mutations. Eight genes were successfully validated using RT-PCR analyses on a second sample set (Bloethner et al., 2006).

The BRAFV600E mutation

In recent years, different groups have deliberately pursued the significance of the relationship between a well-characterized variable and gene expression profiling in melanoma. The serine/threonine kinase BRAF, a member of the MAPK signal pathway, is subject to a specific activating mutation with high frequency in melanoma (Davies et al., 2002). The incidence of this mutation is closely correlated with primary melanomas arising in areas absent of chronic sun-induced damage (Curtin et al., 2005). Because the MAPK pathway is known to influence gene expression it was felt that the activating BRAFV600E mutation would result in a specific gene expression signature. Multiple studies used DNA microarrays to approach the question of what effect BRAFV600E has on transcriptional processes in melanoma. Pavey et al. (2004) investigated a panel of 61 melanoma lines in which 42 (69%) carried the BRAFV600E mutation and another seven (11%) had NRAS mutations. From >18 000 distinct cDNAs they identified, through support vector machine-based analyses, 83 genes to be used in hierarchical clustering for separating BRAF mutants from wild-type lines. Bloethner et al. (2005) used a much smaller number of cell lines with no more than a few examples each of BRAF mutants, NRAS mutants and wild-type samples and looked for correlations between these groups among the expression patterns of >22 000 different transcripts. Three separate software packages were employed to do this and the results were combined to identify gene expression patterns found in each case. They found that 61 genes were differentially expressed in BRAFV600E samples compared to BRAF wild-types. Comparing these two works reveals no genes that are shared between their discriminator sets. A third study used 21 melanoma cell lines, 16 with the BRAFV600E mutation and five BRAF wild-type. Despite using arrays which interrogated in excess of 14 000 genes, the authors restricted their analysis to only 36 genes coding for members of the RAS/RAF/MEK/mitogen-activated protein kinase (MAPK) signaling pathway (Tsavachidou et al., 2004). Reanalysis of their full dataset (Appendix S1-D) using multiple testing correction shows that 16 genes are differentially expressed between BRAF mutant and BRAF wild-type samples. None of these were detected by the Pavey or Bloethner studies. Analysis of three more datasets using multiple testing correction confirmed that no gene’s expression is consistently linked to the mutation status of BRAFV600E (Hoek et al., 2006). Furthermore, using the different groups’ gene lists to distinguish BRAFV600E from wild-type by clustering in different datasets fails, indicating that each gene list is sample group-specific.

A more recent paper by the authors of the first study acknowledged the absence of a statistically significant connection between the BRAF mutation and the expression of any gene and yet reaffirmed the existence of a BRAFV600E gene expression signature (Johansson et al., 2007). Their strategy was to look for individual BRAFV600E gene expression signatures in a total of four different datasets, using some of the samples from each set to derive a predictor that is tested against the remaining samples. Within each dataset a gene signature was found to correlate with BRAF mutation with 69–84% accuracy. The authors also attempted to validate a predictor generated with one sample set against the others, finding that their success rate for predicting BRAF mutation ranged between 57% and 78% accuracy. This variable result shows that with non-unique molecular signature definitions the success of this type of analysis is dependent upon sample selection (Michiels et al., 2007). This is not to say that there is no relationship between BRAFV600E and gene expression, but rather that the relationship is very probably modulated by factors which have not yet been fully accounted for in experiments attempting to establish the link. Until these are taken into account, the accuracy of predicting BRAF mutation status via expression signatures will likely remain sample group dependent. It is likely that several factors confound a useful connection between BRAF mutation and gene expression. One of these is that BRAF mutation does not guarantee uniform activity downstream (Dhomen and Marais, 2007). Activation of Erk1/2, the canonical target of BRAF signaling, is rarer in melanoma lesions than the BRAF mutation (Jorgensen et al., 2003). In nevi, where BRAF mutation is present in 82% of cases (Pollock et al., 2003), Erk1/2 is not activated at all (Jorgensen et al., 2003). 
These data suggest that the relationship between BRAF and gene expression is complicated by additional factors which are unaccounted for in current high-throughput analyses.

Metastatic potential

No other aspect of melanoma has been subject to as intense investigation as the potential for transformed cells to escape the primary lesion and nucleate life-threatening metastases elsewhere in the body. Metastatic melanoma is the most dangerous stage of the disease and is aggressively pursued by clinical researchers searching for therapies that will increase patient survival rates. The earliest analysis of gene expression regulation of metastatic potential was a small part of a study which examined the relationship between vasculogenic mimicry and invasive behavior. On noting that tissue sections of aggressive metastases showed networks of channel-like structures, Maniotis et al. (1999) found that three-dimensional culturing recapitulated similar networks only in strongly invasive melanoma lines. A relatively small cDNA microarray platform was used to obtain gene expression data for strongly and weakly invasive melanoma lines. Fold-change analysis was the sole method for selecting significantly altered genes. Were this the only such analysis of melanoma metastatic potential available, our current understanding of high-throughput study criteria would recommend against serious consideration of the results. However, the analysis of melanoma metastatic potential by expression profiling has been performed many times and comparison with later analyses finds both support and extension for the findings of this initial report.

Bittner et al. (2000) analyzed a library of 31 melanoma lines with the express purpose of identifying cell line classifications. Using a combination of unsupervised hierarchical clustering and non-hierarchical clustering methods they identified a major cluster of 19 samples. In vitro invasion and motility tests suggested that these samples were likely to be less metastatic than others. They extracted from this data a weighted list of genes whose variance correlates with the clusters found. Significantly, some of these genes were previously identified by the original study of Maniotis et al. (1999). This alignment with a study which found that highly invasive and not poorly invasive cell lines formed vascular networks strengthened the link between those genes and a weakly metastatic phenotype. Hoek et al. (2006) also pursued melanoma cell line classification. We used three different datasets to show that most melanoma cell lines belong to one of two distinct subgroups. Multiple testing corrected ANOVA identified, in each dataset, 223 genes which consistently showed differential expression between the subgroups. Many of the genes we identified had previously been linked with changes in metastatic potential. The expression patterns of these genes were such that one sample subgroup strongly corresponds with the weakly metastatic signature identified by Bittner et al. The second subgroup corresponds to an invasive phenotype. From these studies one may conclude that the base taxonomy of gene expression in melanoma cells is one tightly linked to metastatic potential. Critically, this taxonomy suggests that melanoma cells may be roughly subdivided into those which are motile and invasive and those which are not motile but rapidly proliferative. There are cell lines which cannot be so neatly categorized, but they fall between the characterized subtypes as intermediates which may both proliferate and invade. 
That these subtypes exist and are defined by the genes identified is supported by many other gene expression studies concerned with either invasive or tumorigenic behaviors in melanoma cells (Hoek et al., 2006). The relevance of these in vitro transcription subtypes to in vivo melanoma biology is not yet clear; however, they suggest an intriguing alternative to currently accepted models for melanoma progression. This new hypothesis describes the in vitro subtypes, frozen by culture, as transcriptional states that may be interchangeable in vivo. According to this model progression is conducted by cells oscillating, in response to changing microenvironmental signals, between proliferative and invasive transcription programs (Carreira et al., 2006; Hoek et al., 2006).

Haqq et al. (2005), working with tissue samples, identified that metastases could be divided among either of two distinct groups. Identification of the genes responsible for this division showed they may also distinguish between RGP and VGP samples. This challenging finding, the correlation between RGP signatures and one of the metastatic classes, led the authors to hypothesize that RGP lesion melanoma cells may yet be responsible for some metastases. Interestingly, one of the metastatic types identified by Haqq, expressing several melanocytic genes, strongly resembles the less invasive subtype signature described by both Bittner and Hoek.

Maniotis et al. (1999) compared cell lines which differed in their capacity to pass through collagen/laminin/gelatin-coated filters to assess gene expression changes associated with invasiveness. The same group followed this up to re-identify TIE1 as being upregulated in invasive lines (Hendrix et al., 2001). Later they performed further DNA microarray experiments to specifically assess expression of extracellular matrix-modifying factors (Seftor et al., 2001). This was followed by a more general assessment of gene expression changes between invasive and noninvasive cell lines (Seftor et al., 2002). None of these studies are statistically sound as they use few or no biological replicates and could not control for false positives, making their results difficult to trust in isolation. However, comparison of the Seftor gene lists against our subtype distinguishing gene lists shows a significant (P < 0.003) overlap. This data also shows that Seftor’s invasive and noninvasive cell lines are respectively equivalent to the invasive and proliferative signature samples that we described (Hoek et al., 2006). Thus the gene lists being produced, while very likely contaminated with large numbers of false positives, are nevertheless on the right track. Folberg et al. (2006) also performed transcription profiling experiments on cell lines with different invasive potentials in different culturing environments. The comparison of invasive versus noninvasive lines growing in two-dimensional cultures yielded 5209 genes with differential expression. Comparison with our subtype distinguishing gene lists shows significant overlap with the Folberg list (P < 10⁻³⁰). Again, we found that the invasive and noninvasive lines are respectively equivalent to our invasive and proliferative signature samples. 
The Folberg and Seftor invasiveness studies combined with ours and Bittner’s classification analyses show that among cell lines the primary drivers of melanoma cell gene expression are tightly linked to metastatic potential.


Analyses of cell lines have revealed significant and reproducible changes between melanoma cells and melanocytes, as well as between melanoma cells with differing characteristics of metastatic potential. Similar analyses of melanoma tissues, both primary and metastatic, have yielded gene lists with prognostic potential. It is appreciated that the association between BRAFV600E and gene expression is, while indirect, not absent – but until the contributing factors are identified and accounted for this link remains to be satisfactorily resolved. The studies of melanoma metastatic potential also revealed significant and reproducible changes in gene expression between melanoma cell types. Compared to these, a large number of studies have been found to be deficient in their design and execution, leaving their questions unanswered by the current standards of high-throughput analysis. That gene expression profiling of melanoma continues to be problematic can be illustrated by this final example. Magnoni et al. (2007) published the startling finding that melanocytes derived from the skin of healthy individuals have a transcription profile different from melanocytes derived from the unaffected skin of melanoma patients. This finding has all the appearance of something that should soon find immediate and spectacular application in the clinics. The report contained the elements of a reasonable analysis of the data, employing sufficient replicates, fold-change filtering and multiple testing corrections. However, the authors made the critical mistake of applying a fold-change filter prior to the statistical analysis. Multiple testing corrections are absolutely sensitive to the number of tests conducted. By first filtering out genes which were not at least two-fold changed, the number of likely false positives detected by subsequent inference testing is automatically increased. By cherry-picking genes for statistical analysis the effectiveness of multiple testing correction is severely compromised. Further, if the analysis is performed with the same rigor as that which revealed gene expression changes between invasive and noninvasive cell lines the results are very different. A reanalysis of Magnoni et al.’s data, in which multiple testing corrected ANOVA is performed first, shows that no genes have significantly different expression between melanocytes derived from healthy controls and melanoma patients (regardless of fold-change filtering). While the data miner’s canary was present in the original study, close inspection reveals it to be one that had been stuffed and nailed to its perch.
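
Why the order of filtering and correction matters can be seen in a small numerical sketch. The Python below assumes the Benjamini-Hochberg procedure as the multiple testing correction, which is an assumption made for illustration (the correction used in the original report is not stated here). The ten raw P-values are invented, and the first three are taken, hypothetically, to be the only genes passing a two-fold filter.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented raw P-values for ten genes.
raw = [0.004, 0.02, 0.03, 0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
passes_fold_filter = [0, 1, 2]  # hypothetical: only these are two-fold changed

# Correct first, then filter: all ten tests count toward the correction.
corrected_all = benjamini_hochberg(raw)
correct_then_filter = [i for i in passes_fold_filter if corrected_all[i] < 0.05]

# Filter first, then correct: only three tests count toward the correction.
corrected_subset = benjamini_hochberg([raw[i] for i in passes_fold_filter])
filter_then_correct = [
    i for i, p in zip(passes_fold_filter, corrected_subset) if p < 0.05
]

print("correct then filter:", correct_then_filter)
print("filter then correct:", filter_then_correct)
```

With identical raw P-values, filtering first lets more genes through the 0.05 threshold simply because the correction sees fewer tests.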


The author would like to acknowledge the computing resources afforded by the Functional Genomics Center Zürich (Zürich, Switzerland). The author is supported by grants from the Swiss National Foundation (grant no. 310040-103671/1), Oncosuisse (OCS-01927-08-2006) and the Gottfried and Julia Bangerter Rhyner Stiftung.