
Keywords:

  • microarray;
  • gene expression profiling;
  • cancer classification

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. DNA microarray technologies
  5. Analysis tools
  6. Virtual and physical tumour purification
  7. Tumour expression profiling: breast carcinoma
  8. Tumour expression profiling: lymphoma
  9. Tissue arrays for the rapid characterization of diagnostic markers
  10. Summary
  11. References

As a result of progress on the human genome project, approximately 19 000 genes have been identified and tens of thousands more tentatively identified as partial fragments of genes termed expressed sequence tags (ESTs). Most of these genes are only partially characterized and the functions of the vast majority are as yet unknown. It is likely that many genes that might be useful for diagnosis and/or prognostication of human malignancies have yet to be recognized. The advent of cDNA microarray technology now allows the efficient measurement of expression for almost every gene in the human genome in a single overnight hybridization experiment. This genomic scale approach has begun to reveal novel molecular-based sub-classes of tumours in breast carcinoma, colon carcinoma, lymphoma, leukaemia, and melanoma. In several instances, gene microarray analysis has already identified genes that appear to be useful for predicting clinical behaviour. This review discusses some recent findings using gene microarray technology and describes how this and related technologies are likely to contribute to the emergence of novel molecular classifications of human malignancies. Copyright © 2001 John Wiley & Sons, Ltd.


Introduction

Today, the pathological diagnosis and classification of human neoplasia are largely based on the recognition of histological features, with immunophenotyping and other molecular techniques for distinguishing tumour types and subtypes typically serving as ancillary methods. In clinical settings, a combination of pathological classification and clinical criteria is typically used to distinguish prognostically distinct subclasses. However, there is still marked variation in the clinical behaviour of ostensibly homogeneous cancers within currently accepted tumour classifications, which often makes it difficult to predict responses to treatment and clinical outcomes. Molecular heterogeneity underlying such variation is already evident in the form of overt cytogenetic changes, as well as mutations in oncogenes and tumour suppressor genes.

Molecular pathology is now about to enter the post-genome era. Pathologists, currently armed with a few hundred molecular markers in diagnostic settings, are likely soon to have a much larger arsenal once the complete sequence of the human genome is available. This is an exceptional prospect for histopathology: through such an expansion of assayable markers, it is becoming possible to assemble a far more complete molecular portrait of human tumours.

The phenotype of cells and tissues, including the differentiation status and function of various benign and malignant cells, ultimately depends on which proteins, and how much of these proteins, are made at any given time. A primary mechanism by which proteins are regulated, and by which cells respond to environmental conditions, is through variation of the amount of mRNA present in the cell. Thus, the measurement of the patterns of variation in the expression of genes is useful for assessing variation in the characteristics of cells and tissues.

In the past, cellular mRNAs have been quantified by the hybridization of radioactively labelled probes to mRNA sequences immobilized on membranes after they had been size-separated on agarose gels (i.e. northern blotting). While sensitive, this approach allows for the examination of only a few genes and a few samples at a time. Quantitative PCR is both a more sensitive and a faster method of measuring gene expression but still can only measure a few or perhaps several hundred genes at a time. The advent of cDNA microarray technology now allows the efficient measurement of expression for over 40 000 genes in a single overnight hybridization experiment.

The expectation is that measuring thousands of genes in large numbers of clinical specimens will yield a detailed molecular description of malignant tumours. It is hoped that this will lead to novel tumour classification algorithms and tumour markers that allow better prediction of clinical behaviour, and ultimately to the discovery of targets for new therapeutic approaches.

DNA microarray technologies

Several methods have been used to generate DNA microarrays for genome-wide gene expression studies. Oligonucleotide arrays can be made either by the in situ synthesis of short (20–24 nucleotide) single-stranded DNA fragments on a solid substrate [1–3], or by synthesizing oligonucleotides with conventional methods (typically 20–80 nucleotides) and then printing them in ordered arrays on glass slides by mechanical spot deposition or ink-jet technology [4]. In contrast to most oligonucleotide-based technologies, spotted DNA microarrays are typically made by a robot that uses capillary action-based printing tips to deposit solutions of individual PCR-amplified, double-stranded cDNA fragments (500–5000 base pairs) as small spots on glass slides [5]. Using current technology, over 50 000 genes can be printed as spots in rows and columns on a single glass microscope slide.

Commercially available oligonucleotide arrays from a number of sources, including Affymetrix, Inc., have been used in ‘single-colour’ hybridizations to measure gene expression [1]. Complex cDNA probes labelled with fluorescent dyes are made by reverse-transcribing the complex mix of mRNAs isolated from, for example, a tumour specimen. The relative expression level of each gene across multiple samples (e.g. tumour versus normal, or amongst multiple tumours) is then determined by comparing the fluorescence intensity measured for that gene on the different arrays.

In contrast to single-colour methods, most spotted DNA microarray methods employ a two-colour hybridization scheme so that gene expression can be compared reproducibly across multiple samples. Typically, a mixture of ‘red’ (Cy5-labelled) cDNA from a test sample and ‘green’ (Cy3-labelled) cDNA from a common reference sample is hybridized simultaneously to each cDNA microarray. The composition of the common reference sample remains constant throughout all experiments, whereas a different test sample is used for each array. In a two-colour approach, therefore, the unit of comparison between test samples is the ratio of red to green fluorescence, not the absolute signal intensity as in single-label oligonucleotide microarray experiments. Large numbers of individual test samples (e.g. tumour specimens) can then be compared with one another in ratio units relative to the common reference.
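To make the ratio arithmetic concrete, the sketch below assumes background-corrected intensities for three genes on two arrays, each hybridized with a different tumour (red) together with the same common reference (green). The gene names are real but every number is made up. Because both log2 ratios share the reference as their denominator, subtracting them gives the log2 fold-change between the two tumours, even though the GAPDH spot happens to be twice as bright in both channels on the second array.

```python
import math

def log2_ratio(red: float, green: float) -> float:
    """log2(R/G) for one spot: test sample (Cy5, red) over common reference (Cy3, green)."""
    return math.log2(red / green)

def profile(array):
    """Convert {gene: (red, green)} intensities into {gene: log2 ratio}."""
    return {gene: log2_ratio(r, g) for gene, (r, g) in array.items()}

# Hypothetical background-corrected intensities; each array carries a different
# tumour (red) hybridized together with the same common reference (green).
tumour_a = {"ERBB2": (8000.0, 1000.0), "ESR1": (500.0, 1000.0), "GAPDH": (2000.0, 2000.0)}
tumour_b = {"ERBB2": (1000.0, 1000.0), "ESR1": (2000.0, 1000.0), "GAPDH": (4000.0, 4000.0)}

a, b = profile(tumour_a), profile(tumour_b)
# Both ratios share the reference denominator, so their difference estimates the
# log2 fold-change between tumour A and tumour B for each gene.
a_vs_b = {gene: a[gene] - b[gene] for gene in a}
```

Here `a_vs_b` comes out as 3.0 for ERBB2 (eight-fold higher in tumour A), -2.0 for ESR1 and 0.0 for GAPDH; the difference in absolute GAPDH brightness between the arrays drops out of the comparison.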

The laboratories of Patrick O. Brown and David Botstein and their collaborators at Stanford University have used two-colour comparative hybridization on cDNA microarrays to measure gene expression in large numbers of human tumours, normal tissues, and cultured cell lines, and have established an extensive database of gene expression ‘profiles’. In order to prepare the probes, grossly dissected tumour samples are homogenized and mRNA is purified. The mRNA is reverse-transcribed into cDNA and labelled with a red fluorescent dye by incorporating Cy5-labelled nucleotides during reverse transcription. At the same time, mRNA from a common reference sample is reverse-transcribed and labelled with a green dye (Cy3). The two fluorescent cDNA preparations, each of which reflects the complex mixture of all mRNAs from the two samples, are combined and allowed to hybridize to the microarray that contains the cDNA inserts that were spotted at known locations. During the hybridization, fluorescence-labelled cDNA fragments coding for an individual gene will find their cognate gene spot on the microarray, if present, and hybridize to it. The level of red versus green fluorescence (R/G ratio) at this spot now reflects the level of mRNA for this gene in the tumour sample relative to the common reference sample (Figure 1). By repeating this process for several different tumour samples while keeping the reference constant, different tumours can be compared with each other using the reference sample as a consistent comparison standard. The common reference sample is chosen in such a way that the majority of genes spotted on the array show some minimum level of fluorescence intensity. In this way, a meaningful ratio of red divided by green signal intensity can be obtained for most spots. 
At Stanford, the reference standard currently used is a mixture of 11 human cell lines derived from a diverse variety of human malignancies chosen for their contribution of as diverse a mixture of cell types as possible. The composition of the reference sample has no biological significance and simply contributes a consistently measurable signal in the denominator of the assayed ratios and therefore allows comparisons between individual arrays to be made.
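The reason the reference is chosen to give signal at most spots is that its green intensity is the denominator of every ratio. A minimal sketch, assuming a hypothetical cut-off `MIN_REFERENCE_SIGNAL` of 100 intensity units (not a value taken from the Stanford protocol), treats spots with too weak a reference signal, or with no test signal at all, as missing rather than reporting an unstable ratio:

```python
import math
from typing import List, Optional, Tuple

# Hypothetical threshold: below this reference (green) intensity a ratio is
# considered unreliable, because the denominator is close to background.
MIN_REFERENCE_SIGNAL = 100.0

def spot_ratio(red: float, green: float, min_ref: float = MIN_REFERENCE_SIGNAL) -> Optional[float]:
    """Return log2(R/G), or None if the reference is too weak or the test signal is zero."""
    if green < min_ref or red <= 0:
        return None
    return math.log2(red / green)

def usable_fraction(spots: List[Tuple[float, float]]) -> float:
    """Fraction of (red, green) spots on one array that yield a usable ratio."""
    ratios = [spot_ratio(r, g) for r, g in spots]
    return sum(x is not None for x in ratios) / len(ratios)

# Made-up (red, green) intensities for four spots on one array.
spots = [(1200.0, 600.0), (50.0, 20.0), (300.0, 1200.0), (0.0, 800.0)]
```

With these values the first and third spots give log2 ratios of 1.0 and -2.0, the second is rejected for its weak reference signal and the fourth for having no test signal, so half of the spots are usable. A reference mixture that lights up more spots raises that fraction directly.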

Figure 1. Schematic representation of a DNA microarray hybridization comparing gene expression of a malignant epithelial cancer with its normal tissue counterpart

Analysis tools

Technical refinements have allowed the number of genes that can be spotted on glass slides to increase dramatically over the past few years. Early experiments measured approximately 5000 genes per experiment, but the most recent data from the Stanford group have been obtained using microarrays containing approximately 45 000 spots. The total number of human genes is still undetermined but is estimated to be between 35 000 and 120 000 [6]. It will therefore ultimately be feasible to examine all expressed genes in the human genome on a single microarray. Studies that examine tens of thousands of genes across a hundred or more samples generate a tremendous amount of data (for instance, 100 breast carcinomas and cell lines analysed on arrays with 10 000 genes would generate 10^6, i.e. one million, data points). Clearly, analysing data sets of this size required the development of novel analytical tools.

One of these methods, and still the one our group relies on most, is hierarchical clustering of gene expression patterns [7]. This method ‘clusters’, or organizes, the genes on one axis on the basis of their similarity in expression across a set of experimental samples (tumours, for example), and similarly ‘clusters’ the experimental samples on the other axis on the basis of their similarity in expression across a specified set of genes. The data are displayed as a table in which each row represents a single gene and each column a single experimental sample [7]. To visualize the data, the fold deviation of each gene from its average expression across the set of samples studied is shown as a coloured square ranging from bright green (below-average expression for that gene) through black (average expression) to bright red (above-average expression). A hierarchical tree, or dendrogram, is displayed next to the clustered genes and above the clustered experimental samples to depict the degree of relatedness (correlation coefficient) between adjacent samples and genes: short branches between two samples denote a high degree of similarity, whereas longer branches denote a lesser degree of similarity.
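The clustering and display steps can be sketched in plain Python. The sketch assumes a distance of 1 minus the Pearson correlation and average linkage between clusters; these are common choices for this kind of analysis, but the exact settings of the group's clustering software are not stated here. The gene profiles are invented, and saturating the colour scale at four-fold (2 log2 units) is an arbitrary display choice.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_linkage(profiles):
    """Agglomerative clustering on distance = 1 - Pearson correlation.

    profiles maps a name (gene or sample) to its vector of log2 ratios.
    Returns (leaf_order, merges): the left-to-right order in which the
    dendrogram would display the items, and each join with its distance.
    """
    dist = {}
    for a, b in combinations(profiles, 2):
        dist[(a, b)] = dist[(b, a)] = 1.0 - pearson(profiles[a], profiles[b])
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs of items.
            d = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        joined = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [joined]
    return list(clusters[0]), merges

def centre(values):
    """Deviation of each log2 ratio from the gene's own mean across samples."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def colour(deviation, saturation=2.0):
    """Green below average, black at average, red above; saturates at |deviation| = saturation."""
    level = round(255 * min(abs(deviation) / saturation, 1.0))
    return (level, 0, 0) if deviation > 0 else (0, level, 0)

# Made-up log2 ratios for four genes across four tumours.
genes = {
    "ERBB2_clone1": [3.0, 2.5, -1.0, -1.5],
    "ERBB2_clone2": [2.8, 2.6, -0.9, -1.4],
    "ESR1": [-1.0, -0.5, 2.0, 2.2],
    "KRT17": [0.1, -2.0, 1.5, -0.3],
}
order, merges = average_linkage(genes)
heatmap = {g: [colour(v) for v in centre(genes[g])] for g in order}
```

In this toy table the two ERBB2 clones are joined first and sit next to each other in the leaf order, the same kind of behaviour the authors describe for ERBB2 in Figure 2. Clustering the samples rather than the genes would apply the same function to the transposed table (one vector per tumour across genes).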

The relationships between tumour specimens, and between genes, that are highlighted by clustering depend in part on the sets of genes and specimens selected for analysis. Typically, a researcher selects a number of arrays (e.g. tumour specimens) and a set of genes with high-quality data, excluding genes that do not vary significantly between tumours and are therefore unlikely to be of interest. For example, a researcher might select only those genes whose ratio deviates four-fold or more from their mean fluorescence ratio in at least three experiments (ratios are converted to log base 2 so that increases and decreases of equal fold-magnitude are treated symmetrically; a four-fold deviation corresponds to 2 log2 units). These selection parameters vary and are set by the user. The clustering program then orders the selected genes so that those with the most similar expression patterns across all selected samples are placed adjacent to each other in the data table; likewise, it places next to one another the tumour samples that show the most similar expression patterns across all selected genes. The clustering output thus highlights groups of genes that are either highly expressed or relatively underexpressed in different tumour subsets. This is illustrated in Figure 2, which shows a small fragment of a much larger cluster diagram from a study of breast carcinomas analysed on 8100-gene microarrays [8]. Notably, in this example, multiple independent copies of the ERBB2 gene on the microarray clustered together. This demonstrates that the expression patterns measured at the different independent spots representing ERBB2 (each containing a different PCR-amplified clone) were more similar to one another than to the pattern of any of the other 8100 genes on the array. The single exception is the MLN64 gene, which co-clusters with ERBB2 across this sample set.
Interestingly, MLN64 lies in the ERBB2 genomic region that is co-amplified with ERBB2 in some breast cancers. Co-variation in copy number for this gene across this set of samples may therefore account for its similar pattern of expression [9]. This example illustrates the reproducibility of gene expression measurements made with cDNA microarrays, and the kinds of relationships between genes that can be discovered by using hierarchical clustering to visualize gene expression patterns.
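The gene-selection step described above might be coded as follows. This sketch assumes that ‘the mean fluorescence ratio’ means each gene's own mean log2 ratio across the selected arrays, and that spots failing quality control are passed in as `None`; both are interpretations rather than documented settings, and `GENE_X` is a hypothetical gene name.

```python
import math
from typing import Dict, List, Optional

def passes_filter(ratios: List[Optional[float]], min_fold: float = 4.0, min_arrays: int = 3) -> bool:
    """Keep a gene if its log2 ratio deviates from the gene's mean log2 ratio by at
    least log2(min_fold) in at least min_arrays arrays. None marks a spot that
    failed quality control and is ignored."""
    logs = [math.log2(r) for r in ratios if r is not None]
    if not logs:
        return False
    mean = sum(logs) / len(logs)
    threshold = math.log2(min_fold)  # four-fold -> 2.0 log2 units
    return sum(abs(x - mean) >= threshold for x in logs) >= min_arrays

def select_genes(data: Dict[str, List[Optional[float]]], **kwargs) -> List[str]:
    """Names of the genes that pass the variation filter, in input order."""
    return [gene for gene, ratios in data.items() if passes_filter(ratios, **kwargs)]

# Made-up R/G ratios for three genes across six arrays.
data = {
    "ERBB2": [16.0, 16.0, 0.25, 0.25, 1.0, 1.0],  # large swings in four arrays
    "GAPDH": [1.0, 1.1, 0.9, 1.0, 1.2, 0.95],     # essentially flat
    "GENE_X": [8.0, 8.0, 1.0, 1.0, 1.0, 1.0],     # four-fold swings in only two arrays
}
selected = select_genes(data)
```

With the default settings only ERBB2 is kept; relaxing `min_arrays` to 2 would also admit `GENE_X`, which is why such parameters are left to the user.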

Figure 2. Example of data clustering. This small sample of array data was copied from a much larger data set, similar to the one shown in Figure 3. Note how all five different cDNA clones specific for ERBB2 on the array cluster tightly together. The immunostaining for ERBB2 on one of the breast samples (column indicated by an arrow) is shown in the lower panel

Virtual and physical tumour purification

A primary problem when measuring gene expression in grossly dissected tumour specimens is that most solid human tumours contain, in addition to the malignant cells, other normal cell types, including inflammatory cells and connective tissue cells. These ‘host’ cells can even make up the majority of cells in a tumour mass. Homogenizing a grossly dissected tumour specimen to isolate mRNA therefore yields a complex mix of mRNAs derived from malignant cells and from host cells, which can make it difficult to determine which cell type gave rise to the fluorescent signal for any given gene on the array. Several approaches have been developed to manage this obstacle.

First, the gene expression patterns can be analysed by exploiting the subset of genes that are cell type-specific (e.g. lymphocyte markers such as CD3 and histiocyte markers such as CD68). The expression levels of these genes in complex tissues reflect the relative amount of the corresponding cell type in the tissue. A ‘virtual dissection’ can therefore be performed computationally, by clustering the data and identifying the gene groups whose expression can be attributed to normal ‘host’ cells [10]. Figure 3, for example, shows a group of breast carcinomas in which genes specific for a variety of host cells were identified using a combination of gene expression analysis and immunohistochemistry. Aided by the expression patterns in a number of cell lines (left panel) and the known expression patterns of specific, previously characterized genes, clusters of genes were identified that were indicative of the presence of B-lymphocytes, T-lymphocytes, and macrophages. The variation in the expression level of these gene sets, and the composition of the clusters (e.g. the relative presence of gamma/delta versus alpha/beta T-cell receptor expression), reflected variation in the nature of the inflammatory response in the tumours. Similarly, gene expression patterns could be attributed to the tumour cell populations and interpreted by comparison with features of normal epithelial cells from that tissue. For example, expression of keratins found predominantly in normal breast basal epithelial cells (keratin 5, keratin 17), as opposed to those found in normal luminal duct epithelial cells (keratin 8, keratin 18), was attributed to the breast tumour cells within the samples and used to define tumour categories with distinct biological features [11].
Furthermore, the expression levels of sets of genes involved in phenotypic features shared by many different cell types within a tissue varied in a way that reflected the overall prevalence of that physiological state in the tissue. For example, one distinct cluster of genes in the breast cancer data probably varied in a manner that reflected the overall proliferative index of the tissues. This was indicated by two observations: the cluster was populated predominantly by genes known to be regulated in relation to the cell cycle (a similar set of genes has been shown to vary with cell doubling time in cultured cell lines [10]), and it contained the well-known proliferation markers Ki-67 and PCNA [8]. For several of the genes in these ‘gene expression signatures’, immunohistochemistry with monoclonal antibodies confirmed the presence of the different cell types in tissue from representative breast cancer cases (Figure 3).
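One simple way to turn such host-cell and proliferation clusters into per-sample numbers is to average the log2 ratios of a few marker genes for each signature. The marker lists below (CD3D and CD3E for the CD3 complex, CD68, MKI67 for Ki-67, and PCNA) and all expression values are illustrative assumptions; in the analysis described here the gene groups were identified from the clusters themselves and checked by immunohistochemistry.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical marker sets chosen for illustration only.
SIGNATURES: Dict[str, List[str]] = {
    "T-lymphocyte": ["CD3D", "CD3E"],
    "macrophage": ["CD68"],
    "proliferation": ["MKI67", "PCNA"],
}

def signature_scores(expr: Dict[str, List[float]],
                     signatures: Dict[str, List[str]] = SIGNATURES) -> Dict[str, List[float]]:
    """Average the log2 ratios of each signature's marker genes, sample by sample.

    expr maps gene -> log2 ratios (one value per sample, same sample order for all genes).
    Markers absent from the array are skipped; a signature with no measured markers is omitted.
    """
    n_samples = len(next(iter(expr.values())))
    scores: Dict[str, List[float]] = {}
    for name, genes in signatures.items():
        present = [g for g in genes if g in expr]
        if not present:
            continue  # no marker for this signature was measured on the array
        scores[name] = [mean(expr[g][s] for g in present) for s in range(n_samples)]
    return scores

# Made-up log2 ratios for three tumour samples.
expr = {
    "CD3D": [2.0, -1.0, 0.0],
    "CD3E": [1.0, -1.0, 0.0],
    "CD68": [0.5, 0.5, 2.5],
    "MKI67": [-1.0, 0.0, 3.0],
    "PCNA": [-1.0, 0.0, 2.0],
    "ESR1": [2.0, 2.0, -2.0],
}
scores = signature_scores(expr)
```

In this toy table the first sample carries the strongest T-lymphocyte signal, while the third scores highest for both the macrophage and the proliferation signatures.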

Figure 3. Cluster analysis of 19 cell lines and 65 breast tumour samples showing how different host cell populations can be identified in the tumour samples

A second, more laborious approach to measuring gene expression in individual cell types within a complex tissue is to isolate tumour cells by laser capture microdissection [12–14] (see also the review elsewhere in this issue), followed by amplification of the mRNA for subsequent cDNA microarray analysis [15–17]. Figure 4 shows the results of a hierarchical clustering analysis in which we compared laser capture-microdissected breast cancer cells (dissected away from the surrounding stroma) with 131 other breast carcinoma samples and cell lines that had not undergone tumour cell purification or amplification. The microdissected tumour sample clustered directly next to the unpurified tumour from which it was derived, indicating that the gene expression pattern measured in the purified cells was more closely related to its tumour of origin than to any of the other tumours tested. As expected, expression of genes encoding lymphocyte markers was significantly lower in the purified material than in the tumour of origin, indicating that dissection had successfully separated the tumour cells from the surrounding lymphocytes. This technique remains laborious, and in our laboratories the amplification of small amounts of mRNA from laser-captured samples has proven difficult and less reproducible than measurement of unamplified mRNA. Technical advances now under development will probably increase the reliability of these RNA amplification techniques.
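The observation that the microdissected sample sits next to its tumour of origin can be approximated by asking which unpurified tumour its profile correlates with best. The sketch below is a nearest-neighbour stand-in for the full hierarchical clustering, with invented profiles over five genes and hypothetical sample names; IGKC (immunoglobulin kappa constant) stands in for the B-lymphocyte immunoglobulin genes shown in Figure 4.

```python
import math
from typing import Dict, List

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def nearest_sample(query: List[float], samples: Dict[str, List[float]]) -> str:
    """Name of the sample whose profile is most correlated with the query profile."""
    return max(samples, key=lambda name: pearson(query, samples[name]))

# Made-up log2 ratios over the same five genes for every sample.
genes = ["ERBB2", "ESR1", "KRT5", "KRT17", "IGKC"]
tumours = {
    "tumour_3": [0.0, 0.5, 2.5, 2.0, -1.0],
    "tumour_7": [3.0, -1.0, 0.5, 0.2, 2.0],
    "tumour_12": [-1.0, 2.5, -0.5, -0.8, 1.5],
}
lcm_sample = [3.2, -0.8, 0.6, 0.3, -0.5]  # hypothetical microdissected cells from tumour_7

origin = nearest_sample(lcm_sample, tumours)
ig = genes.index("IGKC")
ig_change = lcm_sample[ig] - tumours[origin][ig]  # negative: fewer B-lymphocyte transcripts
```

The purified sample is matched to `tumour_7` despite its much lower IGKC value (a change of -2.5 log2 units), which is the pattern expected when lymphocytes have been dissected away.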

Figure 4. Purification of tumour cells by laser capture microdissection followed by gene microarray analysis. After amplification of the signal, the sample was hybridized and the resulting data were clustered with 131 different experiments containing breast tumour samples and cell lines. Clustering analysis groups the purified material next to the tumour from which the microdissected material was derived. Note how in the lower panel it is shown that the genes for the B-lymphocyte marker immunoglobulin are expressed at lower levels in the purified tumour cells than in the original tumour sample. The B-cell line RPMI-8226 and the T-cell line Molt4 have appropriate levels of immunoglobulin expression

Tumour expression profiling: breast carcinoma

A recent study of breast carcinomas used a series of 65 samples obtained from 42 patients [11]. Of these patients, 20 underwent an open surgical breast biopsy to obtain a tissue sample, followed by a 16-week regimen of doxorubicin chemotherapy. After chemotherapy, the remaining tumour was resected and a second, independent tumour sample was obtained. The ‘before’ and ‘after’ chemotherapy specimens were analysed on cDNA microarrays. A statistical approach was used to identify a subset of 476 cDNA clones, representing 427 different genes, selected for being highly variable in expression across tumours from different patients yet minimally variable between tumour samples taken from the same patient [11]. ERBB2, for example, was selected by this criterion: its expression remained constant between independent samples taken sequentially from the same patient, yet differed markedly across tumours from different patients. This so-called ‘intrinsic’ gene list is likely to be enriched for genes characteristic of the tumour, as opposed to genes that vary because of other factors such as sampling error. Clustering the 65 tumour samples with this list (Figure 5) showed that 17 of the 20 tumour pairs clustered together (black lines, panel a). Tumour samples taken several weeks apart from the same tumour (even after doxorubicin therapy) were thus much more similar to each other than to any other patient's tumour specimen. In each of the three pairs that failed to cluster together, the ‘after’ specimen fell within a group of tumours sharing gene expression characteristic of normal breast tissue (green lines, panel a). This suggested that the second sample consisted mostly of normal tissue, perhaps because chemotherapy had led to the inactivation and destruction of tumour cells in these patients.
In addition, the sample set contained two paired specimens, each derived from a primary tumour and its metastasis in the axillary lymph nodes; these pairs also clustered directly adjacent to each other (blue lines, panel a). Importantly, these results imply that every tumour is an ‘individual’: it has a unique expression pattern that can be robustly and repeatedly ‘recognized’ using cDNA microarrays and hierarchical clustering analysis.
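The exact statistic used to build the ‘intrinsic’ list is not described in this review. As a simplified stand-in (an assumption, not the published method), a gene can be scored by the variance of its per-patient means divided by the average variance within each before/after pair. `STROMAL_GENE` is a hypothetical gene whose level depends on which part of the tumour happened to be sampled, and all values are made up.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple

Pair = Tuple[float, float]  # (before, after) log2 ratios for one patient

def intrinsic_score(pairs: List[Pair]) -> float:
    """Between-patient variance of per-patient means divided by the average
    within-patient variance. High = separates patients, stable within a patient."""
    between = pvariance([mean(p) for p in pairs])
    within = mean(pvariance(p) for p in pairs)
    return between / within if within > 0 else float("inf")

def rank_genes(data: Dict[str, List[Pair]]) -> List[str]:
    """Genes ordered from most to least 'intrinsic' under this simplified score."""
    return sorted(data, key=lambda g: intrinsic_score(data[g]), reverse=True)

# Made-up paired measurements for three patients.
data = {
    "ERBB2": [(3.0, 2.8), (-1.0, -1.2), (0.5, 0.7)],
    "STROMAL_GENE": [(2.0, -1.0), (-1.5, 1.5), (0.0, 2.0)],
}
ranking = rank_genes(data)
```

ERBB2, stable within each patient but different between patients, scores in the hundreds here, whereas the hypothetical stromal gene scores below 1 and would be excluded from an intrinsic list.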

Figure 5. Cluster analysis on 65 breast carcinoma samples, using the ‘intrinsic’ gene list; see text

Despite the distinctiveness of each tumour's expression pattern, the branching pattern of the dendrogram produced with this ‘intrinsic’ gene list identified four major groups of breast tumours. The genes that were relatively highly expressed in each of these tumour clusters made it possible to identify the defining characteristics of each group. First, a ‘luminal-epithelial/ER+’ group (panel c) was distinguished by high-level expression of luminal-type cytokeratins and the oestrogen receptor (ER). Second, a group of tumours was distinguished by expression of ERBB2 and other associated genes (panel d). Third, a ‘normal breast’ group was identified that included three specimens derived from normal breast tissue and one fibroadenoma (panel e). Last, a group of tumours was distinguished by high-level expression of two clusters of genes characteristic of normal breast basal epithelial cells (panels e and f). In a subsequent, extended study of 78 independent tumours, clustering with the intrinsic gene list identified similar patient subclasses, which were now found to be significantly associated with differences in overall survival and relapse-free survival [18]. Most notably, the ERBB2-positive tumours and the group of tumours with ‘basal-like’ characteristics (mixed ERBB2 status) were both associated with a significantly poorer prognosis. This study also identified two ER-positive tumour subtypes, one with a favourable prognosis and one with a prognosis almost as poor as that of ERBB2-positive tumours. Although four major groupings were identified in this study of 78 patients, a larger study would probably reveal significant variation within these groupings, such that independent and/or overlapping subgroups might be identified.

Tumour expression profiling: lymphoma

A multi-centre collaboration including laboratories at the National Cancer Institute, Stanford and the University of Nebraska used cDNA microarrays to discover novel, distinct subtypes of diffuse large B-cell lymphoma (DLBCL) [19]. With an annual incidence exceeding 25 000 cases in the United States, DLBCL is the most prevalent subtype of non-Hodgkin's lymphoma (NHL), yet it remains a clinically heterogeneous disease in which fewer than half of patients achieve a durable remission. Despite this clinical heterogeneity and the suspected histophenotypic variation, current pathological classification systems for NHL [20] treat DLBCL as a single category and are therefore not useful for predicting the clinical course of patients with DLBCL. Currently, a classification scheme based on a combination of clinical parameters is used to assess a patient's prognosis; however, these clinical risk variables are considered to be proxies for the underlying cellular and molecular variation within DLBCL [21].

A specialized cDNA microarray, the Lymphochip, was designed to study lymphoid malignancies by selecting genes with likely relevance to the function of lymphoid and other haematopoietic cell types from a variety of sources, including normal germinal centre B-cells [22]. In a systematic survey of expression profiles from normal and malignant human lymphoid cell populations, 96 specimens were examined, including primary biopsies from patients with DLBCL (n=42), B-cell chronic lymphocytic leukaemia (CLL) (n=11), and follicular lymphoma (FL) (n=6), as well as a number of samples from normal and transformed lymphocyte populations cultured under various conditions.

A striking degree of molecular variation was evident in the gene expression profiles of the normal and tumour samples (Figure 6). As in the breast cancer study described above, this diversity was captured both by clusters of genes whose expression varied with phenotypic features shared across the specimens (e.g. genes associated with cellular proliferation) and by clusters of genes that could be attributed to specific populations of normal and tumour cells (e.g. genes specifically expressed in T-cells and in germinal centre B-cells, respectively). Remarkably, the gene expression patterns captured the underlying molecular diversity among the three tumour types surveyed (DLBCL, FL, CLL) so well that a simple hierarchical clustering of the 96 normal and malignant specimens, using all well-measured genes, correctly classified all but two of the biopsy samples into their respective tumour types (see the dendrogram in Figure 6).

Figure 6. Hierarchical clustering of gene expression data depicting relationships between 96 samples of normal and malignant lymphocytes [19]. The dendrogram on the left lists the samples studied and provides a measure of the relatedness of gene expression in each sample. The dendrogram is colour-coded according to the category of mRNA sample studied (see upper right key)

While the distinction between DLBCL, FL, and CLL was largely evident in a genome-level view of the characteristic expression profiles of each disease, the signatures of the DLBCL subtypes discovered in the study were subtler. Whereas FL and CLL formed relatively homogeneous clusters, the structure of the dendrogram suggested that significant subclasses might exist within DLBCL. Among the genes reflecting this heterogeneity was a group that distinguished normal germinal centre B-cells from both resting and in vitro-activated blood B-cells, and was therefore termed the germinal centre B-cell cluster. Clustering the 42 DLBCL cases separately, using the genes of the germinal centre B-cell cluster, revealed two distinct large groups of tumours. When the expression profiles of these two DLBCL subtypes were compared with purified populations of normal B-cells, one group expressed many genes characteristic of germinal centre B-cells (germinal centre B-like DLBCL), whereas the other shared an expression profile characteristic of in vitro-activated peripheral blood B-cells (activated B-like DLBCL). These two subtypes appeared to differ significantly in their response to treatment: patients with germinal centre B-like DLBCL had considerably better overall survival (76% alive after 5 years) than those with activated B-like DLBCL (16% alive after 5 years).
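The comparison with normal B-cell populations can be illustrated by labelling each case with the reference profile it correlates with most strongly. This is a nearest-reference sketch, not the hierarchical clustering used in the study; the two reference profiles, the five-gene germinal centre gene set and the case names and values are all made up.

```python
import math
from typing import Dict, List

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def assign_subtype(case: List[float], references: Dict[str, List[float]]) -> str:
    """Label a case with the reference B-cell profile it correlates with best."""
    return max(references, key=lambda label: pearson(case, references[label]))

# Made-up log2 ratios over a hypothetical five-gene germinal centre gene set
# (the same gene order is used in every vector).
references = {
    "germinal centre B-like": [2.0, 1.5, 1.8, -1.0, -0.5],
    "activated B-like": [-1.0, -0.8, -1.2, 1.5, 2.0],
}
cases = {
    "DLBCL_01": [1.6, 1.2, 1.0, -0.4, -0.9],
    "DLBCL_02": [-0.7, -1.1, -0.2, 1.9, 1.4],
}
labels = {name: assign_subtype(profile, references) for name, profile in cases.items()}
```

Because the subtypes in the study were discovered by unsupervised clustering, a rule like this only becomes useful once the subtypes and their signature genes have been defined, and it would still need validation on independent cases.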

The differences in expression profile between these two sub-classes, and their resemblance to classes of normal B-cells, could not simply be attributed to the prevalence of normal B-cells within the bulk tumour biopsies assayed on cDNA microarrays. Several lines of evidence supported this conclusion. First, several transformed cell lines derived from primary tumours were identified that had retained in culture the expression signatures of the DLBCL subtypes found in the primary biopsies from clinical cases. Second, independent immunohistochemical studies of DLBCL cases showed that genes characteristic of each of the two subtypes were expressed in the tumour cells themselves. Third, a follow-up study showed that ongoing somatic mutation within immunoglobulin genes was present in all cases classified as germinal centre B-like, but absent or rare in the activated B-like cases surveyed23. The dissimilarity in expression profiles between germinal centre B-like and activated B-like DLBCL therefore most likely reflects their derivation from cells at discrete stages of normal lymphocyte maturation, and the preservation of the genetic programme of these normal cells within their malignant counterparts.

Although the two classes defined by gene expression patterns are morphologically indistinguishable, the differential expression of hundreds of genes provides a large set of candidate markers that might distinguish them. Indeed, quantitative PCR analyses of individual genes that discriminate between the two classes on cDNA microarrays have been shown to separate cases with different survival effectively in an independent series23B. Similarly, immunohistochemistry with antibodies directed against the products of genes characteristic of the germinal centre B-cell phenotype identified statistically significant survival differences in an independent study (Natkunam et al., in preparation).
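
Survival differences such as the 5-year figures quoted above are typically read from Kaplan-Meier curves. The following sketch of the product-limit estimator is an assumption about how such a comparison might be computed; it is not the analysis used in the cited studies. The follow-up times and event flags are invented.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times:  follow-up time for each patient (e.g. years)
    events: 1 if death was observed at that time, 0 if censored
    Returns [(time, S(time))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # both deaths and censored cases leave the risk set
    return curve


def survival_at(curve, t):
    """Read S(t) from the step function (1.0 before the first death)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Invented follow-up data for one marker-defined group of patients.
curve = kaplan_meier([1, 2, 2, 3, 4, 5, 6], [1, 1, 0, 1, 0, 1, 0])
print(round(survival_at(curve, 4), 3))  # 0.536
```

Comparing two marker-defined groups would mean computing one curve per group. The difference would then be assessed with a formal test such as the log-rank test, which is not shown here.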

Tissue arrays for the rapid characterization of diagnostic markers

These early cDNA microarray studies have begun to identify biologically distinguishable sub-categories of tumours that are defined by the differential expression of both characterized and uncharacterized genes. At present, however, cDNA microarrays remain a relatively expensive and time-consuming means of extending and validating these results. In addition, they rely on fresh frozen tumour material, which is often not readily available, and the cases studied may be partly biased towards those that yielded larger specimens with excess material available for research. Traditionally, pathologists have evaluated potential diagnostic and prognostic markers by detecting protein expression in paraffin-embedded tissues using immunohistochemistry. The greatest advantage of material stored in paraffin is that very large numbers of tumours can be obtained for analysis from the archives of surgical pathology departments. Perhaps more important, many of these samples were collected years ago, so extensive clinical follow-up data are available.

To characterize large numbers of potential markers in paraffin-embedded specimens, we have chosen to generate antisera against peptides from candidate genes that appear to have clinical utility in the gene expression studies performed at Stanford. Traditional approaches using immunohistochemical stains on individual tumour sections are time-consuming, expensive, and impractical for high-throughput studies. Recently, Olli Kallioniemi, Guido Sauter, and colleagues have described the technique of tissue array generation24–27. In this technique, small cores are taken from a large number of paraffin-embedded tumours and combined into an ordered array in a new paraffin block. Several hundred cores can be combined in a single block, which is then sectioned so that all the cores are stained together on a single slide. An antibody stain performed on a section of one of these arrays therefore yields several hundred data points, in contrast to the single result obtained when a conventional tumour section is stained. Figure 7 shows a segment of one such tissue array, made with 265 different human haematolymphoid malignancies and stained for bcl-2. We currently have over 1900 different human tumour and normal tissue samples represented in tissue arrays in our laboratory. These arrays will serve as a tool to characterize rapidly the expression patterns of large numbers of novel antisera and their cognate proteins, and to compare these with existing markers across a consistent sample set. We are developing software tools similar to those that we use for gene microarray analysis to aid in the interpretation of these data. Using our lymphoma tissue arrays, we have been able to confirm the predictive value of bcl-6 expression in large cell lymphoma that was indicated by the gene microarray study19 (Natkunam et al., in preparation).
Similarly, using tissue arrays generated by the group of Guido Sauter and Olli Kallioniemi and containing over 600 breast carcinomas, we were able to confirm the prognostic value of basal-type cytokeratin expression in breast carcinoma (van de Rijn et al., in preparation).
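
A small, hypothetical data model makes concrete why one stained section yields hundreds of data points. It is not the laboratory's actual software. Each core's grid position in the recipient block maps back to a case and its diagnosis. Per-position stain scores can therefore be joined to that map and tallied by diagnosis. The Core record, the binary positive/negative scoring and the example cases and scores are all invented assumptions; real scoring schemes are often graded.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    """One tissue core in the recipient paraffin block (hypothetical record)."""
    row: int
    col: int
    case_id: str
    diagnosis: str


def build_map(cores):
    """Index cores by their (row, col) position in the array."""
    return {(c.row, c.col): c for c in cores}


def summarize_stain(array_map, scores):
    """Tally stain results by diagnosis.

    `scores` maps (row, col) -> 1 (positive) or 0 (negative). Positions with
    no score (e.g. cores lost during sectioning) are skipped.
    Returns {diagnosis: (n_positive, n_scored)}.
    """
    tally = defaultdict(lambda: [0, 0])
    for pos, core in array_map.items():
        if pos not in scores:
            continue
        tally[core.diagnosis][0] += scores[pos]
        tally[core.diagnosis][1] += 1
    return {dx: tuple(counts) for dx, counts in tally.items()}


array_map = build_map([
    Core(1, 1, "case-001", "FL"),
    Core(1, 2, "case-002", "FL"),
    Core(2, 1, "case-003", "DLBCL"),
    Core(2, 2, "case-004", "DLBCL"),
    Core(3, 1, "case-005", "CLL"),
])
stain_scores = {(1, 1): 1, (1, 2): 1, (2, 1): 1, (2, 2): 0}  # invented; core (3, 1) lost
print(summarize_stain(array_map, stain_scores))  # {'FL': (2, 2), 'DLBCL': (1, 2)}
```

Keeping the position-to-case map separate from the stain scores lets the same map be reused for every antibody stained on sections of the same block.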

Figure 7. Area from a tissue microarray with 265 different haematolymphoid neoplasms stained with bcl-2

Summary

Since the development of gene expression microarrays, many studies have used them to characterize the molecular portraits of neoplasms ranging from leukaemias and lymphomas to melanoma and epithelial tumours8, 19, 28–31. These early studies have shown that the gene expression pattern of a tumour tissue is a composite of the expression profiles of all the cells it contains: the transformed tumour cells, untransformed resident cells, and inflammatory cells recruited in the immune response to the tumour. Consequently, a direct comparison of tumour with normal tissue reveals gene expression differences that are largely attributable to differences in cellular composition, together with differences reflecting gross phenotypic features such as the proliferative index of the tissue. Expression patterns arising from different cell types can be distinguished by exploiting the co-variation of cell-type-specific genes. In this way, known tumour types can be distinguished by their expression patterns both in vitro10 and in vivo11, 19, 28, 31.
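
The co-variation principle can be illustrated with a minimal sketch. A candidate gene is tentatively assigned to a cell-type signature when its expression across samples correlates strongly with the averaged profile of known marker genes for that cell type. The signature names, marker genes, expression values and the 0.7 threshold are hypothetical choices for illustration, not parameters from the studies cited.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is flat)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    return mean((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def signature_profile(expr, markers):
    """Average expression of the marker genes in each sample."""
    n_samples = len(expr[markers[0]])
    return [mean(expr[g][i] for g in markers) for i in range(n_samples)]


def assign_gene(gene, expr, signatures, threshold=0.7):
    """Return the signature whose marker profile co-varies most strongly with
    `gene` (correlation >= threshold), or None if no signature qualifies."""
    best, best_r = None, threshold
    for name, markers in signatures.items():
        r = pearson(expr[gene], signature_profile(expr, markers))
        if r >= best_r:
            best, best_r = name, r
    return best


# Hypothetical expression values across six tissue samples.
expr = {
    "tcell_marker_1": [5, 1, 4, 0, 3, 1],
    "tcell_marker_2": [4, 0, 5, 1, 3, 0],
    "prolif_marker_1": [0, 4, 1, 5, 1, 4],
    "prolif_marker_2": [1, 5, 0, 4, 0, 5],
    "candidate_x": [5, 0, 4, 1, 4, 1],
    "candidate_y": [2, 2, 2, 2, 2, 3],
}
signatures = {
    "T-cell": ["tcell_marker_1", "tcell_marker_2"],
    "proliferation": ["prolif_marker_1", "prolif_marker_2"],
}
print(assign_gene("candidate_x", expr, signatures))  # T-cell
print(assign_gene("candidate_y", expr, signatures))  # None
```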

To date, the largest bodies of gene expression data collected at Stanford University have come from DNA microarray profiling of breast carcinomas and lymphomas. In this review, we have focused on these published studies as a paradigm for how gene expression data can be used to define novel, apparently biologically distinct tumour sub-categories and to begin exploring their clinical relevance.

The early studies by Perou et al. described herein demonstrated that, for breast carcinoma, most currently useful clinical categories could be accurately identified by specific gene expression patterns. Notably, ER-positive and ERBB2-positive subtypes could now be readily distinguished by the differential expression of 20–40 genes11, many more than are traditionally employed in immunophenotyping. In addition, the classes defined by gene expression patterns can extend and refine existing classifications, and in some cases add new, previously unrecognized sub-classes. Three examples of classes recognized by gene arrays are (1) the distinction of two subtypes of diffuse large B-cell lymphoma with disparate clinical responses to treatment, one with germinal centre B-cell-like characteristics and the other with in vitro ‘activated’ B-cell-like characteristics19; (2) the distinction of a class of breast tumours with a basal epithelial cell phenotype associated with poor clinical behaviour11; and (3) the identification of a subtype of breast tumour that would classically have been described as ER-positive, yet in this study was associated with a disease outcome almost as poor as that of ERBB2-positive tumours18.
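
A multi-gene signature, such as the 20–40 genes that separate ER-positive from ERBB2-positive tumours, can be summarized as a per-sample score. One simple way is to standardize each signature gene across samples and average the z-scores. This generic technique is offered as an assumption for illustration, not as the method of Perou et al. The gene names, values and the zero cut-off are invented.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Standardize each gene across samples to mean 0 and SD 1 (flat genes -> 0)."""
    out = {}
    for gene, values in expr.items():
        m, s = mean(values), pstdev(values)
        out[gene] = [(v - m) / s if s else 0.0 for v in values]
    return out


def signature_scores(expr, genes):
    """Per-sample mean z-score over the genes of a signature."""
    z = zscore_rows({g: expr[g] for g in genes})
    n_samples = len(z[genes[0]])
    return [mean(z[g][i] for g in genes) for i in range(n_samples)]


# Hypothetical values for a three-gene signature in four tumours.
expr = {
    "sig_gene_a": [8, 7, 2, 1],
    "sig_gene_b": [6, 7, 1, 2],
    "sig_gene_c": [9, 8, 3, 3],
    "unrelated": [4, 1, 5, 2],  # not part of the signature, so ignored
}
scores = signature_scores(expr, ["sig_gene_a", "sig_gene_b", "sig_gene_c"])
high = [i for i, s in enumerate(scores) if s > 0]  # 0 is an arbitrary cut-off
print(high)  # [0, 1]
```

Averaging over many genes makes the score less sensitive to noise in any single measurement. This is one reason a multi-gene signature can be more robust than a one- or two-marker immunophenotype.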

It is clear that in the near future gene expression patterns in all major human tumour types will be studied using cDNA microarrays, and that in many cases new tumour subtypes will be identified. However, because of the expense and difficulty of carrying out large studies and the requirement for high-quality fresh frozen tissue, microarray-based gene expression studies may be slow to accumulate enough cases to have the statistical power needed to identify and validate all clinically relevant subtypes. These initial gene expression-based findings will need to be validated in prospective studies and by other retrospective approaches, such as tissue arrays, that allow very large numbers of tumours to be examined for correlations with clinical parameters. In the long term, it may be necessary to develop new clinical tests, or augment existing ones, to score these newly identified predictive markers. A refined and more detailed categorization of tumours, together with the insights gained from recognizing their distinguishing biological characteristics, may encourage the development of better-targeted therapeutics.

Some have speculated that these new genomic tools will soon provide rapid, simple assays that might replace histological examination. We believe this is highly unlikely; instead, clinical practice will evolve more incrementally. New clinical tests will undoubtedly be developed from genomic studies and then integrated with established tests. The co-evolution of treatment options and novel classifications of tumour types will likely give the attending oncologist additional information to help guide treatment, and may allow improved outcomes and better quality of life for patients.

References
