DNA Chips and Microarrays
Published Online: 27 JAN 2006
Copyright © 2001 John Wiley & Sons, Ltd. All rights reserved.
How to Cite
Orntoft, T. F. and Kruhøffer, M. 2006. DNA Chips and Microarrays. eLS.
The immobilization of probes for nucleic acid hybridization on a solid support has been known for many years, but the technique was taken to a new level when thousands of molecules were placed next to each other on a glass surface in a microscopic area, with less than 100 μm between the probes. This allowed small sample volumes to be hybridized and thousands of genes to be analyzed in parallel, and eventually the whole transcribed genome could be examined in one step on approximately 1 cm² of glass.
The potential of this approach has always been staggering. Placing 400000 probes on a small glass surface makes it possible to design probes for sequencing, for polymorphism detection, for expression monitoring, for splice variant examination, etc. See also DNA Chip Revolution, Microarrays and Expression Profiling in Cancer, Microarrays and Single Nucleotide Polymorphism (SNP) Genotyping, and Microarrays: Use in Mutation Detection
In addition to a general miniaturization, the shift in array format from the porous nylon or nitrocellulose to solid surfaces such as glass or plastic brought several advantages. It allowed for fluorescent instead of radioactive detection, for faster and more reproducible hybridization kinetics, and for true parallelism.
The most widely used commercial array platform is from Affymetrix Incorporated and involves 25-mer oligonucleotides synthesized in situ on a solid glass wafer by a photolithographic process. For each gene, approximately 20 hybridization probes plus matching control probes are designed close to the 3′-end of the messenger ribonucleic acid (mRNA). These arrays are for single-sample measurements that are normalized and scaled for comparison with other samples. Noncommercial platforms usually use robotic gridders or inkjet printing technology to deposit the deoxyribonucleic acid (DNA) probes on the array's surface. The arrays are read with a laser scanning confocal microscope with a resolution of approximately 2–5 μm or with CCD camera technology.
The most widespread application of arrays has, by far, been the quantification of mRNA molecules, so-called gene expression monitoring or transcriptome analysis. This application provides information about which genes are expressed and at what level. The performance is linear and reproducible, and an antibody sandwich similar to an enzyme-linked immunosorbent assay (ELISA) is used to amplify the signal from weakly expressed sequences. Another application has been sequencing, in which probes that ideally cover all four possible nucleotides at each position along the whole open reading frame (ORF) are used to predict the presence or absence of mutations (Wikman et al., 2000). A fast-growing application is single nucleotide polymorphism (SNP) array technology, which detects variation in the genome in order to find polymorphisms associated with increased disease susceptibility or with, for example, reduced or increased metabolism of drugs (Sekine et al., 2001). Approximately 4–6 million polymorphisms exist in the genome, the vast majority outside ORFs. See also Expression Studies
The sample that is loaded on the expression arrays is labeled following one of several protocols. Nanograms to micrograms of total RNA are needed, and the material is labeled by an enzymatic reaction or by using fluorescently labeled or modified nucleic acids. In the two most used protocols, 5–20 μg total RNA is used to synthesize complementary DNA (cDNA) by reverse transcription and cRNA by in vitro transcription from a double-stranded DNA modified with a 3′ T7 RNA polymerase recognition site. Noncommercial systems normally incorporate fluorescent Cy5- or Cy3-conjugated nucleotides in test and reference samples, respectively, or aminoallyl nucleotides that are subsequently conjugated with the fluors. Following the Affymetrix protocol, biotin-modified ribonucleotides are incorporated during in vitro transcription and reacted with a streptavidin–phycoerythrin conjugate after hybridization. Subsequently, an antibody sandwich similar to an ELISA is used to amplify the signal in order to detect weakly expressed sequences.
Modified protocols exist in which several rounds of in vitro transcription lead to detectable amounts of labeled complementary RNA (cRNA) from a few nanograms of RNA. One approach avoids polymerase chain reaction (PCR) and relies only on approximately linear enzymatic amplification, although others have used PCR successfully.
After an array experiment on the Affymetrix system, the initial scan has to be scaled to a predefined standard level. This is necessary to compensate, for example, for varying efficiency during labeling of the samples; it is therefore always wise to label samples that are to be compared in parallel on the same day. The standardization can be based on a global scaling, in which the intensity from all probes on the array is scaled to a standard level, or on the level of selected housekeeping genes that are supposedly expressed at the same level in different samples under different conditions (Warrington et al., 2000). If the scaling factors of the samples to be compared differ by more than approximately threefold, the validity of the comparison becomes uncertain.
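The global scaling described above can be illustrated with a minimal sketch. The target intensity of 150, the 2% trimming and all function names are assumptions made for this illustration only; they are not taken from the Affymetrix software or from the cited work.

```python
from statistics import mean

def scaling_factor(intensities, target=150.0, trim=0.02):
    """Factor that brings the trimmed mean intensity of one array to `target`.

    `target` and `trim` are illustrative assumptions, not vendor defaults.
    """
    ordered = sorted(intensities)
    k = int(len(ordered) * trim)              # values dropped at each end
    core = ordered[k:len(ordered) - k] if k else ordered
    return target / mean(core)

def scale(intensities, factor):
    """Apply a scaling factor to every probe intensity on the array."""
    return [x * factor for x in intensities]

def comparable(factor_a, factor_b, max_ratio=3.0):
    """True if two scaling factors differ by no more than `max_ratio`-fold."""
    high, low = max(factor_a, factor_b), min(factor_a, factor_b)
    return high / low <= max_ratio
```

In use, `scaling_factor` would be computed once per array, and `comparable` would flag pairs of arrays whose factors differ by more than about threefold, the point at which the text above considers a comparison uncertain.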
The next step is bioinformatic data mining, to characterize the samples or identify genes whose expression follows a certain sample phenotype. There are various approaches to this. The first step is to eliminate genes whose expression is generally absent or noise-filled. Methods for this depend on the array system used. The second step is to identify those genes whose expression is informative in terms of following a sample phenotype or of having a differential behavior across the samples. This step will eliminate a large number of genes that do not vary much, such as housekeeping genes. Different methods exist for this purpose, for example:
- Statistical evaluation based on parametric statistics such as the t-test. In this case the expression of a gene across many samples is normalized and statistically significant variation is registered.
- Using a weighting scheme in which genes whose expression covaries with a certain sample phenotype are sorted for further study. The weight is a covariance measure between gene A and the increasing stage vector (or gene) B, where n is the dimensionality of vectors A and B (number of tissues). This method was used to identify genes that covaried with bladder cancer disease progression (Thykjaer et al., 2001).
- Significance analysis of microarrays. This method assigns a score to each gene based on changes in gene expression relative to the standard deviation of repeated measurements (Tusher et al., 2001).
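The three scoring approaches listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the covariance uses the textbook sample form with an n − 1 denominator, which may differ from the formula used by Thykjaer et al.; the last function is a simplified score in the spirit of significance analysis of microarrays, without the permutation step, and its constant `s0` is an arbitrary illustrative value. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(group1, group2):
    """Welch's t statistic for one gene measured in two groups of samples."""
    se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se

def covariance(a, b):
    """Sample covariance between expression vector a and stage vector b.

    Assumed textbook form (n - 1 denominator); n is the number of tissues.
    """
    n = len(a)
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

def rank_by_stage(expression, stages):
    """Gene names sorted by decreasing covariance with the stage vector."""
    return sorted(expression,
                  key=lambda gene: covariance(expression[gene], stages),
                  reverse=True)

def sam_like_score(group1, group2, s0=0.1):
    """Difference in means relative to a pooled standard deviation plus s0.

    s0 is a small constant (assumed here) that damps genes with tiny variance.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s = sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)
```

For example, `rank_by_stage({"up": [1, 2, 3], "down": [3, 2, 1]}, [1, 2, 3])` places the gene that rises with stage first.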
When the differentially expressed genes have been identified, the third step is to extract as much information from the pool of genes as possible. A commonly used method for this purpose is cluster analysis based on hierarchical agglomerative clustering, in which genes with similar expression levels across samples are placed close together and genes with dissimilar behavior are placed far from each other. The same method can be applied to samples (two-way clustering), such that samples with similar expression of a number of genes are clustered together and those that differ in the levels of certain groups of genes are clustered far from each other. In this way a new classification of samples based on gene expression emerges, unbiased by prior assumptions about the samples. This has been very promising in identifying new subgroups of diseases based on gene expression. Such subgroups can correspond to patients with good or poor survival (van't Veer et al., 2002; Dyrskjøt et al., 2003), patients who will benefit from certain treatments, etc. As many of these projects start out with thousands of genes, it is perhaps not surprising that interesting groups of, for example, 70 genes that work as classifiers will be found. The crucial point is whether such classifying genes will reproducibly identify the same class of patients or samples in a new independent experiment. Mathematical approaches such as cross-validation ‘within’ the experiment are used to support the robustness of the classifier, but they cannot substitute for prospective clinical studies, which are quite time-consuming. See also Clustering of Highly Expressed Genes in the Human Genome, Infectomics: Study of Microbial Infections Using Omic Approaches, Microarray Bioinformatics, and Microarrays in Disease Diagnosis and Prognosis
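A minimal sketch of hierarchical agglomerative clustering follows. The choice of correlation distance and average linkage is an assumption made for illustration, as the text does not name a distance or linkage rule, and the function names are hypothetical. Profiles with no variation must be filtered out beforehand, because their correlation is undefined.

```python
from math import sqrt
from statistics import mean

def correlation_distance(a, b):
    """1 - Pearson correlation: near 0 for profiles that rise and fall together."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return 1.0 - num / den

def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    `profiles` maps a name (gene or sample) to its expression vector.
    The distance between clusters is the average pairwise distance.
    """
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        return mean(correlation_distance(profiles[i], profiles[j])
                    for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)                  # closest pair of clusters
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Passing gene profiles clusters genes; passing the transposed data (one vector per sample) clusters samples, which gives the two-way clustering mentioned above.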
A promising application of expression arrays has been the identification and dissection of regulatory pathways. If a certain receptor is triggered by a ligand or if an inducible signal transduction molecule is activated following transfection (Pedersen et al., 2001) or physical stimulation (Zhao et al., 2000), a very clear picture of the effect on gene transcription can be obtained. Computer software allows the direct linkage of expression data to pathways for easy visualization (Figure 1). This fast and comprehensive overview of large data sets of gene expression profiles may lead to a faster discovery of new pathways and of interactions between pathways. In a similar manner, knockout or knockin mice can be used to study the biological role of specific genes. See also Expression Analysis In Vitro, Gene Expression Networks, Microarrays in Drug Discovery and Development, Microarrays in Toxicological Research, and Microarrays: Use in Gene Identification
SNP arrays are mainly used for two purposes: (1) linkage analysis, in which chromosomal loci identified by SNPs are linked to diseases for positional cloning or to the genetic background of patients' responses to drugs, and (2) detection of allelic imbalance in tumor tissue (Figure 2) (Primdahl et al., 2002). In the first case, large, clinically well-defined family cohorts are needed, and regions that segregate with disease are subjected to fine mapping with microsatellites, database searches for gene identification, and sequencing of candidate genes. In the latter case, DNA purified from microdissected tumor tissue is compared with normal leukocyte DNA, and heterozygous SNPs will convert to homozygous if the allelic balance changes. The method can, at least in theory, detect both losses and gains of alleles, but it is mostly lost alleles that are picked up, probably owing to saturation problems in the PCR reaction, which has to be carried out with heavy multiplexing (more than 50 primer pairs per tube) to reduce the workload. The PCR step, in which a small oligonucleotide holding the SNPs has to be amplified, is the rate-limiting step in the process toward SNP arrays with more than 10000 SNPs.
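The comparison of tumor and normal DNA for allelic imbalance can be sketched as follows. The genotype encoding (two-letter strings such as "AB") and the function name are assumptions made for this illustration; real SNP array software works from signal intensities rather than from finished calls.

```python
def allelic_imbalance(normal_calls, tumor_calls):
    """SNPs that are heterozygous in normal DNA but homozygous in tumor DNA.

    Both arguments map a SNP identifier to a two-letter genotype call,
    for example "AB" (heterozygous) or "AA" (homozygous).
    """
    flagged = []
    for snp, normal in normal_calls.items():
        tumor = tumor_calls.get(snp)
        if tumor is None:
            continue                          # no genotype call in the tumor
        informative = normal[0] != normal[1]  # only heterozygous SNPs inform
        if informative and tumor[0] == tumor[1]:
            flagged.append(snp)
    return flagged
```

Only SNPs that are heterozygous in the leukocyte DNA are informative, so a dense array is needed to cover each chromosomal region with enough informative markers.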
The use of arrays for sequencing has been hampered by the fact that the sensitivity is not as high as with conventional, acrylic polymer-based sequencers, and identification of deletions and insertions was still very difficult in 2002. Data indicate that each probe has to be characterized with respect to the signal-to-noise ratio, etc. This makes it necessary to run a very large number of controls before utilizing these arrays (Wikman et al., 2000). See also Comparative Genomic Hybridization in the Study of Human Disease, and Microarrays and Single Nucleotide Polymorphism (SNP) Genotyping
High-density arrays are expensive tools; however, they do provide large amounts of data, and a few sets of experiments can create the platform for months or years of work. Sharing the equipment is often an advantage, as very few laboratories need 80000 data points daily. Furthermore, the setup requires trained people for labeling and experts in bioinformatic data mining who can filter and sort the heavy data load. Only after this can the biologists start interpreting the data. Custom-gridded arrays can be produced at a lower cost, but the labeling procedure is quite costly for larger clinical studies.
The many databases created worldwide holding microarray data are potentially rich sources of information. Comparing data sets from the Affymetrix system should be straightforward; data from other array systems, however, are very difficult to compare, as the linearity and dynamic range may vary considerably, as may the references (pools of RNA) that are used for comparisons. Standardized commercial or publicly distributed references will probably be needed to realize the potential of the large body of data worldwide describing different species, tissues and biological conditions.
- Dyrskjøt et al. (2003) Identifying distinct classes of bladder carcinoma using microarrays. Nature Genetics 33(1): 90–96.
- Pedersen et al. (2001) Profile of differentially expressed genes mediated by the type III epidermal growth factor receptor mutation expressed in a small-cell lung cancer cell line. British Journal of Cancer 85: 1211–1218.
- Primdahl et al. (2002) Allelic imbalances in human bladder cancer: genome-wide detection with high-density single-nucleotide polymorphism arrays. Journal of the National Cancer Institute 94: 216–223.
- Sekine et al. (2001) Identification of single-nucleotide polymorphisms (SNPs) of human N-acetyltransferase genes NAT1, NAT2, AANAT, ARD1 and L1CAM in the Japanese population. Journal of Human Genetics 46: 314–319.
- Thykjaer et al. (2001) Identification of gene expression patterns in superficial and invasive human bladder cancer. Cancer Research 61: 2492–2499.
- Tusher et al. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America 98: 5116–5121.
- van't Veer et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415: 530–536.
- Warrington et al. (2000) Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes. Physiological Genomics 2: 143–147.
- Wikman et al. (2000) Evaluation of the performance of a p53 sequencing microarray chip using 140 previously sequenced bladder tumors. Clinical Chemistry 46: 1555–1561.
- Zhao et al. (2000) Analysis of p53-regulated gene expression patterns using oligonucleotide arrays. Genes and Development 14: 981–993.
- (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403: 503–511.
- (2001) Minimum information about a microarray experiment (MIAME): toward standards for microarray data. Nature Genetics 29: 365–371.
- (2000) Gene expression profiling: monitoring transcription and translation products using DNA microarrays and proteomics. FEBS Letters 480(1): 2–16.
- Cesareni G (ed.) (2000) Functional genomics. FEBS Letters 480(1) (special issue). Also published online: http://www.elsevier.com/febs/show/special_iss.htt.
- (1999) Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nature Genetics 22: 239–247.
- (2001) Profiling of gene expression in individual hematopoietic cells by global mRNA amplification and slot blot analysis. Journal of Immunological Methods 252: 175–189.
- (1998) Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280: 1077–1082.
- (2002) Genome-wide cDNA microarray screening to correlate gene expression profiles with sensitivity of 85 human cancer xenografts to anticancer drugs. Cancer Research 62: 518–527.