Recent research from the ENCODE project reports that the large stretches of noncoding DNA found in the human genome are not “junk” DNA, as once believed, but do indeed have biochemical function. These findings may have profound implications for our understanding of human variation and of how changes in the genome result in disease.
CITATION Ecker JR, Bickmore WA, Barroso I, Pritchard JK, Gilad Y, Segal E. Genomics: ENCODE explained. Nature 2012; 489: 52–55.
CITATION ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 2012; 489: 57–74.
CITATION Maher B. ENCODE: the human encyclopedia. Nature 2012; 489: 46–48.
Summary and Analysis
Advances in genomics and bioinformatics have provided critical new insights into the human genome, especially in identifying protein-coding genes, yet major questions remain, including whether the large stretches of noncoding DNA have any function. Scientists in 32 labs around the world, working as part of the Encyclopedia of DNA Elements (ENCODE) project, have now reported in a staggering 30 papers, published in Nature, Genome Biology and Genome Research, that the widely held belief that large segments of the human genome consist of “junk” DNA is incorrect. The ENCODE findings assign a biochemical function in at least one cell type to 80% of the genome, much of it outside the protein-coding regions. The noncoding portion of this map contains promoters, enhancers and regions that are transcribed into regulatory RNAs that are not translated into proteins (Figure 1). These findings have far-reaching implications for understanding human variation and how changes in the genome result in disease.
Genome-wide association studies (GWAS) have shown that a large proportion of the single-nucleotide polymorphisms (SNPs) associated with disease phenotypes lie in introns or intergenic regions rather than in protein-coding sequence. When the ENCODE consortium examined more than 4500 SNP-phenotype associations across a range of human diseases, they found that 12% of these SNPs overlap regions bound by transcription factors and 34% overlap DNase I hypersensitive sites, which mark open, accessible chromatin characteristic of active regulatory DNA. For example, the SNP rs11742570, which is strongly associated with Crohn's disease, overlaps a GATA2 transcription-factor-binding signal. These findings indicate that many disease-associated variants lie not in the genes themselves but in the regions that regulate them. Clearly, interpretation of GWAS results must consider both the coding and noncoding regions of the genome.
The group of 440 consortium scientists performed 24 different types of experiments to identify regions of transcription, transcription factor binding, chromatin structure and histone modification in various cell types. Most experiments were performed with cell lines such as HeLa, GM12878, K562 and HUVEC; a major, and very necessary, goal that remains is to examine primary cells from both patients with specific diseases and healthy individuals. Few experiments specifically examined cells of the immune system (such as T cell and B cell subsets) or pancreatic islet, liver, heart, intestine, lung and kidney cells, but we can expect these gaps to be filled within the next few years.
The data produced by the ENCODE consortium, comprising 5 trillion bytes of raw data from more than 1640 genome-wide data sets across 147 cell types, are still far from complete. All ENCODE data are freely available for download and analysis at the ENCODE data coordination center at the University of California, Santa Cruz (genomepreview.ucsc.edu/encode).
So how does one navigate this massive amount of data? The published manuscripts are all freely available through the ENCODE explorer (nature.com/encode) or the very nifty ENCODE iPad app. In addition to the individual papers, the findings are organized into “threads,” each of which gathers the relevant sections on a single topic from all of the publications into one document. Only time will tell whether the massive amount of data obtained in the ENCODE project will translate into improvements in the diagnosis and treatment of human disease.
Dr. Krams is associate professor of surgery at Stanford School of Medicine, Stanford, Calif.
Dr. Bromberg is professor of surgery and of microbiology and immunology, and is the chief of the Division of Transplantation, University of Maryland Medical Center, Baltimore.