Notes: LINE, long interspersed nuclear element.
Sequencing the Human Genome: Novel Insights into its Structure and Function
Published Online: 15 JUL 2008
Copyright © 2001 John Wiley & Sons, Ltd. All rights reserved.
How to Cite
Kehrer-Sawatzki, H. and Cooper, D. N. 2008. Sequencing the Human Genome: Novel Insights into its Structure and Function. eLS.
This is not the most recent version of the article; a current version was published on 15 NOV 2016.
The Human Genome Project (HGP) was launched in 1990 with the goal of sequencing the entire human genome. It was undertaken as a collaborative venture by some 20 groups from the United States, the United Kingdom, Japan, France, Germany and China. In 2001, the results of this huge effort were published in Nature by the International Human Genome Sequencing Consortium (IHGSC) under the title ‘Initial sequencing and analysis of the human genome’ (Lander et al., 2001). This initial draft sequence covered approximately 90% of the human genome with 4- to 5-fold redundancy. In the same year, Celera Genomics also reported a draft sequence of the human genome (Venter et al., 2001). The two drafts were produced by different strategies: the IHGSC sequence was derived from chromosomally mapped and ordered clones, whereas the Celera sequence was obtained by random whole-genome shotgun sequencing. The IHGSC assembly was also a composite of the genomes of numerous donors, whereas the Celera version was a consensus sequence derived from only five individuals.
Both draft sequences had major shortcomings, notably more than 100 000 sequence gaps and incomplete coverage of the euchromatic portion of the genome. The subsequent steps towards ‘finishing’ the sequence aimed to increase coverage of the euchromatic regions (which contain the vast majority of genes) and to close the gaps between contigs. These efforts, which included the integration of several reported clone contigs (McPherson et al., 2001) as well as the Celera scaffolds (Venter et al., 2001), culminated in a finished sequence covering more than 95% of the euchromatic portion of the genome. This was a challenging task because the human genome contains a multitude of dispersed repeats and large segmental duplications, which greatly complicate the determination of both its structure and its sequence.

In 2004, the IHGSC published an improved version of the human genome sequence that converted the draft into a nearly complete and highly accurate sequence. This version (Build 35) contained 2.85 billion nucleotides (2850 Mb) interrupted by only 341 gaps; it covered approximately 99% of the euchromatic genome with an error rate of approximately 1 per 100 000 bases (International Human Genome Sequencing Consortium, 2004; Schmutz et al., 2004a, 2004b). Since then, significant further progress has been made towards actually completing the sequence: the complete euchromatic sequences of all individual human chromosomes, including the annotation of genes and other features, have now been published (summarized in Table 1). Since November 2005, the NCBI (National Center for Biotechnology Information) Build 36 assembly of the human genome sequence has been available in public databases.
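To put the Build 35 accuracy figure in perspective, the short Python sketch below converts the quoted error rate into an expected number of residual errors and into the equivalent Phred quality score (Q = −10 log10 P). It assumes, purely for illustration, that errors are independent and spread uniformly along the 2.85 billion finished bases. The Phred conversion is a standard convention added here and is not part of the original analysis.

```python
import math

# Figures quoted for Build 35; uniform, independent errors are an assumption.
FINISHED_BASES = 2_850_000_000  # ~2.85 billion nucleotides
ERROR_RATE = 1 / 100_000        # ~1 error per 100 000 bases


def expected_errors(n_bases: int, error_rate: float) -> float:
    """Expected number of erroneous bases if errors occur at a constant rate."""
    return n_bases * error_rate


def phred_quality(error_rate: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(probability of error)."""
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    print(f"expected residual errors: {expected_errors(FINISHED_BASES, ERROR_RATE):,.0f}")
    print(f"equivalent Phred quality: Q{phred_quality(ERROR_RATE):.0f}")
```

On these assumptions, 1 error in 100 000 bases corresponds to roughly 28 500 residual errors across the finished sequence, or Q50 on the Phred scale.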
The Build 36 data comprise a reference assembly of the complete genome sequence, the Celera whole-genome shotgun (WGS) assembly and a number of alternative assemblies of individual haplotypic chromosomes or regions. The full list of assemblies in NCBI Build 36, together with the genome sequences themselves, is available through the public genome browsers.
Table 1 Features of the individual human chromosomes

| Chromosome | Chromosome length (bp)ᵃ | Number of known protein-coding genesᵃ | Gene density (genes/Mb) | Special features of the chromosome | Reference |
|---|---|---|---|---|---|
| 1 | 247 249 719 | 2189 | 8.85 | Largest human chromosome. Rich in disease genes. Huge (∼30 Mb) pericentromeric heterochromatic region at 1q12 spans approximately 5% of the length of the chromosome. Contains clusters of amylase genes (1p21), U1 snRNA genes (1q12–q22) and 5S RNA genes (1q), as well as multiple (∼250) tRNA genes | Gregory et al. 2006 |
| 2 | 242 951 149 | 1328 | 5.47 | Along with chromosome 4, exhibits the lowest recombination rate of all the autosomes. Contains at 2q13 an ancient telomere–telomere fusion junction marking the position where two ancestral ape chromosomes fused to give rise to this human chromosome | Hillier et al. 2005 |
| 3 | 199 501 827 | 1112 | 5.57 | Lowest rate of segmental duplication of all human chromosomes. Contains several olfactory receptor gene clusters | Muzny et al. 2006 |
| 4 | 191 273 063 | 797 | 4.17 | Along with chromosome 2, exhibits the lowest recombination rate of all the autosomes. Highest percentage of LINEs (long interspersed nuclear elements) of all chromosomes | Hillier et al. 2005 |
| 5 | 180 857 866 | 903 | 4.99 | Rich in intrachromosomal duplications. Contains interleukin and protocadherin gene clusters on 5q31 | Schmutz et al. 2004a |
| 6 | 170 899 992 | 1133 | 6.63 | Harbours the major histocompatibility complex and the largest tRNA gene cluster in the human genome. Contains at least 3 imprinted genes | Mungall et al. 2003 |
| 7 | 158 821 424 | 1023 | 6.44 | Contains the highest number of intrachromosomal duplications of all human chromosomes. Contains at least 6 imprinted genes | Hillier et al. 2003; Scherer et al. 2003 |
| 8 | 146 274 826 | 747 | 5.11 | Contains a fast-evolving 15 Mb region on distal 8p with genes related to innate immunity and the nervous system that appear to have evolved under positive selection | Nusbaum et al. 2006 |
| 9 | 140 273 252 | 929 | 6.62 | Structurally highly polymorphic. Contains a large (∼14 Mb) block of pericentromeric heterochromatin, large numbers of intra- and interchromosomal segmental duplications and the largest interferon gene cluster in the human genome (9p22) | Humphray et al. 2004 |
| 10 | 135 374 737 | 834 | 6.16 | Region of extensive segmental duplication located on 10q11 | Deloukas et al. 2004 |
| 11 | 134 452 384 | 1385 | 10.30 | Rich in both genes and disease genes. Contains 40% of all olfactory receptor gene clusters. Contains at least 9 imprinted genes | Taylor et al. 2006 |
| 12 | 132 349 534 | 1080 | 8.16 | Has a unique history of evolutionary rearrangements that occurred in the rodent and primate lineages. Contains clusters of proline-rich protein and type II keratin genes at 12q13 | Scherer et al. 2006 |
| 13 | 114 142 980 | 361 | 3.16 | Low gene density in general; contains a central 38 Mb segment where the gene density drops to only 3.1 genes per Mb. This acrocentric chromosome contains ribosomal RNA genes at 13p12 and at least 1 imprinted gene | Dunham et al. 2004 |
| 14 | 106 368 585 | 669 | 6.29 | This acrocentric chromosome contains ribosomal RNA genes at 14p12. Contains two 1 Mb regions of crucial importance to the immune system (T-cell receptor and immunoglobulin heavy-chain genes), a serpin gene cluster at 14q32.1 and several regions with imprinted genes | Heilig et al. 2003 |
| 15 | 100 338 915 | 641 | 6.39 | This acrocentric chromosome contains ribosomal RNA genes at 15p12. Two large clusters of clinically important segmental duplications are located in the proximal and distal regions of 15q. Contains a number of imprinted genes | Zody et al. 2006a |
| 16 | 88 827 254 | 925 | 10.41 | Relatively high gene density. Contains a large number of segmental duplications | Martin et al. 2004 |
| 17 | 78 774 742 | 1236 | 15.69 | High gene density. Has undergone extensive intrachromosomal rearrangements, many of which were probably mediated by segmental duplications. High G+C content of 45% (genome average: 41%) | Zody et al. 2006b |
| 18 | 76 117 153 | 295 | 3.88 | Low gene density overall. Contains a serpin gene cluster at 18q21.3 | Nusbaum et al. 2005 |
| 19 | 63 811 651 | 1443 | 22.61 | Highest gene density of all human chromosomes. One quarter of its genes belong to tandemly arranged gene families, which together span 25% of the length of the chromosome. High G+C content of 48–49% (genome average: 41%). Repetitive sequences constitute 53–57% of the chromosome, compared with a genome average of 40–44%. Contains clusters of olfactory receptor genes and cytochrome P450 genes, multiple clusters of zinc-finger genes and at least 2 imprinted genes | Grimwood et al. 2004 |
| 20 | 62 435 964 | 617 | 9.88 | Smallest metacentric autosome. Rich in both genes and disease genes. Contains a type 2 cystatin gene cluster and at least 2 imprinted genes | Deloukas et al. 2001 |
| 21 | 46 944 323 | 284 | 6.05 | Smallest human chromosome, with fewer genes than any other autosome. This acrocentric chromosome contains ribosomal RNA genes at 21p12 | Hattori et al. 2000 |
| 22 | 49 691 432 | 519 | 10.44 | This acrocentric chromosome contains ribosomal RNA genes at 22p12. Relatively high gene density. Clusters of segmental duplications at 22q11.2 are associated with several genomic disorders | Dunham et al. 1999 |
| X | 154 913 754 | 891 | 5.75 | Contains the pseudoautosomal regions PAR1 and PAR2 at the tips of the short and long arms, respectively; these regions are essential for normal male meiosis and recombination. PAR1 undergoes an obligate crossover with the Y chromosome, giving this region the highest recombination rate in the human genome, at least in males. One X chromosome is subject to inactivation in females. Highly enriched in interspersed repeats; low G+C content of 39% (genome average: 41%) | Ross et al. 2005 |
| Y | 57 772 954 | 80 | 1.38 | Lowest gene density of all human chromosomes (contains only 82 known genes). Contains the male-specific region, a mosaic of heterochromatin and euchromatic X-transposed, X-degenerate and ampliconic sequences, which make up 30% of the euchromatin. PAR1 undergoes an obligate crossover with the X chromosome. The virtual absence of homologous recombination between the X and Y chromosomes has led to a gradual degeneration of Y-chromosomal genes over evolutionary time. However, the absence of recombination, at least within the extensive nonrecombining region of the Y, has also favoured the evolutionary accumulation of transposable elements on the Y chromosome | Skaletsky et al. 2003 |
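The gene density column of Table 1 is the number of known protein-coding genes divided by the chromosome length in megabases. As a quick consistency check, the Python sketch below recomputes this value for a handful of rows copied from the table (taking 1 Mb as 1 000 000 bp) and ranks them from gene-rich to gene-poor.

```python
# A few rows transcribed from Table 1: chromosome -> (length in bp, known protein-coding genes)
TABLE_1_SUBSET = {
    "1": (247_249_719, 2189),
    "13": (114_142_980, 361),
    "17": (78_774_742, 1236),
    "18": (76_117_153, 295),
    "19": (63_811_651, 1443),
    "Y": (57_772_954, 80),
}


def gene_density(length_bp: int, n_genes: int) -> float:
    """Genes per megabase, with 1 Mb taken as 1 000 000 bp."""
    return n_genes / (length_bp / 1_000_000)


# Round to two decimals, as in the table.
densities = {chrom: round(gene_density(*row), 2) for chrom, row in TABLE_1_SUBSET.items()}

for chrom, density in sorted(densities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"chr{chrom}: {density} genes/Mb")
```

The recomputed values match the table (for example, 22.61 genes/Mb for chromosome 19 and 1.38 for the Y chromosome).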
Key Findings Made by the Human Genome Project and their Impact on our Understanding of the Structure and Function of the Human Genome
Gene number and density
Among the most publicly discussed results of the HGP has been the number of genes in the human genome. In the latest assembly of the human genome (Build 36), which covers a total of 3 253 037 807 base pairs, 23 686 known and novel protein-coding genes have been annotated (genebuild: Ensembl 2007, database version 47.36i; http://www.ensembl.org/Homo_sapiens/index.html). Gene density varies between the human chromosomes, allowing one to distinguish gene-rich from gene-poor chromosomes (Table 1). The distribution of genes within chromosomes is also rather uneven: strikingly gene-poor regions, termed ‘gene deserts’, have been identified (Ovcharenko et al., 2005). These regions are devoid of protein-coding genes over distances of several megabases but may nevertheless contain regulatory sequences. Functional clustering of genes, and the coexpression of genes located in distinct chromosomal domains, has also been observed (Yamashita et al., 2004; Gierman et al., 2007), and these properties have often been conserved over evolutionary time (Sémon and Duret, 2006). See also Clustering of Highly Expressed Genes in the Human Genome, Evolution of Gene Deserts in the Human Genome, Gene Clustering in Eukaryotes, and Gene Distribution in Human Chromosomes
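One simple way to locate candidate gene deserts in an annotation is to merge overlapping gene intervals and report the intergenic gaps that exceed a length cutoff. The Python sketch below does this. The gene coordinates are hypothetical, and the 1 Mb cutoff used in the example is an illustrative assumption rather than the definition applied by Ovcharenko et al. (2005). Because the sketch looks only at protein-coding gene coordinates, a reported ‘desert’ may still harbour regulatory sequences.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based half-open, all on one chromosome


def gene_deserts(genes: List[Interval], min_gap: int) -> List[Interval]:
    """Return intergenic gaps of at least `min_gap` bp between consecutive genes.

    Overlapping or nested genes are merged first, so a gap is reported only
    where no annotated gene covers any base. Regions before the first gene and
    after the last gene (towards the chromosome ends) are not considered.
    """
    merged: List[List[int]] = []
    for start, end in sorted(genes):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current gene block
        else:
            merged.append([start, end])              # start a new gene block

    deserts: List[Interval] = []
    for (_, prev_end), (next_start, _) in zip(merged, merged[1:]):
        if next_start - prev_end >= min_gap:
            deserts.append((prev_end, next_start))
    return deserts


# Hypothetical gene coordinates (bp) on one chromosome, for illustration only.
genes = [(10_000, 50_000), (40_000, 90_000), (2_500_000, 2_560_000), (2_600_000, 2_700_000)]
print(gene_deserts(genes, min_gap=1_000_000))
```

Here the first two genes overlap and are merged. Only the 2.41 Mb gap between the merged block and the third gene passes the cutoff, so the call returns `[(90000, 2500000)]`.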
Nonprotein-coding RNAs and transcripts of unknown function
Analysis of the human genome sequence has revealed that, in addition to protein-coding genes, several thousand ribonucleic acid (RNA) genes are present. Nonprotein-coding RNAs of known function include not only structural RNAs such as transfer RNAs, ribosomal RNAs and small nuclear RNAs but also regulatory RNAs (microRNAs and small interfering RNAs (siRNAs)), which are involved in the sequence-specific transcriptional and posttranscriptional modulation of gene expression (Kapranov et al., 2007b). MicroRNA genes may be quite numerous: some 5000 microRNA gene loci have already been identified (miRBase, release 10.0; http://microrna.sanger.ac.uk/sequences). See also Evolutionarily Conserved Noncoding DNA, MicroRNA Evolution in the Human Genome, rRNA Genes: Evolution, The Biological Significance of Conserved Nongenic DNA, and Ultraconserved Elements (UCEs) in the Human Genome
In addition to the unambiguously noncoding RNAs, large numbers of polyadenylated and nonpolyadenylated transcripts of unknown function (TUFs) have been identified, which may have some coding potential. Because they may represent either noncoding transcripts or transcripts encoding short polypeptides, they are classified collectively as ‘TUFs’. These unannotated transcribed regions have been assigned to three different categories:
Both the complexity and abundance of TUFs are quite remarkable. Indeed, unannotated nonpolyadenylated transcripts originating from intergenic regions have been found to represent the major proportion of the transcriptional output of the human genome (Cheng et al., 2005).
The existence of this additional layer of transcriptional complexity is also evident from data obtained by the Encyclopedia of DNA Elements (ENCODE) project, which analysed 30 Mb of sequence from 44 genomic regions with the aim of characterizing the functional elements present in them (ENCODE Project Consortium, 2007; Thomas et al., 2007). More than 65% of the approximately 400 annotated genes in the ENCODE regions possess previously unannotated, tissue-specific distal 5′ transcription start sites and promoter regions, many of which form parts of TUFs (Denoeud et al., 2007). Importantly, a compilation by the ENCODE Consortium of all previously annotated and empirically detected RNAs indicated that >90% of the genomic sequence is transcribed as nuclear primary transcripts (ENCODE Project Consortium, 2007). Thus, the majority of bases on both strands of the human genome probably form part of at least one primary transcript (Kapranov et al., 2007a).
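At its core, the ‘>90% transcribed’ figure is a coverage statistic: the fraction of bases in a region that fall within at least one detected transcript. The Python sketch below shows that calculation for a single region, assuming transcripts are supplied as plain (start, end) intervals. It ignores strand, and it leaves out the quality filters and the integration of multiple data types that the ENCODE analysis relied on, so it illustrates only the arithmetic. The region and transcript coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end)


def covered_fraction(region: Interval, transcripts: Iterable[Interval]) -> float:
    """Fraction of bases in `region` covered by at least one transcript.

    Transcripts are clipped to the region and merged, so bases supported by
    several overlapping transcripts are counted once. Strand is ignored, so a
    base counts as transcribed if a transcript from either strand covers it.
    """
    r_start, r_end = region
    clipped: List[Interval] = sorted(
        (max(s, r_start), min(e, r_end)) for s, e in transcripts if s < r_end and e > r_start
    )
    covered = 0
    cur_start, cur_end = None, None
    for s, e in clipped:
        if cur_end is not None and s <= cur_end:
            cur_end = max(cur_end, e)  # overlaps or abuts the current block: extend it
        else:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous block
            cur_start, cur_end = s, e
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (r_end - r_start)


# Hypothetical 1 kb region and transcript coordinates, for illustration only.
example = covered_fraction((0, 1000), [(100, 400), (300, 600), (900, 1200)])
print(example)
```

The first two transcripts merge into a 500-base block, and the third is clipped to 100 bases at the region boundary. This gives 600 covered bases out of 1000, or 0.6.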
Expressed pseudogenes may be considered a special category of TUFs since, even where they have lost the ability to encode a functional protein, they may still be transcribed (Zheng et al., 2007). It has been estimated that the number of pseudogenes could well exceed the number of functional protein-coding genes (International Human Genome Sequencing Consortium, 2004). It is highly likely that at least some pseudogenes have acquired new functions by encoding either novel proteins or regulatory RNAs. See also Evolutionary Emergence of Genes Through Retrotransposition, Processed Pseudogenes and Their Functional Resurrection in the Human and Mouse Genomes, and Pseudogene Evolution in the Human Genome
Sequence elements controlling gene expression
Comparison of the human genome sequence with the orthologous sequences of mouse and rat revealed the existence of 481 ultraconserved elements (UCEs), segments of at least 200 bp that are 100% identical in all three species (Bejerano et al., 2004). Most UCEs are noncoding and have been evolutionarily conserved since the divergence of the mammalian and avian lineages more than 300 million years ago. Analysis of derived allele frequencies at segregating polymorphisms within human UCEs has indicated that these regions have been subject to negative selection that is much more stringent than that normally acting on protein-coding genes (Katzman et al., 2007). This observation strongly supports the functional importance of UCEs, which may represent long-range enhancers of gene expression (Pennacchio et al., 2006).
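In operational terms, the UCE criterion amounts to scanning a multiple alignment for runs of at least 200 columns that are identical in every species. The Python sketch below implements that scan over sequences that have already been aligned to equal length. It assumes uppercase bases and '-' for gaps, and it omits the genome-wide alignment and orthology checks that a real analysis would need. The toy sequences are hypothetical, and the example uses `min_len=5` so that the effect is visible on short strings.

```python
from typing import List, Tuple


def identical_runs(aligned: List[str], min_len: int = 200) -> List[Tuple[int, int]]:
    """Return (start, end) column intervals, 0-based half-open, where every
    aligned sequence carries the same non-gap base for at least `min_len` columns."""
    if not aligned or len({len(row) for row in aligned}) != 1:
        raise ValueError("need one or more alignment rows of equal length")
    n_cols = len(aligned[0])
    runs: List[Tuple[int, int]] = []
    start = None
    for i, column in enumerate(zip(*aligned)):
        conserved = column[0] != "-" and all(base == column[0] for base in column)
        if conserved and start is None:
            start = i                      # a new identical run begins
        elif not conserved and start is not None:
            if i - start >= min_len:
                runs.append((start, i))    # run long enough to report
            start = None
    if start is not None and n_cols - start >= min_len:
        runs.append((start, n_cols))       # run extending to the end of the alignment
    return runs


# Toy human/mouse/rat alignment (hypothetical sequences).
human = "ACGTACGTAAC"
mouse = "ACGTACGTTAC"
rat = "ACGTACGTAAC"
print(identical_runs([human, mouse, rat], min_len=5))
```

The single mismatch at column 8 splits the alignment. Only the first 8 columns form a run of length 5 or more, so the call returns `[(0, 8)]`.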
The availability of an increasingly well-annotated human genome sequence has, to some extent, obviated the need for the individual experimental identification of promoter regions and the cis-acting elements that regulate gene expression. However, the ENCODE project (Thomas et al., 2007) was conceived in order to understand the complexity of gene regulation and to identify the full spectrum of deoxyribonucleic acid (DNA) sequences involved. This project has attempted to make the leap from structural to functional analysis by examining more than 200 experimental datasets from studies that interrogated the 44 ENCODE regions (together representing approximately 1% of the human genome). The conclusions from this collaborative effort should be seen as second in importance only to the HGP itself in the characterization of the human genome.
Revisiting our definition of the ‘gene’
The ENCODE project has succeeded in doing something that the HGP could not, namely changing the way in which we think about genes. Several observations have together challenged current notions of the gene:

- regulatory elements that often lie far from the genes they regulate
- the existence of trans- as well as cis-acting regulatory elements
- the unanticipated extent of transcription across the genome
- the abundance of noncoding RNA genes
- the presence of evolutionarily conserved noncoding regions

Gerstein et al. 2007 have proposed an updated definition of a gene as ‘a union of genomic sequences encoding a coherent set of potentially overlapping functional products’. This definition deftly sidesteps the complexities of regulation and transcription by excluding regulation from the definition of a gene altogether. Instead, it holds that the final functional gene products (rather than any intermediate transcripts) should be used to group together the various entities that may be associated with a single gene.
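To make the product-centred idea concrete, the Python sketch below groups functional products (for example, transcript or protein isoforms) into ‘genes’. It treats any two products whose genomic segments share at least one base as belonging together and takes the connected components with a union-find structure. This ‘shared bases’ rule is a simplifying assumption made here, not the exact procedure of Gerstein et al. (2007). The product names and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def share_bases(a: List[Segment], b: List[Segment]) -> bool:
    """True if any segment of product `a` overlaps any segment of product `b`."""
    return any(ca == cb and sa < eb and sb < ea
               for ca, sa, ea in a for cb, sb, eb in b)


def group_products(products: Dict[str, List[Segment]]) -> List[List[str]]:
    """Group products into connected components of sequence-sharing products."""
    parent = {name: name for name in products}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    names = list(products)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if share_bases(products[a], products[b]):
                parent[find(a)] = find(b)  # merge the two components

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical products: iso2 and iso3 each share an exon with iso1; ncRNA stands alone.
products = {
    "iso1": [("chr1", 100, 200), ("chr1", 500, 600)],
    "iso2": [("chr1", 150, 250)],
    "iso3": [("chr1", 550, 700)],
    "ncRNA": [("chr1", 1000, 1100)],
}
print(group_products(products))
```

Although iso2 and iso3 do not overlap each other, both share sequence with iso1, so all three fall into one group and the ncRNA forms a second group. The union of each group's segments would then play the role of the ‘union of genomic sequences’ in the definition. A fuller treatment might also take strand and reading frame into account.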
High-copy repeat sequences and segmental duplications
The HGP revealed that repeat sequences account for at least 50% of the human genome sequence. These repeats may be classified as:
- (i) transposon-derived repeats,
- (ii) partially retroposed copies of genes (referred to as processed pseudogenes),
- (iii) simple sequence repeats,
- (iv) blocks of tandemly repeated sequences at centromeres, telomeres and the short arms of acrocentric chromosomes, and
- (v) segmental duplications (SDs) or low-copy repeats.
SDs, which account for about 5% of the human genome, were most surprising in their number and wide distribution. SDs are extensive inter- and intrachromosomal duplications of genomic regions that contain genes as well as intergenic sequences (Lander et al., 2001; Venter et al., 2001). She et al. 2004 extended the initial analyses of these low-copy repeats and began to characterize the duplicational landscape of the human genome. SDs may be viewed as mutational hotspots because they are prone to aberrant recombination between highly homologous paralogous copies. Such recombination can give rise to large deletions or duplications of the intervening sequence, resulting in human genomic disorders (Shaw and Lupski, 2004). On the other hand, the rapid expansion and fixation of some intrachromosomal SDs during hominoid evolution may have contributed to the emergence of ‘new’ genes and transcripts embedded within them, thereby conferring some selective advantage (Jiang et al., 2007). SDs have also been shown to be frequent sites of copy number variation between individuals, thereby contributing considerably to genomic diversity among humans. See also Segmental Duplications and Genetic Disease, and Structural Diversity of the Human Genome and Disease Susceptibility
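The link between SD architecture and genomic disorders can be sketched in a few lines of Python. The example below first applies SD thresholds to an aligned pair of paralogous copies. The 1 kb minimum length follows this article's glossary definition, whereas the 90% identity cutoff is a widely used convention adopted here as an assumption. It then reports the interval expected to be deleted or duplicated if non-allelic homologous recombination (NAHR) occurs between two directly oriented copies on the same chromosome. All coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Copy:
    chrom: str
    start: int   # 0-based, half-open
    end: int
    strand: str  # "+" or "-"


def is_segmental_duplication(a: Copy, b: Copy, identity: float,
                             min_len: int = 1_000, min_identity: float = 0.90) -> bool:
    """Assumed SD criteria: both copies >= min_len bp and pairwise identity >= min_identity."""
    shortest = min(a.end - a.start, b.end - b.start)
    return shortest >= min_len and identity >= min_identity


def nahr_rearranged_interval(a: Copy, b: Copy) -> Optional[Tuple[int, int]]:
    """Approximate span lost (deletion) or gained (duplication) by NAHR between direct copies.

    Recombination between misaligned direct repeats removes or duplicates roughly
    one copy plus the intervening sequence, i.e. the span from the start of the
    first copy to the start of the second; the exact breakpoints fall somewhere
    within the paralogous copies. Inverted or interchromosomal pairs lead to other
    kinds of rearrangement and return None here.
    """
    if a.chrom != b.chrom or a.strand != b.strand:
        return None
    first, second = sorted((a, b), key=lambda c: c.start)
    return (first.start, second.start)


# Hypothetical pair of directly oriented paralogous copies on one chromosome.
lcr_a = Copy("chrA", 1_000_000, 1_050_000, "+")
lcr_b = Copy("chrA", 3_000_000, 3_050_000, "+")
print(is_segmental_duplication(lcr_a, lcr_b, identity=0.97))
print(nahr_rearranged_interval(lcr_a, lcr_b))
```

With these made-up copies, the pair qualifies as an SD. NAHR between them would delete or duplicate about 2 Mb, which corresponds to one 50 kb copy plus the intervening sequence.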
Detailed analysis of the distribution of repeats in the human genome was key to resolving the long-standing mystery of Alu sequence enrichment in GC (guanine–cytosine)-rich genomic regions. Strong positive selection appears to favour the retention of Alu sequences in GC-rich regions, suggesting that they may be in some way beneficial to their hosts (Lander et al., 2001).
Genetic diversity of the human genome
Initially, more than 1.4 million single nucleotide polymorphisms (SNPs) were identified in the human genome (Lander et al., 2001). These have been exploited by the Human Haplotype Map (HapMap) project, which aims to develop methods for the design and analysis of genome-wide association studies to map phenotypic variation in humans (International HapMap Consortium, 2005). In the meantime, a second-generation haplotype map based upon 3.1 million SNPs has been published (International HapMap Consortium, 2007). This map was obtained by genotyping 270 individuals from four geographically and ethnically diverse populations and includes approximately 25–35% of common SNP variation in the populations investigated. One novel finding was that 10–30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent common ancestry. Another was that up to 1% of all common variants could not be tagged by other SNPs, primarily because they are located within recombination hotspots (International HapMap Consortium, 2007). Importantly, population differentiation was greater at nonsynonymous SNPs than at synonymous SNPs. Analyses of the HapMap data have also indicated systematic differences in the strength or efficacy of natural selection between populations from different geographical areas. These differences involve genes linked to Lassa virus infection in West Africa, skin pigmentation in Europe and hair follicle development in Asia (Sabeti et al., 2007). See also Evolution of Skin Pigmentation Differences in Humans, and HapMap Project
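A SNP is said to be ‘tagged’ when another genotyped SNP is in strong linkage disequilibrium with it, so that genotyping the tag captures most of its information. The Python sketch below computes the pairwise r² measure of linkage disequilibrium from phased haplotypes and then selects tags greedily. The r² ≥ 0.8 threshold is a commonly used convention and the haplotypes are hypothetical, so the sketch illustrates the idea rather than reproducing the HapMap Consortium's own tagging procedure.

```python
from typing import Dict, List, Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Pairwise LD r^2 between two biallelic SNPs from phased haplotypes coded 0/1.

    r^2 = D^2 / (pA (1 - pA) pB (1 - pB)), where D = pAB - pA * pB.
    Returns 0.0 if either SNP is monomorphic in the sample.
    """
    n = len(a)
    p_a, p_b = sum(a) / n, sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


def greedy_tags(haps: Dict[str, Sequence[int]], threshold: float = 0.8) -> List[str]:
    """Repeatedly pick the untagged SNP that captures the most untagged SNPs."""
    untagged = set(haps)
    tags: List[str] = []
    while untagged:
        def n_captured(s: str) -> int:
            return sum(r_squared(haps[s], haps[t]) >= threshold for t in untagged)

        best = max(sorted(untagged), key=n_captured)  # sorted() makes ties deterministic
        tags.append(best)
        untagged -= {t for t in untagged if r_squared(haps[best], haps[t]) >= threshold}
        untagged.discard(best)  # also covers monomorphic SNPs, whose r^2 is 0
    return tags


# Hypothetical phased haplotypes for four SNPs across eight chromosomes.
haps = {
    "snp1": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to snp1
    "snp3": [1, 1, 0, 0, 1, 0, 1, 0],  # perfectly anti-correlated with snp1
    "snp4": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(greedy_tags(haps))
```

Because r² is symmetric in the sign of D, snp1 tags both snp2 and its mirror image snp3 (r² = 1). snp4 is only weakly associated with snp1 (r² = 0.25), so it needs its own tag, and the result is `['snp1', 'snp4']`.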
In addition to SNPs, copy number variants and polymorphic inversions have also been shown to contribute to human genomic diversity. This has been revealed by genome assembly comparisons, array comparative genomic hybridization (arrayCGH) and the mapping of large-insert clones (Khaja et al., 2006; Redon et al., 2006). As several examples already indicate, this type of genomic variation is likely to have a considerable impact on disease susceptibility in humans. See also Copy Number Variation in the Human Genome, Segmental Duplications and Genetic Disease, Segmental Duplications: A Source of Diversity, Evolution and Disease, and Structural Diversity of the Human Genome and Disease Susceptibility
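In arrayCGH, copy number changes appear as runs of neighbouring probes whose log2 test/reference intensity ratios depart from zero. The Python sketch below calls gains and losses by simple thresholding combined with a minimum run length. The ±0.3 thresholds, the three-probe minimum and the example ratios are illustrative assumptions. In practice, dedicated segmentation algorithms (circular binary segmentation is one example) are normally preferred over fixed cut-offs.

```python
from typing import List, Optional, Tuple


def call_cnvs(log2_ratios: List[float], gain: float = 0.3, loss: float = -0.3,
              min_probes: int = 3) -> List[Tuple[int, int, str]]:
    """Return (first_probe, end_probe_exclusive, "gain" | "loss") for runs of
    at least `min_probes` consecutive probes beyond the thresholds."""
    def state(x: float) -> str:
        if x >= gain:
            return "gain"
        if x <= loss:
            return "loss"
        return "neutral"

    states: List[Optional[str]] = [state(x) for x in log2_ratios] + [None]  # sentinel
    calls: List[Tuple[int, int, str]] = []
    run_start, run_state = 0, None
    for i, s in enumerate(states):
        if s != run_state:
            # The previous run has ended; keep it if it is a long enough aberration.
            if run_state in ("gain", "loss") and i - run_start >= min_probes:
                calls.append((run_start, i, run_state))
            run_start, run_state = i, s
    return calls


# Hypothetical log2 ratios for 13 probes ordered along a chromosome.
ratios = [0.0, 0.05, 0.6, 0.55, 0.7, 0.1, -0.8, -0.9, 0.0, -0.5, -0.6, -0.7, -0.4]
print(call_cnvs(ratios))
```

With the default settings, the example reports a three-probe gain (probes 2–4) and a four-probe loss (probes 9–12). The two-probe dip at probes 6–7 is discarded as too short to call.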
Reconstruction of ancestral mammalian/eutherian genomes
The sequence of the human genome has not only helped to improve our understanding of its structure and function and to explore the full range of human genotypic diversity. It has also provided the key to understanding the evolutionary history of the human species and of individual human populations. Several studies have demonstrated the importance of comparing the human genome sequence with those of other mammals for reconstructing the ancestral eutherian genome (e.g. Murphy et al., 2007). Together with other techniques such as comparative chromosome painting, these sequence comparisons have the potential to provide new insights into the evolutionary relationships between the different eutherian orders within the mammalian phylogenetic tree. See also Comparing the Human and Canine Genomes, Comparing the Human and Chimpanzee Genomes, The Mouse Genome as a Rodent Model in Evolutionary Studies, The Rat Genome as a Rodent Model in Evolutionary Studies, and The Sequencing of the Rhesus Macaque Genome and its Comparison with the Genome Sequences of Human and Chimpanzee
The analysis of the human genome sequence has had a major impact on biomedical research over the last few years. The HGP has made possible a multitude of genome-scale analyses and has thus provided a wealth of information about the structure of the human genome. In many ways, it has paved the way for what is coming to be called individualized genomic medicine. The overwhelming success of the HGP has also driven the development of new technologies for cheaper and more accurate genome sequencing and assembly. The recent sequencing of an individual human's entire diploid genome and its comparison with the human reference sequence (Levy et al., 2007) has yielded new insights into the extent of genetic variation. It marks the start of a new era of research into the basis of human genetic individuality.
Glossary
- Antisense transcript
Antisense transcripts control gene expression via posttranscriptional gene silencing by annealing to the complementary sequence of the sense transcript.
- Contig
An ordered arrangement of overlapping cloned fragments that together contain the sequence of the originally contiguous DNA strand.
- Enhancer
A short region of DNA that can bind a transcriptional activator protein, thereby stimulating the transcription of a gene that may be distant from the enhancer, possibly even on a different chromosome.
Portion of the genome which contains the euchromatin, a form of chromatin that is rich in actively transcribed genes.
- Segmental duplication
Genomic duplication of a DNA segment longer than 1 kb.
- Ultraconserved element
DNA sequences which have remained unchanged over an extended period of evolutionary time (indicating that they are biologically important) but whose functions remain largely unknown.
References
- Bejerano et al. (2004) Ultraconserved elements in the human genome. Science 304: 1321–1325.
- (2005) Human antisense genes have unusually short introns: evidence for selection for rapid transcription. Trends in Genetics 21: 203–207.
- Cheng et al. (2005) Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308: 1149–1154.
- (2003) The Human Genome Project: lessons from large-scale biology. Science 300: 286–290.
- Deloukas et al. (2004) The DNA sequence and comparative analysis of human chromosome 10. Nature 429: 375–381.
- Deloukas et al. (2001) The DNA sequence and comparative analysis of human chromosome 20. Nature 414: 865–871.
- Denoeud et al. (2007) Prominent use of distal 5′ transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Research 17: 746–759.
- Dunham et al. (2004) The DNA sequence and analysis of human chromosome 13. Nature 428: 522–528.
- Dunham et al. (1999) The DNA sequence of human chromosome 22. Nature 402: 489–495.
- ENCODE Project Consortium (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447: 799–816.
- Gerstein et al. (2007) What is a gene, post-ENCODE? History and updated definition. Genome Research 17: 669–681.
- Gierman et al. (2007) Domain-wide regulation of gene expression in the human genome. Genome Research 17: 1286–1295.
- (2007) Origin of phenotypes: genes and transcripts. Genome Research 17: 682–690.
- Gregory et al. (2006) The DNA sequence and biological annotation of human chromosome 1. Nature 441: 315–321.
- Grimwood et al. (2004) The DNA sequence and biology of human chromosome 19. Nature 428: 529–535.
- Hattori et al.; Chromosome 21 Mapping and Sequencing Consortium (2000) The DNA sequence of human chromosome 21. Nature 405: 311–319.
- Heilig et al. (2003) The DNA sequence and analysis of human chromosome 14. Nature 421: 601–607.
- Hillier et al. (2003) The DNA sequence of human chromosome 7. Nature 424: 157–164.
- Hillier et al. (2005) Generation and annotation of the DNA sequences of human chromosomes 2 and 4. Nature 434: 724–731.
- Humphray et al. (2004) DNA sequence and analysis of human chromosome 9. Nature 429: 369–374.
- International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437: 1299–1320.
- International HapMap Consortium (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851–861.
- International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431: 931–945.
- Jiang et al. (2007) Ancestral reconstruction of segmental duplications reveals punctuated cores of human genome evolution. Nature Genetics 39: 1361–1368.
- Kapranov et al. (2007b) RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316: 1484–1488.
- Kapranov et al. (2007a) Genome-wide transcription and the implications for genomic organization. Nature Reviews Genetics 8: 413–423.
- Katzman et al. (2007) Human genome ultraconserved elements are ultraselected. Science 317: 915.
- Khaja et al. (2006) Genome assembly comparison identifies structural variants in the human genome. Nature Genetics 38: 1413–1418.
- Lander et al.; International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409: 860–921.
- Levy et al. (2007) The diploid genome sequence of an individual human. PLoS Biology 5: e254.
- (2005) Structure and function of the human genome. Genome Research 15: 1759–1766.
- Martin et al. (2004) The sequence and analysis of duplication-rich human chromosome 16. Nature 432: 988–994.
- McPherson et al.; International Human Genome Mapping Consortium (2001) A physical map of the human genome. Nature 409: 934–941.
- Mungall et al. (2003) The DNA sequence and analysis of human chromosome 6. Nature 425: 805–811.
- Murphy et al. (2007) Using genomic data to unravel the root of the placental mammal phylogeny. Genome Research 17: 413–421.
- Muzny et al. (2006) The DNA sequence, annotation and analysis of human chromosome 3. Nature 440: 1194–1198.
- Nusbaum et al. (2006) DNA sequence and analysis of human chromosome 8. Nature 439: 331–335.
- Nusbaum et al. (2005) DNA sequence and analysis of human chromosome 18. Nature 437: 551–555.
- Ovcharenko et al. (2005) Evolution and functional classification of vertebrate gene deserts. Genome Research 15: 137–145.
- Pennacchio et al. (2006) In vivo enhancer analysis of human conserved non-coding sequences. Nature 444: 499–502.
- Redon et al. (2006) Global variation in copy number in the human genome. Nature 444: 444–454.
- Ross et al. (2005) The DNA sequence of the human X chromosome. Nature 434: 325–337.
- Sabeti et al. (2007) Genome-wide detection and characterization of positive selection in human populations. Nature 449: 913–918.
- Scherer et al.; Baylor College of Medicine Human Genome Sequencing Center Sequence Production Team (2006) The finished DNA sequence of human chromosome 12. Nature 440: 346–351.
- Scherer et al. (2003) Human chromosome 7: DNA sequence and biology. Science 300: 767–772.
- Schmutz et al. (2004a) The DNA sequence and comparative analysis of human chromosome 5. Nature 431: 268–274.
- Schmutz et al. (2004b) Quality assessment of the human genome sequence. Nature 429: 365–368.
- Sémon and Duret (2006) Evolutionary origin and maintenance of coexpressed gene clusters in mammals. Molecular Biology and Evolution 23: 1715–1723.
- Shaw and Lupski (2004) Implications of human genome architecture for rearrangement-based disorders: the genomic basis of disease. Human Molecular Genetics 13: R57–R64.
- She et al. (2004) Shotgun sequence assembly and recent segmental duplications within the human genome. Nature 431: 927–930.
- Skaletsky et al. (2003) The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423: 825–837.
- Taylor et al. (2006) Human chromosome 11 DNA sequence and analysis including novel gene identification. Nature 440: 497–500.
- Thomas et al.; ENCODE Project Consortium (2007) The ENCODE Project at UC Santa Cruz. Nucleic Acids Research 35: D663–D667.
- (2007) The implications of alternative splicing in the ENCODE protein complement. Proceedings of the National Academy of Sciences of the USA 104: 5495–5500.
- Venter et al. (2001) The sequence of the human genome. Science 291: 1304–1351.
- Yamashita et al. (2004) Genome-wide transcriptome mapping analysis identifies organ-specific gene expression patterns along human chromosomes. Genomics 84: 867–875.
- Zheng et al. (2007) Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Research 17: 839–851.
- Zody et al. (2006b) DNA sequence of human chromosome 17 and analysis of rearrangement in the human lineage. Nature 440: 1045–1049.
- Zody et al. (2006a) Analysis of the DNA sequence and duplication history of human chromosome 15. Nature 440: 671–675.
Further Reading
- (2007) Functional constraint and small insertions and deletions in the ENCODE regions of the human genome. Genome Biology 8: R180.
- (2006) EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biology 7(suppl. 1): S2.1–S31.
- (2005) The Y chromosome as a target for acquired and amplified genetic material in evolution. BioEssays 27: 1256–1262.
- (2004) Integrative annotation of 21 037 human genes validated by full-length cDNA clones. PLoS Biology 2: e162.
- (2004) Whole-genome shotgun assembly and comparison of human genome assemblies. Proceedings of the National Academy of Sciences of the USA 101: 1916–1921.
- (2004) Closing the gaps on human chromosome 19 revealed genes with a high density of repetitive tandemly arrayed elements. Genome Research 14: 239–246.
- Ludwig-FAPESP Transcript Finishing Initiative (2004) A transcript finishing initiative for closing gaps in the human transcriptome. Genome Research 14: 1413–1423.
- (2006) Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56 419 completely sequenced and manually annotated full-length cDNAs. Nucleic Acids Research 34: 3917–3928.
- (2007) Structured RNAs in the ENCODE selected regions of the human genome. Genome Research 17: 852–864.
- (2007) The human promoter methylome. Nature Genetics 39: 442–443.