Keywords:

  • nomenclature;
  • variation;
  • allele;
  • evolution;
  • bioinformatics;
  • haplotype

Abstract

The classical view of what constitutes an “allele” has been challenged by recent findings of extensive human genetic variability: we can expect, on average, one variant site every 100–250 bases of our haploid genome. The haplotype is defined as “the patterns of co-occurrence of variant sites on the same chromosome” (and therefore within each particular gene). Sufficient evidence exists for the divergence of haplotypes during evolution of Homo sapiens sapiens, and the total number of haplotypes per gene will reflect the amount of time any particular ethnic group has existed on the planet, e.g., greatest in Africans, fewer in East Asians, and still fewer in Caucasians. If the average gene spans 30 kb, we can expect ∼170 polymorphic variant sites per gene in the world population. We do not see 2^170 haplotypes, however; we might find only 10 to 200 haplotypes (depending on the gene's size and degree of conservation of the gene product). This finite number allows for a reasonable haplotype nomenclature system for each gene, based on evolutionary divergence. For polymorphic variants (i.e., frequency ≥ 0.01), I propose using Arabic numerals for the major clades (e.g., *1, *2, … *20, *21), capital letters for sublineages (e.g., *2A, *2B, *2C), and Arabic numerals for sub-sublineages (e.g., *22G12, *22G13); additional subcategories may be added, in an alternating number/letter/number/letter sequence, depending on the complexity of present-day haplotypes of a particular gene. Web sites with a web master and external advisory committee should be set up for each gene superfamily, family, or individual gene (depending on complexity), and an international haplotype nomenclature committee, perhaps composed of several dozen of these web masters, should oversee haplotype nomenclature for the entire human genome. Their higher heterozygosity and multiallelic nature make haplotypes more informative than biallelic SNPs.
Ultimately, our knowledge of haplotype patterns, rather than single variant sites, of perhaps several hundred genes will likely be helpful in finding associations between genotype and any multiplex phenotype (e.g., complex diseases including cancer, and/or toxicity of pharmaceutical agents or environmental pollutants). Hum Mutat 20:463–472, 2002. © 2002 Wiley-Liss, Inc.
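
As an illustration of the alternating number/letter naming scheme proposed in the abstract, the following minimal Python sketch splits a label such as *22G12 into its hierarchy levels and lists its parent lineages. The function names parse_haplotype and lineage are hypothetical and do not come from the article, and allowing more than one capital letter per sublineage level is an assumption made here for illustration only.

```python
import re

# Assumed label shape: "*", a clade number, then alternating capital-letter
# and number levels (e.g. *2, *2A, *22G12). Multi-letter levels are an
# assumption of this sketch, not something the abstract specifies.
_LABEL_PATTERN = r"\*\d+(?:[A-Z]+\d+)*(?:[A-Z]+)?"


def parse_haplotype(label):
    """Split a label into its levels, numbers as int and letters as str."""
    if not re.fullmatch(_LABEL_PATTERN, label):
        raise ValueError(f"not a valid haplotype label: {label!r}")
    tokens = re.findall(r"\d+|[A-Z]+", label[1:])
    return [int(t) if t.isdigit() else t for t in tokens]


def lineage(label):
    """List the names from the major clade down to the label itself."""
    names = []
    current = "*"
    for level in parse_haplotype(label):
        current += str(level)
        names.append(current)
    return names


if __name__ == "__main__":
    print(parse_haplotype("*22G12"))
    print(lineage("*22G12"))
```

Under these assumptions, *22G12 reads as major clade 22, sublineage G, sub-sublineage 12, so its lineage is *22, then *22G, then *22G12.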