To the editor:

We report a series of patients in whom diagnosis of the 619 bp-deletion Indian β°-thalassemia mutation was confounded by a systematic annotation error in the public domain. The index patient is a healthy 27-year-old Asian Indian woman with known β-thalassemia trait who underwent molecular genetic testing for preconception planning. Complete β-globin gene sequencing and gap-PCR for the 619 bp-deletion were initially negative, which prompted further investigation. Redesigning the 3′ gap-PCR primer to a more distal location in the β-globin gene uncovered the 619 bp-deletion. Sequencing across the deletion revealed the breakpoints to be shifted 132 bp 3′ of the breakpoints reported in dbSNP (rs63751625) and the globin server, the databases that had been used to design the original 3′ gap-PCR primer (online Supporting Information Fig. 1). Sequencing of five additional unrelated patients with the 619 bp-deletion, together with careful literature review [1, 2], provides strong evidence that the breakpoints given in the dbSNP and globin server annotations are incorrect. The correct annotation for the 619 bp-deletion in HGVS nomenclature, using reference sequence NM_000518.4, is c.316-149_*342delinsAAGTAGA (deletion of nucleotides 1,197–1,816 relative to the start of transcription). This report highlights the need for caution in the use of public databases during assay development, and the importance of alternative verification methods.
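The HGVS description packs several coordinates into one string. c.316-149 denotes an intronic nucleotide 149 nt 5′ of coding position c.316, *342 denotes the 342nd nucleotide 3′ of the translation stop codon, and delinsAAGTAGA records the heptanucleotide that replaces the deleted span. The Python sketch below splits such a string into its parts. The class and function names are hypothetical, and the sketch handles only this simple delins form rather than the full HGVS grammar; a dedicated checker such as Mutalyzer [9] is the appropriate tool for validating descriptions.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# One c. coordinate: optional '*' (3' of the stop codon) or '-' (5' of the ATG),
# an anchor number, and an optional intronic offset such as "-149" or "+1".
POS_RE = re.compile(r"^(?P<region>[*-]?)(?P<base>\d+)(?P<offset>[+-]\d+)?$")
# Only the simple "c.START_ENDdelinsSEQ" form is handled in this sketch.
DELINS_RE = re.compile(r"^c\.(?P<start>[^_]+)_(?P<end>[^d]+)delins(?P<ins>[ACGT]+)$")

REGIONS = {"": "coding", "*": "after_stop", "-": "before_start"}


@dataclass(frozen=True)
class CPos:
    region: str   # part of the c. numbering the anchor belongs to
    base: int     # anchor position
    offset: int   # intronic offset from the anchor (0 = the anchor itself)


def parse_cpos(text: str) -> CPos:
    """Parse one c. position such as '316-149' or '*342' (hypothetical helper)."""
    m = POS_RE.match(text)
    if m is None:
        raise ValueError(f"unsupported c. position: {text!r}")
    offset = int(m["offset"]) if m["offset"] else 0
    return CPos(REGIONS[m["region"]], int(m["base"]), offset)


def parse_delins(hgvs: str) -> Tuple[CPos, CPos, str]:
    """Split 'c.START_ENDdelinsSEQ' into start, end and inserted sequence."""
    m = DELINS_RE.match(hgvs)
    if m is None:
        raise ValueError(f"not a simple c. delins: {hgvs!r}")
    return parse_cpos(m["start"]), parse_cpos(m["end"]), m["ins"]


start, end, ins = parse_delins("c.316-149_*342delinsAAGTAGA")
print(start)          # CPos(region='coding', base=316, offset=-149)
print(end)            # CPos(region='after_stop', base=342, offset=0)
print(ins, len(ins))  # AAGTAGA 7
```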

The 619 bp-deletion, which abolishes the 3′ region of the β-globin gene, is one of the most common β°-thalassemia mutations, comprising >50% of β-thalassemia in some Indian subpopulations [1]. This deletion was first described in 1979 by restriction endonuclease mapping and subsequently characterized by DNA sequencing as a 619 bp-deletion with a heptanucleotide insertion (AAGTAGA) between the breakpoints [3–5].

Molecular genetics laboratories that test for β-thalassemia commonly detect the 619 bp-deletion with a gap-PCR strategy, which relies on knowledge of the deletion breakpoints [1, 2]. We present evidence that an error regarding the breakpoints of the 619 bp-deletion was propagated in multiple public databases, and that this oversight has important implications for manufacturers and laboratories developing or performing molecular β-thalassemia testing (see online Supporting Information for detailed results and methods). The error appears to have originated in Huisman's 1997 textbook “A Syllabus of Thalassemia Mutations,” which was used to annotate the globin server and dbSNP [6]. The annotation error caused considerable confusion at our two independent clinical laboratories during development of β-globin molecular assays [Mikula et al., In Preparation]. Importantly, reliance on this incorrect public information also complicated the interpretation of patient results.
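A gap-PCR assay yields a deletion-specific product only when both primers anneal outside the deleted span. A breakpoint annotation that is shifted can therefore place a primer inside the real deletion. The Python sketch below works on hypothetical coordinates, not the real β-globin sequence. It predicts the deletion-allele product size and returns None when a primer site is lost. The function name, the 20-nt primer length, and all positions are illustrative assumptions. The sketch illustrates how a 132-bp shift could produce a false-negative result; it is not a reconstruction of the primers actually used.

```python
from typing import Optional


def deletion_allele_amplicon(fwd_start: int, rev_end: int,
                             del_start: int, del_end: int,
                             ins_len: int = 0,
                             primer_len: int = 20) -> Optional[int]:
    """Predict the gap-PCR product size (bp) from a deletion allele.

    Coordinates are 1-based and inclusive on a hypothetical reference.
    The forward primer occupies fwd_start .. fwd_start + primer_len - 1 and
    the reverse primer's binding site ends at rev_end. Returns None when
    either primer site overlaps the deleted span, because that primer then
    has no template on the deletion allele.
    """
    fwd_site = (fwd_start, fwd_start + primer_len - 1)
    rev_site = (rev_end - primer_len + 1, rev_end)
    for lo, hi in (fwd_site, rev_site):
        if lo <= del_end and hi >= del_start:  # primer site falls in the deletion
            return None
    span = rev_end - fwd_start + 1  # wild-type product size
    if fwd_site[1] < del_start and rev_site[0] > del_end:
        # deleted bases are removed, inserted bases (e.g. AAGTAGA) are added
        return span - (del_end - del_start + 1) + ins_len
    return span  # deletion lies outside the primers


TRUE_DEL = (1001, 1619)      # hypothetical 619-bp deletion
ANNOTATED_DEL = (869, 1487)  # same deletion annotated 132 bp too far 5'

# Reverse primer at 1541-1560 looks fine against the annotated breakpoints...
print(deletion_allele_amplicon(800, 1560, *ANNOTATED_DEL, ins_len=7))  # 149
# ...but sits inside the true deletion, so no deletion product forms.
print(deletion_allele_amplicon(800, 1560, *TRUE_DEL, ins_len=7))       # None
# A more distal reverse primer (1781-1800) recovers the deletion product.
print(deletion_allele_amplicon(800, 1800, *TRUE_DEL, ins_len=7))       # 389
```

The 3′ primer redesign described in the report corresponds to the last call: moving the primer beyond the true breakpoint restores a product whose size reflects the 619-bp loss and 7-bp gain.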

High annotation error rates have been described for public databases including dbSNP and the globin server (HbVar database) [7–9]. A 2008 study found and corrected mostly minor annotation errors in 13% of the entries in HbVar, with 49% of the entries not evaluated [9]. Error rates in dbSNP have been estimated to be as high as 15–17% [7].

Genetic tests should ideally be validated with patient samples covering the complete spectrum of possible mutations. However, this is impractical when the mutational spectrum is broad, the assay is complex and expensive, the test volume is low, and positive patient samples are difficult to obtain. Efforts to create standardized molecular reference materials are underway for common tests, but are unlikely to address lower-volume genetic tests in the near future [10]. As next-generation sequencing and other high-throughput technologies move into the clinical laboratory, it will become even more difficult to validate the full range of mutations that one might encounter. Genetics laboratories will therefore increasingly need to rely on information in the public domain to design and interpret new tests. Our report cautions against relying on in silico validation alone during assay development, and highlights the value of alternative verification methods such as preparation of synthetic DNA controls, PCR and sequencing with alternate primer sets, and interlaboratory exchange of challenge samples.
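Sequencing across the junction with independent primers is one of the checks listed above. The Python sketch below derives the junction sequence that an annotation predicts and tests whether that sequence appears in an observed read. A 40-nt toy reference, a 20-nt deletion, and a 5-nt shift stand in for the β-globin sequence, the 619-bp deletion, and the 132-bp annotation error. All function names are hypothetical. The same `apply_delins` helper could also serve to draft the sequence of a synthetic control template.

```python
def apply_delins(ref: str, start: int, end: int, ins: str) -> str:
    """Replace ref[start..end] (1-based, inclusive) with ins."""
    if not 1 <= start <= end <= len(ref):
        raise ValueError("coordinates outside reference")
    return ref[:start - 1] + ins + ref[end:]


def expected_junction(ref: str, start: int, end: int, ins: str,
                      flank: int = 4) -> str:
    """Sequence spanning the breakpoints: `flank` bases either side of ins."""
    left = ref[max(0, start - 1 - flank):start - 1]
    right = ref[end:end + flank]
    return left + ins + right


def annotation_supported(ref: str, start: int, end: int, ins: str,
                         read: str, flank: int = 4) -> bool:
    """True if the junction predicted by an annotation occurs in the read."""
    return expected_junction(ref, start, end, ins, flank) in read


# Toy 40-nt reference (NOT the beta-globin sequence); it contains no 'A'.
REF = "TCGTCCGTTG" "CTGCGGTCTT" "GCCTGTTCGG" "TGCTTCCGGT"
INS = "AAGTAGA"

# Stand-in for a sequencing read from the true allele: positions 11-30 -> INS.
read = apply_delins(REF, 11, 30, INS)

print(annotation_supported(REF, 11, 30, INS, read))  # True: correct breakpoints
print(annotation_supported(REF, 6, 25, INS, read))   # False: shifted 5 nt 5'
```

Because the toy reference contains no A, the inserted AAGTAGA is unique in the read, which makes the shifted annotation's failure easy to see. In real genomic sequence, longer flanks would be needed to make a junction probe specific.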


We would like to acknowledge Drs. Deborah Barden and Karen Stephens, who helped with sequence data analysis and reviewed the manuscript. We thank Cathi Rubin Franklin and Monica Gallivan for providing patient samples with the 619 bp-deletion.


References

1. Varawalla NY, Old JM, Sarkar R, et al. The spectrum of beta-thalassaemia mutations on the Indian subcontinent: the basis for prenatal diagnosis. Br J Haematol 1991; 78: 242–247.
2. Wang W, Kham SK, Yeo GH, et al. Multiplex minisequencing screen for common Southeast Asian and Indian beta-thalassemia mutations. Clin Chem 2003; 49: 209–218.
3. Orkin SH, Kolodner R, Michelson A, Husson R. Cloning and direct examination of a structurally abnormal human beta 0-thalassemia globin gene. Proc Natl Acad Sci USA 1980; 77: 3558–3562.
4. Orkin SH, Old JM, Weatherall DJ, Nathan DG. Partial deletion of beta-globin gene DNA in certain patients with beta 0-thalassemia. Proc Natl Acad Sci USA 1979; 76: 2400–2404.
5. Spritz RA, Orkin SH. Duplication followed by deletion accounts for the structure of an Indian deletion beta (0)-thalassemia gene. Nucleic Acids Res 1982; 10: 8025–8029.
6. Huisman T, Carver MFH, Baysal E. A syllabus of thalassemia mutations. The Sickle Cell Anemia Foundation; Augusta, GA, 1997.
7. Mitchell AA, Zwick ME, Chakravarti A, Cutler DJ. Discrepancies in dbSNP confirmation rates and allele frequency distributions from varying genotyping error rates and patterns. Bioinformatics 2004; 20: 1022–1032.
8. Schnoes AM, Brown SD, Dodevski I, Babbitt PC. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput Biol 2009; 5: e1000605.
9. Wildeman M, van Ophuizen E, den Dunnen JT, Taschner PE. Improving sequence variant descriptions in mutation databases and literature using the Mutalyzer sequence variation nomenclature checker. Hum Mutat 2008; 29: 6–13.
10. Mann K. How Can NIST Better Serve the Needs of the Biomedical Research Community in the 21st Century? Available at: 2010. Accessed 7/10/2010.

Colin C. Pritchard*, Jonathan F. Tait*, Arlene M. Buller-Burckle†, Mario Mikula†
* Department of Laboratory Medicine, University of Washington, Seattle, Washington
† Division of Molecular Genetics, Quest Diagnostics Nichols Institute, San Juan Capistrano, California

Supporting Information


Additional Supporting Information may be found in the online version of this article.

AJH_21875_sm_suppinfo.doc (345K): Supporting Information

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.