Partially funded by grant number P50HG004487-05 from the National Human Genome Research Institute (NHGRI).
Characterization of the Standard and Recommended CODIS Markers†
Article first published online: 24 AUG 2012
© 2012 American Academy of Forensic Sciences
Journal of Forensic Sciences
Special Issue: The American Academy of Forensic Sciences and Wiley-Blackwell have published this supplement without financial support
Volume 58, Issue Supplement s1, pages S169–S172, January 2013
How to Cite
Katsanis, S. H. and Wagner, J. K. (2013), Characterization of the Standard and Recommended CODIS Markers. Journal of Forensic Sciences, 58: S169–S172. doi: 10.1111/j.1556-4029.2012.02253.x
- Issue published online: 10 JAN 2013
- Manuscript Accepted: 30 JUN 2012
- Manuscript Revised: 18 JUN 2012
- Manuscript Received: 14 MAR 2012
- forensic science;
- genetic identity;
- DNA typing;
- forensic genetics;
- Combined DNA Index System;
- short tandem repeats
As U.S. courts grapple with constitutional challenges to DNA identification applications, judges are resting legal decisions on the fingerprint analogy, questioning whether the information from a DNA profile could, in light of scientific advances, reveal biomedically relevant information. While CODIS loci were selected largely because they lack phenotypic associations, how this criterion was assessed is unclear. To clarify their phenotypic relevance, we describe the standard and recommended CODIS markers within the context of what is known currently about the genome. We characterize the genomic regions and phenotypic associations of the 24 standard and suggested CODIS markers. None of the markers are within exons, although 12 are intragenic. No CODIS genotypes are associated with known phenotypes. This study provides clarification of the genomic significance of the key identification markers and supports—independent of the forensic scientific community—that the CODIS profiles provide identification but not sensitive or biomedically relevant information.
The culmination of the 1996–1997 STR Project was the selection of the 13 core CODIS markers, all highly polymorphic tetra-nucleotide short tandem repeats (STRs). In 2010, the U.S. Federal Bureau of Investigation revisited the panel composition, creating the CODIS Core Loci Working Group to consider expanding the core CODIS marker panel to minimize the likelihood of adventitious matches, improve international compatibility for data sharing, and improve the discriminatory power for missing persons cases and familial searching. This culminated in the proposal of an additional 11 STRs to be used alternatively in various identification contexts. Several criteria were considered in selecting markers for an expanded panel, the first of which was that they have “[n]o known association with medical conditions or defects” [1, p. e52; 2, p. 1]. The primary rationale for developing panels that have no association with biomedically relevant phenotypes is clear, as the statutory authority for CODIS itself (the DNA Identification Act of 1994; DNA Analysis Backlog Elimination Act of 2000; Justice for All Act of 2004; and the DNA Fingerprinting Act of 2005) is restricted to identification purposes. The Department of Justice has reiterated that CODIS profiles are to be “sanitized ‘genetic fingerprints’ that can be used to identify an individual uniquely, but do not disclose an individual's traits, disorders, or dispositions”. Thus, the rationale behind the criterion requires little explanation by the Working Group. How the Working Group applied the criterion in selecting and ranking the markers, on the other hand, is unclear, and the literature offers little information on whether (and to what extent) any of these markers are causally related to phenotypes [1, 2].
Moreover, a quick review of the literature on linkage analyses and genome-wide association studies (GWAS) may yield deceptive and exaggerated reports of linkage with some of these markers. The number of reports may simply be a relic of the inclusion of these convenient markers in commonly used linkage screening panel sets, such as the Marshfield linkage maps [8, 9]; thus, the results may not be indicative of any actual causal relationship or biological function.
Motivated by recent court opinions in which judges (frequently referencing popular science articles) call the phenotypic irrelevance of the CODIS profile into question, we seek to clarify the role of phenotype in the selection criteria of markers. A myriad of important criticisms of forensic DNA analysis, including the legal considerations and implications of molecular photofitting and phenotyping, the history and substance of criticisms aimed at forensic identification using blood grouping, HLA testing, and more recent methods, the suitability of the fingerprint analogy (which has considerable legal importance), and the appropriateness of selected CODIS markers, are each worthy of discussion. The focus of this technical note, however, is to address only the feasibility of creating a DNA database restricted to identity information. As the forensic community grapples with the technical and statistical benefits of the original 13 CODIS loci and the additional 11 loci for prioritization and selection, we examine here the role of the markers and their regions as elements of the human genome. The selection of markers for identification is an ongoing process that varies by population and regulatory control; as such, some markers (e.g., D6S1043, D14S1434) relevant in select sub-populations are not reviewed herein.
We used the UCSC Genome Browser (build GRCh37/hg19) to analyze each STR region. We conducted BLAT searches of primers from each STR to locate the precise region of the repeat. We collected data on (i) phenotype and disease associations (GAD view-pack; DECIPHER-full; Online Mendelian Inheritance in Man [OMIM] AV SNPs-full; OMIM genes-full; OMIM pheno loci-full; GWAS catalog-full; RGD human quantitative trait loci [QTL]-full); (ii) genes and gene prediction tracks (UCSC genes-pack; RefSeq-dense); (iii) mRNA and EST tracks (human mRNAs-pack; spliced ESTs-pack); (iv) variation and repeats (common SNPs(132)-pack; simple repeats-pack; microsatellites-full); and (v) regulation (ENCODE Regulation-show; ENC RNA Binding-show; ORegAnno-full; Vista Enhancers-full). We used Ensembl to determine intronic regions and note any reported phenotypic associations for STR genotypes. Disorders associated or linked within 1 kb of the STR were noted; chromosomal anomalies were not noted. To examine the potential relevance of the markers as noncoding genomic elements, we examined sequences for predicted enhancers and noted RNA-binding protein sites as well as predicted DNase I hypersensitivity and transcription factor binding sites. SNPs documented in dbSNP (build 132) overlapping and linked within 1 kb of the STR were noted. SNPs from the 1000 Genomes Project were searched in SNPedia. If the STR was within a gene locus, we noted the gene name and examined the positioning with regard to the surrounding exons. Extragenic STR regions were examined to document proximity to the nearest transcript. We subsequently searched relevant genes in the Database of Genotypes and Phenotypes (dbGaP), OMIM, and GeneTests to confirm genetic associations and document the availability of a genetic test for any related gene. We also examined the Marshfield linkage maps to determine which markers are used in the human genetic screening panels. All searches were conducted in October 2011.
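The positional checks described above (whether an STR falls in an exon or an intron, and which annotated features lie within 1 kb of it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the authors ran in the UCSC Genome Browser: the `Interval` class, the helper functions, the chromosome name `chrT`, and every coordinate below are hypothetical toy values, not real hg19 positions.

```python
from dataclasses import dataclass
from typing import List, Optional

WINDOW_BP = 1_000  # the 1 kb window described in the text


@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int
    name: str = ""


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def gap_bp(a: Interval, b: Interval) -> Optional[int]:
    """Bases separating a and b; 0 if they overlap or touch, None if on different chromosomes."""
    if a.chrom != b.chrom:
        return None
    if overlaps(a, b):
        return 0
    return max(a.start - b.end, b.start - a.end)


def within_window(locus: Interval, features: List[Interval],
                  window: int = WINDOW_BP) -> List[Interval]:
    """Features overlapping the locus or lying within `window` bp of it."""
    hits = []
    for feature in features:
        gap = gap_bp(locus, feature)
        if gap is not None and gap <= window:
            hits.append(feature)
    return hits


def classify(locus: Interval, gene: Interval, exons: List[Interval]) -> str:
    """Label a locus as extragenic, exonic, or intronic relative to one gene."""
    if not overlaps(locus, gene):
        return "extragenic"
    if any(overlaps(locus, exon) for exon in exons):
        return "exonic"
    return "intronic"


# Hypothetical toy annotation (NOT real coordinates for any CODIS marker).
str_locus = Interval("chrT", 5000, 5040, "toy_STR")
gene = Interval("chrT", 1000, 9000, "toy_gene")
exons = [Interval("chrT", 1000, 1200), Interval("chrT", 4500, 4700),
         Interval("chrT", 8000, 9000)]
snps = [Interval("chrT", pos, pos + 1, name)
        for name, pos in [("snp_a", 3900), ("snp_b", 4200), ("snp_c", 5020),
                          ("snp_d", 6039), ("snp_e", 6041)]]

location = classify(str_locus, gene, exons)
nearby_snps = within_window(str_locus, snps)
nearest_exon_gap = min(gap_bp(str_locus, exon) for exon in exons)

print(location, [s.name for s in nearby_snps], nearest_exon_gap)
```

Half-open intervals are used here because they match the BED convention common to genome-browser exports; with a different coordinate convention the boundary comparisons would need adjusting.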
Results and Discussion
Individual genotypes of the 24 STRs were not found to be associated with any documented phenotypes (note exception: DYS391 is on the Y chromosome, so its presence in a DNA profile may indicate male sex). None of the 24 STRs are located within protein-coding exons (see Table 1) (see also Ref. ). Two of the STRs (VWA and D12S391) are colocated on the same arm of chromosome 12 (12p13) within 6 Mb [21, 22]. Twelve are located within introns of genes, and six of those genes have known phenotypic associations (see Table 2). Mutations in the six genes are well documented as causative of the corresponding syndromes, but no mutations have been found to be in linkage disequilibrium with any tetra-nucleotide repeat genotypes. Of the intronic STRs, two (FGA and VWA) were within 400 bp of a splice site. All STR loci were associated (within 1 kb) with at least one phenotype according to published GWAS or quantitative trait loci (QTL) studies. TH01 was associated with the most phenotypes (18 traits), ranging from alcoholism and schizophrenia to autosomal recessive spinocerebellar ataxia, while DYS391 is believed to be associated only with hairy ears. Such genome-wide studies often span large regions of the genome; our findings demonstrate that CODIS STR loci are located within such regions and hence are potentially linked to such traits. However, association with these traits does not necessarily imply that individual CODIS marker genotypes are predictive or causative of any particular trait. As expected, all regions were sprinkled with documented SNPs (see Table 1), with the region of TPOX having the most (33 SNPs) and the region of FGA having the fewest (four SNPs). Four SNPs (rs3829986 and rs41338945 near CSF1PO, rs34120165 near VWA, and rs28359647 near D1S1656) were among those commonly queried for the 1000 Genomes Project; none of these are annotated in SNPedia. None of the STRs overlapped predicted enhancers.
Ten of the STRs (CSF1PO, FGA, TH01, TPOX, VWA, D7S820, D18S51, D19S433, D1S1656, and Penta D) lay within predicted RNA-binding protein sites. Two STRs (D19S433 and D10S1248) lay within DNase I hypersensitivity sites, and one (CSF1PO) lay within a transcription factor binding site. The role of tetra-nucleotide repeats in RNA binding and DNase I hypersensitivity is unknown, although expanded tetra-nucleotide repeats may destabilize transcription factor binding sites. At this time, no correlation has been made between STR repeat sizes in humans and the impact on transcription factor binding. The Marshfield human genetic linkage maps include 14 of the 24 markers, with nine still in use and five identified as “cryptic duplicate markers” and removed from subsequent panels.
|#||CODIS Marker||Cytogenetic Location||Intragenic or Distance from Nearest Gene||Included in Marshfield Human Genetic Linkage Maps||Number of SNPs (dbSNP Build 132) Within 1 kb|
|3||D21S11||21q21.1||>100 kb from nearest gene||Removed||7|
|4||D8S1179||8q24.13||>50 kb from nearest gene||Included||14|
|6||D13S317||13q31.1||>100 kb from nearest gene||Included||10|
|7||D16S539||16q24.1||~10 kb from nearest gene||Removed||29|
|11||D5S818||5q23.2||>100 kb from nearest gene||Removed||8|
|13||D2S1338||2q35||~20 kb from nearest gene||Included||11|
|16||D12S391*||12p13.2||~40 kb from nearest gene||Included||18|
|17||D2S441||2p14||~30 kb from nearest gene||Removed||13|
|18||D10S1248||10q26.3||~3 kb from nearest gene||Included||16|
|19||Penta E||15q26.2||Within uncharacterized EST; ~50 kb from nearest gene|| ||19|
|20||DYS391||Yq11.21||~5 kb from nearest gene|| ||0|
|23||SE33||6q14||Pseudogene; ~30 kb from nearest gene|| ||8|
|24||Penta D||21q22.3||Intron 4|| ||6|
|#||CODIS Marker||Gene Name||Disorder(s) Caused by Gene Mutations||Number of Phenotypes Associated Within 1 kb||Predicted DNA Elements|
|1||D18S51||BCL2 (B-cell CLL/lymphoma 2)||Leukemia/lymphoma, B-cell||11||ELAVL1 binding site|
|2||FGA||FGA (fibrinogen alpha chain)||Congenital afibrinogenemia; hereditary renal amyloidosis; dysfibrinogenemia (alpha type)||17||PABPC1 binding site|
|5||VWA*||VWF (von Willebrand factor)||Von Willebrand disease||12||ELAVL1 binding site|
|8||D7S820||SEMA3A (sema domain, immunoglobulin domain, short basic domain, secreted (semaphorin) 3A)|| ||8||CELF1, ELAVL1, and PABPC1 binding sites|
|9||TH01||TH (tyrosine hydroxylase)||Segawa syndrome, recessive||18||ELAVL1, PABPC1, and SLBP binding sites|
|10||D3S1358||LARS2 (leucyl-tRNA synthetase 2, mitochondrial)|| ||15||None|
|12||CSF1PO||CSF1R (colony stimulating factor 1 receptor)||Predisposition to myeloid malignancy||15||eGFP-GATA2 transcription factor binding site; PABPC1 binding site|
|14||D19S433||C19orf2 (uncharacterized gene)|| ||7||DNase I hypersensitivity site; SLBP binding site|
|15||D1S1656||CAPN9 (calpain 9)|| ||10||PABPC1 binding site|
|18||D10S1248||None|| ||6||DNase I hypersensitivity site|
|19||Penta E||EST: BG210743 (uncharacterized EST)|| ||8||None|
|21||TPOX||TPO (thyroid peroxidase)||Thyroid dyshormonogenesis 2A||5||PABPC1 and SLBP binding sites|
|22||D22S1045||IL2RB (interleukin 2 receptor, beta)|| ||11||None|
|24||Penta D||HSF2BP (heat shock factor 2-binding protein)|| ||6||PABPC1 and SLBP binding sites|
The current understanding of the standard and recommended CODIS panels of STR loci summarized here highlights that these markers continue to be of limited significance for assessing phenotypes. Indeed, we found no documentation, either in the literature or in the interrogated databases, that individual genotypes for the 24 STRs are causative of any documented phenotypes. Several of the STRs overlay predicted sites for genomic regulation, and the existence of these predicted DNA elements suggests that some STR loci may be involved in genomic regulation, but there is no evidence that any particular repeat genotypes are indicative of phenotype. The utility of the CODIS profile itself, even in light of the significance of various epigenetic effects and roles of noncoding RNAs, is limited to identification purposes at this time. Moreover, even for CODIS marker genotypes statistically associated with biomedically relevant phenotypes, statistical association is not synonymous with positive or negative predictive value. While we cannot say that the standard and recommended CODIS markers carry no implications, now or in the future, for potentially sensitive or medically relevant information, we can affirm that individual genotypes do not at present reveal information beyond identification [1, 2, 5].
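The distinction drawn above between statistical association and predictive value can be made concrete with a small calculation. The sketch below uses entirely hypothetical numbers (a trait with 1% prevalence, and a genotype carried by 30% of affected and 20% of unaffected individuals); none of these figures comes from the study or describes any CODIS marker, and the function name `predictive_values` is an assumption introduced for illustration.

```python
def predictive_values(prevalence, carrier_freq_affected, carrier_freq_unaffected):
    """Return (PPV, NPV, odds ratio) for a genotype used as a 'test' for a trait.

    All three inputs are proportions between 0 and 1.
    """
    # Population fractions in each cell of the 2x2 genotype-by-trait table.
    carrier_affected = prevalence * carrier_freq_affected
    carrier_unaffected = (1 - prevalence) * carrier_freq_unaffected
    noncarrier_affected = prevalence * (1 - carrier_freq_affected)
    noncarrier_unaffected = (1 - prevalence) * (1 - carrier_freq_unaffected)

    # Probability of the trait given the genotype, and of no trait given its absence.
    ppv = carrier_affected / (carrier_affected + carrier_unaffected)
    npv = noncarrier_unaffected / (noncarrier_unaffected + noncarrier_affected)
    # Strength of association, as a case-control study would report it.
    odds_ratio = (carrier_affected * noncarrier_unaffected) / (
        carrier_unaffected * noncarrier_affected
    )
    return ppv, npv, odds_ratio


# Hypothetical inputs, chosen only to illustrate the arithmetic.
ppv, npv, odds_ratio = predictive_values(
    prevalence=0.01,
    carrier_freq_affected=0.30,
    carrier_freq_unaffected=0.20,
)
print(f"odds ratio = {odds_ratio:.2f}, PPV = {ppv:.4f}, NPV = {npv:.4f}")
```

Under these assumed inputs the genotype is clearly associated with the trait (odds ratio above 1.7), yet a carrier's probability of having the trait is only roughly 1.5%, barely above the 1% baseline. Association of that size would tell an observer almost nothing about an individual profile.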
- 2. Developing criteria and data to determine best options for expanding the core CODIS loci. Investig Genet 2012;3:1.
- 3. DNA Identification Act of 1994, Pub Law 103–322, 108 Stat. 1796, 2065–71.
- 4. DNA Analysis Backlog Elimination Act of 2000, Pub Law 106–546, 114 Stat. 2726–37.
- 5. Justice for All Act of 2004, Pub Law 108–405, 118 Stat. 2260.
- 6. DNA Fingerprinting Act of 2005, Pub Law 109–162, 119 Stat. 2960, 3085.
- 7. 73 Fed. Reg. at 74937.
- 11. People v. Buza, San Francisco Co. Super. Ct. SCN 207818 (First App. Dist, Ct. of App. Cal., Aug. 4, 2011) at 5 (quoting Haskell v. Brown, 677 F.Supp.2d 1187, 1190 (N.D. Cal. 2009)).
- 13. Forensic DNA phenotyping: regulatory issues. Columbia Sci Technol Law Rev 2008;9:159–202.
- 15. Overview of STR fact sheets. Short Tandem Repeat DNA Internet Database. www.cstl.nist.gov/strbase/str_fact.htm (accessed May 11, 2012).
- 18. Online Mendelian Inheritance in Man (OMIM®). McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, http://omim.org/ (accessed May 11, 2012).
- 19. GeneTests: medical genetics information resource (database online). Copyright, University of Washington, Seattle, 1993, http://www.genetests.org (accessed May 11, 2012).
- 20. Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am J Hum Genet 1998;63:861–9, http://research.marshfieldclinic.org/genetics/GeneticResearch/compMaps.asp (accessed May 11, 2012).