Keywords:

  • forensic science;
  • genetic identity;
  • DNA typing;
  • forensic genetics;
  • Combined DNA Index System;
  • short tandem repeats

Abstract

As U.S. courts grapple with constitutional challenges to DNA identification applications, judges are resting legal decisions on the fingerprint analogy, questioning whether the information from a DNA profile could, in light of scientific advances, reveal biomedically relevant information. While CODIS loci were selected largely because they lack phenotypic associations, how this criterion was assessed is unclear. To clarify their phenotypic relevance, we describe the standard and recommended CODIS markers within the context of what is known currently about the genome. We characterize the genomic regions and phenotypic associations of the 24 standard and suggested CODIS markers. None of the markers are within exons, although 12 are intragenic. No CODIS genotypes are associated with known phenotypes. This study provides clarification of the genomic significance of the key identification markers and supports—independent of the forensic scientific community—that the CODIS profiles provide identification but not sensitive or biomedically relevant information.

The culmination of the 1996–1997 STR Project was the selection of the 13 core CODIS markers, all highly polymorphic tetra-nucleotide short tandem repeats (STRs). In 2010, the U.S. Federal Bureau of Investigation revisited the panel composition, creating the CODIS Core Loci Working Group to consider expanding the core CODIS marker panel to minimize the likelihood of adventitious matches, improve international compatibility for data sharing, and improve the discriminatory power for missing persons cases and familial searching [1]. This culminated in the proposal of an additional 11 STRs to be used alternatively in various identification contexts [1]. Several criteria were considered in selecting markers for an expanded panel, the first of which was that they have “[n]o known association with medical conditions or defects” [1, p. e52; 2, p. 1]. The primary rationale for emphasizing panels with no association with biomedically relevant phenotypes is clear: the statutory authority for CODIS itself (the DNA Identification Act of 1994 [3]; DNA Analysis Backlog Elimination Act of 2000 [4]; Justice for All Act of 2004 [5]; and the DNA Fingerprinting Act of 2005 [6]) is restricted to identification purposes. The Department of Justice has reiterated that CODIS profiles are to be “sanitized ‘genetic fingerprints’ that can be used to identify an individual uniquely, but do not disclose an individual's traits, disorders, or dispositions” [7]. Thus, the rationale behind the criterion requires little explanation by the Working Group. On the other hand, how the Working Group applied this criterion in selecting and ranking the markers is unclear, and the literature offers little information relevant to whether (and the extent to which) any of these markers are causally related to phenotypes [1, 2]. Moreover, a quick review of the literature on linkage analyses and genome-wide association studies (GWAS) may yield deceptive and exaggerated reports of linkage with some of these markers: the number of reports may simply be an artifact of the markers' convenient inclusion in commonly used linkage screening panel sets, such as the Marshfield linkage maps [8, 9], and thus the results may not be indicative of any actual causal relationship or biological function [10].

Motivated by recent court opinions [11] in which judges (frequently referencing popular science articles [12]) call the phenotypic irrelevance of the CODIS profile into question, we seek to clarify the role of phenotype in the selection criteria of markers. Many important criticisms of forensic DNA analysis are each worthy of discussion, including the legal considerations and implications of molecular photofitting and phenotyping [13]; the history and substance of criticisms aimed at forensic identification using blood grouping, HLA testing, and more recent methods; the suitability of the fingerprint analogy (which has considerable legal importance); and the appropriateness of selected CODIS markers [2]. The focus of this technical note, however, is to address only the feasibility of creating a DNA database restricted to identity information. As the forensic community grapples with the technical and statistical benefits of the original 13 CODIS loci and the additional 11 loci for prioritization and selection, we examine here the role of the markers and their regions as elements of the human genome. The selection of markers for identification is an ongoing process that varies by population and regulatory control; as such, some markers (e.g., D6S1043, D14S1434) relevant in select sub-populations are not reviewed herein.

Methods

We used the UCSC Genome Browser (build GRCh37/hg19) [14] to analyze each STR region. We conducted BLAT searches [15] of primers from each STR to locate the precise region of the repeat. We collected data on (i) phenotype and disease associations (GAD view-pack; DECIPHER-full; Online Mendelian Inheritance in Man [OMIM] AV SNPs-full; OMIM genes-full; OMIM pheno loci-full; GWAS catalog-full; RGD human quantitative trait loci [QTL]-full); (ii) genes and gene prediction tracks (UCSC genes-pack; RefSeq-dense); (iii) mRNA and EST tracks (human mRNAs-pack; spliced ESTs-pack); (iv) variation and repeats (common SNPs(132)-pack; simple repeats-pack; microsatellites-full); and (v) regulation (ENCODE Regulation-show; ENC RNA Binding-show; ORegAnno-full; Vista Enhancers-full). We used Ensembl [16] to determine intronic regions and to note any reported phenotypic associations for STR genotypes. Disorders associated or linked within 1 kb of the STR were noted; chromosomal anomalies were not noted. To examine the potential relevance of the markers as noncoding genomic elements, we examined sequences for predicted enhancers and noted RNA-binding protein sites as well as predicted DNase I hypersensitivity and transcription factor binding sites. SNPs documented in dbSNP (build 132) that overlapped or were linked within 1 kb of the STR were noted. SNPs from the 1000 Genomes Project were searched in SNPedia. If the STR was within a gene locus, we noted the gene name and examined its position relative to the surrounding exons. Extragenic STR regions were examined to document proximity to the nearest transcript. We subsequently searched relevant genes in the Database of Genotypes and Phenotypes (dbGaP) [17], OMIM [18], and GeneTests [19] to confirm genetic associations and document the availability of a genetic test for any related gene. We also examined the Marshfield linkage maps to determine which markers are used in the human genetic screening panels [20]. All searches were conducted in October 2011.
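
The "within 1 kb" criterion used above for SNPs and associated disorders can be sketched as a simple interval check. This is only an illustration and not the procedure actually run in the Genome Browser: the `Interval` type, the function names, and the coordinates are all hypothetical, and the sketch assumes 1-based inclusive positions on a single assembly.

```python
from dataclasses import dataclass

FLANK_BP = 1_000  # the 1 kb window described in the Methods


@dataclass(frozen=True)
class Interval:
    """A 1-based, inclusive genomic interval (hypothetical helper type)."""
    chrom: str
    start: int
    end: int


def within_flank(str_region, chrom, pos, flank=FLANK_BP):
    """True if a variant overlaps the STR or lies within `flank` bp of either end."""
    if chrom != str_region.chrom:
        return False
    return str_region.start - flank <= pos <= str_region.end + flank


def count_nearby(str_region, variants, flank=FLANK_BP):
    """Count (chrom, pos) variants that fall inside the flanked STR window."""
    return sum(within_flank(str_region, c, p, flank) for c, p in variants)


# Made-up coordinates for illustration only; these are not real hg19 positions.
example_str = Interval("chrN", 10_000, 10_060)
example_snps = [
    ("chrN", 8_999),   # 1 bp outside the upstream window
    ("chrN", 9_000),   # exactly 1 kb upstream of the repeat start
    ("chrN", 10_030),  # inside the repeat
    ("chrN", 11_060),  # exactly 1 kb downstream of the repeat end
    ("chrN", 11_061),  # 1 bp outside the downstream window
    ("chrM", 10_030),  # different chromosome
]
print(count_nearby(example_str, example_snps))  # 3
```

Treating the boundary positions as inside the window is a choice made for this sketch; the source does not say how boundary cases were handled.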

Results and Discussion

Individual genotypes of the 24 STRs were not found to be associated with any documented phenotypes (with one exception: DYS391 is on the Y chromosome and, if present in a DNA profile, may indicate male sex). None of the 24 STRs are located within protein-coding exons (see Table 1) (see also Ref. [10]). Two of the STRs (VWA and D12S391) are colocated on the same arm of chromosome 12 (12p13) within 6 Mb [21, 22]. Twelve are located within introns of genes, six of which have known phenotypic associations (see Table 2). Mutations in the six genes are well documented as causative of the corresponding syndromes, but no mutations have been found to be in linkage disequilibrium with any tetra-nucleotide repeat genotypes. Of the intronic STRs, two (FGA and VWA) were within 400 bp of a splice site. All STR loci were associated (within 1 kb) with at least one phenotype according to published GWAS or quantitative trait loci (QTL) studies. TH01 was associated with the most phenotypes (18 traits), ranging from alcoholism [23] and schizophrenia [24] to autosomal recessive spinocerebellar ataxia [25], while DYS391 is believed to be associated only with hairy ears [26]. Such genome-wide studies often span large regions of the genome; our findings demonstrate that CODIS STR loci are located within such regions and hence are potentially linked to such traits. However, association with these traits does not necessarily imply that individual CODIS marker genotypes are predictive or causative of any particular trait. As expected, all regions were sprinkled with documented SNPs (see Table 1), with the region of TPOX having the most (33 SNPs) and the region of FGA having the fewest (four SNPs). Four SNPs (rs3829986 and rs41338945 near CSF1PO, rs34120165 near VWA, and rs28359647 near D1S1656) were among those commonly queried for the 1000 Genomes Project; none of these are annotated in SNPedia.

None of the STRs overlapped predicted enhancers. Ten of the STRs (CSF1PO, FGA, TH01, TPOX, VWA, D7S820, D18S51, D19S433, D1S1656, and Penta D) lay within predicted RNA-binding protein sites. Two STRs (D19S433 and D10S1248) lay within DNase I hypersensitivity sites, and one (CSF1PO) lay within a predicted transcription factor binding site. The role of tetra-nucleotide repeats in RNA binding and DNase I hypersensitivity is unknown, although expanded tetra-nucleotide repeats may destabilize transcription factor binding sites [27]. At this time, no correlation has been made between STR repeat sizes in humans and the impact on transcription factor binding. The Marshfield human genetic linkage maps include 14 of the 24 markers, with nine still in use and five identified as “cryptic duplicate markers” and removed from subsequent panels.
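
The counts quoted above can be cross-checked against Table 1. The following is a minimal sketch that transcribes three columns of Table 1 by hand; the tuple layout and variable names are assumptions made for illustration, and the boolean simply records whether the table lists the marker as lying in an intron.

```python
from collections import Counter

# (marker, listed as intronic in Table 1, Marshfield status, SNPs within 1 kb).
# Marshfield status is None where the Table 1 cell is blank.
TABLE_1 = [
    ("D18S51", True, "Included", 13),
    ("FGA", True, None, 4),
    ("D21S11", False, "Removed", 7),
    ("D8S1179", False, "Included", 14),
    ("VWA", True, None, 27),
    ("D13S317", False, "Included", 10),
    ("D16S539", False, "Removed", 29),
    ("D7S820", True, "Included", 7),
    ("TH01", True, None, 8),
    ("D3S1358", True, None, 11),
    ("D5S818", False, "Removed", 8),
    ("CSF1PO", True, None, 15),
    ("D2S1338", False, "Included", 11),
    ("D19S433", True, "Included", 10),
    ("D1S1656", True, "Removed", 22),
    ("D12S391", False, "Included", 18),
    ("D2S441", False, "Removed", 13),
    ("D10S1248", False, "Included", 16),
    ("Penta E", False, None, 19),
    ("DYS391", False, None, 0),
    ("TPOX", True, None, 33),
    ("D22S1045", True, "Included", 19),
    ("SE33", False, None, 8),
    ("Penta D", True, None, 6),
]

# Markers that Table 1 places inside an intron.
intronic = [marker for marker, is_intronic, _, _ in TABLE_1 if is_intronic]
# Tally of Marshfield status, skipping markers with a blank cell.
marshfield = Counter(status for _, _, status, _ in TABLE_1 if status)
# Row with the largest SNP count within 1 kb.
most_snps = max(TABLE_1, key=lambda row: row[3])

print(len(TABLE_1), len(intronic))   # 24 12
print(dict(marshfield))              # {'Included': 9, 'Removed': 5}
print(most_snps[0], most_snps[3])    # TPOX 33
```

Keeping the table as plain tuples makes it easy to re-run the tallies if a row is corrected.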

Table 1. Genomic characterization of CODIS markers

| Rank | CODIS Marker | Cytogenetic Location | Intragenic or Distance from Nearest Gene | Included in Marshfield Human Genetic Linkage Maps | Number of SNPs (dbSNP Build 132) Within 1 kb |
|---|---|---|---|---|---|
| 1 | D18S51 | 18q21.33 | Intron 1 | Included | 13 |
| 2 | FGA | 4q28 | Intron 3 | | 4 |
| 3 | D21S11 | 21q21.1 | >100 kb from nearest gene | Removed | 7 |
| 4 | D8S1179 | 8q24.13 | >50 kb from nearest gene | Included | 14 |
| 5 | VWA* | 12p13.31 | Intron 40 | | 27 |
| 6 | D13S317 | 13q31.1 | >100 kb from nearest gene | Included | 10 |
| 7 | D16S539 | 16q24.1 | ~10 kb from nearest gene | Removed | 29 |
| 8 | D7S820 | 7q21.11 | Intron 1 | Included | 7 |
| 9 | TH01 | 11p15.5 | Intron 1 | | 8 |
| 10 | D3S1358 | 3p21.31 | Intron 20 | | 11 |
| 11 | D5S818 | 5q23.2 | >100 kb from nearest gene | Removed | 8 |
| 12 | CSF1PO | 5q33.1 | Intron 6 | | 15 |
| 13 | D2S1338 | 2q35 | ~20 kb from nearest gene | Included | 11 |
| 14 | D19S433 | 19q12 | Intron 1 | Included | 10 |
| 15 | D1S1656 | 1q42 | Intron 6 | Removed | 22 |
| 16 | D12S391* | 12p13.2 | ~40 kb from nearest gene | Included | 18 |
| 17 | D2S441 | 2p14 | ~30 kb from nearest gene | Removed | 13 |
| 18 | D10S1248 | 10q26.3 | ~3 kb from nearest gene | Included | 16 |
| 19 | Penta E | 15q26.2 | Within uncharacterized EST; ~50 kb from nearest gene | | 19 |
| 20 | DYS391 | Yq11.21 | ~5 kb from nearest gene | | 0 |
| 21 | TPOX | 2p25.3 | Intron 10 | | 33 |
| 22 | D22S1045 | 22q12.3 | Intron 4 | Included | 19 |
| 23 | SE33 | 6q14 | Pseudogene; ~30 kb from nearest gene | | 8 |
| 24 | Penta D | 21q22.3 | Intron 4 | | 6 |

Markers are shown in their relative rank according to Hares [1].
* VWA and D12S391 are colocated on 12p13 within 6 Mb.

Table 2. Reported phenotypic relevance of genomic regions of CODIS markers

| Rank | CODIS Marker | Gene Name | Disorder(s) Caused by Gene Mutations | Number of Phenotypes Associated Within 1 kb | Predicted DNA Elements |
|---|---|---|---|---|---|
| 1 | D18S51 | BCL2 (B-cell CLL/lymphoma 2) | Leukemia/lymphoma, B-cell | 11 | ELAV1 binding site |
| 2 | FGA | FGA (fibrinogen alpha chain) | Congenital afibrinogenemia; hereditary renal amyloidosis; dysfibrinogenemia (alpha type) | 17 | PABPC1 binding site |
| 3 | D21S11 | None | | 1 | None |
| 4 | D8S1179 | None | | 17 | None |
| 5 | VWA* | VWF (von Willebrand factor) | Von Willebrand disease | 12 | ELAV1 binding site |
| 6 | D13S317 | None | | 5 | None |
| 7 | D16S539 | None | | 8 | None |
| 8 | D7S820 | SEMA3A (sema domain, immunoglobulin domain, short basic domain, secreted (semaphorin) 3A) | | 8 | CELF1, ELAV1 and PABPC1 binding site |
| 9 | TH01 | TH (tyrosine hydroxylase) | Segawa syndrome, recessive | 18 | ELAVL1, PABPC1 and SLBP binding site |
| 10 | D3S1358 | LARS2 (leucyl-tRNA synthetase 2, mitochondria) | | 15 | None |
| 11 | D5S818 | None | | 5 | None |
| 12 | CSF1PO | CSF1R (colony stimulating factor 1 receptor) | Predisposition to myeloid malignancy | 15 | eGFP-GATA2 transcription factor; PABPC1 binding site |
| 13 | D2S1338 | None | | 9 | None |
| 14 | D19S433 | C19orf2 (uncharacterized gene) | | 7 | DNase I hypersensitivity site; SLBP binding site |
| 15 | D1S1656 | CAPN9 (calpain 9) | | 10 | PABPC1 binding site |
| 16 | D12S391* | None | | 6 | None |
| 17 | D2S441 | None | | 6 | None |
| 18 | D10S1248 | None | | 6 | DNase I hypersensitivity site |
| 19 | Penta E | EST: BG210743 (uncharacterized EST) | | 8 | None |
| 20 | DYS391 | None | | 1 | None |
| 21 | TPOX | TPO (thyroid peroxidase) | Thyroid dyshormonogenesis 2A | 5 | PABPC1 and SLBP binding site |
| 22 | D22S1045 | IL2RB (interleukin 2 receptor, beta) | | 11 | None |
| 23 | SE33 | None | | 9 | None |
| 24 | Penta D | HSF2BP (heat shock factor 2-binding protein) | | 6 | PABPC1 and SLBP binding site |

Markers are shown in their relative rank according to Hares [1].
* VWA and D12S391 are colocated on 12p13 within 6 Mb.

The current understanding of the standard and recommended CODIS panels of STR loci summarized here highlights that these markers continue to be of limited significance for assessing phenotypes. Indeed, we found no documentation, either in the literature or in the interrogated databases, that individual genotypes for the 24 STRs are causative of any documented phenotypes. Several of the STRs overlay predicted sites for genomic regulation, but there is no evidence that any particular repeat genotypes are indicative of phenotype. The utility of the CODIS profile itself, even in light of the significance of various epigenetic effects and roles of noncoding RNAs, is limited to identification purposes at this time. The existence of the predicted DNA elements suggests that some STR loci may be involved in genomic regulation. However, even for CODIS marker genotypes statistically associated with biomedically relevant phenotypes, statistical association is not synonymous with positive or negative predictive value [24]. While we cannot say that the standard and recommended CODIS markers are wholly free of, and forever immune from, any implications for potentially sensitive or medically relevant information, we can affirm that individual genotypes do not at present reveal information beyond identification [1, 2, 5].
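
The distinction between statistical association and predictive value can be made concrete with Bayes' theorem. The sketch below uses entirely hypothetical numbers (a genotype carried by 60% of affected and 50% of unaffected individuals, and a trait prevalence of 1%); none of these figures come from the study or from any CODIS marker.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary genotype and a binary trait via Bayes' theorem.

    sensitivity: P(genotype present | trait present)
    specificity: P(genotype absent  | trait absent)
    prevalence:  P(trait present) in the population
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical inputs: a genotype enriched among affected individuals
# (60% vs 50% carriers) for a trait with 1% prevalence.
ppv, npv = predictive_values(sensitivity=0.60, specificity=0.50, prevalence=0.01)
print(f"PPV={ppv:.3f} NPV={npv:.3f}")  # PPV=0.012 NPV=0.992
```

Under these assumed inputs the genotype is associated with the trait, yet only about 1% of carriers would have it, which is the sense in which an association need not carry useful predictive value.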

References
