Over half of the human genome consists of repetitive DNA sequences, including hundreds of thousands of tandem repeats. However, until very recently most genome-wide association studies of quantitative trait loci (QTLs) and diseases have assessed only single nucleotide polymorphisms (SNPs). This failure to assess key classes of repetitive DNA variants, including tandem repeat polymorphisms (TRPs), could be a major factor in the “missing heritability” facing many genetic studies of polygenic disorders (Trends Genet 26:59–65, 2010). Borel et al. (Hum Mutat 33:1302–1309, 2012) now report that a polymorphism in a specific tandem repeat (motif: CGGGGCGGGGCG) found in the promoter of cystatin B (CSTB) is highly correlated with expression levels, and is thus a stronger cis-eQTL than any of the SNPs analysed, in cultured lymphoblasts from the umbilical cords of healthy subjects. This discovery follows a previous finding that a rare expansion of this same tandem repeat in the CSTB promoter causes progressive myoclonic epilepsy 1 (EPM1 or Unverricht-Lundborg disease). This tandem repeat is therefore not only of interest for epilepsy genetics but also provides evidence for TRP-regulated gene expression in human cells.
Tandem repeats can be highly mutable and have the potential to play an important role in genome plasticity, function, and evolution, as well as in a range of major diseases. TRPs have uniquely extended digital (multiallelic) distributions, making them functionally distinct from SNPs and associated binary polymorphisms. The findings of Borel et al. and others provide a compelling rationale to move beyond SNP-based approaches and embrace the full range of polymorphic diversity present in the genomes of humans and other species. This will require not only whole-genome sequencing using next-generation technologies but also sophisticated bioinformatic and computational approaches to accurately measure and catalogue all classes of repetitive DNA.