Massively parallel sequencing technology provides an unprecedented capacity for variant identification. Exome sequencing, which covers only about 1% of the human genome, can readily identify thousands of single nucleotide variants and short indels. However, our ability to predict variant functionality lags behind, particularly beyond the protein code. For example, splice-affecting mutations are common in human disease (http://www.dbass.org.uk/), but they can be overlooked or misclassified as missense, nonsense or even silent changes. Failure to recognise variant functionality results in a sharp increase in “Variants of Unknown Significance” (VUSs). An urgent challenge, therefore, is to develop more robust, high-throughput tools for variant interpretation.
Rogan's group (Mucaki et al., Hum Mut 34:557–565, 2013) has developed an information theory-based in silico approach for assessing splice-affecting variants around exon-intron boundaries and for predicting the resulting cryptic or exon-skipping isoforms. Changes in the total information content (Ri,total), calculated from the sequences of the splice acceptor and donor sites and the distance between them (gap surprisal), were used to discriminate between the expressed splicing isoforms. A splicing mutation in BRCA1 intron 20 was used to demonstrate the consistency between the predicted and the experimentally detected mRNA isoforms. Further validation, covering end-point and quantitative estimates and including a second gap surprisal term for the splicing regulatory elements, was performed on 61 reported splice-affecting variants. Concordance between the in silico predictions and the experimental data approached 85%. Although the proposed approach still has limitations, such as its inability to cope with multiple variants in the same region, it is a potentially cost-effective in silico tool for streamlining the assessment of VUSs and for prioritising splice-affecting variants for further expression studies.
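To make the information-theoretic idea concrete, the following Python sketch scores a single splice site against a position weight matrix and reports the change in individual information content (ΔRi) caused by a variant. It is a minimal illustration and not the authors' implementation: the aligned donor sequences are toy data invented for this example, the weights use the simplified form 2 + log2 f(b, l) with a pseudocount in place of a small-sample correction, and the combination in `total_information` (acceptor Ri plus donor Ri minus a surprisal term for the exon length) is an assumed form of Ri,total. All function and variable names are hypothetical.

```python
import math

BASES = "ACGT"

def build_weight_matrix(aligned_sites):
    """Per-position weights Riw(b, l) = 2 + log2 f(b, l), in bits.

    Simplification (assumption): a pseudocount of 1 per base replaces
    the small-sample correction used in published models.
    """
    n_sites = len(aligned_sites)
    length = len(aligned_sites[0])
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in aligned_sites]
        weights = {}
        for base in BASES:
            freq = (column.count(base) + 1) / (n_sites + len(BASES))
            weights[base] = 2.0 + math.log2(freq)
        matrix.append(weights)
    return matrix

def individual_information(site, matrix):
    """Ri of one sequence: sum of the weights of its bases."""
    return sum(matrix[pos][base] for pos, base in enumerate(site))

def total_information(acceptor_ri, donor_ri, exon_length_prob):
    """Assumed form of Ri,total: site scores minus the gap surprisal.

    exon_length_prob is a hypothetical probability of observing an
    exon of this length; its surprisal is -log2(p).
    """
    gap_surprisal = -math.log2(exon_length_prob)
    return acceptor_ri + donor_ri - gap_surprisal

# Toy donor sites (3 exonic + 6 intronic bases); NOT real genomic data.
toy_donor_sites = [
    "CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA",
    "TAGGTAAGT", "CAGGTGAGT", "AAGGTAAGG",
]
matrix = build_weight_matrix(toy_donor_sites)

wild_type = "CAGGTAAGT"
mutant = "CAGATAAGT"  # G>A at the first intronic base (+1)

ri_wt = individual_information(wild_type, matrix)
ri_mut = individual_information(mutant, matrix)
delta = ri_mut - ri_wt  # negative values indicate a weakened site

print(f"Ri wild type: {ri_wt:.2f} bits")
print(f"Ri mutant:    {ri_mut:.2f} bits")
print(f"Delta Ri:     {delta:.2f} bits")
```

In this sketch a negative ΔRi flags a weakened natural site. Comparing `total_information` across candidate acceptor and donor pairings would then be one way to rank the competing isoforms, under the assumptions stated above.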