Prediction of Mutant mRNA Splice Isoforms by Information Theory-Based Exon Definition


  • Additional Supporting Information may be found in the online version of this article.

  • Contract grant sponsors: Natural Sciences and Engineering Research Council of Canada (371758-2009); Canadian Breast Cancer Foundation; Canada Foundation For Innovation; Canada Research Chairs; Compute Canada; Western University; and Cytognomix Inc.

    Communicated by Michael Dean

Correspondence to: Peter K. Rogan, Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 2C1, Canada. E-mail:


Mutations that affect mRNA splicing often produce multiple mRNA isoforms, resulting in complex molecular phenotypes. Definition of an exon and its inclusion in mature mRNA relies on joint recognition of both acceptor and donor splice sites. This study predicts cryptic and exon-skipping isoforms in mRNA produced by splicing mutations from the combined information contents (Ri, which measures binding-site strength, in bits) and distribution of the splice sites defining these exons. The total information content of an exon (Ri,total) is the sum of the Ri values of its acceptor and donor splice sites, adjusted for the self-information of the distance separating these sites, that is, the gap surprisal. Differences between the total information contents of exons (ΔRi,total) are predictive of the relative abundance of these exons in distinct processed mRNAs. Constraints on splice site and exon selection are used to eliminate nonconforming and poorly expressed isoforms. Molecular phenotypes are computed by the Automated Splice Site and Exon Definition Analysis server. Predictions for splicing mutations were highly concordant (85.2%; n = 61) with published expression data. In silico exon definition analysis will contribute to streamlining assessment of abnormal and normal splice isoforms resulting from mutations.
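
The exon-level quantities described in the abstract can be illustrated with a minimal sketch. It is not the server's implementation. It assumes that the gap surprisal is the self-information, -log2(p), of the exon length under an empirical length distribution, and that this term is subtracted from the summed Ri values as a penalty. The function names, the length counts, and the Ri values are all hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

def gap_surprisal(exon_length, length_counts):
    """Self-information (bits) of an exon length.

    length_counts is a hypothetical table mapping exon length to the
    number of exons observed with that length.
    """
    total = sum(length_counts.values())
    count = length_counts.get(exon_length, 0)
    if count == 0:
        # A length never observed has infinite surprisal.
        return math.inf
    return -math.log2(count / total)

def ri_total(ri_acceptor, ri_donor, surprisal):
    """Sum of the two splice-site Ri values minus the gap surprisal.

    Subtracting the surprisal is an assumption of this sketch: the
    abstract only says the sum is 'adjusted' for it.
    """
    return ri_acceptor + ri_donor - surprisal

# Hypothetical exon length distribution (16 exons in total).
length_counts = {100: 4, 120: 8, 150: 4}

# Hypothetical natural exon: 120 nt, acceptor 8.0 bits, donor 7.0 bits.
natural = ri_total(8.0, 7.0, gap_surprisal(120, length_counts))

# Hypothetical cryptic exon: weaker donor (6.0 bits) that extends
# the exon to 150 nt.
cryptic = ri_total(8.0, 6.0, gap_surprisal(150, length_counts))

# Difference in total information between the two exon definitions.
delta_ri_total = natural - cryptic

print(f"Ri,total natural = {natural:.1f} bits")
print(f"Ri,total cryptic = {cryptic:.1f} bits")
print(f"delta Ri,total   = {delta_ri_total:.1f} bits")
```

In this toy case the natural exon scores higher than the cryptic exon, so under the abstract's premise it would be predicted to be the more abundant isoform. How the difference in bits maps to a quantitative abundance ratio is not specified here and is left out of the sketch.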