Identification and validation of loss of function variants in clinical contexts

The choice of an appropriate variant calling pipeline for exome sequencing data is becoming increasingly important in translational medicine projects and clinical contexts. Within GOSgene, which facilitates genetic analysis as a joint effort of University College London and Great Ormond Street Hospital, we aimed to optimize a variant calling pipeline suitable for our clinical context. We implemented the GATK/Queue framework and evaluated the performance of its two callers: the classical UnifiedGenotyper and the newer variant discovery tool HaplotypeCaller. We experimentally validated the loss-of-function (LoF) variants called by the two methods using Sequenom technology. UnifiedGenotyper showed a total validation rate of 97.6% for LoF single-nucleotide polymorphisms (SNPs) and 92.0% for insertions or deletions (INDELs), whereas HaplotypeCaller reached 91.7% for SNPs and 55.9% for INDELs. We confirm that GATK/Queue is a reliable pipeline in translational medicine and clinical contexts. We conclude that, in our working environment, UnifiedGenotyper is the caller of choice: it is an accurate method with a high validation rate for error-prone calls such as LoF variants. Finally, we highlight the importance of experimental validation, especially for INDELs, as part of a standard pipeline in clinical environments.


Introduction
To better characterise the variants that did not validate in the genotyping, a number of annotations are compared in the following pages. To make the plots easier to read, the data are presented according to the validation result and the caller of origin. For most annotations, the values depend on the calling process and on the variant quality score recalibration, so the overlapping variants (i.e. those called by both methods) are also presented separately. For GC content, which is a characteristic of the genome and independent of the caller, more emphasis is placed on the distinction between the overlap and the calls unique to one of the two methods. The comparison presented here is meant to be qualitative, and should help identify potential reasons for calling errors.

Variant Quality recalibration LOD score
In terms of LOD score, no striking differences can be observed when the variants are stratified by caller and overlap. In some density plots, the very small number of variants makes it impossible to see both distributions. A simplified view shows that, for HaplotypeCaller and for SNPs only, variants that did not validate tend to have a lower LOD score.

Culprit values
The culprit value of a variant is the annotation on which it differs most from the recalibration model: identifying the culprit values of non-validated variants might help clarify the major issues behind the calling errors.
In this case, the three most represented values among false calls are FS (Phred-scaled strand bias measure), MQ (mapping quality) and QD (quality over depth). Each of these parameters is examined in the following pages.
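As an illustration of how such a tally could be produced, the following minimal sketch counts the `culprit` annotation among variants that failed validation. It assumes the recalibrated VCF carries the `VQSLOD` and `culprit` INFO fields written by the variant quality score recalibration; the function names, the pairing of each INFO string with a validation flag, and the toy records are hypothetical and are not data from this study.

```python
from collections import Counter


def parse_info(info):
    """Turn a VCF INFO string such as 'DP=12;VQSLOD=-1.3;culprit=FS' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flag fields carry no value
    return fields


def culprit_counts(records):
    """Count culprit annotations among records that failed validation.

    `records` is an iterable of (info_string, validated) pairs; this layout
    is a hypothetical choice made for the example.
    """
    counts = Counter()
    for info, validated in records:
        if validated:
            continue
        culprit = parse_info(info).get("culprit")
        if culprit is not None:
            counts[culprit] += 1
    return counts


# Made-up records, for illustration only (not data from this study).
toy_records = [
    ("DP=14;FS=23.1;MQ=58.2;QD=3.1;VQSLOD=-1.20;culprit=FS", False),
    ("DP=9;FS=1.4;MQ=31.0;QD=6.5;VQSLOD=-0.40;culprit=MQ", False),
    ("DP=40;FS=0.0;MQ=60.0;QD=18.2;VQSLOD=4.75;culprit=QD", True),
    ("DP=11;FS=30.8;MQ=55.0;QD=2.2;VQSLOD=-2.10;culprit=FS", False),
]

for name, n in culprit_counts(toy_records).most_common():
    print(name, n)
```

Validated records are skipped on purpose, so the tally reflects only the false calls.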

Strand bias values
In general, non-validated variants show a higher strand bias; this is most evident for SNPs called by UnifiedGenotyper and INDELs called by HaplotypeCaller. This might reflect a well-known issue in capture sequencing, where some regions are covered almost exclusively by forward or by reverse reads.
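To make the FS measure more concrete, the sketch below computes a Phred-scaled p-value from a two-sided Fisher's exact test on the 2x2 table of reference/alternate reads by forward/reverse strand. It is a simplified illustration of the idea, not the GATK implementation, which may pre-process the table differently; the function names and the lower bound on the p-value are assumptions made for the example.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is no more likely than the observed one
    # (small relative tolerance to absorb floating-point noise).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


def phred_strand_bias(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Phred-scaled strand bias: -10 * log10(p). Higher means more bias."""
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    p = max(p, 1e-300)  # arbitrary floor (an assumption) to avoid log10(0)
    return abs(-10.0 * math.log10(p))


# Alternate allele seen on both strands versus on the forward strand only.
print(phred_strand_bias(10, 10, 10, 10))
print(phred_strand_bias(10, 10, 10, 0))
```

A variant whose alternate allele is supported by reads from one strand only gets a much higher score than one supported evenly by both strands.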

Quality over Depth
Although low depth could in general be expected to contribute to false calls, it is interesting to note that HaplotypeCaller is more affected by the Quality over Depth value. This holds for both SNPs and INDELs, and it might be a characteristic of the algorithm.

Haplotype Score values
HaplotypeScore spans a wide range of values, which makes the plots hard to read. In the following figure, the number of called variants is plotted in bins of 0.5 score units, with the X axis limited to 50. The only major difference is that HaplotypeCaller produces a much flatter distribution of HaplotypeScore values than UnifiedGenotyper.
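The binning described above can be sketched as follows. This is a minimal illustration and not the code used to produce the figure: the function name and the toy scores are hypothetical, and treating the upper limit of 50 as exclusive is an assumption.

```python
import math
from collections import Counter


def bin_scores(scores, width=0.5, x_max=50.0):
    """Count scores per bin of the given width, dropping values outside [0, x_max)."""
    counts = Counter()
    for score in scores:
        if score < 0 or score >= x_max:
            continue  # beyond the plotted X range (upper limit assumed exclusive)
        bin_start = math.floor(score / width) * width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Made-up HaplotypeScore values, for illustration only.
toy_scores = [0.0, 0.2, 0.49, 0.5, 1.7, 12.3, 49.9, 50.0, 120.4]

for bin_start, n in bin_scores(toy_scores).items():
    print(f"[{bin_start:.1f}, {bin_start + 0.5:.1f}) {n}")
```

Each key is the lower edge of a bin, so the result can be passed directly to a bar plot, with one series per caller.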

Depth distribution
While it is generally clear that non-validated variants are called in lower-depth regions (this is very evident for SNPs called by UnifiedGenotyper), depth alone does not reveal a clear difference in performance between the two callers, unlike Quality over Depth.

GC content distribution
In this plot we cannot identify major differences in GC content for the non-validated variants: they are called across a wide range of GC values and do not cluster at particularly high or low values.
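For reference, GC content around a variant can be computed as the fraction of G and C bases in a window of the reference sequence. The sketch below is a minimal illustration under stated assumptions: the window size, the 0-based coordinate convention, the exclusion of ambiguous bases such as N, and the function names are all hypothetical choices, since the text does not specify how the GC value was derived.

```python
def gc_fraction(sequence):
    """Fraction of G/C among unambiguous bases (A, C, G, T); NaN if there are none."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / called


def window_around(reference, position, flank):
    """Slice `flank` bases on each side of a 0-based position, clipped at the ends."""
    start = max(0, position - flank)
    end = min(len(reference), position + flank + 1)
    return reference[start:end]


# Made-up reference fragment, for illustration only.
toy_reference = "AACCGGTT"
window = window_around(toy_reference, position=3, flank=2)
print(window, gc_fraction(window))
```

Because the value depends only on the reference sequence, it is identical for a given site regardless of which caller reported the variant.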