Evaluation of the detection of GBA missense mutations and other variants using the Oxford Nanopore MinION

Abstract

Background: Mutations in GBA cause Gaucher disease when biallelic and are strong risk factors for Parkinson's disease when heterozygous. GBA analysis is complicated by the nearby pseudogene. We aimed to design and validate a method for sequencing GBA using long reads.

Methods: We sequenced GBA on the Oxford Nanopore MinION as an 8.9 kb amplicon from 102 individuals, including patients with Parkinson's and Gaucher diseases. We used NanoOK for quality metrics, NGMLR to align data (after comparing with GraphMap), Nanopolish and Sniffles to call variants, and WhatsHap for phasing.

Results: We detected all known missense mutations in these samples, including the common p.N409S (N370S) and p.L483P (L444P) in multiple samples, and nine rarer ones, as well as a splicing and a truncating mutation, and intronic SNPs. We demonstrated the ability to phase mutations, confirm compound heterozygosity, and assign haplotypes. We also detected two known risk variants in some Parkinson's patients. Rare false positives were easily identified and filtered, with the Nanopolish quality score, adjusted for the number of reads, proving a very robust discriminator. Of two individuals carrying a recombinant allele, we were able to detect and fully define it in one, where it included a 55‐base pair deletion, but not in the other, suggesting a limitation of the PCR enrichment method. Missense mutations were detected at the correct zygosity, except in the case where the RecNciI allele was missed.

Conclusion: The Oxford Nanopore MinION can detect missense mutations and an exonic deletion in this difficult gene, with the added advantages of phasing and intronic analysis. It can be used as an efficient research tool, but additional work is required to exclude all recombinants.

A: NGMLR alignment of all reads over the gene and pseudogene, with two split reads of the correct length shown individually below. These comprise gene and pseudogene components, with two apparent transitions between gene and pseudogene in the second one.
B: GraphMap alignment for the same sample. Note the fewer reads on the pseudogene. All reads aligning to the pseudogene were <5 kb, and none had split alignments. One example is shown.

IGV traces (without any realignment or filtering) for all 10 samples in this flow cell are shown. The numbers correspond to Supplementary Table S8. Samples in which a given SNV was called after GraphMap alignment are denoted by ^, and after NGMLR alignment by *. The number of uncorrected calls of each base at that position is shown, illustrating the high number of errors at these positions in all or most samples. Note that (3) was always called in GraphMap-aligned samples, and never in NGMLR-aligned ones.
A: After downsampling to ~50, 100, and 200 reads, the quality score was plotted against the total number of reads over that base for each sample. The highest number of reads is from the original file (before downsampling). The mutation(s) carried in each sample are shown, with false positive calls labelled "falsepos". Note that Nanopolish detected true variants regardless of the number of reads. One false positive was found in all downsampled files, while others were found only at higher or lower coverage.
B: Two mutations in S15 (p.R502C and p.R535C) visualised in IGV, showing uncorrected reads carrying each base, with and without downsampling.
C: Nanopolish adjusted quality score (absolute score divided by reads over that position) for true positive calls with and without downsampling as above.

A: Quality score plotted against the total number of reads over that position, including downsampling by factors of 2, 4, and 10. Note that the curve for a mutation present in different samples is almost identical across samples (three p.N409S, two p.L483P).
B: Nanopolish adjusted quality score (absolute score divided by reads over that position) for true positive calls with and without downsampling as above.
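
The adjusted quality score used in these legends is a simple ratio, and a minimal Python sketch may make the normalisation concrete. The function name `adjusted_quality`, the call labels, and all numeric values below are hypothetical and are not taken from the study data. The example also assumes, purely for illustration, that the raw score of a true call scales roughly in proportion to depth.

```python
def adjusted_quality(quality: float, depth: int) -> float:
    """Return the raw quality score divided by the number of reads over the position."""
    if depth <= 0:
        raise ValueError("depth must be a positive number of reads")
    return quality / depth


# Hypothetical calls: (label, raw quality score, reads over the position).
# None of these numbers come from the study; they only illustrate the ratio.
calls = [
    ("true_call_full_depth", 1200.0, 400),
    ("true_call_downsampled", 300.0, 100),
    ("candidate_false_positive", 40.0, 400),
]

adjusted = {label: adjusted_quality(q, d) for label, q, d in calls}

for label, value in adjusted.items():
    print(f"{label}\t{value:.2f}")
```

Under that assumption the two true calls share the same adjusted score despite a fourfold difference in depth, while the low-scoring candidate stays well below them. Any cut-off between the two groups would have to be chosen from real data.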

Supplementary Figure S10. Visualisation of GBA in the NA12878 MinION WGS publicly available data.
Top: all reads. Bottom: file filtered by samtools for mapping quality 1, to retain only reads with unique alignments. Note that these were aligned with BWA-MEM using the "-x ont2d" option.
reported (Spataro et al., 2017; Tayebi et al., 2003).
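
The mapping-quality filter mentioned in this legend can be illustrated with a short pure-Python sketch that operates on SAM-format text lines. This is not the samtools command used for the figure. It assumes that "mapping quality 1" means a minimum MAPQ of 1, and the function name `filter_by_mapq`, the read names, and the reference name and coordinates are all hypothetical.

```python
def filter_by_mapq(sam_lines, min_mapq=1):
    """Yield header lines plus alignment records with MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines carry no MAPQ and are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # MAPQ is the fifth mandatory column of a SAM alignment record.
        if int(fields[4]) >= min_mapq:
            yield line


# Hypothetical toy records; names and coordinates are made up for illustration.
sam_lines = [
    "@SQ\tSN:chr1\tLN:1000",
    "read_unique\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read_ambiguous\t0\tchr1\t200\t0\t10M\t*\t0\t0\tACGTACGTAC\t*",
]

kept = list(filter_by_mapq(sam_lines))
for line in kept:
    print(line.split("\t")[0])
```

The sketch keeps the header and the record with MAPQ 60, and drops the record with MAPQ 0, the value aligners such as BWA-MEM typically assign to reads that map equally well to more than one location.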

Supplementary
A: Normal configuration. GBA is shown in blue and the pseudogene in red. Intergenic regions are coloured for clarity. The primer locations we used are shown as arrows. See also Figure S1.
B: Likely structure of the recombination we detected, with pseudogene sequence inserted into the gene, but primer binding sequences retained.
C: Possible structure of the recombinant we did not detect, as the primer sequence was deleted during the gene fusion event.