Molecular characterization of PRKN structural variations identified through whole‐genome sequencing

Abstract
Background: Early‐onset Parkinson's disease (PD) is the most common inherited form of parkinsonism, with PRKN being the most frequently mutated gene. Exon rearrangements, identified in about 43.2% of the reported PD patients and at higher frequency in specific ethnicities, are the most prevalent PRKN mutations reported to date in PD patients.
Methods: In this study, three consanguineous families with early‐onset PD were subjected to whole‐genome sequencing (WGS) analyses, followed by Sanger sequencing and droplet digital PCR to validate the identified genomic variations, confirm their segregation with the disease, and determine their parental origin.
Results: Five different PRKN structural variations (SVs) were identified. Because the genomic sequences surrounding the break points of the identified SVs might hold important information about their genesis, these were also characterized for the presence of homology and repeated sequences.
Conclusion: We conclude that all identified PRKN SVs may have originated through retrotransposition events.


| INTRODUCTION
Mutations in the PRKN gene (OMIM #600116) are the most common cause of autosomal recessive Parkinson's disease (PD). The PRKN gene is located on chromosome 6q26, and its largest transcript (transcript variant 1; NM_004562.2) contains 12 coding exons and encodes a protein of 465 amino acids (NP_004553.2). All types of mutations, including missense, nonsense, splice site, frameshift, and structural variations (SVs), have been reported in PD patients carrying PRKN mutations. SVs (exon rearrangements) are the most common type of mutation, being identified in about 43.2% of the reported patients (Kasten, et al., 2018). In a large multicenter study, in which we identified PRKN mutations in 71.42% of the examined patients, SVs were the most prevalent mutations identified in the Iranian PD population (Taghavi et al., 2018). The most common PRKN SV identified to date in the PD population is the c.(171+1_172-1)_(412+1_413-1)del mutation, a deletion encompassing the entire exon 3 of the PRKN gene (https://www.mdsgene.org) (Kasten, et al., 2018). Multiplex Ligation-Dependent Probe Amplification (MLPA; MRC Holland), which allows the detection of DNA copy number changes (CNVs) in up to 40 sequences in a single reaction, is the technique most frequently used to identify SVs in the PRKN gene, even though it does not determine the genomic localization of the deletion/insertion break points. Whole-genome sequencing (WGS), by contrast, is considered the most comprehensive genetic screening method because it captures both coding and noncoding genetic variation, and it enables the identification of gene fusions, CNVs, and other complex SVs (Royer-Bertrand & Rivolta, 2014). Continuous improvements in read coverage uniformity and reductions in allele bias in WGS (Meynert, Ansari, FitzPatrick, & Taylor, 2014) have led to improved detection of copy number changes and de novo variations (Gilissen, et al., 2014; Ritter, et al., 2015).

| CLINICAL REPORT
Here we describe the clinical characteristics of three different families carrying PRKN SVs. All patients' clinical details are summarized in Table 1. Briefly, disease onset occurred in childhood in all patients, with the youngest patient developing the disease at the age of 10 years. Most patients showed slow disease progression, with the exception of patient FC-P2, whose disease began at the age of 17 and progressed severely (Table 1). Only one family, consisting of three affected siblings, showed additional symptoms (Table 1).

| METHODS
Three different families with early-onset PD were clinically examined and subjected to WGS analyses. The local ethics committee at each participating medical center approved this study, and informed consent, according to the Declaration of Helsinki, was obtained from all participants. DNA samples from all participants were isolated from whole blood using standard procedures. WGS was performed as previously described (Sanchez, et al., 2016). Specifically, deletions were called with GenomeSTRiP (v2.0) (Handsaker, Korn, Nemesh, & McCarroll, 2011), jointly with 17 HapMap individuals (CEPH Platinum Genomes pedigree). All deletions annotated as PASS in the GenomeSTRiP results were further filtered with custom scripts to remove redundant calls and calls whose break points overlapped repeat regions or showed extensive mapping ambiguity. Identified deletions affecting coding areas were further analyzed with SplazerS, which identifies and split-aligns reads that cross structural variant break points (Emde, et al., 2012). First, all reads mapping to the candidate region were extracted; they were then mapped back to the region with SplazerS to identify and confirm the break point locations. Subsequently, Sanger sequencing, performed as described elsewhere (Krebs, et al., 2013) with primers flanking the cut-off points previously determined by the WGS analyses, was used to validate the identified deletions and to determine the genomic localization of their break points. Primers were designed with a public primer design website (https://ihg.gsf.de/ihg/ExonPrimer.html; primer sequences available upon request) using the NM_004562.2 gene sequence as a reference. The validated CNVs were later quantified with the ddPCR QX100 system (Bio-Rad, USA) using TaqMan probes targeting PRKN exons 2-6 as well as a reference gene (TERT) (Hindson, et al., 2011). TaqMan probes were acquired from Applied Biosystems (Life Technologies, USA); a DNA sample from a healthy individual and a non-template control were used as reference control DNA and negative control, respectively. All CNV scores were calculated with the QuantaSoft software according to the manufacturer's instructions (Bio-Rad, USA). These analyses were performed in all available family members in order to examine the segregation of the deletions with the disease and to determine their parental origin (Figure 1).
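To illustrate how a ddPCR CNV score relates to exon copy number, the following minimal sketch uses the commonly applied ratio of target to reference concentration, scaled by the number of reference copies per diploid genome. It does not reproduce the internals of the vendor software; the function names, the example concentrations, and the tolerance of 0.3 are assumptions introduced here for illustration only.

```python
def copy_number(target_conc: float, reference_conc: float,
                reference_copies: int = 2) -> float:
    """Estimate target copy number from ddPCR concentrations (copies/uL).

    Assumes the reference gene (e.g., TERT) is present in `reference_copies`
    copies per diploid genome.
    """
    if reference_conc <= 0:
        raise ValueError("reference concentration must be positive")
    return reference_copies * target_conc / reference_conc


def genotype(cn: float, tolerance: float = 0.3) -> str:
    """Map a CNV score to a copy-number class.

    The tolerance is a hypothetical cut-off, not a value from the study.
    """
    if cn <= tolerance:
        return "no copies"
    if abs(cn - 1) <= tolerance:
        return "one copy"
    if abs(cn - 2) <= tolerance:
        return "two copies"
    return "indeterminate"


# Hypothetical concentrations for one exon and the reference gene.
score = copy_number(target_conc=500.0, reference_conc=1000.0)
print(score, genotype(score))
```

Under these assumptions, a carrier of a heterozygous exon deletion would be expected to score close to 1 for the deleted exon and close to 2 for exons outside the deletion.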
The formation of SVs is a complex phenomenon that is not well understood. Long homologies around break points suggest SV formation by nonallelic homologous recombination (NAHR); short homologies, together with high mobile element content within SV regions, indicate that the SVs originated through transposable element insertions (TEI); and little or no homology suggests SV formation by nonhomologous end-joining (NHEJ) or by a template-switching mechanism during replication. We therefore examined the sequences proximal and distal (~2 kb) to the break points to determine their homology and the presence of repeated sequences, as these are known to affect genomic integrity through recombination involving insertions, deletions, and rearrangements (Abyzov, et al., 2015; Kaer & Speek, 2013; Lupski & Stankiewicz, 2005). First, pairwise sequence comparisons were carried out to determine the homology between the sequences flanking the deletions' break points. The sequences proximal and distal to each deletion's break points were aligned with the Clustal Omega software (https://www.ebi.ac.uk/Tools/msa/clustalo/). Data from the NIH Roadmap Epigenomics project (https://www.roadmapepigenomics.org/) were used to examine the deletions' break points for the presence of chromatin marks and repeated elements, such as retrotransposons, including long terminal repeat (LTR) and non-LTR retrotransposons (i.e., long interspersed elements (LINEs or L1) and short interspersed elements (SINEs)) (Kaer & Speek, 2013). Chromatin states inferred with a multivariate hidden Markov model (HMM; ChromHMM analysis) were also examined (Ernst, et al., 2011).
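The decision logic relating break point homology and mobile element content to a likely formation mechanism can be summarized in a short sketch. This is a simplified illustration of the criteria stated above, not the procedure used in the study; the class name, the field names, and both length thresholds (200 bp and 2 bp) are hypothetical values chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass
class BreakpointFlanks:
    homology_bp: int      # length of sequence shared by proximal and distal flanks
    mobile_elements: int  # LTR/LINE/SINE annotations found in the examined flanks


def suggest_mechanism(flanks: BreakpointFlanks,
                      long_homology_bp: int = 200,
                      short_homology_bp: int = 2) -> str:
    """Suggest an SV formation mechanism from flank features.

    Both thresholds are hypothetical placeholders, not published cut-offs.
    """
    if flanks.homology_bp >= long_homology_bp:
        return "NAHR"
    if flanks.homology_bp < short_homology_bp:
        return "NHEJ or template switching"
    if flanks.mobile_elements > 0:
        return "TEI"
    # Short homology without mobile elements is not covered by the criteria.
    return "unclassified"


# Hypothetical flank: 12 bp of shared sequence and three retrotransposon hits.
print(suggest_mechanism(BreakpointFlanks(homology_bp=12, mobile_elements=3)))
```

In practice, the homology length would come from the pairwise alignment of the flanking sequences and the mobile element count from repeat annotation of the same regions.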
Additionally, we identified two isolated PD cases carrying compound heterozygous PRKN deletions and characterized all five PRKN deletions. In the new cases, WGS identified 683 and 592 coding (including missense, nonsense, and frameshift) and splice site nucleotide variations in patient B_II-1 and patient C_II-1, respectively. Because both patients were born to consanguineous marriages, a recessive pattern of inheritance was suspected (Figure 1). However, no rare (frequency <0.5% for a recessive model) or novel homozygous or compound heterozygous coding variations were identified in the patients' genomes; all variations identified were present in the heterozygous state and were therefore not compatible with a recessive pattern of inheritance. All coding variations (known and novel) identified in the known PD genes were also examined, but no pathogenic mutation was identified. We then examined all SVs identified through WGS in the patients' genomes. Patient B_II-1 carried 50 SVs, while 56 different SVs were identified in patient C_II-1. Interestingly, we found that both patients carried two different heterozygous SVs at the PRKN locus. The PD patient from   Table 2). All deletions segregated with disease status, as only the patients were carriers of two mutant alleles, as expected for autosomal recessive inheritance (Figure 1a-c; Table 2). The nomenclature of each identified deletion was checked with the Mutalyzer program, which was also used to examine their effect on the protein. All deletions but one were predicted to cause premature stop codons (Table 2), thus resulting in truncated, nonfunctional proteins.
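The recessive-model filter described above (rare or novel variants, frequency <0.5%, present either in the homozygous state or as at least two heterozygous variants in the same gene) can be sketched as follows. This is an illustrative outline rather than the pipeline used in the study; the `Variant` class, its fields, and the example gene names are hypothetical, and the sketch does not verify that two heterozygous variants lie on different parental alleles, which would require segregation analysis.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Variant:
    gene: str
    zygosity: str                         # "hom" or "het"
    population_frequency: Optional[float]  # None means novel (not in databases)


def recessive_candidates(variants: List[Variant],
                         max_freq: float = 0.005) -> Dict[str, List[Variant]]:
    """Return genes with rare homozygous or >=2 rare heterozygous variants."""
    rare = [v for v in variants
            if v.population_frequency is None
            or v.population_frequency < max_freq]
    homs: Dict[str, List[Variant]] = {}
    hets: Dict[str, List[Variant]] = {}
    for v in rare:
        (homs if v.zygosity == "hom" else hets).setdefault(v.gene, []).append(v)
    candidates = {gene: list(vs) for gene, vs in homs.items()}
    for gene, vs in hets.items():
        if len(vs) >= 2:  # possible compound heterozygote; phase not checked
            candidates.setdefault(gene, []).extend(vs)
    return candidates


# Hypothetical input: single rare heterozygous variants in two genes.
calls = [Variant("GENE_A", "het", 0.001), Variant("GENE_B", "het", None)]
print(recessive_candidates(calls))  # no gene passes the recessive filter
```

An empty result, as in this hypothetical input, corresponds to the situation in which only isolated heterozygous coding variants are found and other variant classes, such as SVs, need to be examined.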
We found long terminal repeat (LTR) and non-LTR retrotransposons at both proximal and distal break point regions, along with short DNA homologies around the deletions' break points (the majority of them close to the deletions' break points or <1 kb away [Tables 2 and 3]), indicating that all these deletions might have originated through TEI (Lupski & Stankiewicz, 2005). All proximal and distal regions showed repressive marks, such as tri-methylation at H3K9 and H3K27 (H3K9me3, H3K27me3). Although the majority of genomic regions surrounding the deletions' break points also showed mono- and/or tri-methylation at H3K4 (H3K4me1, H3K4me3), marks associated with transcriptional activation, depletion of active marks was observed at some of the break points, as has been observed at other TEI break points (Abyzov, et al., 2015) (Table 3).
We conclude that WGS is the preferred technique for characterizing the copy number changes observed in the PRKN gene as well as in other parkinsonism genes, as it enables us to characterize the bases around their break points, which are thought to hold important information about their genesis (Kidd, et al., 2010). Because PRKN is the most frequently mutated gene in recessive PD and exon rearrangements are the most common mutations identified in it, the characterization of PRKN SVs is essential for understanding their formation mechanisms as well as for examining and interpreting their functional effects in model organisms. Taken together, precise mapping of deletion break points and localization of the repeated elements are important because they might reveal common disease signatures that could, in turn, lead to novel genomic editing strategies for gene therapy (Esposito, et al., 2017; Kaer & Speek, 2013; Lupski & Stankiewicz, 2005).

[Figure 1, partial caption: ... showing the deletions' break points (highlighted with a blue line). MIs observed in Pedigrees A and C are highlighted with a red rectangle. (c) CNV plots of PRKN exons 2, 3, 4, 5, and 6 obtained through the ddPCR QX100 system (Bio-Rad). Two exon copies (homozygous wt allele) are represented with a CNV score close to 2, one exon copy (heterozygous mutant allele) with a CNV score close to 1, and no copies with a CNV score of 0.]

ACKNOWLEDGMENT
The authors thank all patients and their relatives for their generous contribution to this study.