SNPs in SNCA, MCCC1, DLG2, GBF1 and MBNL2 are associated with Parkinson's disease in southern Chinese population

Abstract Numerous single nucleotide polymorphisms (SNPs) identified as susceptibility factors for Parkinson's disease (PD) in genome‐wide association studies have not been fully characterized in PD patients in China. This study aimed to replicate the relationship between 12 novel SNPs of 12 genes and PD risk in a southern Chinese population. Twelve SNPs of 12 genes were genotyped in 231 PD patients and 249 controls using the SNaPshot technique. Meta‐analysis was used to assess heterogeneity of effect sizes between this study and published data. The impact of SNPs on gene expression was investigated by analysing SNP‐gene associations in expression quantitative trait loci (eQTL) data sets. rs8180209 of SNCA (allele model: P = .047, OR = 0.77; additive model: P = .047, OR = 0.77), rs2270968 of MCCC1 (dominant model: P = .024, OR = 1.52), rs7479949 of DLG2 (recessive model: P = .019, OR = 1.52), rs10748818 of GBF1 (additive model: P < .001, OR = 0.37) and rs4771268 of MBNL2 (recessive model: P = .003, OR = 0.48) were significantly associated with PD risk. Notably, a meta‐analysis including previous studies suggested that the results for rs8180209, rs2270968, rs7479949 and rs4771268 were in line with those of our cohort. Our study replicated five novel functional SNPs in SNCA, MCCC1, DLG2, GBF1 and MBNL2 that could be associated with PD risk in the southern Chinese population.


| INTRODUCTION
Parkinson's disease (PD) is the second most common neurodegenerative disease and is diagnosed based on the motor signs of bradykinesia, rigidity and tremor. 1 Although effective symptomatic therapies exist for both motor and nonmotor manifestations of PD, no preventive or neuroprotective treatments are available, which means that the progressive decline in PD is inevitable. The pathogenesis of PD has been linked to a loss of dopaminergic neurons in the substantia nigra as well as pathologic α-synuclein aggregation. 2 However, the aetiology of sporadic PD is still unclear. Theories suggest that it might be caused by the combined influence of genetic and environmental risk factors, such as toxin and pesticide exposure. 3 A study of the heritability of PD risk, involving over 500 families, revealed that in up to 60% of idiopathic PD patients the phenotype could be explained by genetic factors. 4 The identification of patients with PD risk alleles may be helpful for early diagnosis, further paving the way for personalized medicine. 5 The first genome-wide association study (GWAS) confirmed the causal genes SNCA, PARK16, LRRK2 and BST1 as risk genes for PD. 6 Subsequent GWASs in increasingly larger patient-control cohorts, together with meta-analyses, not only confirmed candidate gene-based and former GWAS associations but also revealed additional risk genes such as RIT2, GCH1 and STK39. 7,8 Since many of the associated GWAS SNPs reside in noncoding regions and large numbers of individuals must be analysed, integrative analysis that combines DNA sequencing and gene expression would accelerate the identification and functional characterization of the biological variants and PD-related genes. 9 Recently, two GWASs of PD identified several new PD risk loci. 10,11 However, because allele frequencies differ among ethnicities, these findings are not directly applicable to the Chinese population.
As a result, the association of these novel candidate loci with PD in the southern Chinese population remains unclear. In this study, we aimed to explore the relationship between these newly characterized risk alleles and PD in a southern Chinese population in comparison with previous GWASs, and to further discuss the potential effect of the susceptibility loci on the expression of the respective genes by expression quantitative trait loci (eQTL) analysis.

| Study population
A total of 231 PD patients were recruited from September 2017 to July 2019 from the outpatient clinic of the Department of Neurology, Ruijin Hospital affiliated to Shanghai Jiaotong University School of Medicine. PD was diagnosed by movement disorder specialists using the MDS diagnostic criteria. 12 A total of 249 healthy control (HC) subjects were recruited and examined by movement disorder specialists to exclude any possibility of PD. Patients with a medical history of other neurodegenerative diseases and/or inflammatory, drug-induced, vascular or toxin-induced parkinsonism were excluded. Based on age at diagnosis, PD patients were divided into two groups: a late-onset PD (LOPD) group and an early-onset PD (EOPD) group. All patients first diagnosed with PD at more than 45 years of age were placed in the LOPD group, 13 and the remaining patients were placed in the EOPD group. Mild PD was defined as a Hoehn-Yahr stage below 2.5 on assessment. 14 Patients with relatives who had PD (within the last three generations) were regarded as familial PD patients. This study was approved by the Ethics Committee of the Ruijin Hospital affiliated to the Shanghai Jiaotong University School of Medicine, and all participants provided written informed consent.
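The subgroup rules described above can be sketched as two small functions. This is only an illustration of the stated thresholds (age at diagnosis above 45 years for LOPD, Hoehn-Yahr stage below 2.5 for mild PD); the function names are hypothetical and do not come from the study.

```python
def onset_group(age_at_diagnosis: float) -> str:
    """Return 'LOPD' if first diagnosed at more than 45 years, otherwise 'EOPD'."""
    return "LOPD" if age_at_diagnosis > 45 else "EOPD"


def is_mild_pd(hoehn_yahr_stage: float) -> bool:
    """Mild PD is defined in the text as a Hoehn-Yahr stage below 2.5."""
    return hoehn_yahr_stage < 2.5
```

Under these rules a patient diagnosed at exactly 45 years falls in the EOPD group, because the text specifies "more than 45 years" for LOPD.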

| DNA extraction and genotype analysis
Two millilitres of venous blood were collected in EDTA anti-coagulation tubes from PD patients and healthy controls. The phenol-chloroform-isopropyl alcohol method was used to extract genomic DNA. Polymerase chain reaction and extension primers were designed using Primer5 software (version 5.0; PREMIER Biosoft International).

| Expression quantitative trait loci analysis
The potential functional impact of validated SNPs on gene expression was evaluated by analysing gene-SNP associations in two eQTL databases: the Braineac eQTL data set and the GTEx (Genotype Tissue Expression Project) database. The Braineac eQTL data set, a public database developed by the UK Brain Expression Consortium (UKBEC), integrates genotypes and gene expression data from 134 human brain samples across 10 brain regions, 15 while the GTEx database integrates genotypes and gene expression data of various tissues from 544 donors with different pathological diseases. 16

| Construction of luciferase reporter gene vectors and dual-luciferase reporter assays
The GBF1 promoter fragment containing the A or G allele at rs10748818 was amplified from the genomic DNA of HC subjects for cloning, using a forward primer containing a BglII site and a reverse primer containing a HindIII site (forward: 5′-GAAGATCTACTGCTCTAGTCCTGTGGGT-3′ and reverse: 5′-CCCAAGCTTCATTGCAACCCTGAGATACCCC-3′).
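As a quick sanity check on the primer design described above, the following sketch locates the restriction sites in the two primers. The primer sequences are copied from the text, and the recognition sequences (AGATCT for BglII, AAGCTT for HindIII) are the standard ones for these enzymes. The helper name `site_position` is hypothetical.

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"BglII": "AGATCT", "HindIII": "AAGCTT"}

# Primer sequences copied from the text (5'->3').
FORWARD = "GAAGATCTACTGCTCTAGTCCTGTGGGT"
REVERSE = "CCCAAGCTTCATTGCAACCCTGAGATACCCC"


def site_position(primer: str, enzyme: str) -> int:
    """Return the 0-based index of the enzyme's site in the primer, or -1 if absent."""
    return primer.upper().find(SITES[enzyme])


forward_pos = site_position(FORWARD, "BglII")    # site follows a 2-nt 5' overhang
reverse_pos = site_position(REVERSE, "HindIII")  # site follows a 3-nt 5' overhang
```

Both sites sit near the 5′ end of their primer, preceded by a few extra bases, which is the usual layout when a restriction site is appended to a primer for cloning.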
Jurkat cells (human T lymphocyte cells) and SH-SY5Y cells (human neuroblastoma cells) were plated into 24-well culture plates 24 hours prior to transfection. Then, 490 ng of polymorphism plasmid or pGL3-basic empty plasmid (as a negative control) was transfected using Lipofectamine 3000 (Invitrogen), with 10 ng of Renilla pRL-TK plasmid (Promega) co-transfected as a normalizing control. After 24 hours, cells were rinsed with PBS and harvested with Passive Lysis buffer (Promega). Transcriptional activity was determined using the Dual-Luciferase Reporter Assay System (Promega) on a Synergy H4 Hybrid Microplate Reader (BioTek). For each plasmid construct, four independent transfection experiments were carried out and readings were taken in duplicate. Transcriptional activities were reported as relative luciferase activities, defined as the ratio of firefly luciferase activity to Renilla luciferase activity.
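The normalization described above can be sketched as follows. The text states only that relative activity is the firefly/Renilla ratio and that readings were taken in duplicate; averaging the per-reading ratios within one transfection is an assumption made here for illustration, and the function name and input values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_luciferase_activity(firefly: Sequence[float],
                                 renilla: Sequence[float]) -> float:
    """Mean firefly/Renilla ratio over the replicate readings of one transfection.

    Assumption: each firefly reading is paired with the Renilla reading
    from the same well, and the duplicate ratios are averaged.
    """
    if len(firefly) != len(renilla) or len(firefly) == 0:
        raise ValueError("need equally many firefly and Renilla readings")
    if any(r <= 0 for r in renilla):
        raise ValueError("Renilla readings must be positive")
    return mean(f / r for f, r in zip(firefly, renilla))


# Hypothetical duplicate readings from a single transfection.
example_ratio = relative_luciferase_activity([300.0, 500.0], [100.0, 100.0])
```

Dividing by the co-transfected Renilla signal corrects for well-to-well differences in transfection efficiency and cell number, so that constructs can be compared on the same scale.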

| Statistical analyses
Data were analysed using SPSS software version 25.0 (SPSS Inc). A t test was used to compare age between PD patients and controls. A chi-square test was used to compare sex proportions and allele and genotype frequencies, and to test for Hardy-Weinberg equilibrium (HWE). Logistic regression analysis was used to estimate the risk associated with each SNP in dominant, recessive and additive models after adjusting for age and gender. The genetic power of each SNP was calculated using Power and Sample Size software (version 3.1.6). 17 Multiple testing was corrected using the Bonferroni method. Meta-analysis was performed using Review Manager 5.2 for Windows. Linkage disequilibrium (LD) analysis was performed on the SHEsis platform. 18 The Q statistic and I² were used to assess heterogeneity. Statistical significance was taken as two-sided P < .05.
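To make the three genetic models concrete, the sketch below shows one common way of coding a genotype as a predictor for logistic regression. The coding convention (counting copies of the minor allele) is an assumption, since the text does not state how genotypes were coded in SPSS, and the function name is hypothetical.

```python
def encode_genotype(minor_allele_count: int, model: str) -> int:
    """Code a genotype (0, 1 or 2 copies of the minor allele) under a genetic model.

    additive : number of minor alleles (0, 1, 2)
    dominant : 1 if at least one minor allele is carried, else 0
    recessive: 1 only if two minor alleles are carried, else 0
    """
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if model == "additive":
        return minor_allele_count
    if model == "dominant":
        return int(minor_allele_count >= 1)
    if model == "recessive":
        return int(minor_allele_count == 2)
    raise ValueError(f"unknown model: {model}")
```

With this coding, a heterozygote counts as exposed under the dominant model but as unexposed under the recessive model, which is why the same SNP can reach significance in one model and not in another.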

| Demographic and clinical characteristics of the study population
The demographic and clinical characteristics of 231 PD patients and 249 HC subjects are shown in Table 1.

| Analysis of genotypic and allele frequency in PD
For all the SNPs, genotype distributions were in HWE (Table S2).
The minor allele frequencies and genotype frequencies of all these SNPs are listed in Table S2, and the SNPs found to be significantly associated with PD are listed in Table 2 (Table S3).
Because two pairs of SNPs among the twelve detected polymorphisms were located on the same chromosome, LD analysis was performed to explore whether they were in linkage disequilibrium.
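The Hardy-Weinberg check mentioned in this section can be illustrated with a minimal chi-square goodness-of-fit sketch for a biallelic SNP. It uses one degree of freedom and no continuity correction, which are assumptions; the study ran its tests in SPSS and does not describe the exact procedure. The genotype counts in the example are hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test of Hardy-Weinberg equilibrium for genotype counts AA, AB, BB.

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    if p in (0.0, 1.0):
        raise ValueError("SNP is monomorphic; HWE test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts that deviate from equilibrium (expected 25/50/25).
chi2_example, p_example = hwe_chi_square(30, 40, 30)
```

A P value above the chosen threshold is read as no evidence of departure from equilibrium, which is the condition reported for all SNPs in Table S2.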

| Genotype-phenotype analysis in LOPD and EOPD
All PD patients were divided into two subgroups: LOPD and EOPD.
There were no discrepancies in gender between EOPD and LOPD patients (P = .580). For the LOPD group, the dominant models of rs2270968 of MCCC1 (P = .042, OR = 1.48) and rs9261484 of […] of LOPD (Table 3; Table S4). rs4771268 of MBNL2 was found to be associated with PD in both the recessive model and the additive […]
In the analysis between EOPD and HC, rs10748818 of GBF1 under the additive model was found to be associated with EOPD after adjustment for age and gender (P = .011, OR = 0.20) (Table 4; […]).

| Meta-analysis
To further verify the results, we performed a meta-analysis for these loci based on our data and two other available GWASs. 10,11 The meta-analyses identified rs8180209 of SNCA (P < .001, OR = 0.75; Figure 1A) and rs4771268 of MBNL2 (P < .001, OR = 0.90; Figure 2B), which is in accordance with the results from our cohort.
Substantial heterogeneity was found for rs10748818 of GBF1 across these studies (I² = 64%, P = .092; Figure 2A), mainly attributable to the different ORs in the Asian population in our study. In contrast, rs2270968 of MCCC1 (P < .001, OR = 1.10; Figure 1B) and rs7479949 of DLG2 (P < .001, OR = 0.89; Figure 1C), which were not significantly associated with PD in the allele model in our cohort, turned out to be significant alleles for PD following the meta-analysis. For rs2270968 and rs7479949, the association was in the same direction as in the reference studies, with a magnitude of risk similar to or greater than previously reported.
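The heterogeneity statistics quoted above can be sketched as follows: a pooled OR is computed by inverse-variance weighting of the per-study log ORs, Cochran's Q measures the weighted spread around that pooled value, and I² expresses the share of Q beyond its degrees of freedom. A fixed-effect model is assumed here for illustration; the text does not state which model was selected in Review Manager, and the function name and inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pooled_or_and_heterogeneity(odds_ratios: Sequence[float],
                                standard_errors: Sequence[float]
                                ) -> Tuple[float, float, float]:
    """Return (pooled OR, Cochran's Q, I^2 in percent).

    standard_errors are the standard errors of the log odds ratios.
    Assumption: fixed-effect, inverse-variance pooling.
    """
    if len(odds_ratios) != len(standard_errors) or len(odds_ratios) < 2:
        raise ValueError("need at least two studies with matching inputs")
    log_ors = [math.log(o) for o in odds_ratios]
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled_log = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (y - pooled_log) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I^2 is floored at zero when Q does not exceed its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return math.exp(pooled_log), q, i2
```

An I² near zero indicates that the studies agree about as well as sampling error alone would predict, whereas larger values suggest that the underlying effects differ between cohorts.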

| Functional prediction and validation
To better understand the influence of relevant loci on the onset of PD, we selected 7 candidate loci based on the above results to explore the association between genotype and gene expression by eQTL analysis. Except for rs9261484 of TRIM40, all the other 6 SNPs were shown to alter the expression of the relevant gene in specific brain regions (Figure S1; Table S6). The A allele of rs8180209 decreased SNCA expression in white matter (P = .017) (Figure 1A). The G allele of rs2270968 increased MCCC1 expression in various brain regions, including frontal cortex (P = .038), hippocampus (P = .010), putamen (P = .007), substantia nigra (P = .003) and thalamus (P = .042) (Figure 1B). The C allele of rs7479949 elevated DLG2 expression in the occipital cortex (P = .038) and the temporal cortex (P = .048) (Figure 1C).
The G allele of rs10748818 increased GBF1 expression in the frontal cortex (P = .026) (Figure 1D). The T allele of rs4771268 attenuated MBNL2 expression in white matter (P = .009) (Figure 1D). Finally, the T allele of rs12528068 increased RIMS1 expression in the hippocampus (P = .044) (Figure 1E).
Moreover, in the GTEx database, sQTLs (splicing QTLs) showed that rs8180209 of SNCA altered the splicing or alternative splicing of the intron to modulate the expression of SNCA in the cortex and nerve (Figure S2; Table S7). The GTEx eQTL results also agreed with the Braineac analysis in showing that the G allele of rs2270968 enhanced MCCC1 expression in various brain regions (Figure S3; Table S7).
rs10748818 of GBF1 was the only one of the five SNPs located in a gene promoter region. Hence, we performed a dual-luciferase reporter gene assay to test whether this variation altered GBF1 promoter transcriptional activity. However, allele alteration of rs10748818 had no effect on GBF1 promoter transcriptional activity in Jurkat or SH-SY5Y cells (Figure S4).

| DISCUSSION
In our study, we replicated that the following SNPs were associated […] The rs3758549 locus is localized in the promoter region of both GBF1 and PITX3. For PITX3, rs3758549 is reported to be significantly associated with the risk of PD in the Asian population. 36 For GBF1, however, its association with PD has not been studied in detail. GBF1 has been shown to modulate the rate of anterograde trafficking to control protein secretion and its carrier organelle. 37 Axonal anterograde transport was impaired in the MPTP model, and anterograde axonal transport of glial cell line-derived neurotrophic factor (GDNF) was shown to be adversely affected in the 6-OHDA model. 38 Accumulation of misfolded/unfolded α-synuclein in the endoplasmic reticulum (ER) and disruptions in protein clearance mechanisms cause activation of ER stress mechanisms, which can be observed in post-mortem tissue from sporadic human PD brains and in many animal models of PD. 39 GBF1 has also been shown to modulate the ER-Golgi response to the external environment. 40 All these studies suggest an association of GBF1 with PD, but more detailed investigation is required. Additionally, rs10748818 failed to affect GBF1 promoter transcriptional activity […]

CONFLICT OF INTEREST
None.

AUTHOR CONTRIBUTIONS
JL designed the study, provided financial support and revised the manuscript. WK revised the manuscript. AZ, YL, MN, GL, NL and LZ collected the data. AZ and YL carried out the genetic analyses and performed data analysis. AZ wrote the manuscript. All the co-authors contributed to revising the manuscript for intellectual content and approved the final version for publication.

DATA AVAILABILITY STATEMENT
All data relevant to the study are included in the article or uploaded as supplementary information.