Regulatory Variant rs2535629 in ITIH3 Intron Confers Schizophrenia Risk By Regulating CTCF Binding and SFMBT1 Expression

Abstract Genome-wide association studies have identified 3p21.1 as a robust risk locus for schizophrenia. However, the underlying molecular mechanisms remain elusive. Here, a functional regulatory variant (rs2535629) that disrupts CTCF binding at 3p21.1 is identified. rs2535629 is confirmed to be significantly associated with schizophrenia in a Chinese population, and its regulatory effect is validated. Expression quantitative trait loci (eQTL) analysis indicates that rs2535629 is associated with the expression of three distal genes (GLT8D1, SFMBT1, and NEK4) in the human brain, and CRISPR-Cas9-mediated genome editing confirms the regulatory effect of rs2535629 on GLT8D1, SFMBT1, and NEK4. Interestingly, differential expression analysis of GLT8D1, SFMBT1, and NEK4 suggests that rs2535629 may confer schizophrenia risk by regulating SFMBT1 expression. It is further demonstrated that Sfmbt1 regulates neurodevelopment and dendritic spine density, two processes whose disruption is a key pathological characteristic of schizophrenia. Transcriptome analysis also supports a potential role of Sfmbt1 in schizophrenia pathogenesis. This study identifies rs2535629 as a plausible causal regulatory variant at the 3p21.1 risk locus and elucidates its regulatory mechanism and biological effect, indicating that this variant confers schizophrenia risk by altering CTCF binding and regulating the expression of SFMBT1, a distal gene that plays important roles in neurodevelopment and synaptic morphogenesis.

Detailed information about the reagents is provided in Supplementary Table 1.

Genotyping
In brief, the genomic sequence surrounding rs2535629 was amplified by PCR with specific primers (Supplementary Table 2). The PCR products were then treated with shrimp alkaline phosphatase (SAP) and exonuclease I (ExoI) to remove unincorporated dNTPs and primers. The purified products were used as templates for single-base extension genotyping of rs2535629 with the SNaPshot Multiplex reaction mix, genotyping primers (Supplementary Table 3), and ddNTPs. Genotyping data were analyzed and manually checked using GeneMapper 4.0 software.
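In single-base extension genotyping, the genotyping primer ends one base before the SNP, and the dye-labeled ddNTP that is incorporated reports the allele. As a minimal sketch of how two allele-specific peak heights could be turned into a genotype call, the Python below classifies a sample by its allele fraction. The thresholds (`MIN_SIGNAL`, `HOM_FRACTION`, `HET_RANGE`) and the allele labels are hypothetical placeholders, not parameters from this study; GeneMapper applies its own calling rules, and real data usually need dye-specific peak-height normalization.

```python
"""Sketch: turning two allele-specific peak heights into a genotype call.

All thresholds are hypothetical placeholders, not parameters from the study.
"""
from typing import Optional

MIN_SIGNAL = 500.0      # hypothetical minimum summed peak height (RFU)
HOM_FRACTION = 0.9      # allele fraction at or above which a homozygote is called
HET_RANGE = (0.3, 0.7)  # allele-1 fraction range accepted as heterozygous


def call_genotype(peak1: float, peak2: float, allele1: str, allele2: str) -> Optional[str]:
    """Return a genotype string such as 'CT', or None for a no-call."""
    total = peak1 + peak2
    if total < MIN_SIGNAL:
        return None  # too little signal; flag for manual review
    frac1 = peak1 / total
    if frac1 >= HOM_FRACTION:
        return allele1 + allele1
    if frac1 <= 1 - HOM_FRACTION:
        return allele2 + allele2
    low, high = HET_RANGE
    if low <= frac1 <= high:
        return allele1 + allele2
    return None  # ambiguous ratio; flag for manual review


if __name__ == "__main__":
    # Placeholder alleles "C"/"T"; the actual rs2535629 alleles are not assumed here.
    for peaks in [(3000, 50), (1500, 1400), (50, 3000), (2000, 700), (100, 50)]:
        print(peaks, call_genotype(*peaks, "C", "T"))
```

Returning `None` for low-signal or ambiguous samples mirrors the manual check described above: such samples are set aside for inspection rather than forced into a genotype class.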

Cell culture
HEK-293T and U251 cells were cultured in high-glucose Dulbecco's Modified Eagle's Medium (DMEM) containing 10% FBS and 1× penicillin-streptomycin (50 units/mL penicillin and 50 µg/mL streptomycin). SH-SY5Y and SK-N-SH cells were cultured in high-glucose DMEM supplemented with 10% FBS, 10 mM sodium pyruvate, 1× Minimum Essential Medium (MEM) non-essential amino acids, and penicillin-streptomycin. Cells were maintained at 37 °C with 5% CO2, and no mycoplasma contamination was detected in this study (Supplementary Table 4).
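The supplement concentrations above translate into volumes through C1·V1 = C2·V2. The Python sketch below shows that arithmetic for a batch of complete medium; the 500 mL batch size and the 100× penicillin-streptomycin stock are illustrative assumptions, not details from the protocol.

```python
"""Sketch: C1 * V1 = C2 * V2 arithmetic for preparing complete medium."""


def stock_volume(stock_conc: float, final_conc: float, final_volume: float) -> float:
    """Volume of stock to add so that the final mix reaches final_conc.

    stock_conc and final_conc must share units (e.g. % v/v, or fold such as
    100x -> 1x); the result has the same units as final_volume.
    """
    if stock_conc <= 0 or not 0 <= final_conc <= stock_conc:
        raise ValueError("need stock_conc > 0 and 0 <= final_conc <= stock_conc")
    return final_conc * final_volume / stock_conc


if __name__ == "__main__":
    total_ml = 500.0                                     # assumed batch size
    fbs_ml = stock_volume(100.0, 10.0, total_ml)         # 100% FBS -> 10%
    pen_strep_ml = stock_volume(100.0, 1.0, total_ml)    # assumed 100x stock -> 1x
    dmem_ml = total_ml - fbs_ml - pen_strep_ml           # basal DMEM makes up the rest
    print(f"FBS {fbs_ml} mL, pen-strep {pen_strep_ml} mL, DMEM {dmem_ml} mL")
```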

Schizophrenia cases and controls
All cases were diagnosed according to DSM-IV criteria using the Structured Clinical Interview for DSM-IV (SCID) [1]. Detailed information on the onset of schizophrenia, course duration, symptoms, family history of psychiatric illness, and medical history was evaluated by at least two independent experienced psychiatrists to reach a consensus DSM-IV diagnosis. Subjects with a history of alcoholism, epilepsy, neurological disorders, or drug abuse were excluded from the study.
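The statistical test used for the case-control comparison is not specified in this section. A common choice for a single biallelic SNP is the allelic 2 × 2 chi-square test, sketched below in Python under that assumption (Pearson statistic without continuity correction). The genotype counts in the example are hypothetical and are not data from the study.

```python
"""Sketch: allelic (2 x 2) chi-square test for a biallelic SNP in a case-control sample.

Genotype counts are ordered (homozygous allele 1, heterozygous, homozygous allele 2).
All numbers in the example are hypothetical, not data from the study.
"""
import math


def allele_counts(hom1: int, het: int, hom2: int) -> tuple[int, int]:
    """Each person carries two alleles, so allele-1 count = 2 * hom1 + het."""
    return 2 * hom1 + het, 2 * hom2 + het


def allelic_chi_square(cases: tuple[int, int, int], controls: tuple[int, int, int]):
    """Return (chi-square, p-value, allele-1 odds ratio); Pearson, no continuity correction."""
    a, b = allele_counts(*cases)     # case allele 1, case allele 2
    c, d = allele_counts(*controls)  # control allele 1, control allele 2
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, chi2 = z**2, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = math.inf if b * c == 0 else (a * d) / (b * c)
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    print(allelic_chi_square((10, 10, 0), (0, 10, 10)))  # hypothetical counts
```

The closed-form p-value avoids a SciPy dependency; a library routine such as `scipy.stats.chi2_contingency` would be the usual alternative in practice.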

Dual luciferase reporter gene assays
The human DNA sequence containing rs2535629 was amplified with specific primers (Supplementary Table 5). The amplified products were digested with the restriction enzymes KpnI and XhoI and then ligated into the pGL3-promoter vector. The ligation products were transformed into DH5α competent cells and selected on LB agar plates containing ampicillin. After 18 hours of incubation, single colonies were picked for expansion culture. The sequence of the inserted DNA was verified by Sanger sequencing. PCR-based site-directed mutagenesis was used to generate DNA fragments containing the alternative allele of the SNP (Supplementary Table 5).
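Because the insert is cloned directionally with KpnI and XhoI, an internal KpnI or XhoI site in either allele's sequence would cut the insert during digestion, and a single base change can create such a site. The Python sketch below scans both allele versions of an insert for these sites; the sequences are toy examples, not the rs2535629 amplicon, and the helper names are illustrative.

```python
"""Sketch: checking a reporter insert (both alleles) for internal KpnI/XhoI sites.

Sequences below are toy examples, not the rs2535629 amplicon.
"""
SITES = {"KpnI": "GGTACC", "XhoI": "CTCGAG"}  # both palindromic: one strand suffices


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    seq = seq.upper()
    hits, start = [], seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def swap_allele(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with the base at 0-based pos changed from ref to alt."""
    if seq[pos].upper() != ref.upper():
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


def internal_sites(insert: str) -> dict[str, list[int]]:
    return {name: find_sites(insert, site) for name, site in SITES.items()}


if __name__ == "__main__":
    ref_insert = "TTCTCAAGTT"                          # toy insert
    alt_insert = swap_allele(ref_insert, 5, "A", "G")  # toy SNP
    print(internal_sites(ref_insert))  # no internal sites
    print(internal_sites(alt_insert))  # the alternative allele creates an XhoI site
```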
Luciferase reporter gene assays were performed in HEK-293T, SH-SY5Y, SK-N-SH, and U251 cells. For HEK-293T cells, 4 × 10⁴ cells were seeded into 96-well plates. After 24 hours of culture, the recombinant plasmid (100 ng) and the internal control plasmid pRL-TK (Renilla luciferase; 20 ng) were co-transfected into the cells using Lipofectamine 3000. For SK-N-SH, SH-SY5Y, and U251 cells, 6 × 10⁴ cells were plated into 96-well plates. After 24 hours of culture, the recombinant plasmid (150 ng) and pRL-TK (50 ng) were co-transfected into the cells using Lipofectamine 3000. At 48 hours post-transfection, luciferase activity was measured with a Luminoskan Ascent instrument (Thermo Scientific) using the dual-luciferase reporter assay system, according to the manufacturer's instructions.
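Co-transfecting pRL-TK allows each well's firefly signal to be divided by its Renilla signal, correcting for differences in transfection efficiency and cell number. The exact normalization used in the study is not described here, so the Python below is a sketch under the assumption that ratios are expressed relative to control wells (for example, the empty pGL3-promoter vector). All readings are hypothetical.

```python
"""Sketch: normalizing dual-luciferase readings (firefly / Renilla).

Readings below are hypothetical; the study's normalization details are not stated here.
"""
from statistics import mean


def ratios(firefly: list[float], renilla: list[float]) -> list[float]:
    """Per-well firefly/Renilla ratio corrects for transfection efficiency and cell number."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired by well")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_over_control(firefly: list[float], renilla: list[float],
                      ctrl_firefly: list[float], ctrl_renilla: list[float]) -> list[float]:
    """Each well's ratio relative to the mean ratio of the control wells."""
    ctrl_mean = mean(ratios(ctrl_firefly, ctrl_renilla))
    return [r / ctrl_mean for r in ratios(firefly, renilla)]


if __name__ == "__main__":
    empty = ([100.0, 120.0, 110.0], [100.0, 100.0, 100.0])  # hypothetical control wells
    allele_1 = ([400.0, 440.0, 420.0], [100.0, 100.0, 100.0])
    allele_2 = ([220.0, 200.0, 240.0], [100.0, 100.0, 100.0])
    for name, (ff, rl) in {"allele 1": allele_1, "allele 2": allele_2}.items():
        folds = fold_over_control(ff, rl, *empty)
        print(name, round(mean(folds), 2))
```

The per-allele fold values could then be compared with a two-sample test (for example `scipy.stats.ttest_ind`); which test the study applied is not stated in this section.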

CRISPR-Cas9-mediated genome editing
To knock out the genomic sequence containing rs2535629, we designed two sgRNAs (located upstream and downstream of rs2535629, respectively) and cloned them into the PX-459 vector.
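Deleting the SNP-containing segment requires one SpCas9 cut on each side of the SNP. SpCas9 recognizes an NGG PAM immediately 3' of a 20-nt protospacer and cuts about 3 bp upstream of the PAM. The Python sketch below lists candidate guides on both strands and splits them by which side of the SNP they cut. It uses a toy sequence and an arbitrary distance window; it is not the design procedure used for the sgRNAs in this study, which would also weigh on-target and off-target scores.

```python
"""Sketch: listing SpCas9 (NGG PAM) guides whose cut sites flank a SNP.

Toy sequences and distances only; these are not the sgRNAs used in the study.
"""
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_guides(seq: str, length: int = 20) -> list[tuple[str, str, int]]:
    """Return (strand, 5'->3' protospacer, cut index) for every NGG PAM.

    cut index = 0-based + strand position of the first base after the cut,
    which SpCas9 makes 3 bp upstream (5') of the PAM.
    """
    seq = seq.upper()
    guides = []
    # + strand: protospacer immediately followed by NGG (lookahead allows overlaps).
    for m in re.finditer(rf"(?=([ACGT]{{{length}}})[ACGT]GG)", seq):
        guides.append(("+", m.group(1), m.start() + length - 3))
    # - strand: CCN on the + strand, followed by the protospacer's reverse complement.
    for m in re.finditer(rf"(?=CC[ACGT]([ACGT]{{{length}}}))", seq):
        guides.append(("-", revcomp(m.group(1)), m.start() + 6))
    return guides


def split_by_side(guides, snp_index: int, max_dist: int = 100):
    """Guides cutting upstream (before the SNP base) and downstream, within max_dist bp."""
    upstream = [g for g in guides if 0 <= snp_index - g[2] < max_dist]
    downstream = [g for g in guides if 0 < g[2] - snp_index <= max_dist]
    return upstream, downstream


if __name__ == "__main__":
    toy = "A" * 20 + "TGG" + "T" * 30 + "CCA" + "T" * 20  # toy sequence
    up, down = split_by_side(find_guides(toy), snp_index=40)
    print("upstream:", up)
    print("downstream:", down)
```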
SH-SY5Y cells were seeded into 6-well plates at a density of 1 × 10⁶ cells/well. After 24 hours of culture, the cells were transfected with the constructed plasmids using Lipofectamine 3000. Transfected cells were selected with puromycin (1 μg/mL). Genomic DNA was extracted by the phenol/chloroform method and used as the template to validate knockout efficiency. Total RNA was extracted with TRIzol reagent and reverse-transcribed using a Takara reverse transcription kit according to the manufacturer's instructions. Gene expression was quantified using a SYBR kit.
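The section does not state how SYBR qPCR data were converted into relative expression. A widely used approach is the 2^-ΔΔCt method, which assumes roughly 100% amplification efficiency for both the target and the reference gene. The Python below sketches that calculation under this assumption; the Ct values and the choice of reference gene are hypothetical.

```python
"""Sketch: relative expression by the 2^-ddCt method from SYBR qPCR Ct values.

Assumes ~100% amplification efficiency for target and reference genes;
the Ct values below are hypothetical.
"""
from statistics import mean


def relative_expression(target_test: list[float], ref_test: list[float],
                        target_ctrl: list[float], ref_ctrl: list[float],
                        base: float = 2.0) -> float:
    """Fold change of the target in test vs control cells, normalized to a reference gene.

    Each argument is a list of replicate Ct values.
    """
    d_ct_test = mean(target_test) - mean(ref_test)  # dCt in edited cells
    d_ct_ctrl = mean(target_ctrl) - mean(ref_ctrl)  # dCt in control cells
    dd_ct = d_ct_test - d_ct_ctrl
    return base ** (-dd_ct)


if __name__ == "__main__":
    fold = relative_expression([26.1, 26.0, 25.9], [18.0, 18.1, 17.9],
                               [25.0, 25.1, 24.9], [18.0, 18.0, 18.0])
    print(f"fold change vs control: {fold:.2f}")
```

The same fold change also expresses knockdown efficiency as (1 − fold) × 100%.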

Knockdown assays
The synthesized shRNA oligonucleotides were annealed and ligated into the pLKO.1-EGFP-Puro vector. Lentiviruses were packaged in HEK-293T cells cultured in 10-cm dishes: the constructed shRNA plasmids and the packaging plasmids were co-transfected into HEK-293T cells at approximately 70% confluence. After 48 hours of culture, the lentiviruses were collected and used to infect the target cells. At 48 hours post-infection, 2 μg of puromycin was added to each well for 7 days to eliminate uninfected cells. Total RNA was extracted with TRIzol, reverse transcription was performed using the Takara reverse transcription kit, and gene expression was quantified using the SYBR kit.
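The shRNA sequences themselves are given in Supplementary Table 8. As an illustration of how annealable hairpin oligos are typically laid out for AgeI/EcoRI cloning, the Python sketch below builds a top and bottom strand from a 21-nt sense target. It assumes the layout of the standard Addgene pLKO.1 protocol (5' CCGG overhang, CTCGAG loop, TTTTT terminator, 5' AATT overhang); whether pLKO.1-EGFP-Puro uses the same sites should be checked, and the target sequence is a placeholder, not a CTCF or Sfmbt1 shRNA.

```python
"""Sketch: annealable shRNA oligos for AgeI/EcoRI cloning into a pLKO.1-style vector.

Assumes the standard Addgene pLKO.1 layout; the target below is a placeholder.
"""
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def shrna_oligos(sense: str, loop: str = "CTCGAG") -> tuple[str, str]:
    """Return (top, bottom) oligos, each written 5'->3'."""
    sense = sense.upper()
    if not sense or set(sense) - set("ACGT"):
        raise ValueError("sense sequence must contain only A, C, G, T")
    antisense = revcomp(sense)
    # Top strand: AgeI-compatible CCGG overhang, hairpin, TTTTT terminator, then a G
    # that recreates GAATTC together with the vector's EcoRI overhang.
    top = "CCGG" + sense + loop + antisense + "TTTTTG"
    # Bottom strand: EcoRI-compatible AATT overhang + reverse complement of the duplex part.
    bottom = "AATT" + revcomp(top[4:])
    return top, bottom


if __name__ == "__main__":
    top, bottom = shrna_oligos("GCATGCATGCATGCATGCATG")  # placeholder 21-nt target
    print("top:   ", top)
    print("bottom:", bottom)
```

Deriving the bottom strand as the reverse complement of the top strand (minus its overhang) keeps the two oligos fully complementary for any loop sequence.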

Assays of dendritic spine density
Two software packages, ImageJ [2] (https://imagej.nih.gov/ij/) and NeuronStudio [3] (https://m.vk.com/neuron studio), were used jointly for neuronal morphological analysis. Briefly, neurons co-labeled with green fluorescence (Venus protein, used to visualize spine morphology) and red fluorescence (mCherry, indicating that the knockdown vector had been transfected into the target neuron) were selected for double-blind morphological analysis. ImageJ was used to convert the confocal images (.czi) acquired on a ZEISS laser scanning confocal microscope into 8-bit .tiff files, which can be opened in NeuronStudio. NeuronStudio (a toolkit for neurite and spine analysis) was then used to trace the secondary and tertiary dendrites (total length of 100 μm per neuron) of target neurons and to identify spine types automatically. The automatically assigned spine types were confirmed by manual correction based on the 3D structure of the neurons and the following criteria [4,5]: mushroom, head-to-neck diameter ratio > 1.5; stubby, head-to-neck diameter ratio < 1.5 and length-to-neck diameter ratio < 2; thin, head-to-neck diameter ratio < 1.5 and length-to-neck diameter ratio > 2.
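The spine-type criteria above form a simple decision rule on two ratios. The Python sketch below applies them to individual spine measurements and computes density over the traced dendrite length. The criteria do not say how a ratio of exactly 1.5 or exactly 2 is classified; here 1.5 is treated as non-mushroom and 2 as stubby, which is an assumption. The measurements are hypothetical.

```python
"""Sketch: applying the stated spine-type criteria and computing spine density.

Boundary handling (head/neck ratio exactly 1.5 -> not mushroom; length/neck ratio
exactly 2 -> stubby) is an assumption. Measurements are in micrometers.
"""


def classify_spine(head_diameter: float, neck_diameter: float, length: float) -> str:
    if neck_diameter <= 0:
        raise ValueError("neck diameter must be positive")
    head_to_neck = head_diameter / neck_diameter
    length_to_neck = length / neck_diameter
    if head_to_neck > 1.5:
        return "mushroom"
    if length_to_neck > 2:
        return "thin"
    return "stubby"


def spine_density(n_spines: int, dendrite_length_um: float, per_um: float = 10.0) -> float:
    """Spines per `per_um` micrometers of traced dendrite (e.g. per 10 um)."""
    return n_spines * per_um / dendrite_length_um


if __name__ == "__main__":
    spines = [(0.8, 0.4, 1.0), (0.5, 0.4, 1.2), (0.5, 0.4, 0.6)]  # hypothetical
    types = [classify_spine(*s) for s in spines]
    print(types, spine_density(len(spines), 100.0))  # 100 um traced per neuron
```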

Supplementary Table 8. Sequences of shRNAs used to knock down CTCF and Sfmbt1
Figure 1. Prioritization and identification of regulatory SNPs at schizophrenia risk loci. Risk SNPs from three large-scale GWASs [6][7][8] were subjected to functional genomics analyses. Chromatin immunoprecipitation sequencing (ChIP-Seq) experiments performed in brain tissues or neuronal cell lines (such as neuroblastoma cells) were used to derive the binding motifs (i.e., specific DNA sequences) of the corresponding transcription factors, represented as position weight matrices (PWMs). The ChIP-Seq-derived PWMs were then compared with the corresponding PWMs in a PWM database, and the best-matching PWM was used to investigate whether the risk SNPs were located within the binding motifs and whether the different alleles of each SNP disrupted (or affected) transcription factor binding. Expression quantitative trait loci (eQTL) annotation was performed to explore the potential target genes of the identified regulatory SNPs. Reporter gene assays were used to validate the regulatory effects of the identified TF binding-disrupting SNPs, and an independent genetic association study was conducted to test whether these SNPs were associated with schizophrenia in a Chinese population.

… (yrs)) were plotted in five regions of the human brain (NCX: neocortex; HIP: hippocampus; AMY: amygdaloid complex; STR: striatum; MD: mediodorsal nucleus of the thalamus). Data were from BrainSpan (http://www.brainspan.org/) [9].

Figure 8. SFMBT1 is widely expressed in the brain. Consensus normalized expression (NX) levels for 10 brain regions were generated by combining data from two transcriptomics datasets (GTEx and FANTOM5). Colors indicate brain regions, and each bar shows the highest expression among the included subregions.
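The motif step in Figure 1 asks whether the two alleles of a risk SNP score differently against a transcription factor's PWM. The Python sketch below illustrates that comparison with a log-odds PWM, scoring every window on both strands that overlaps the SNP. It uses a toy 3-bp motif built from hypothetical counts, not the CTCF PWM, and a uniform background; dedicated tools such as FIMO (MEME suite) handle score thresholds and p-values properly.

```python
"""Sketch: scoring both alleles of a SNP against a position weight matrix (PWM).

Toy 3-bp motif from hypothetical counts, not the CTCF PWM; uniform background assumed.
"""
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts, pseudocount=0.25, background=0.25):
    """Convert per-position base counts into log2(probability / background) scores."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_score(pwm, seq, snp_index):
    """Highest PWM score over all windows (both strands) that overlap snp_index."""
    width, best = len(pwm), -math.inf
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    for start in range(first, last + 1):
        window = seq[start:start + width].upper()
        for kmer in (window, window.translate(COMPLEMENT)[::-1]):
            best = max(best, sum(col[b] for col, b in zip(pwm, kmer)))
    return best


def allele_effect(pwm, seq, snp_index, ref, alt):
    """Return (ref score, alt score, ref - alt); a positive delta means alt weakens the match."""
    ref_seq = seq[:snp_index] + ref + seq[snp_index + 1:]
    alt_seq = seq[:snp_index] + alt + seq[snp_index + 1:]
    ref_score = best_score(pwm, ref_seq, snp_index)
    alt_score = best_score(pwm, alt_seq, snp_index)
    return ref_score, alt_score, ref_score - alt_score


if __name__ == "__main__":
    toy_counts = [{"A": 10}, {"C": 10}, {"G": 10}]  # hypothetical motif "ACG"
    pwm = log_odds_pwm(toy_counts)
    print(allele_effect(pwm, "TTACGTT", 3, "C", "A"))
```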