Genetic and epigenetic factors fine-tune TGFB1 expression within the osteoarthritic articular joint

Objective : Osteoarthritis (OA) is an age-related disease characterised by articular cartilage degeneration. It has a large heritability and genetic screens have identified single nucleotide polymorphisms (SNPs) marking genomic risk loci. One such locus is marked by G>A SNP rs75621460, downstream of TGFB1 . This gene encodes TGF- β 1, the correct expression of which is essential for cartilage maintenance. We have used a combination of human patient samples (n=319) and a chondrocyte model to characterise the impact of rs75621460 in multiple articular joint tissues. Methods : Patient samples were genotyped and DNA methylation (DNAm) levels quantified by pyrosequencing. Gene reporter and electrophoretic mobility shift assays were used to determine differential nuclear protein binding to the region. The functional impact of DNAm upon TGFB1 expression was tested using targeted epigenome editing. Results : We identified that rs75621460 is located within a TGFB1 enhancer, and that the OA risk A-allele alters transcription factor binding, decreasing enhancer activity. Protein complexes binding to A (but not G) induced DNAm at flanking CG-dinucleotides. Strong correlations were observed between patient DNAm levels and TGFB1 expression, the direction of which was opposing between cartilage and synovium. This demonstrated biological pleiotropy in the impact of the SNP within different tissues of the articulating joint. Conclusion : The OA risk SNP rs75621460 impacts TGFB1 expression by modulating the function of a gene enhancer. We propose a mechanism by which the SNP impacts enhancer function, providing novel biological insight into one mechanism of osteoarthritis genetic risk, which may facilitate the development of future pharmacological therapies. risk loci 37 . The very small range of DNAm values over which correlations with TGFB1 expression occur potentially suggest the effects operate in a sub-population of chondrocytes within the tissue. In vitro methylation of the enhancer reduced the activity of both alleles in a reporter assay, and EMSA analysis indicated that DNAm at CpG2 impacted upon protein binding. The SP1 antibody bound to both methylated and unmethylated probes, consistent with previous reports into binding of this transcription factor 38,39 . We further identified that a targeted increase of DNAm at CpGs3-6 could reduce TGFB1 expression in the absence of the rs75621460 A-allele.

OA is an age-related, degenerative disease of the articulating joints, affecting over 40 million Europeans 15 . The disease hallmark is the thinning and loss of articular cartilage, often accompanied by a low-grade synovial inflammation within the affected joint 16 . This leads to chronic impairment of joint function, with a resultant increased risk of premature death due to secondary co-morbidities 17,18 . A typical clinical end-point is surgical replacement of the affected joint. Currently, there are no disease-modifying OA drugs and novel treatments are urgently required.
The causes of primary OA are complex. Yet, with an estimated heritability of ~50%, genetic influences contribute highly to disease susceptibility 19 . Genome-wide association studies (GWAS) have revealed the highly polygenic nature of OA and, over 90 significant association signals have been reported. Risk variants are often intergenic and thought to operate by mediating differential expression of their target genes. This places OA in the "enhanceropathy" category of common diseases, in which subtle but detrimental changes in gene expression through aberrant activity of DNA regulatory elements, or "enhancers", contribute to disease progression 20 .
In 2019, an OA risk signal was reported at chr19q13.2, marked by intergenic SNP rs75621460 (G>A; minor allele frequency (MAF), 0.03) 14 . The SNP lies 2.4kb downstream of TGFB1 and has >99% probability of being the single causal variant at this locus 14 . In this study we investigate rs75621460 and the encompassing region of DNA for regulatory activity.

Accepted Article
This article is protected by copyright. All rights reserved Furthermore, we quantify genetic variation and epigenetic modifications within the region and measure the impact upon expression of TGFB1 in multiple human joint tissues.

In silico analysis of the locus
An in-silico analysis of the locus was performed using ROADMAP chromatin state data 21 , RNAsequencing (RNA-seq) data from hip OA and neck-of-femur (NOF) fracture cartilage 22 , and ATAC-sequencing data from knee OA cartilage 23 . P-values for RNA-seq data were calculated using a Wald test within the DESeq2 package. The ROADMAP 18-state model utilises 6 histone post-translational modifications to assign one of eighteen chromatin states to cell-specific epigenomes and was used here to identify potential regulatory function in two cell types: E006, embryonic stem cell-derived mesenchymal stem cells (MSCs); and E049, bone marrow-derived cultured chondrocytes. Analysed knee articular cartilage ATAC-seq data was downloaded directly from GEO (accession GSE108301) 23 . Population allele frequencies of rs75621460 were taken from LDlink.

Luciferase reporter analysis
A 553bp region encompassing rs75621460 was amplified from pooled blood DNA, cloned into the pGL3-Basic firefly reporter vector (Promega), and sequenced to identify clones with the ancestral G-or derived A-allele at rs75621460. Tc28a2 immortalised chondrocytes were seeded into a 96-well plate 24h prior to transfection with the relevant pGL3-promoter luciferase vector construct (100ng) and pRL-TK Renilla vector (1.5ng) using Fugene HD transfection reagent (Promega). After 24h, cells were lysed and luciferase activity was measured by GloMax Navigator (Promega). For each well, luciferase activity was normalised to that of Renilla as previously described 24 .

Electrophoretic Mobility Shift Assay (EMSA)
Nuclear protein was extracted from Tc28a2 cells as previously described 25 . For each allele of rs75621460, forward and reverse single stranded DY682-labeled oligonucleotides (Eurofins), spanning 15bp each side of the SNP, and encompassing CpG2 were annealed to generate doublestranded probes (Table S1). Four probe combinations were generated containing either the G allele or A allele at rs75621460 that were unmethylated or methylated at CpG2. Reactions were carried

Accepted Article
This article is protected by copyright. All rights reserved out as previously described 25,26 . For supershift assays, 2µg of the indicated antibody was added to the binding reaction (Table S2).

CRISPR-Cas9
The CHOPCHOP CRISPR Design Tool 27 was used to design guide RNA (gRNA) sequences which were predicted to have low off-target effects, a GC content between 40 and 70%, and with a high targeting efficiency immediately upstream (gRNA1) and downstream (gRNA2) of rs75621460. The selected gRNAs created an 84bp deletion encompassing rs75621460 (Table S3).

Gene expression analysis
cDNA was reverse transcribed from total RNA using the Superscript IV standard protocol (Invitrogen) after an initial 15-minute treatment with 1 unit of amplification grade DNaseI (Invitrogen). Gene expression was measured by quantitative reverse transcription-polymerase chain reaction (qRT-PCR) using pre-designed TaqMan assays (Integrated DNA Technologies).
Gene expression was quantified using TaqMan chemistry, normalised to housekeeping genes 18S, HPRT1 and GAPDH and expressed as 2 -Δct as described previously 29 .

Patient samples and extraction of nucleic acids
Human tissue samples were obtained from patients undergoing hip or knee joint replacement surgery due to end-stage OA or NOF fracture. Arthroplasty was conducted at the Newcastle upon Tyne NHS Foundation Trust hospitals. The Newcastle and North Tyneside Research Ethics

Accepted Article
This article is protected by copyright. All rights reserved Committee granted ethical approval for the collection, with each donor providing verbal and written informed consent (REC reference number 14/NE/1212). Further details of the patient samples used in this project are provided in Table S4. RNA was extracted from cartilage by TRIzol-chloroform (Life Technologies) separation, following which the RNA was purified from the aqueous phase using the RNeasy Mini Kit (Qiagen). Both DNA and RNA were extracted from whole blood and synovium using the EZNA DNA/RNA Isolation kit (Omega Bio-Tek). For genotyping DNA was used directly. For methylation analysis, 500ng DNA was bisulphite converted using the EZ DNA methylation kit (Zymo Research).

Pyrosequencing
PyroMark Q24 Advanced (Qiagen) was used to genotype all patient DNA samples as previously described 24 . Pyrosequencing was also used to quantify DNAm at six CpGs flanking rs75612460 following bisulphite conversion of DNA (EZ DNA Methylation Kit, Zymo). Each sample was amplified in duplicate. Samples were excluded from the analysis if the replicates differed by >5%.
Assays were designed using PyroMark assay design software 2.0 and primer sequences are listed in Table S5.

Lucia Reporter Assay
A 546bp region containing either the G or the A allele of rs75621460 was amplified and cloned into the pCpG-free-promoter-Lucia vector (Invivogen). Primer sequences are listed in Table S6.
Clones were transformed into competent GT115 cells (Invivogen) according to the manufacturer's protocol. Plasmids were methylated or mock-methylated in vitro using M.SssI (NEB). Successful methylation was determined by digest with methylation-sensitive SmaI (New England Biolabs).
Tc28a2 cells were transfected with 100ng of pCpG-free-promoter construct, along with 10ng of the pGL3-promoter vector (Promega) and luminescence measurements were made as described above.

Accepted Article
This article is protected by copyright. All rights reserved and expanded each time to 90% confluency. At each passage, cells were isolated for extraction of nucleic acids (Purelink, Thermo Fisher).

Statistical analyses
Genotype and methylation correlations were calculated using Kruskal-Wallis testing. For Lucia reporter assays we corrected for multiple comparisons using the methods of Holm-Sidak or Dunn, as specified in the figure legends. Changes in gene expression following Cas9 modulation were calculated using paired t-tests. AEI and DNAm relationships were determined using linear regression analysis. The exact details of all statistical tests is provided in the relevant figure legend. All tests were performed in GraphPad Prism 8.3.1.

Results
The region encompassing rs75621460 is a gene enhancer OA risk SNP rs756215460 is an intergenic variant at chromosome 19q12, positioned between CCDC97 and TGFB1 (Fig.1a). ChIP-seq data from mesenchymal stem cells (MSCs) and differentiated chondrocytes, along with ATAC-seq from OA knee chondrocytes indicate that the SNP resides within a chromatin-accessible region with post-translational histone modifications H3K27ac (yellow) and H3K27me3 (red), indicating that this region possesses regulatory function ( Fig.1a). TGFB1 expression was significantly (P<0.01) upregulated in OA hip cartilage. No significant change (P>0.05) in CCDC97 expression was observed (Fig.1b).
We cloned the 550bp accessible chromatin region into a luciferase reporter vector. The ancestral G-allele construct conferred a 2.7-fold increase in luciferase activity (Fig.1c). The derived A-allele (OA-risk) also demonstrated regulatory activity (1.6-fold), which was significantly lower (P<0.05) than that of G.
A multiple sequence alignment revealed that the G-allele is highly conserved in mammals ( Fig.1d). Within human populations, the A-allele emerged at a frequency >1% only in Europeans.

Differential allelic protein binding occurs at rs75621460
We used electrophoretic mobility shift assays (EMSAs) to characterise proteins binding to rs75621460. This revealed several complexes with a greater binding affinity to the G-allele than A

Accepted Article
This article is protected by copyright. All rights reserved ( Fig.2a, arrows 1 and 4). Furthermore, proteins were identified which exclusively bound to one of the two alleles ( Fig.2a arrows 2-3 and 5-6). Unlabelled probes were added to the reaction at increasing concentrations ( Fig.2b and c). The unlabelled A-probe was unable to strongly compete for binding for the higher molecular weight complexes bound to the G-probe (Fig.2b). However, some lower molecular weight complexes were outcompeted by increasing concentrations of the unlabelled A-probe, indicated by two arrows on Figure 2b. The unlabelled G-probe was able to compete for binding of all protein complexes to the labelled A-probe, with only one exception, indicated with an arrow on Figure 2c. The TRANSFAC database predicted four transcription factors which differentially bind to the alleles of rs75621460: SP1, MAZ, KLF17, and ETF ( Fig.2d). All four were predicted to bind exclusively to the G-allele. EMSA was performed using antibodies raised against the four proteins (Fig.2e). A supershifted band was observed in the presence of the SP1 antibody. This complex (indicated by an arrow) was bound to both alleles, however with a greater abundance at the G probe (Fig.2e). These combined EMSA results indicate that the G-allele binds proteins with greater affinity than the A-allele, and that distinct protein complexes bind to the region in chondrocytes, determined by the allele carried at rs75621460.

TGFB1 is the gene target of the rs75621460 enhancer
We deleted an 84bp region of the enhancer encompassing rs75621460 from the genome of Tc28a2 immortalised chondrocytes using CRISPR-Cas9 and a pair of gRNAs (gRNA 1 and 2) (Fig.3a).
No change in CCDC97 expression was measured (P=0.12) following deletion of the region ( Fig.3b). However, a significant decrease in TGFB1 expression was observed in Tc28a2-Δ84, in which mean expression was 0.48 of that measured in wild-type cells (P=0.003).

Methylation quantitative trait locus (mQTL) analysis of rs75621460
The deletion introduced in Tc28a2-Δ84 cells encompassed six CG-dinucleotides (CpGs), positions at which eukaryotic DNA can be methylated. This included a single upstream CpG (CpG1), and five downstream CpGs (CpG2-6) (Fig.4a). We investigated whether DNAm at these CpGs was modulated by SNP genotype. Due to the low MAF at rs75621460, we screened 206 human hip and knee cartilage samples to identify sufficient individuals carrying the A allele for analysis and identified 190 major allele homozygotes (GG), and 16 heterozygotes (GA).
We quantified cartilage DNAm at the 6 CpGs and stratified values by SNP genotype. All homozygous (GG) individuals investigated (n=93-101 across the six CpGs) were hypomethylated
In samples for which both DNA and RNA were available, we tested for correlations between DNAm and TGFB1 expression (methylation and expression QTLs, meQTLs). In cartilage, data were analysed together (n=31), and also by joint site (hip, n=14; knee, n=17).
Across both tissues, homozygous patient samples (GG) showed no significant meQTLs (P>0.09, Fig.4f). Conversely, very strong correlations were observed amongst the heterozygote samples. In cartilage, this was dependent upon the joint site from which cartilage was taken, with stronger meQTLs measured in knee (r 2 =0.47 -0.99) than in hip (r 2 =0.01 -0.65) (Fig. S2a). In knee cartilage and synovium, the strongest effect was observed for CpG1 (upstream of the SNP), where r 2 values were 0.99 and 0.90, respectively (Fig.4g). In both knee tissues, increasing DNAm at Correlations between DNAm and TGFB1 expression were observed at the five downstream CpGs (Fig.4g). In knee cartilage, a very strong meQTL (r 2 =0.92) operated at CpG2

Accepted Article
This article is protected by copyright. All rights reserved (Fig. S2a). Strikingly, the impact of rs75621460 upon DNAm at the downstream CpGs was paradoxical in the two distinct knee joint tissues. In cartilage, increasing DNAm correlated with increased TGFB1, whereas in synovium the opposite effect was seen (Fig.4f).
Heterozygous DNAm at CpG2 was stratified by DNAm at CpG1 (to identify correlations between CpGs upstream and downstream of the SNP) and at CpG3 (to identify correlations between CpGs downstream of the SNP). In synovium, positive correlations were observed between CpG1 and CpG2 (r 2 =0.84) and between CpG2 and CpG3 (r 2 =0.97) (Fig.S2b). However, in cartilage, correlations were observed at the downstream CpGs (r 2 =0.92-0.94), but not between CpG1 and CpG2 (r 2 =0.04-0.31), which are physically separated by rs75621460 (Fig.S2c). This validates the observations made in the meQTL analysis and suggests that in cartilage, distinct mechanisms regulate DNAm upstream and downstream of the SNP.
The detected meQTLs were the strongest in cartilage, the central tissue in OA pathogenesis. We therefore continued to use a chondrocyte model for subsequent downstream analyses.

DNA methylation in the enhancer attenuates activity
We next investigated whether DNAm at the CpGs flanking rs75621460 have a functional impact upon enhancer activity. The enhancer was cloned into a CpG-free reporter vector and expressed in Tc28a2 cells in either an unmethylated or methylated state. Methylation of the cloned region resulted in a significant reduction in enhancer activity in constructs containing both the G-(P=0.004) and A-allele (P=0.019) (Fig.5a), demonstrating that DNAm influences chondrocyte enhancer activity independently of rs75621460 genotype.
We repeated the EMSA, this time including probes that were methylated at CpG2, the sole CpG contained within the probe sequence. We compared nuclear protein binding to both alleles in the unmethylated or methylated state. The six bands of interest that were previously identified ( Fig.2a) are highlighted (Fig.5b). All of these protein complexes were able to bind to methylated probes (Fig.5b). Interestingly, for bands 1 and 4, methylation of the A-probe appeared to recover protein binding (Fig.5b).
We conducted a supershift assay using the methylated probes and also included an antibody for DNMT3a with the aim of detecting recruitment of a de novo DNA methylating enzyme by proteins bound to the A-allele. However, the only visible shift identified using this panel of antibodies was in the lanes containing anti-SP1, which was able to bind to both the

Accepted Article
This article is protected by copyright. All rights reserved unmethylated and methylated probes (Fig.5c and d). These EMSA data, along with our reporter assay data indicate that methylation of the region attenuates activity of the enhancer. However, DNAm at the single most proximal CpG to rs75621460 (CpG2) does not prevent the binding of proteins adjacent to the SNP, including SP1.

Modulation of the epigenome using DNMT3a-dCas9
Finally, we investigated whether DNAm flanking the SNP could functionally impact TGFB1 expression in the absence of the derived A-allele. We used a DNMT3a-dCas9 fusion protein for targeted editing of DNAm at the six CpGs in Tc28a2 cells, which are homozygous (GG) at rs75621460. Five gRNAs (gRNA3-7) were designed to target the region (Fig.S3a). DNMT3a-dCas9 was expressed alone (non-targeting control), or along with one of the five gRNAs, and DNAm was quantified over three cell passages (Fig.S3b). Four of the five guides (gRNAs4-7) successfully increased DNAm at one or more CpGs, an effect which was lost passively (Fig.S3b).
As gRNA3 did not modulate DNAm at any of the targeted six CpGs, it was not included in subsequent experiments.

Discussion
TGF-1 has a well-established role in OA pathophysiology, however this is the first study to identify an interplay between genetic and epigenetic regulation of TGFB1 expression in the context of disease risk. We have characterised an intergenic TGFB1 enhancer within the

Accepted Article
This article is protected by copyright. All rights reserved articulating joint, at which the alleles of an OA risk SNP impact upon DNAm and regulate TGFB1 in vivo.
We confirmed the SNP region as an in vitro enhancer at which the rs75621460 OA-risk Aallele reduces enhancer activity compared to the highly-conserved ancestral G-allele. The conservation of the G-allele amongst distinct human populations and throughout mammalian evolution illustrates the importance of the G for protein binding and enhancer function. EMSA analysis supported this data, showing that different alleles at rs75621460 could bind distinct proteins. The emergence of the A-allele in European populations implicates a selection advantage resulting from population-specific pressure, yet that this selection also simultaneously confers a detriment to cartilage health in older age, a phenomenon known as antagonistic pleiotropy.
Additionally, we identified that the transcription factor SP1, which has previously been shown to play a role in TGFB1 regulation 31 , binds to complexes at both alleles. Deletion of the region in chondrocytes confirmed TGFB1 as the enhancer gene target.
The absence of eQTLs in patient samples was perhaps unsurprising due to our modest sample size. Interindividual variability in gene expression often necessitates sample sizes involving hundreds of patients for the detection of significant genotype-expression correlations 32,33 . A complementary approach for eQTL analysis, which greatly increases sensitivity, involves measuring allelic imbalance between the expression of gene transcripts, and has been widely applied to investigations of OA risk loci 24,34-46 . We were unable to utilise this approach here due to the low MAF and the absence of a suitable TGFB1 transcript SNP.
However, we have demonstrated how the use of a secondary endophenotype, DNAm, can provide a more sensitive approach to investigate the impact of genotype upon gene expression within an individual.
We identified mQTLs at six CpGs in two tissues of the articulating joint, indicating that genetic and epigenetic interplay at the locus contributes to disease aetiology as observed at other OA risk loci 37 . The very small range of DNAm values over which correlations with TGFB1 expression occur potentially suggest the effects operate in a sub-population of chondrocytes within the tissue. In vitro methylation of the enhancer reduced the activity of both alleles in a reporter assay, and EMSA analysis indicated that DNAm at CpG2 impacted upon protein binding. The SP1 antibody bound to both methylated and unmethylated probes, consistent with previous reports into binding of this transcription factor 38,39 . We further identified that a targeted increase of DNAm at CpGs3-6 could reduce TGFB1 expression in the absence of the rs75621460 A-allele.

Accepted Article
This article is protected by copyright. All rights reserved It has previously been documented that regulatory SNPs can confer tissue-specific effects upon genes, resulting in biological pleiotropy 32,40 . At this locus, the directly opposing effects in cartilage and synovium are the result of a shared effect of a single variant rather than the colocalisation of two distinct effects 40 . This emphasises that whilst integration of epigenetic data is a useful post-GWAS tool 14,41 , functional analyses in appropriate disease models are imperative to elucidate tissue-specific pathological mechanisms.
We propose a molecular mechanism of TGFB1 regulation in cartilage, as follows (Fig.S4).
Substitution of the highly-conserved G-allele at rs75621460 alters the consensus sequence for protein binding. In the presence of the G-allele, a protein complex with strong transcriptional activity binds to the sequence (Fig.S4a). This complex does not modulate DNAm, hence there is We speculate that in synovium, where a paradoxical correlation was observed, tissuespecific proteins which have a repressive effect upon TGFB1 could bind to the A-allele (Fig. S4c).
In both tissues, the OA risk A-allele results in attenuated enhancer activity, and decreased TGFB1 expression.
Elucidating the mechanism of TGFB1 expression in synovium was not within the scope of this study and requires further investigation. Furthermore, the impact of the SNP upon downstream TGF-β signalling remains unknown. The SNP resides within a region of open chromatin in fibroblast-like synoviocytes 42 . This knowledge, along with our data, indicates that the region is also utilised to regulate TGFB1 in synovium. The TGFB1 enhancer is an interesting focus for future studies, especially in the context of inflammatory joint diseases such as rheumatoid arthritis. Additionally, the use of single-cell technologies could identify subpopulations of cells within joint tissues in which these mechanisms operate. Furthermore,

Accepted Article
This article is protected by copyright. All rights reserved novel techniques for targeted subnuclear proteomics profiling provide a promising tool to identify the exact proteins modulating TGFB1 expression in distinct tissue types 43 .
TGF-β has been well-studied in the context of OA pathophysiology 44,45 . In healthy cartilage, TGF-β acts as an anabolic factor to stimulate the synthesis of ECM proteins, conveying a chondroprotective effect against mechanical loading in a healthy joint 46

Accepted Article
This article is protected by copyright. All rights reserved

Accepted Article
This article is protected by copyright. All rights reserved (D) Analysis of differential transcription factor binding to the G and A-allele at rs75621460 using the transcription factor database, TRANSFAC.
(E) Supershift experiment with antibodies targeting SP1, MAZ, KLF17, and ETF compared to no antibody (control) to the EMSA reaction containing the G or A-allele probe. The arrow indicates the position of supershifted complexes.  (C) Labelled probes containing rs75621460 G-allele with CpG2 unmethylated or methylated were incubated with antibodies raised against SP1, MAZ, KLF17, ETF, or DNMT3a (all 2μg) and Tc28a2 nuclear protein extract. The arrow indicates the SP1-supershifted complex.
(D) Labelled probes containing rs75621460 A-allele with CpG2 unmethylated or methylated were incubated with antibodies raised against SP1, MAZ, KLF17, ETF, or DNMT3a (all 2μg) and Tc28a2 nuclear protein extract. The arrow indicates the SP1-supershifted complex. (B) Tc28a2 percentage methylation levels at the 6 CpGs surrounding the SNP in no guide controls (black, dashed line) or using a gRNA targeting the region (colours as described above, solid line) (n=6 biological replicates, each with two technical repeats). Combinations of gRNAs were also used: gRNA 4+6 and gRNA 5+7 (light blue, solid line). Subsequent changes in TGFB1 expression were also measured following targeted editing of methylation, and gene expression was normalised to that in no-guide controls (n=6 biological repeats, each with 3 technical repeats). P values were calculated using paired t-tests following testing of control values for normality (D'Agostino and Pearson, P=0.37).