Identification of rheumatoid arthritis causal genes using functional genomics

Over the past decade, genome‐wide association studies have contributed a wealth of knowledge to our understanding of polygenic disorders such as rheumatoid arthritis. As cohort sizes have grown, so too have the computational and experimental methods used to robustly define variants associated with disease susceptibility. The challenge now is to translate these findings into an improved understanding of disease aetiology and better patient care. Whilst much of the effort to translate the findings of genome‐wide association studies has focused on global analyses of all identified variants, careful functional study of individual disease susceptibility loci will be required to refine our understanding of how individual variants contribute to disease risk. Here, we present the argument for such an approach and describe some of the novel tools being used to investigate risk loci, including chromosomal conformation capture techniques and modifications of the CRISPR‐Cas9 system, with several examples of their implementation.


| INTRODUCTION
Rheumatoid arthritis (RA) is a common autoimmune disease estimated to affect approximately 25 million people worldwide. 1 The disease is characterized by chronic inflammation of multiple synovial joints. The pain and reduced mobility associated with this inflammation severely impact patients' quality of life and, if uncontrolled, the inflammation can cause irreversible joint damage, often resulting in disability. 2 The prevalence of RA is significantly higher amongst women and increases with age, 3 such that, as populations age, the prevalence of RA is projected to rise. This is set to amplify therapeutic costs and the high rates of work disability, 2 which together create a major health care and socioeconomic burden.
Current therapeutic options include combinations of disease-modifying anti-rheumatic drugs (DMARDs), such as methotrexate, and biologic drugs, such as tumour necrosis factor (TNF) inhibitors. However, the efficacy of these treatments varies widely amongst RA patients, and there is still an unmet need for novel, affordable therapies. 4 A more detailed understanding of RA pathogenesis, and of how it is affected by the mechanisms of action of these drugs, is likely to improve our ability to predict therapeutic response, helping to ensure that patients receive optimal treatment within the window of opportunity between the emergence of preclinical features and the onset of joint damage.

| RHEUMATOID ARTHRITIS AETIOLOGY
The role of autoimmunity in RA is highlighted by the prevalence of autoantibodies in patients' sera. These include rheumatoid factor, which recognizes the fragment crystallizable (Fc) region of immunoglobulin G and is present in 70%-80% of patients with established RA, 5 and anti-citrullinated protein antibodies (ACPAs), present in 50%-70% of RA patients. 6 Clinical evidence, along with the association of different environmental and genetic factors with ACPA-positive or ACPA-negative RA, suggests that ACPA status may discriminate between distinct forms of RA.
It is thought that ACPA-positive RA emerges as a result of a loss of tolerance to citrullinated proteins. This results in systemic autoimmune activation, and RA is thought to manifest in synovial joints because they contain many proteins that normally undergo citrullination. 7 The details of this model, and how it applies to ACPA-negative RA, remain unclear.

| RHEUMATOID ARTHRITIS GENETICS
In keeping with other polygenic disorders, RA susceptibility is modulated by a multitude of loci with low penetrance. These loci combine to account for approximately 60% of the variation in disease risk. 8 To date, genome-wide association (GWA) studies based on data from tens of thousands of patients have helped to identify approximately 100 loci that are robustly associated with disease susceptibility. 9 Many of these associations have been replicated in multiple populations, with stringent statistical methods used to guard against false-positive associations arising from, for example, confounding or multiple testing. In addition, imputation of genotypes for untested SNPs and high-density genotyping of 51 RA susceptibility loci have improved the resolution with which these associations can be described. 10 High-density mapping was performed using the Immunochip custom SNP array, designed to interrogate 186 loci known to be associated with any of 12 autoimmune diseases. Collaborative efforts, such as the Immunochip consortium, have been invaluable in collecting genetic data from large numbers of individuals. As a further example, imputation is typically performed using reference haplotypes characterized by the 1000 Genomes Project. 11 A minority of the variants that constitute RA susceptibility loci are known to alter the amino acid sequence of protein-coding genes, such as those encoding the human leukocyte antigen (HLA) or protein tyrosine phosphatase, non-receptor type 22 (PTPN22). Variants located within the coding sequence of these genes are common to many autoimmune diseases and generally have relatively high odds ratios (ORs), signifying a relatively large impact on the likelihood of developing RA; however, they are far outnumbered by identified loci that have smaller ORs and are generally found in non-coding regions (Figure 1).
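The ORs summarized in Figure 1 express how much more often a risk allele is seen in cases than in controls. As a minimal sketch, assuming allele counts from a simple case-control 2 × 2 table and Woolf's log-scale confidence interval (the cited studies used meta-analysis models that are not reproduced here), an allelic OR could be computed as follows. The function name and all counts are hypothetical.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other, z=1.96):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    Arguments are allele counts from a 2 x 2 table: risk / other alleles in cases,
    then risk / other alleles in controls. z = 1.96 gives a 95% interval.
    """
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Woolf's standard error of log(OR): square root of the summed reciprocal counts
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts (not taken from any real study)
or_, lo, hi = allelic_odds_ratio(3000, 7000, 2800, 7200)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An interval that excludes 1 indicates a nominal association; genome-wide studies additionally require far stricter significance thresholds to account for multiple testing.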
Of the 46 RA susceptibility variants that have been fine mapped using the Immunochip custom array, only seven potentially causal exonic variants were identified (approximately 85% of signals were non-coding). 10 This is in keeping with other polygenic disorders and autoimmune diseases in general, for which it has been estimated that 90% of causal single nucleotide polymorphisms (SNPs) lie outside protein-coding regions. 12 When considering the functional implications of variants identified through GWA studies, it is important to recognize that each typically tags a large collection of variants that are highly correlated due to linkage disequilibrium (LD). Any of these variants, or indeed a combination of them, may be responsible for the observed increase in disease susceptibility. Bayesian approaches, such as Probabilistic Identification of Causal SNPs (PICS), have been applied to infer causality based on the underlying haplotype structure and pattern of association at a given locus. 12 PICS supports generalizations across collections of risk loci; however, at any given locus, the causality of individual SNPs remains to be resolved experimentally.
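The idea of assigning each SNP in an LD block a probability of being causal can be illustrated with a different, widely used Bayesian technique, not PICS itself: Wakefield's approximate Bayes factor combined with the assumption of a single causal variant per locus, followed by a 95% credible set. This is a sketch under stated assumptions; the prior standard deviation of 0.2 on the log-OR scale, the function names and the summary statistics are all illustrative choices, not values from the cited studies.

```python
import math


def log_bf(beta, se, prior_sd=0.2):
    """Log Wakefield approximate Bayes factor in favour of association.

    beta and se are the SNP's log-odds-ratio estimate and its standard error;
    prior_sd is the assumed prior standard deviation of the true effect.
    """
    v = se ** 2
    w = prior_sd ** 2
    r = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1 - r) + 0.5 * r * z ** 2


def posterior_probabilities(snps, prior_sd=0.2):
    """Posterior probability that each SNP is the single causal variant (equal priors)."""
    logs = {name: log_bf(b, s, prior_sd) for name, (b, s) in snps.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(value - top) for name, value in logs.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def credible_set(probs, coverage=0.95):
    """Smallest set of top-ranked SNPs whose posterior probabilities reach `coverage`."""
    chosen, cumulative = [], 0.0
    for name, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(name)
        cumulative += p
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical summary statistics for three SNPs in LD: (log OR, standard error)
locus = {"snpA": (0.100, 0.015), "snpB": (0.095, 0.015), "snpC": (0.050, 0.015)}
pp = posterior_probabilities(locus)
print(credible_set(pp))  # → ['snpA', 'snpB']
```

Because the two strongly associated SNPs have similar statistics, the posterior mass is split between them, which is exactly why statistical fine mapping alone often leaves several candidates for experimental follow-up.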
FIGURE 1 European RA genetic susceptibility loci and their odds ratios, as reported in a trans-ethnic meta-analysis. 9 CI, confidence interval; OR, odds ratio
RA PICS SNPs are enriched within super-enhancers, 12 large clusters of transcriptional enhancers that drive cell identity. The same enrichment is observed for RA SNPs more generally and for SNPs associated with other autoimmune diseases. 13,14 SNPs contained within enhancers, or super-enhancers, are likely to affect transcriptional regulation, and many mechanisms could mediate this effect, including dysregulation of chromatin accessibility. Autoimmunity SNPs are enriched amongst DNase I hypersensitivity sites (sites of accessible chromatin), 15 and thousands of SNPs have been shown to affect chromatin accessibility (DNase I sensitivity quantitative trait loci). 16 Many bioinformatics tools exist to explore the potential effects of individual SNPs. For example, RegulomeDB collates an array of data, including information relating to transcription factor (TF) binding, chromatin accessibility and chromatin state. 17 The underlying data sets are populated with annotations based on sequencing data from chromatin immunoprecipitation (ChIP) and chromatin accessibility experiments, which map the prevalence of various features across the genome in a variety of cell types. Such features include TF occupancy, histone modifications and DNase I/transposase accessibility. A SNP may be considered more likely to be causal if it overlaps a DNase I hypersensitivity site or TF binding site, or if it disrupts a TF binding motif. It is, however, important to recognize that in reality these SNPs do not exist in isolation, and multiple SNPs within an LD block may have a cumulative effect.
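Whether a SNP "disrupts a TF binding motif" is commonly assessed by scoring both alleles against a position weight matrix (PWM). The sketch below assumes a hypothetical 4 bp motif, a uniform background of 0.25 per base and forward-strand scanning only; it is an illustration of log-odds PWM scoring, not the scoring scheme used by RegulomeDB.

```python
import math

# Hypothetical position frequency matrix (counts) for a 4 bp motif resembling "TGAC"
PFM = {
    "A": [1, 1, 17, 1],
    "C": [1, 1, 1, 17],
    "G": [1, 17, 1, 1],
    "T": [17, 1, 1, 1],
}


def log_odds_pwm(pfm, background=0.25):
    """Convert counts to log2(frequency / background) scores for each position."""
    width = len(pfm["A"])
    pwm = {base: [] for base in pfm}
    for i in range(width):
        column_total = sum(pfm[base][i] for base in pfm)
        for base in pfm:
            pwm[base].append(math.log2(pfm[base][i] / column_total / background))
    return pwm


def best_score_over(seq, pos, pwm):
    """Highest motif score among forward-strand windows that cover `pos`."""
    width = len(pwm["A"])
    best = -math.inf
    for start in range(max(0, pos - width + 1), min(pos, len(seq) - width) + 1):
        window = seq[start:start + width]
        best = max(best, sum(pwm[base][i] for i, base in enumerate(window)))
    return best


def allele_effect(seq, pos, alt, pwm):
    """Change in the best motif score when the base at `pos` is replaced by `alt`."""
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return best_score_over(alt_seq, pos, pwm) - best_score_over(seq, pos, pwm)


pwm = log_odds_pwm(PFM)
# Hypothetical reference sequence; the SNP is at index 3 (G -> A)
delta = allele_effect("CCTGACCC", 3, "A", pwm)
print(f"score change: {delta:.2f}")  # negative values suggest motif disruption
```

A real analysis would also scan the reverse strand and use curated motif collections; a predicted score change is only a hypothesis that still requires experimental support.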
In the majority of instances, these data have been generated through international collaborative efforts, including the NIH Roadmap Epigenomics Consortium, 18 the Functional Annotation of the Mammalian Genome (FANTOM) project 19 and constituents of the International Human Epigenome Consortium, 20 such as the Encyclopedia of DNA Elements (ENCODE). 21 Several methods have been developed to integrate functional annotation data with GWA study summary statistics, including methods that first identify features enriched for risk loci and then prioritize variants that overlap these features. 22,23
GWA study findings have already contributed to our understanding of RA. For example, an enrichment of susceptibility variants in enhancers active in T cells supports a role for these cells in the aetiology of RA. 12 In addition, the most significant association between RA SNPs (n = 31) and peaks of histone 3 lysine 4 trimethylation (H3K4me3), a marker of active promoters, was observed in T helper cells. 24 Synovial fibroblasts (SFs) represent an additional cell type that has been shown to play a crucial role in mediating the inflammation and irreversible damage characteristic of RA synovial joints. 25 These innate immune cells are activated within the synovial joints of patients with RA and show progressive changes in their epigenome that reflect disease progression and severity. 26 To date, there is little genetic evidence supporting a causal role for SFs in RA, although it is clear that they play an important role in the disease. This paucity of genetic evidence may be due to a lack of epigenetic annotations for SFs, compared with frequently studied peripheral blood cell types such as T helper cells.
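Enrichment statements such as "susceptibility variants are enriched in T cell enhancers" rest on comparing how often risk SNPs fall in a feature with how often background SNPs do. A minimal sketch, assuming a simple one-sided hypergeometric test and hypothetical counts, is shown below; the published methods referred to above additionally account for LD, allele frequency and other confounders, none of which is modelled here.

```python
from math import comb


def annotation_enrichment(n_background, n_background_in_feature, n_risk, n_risk_in_feature):
    """Fold enrichment and one-sided hypergeometric P value.

    Treats the risk SNPs as a random draw from the background SNP set and asks how
    surprising it is to see at least `n_risk_in_feature` of them inside the feature.
    """
    fold = (n_risk_in_feature * n_background) / (n_risk * n_background_in_feature)
    denominator = comb(n_background, n_risk)
    upper = min(n_risk, n_background_in_feature)
    p_value = sum(
        comb(n_background_in_feature, k)
        * comb(n_background - n_background_in_feature, n_risk - k)
        for k in range(n_risk_in_feature, upper + 1)
    ) / denominator
    return fold, p_value


# Hypothetical numbers: 1000 background SNPs, 100 of them in T cell enhancers;
# 50 risk SNPs, 15 of which fall in T cell enhancers.
fold, p = annotation_enrichment(1000, 100, 50, 15)
print(f"fold enrichment = {fold:.1f}, P = {p:.2e}")
```

Once a feature is found to be enriched, variants overlapping it can be prioritized for follow-up, which is the two-step logic described in the paragraph above.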
The findings of GWA studies have already initiated the repositioning of biologic DMARDs targeting components of the interleukin 23 pathway to treat psoriasis, psoriatic arthritis, ankylosing spondylitis and inflammatory bowel disease (IBD). [27][28][29] Novel therapeutic strategies for RA have also been identified, including peptidyl arginine deiminase (PAD) inhibitors 30,31 and Janus kinase (JAK) inhibitors. 32 PADs catalyse citrullination and include PAD type IV, whose gene overlaps an RA susceptibility locus (rs2301888). Similarly, the gene encoding tyrosine kinase 2 (TYK2), a member of the JAK family, overlaps an RA susceptibility locus (rs34536443).
Efforts to further translate the findings of GWA studies into an improved understanding of RA and better patient care are hampered by an inability to accurately assign variants, and the enhancers they overlap, to protein-coding genes, relevant biological processes and cell types.

| IDENTIFYING GENES THAT MEDIATE THE EFFECT OF NON-CODING VARIANTS
Whilst non-coding RNAs and trans-regulatory elements should not be overlooked, most non-coding variants are thought to disrupt cis-regulatory elements, such as enhancers. Non-coding variants are typically assumed to affect the nearest protein-coding gene with known immunological relevance; however, this assumption has been shown to be incorrect in some instances. For example, C-type lectin domain family 16, member A (CLEC16A) has historically been associated with several autoimmune diseases, including type 1 diabetes (rs12708716), with disease susceptibility variants found along its length and especially within intron 19. It has recently been demonstrated that this intron forms a chromatin interaction with the promoter of a neighbouring gene, homolog of dexamethasone-induced protein (DEXI), and that genotype at this locus correlates with DEXI expression. 33 The CLEC16A/DEXI example highlights the utility of functional data, such as genotype-specific expression or chromosomal conformation data, in identifying the protein-coding genes through which the effects of disease-associated variants are mediated.

| Expression quantitative trait loci
Variants that correlate with the expression of a given transcript are termed expression quantitative trait loci (eQTLs), and a number of large eQTL databases exist, such as BloodeQTL, 34 the Genotype-Tissue Expression (GTEx) project 35 and the Immune Variation (ImmVar) project. 36 GTEx demonstrated that eQTLs in relevant pathogenic tissues are significantly enriched for trait associations and may explain a substantial proportion of heritability (40%-80%). 35 However, many susceptibility loci, such as those associated with RA, may have effects too subtle to be captured in such databases, some of which are derived from relatively small numbers of samples. The minimal impact of individual variants on disease susceptibility suggests that any effect on expression may be too small to detect and may be highly specific to particular cell types or stimulatory conditions. Whilst eQTL data are often derived from heterogeneous cell populations, such as whole blood, several studies have looked more closely at the constituent cell types. For example, Ye et al identified 39 loci that explain 25% of the variation in the response of T helper cells to in vitro stimulation. 37 These loci overlap disease susceptibility loci, such as the RA susceptibility locus found in the interleukin 2 receptor subunit alpha gene (IL2RA). In addition, stimulation-specific eQTLs identified in CD14+ monocytes have been shown to be enriched amongst GWA study loci. 38 Taken to its extreme, this focus on ever more tightly defined cell types and conditions has led to the identification of eQTLs using single-cell RNA sequencing (scRNA-seq) of peripheral blood mononuclear cells. 39 Ultimately, the most relevant data may come from carefully defined cell types of known clinical relevance isolated from patient samples, as has been done for T helper cells and B cells isolated from patients with RA. 40
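At its core, an eQTL test regresses a gene's expression on genotype dosage (0, 1 or 2 copies of an allele). The sketch below assumes ordinary least squares, nine hypothetical donors and a normal approximation for the P value; real eQTL studies add covariates, use the t distribution and correct for testing many SNP-gene pairs. Stimulation-specific eQTLs can be modelled by adding a genotype-by-condition interaction term, which is not shown here.

```python
import math


def eqtl_test(dosages, expression):
    """Ordinary least squares of expression on genotype dosage (0, 1 or 2 alleles).

    Returns the per-allele effect (slope), its standard error, the t statistic and a
    two-sided P value from a normal approximation (reasonable only for large samples).
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = slope / se
    p_normal = math.erfc(abs(t) / math.sqrt(2))
    return slope, se, t, p_normal


# Hypothetical data: normalized expression of one gene in nine donors
dosage = [0, 0, 0, 1, 1, 1, 2, 2, 2]
expr = [5.0, 5.2, 4.9, 5.6, 5.8, 5.5, 6.1, 6.3, 6.0]
slope, se, t, p = eqtl_test(dosage, expr)
print(f"effect per allele = {slope:.2f}, t = {t:.1f}")
```

With effects as small as those expected for most RA loci, the slope would be a fraction of the donor-to-donor noise, which is why large, cell type-specific data sets are needed to detect them.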

| Chromosomal interactions
Certain aspects of the network of chromosomal interactions that occur within the three-dimensional genome are thought to be less variable across cell types than eQTLs. For example, topologically associated domains (TADs), regions of the genome within which interactions occur at a higher frequency than with neighbouring regions, are thought to be fairly robust across cell types. 41 It may, therefore, be easier to identify potential causal genes based on TADs. However, the relative strength of individual interactions within TADs varies more across cell types, and the strength of interaction between the enhancers and promoter of a given gene is thought to correlate with expression of that gene. This has been demonstrated for individual loci, such as the β-globin locus, 42 but is difficult to demonstrate on a global scale because of the limited resolution of chromosomal conformation data and the confounding correlation of various other features, such as chromatin accessibility. 43 The prevailing model of enhancer function involves chromosomal looping, which brings promoters and enhancers into physical proximity and generates a transcriptionally permissive environment (as reviewed elsewhere 41,44 ). Whilst this model was originally developed based on experiments using fluorescence in situ hybridization, 45,46 ligation-based methods are currently widely used to characterize interactions. 43 These methods rely on fixation of nuclei, followed by fragmentation and ligation of chromosomal fragments that are found in close physical proximity. When followed by qPCR using primers specific to the interaction being interrogated, this is commonly termed chromosomal conformation capture (3C); 47 it is also possible to use high-throughput sequencing to quantify all interactions, termed Hi-C. 48
The number of possible interactions is vast, and as a result, Hi-C data are typically sparse and provide a low-resolution picture of the chromosomal interactome that is suitable for identifying TADs. High-throughput relative quantification of specific interactions can be achieved through positive selection of ligated fragments prior to sequencing, using either a library of bait oligonucleotides (Capture Hi-C, CHi-C) 49 or an antibody against a specific protein of interest, such as a TF (HiChIP). 50 A number of variations of these techniques exist, with an array of names and acronyms, such as circularized 3C (4C), 51 4C followed by high-throughput sequencing (4C-seq), 52 3C-seq, 53 carbon copy 3C (5C), 54 ChIP-loop 55 and chromatin interaction analysis by paired-end tag sequencing (ChIA-PET); 56 typically, they are categorized according to the range of interactions characterized, for example one vs one, 3C; all vs all, Hi-C; and many vs all, CHi-C 57 (Figure 2).
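With n restriction fragments there are n(n − 1)/2 possible fragment pairs, so Hi-C reads are usually aggregated into larger genomic bins, giving a symmetric contact matrix from which TADs can be called. The sketch below shows a simplified form of the insulation-score approach, one of several TAD-calling strategies and not necessarily the one used in the cited studies: for each gap between adjacent bins it averages the contacts that cross the gap, and gaps with locally minimal scores are candidate TAD boundaries. The toy matrix and function names are hypothetical.

```python
def insulation_scores(matrix, window):
    """Mean contact frequency across the gap between bin i and bin i + 1.

    Only the `window` bins on either side of the gap are counted; low scores mean
    few contacts cross the gap, marking a candidate TAD boundary.
    """
    n = len(matrix)
    scores = {}
    for i in range(window - 1, n - window):
        upstream = range(i - window + 1, i + 1)
        downstream = range(i + 1, i + window + 1)
        total = sum(matrix[a][b] for a in upstream for b in downstream)
        scores[i] = total / (window * window)
    return scores


def call_boundaries(scores):
    """Gaps whose insulation score is a strict local minimum."""
    gaps = sorted(scores)
    return [
        i
        for prev, i, nxt in zip(gaps, gaps[1:], gaps[2:])
        if scores[i] < scores[prev] and scores[i] < scores[nxt]
    ]


def toy_matrix(n_bins=10, split=5):
    """Symmetric toy contact matrix with two domains: bins [0, split) and [split, n_bins)."""
    matrix = [[0.0] * n_bins for _ in range(n_bins)]
    for a in range(n_bins):
        for b in range(n_bins):
            same_domain = (a < split) == (b < split)
            matrix[a][b] = 50.0 if a == b else (10.0 if same_domain else 1.0)
    return matrix


print(call_boundaries(insulation_scores(toy_matrix(), window=2)))  # → [4]
```

The single boundary reported lies between bins 4 and 5, where the two toy domains meet. Real matrices are noisy and must first be normalized for biases such as fragment length and mappability.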
Several studies have performed CHi-C using a bait library that captures the promoters of 89% of Ensembl protein-coding, non-coding, antisense, snRNA, miRNA and snoRNA transcripts; the remaining 11% of promoters for these transcript types were excluded on technical grounds alone. Using this bait library in a lymphoblastoid cell line and CD34+ haematopoietic progenitor cells, it was demonstrated that disease-associated SNPs identified through GWA studies are enriched amongst fragments that interact with promoters. 58 The same bait library has been used to produce high-resolution maps of promoter contacts in 17 human primary blood cell types, demonstrating that promoter interactions are cell type-specific and frequently occur between active promoters and enhancers, and that eQTLs for a given gene are enriched amongst the fragments identified as interacting with that gene's promoter. 59 In the breast cancer field, a bait library designed to capture interactions occurring with 68 disease-associated loci, characterized through GWA studies, was used to identify putative causal genes. The authors identified 110 genes interacting with 33 breast cancer susceptibility loci and confirmed that 22 of these genes had previously been identified through eQTL studies, 32 are associated with breast cancer survival and 14 are somatically mutated in breast cancer. 60 HiChIP represents an alternative approach to mapping the regulatory interactions occurring within a given cell type. This has been successfully applied to various T helper cell subtypes by focussing on interactions that can be immunoprecipitated with H3K27ac, 61 a marker of active chromatin. Of particular relevance to the identification of putative causal genes, the authors identified 2597 genes that interact with enhancers overlapping 684 autoimmune PICS SNPs, with only 14% of these genes being the closest gene to the SNP.
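The contrast between "nearest gene" and "interacting gene" can be made concrete with a small sketch. It assumes promoter-capture results have already been reduced to a list of bait genes and the coordinates of the fragments they contact; the data class, gene names and coordinates are hypothetical and do not correspond to any real locus or published output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromoterInteraction:
    """A captured promoter (bait gene) and the fragment it contacts (the "other end")."""
    gene: str
    other_end_start: int  # bp, inclusive
    other_end_end: int    # bp, exclusive


def interacting_genes(snp_pos, interactions):
    """Genes whose promoters contact a fragment overlapping the SNP."""
    return sorted(
        {i.gene for i in interactions if i.other_end_start <= snp_pos < i.other_end_end}
    )


def nearest_gene(snp_pos, tss_positions):
    """Gene whose transcription start site is closest to the SNP."""
    return min(tss_positions, key=lambda gene: abs(tss_positions[gene] - snp_pos))


# Hypothetical coordinates on a single chromosome
tss = {"GENE_A": 1_000_000, "GENE_B": 1_450_000, "GENE_C": 2_300_000}
contacts = [
    PromoterInteraction("GENE_C", 1_190_000, 1_200_000),
    PromoterInteraction("GENE_B", 1_500_000, 1_505_000),
]
snp = 1_195_000
print(nearest_gene(snp, tss), interacting_genes(snp, contacts))  # → GENE_A ['GENE_C']
```

In this toy case the SNP's nearest gene and its interacting gene differ, mirroring the observation that most genes contacting autoimmune risk enhancers are not the closest gene.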
The potential of CHi-C and HiChIP data for identifying putative causal genes is, therefore, well established; however, the statistical methods used to interpret Hi-C data are still being developed, and further refinement of these methods remains important in order to improve reproducibility. 62 In addition, the costs associated with high-throughput sequencing and the high cell numbers required by 3C-based methods mean that much of the data generated using techniques such as Hi-C, CHi-C and HiChIP come from small numbers of samples.

| PUTTING RA SUSCEPTIBILITY LOCI IN 3D CONTEXT
CHi-C has been used to characterize interactions occurring with RA susceptibility loci, and those of several related diseases, in T and B cell lines. 63 These data provide key insights into the 3D chromosomal landscape at these loci and can be compared with data generated from primary cells using the promoter capture bait library described earlier. 59 The resulting picture is understandably complex and is best summarized using examples of the different interactions observed.

| Disease susceptibility loci do not necessarily interact with the nearest gene and may interact with alternative genes located at a long distance
Susceptibility variants associated with RA and juvenile idiopathic arthritis and found at chromosomal position 13q14.1 (rs9603616 and rs7993214, respectively) are located within an intron of COG6 (component of oligomeric Golgi complex 6), which is required for normal Golgi function (Figure 3A). CHi-C has revealed that, in  62 (whilst no consensus exists, a threshold of 1 Mb is often used to distinguish cis- and trans-regulatory effects). FOXO1 has previously been implicated in RA, as it is crucial to the survival of SFs. 64 FOXO1 has also been shown to be hypermethylated in RA SFs 65 and therefore represents a strong candidate for mediating the increased disease susceptibility associated with the 13q14.1 risk locus.

| Distinct susceptibility loci associated with different diseases may interact with each other, with their effects being mediated by alterations to the same genes
The interaction between a type 1 diabetes susceptibility locus overlapping CLEC16A (rs12708716) and the promoter of DEXI, observed by Davison et al, has already been described. 33 This susceptibility locus overlaps a psoriatic arthritis susceptibility locus (rs12928822) and, as well as confirming the presence of this interaction in T (Jurkat E6.1) and B (GM12878) cell lines, Martin et al also observed interactions between the DEXI promoter and an RA susceptibility locus found at chromosomal position 16p13.13 (Figure 3B, rs4780401). 63 Interactions were also observed between these three disease susceptibility loci, suggesting a shared autoimmune mechanism.

| Regions can show a complex pattern involving both of the interaction types mentioned above
Chromosomal region 6q23.3 contains two independent risk loci for RA, systemic lupus erythematosus and coeliac disease, one of which overlaps TNF alpha-induced protein 3 (TNFAIP3, rs7752903), with the second being intergenic (Figure 3C, rs6920220). In addition, an alternative set of variants, also overlapping TNFAIP3, constitutes a risk locus for psoriasis and psoriatic arthritis (rs610604). These regions show a complex network of interactions with each other and with the promoters of various protein-coding genes, including TNFAIP3, interferon gamma receptor 1 (IFNGR1) and interleukin 20 receptor subunit alpha (IL20RA). 66,67 TNFAIP3 encodes the protein A20. It has been demonstrated that a TT>A polymorphism (rs35926684), associated with RA disease susceptibility and overlapping TNFAIP3, results in impaired binding of the TF NFκB (nuclear factor kappa-light-chain-enhancer of activated B cells) and reduced expression of A20. 68 It is, however, possible that this and other variants associated with RA also affect expression of IFNGR1 and IL20RA. Interestingly, increased NFκB binding has been associated with the intergenic RA risk allele in a heterozygous T cell line. The risk genotype also shows a higher frequency of interactions with IFNGR1 and IL20RA in B-lymphoblastoid cell lines and has been correlated with increased expression of IL20RA in primary T helper cells. 66 As a receptor for the pro-inflammatory cytokine interleukin-20, IL20RA is a strong candidate for mediating increased susceptibility to RA. Similarly, interferon-γ, which binds to IFNGR1, plays an important role in macrophage activation, MHC expression and T helper cell differentiation. 69 In addition, in an African American cohort, IFNGR1 was found to be expressed at elevated levels in blood from patients with RA. 70 Whilst the generalizability of this finding is unclear, it may support the relevance of this gene in RA.
FIGURE 3 Illustrations exemplifying chromosomal interactions observed at susceptibility loci for RA and genetically similar diseases. Included are three examples that show long-range interactions between susceptibility loci and non-neighbouring putative causal genes (A), interactions between different disease susceptibility loci (B) and combinations of the aforementioned categories (C) (adapted from refs. 63,66,67). Index SNPs for each locus are given in parentheses and indicated in red. Susceptibility loci included as baits in the region-capture experiment are coloured blue, along with the interactions observed (arrows) and overlapping genes. In contrast, chromosomal loci including promoters identified as interacting with susceptibility loci are coloured green, along with promoter capture interactions and associated genes. Interaction strength is conveyed by the colour intensity of the associated arrows, with arrowheads identifying non-bait ends. JIA, juvenile idiopathic arthritis; PsA, psoriatic arthritis; T1D, type 1 diabetes

| INVESTIGATING THE IMPACT OF VARIANTS ON PUTATIVE CAUSAL GENES
Chromosomal interaction data can contribute to a body of evidence that aids the identification of putative causal genes. However, conclusively demonstrating that a variant affects the expression of a putative causal gene is likely to require additional techniques. The potential contribution of eQTL and other expression data has already been discussed. A further tool that is already proving invaluable in the field of functional genetics is the clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated protein 9 (Cas9) system.
In its conventional wild-type form, the CRISPR-Cas9 system generates targeted double-strand breaks, which are repaired either by non-homologous end joining (NHEJ) or by homology-directed repair (HDR). 71 Because accurate repair restores the target site, allowing it to be cleaved again, cycles of cleavage and repair are typically escaped only through imperfect NHEJ, which results in small insertions and deletions (INDELs). Alternatively, HDR can be exploited by providing an exogenous repair template in order to knock in specific mutations.
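Where a double-strand break can be made is constrained by the guide RNA and the protospacer adjacent motif (PAM). Assuming the commonly used Streptococcus pyogenes Cas9 (SpCas9), which recognizes an NGG PAM immediately 3′ of a 20-nt protospacer and cuts about 3 bp upstream of the PAM, candidate target sites can be enumerated on both strands as sketched below. The choice of SpCas9 and the function names are assumptions for illustration; practical guide design also scores on-target efficiency and off-target risk, which is omitted here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq, spacer_len=20):
    """Candidate SpCas9 target sites with an NGG PAM on either strand.

    Returns (strand, protospacer, cut) tuples, where `cut` is the forward-strand
    index of the first base after the blunt cut made 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    n = len(seq)
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for pam in range(spacer_len, n - 2):  # pam = index of the N in NGG
            if s[pam + 1:pam + 3] != "GG":
                continue
            protospacer = s[pam - spacer_len:pam]
            cut_on_strand = pam - 3  # first base after the cut, in this strand's coordinates
            # Convert minus-strand coordinates back to the forward strand
            cut = cut_on_strand if strand == "+" else n - cut_on_strand
            sites.append((strand, protospacer, cut))
    return sites


# Hypothetical target: 5 bp flank + 20 bp protospacer + TGG PAM + 5 bp flank
target = "AAAAA" + "GACTGACTGACTGACTGACT" + "TGG" + "AAAAA"
print(find_spcas9_sites(target))  # → [('+', 'GACTGACTGACTGACTGACT', 22)]
```

Reporting cut positions on the forward strand makes it straightforward to check whether a break falls close enough to a SNP of interest for NHEJ-induced INDELs or HDR-mediated knock-in to alter it.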
For example, Sokhi et al recently used the CRISPR-Cas9 system to study the TT>A polymorphism on chromosome 6q23.3 mentioned earlier 72 (Figure 4A). Using bacterial artificial chromosomes (BACs), the authors introduced topologically associated 6q23.3 subdomains into wild-type mice and bred these onto a Tnfaip3-null background. The resulting progeny expressed human TNFAIP3 in B cells, T cells, bone marrow-derived macrophages and SFs in a stimulation-responsive manner that recapitulates the human expression pattern. A 789 bp segment that includes the enhancer containing the TT>A variant was deleted from the BAC using the CRISPR-Cas9 system. This resulted in decreased A20 expression in the human HEK293T cell line and in all of the murine cell types mentioned above. Consistent with the phenotype of A20-deficient mice, humanized TNFAIP3-expressing mice harbouring this targeted deletion showed an increased incidence of arthritis. 72
In addition to its conventional wild-type form, Cas9 can be engineered into a catalytically dead protein (dCas9) capable of recruiting chromatin modifiers to specific genomic locations. This enables modulation of chromatin such that the impact of activating or inactivating a non-coding region can be determined. This is particularly useful for studying enhancers, where activation or inactivation can be linked to increased or decreased expression of the genes that the enhancer influences.
Using a fusion of dCas9 and the transcriptional activator VP64, Simeonov et al 73 demonstrated that activation of an intronic enhancer located within IL2RA increases IL2RA expression (Figure 4B). This enhancer contains a SNP associated with Crohn's disease (rs61839660), for which fine mapping has resolved the association such that it is unlikely that any other variant mediates the associated increase in disease susceptibility. 74 Interestingly, the same variant is also associated with thyroiditis susceptibility and is protective for type 1 diabetes. Simeonov et al 73 went on to introduce this SNP into mice using the wild-type CRISPR-Cas9 system and showed that it impaired expression of IL2RA upon T cell stimulation (Figure 4C). In addition, a 12 bp deletion including the Crohn's disease SNP skewed the response of naïve murine T cells to activation. Using this combination of genetic data and functional experimentation, the authors convincingly demonstrated how disease susceptibility is mediated at this locus. In addition to VP64, alternative activator domains with varying potencies and mechanisms are available. 75 It is also possible to fuse dCas9 to the transcriptionally repressive Krüppel-associated box (KRAB) domain. 76 These techniques are often termed CRISPR activation (CRISPRa) and CRISPR interference (CRISPRi), respectively.
The CRISPR-Cas9 system is targeted by a guide RNA (gRNA) that is complementary to the target genomic site. A single gRNA is sufficient for targeting to a single location, and typically only a few gRNAs are used in order to aid the interpretation of their effects. In contrast, CRISPR screens use viral libraries containing thousands of gRNAs to deliver different single guides to different cells in a population. Typically, following transduction with viral libraries, cells are selected based on a phenotype of interest, such as expression of a particular protein (by flow cytometry) or drug resistance. The enrichment of specific gRNAs in the selected population is assessed by high-throughput sequencing and used as a readout of each gRNA's ability to confer the phenotype of interest. Simeonov et al 73 used this screening technique in combination with dCas9-VP64 to identify the IL2RA enhancer mentioned earlier, using flow cytometry to isolate cells expressing high levels of IL2RA (Figure 4D). Further examples include the combination of flow cytometry and wild-type Cas9 to perform saturating mutagenesis of an enhancer of BCL11A, a gene known to regulate foetal haemoglobin levels, 77 and the combination of growth selection and dCas9-KRAB to map enhancers at the MYC and GATA1 loci that regulate cellular proliferation. 78 Where the phenotype, or gene, of interest is not amenable to flow cytometry or growth selection, a reporter system can be used, as exemplified by Klann et al at the HBE1 locus. 79 CRISPR screens have also been successfully combined with scRNA-seq to study the impact of individual guides on the transcriptome more generally. [80][81][82] This approach may be more conducive to studying multiple disease susceptibility loci in a single experiment; however, it raises additional considerations relating to experimental design and challenges in interpreting the resulting data.
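The enrichment readout of a sorting-based screen can be sketched as a comparison of normalized gRNA read counts between the selected and unselected populations. The example below assumes reads-per-million normalization, a pseudocount of 1 and a per-region median of guide-level log2 fold changes; it is a simplified illustration, and dedicated tools (for example MAGeCK) apply more rigorous statistical models. Guide names, region labels and counts are hypothetical.

```python
import math
from statistics import median


def log2_fold_changes(selected, unselected, pseudocount=1.0):
    """log2 ratio of reads per million in the selected vs unselected population."""
    sel_total = sum(selected.values())
    uns_total = sum(unselected.values())
    changes = {}
    for guide in selected:
        sel_rpm = selected[guide] / sel_total * 1e6
        uns_rpm = unselected[guide] / uns_total * 1e6
        changes[guide] = math.log2((sel_rpm + pseudocount) / (uns_rpm + pseudocount))
    return changes


def region_scores(changes, guide_to_region):
    """Median log2 fold change of all guides targeting each region."""
    grouped = {}
    for guide, value in changes.items():
        grouped.setdefault(guide_to_region[guide], []).append(value)
    return {region: median(values) for region, values in grouped.items()}


# Hypothetical read counts from cells sorted for high target expression vs unsorted cells
high = {"g1": 900, "g2": 700, "g3": 100, "g4": 120}
unsorted = {"g1": 200, "g2": 220, "g3": 210, "g4": 190}
targets = {"g1": "candidate_enhancer", "g2": "candidate_enhancer",
           "g3": "non_targeting", "g4": "non_targeting"}
scores = region_scores(log2_fold_changes(high, unsorted), targets)
print(scores)
```

Aggregating several guides per region, as here, reduces the influence of any single guide with unusual efficiency or off-target activity.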

| LIMITATIONS AND CHALLENGES
Identification of causal genes using functional genomics is a major challenge in translating GWA study findings into a clinically applicable understanding of RA. The potential benefits include the identification of new drug targets and better stratification of patients amongst available therapies. As discussed, a number of novel techniques can contribute to answering this question; however, there are also several obstacles that warrant discussion.
FIGURE 5 A suggested workflow for the identification of causal genes for complex genetic disorders such as RA. This workflow assumes that identified variants are non-coding and directly affect the regulation of protein-coding genes in cis
DING AND OROZCO
As illustrated in Figure 1, the median OR for an RA susceptibility locus is 1.11, meaning that, on average, an individual carrying a given RA susceptibility variant has only 11% higher odds of developing RA than an individual who does not carry that variant; because RA is relatively uncommon, this is close to an 11% increase in risk. As discussed with reference to eQTL data sets, the effect of a given disease susceptibility variant is likely to be very small and may only be detectable under the right conditions. This necessitates the use of sensitive techniques in cell types and stimulatory conditions relevant to RA. This obstacle is exacerbated because the effect of a disease susceptibility locus may be mediated by a combination of SNPs in LD and through a combination of protein-coding genes.
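To make the relationship between an OR of 1.11 and the change in risk concrete, the OR can be converted into an absolute risk for carriers once a baseline (non-carrier) risk is assumed. The baseline risks below are illustrative assumptions, not estimates for any population, and the function name is hypothetical.

```python
def risk_from_odds_ratio(baseline_risk, odds_ratio):
    """Absolute risk in carriers, given the risk in non-carriers and an odds ratio."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    carrier_odds = baseline_odds * odds_ratio
    return carrier_odds / (1 + carrier_odds)


# Illustrative baseline risks (assumed, not estimates for any population)
for baseline in (0.01, 0.30):
    carrier = risk_from_odds_ratio(baseline, 1.11)
    print(f"baseline {baseline:.0%}: relative risk = {carrier / baseline:.3f}")
```

For a 1% baseline risk the relative risk is about 1.11, essentially equal to the OR, whereas for a common outcome the two diverge; the approximation therefore holds for RA but should not be assumed for more common traits.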
One way of overcoming the limited effect expected from individual variants is to study the effect of completely disrupting the underlying regulatory elements. The effect of chromatin modifiers such as KRAB is not restricted to a single nucleotide and can span several hundred nucleotides; 76 similarly, whereas a SNP is likely to modulate the activity of an enhancer, a large deletion is likely to abrogate it completely.
The functional interrogation of individual susceptibility loci is laborious and technically challenging and is likely to lack the statistical rigour and reproducibility that have become hallmarks of GWA studies. 83 It is therefore important that experiments are carefully designed, reported and interpreted so that they best contribute to our incrementally improving understanding of RA aetiology. There is, nevertheless, an unmet need for such studies, which will contribute to a better understanding of how non-coding elements, such as enhancers, function in health and disease.

| CONCLUSION
Over the past decade, our ability to confidently describe variants associated with susceptibility to RA and other polygenic disorders has improved dramatically. This includes the imputation of associations for SNPs that were not directly genotyped and high-density fine mapping to refine observed associations. Recently, the focus has shifted to understanding how these variants confer disease susceptibility, a question that has largely emerged from the observation that most polygenic disease susceptibility loci do not have obvious, direct effects on protein-coding genes. Novel techniques and tools, such as chromosomal conformation capture and the CRISPR-Cas9 system, are well suited to investigating the effects of disease susceptibility variants and the underlying non-coding elements. A workflow illustrating the position of the various experimental approaches and uses of data described above is shown in Figure 5. The application of such methods to RA risk loci will contribute further evidence for the role of certain genes, pathways and cell types in RA, as well as identifying new ones. This increased understanding of disease aetiology is expected to aid the stratification of patients for existing therapies and the discovery of new therapeutic targets, ultimately improving the prospects of patients with RA and other polygenic disorders.