CD4+ and B Lymphocyte Expression Quantitative Traits at Rheumatoid Arthritis Risk Loci in Patients With Untreated Early Arthritis

Objective Rheumatoid arthritis (RA) is a genetically complex disease of immune dysregulation. This study sought to gain further insight into the genetic risk mechanisms of RA by conducting an expression quantitative trait locus (eQTL) analysis of confirmed genetic risk loci in CD4+ T cells and B cells from carefully phenotyped patients with early arthritis who were naive to therapeutic immunomodulation. Methods RNA and DNA were isolated from purified B and/or CD4+ T cells obtained from the peripheral blood of 344 patients with early arthritis. Genotyping and global gene expression measurements were carried out using Illumina BeadChip microarrays. Variants in linkage disequilibrium (LD) with non‐HLA RA single‐nucleotide polymorphisms (defined as r2 ≥ 0.8) were analyzed, seeking evidence of cis‐ or trans‐eQTLs according to whether the associated probes were or were not within 4 Mb of these LD blocks. Results Genes subject to cis‐eQTL effects that were common to both CD4+ and B lymphocytes at RA risk loci were FADS1,FADS2,BLK,FCRL3,ORMDL3,PPIL3, and GSDMB. In contrast, those acting on METTL21B,JAZF1,IKZF3, and PADI4 were unique to CD4+ lymphocytes, with the latter candidate risk gene being identified for the first time in this cell subset. B lymphocyte–specific eQTLs for SYNGR1 and CD83 were also found. At the 8p23 BLK–FAM167A locus, adjacent genes were subject to eQTLs whose activity differed markedly between cell types; in particular, the FAM167A effect displayed striking B lymphocyte specificity. No trans‐eQTLs approached experiment‐wide significance, and linear modeling did not identify a significant influence of biologic covariates on cis‐eQTL effect sizes. Conclusion These findings further refine the understanding of candidate causal genes in RA pathogenesis, thus providing an important platform from which downstream functional studies, directed toward particular cell types, may be prioritized.

Rheumatoid arthritis (RA) is a complex genetic disease in which immune tolerance becomes impaired, and an unchecked inflammatory response leads to chronic pain and damage to the synovial joints (1). Genetic variation at the HLA-DRB1, HLA-DPB, and HLA-B loci accounts for a large proportion of the known RA risk (2), with implications for antigen presentation to T lymphocytes (3,4). Outside of the HLA region, accumulating data now highlight an overlap between the 101 confirmed RA risk loci and cell-specific enhancer elements, which is maximal in CD4+ lymphocytes followed by B lymphocytes (5)(6)(7)(8). Such molecular insights support a pivotal role for both CD4+ T cell and B cell lineages in the pathogenesis of RA (9)(10)(11). Mapping cellular mechanisms of genetic risk in the disease is far from straightforward, however, because lead single-nucleotide polymorphisms (SNPs) at associated loci are typically noncoding and intergenic, tagging linkage disequilibrium (LD) blocks that contain multiple genes (7,12).
To prioritize causal genes, one solution is to explore associations between genetic variants and downstream molecular quantitative traits, the most proximal of which is gene expression. Thus, with respect to a putative susceptibility gene, colocalization of an expression quantitative trait locus (eQTL) with a disease risk variant implicates the gene as a candidate for disease causation (13). Data from eQTL studies in healthy human subjects have indeed informed algorithms for prioritization of candidate genes in RA (7). Importantly, however, it is now clear that the transcriptional consequences of genetic variation can manifest as cell type specificity, with potentially profound implications for disease pathogenesis (14,15). For example, it has been observed that only 22% of cis-eQTLs are consistently identified in different circulating cell subsets from healthy donors; eQTLs present in a specific cell type may not be detectable in another cell type or in whole blood-and vice versa. Moreover, a number of eQTLs can be detected only under specific conditions of cell stimulation (14,16,17). This suggests that the contribution of eQTL data to inferred causality among candidate genes for a given disease must increasingly be understood at a cellular level and within a relevant biologic context (18).
The suggestion that the effect size of a risk variant's influence on gene expression may depend on the environmental parameters to which cells are exposed has potentially important implications for understanding the complexities of disease induction. In RA, for example, the unmasking of eQTL effects in relevant cell populations during a transient systemic trigger might plausibly be sufficient to break immune tolerance, permitting a transition to persistent joint inflammation. Against this backdrop, we set out to reassess the biologic landscape of candidate susceptibility genes in RA by mapping cis-and trans-eQTLs at 101 established RA risk loci in circulating CD4+ and B lymphocyte subsets sampled from a cohort of untreated patients with early arthritis. In so doing, we sought insight into potential common and/or cell-specific mechanisms of genetic risk in a highly relevant biologic context, free from the confounding influences of in vivo immune modulation or ex vivo manipulation.

PATIENTS AND METHODS
Patients. Patients with early arthritis (all of selfreported white ethnicity) who were attending the Newcastle Early Arthritis Cohort (NEAC) clinic in the UK were recruited into the study, and peripheral blood samples were obtained prior to the commencement of immunomodulatory therapy; individuals who were exposed to steroid treatment during the 3 months prior to recruitment or those whose ethnic origin, determined by genotype, was not of white Northern European descent were excluded from the analyses. This resulted in 71 patients being recruited between January 2008 and December 2009, and a further 273 during 2012 and 2013; the NEAC cohort has been described in detail elsewhere (19)(20)(21)(22). Initial diagnoses were validated at follow-up visits over a median period of 20 months (range 13-25 months; duration of follow-up >1 year in all cases), as described previously (19,21). All patients gave their written, informed consent for inclusion into the study, which was approved by the local Regional Ethics Committee.
Measurements of gene expression, data curation, and quality control. Whole peripheral blood was stored at room temperature for ≤4 hours before processing. CD4+ lymphocytes were isolated from the peripheral blood by positive selection, as previously described (21), yielding a median cell purity of 98.9%. To obtain B lymphocytes, peripheral blood mononuclear cells were first isolated by density centrifugation using the Lymphoprep protocol (Axis-Shield Diagnostics), and then subjected to positive selection using anti-CD19 magnetic microbeads (Miltenyi Biotec). The median cell purity was 96.4%, as determined by flow cytometry (see Supplementary  Figure 1, available on the Arthritis & Rheumatology web site at http://onlinelibrary.wiley.com/doi/10.1002/art.40393/abstract).
RNA was immediately extracted from total CD4+ T cells or B lymphocytes using an RNeasy Mini kit (prior to 2012) or AllPrep DNA/RNA Mini kit (both from Qiagen), and then subject to quality control using an Agilent 2100 Bioanalyzer (Agilent). The median RNA integrity number in the samples analyzed was 9.4. Complementary RNA generated from 250 ng total RNA (Illumina TotalPrep RNA Amplification kit) was hybridized to either an Illumina Whole Genome 6 version 3 (using CD4+ lymphocyte samples obtained prior to 2012) or a 12HT BeadChip (using CD4+ T cell samples obtained in or after 2012, and all B cell samples) (both from Illumina). The analysis was limited to probes determined to be common to both array platforms, based on unique capture sequence identifiers. Those liable to cross-hybridization according to probe-sequence BLATanalysis were then excluded. 362 THALAYASINGAM ET AL Following normalization (robust spline normalization) and variance stability transformation (23,24), batch correction of the data from CD4+ cells by linear modeling (25), and merging of the component data sets (26), principal components analysis was carried out to confirm correction for technical bias (see Supplementary Figure 2 Genotyping. Genomic DNA was isolated from the peripheral blood of all patients, either from the whole blood using the Wizard genomic DNA purification kit (Promega) (for samples obtained prior to 2012) or from isolated lymphocytes in parallel with RNA extraction (AllPrep DNA/RNA Mini kit; Qiagen). Genotyping was carried out using an Illumina Human CoreExome-24 version 1-0 array, following the manufacturer's protocol. Samples and SNPs with a call rate of <98% were excluded. In addition, SNPs with a minor allele frequency of <0.01 or an Illumina GenomeStudio cluster separation of <0.4 were excluded from further analysis. Data were pre-phased using SHAPEIT2 and imputed to the 1000 Genomes Phase 1, version 3, reference panel using IMPUTE2. Imputed SNPs with INFO scores of <0.8 were excluded.
Analysis of eQTLs and covariates. Analysis of eQTLs was limited to loci defined by the 101 lead disease-associated variants confirmed to be present in Caucasians, as previously described by Okada et al (7). For this analysis, linear models were fitted and residual analysis was performed to verify model assumptions using the R package; Pearson's R 2 statistics and associated P values were derived. Due to abundant cross-hybridization of the expression probes and the confounding effect of copy numbers within the HLA region, we limited our analysis to non-HLA variants. Permutation testing (10,000 permutation replicates) was carried out to derive experiment-wide P values equivalent to a predetermined a value of 5%; a more relaxed (though nonetheless robust) significance threshold was also defined at an a value of 10%. This method, utilized to correct for multiple testing, proved more stringent than the standard Benjamini-Hochberg method, which was also applied for comparison. A general linear model incorporating other potential biologic and clinical parameters, including age, sex, C-reactive protein (CRP) level, and swollen joint count, permitted evaluation of the robustness of the eQTLs in relation to inflammation markers and other potential covariates.
Comparisons with published data sets. Published eQTL data sets were identified using PubMed literature searches. Results were cross-checked and validated with reference to the GTEx Portal database (available at http://gtexportal. org) (27).

RESULTS
Mapping of eQTLs at RA risk loci in lymphocytes of treatment-naive patients with early arthritis. Expression data from primary peripheral blood lymphocytes were available for a total of 344 genotyped patients with early arthritis; available data on CD4+ lymphocytes were  limited to 249 of the patients, data on B lymphocytes were available for 242 of the patients, and paired data were available for 147 of the patients. The baseline clinical characteristics and diagnoses of all patients are summarized in Table 1. After quality control procedures were applied, a total of 1,227 genotyped variants in LD (defined as r 2 ≥ 0.8) with lead RA-associated SNPs were considered. Filtered expression probes whose start sites mapped to within 4 Mb of LD blocks (as defined in Patients and Methods) were initially measured to identify cis-acting eQTLs. In a secondary analysis of trans-eQTLs, those with start sites >4 Mb from the same LD blocks were evaluated in a similar manner.
Permutation testing was carried out using 10,000 permutation replicates for each analysis in each lymphocyte subset. This allowed us to account for multiple testing, in which the total number of tests for each cell type corresponded to the number of unique SNP-gene pairs in the analyses of cis-or trans-acting eQTLs across the prespecified loci, after data processing and quality control had been performed. The maximum value of the test statistic (minimum nominal P value) across the total number of tests in each permutation replicate was recorded, and significance thresholds exceeding 5% or 10% in each permutation replicate were determined. This procedure resulted in experiment-wide P value thresholds (a = 5% or a = 10%) that were used to define evidence of eQTLs in each cell type, as summarized in Figure 1 (for cis-eQTL analyses) and in Supplementary  Figure 3 (for trans-eQTL analyses; available on the Arthritis & Rheumatology web site at http://onlinelibrary. wiley.com/doi/10.1002/art.40393/abstract).

364
THALAYASINGAM ET AL The eQTLs acting on 3 genes (METTL21B, IKZF3, and JAZF1) were unique to CD4+ T lymphocytes in this population, with PADI4 also subject to a convincing effect exclusively in this cell type despite falling marginally short of the a = 10% threshold by permutation analysis; the latter gene encodes a peptidylarginine deiminase enzyme, and therefore is of interest in the pathogenesis of RA (28). At the 8p23 locus, FAM167A was, in contrast to the neighboring BLK gene, shown to be subject to cis regulation only in B lymphocytes, and SYNGR1 and CD83 eQTLs were also specific to this cell type. These data are summarized in Tables 2 and 3 and depicted as Manhattan plots in Figure 2. No trans-eQTLs achieved experiment-wide significance thresholds, either in CD4+ T lymphocytes or in B lymphocytes.
Representative examples of eQTL plots are depicted in Figure 3, and a comprehensive list of all SNP-probe associations that remained significant after Benjamini-Hochberg correction for multiple testing is provided in Supplementary Tables 2 and 3 (available on the Arthritis & Rheumatology web site at http://onlinelib rary.wiley.com/doi/10.1002/art.40393/abstract), in which significance thresholds of a = 5% and a = 10% by . † The permuted significance thresholds of a = 5% and a = 10% equate to P = 6.48 9 10 À7 and P = 1.96 9 10 -6 , respectively (see Figure 1). ‡ Based on a threshold of a = 10%. § Data for the PADI4 eQTL fell marginally short of the a = 10% threshold. . † The permuted significance thresholds of a = 5% and a = 10% equate to P = 5.71 9 10 À7 and 2.19 9 10 À6 , respectively (see Figure 1). ‡ Based on a threshold of a = 10%.  (7). Limiting any or all of the above analyses to samples  for which paired CD4+ and B lymphocytes were available (n = 147) had no substantial effect on the eQTL genes identified, although some associations ceased to reach experiment-wide significance due to diminished power (see Supplementary Tables 5 and 6 (31), and a similar analysis, by Kasela et al, was conducted in whole CD4+ (and CD8+) T cells (32). Another study, by Fairfax et al (14), demonstrated the presence of eQTLs in primary B cells from 288 healthy Europeans. Studies by Dixon and colleagues (33,34) presented cumulative data from Epstein-Barr virustransformed human B cells (lymphoblastoid cell lines), and a large meta-analysis was conducted to compare studies performed in the whole blood of predominantly healthy volunteers (35). Finally, our data were considered in the context of the GTEx resource database (27).
Overlap between the genes subject to cis-eQTLs in these studies compared with those identified in our own study is illustrated in Supplementary Figure 4 (available on the Arthritis & Rheumatology web site at http:// onlinelibrary.wiley.com/doi/10.1002/art.40393/abstract). Reassuringly, all of the cis-eQTL genes identified in patients with untreated early arthritis replicated the findings reported in at least one of the comparator studies. Strong independent validation of CD4+ lymphocyte-specific associations was provided with regard to 9 of the genes. Among these, RA risk loci at 11q12 and at 17q12-21 were each observed to harbor pairs of apparently coregulated genes, FADS1/FADS2 and ORMDL3/GSDMB, respectively. Moreover, at the 17q12-21 locus, IKZF3 was confirmed to be subject to a highly significant eQTL effect in CD4+ lymphocytes (32). When a more lenient (but nonetheless robust) method for multiple test correction was employed, we highlighted, for the first time, an association between PADI4 expression and genotype at the 1p36 locus specifically in CD4+ Tcells, its having previously been identified only in whole blood. Our findings with respect to FAM167A, SYNGR1, and CD83 corroborate those in the only other study of primary human B lymphocytes, by Fairfax et al (14), and although the same eQTLs have been noted in mixed cell populations of whole blood (35), no study (including our own) has yet replicated them in CD4+ T lymphocytes.
Lack of significant impact of clinical covariates on eQTLs. Because differential eQTL effect sizes have been observed in paired CD4+ T cells from healthy donors according to whether T cell receptor-mediated stimulation of the cells was undertaken ex vivo prior to RNA extraction (30), we hypothesized that certain clinical covariates, and/or the activation status of circulating CD4+ T cells, might have a similar influence in vivo. The clinical parameters considered included age, sex, CRP level, and erythrocyte sedimentation rate (as indicators of the systemic acute-phase response), as well as disease phenotype (RA versus non-RA). In the patient subgroup for whom CD4+ lymphocyte expression data were available, normalized transcript levels of CD25, CD69, and interferon-c, as measured by microarray, were also considered as surrogates of the CD4+ T cell activation status. The incorporation of each of these covariates, in turn, into linear models made no difference to the final eQTL list (as shown in Tables 2 and  3), and individual regression slopes were robust to their inclusion (representative examples are depicted in Supplementary Figure 5, available on the Arthritis & Rheumatology web site at http://onlinelibrary.wiley. com/doi/10.1002/art.40393/abstract). Consistent with these findings, lists of genes subject to cis-eQTL effects did not vary substantially when patients with RA and those with alternative diagnoses were considered independently (results not shown). Thus, eQTLs were robust to clinical and biologic covariates in our study, and no evidence of early disease-specific eQTLs at RA risk loci was found.

DISCUSSION
We present the first eQTL analysis of primary lymphocytes from donors presenting with untreated, suspected inflammatory arthritis-a context highly relevant for the purpose of unravelling genetic risk mechanisms in RA. Several important observations can be made on the basis of our findings. CD4+ and B lymphocytes in this setting exhibit distinct but overlapping eQTLs at confirmed RA risk loci (Tables 2 and 3). The specificity of an eQTL effect for one cell type may simply be a reflection of the lack of expression of a gene by a comparator cell, but probelevel microarray data suggest that reported genes were expressed in both CD4+ and B lymphocytes in our study. Therefore, the cell-specific effects that we observed for METTL21B, IKZF3, JAZF1, and PADI4 (in CD4+ lymphocytes) and FAM167A, SYNGR1, and CD83 (in B lymphocytes) may indicate differential regulatory functions of disease risk variants between lineages.
Strikingly, at the common BLK-FAM167A autoimmune locus at 8p23, we found that 2 adjacent genes were subject to eQTLs whose activity differed between cell types: the FAM167A effect displayed robust B lymphocyte specificity and was absent in CD4+ lymphocytes, whereas the BLK effect that was prominent in CD4+ T lymphocytes was less prominent among B lymphocytes (compare Table 2 and Table 3, and see Figure 2). The most strongly associated SNPs differed between cell types at this locus-a finding that was maintained among patients for whom paired cell-specific data were available (as shown in Supplementary Tables 5 and 6, http://onlinelibrary.wiley.com/doi/10.1002/art.40393/abstract), potentially signifying the presence of mechanistically distinct regulatory variants in strong LD. Nonetheless, the results of our study also contribute to an emerging picture in which eQTLs can regulate the expression of more than one gene at disease-associated loci, examples being found at 11q12 and 17q12-21. This is consistent with the concept that key genetic variants may act as "master regulators" of gene expression.
Our findings provide an important platform from which downstream functional studies may be directed toward particular cell types. For example, elucidating the relevance of the METTL21B gene product in CD4+ T cell function would now seem a priority, given our findings confirming a pronounced eQTL effect on this gene in this cell type. Alternative causal candidate genes known, to date, to be favored at the 12q13-q14 locus are CDK4 and CYP27B1, based on their respective functions in cell-cycle progression and vitamin D metabolism (7); however, since neither of these genes were shown to be subject to prominent eQTL effects, despite the growing body of literature discussing their functions, it seems justified to consider METTL21B as an alternative candidate gene in CD4+ T cells.
A similar case for both CD83 and SYNGR1 in B lymphocytes might also be made. CD83 encodes a transmembrane member of the immunoglobulin superfamily expressed widely on dendritic cells, but also on activated lymphocytes; its important role in regulating B lymphocyte development and effector function is only now beginning to be understood (36). SYNGR1 is an integral membrane protein associated with presynaptic vesicles in neuronal cells, and its function in lymphoid cells remains obscure. However, caution should be exercised when interpreting transcript eQTLs in isolation (37), and validation of our findings at the protein level should be prioritized. This was amply illustrated by Simpfendorfer et al, who, similar to our findings at the 8p23 locus, highlighted BLK transcript expression as subject to an eQTL in lymphocytes; however, the CD4+ T cell-specific effect was not sustained at the protein level in these cells. By measuring allelic expression imbalance, those authors went on to demonstrate a robust eQTL for both RNA and protein expression in naive/transitional B cell subsets isolated from umbilical cord blood, which was less evident in whole B cells, suggesting that disease risk is conferred during early B cell development rather than by CD4+ T cells (38), potentially via dysregulated B cell receptor signaling (39).
Our study is the first to provide evidence of an eQTL SNP in CD4+ lymphocytes that was in perfect LD with the RA-associated variant at the 1p36 locus, a variant that regulates PADI4 gene expression. The PADI4 gene has already been recognized as a strong causal candidate for the disease, encoding peptidylarginine deiminase 4, a key enzyme involved in posttranslational citrullination of arginine residues that yields neoepitopes against which RA-specific anticitrullinated peptide autoantibodies may be raised (28). However, distinct mechanisms of CD4+ lymphocyte dysregulation now warrant further investigation (40).
Similarly, the finding that IKZF3 is subject to an eQTL in CD4+ lymphocytes is, to our knowledge, a novel observation and is intriguing, given the proven role of the transcription factor product of this gene in regulating interleukin-10 production by these cells (41).
Conceiveably, our observations with regard to PADI4 and IKZF3 could be interpreted as evidence that putatively common causal SNPs augment gene expression in CD4+ T cells uniquely under the particular biologic and/or environmental circumstances of early arthritis. However, our analysis of interactions between specific biologic covariates and eQTL effects did not support such an interpretation: in particular, the IKZF rs9916765 eQTL slope gradient was unaffected by markers of systemic inflammation, T cell activation, or clinical diagnosis (see Supplementary Figures 5D-F, http://onlinelibrary.wiley.com/doi/10.1002/art.40393/ abstract). While this could be seen as surprising, given the previously reported differences between CD4+ T cell eQTLs according to activation status in vitro (30), the contrastingly cross-sectional nature of our study, which focused on unstimulated ex vivo cells from systemically 368 THALAYASINGAM ET AL inflamed and uninflamed peripheral blood samples, render the findings complimentary rather than contradictory, in our view. Indeed, the fact that eQTL effects did not differ according to disease classification (e.g., RA versus non-RA) in our early arthritis population recalls the findings in a study by Peters et al, whereby inflammatory bowel disease-specific eQTLs resided outside of known risk loci for that condition (42). Further work is therefore needed to elucidate the mechanisms by which eQTL effects may wax or wane at a cellular level within the in vivo environment. Our data extend the understanding of the causal candidate gene landscape in early RA, highlighting several such candidates that now deserve further investigation in defined primary lymphocyte populations. In the future, the possibility that eQTL effects may exhibit heterogeneity between subsets of CD4+ T and/or B lymphocytes should be considered, since these populations are well-known to comprise functionally diverse compartments. Moreover, it is likely that larger integrative studies, including meta-analyses of accumulating lymphocyte eQTL data sets in relevant populations, will be required to expand on this. Such work will have additional value in the identification of trans-eQTL effects, which, because of power considerations, we were not able to address in the present study.