Identification of Novel Associations and Localization of Signals in Idiopathic Inflammatory Myopathies Using Genome‐Wide Imputation

Objective The idiopathic inflammatory myopathies (IIMs) are heterogeneous diseases thought to be initiated by immune activation in genetically predisposed individuals. We imputed variants from the ImmunoChip array using a large reference panel to fine‐map associations and identify novel associations in IIM. Methods We analyzed 2,565 Caucasian IIM patient samples collected through the Myositis Genetics Consortium (MYOGEN) and 10,260 ethnically matched control samples. We imputed 1,648,116 variants from the ImmunoChip array using the Haplotype Reference Consortium panel and conducted association analysis on IIM and clinical and serologic subgroups. Results The HLA locus was consistently the most significantly associated region. Four non‐HLA regions reached genome‐wide significance, SDK2 and LINC00924 (both novel) and STAT4 in the whole IIM cohort, with evidence of independent variants in STAT4, and NAB1 in the polymyositis (PM) subgroup. We also found suggestive evidence of association with loci previously associated with other autoimmune rheumatic diseases (TEC and LTBR). We identified more significant associations than those previously reported in IIM for STAT4 and DGKQ in the total cohort, for NAB1 and FAM167A‐BLK loci in PM, and for CCR5 in inclusion body myositis. We found enrichment of variants among DNase I hypersensitivity sites and histone marks associated with active transcription within blood cells. Conclusion We found novel and strong associations in IIM and PM and localized signals to single genes and immune cell types.


INTRODUCTION
The idiopathic inflammatory myopathies (IIMs) are a heterogeneous group of rare autoimmune diseases primarily characterized by muscle weakness with extramuscular manifestations. The strongest genetic risk for IIM resides in the HLA region, although non-HLA associations have also been reported. To date, the largest genetic association studies have been conducted in Caucasian populations through the Myositis Genetics Consortium (MYOGEN) (1) and subsequent meta-analyses (2).
However, a genome-wide association study (GWAS) in clinically amyopathic dermatomyositis in the Japanese population (3) and candidate gene studies in Japanese and Chinese populations have also identified significant genetic risk factors for IIM (4)(5)(6).
We previously published a genetic association study on 90,536 genetic variants from ImmunoChip, a targeted array containing coverage of 186 established autoimmune susceptibility loci (1). In this follow-up study we re-analyzed the IIM Immuno-Chip data set after imputation of 1,648,116 variants to identify novel associations and to facilitate fine-mapping of risk regions reported in IIM and clinical and serologic subgroups.

PATIENTS AND METHODS
Samples. Caucasian IIM patient samples were collected through MYOGEN (1). IIM samples were included if patients fulfilled probable or definite Bohan and Peter classification criteria (7) for polymyositis (PM), juvenile PM, dermatomyositis (DM), or juvenile DM, and Griggs, European Neuromuscular Centre, or Medical Research Council criteria for inclusion body myositis (IBM) (8). ImmunoChip control data from 12 countries were provided by 4 disease consortia. Anti-Jo-1 (anti-histidyl-transfer RNA) autoantibodies were detected using either immunoprecipitation or line blot using methods described previously (9).
Genotyping and imputation. Analysis was conducted on the existing Illumina ImmunoChip array data consisting of 2,565 IIM patients (1). Clinical subgroup analysis was conducted on PM (n = 903), DM (n = 817), juvenile DM (n = 508), and IBM (n = 252) patients, and patients with anti-Jo-1 autoantibodies (n = 311). Healthy control samples were collected from the same pool of controls but matched with patients 4:1 based on principal components analysis coordinates (9). Variants were imputed with the Michigan Imputation Server using the Haplotype Reference Consortium (HRC) panel version r1.1 2016. The HRC consists of 64,940 haplotypes from individuals of predominantly European ancestry. Poorly imputed genotypes (r 2 < 0.5), single-nucleotide polymorphisms (SNPs) deviating from Hardy-Weinberg equilibrium in controls (P < 0.001), and variants with a low minor allele frequency of <0.01 were removed. After stringent SNP quality control, we analyzed 1,648,116 variants, of which 120,734 were directly genotyped SNPs. Population stratification was assessed by the genomic inflation factor (λ) scaled to 1,000 patients and 1,000 controls. Including the top 3 principal components as covariates was sufficient to control for population differences (λ for 1,000 samples = 1.04).
Statistical analysis. Association analysis was conducted on gene dosages using SNPTest version 2.5.2. Linkage disequilibrium between SNPs was calculated using Plink version 1.90. The first 3 principal components were included as covariates and used in a logistic regression analysis using an additive model. Forward stepwise logistic regression was used to test for independent effects conditional on the variant of interest. Genomewide significance was defined as a P value less than 5 × 10 −8 . Suggestive significance was defined as a P value less than 2.25 × 10 −5 , based on Bonferroni correction for the number of independent haplotype blocks on the original genotyping array. Regions were defined as novel if there was no genome-wide or suggestive evidence of association in previous genetic analyses. Regional association plots were generated using LocusZoom. js (10).
Functional analysis. We reported the functional effect of the most strongly associated SNP in the locus from the dbSNP database's predicted functional effect. GARFIELD (GWAS Analysis of Regulatory and Functional Information Enrichment with Linkage Disequilibrium correction) analysis was used to characterize the cellular and regulatory contribution of the associated variants (11). The "gwas-credible-sets" package implemented in LocusZoom.js was used to calculate 95% credible SNP sets (10), which were annotated with functional information from public databases, including Genotype-Tissue Expression (GTEx) and splicing quantitative trait locus (QTL), and regulatory information from the UCSC Genome Browser.

RESULTS
Regions of interest associated with IIM and clinical subgroups are presented in Table 1 and Figure 1. SNPs are reported if they reached genome-wide significance, or reached suggestive significance and the locus has previous statistical evidence of association in an autoimmune rheumatic disease in the National Human Genome Research Institute-European Bioinformatics Institute GWAS Catalog. All associations reaching a suggestive significance threshold of P < 2.25 × 10 −5 are included in the Supplementary Data, available on the Arthritis & Rheumatology website at https://onlinelibrary.wiley.com/doi/10.1002/art. 42434. A Manhattan plot of the combined IIM analysis is shown in Figure 1. Manhattan plots for subgroup analyses and regional association plots for non-HLA associations are included in Supplementary Figures 1-19 , reached genome-wide significance in the PM subgroup. This is the first time these loci have been found to reach genome-wide significance in IIM. After conditioning on the SNP with the lowest P value in these regions, there were no additional independent effects reaching genome-wide significance. However, an independent intronic variant in STAT4 remained suggestively significant in IIM (rs6752770, P = 6.1 × 10 −6 , OR , along with predicted deleteriousness and functionality using Combined Annotation Dependent Depletion (CADD) and Regulo-meDB, respectively. For the STAT4 and NAB1 regions, a substantial proportion of the 95% posterior probability for credible SNPs could be attributed to a single SNP; rs4853540 (STAT4) maps to an enhancer for STAT1, and rs6733720 (NAB1) is a significant splicing QTL for NAB1 in 26 tissues (P < 1.6 × 10 −6 ) and a significant expression QTL for several genes, including NAB1 and GLS (glutaminase), in multiple tissues (see Supplementary Data).
The LINC00924 locus is a novel association in autoimmune disease, reaching genome-wide significance. Most variants in these regions were imputed, explaining the lack of association in our prior IIM ImmunoChip study, where the original SNP may have been removed during quality control (1). Notably, we also found Figure 1. Manhattan plot of the total idiopathic inflammatory myopathy (n = 2,565) association analysis. The red line represents genome-wide level of significance (P < 5 × 10 −8 ); the blue line represents suggestive significance calculated from prior coverage from the ImmunoChip array (P < 2.25 × 10 −5 ). Single-nucleotide polymorphisms reaching P < 2.25 × 10 −5 that were directly genotyped are colored green to differentiate from imputed variants. For visualization purposes, the y-axis has a cutoff in the HLA region (chromosome 6, 25-35 Mb) of P < 1 × 10 −1 .

15)
HLA-B suggestive evidence of association with TEC and LTBR, which have previously been associated with autoimmune rheumatic diseases (12)(13)(14). Associations with the STAT4 and DGKQ loci were more significant in the current study than in the prior IIM Immuno-Chip study, as were associations previously reported in IIM clinical subgroups, such as NAB1 and FAM167A-BLK loci in PM and CCR5 in IBM (Table 1). We used GARFIELD to assess whether IIM-associated variants are enriched in regulatory elements of specific cell types (11). We found enrichment of variants among DNase I hypersensitivity sites and histone marks associated with active

DISCUSSION
By using imputation to identify and fine-map genetic associations in IIM, we found 3 new genome-wide associations in the combined IIM cohort, STAT4, SDK2, and LINC00924. Conditional analysis revealed evidence of independent associations in the STAT4 region. For the whole IIM group and clinical subgroups, the HLA region was the most significant genetic risk factor. The strongest genetic risk in this region was for the anti-Jo-1 subgroup despite a sample size of only 331 patients. In the PM subgroup, we found 1 novel genome-wide association with NAB1. This is the first time this data set has been used for genome-wide imputation, and the first time this data set has been stratified by adult-and juvenile-onset myositis subgroups and by individuals with anti-Jo-1 autoantibodies.
We defined an associated region by proximity to the nearest gene. For example, in the LINC00924 region, the strongest association was intergenic, lying approximately 150 kb from this long intergenic noncoding RNA, which has been associated with a number of traits such as coronary heart disease and ischemic stroke. However, it may be that these associations are influencing a different gene or regulatory element lying further away from the most associated SNP. It is worth noting that the lead variant in the LINC00924 locus was removed during quality control in our prior ImmunoChip analysis. In other instances, such as in the SDK2 region, the most associated variants lie intronic of the gene. SDK2 is a member of the immunoglobulin superfamily, although the specific function is not yet known. In both instances, there is no previous evidence of genetic association with rheumatic diseases. A limitation of this study is the lack of a replication cohort due to the rarity of IIM, although patient recruitment is ongoing within the MYOGEN consortium.
For some regions, we found suggestive novel associations that have prior evidence of association with autoimmune disease, such as TEC, a tyrosine kinase involved in T cell signalling and activation, and LTBR, a signalling receptor expressed on myeloid cells. In other instances, we strengthened previously observed associations with IIM (STAT4 and DGKQ). In addition, it is interesting to note that we observed a suggestive association with PLCL1 (P = 1.45 × 10 −5 , OR 1.14 [95% CI 1.07-1.21]) in the combined IIM analysis. Variants in PLCL1, a gene encoding the intracellular signalling molecule phospholipase C-like 1, were previously reported in the MYOGEN DM GWAS, although it was not identified in the subsequent ImmunoChip analysis. We have identified more significant associations than were previously reported in IIM clinical subgroups, such as NAB1 and FAM167A-BLK loci in PM and CCR5 in IBM, which also have prior evidence of association with rheumatic disease. Although this study comprised the same cohort as previous studies, this analysis can be viewed as a fine-mapping experiment; imputation from a large reference panel allows better coverage to localize further signals. Indeed, we found that many associations could be localized to single genes or credible SNPs with high posterior probability, likely due to the coverage of coding genes on the ImmunoChip array resulting in high-quality imputation. We note that the ImmunoChip is a targeted array and therefore does not have genome-wide imputation coverage.
Functional analysis of variants reaching suggestive significance in IIM was conducted using GARFIELD. This method uses functional annotation from primary tissues and cell lines from the ENCODE and Roadmap Epigenomics projects. As expected from data generated using the ImmunoChip array, associated variants in IIM were enriched in regulatory elements of blood cells. In particular, the strongest relative enrichment was seen in regions of open chromatin in CD19+ B cells and CD3+ T cells. The role of T and B cells is well known in IIM through muscle immunohistopathology and the presence of autoantibodies, and our findings suggest their contribution to disease pathology may be genetically encoded.
Increasing the number of patient samples in an analysis should statistically strengthen suggestive associations from previous studies. A recent genome-wide meta-analysis of 4 seropositive rheumatic diseases revealed several novel loci, for example, NAB1, DGKQ, and YDJC, where IIM contributed to the association. Using additional IIM patients to those included in the metaanalysis, we were able to identify these associations in IIM only, and in some cases, such as NAB1 in PM, attribute these associations to specific clinical subgroups of IIM. The lead variant rs6733720 in NAB1 is a significant splicing QTL for NAB1 in 26 tissues and an expression QTL for several genes, including NAB1 and GLS, in different tissues. These NAB1 variants are also in moderate linkage disequilibrium (r 2 = 0.66), with a variant previously identified in systemic sclerosis and rheumatoid arthritis, which acts as an expression QTL for NAB1 expression in lymphoblastoid cell lines (2).
Although we found a number of interesting associations in the PM subgroup, one limitation of these findings is the potential heterogeneity within this subgroup. The Bohan and Peter criteria do not differentiate between PM and immune-mediated necrotizing myopathy. In addition, we included patients classified as having PM that have autoantibodies against transfer RNA synthetases, although antisynthetase syndrome is increasingly recognised as a separate entity. Some previous genetic studies in IIM combined adult-and juvenile-onset DM for analysis. This study stratified DM by age of onset to investigate non-HLA associations. However, there do not seem to be strong signals that differentiate these clinical subgroups. Our previous work has shown that stratifying IIM cohorts by autoantibody status may increase power to detect genetic associations within the HLA region (9). Therefore, we also analyzed a subgroup of patients with anti-Jo-1 autoantibodies; however, we did not find any significant associations outside the HLA region. Although the anti-Jo-1 subgroup is thought to be more clinically homogeneous, there may have been a lack of power to detect associations with only 331 patients in the analysis. For this reason, we did not investigate genetic associations with other less common autoantibodies.
To our knowledge, this is the largest genetic association study investigating non-HLA genes in patients with anti-Jo-1 autoantibodies. A recent study targeted a number of SNPs in the IL1B locus in a Mexican cohort of 154 antisynthetase-positive IIM patients and found an association with a synonymous SNP in IL1B (19). We could not replicate this association in our Caucasian cohort (rs1143634, P = 0.1). In the prior ImmunoChip analysis, PTPN22 was the only non-HLA region to reach genome-wide significance; however, in this analysis, PTPN22 did not reach genome-wide significance. This disparity may be due to the more accurate matching of patients to controls, as it is known that there is a wide variation of allele frequency among different European populations of the PTPN22 R620W risk polymorphism (20).
In summary, we used imputation to identify and fine-map genetic associations in IIM. We found 4 new genome-wide associations in IIM and PM, and we found associations reaching suggestive significance that have previously been associated with autoimmune rheumatic disease. These risk variants are functionally enriched in relevant immune cells, expanding our knowledge of the genetic architecture of IIM.