Using Penalised Logistic Regression to Fine Map HLA Variants for Rheumatoid Arthritis


Aruna T. Bansal, Acclarogen Ltd, St John's Innovation Centre, Cowley Road, Cambridge, CB4 0WS, UK. Tel: (+44) 1223 421 662; Fax: (+44) 1223 420 844; E-mail:


Rheumatoid arthritis (RA) is strongly associated with the human leukocyte antigen (HLA) genomic region, most notably with a group of HLA-DRB1 alleles termed the shared epitope (SE). There is also substantial evidence of other risk loci in the HLA region, but refinement has been hampered by extensive linkage disequilibrium (LD). Using genotype imputation, we analysed 6575 RA cases and controls with genotypes at 6180 HLA SNPs; about half the subjects had four-digit DRB1 genotypes. Single-SNP tests revealed hundreds of strong associations across the HLA region, even after adjusting for DRB1. We implemented penalised logistic regression in a multi-SNP association analysis, placing either a double-exponential (DE) or a normal-exponential-gamma (NEG) penalty term on the regression coefficients. The penalised approaches identified sparse sets of SNPs that could collectively explain most of the association with RA over the whole HLA region. The HLA-DPB1 SNP rs3117225 was consistently identified in our analyses and was confirmed by results from the North American Rheumatoid Arthritis Consortium (NARAC) study. We conclude that SNP selection using penalised regression shows a substantial benefit over single-SNP analyses in identifying risk loci in regions of high LD, and that the flexibility of the NEG conveys additional advantages.
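
As a rough illustration of how a DE penalty yields a sparse set of SNPs, the sketch below fits a logistic regression whose coefficients carry an L1 penalty, which is the form the DE penalty takes on the log scale. It is not the software used in this study: the optimiser (proximal gradient descent with soft-thresholding), the function names, the step size, the penalty value and the toy genotypes are all assumptions made for the example, and the NEG penalty is not shown.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    # Proximal operator of the L1 (double-exponential) penalty:
    # shrink towards zero and set small values exactly to zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(genotypes, status, lam, step=0.1, n_iter=500):
    """Hypothetical helper: minimise mean logistic loss + lam * sum(|beta_j|).

    genotypes: list of rows, each a list of allele counts (0, 1 or 2).
    status:    list of 0/1 case-control labels.
    The intercept is left unpenalised.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    intercept = 0.0
    beta = [0.0] * m
    for _ in range(n_iter):
        # Residuals (fitted probability minus observed status) at the
        # current parameter values.
        resid = [
            sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) - y
            for row, y in zip(genotypes, status)
        ]
        # Plain gradient step for the intercept.
        intercept -= step * sum(resid) / n
        # Gradient step followed by soft-thresholding for each SNP effect.
        for j in range(m):
            grad_j = sum(r * row[j] for r, row in zip(resid, genotypes)) / n
            beta[j] = soft_threshold(beta[j] - step * grad_j, step * lam)
    return intercept, beta


# Made-up toy data: SNP 0 tracks disease status, SNP 1 is correlated
# with SNP 0 (mimicking LD), SNP 2 is unrelated noise.
toy_genotypes = [
    [0, 0, 1], [0, 0, 0], [0, 1, 2], [0, 0, 1],
    [2, 2, 0], [2, 1, 1], [2, 2, 2], [2, 2, 1],
]
toy_status = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, beta = fit_l1_logistic(toy_genotypes, toy_status, lam=0.05)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("coefficients:", beta)
print("selected SNP indices:", selected)
```

Raising `lam` shrinks more coefficients to exactly zero, which is what makes the retained set sparse. In a real analysis the penalty would need to be chosen with care, for example by controlling the type-I error rate or by cross-validation.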