Complete HLA genotyping of type 1 diabetes patients and controls from Mali reveals both expected and novel disease associations

HLA genotyping was performed on 99 type 1 diabetes (T1D) patients and 200 controls from Mali. Next‐generation sequencing of the classical HLA‐A, ‐B, ‐C, ‐DRB1, ‐DRB3, ‐DRB4, ‐DRB5, ‐DQA1, ‐DQB1, ‐DPA1, and ‐DPB1 loci revealed strong T1D association for all loci except HLA‐C and ‐DPA1. Class II association is stronger than class I association, with most observed associations predisposing or protective as expected based on previous studies. For example, HLA‐DRB1*03:01, HLA‐DRB1*09:01, and HLA‐DRB1*04:05 predispose for T1D, whereas HLA‐DRB1*15:03 is protective. HLA‐DPB1*04:02 (OR = 12.73, p = 2.92 × 10−05) and HLA‐B*27:05 (OR = 21.36, p = 3.72 × 10−05) appear highly predisposing, although previous studies involving multiple populations have reported HLA‐DPB1*04:02 as T1D‐protective and HLA‐B*27:05 as neutral. This result may reflect the linkage disequilibrium between alleles on the extended HLA‐A*24:02~HLA‐B*27:05~HLA‐C*02:02~HLA‐DRB1*04:05~HLA‐DRB4*01:03~HLA‐DQB1*02:02~HLA‐DQA1*02:01~HLA‐DPB1*04:02~HLA‐DPA1*01:03 haplotype in this population rather than an effect of either allele itself. Individual amino acid (AA) analyses are consistent with most T1D association attributable to HLA class II rather than class I in this data set. AA‐level analyses reveal previously undescribed differences of the HLA‐C locus from the HLA‐A and HLA‐B loci, with more polymorphic positions, spanning a larger portion of the gene. This may reflect additional mechanisms for HLA‐C to influence T1D risk, for example, through expression differences or through its role as the dominant ligand for killer cell immunoglobulin‐like receptors (KIR). Comparison of these data to those from larger studies and on other populations may facilitate T1D prediction and help elucidate elusive mechanisms of how HLA contributes to T1D risk and autoimmunity.

K E Y W O R D S autoimmunity, disease association, KIR, linkage disequilibrium, Mali, type 1 diabetes

| INTRODUCTION
The HLA region of the human genome is well-established as providing the primary contribution to type 1 diabetes (T1D) susceptibility, with strongest contributions generally attributable to alleles in the HLA-DQB1 and HLA-DRB1 loci. 1 However, despite nearly 50 years of study, the exact basis of genetic susceptibility to T1D remains unclear.Both the extensive polymorphism of the HLA loci and the extent of linkage disequilibrium (LD) in the HLA region contribute to the difficulty of attributing T1D risk to individual HLA alleles.Studying HLA-associated T1D risk across multiple populations, where HLA allele distributions and haplotypic combinations vary, is an effective approach to revealing the contribution of individual alleles.Most studies of HLA-associated T1D risk have been performed on subjects of European descent, although the number of reports based on non-European cohorts is rising.This report presents comprehensive HLA association data in a cohort of T1D patients and non-diabetic controls from Mali, where the incidence of T1D is low but rising quickly. 2 It builds on our previous report of DRB1 association in this population. 3Only a few reports exist regarding HLA in the Malian population.5][6][7][8] For the study reported here, NGS-based whole gene sequencing data were generated for all classical HLA loci in a cohort of 99 T1D patients and 200 non-diabetic controls.In-depth analyses were performed at the allele, haplotype, and amino acid levels.

| Subjects
Subjects from Mali were enrolled in the study with approval from relevant ethics committees in Australia, Mali, and California, as previously described. 3The design for the genetics portion of the study of diabetes in Malian youth included 100 T1D patients and 200 non-diabetic controls.Of 132 T1D patients enrolled in the overall study, the first 100 were selected for the HLA genotyping analysis.Subsequently, one patient was excluded due to a revised diagnosis of type 2, rather than type 1, diabetes.A total of 200 non-diabetic control subjects, unrelated to each other or to the T1D patients, were collected for the genetic study.Samples were stabilized with DNAgard ® blood and saliva stabilizing reagents (BioMatrica, San Diego, CA) prior to shipping as previously reported. 32 | HLA genotyping DNA was prepared using QIAamp ® blood kits (QIAGEN, Germantown, MD) according to manufacturer's instructions, with the modification that the DNA elution buffer was pre-heated to 70 C and the elution columns were incubated at 70 C for 10 min prior to centrifugation.DNA samples were sent to GenDx (Utrecht, The Netherlands) for HLA genotyping using the NGSgo ® -MX11-3 system.Each sample was sequenced once.Genotypes were generated for classical HLA loci (HLA-A, -B, -C, -DRB1, -DRB3, -DRB4, -DRB5, -DQA1, -DQB1, -DPA1, and -DPB1).Genotype data were generated at the four-field level, which includes silent codon changes, introns, and untranslated sequences, using GenDx NGSengine ® 2.23.1.23474and IPD-IMGT/HLA Database release version 3.45.1.A newer release of the database (v.3.53.0)was subsequently used to test potential novel sequences.For disease association analyses, HLA genotype data were truncated to two fields, which is sufficient to define the protein product.Note that although the secondary, expressed HLA-DRB loci, HLA-DRB3, -DRB4, and -DRB5 are separate loci and are genotyped separately, a maximum of one of these loci is included on any individual chromosome.Because of this, analyses of these loci were performed as if they were alleles of a single locus, termed "HLA-DRB3/4/5" (HLA-DRB3, -DRB4, -DRB5, and "null" when no secondary, expressed HLA-DR is present, as expected for chromosomes containing HLA-DRB1*01, -DRB1*08, and -DRB1*10 alleles).Notation for these data is the HLA-DRB locus number plus the two-field allele notation, for example, HLA-DRB4*01:03 and can be notated in the HLA-DRB3/4/5 category as 4*01:03.When only zero or one HLA-DRB3/4/5 allele was present in a genotype, determined on the basis of DRB haplotype patterns identified by Andersson, 9 the absent "allele" was denoted as HLA-DRB3/4/5*00:00.

| Statistical analyses
BIGDAWG 10 version 3.0.6 was used to perform locus-level (kx2) chi-squared (χ 2 ) tests of locus-level heterogeneity and allele-level (2 Â 2) χ 2 tests of association within each locus for 99 T1D individuals and 197 control individuals at the HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DRB3/4/5, HLA-DQA1, HLA-DQB1, HLA-DPA1, and HLA-DPB1 loci at two-field (peptide level) resolution.Tests of locuslevel heterogeneity were corrected for multiple comparisons by dividing the p-values by the number of loci (9), and applying an adjusted threshold of significance of 5.56 Â 10 À03 .For allele-level tests of association in which the locus-level test of heterogeneity was not significant, significance thresholds were corrected by dividing the p-values by the number of allele categories tested.BIG-DAWG combines the set of alleles with expected counts <5 in either T1D individuals or control individuals that represent less than 20% of χ 2 contingency table cells into a common "binned" category for analysis, as the χ 2 test is invalid when >20% of contingency table cells have expected values <5. 11yPop (v0.8.0) 12 was used to test Hardy-Weinberg equilibrium (HWE) proportions of HLA genotypes at all loci in T1D and control individuals.Locus-level HWE deviations were tested using Guo and Thompson's exact method. 13Chen's method 14,15 was used to identify individual genotypes deviating significantly from HWE, with a threshold for significance of 0.05.

| RESULTS
Two hundred control and 99 patient samples were assayed for 11 classical HLA loci (see Section 2).Overall genotyping success rate was 96.1% (5172 successful calls from a possible total of 5382).Genotyping success rates for individual HLA loci are listed in Supplemental Table S1 and ranged from a low of 93.3% for the DQB1 locus to a high of 98.3% for the HLA-A and HLA-B loci.
The HLA region was strongly associated with T1D in this data set, with seven of nine tested loci reaching statistical significance.As shown in Table 1, neither HLA-C nor HLA-DPA1 reached significance for the corrected threshold of p = 0.0055 (0.05/9 loci tested).All other loci did exhibit locus-level significance, with significance ranging from p = 0.0033 (HLA-DPB1) to p = 3.17 Â 10 À12 for the DQA1 locus.Association analyses for the DRB1 and DQB1 loci are presented in Tables 2 and 3 Alleles in the HLA region are subject to strong linkage disequilibrium (LD), and disease associations may be observed for a neutral allele simply due to its LD with a risk allele.In addition, observed disease associations may be attributable to a combination of alleles at different loci (haplotype), rather than to an allele at a single locus.To examine those possibilities, haplotypes were estimated and tested for T1D association.  5.
The data show that the strongest T1D association in this data set is for haplotypes that include the HLA-DRB1*04:05 allele.Association for HLA-DRB1*04:05 by itself is very strong (OR = 14.96, p = 1.73 Â 10 À10 , Table 2); however, considering HLA-DRB1*04:05 in a haplotype with HLA-DRB4 and -DQ-encoding alleles increases the OR even higher, to 19.14, for the haplotype HLA-DRB1*04:05$HLA-DRB4*01:03$HLA-DQB1*02:02$HLA-DQA1*03:03.Strong protection was seen for the allele HLA-DRB1*15:03 (OR = 0.36, p = 0.009, Table 2).For this allele, adding the context of other HLA class II alleles makes very little difference to the observed association, presumably because the LD for the haplotype HLA-DRB1*15:03$HLA-DRB5*01:01$HLA-DQB1*06:02$HLA-DQA1*01:02 is so strong that nearly all chromosomes that have one of these alleles have the others as well.In other words, the source of the protection is difficult to determine, because all or any subset of these alleles show a similar level of association.
In some cases, individual amino acid (AA) residues can have a strong effect on HLA-associated disease susceptibility that spans groups of HLA alleles, for example, the strong effect on T1D risk of codon position 57 in HLA-DQB1 reported in 1988. 16,17T1D associations were examined at the level of individual amino acids for all loci in these data.The analysis queried 396 polymorphic AA positions, with sufficient variation for analysis, across 11 classical HLA loci and produced 899 association results, which can be seen in Table S5.Most polymorphic positions had 2 variants, and those significantly associated with T1D generally had one AA variant with a protective OR and the other with a predisposing OR.Some positions, however, had more than two possible amino acid residues at a given position, with several positions having as many as 6 different possible amino acids.Overall, like the allele-level data, the AA-level data show more association in class II loci, particularly HLA-DRB1 and -DQB1, than in class I.In fact, HLA class I did not exhibit significant T1D association at the AA level.Figure 1 shows a diagrammatic representation of the polymorphic amino acid positions in the HLA class II loci HLA-DRB1, -DQA1, and -DQB1 and their significance to T1D risk.Significantly-associated HLA class II AA residues and their positions are shown in Table 6.Many amino acids for each locus demonstrate significant T1D association, including position 57 of the HLA-DQB1 locus, which is represented by the red dot at position number 72 near the top of Figure 1.Of the 133 AA positions that were queried across the loci encoding the classical HLA class II antigens HLA-DRB1, HLA-DQB1, and HLA-DQA1, 16 (12%) were located in the signal peptide regions.These may be particularly attractive candidates for further study as they may affect expression of the protein rather than the peptide repertoire.

| DISCUSSION
9][20][21] The data revealed some expected similarities with HLA-associated T1D risk studies from other populations such as positive association with HLA-DRB1*03:01, -DRB1*04:05, and -DRB1*09:01 alleles and negative association with HLA-DRB1*15:03, a member of the group of alleles commonly referred to as "DR2."][24][25][26] Inconsistency of the strongly positive disease association for HLA-DPB1*04:02 reported here with previous reports lends support to the notion that the association simply reflects LD with a different risk locus on the chromosome rather than an effect of the allele itself.Others have argued that all observed DPB1 associations are likely to simply reflect LD with other risk factors. 27This result underscores the need for interpreting HLA and T1D risk results in the context of other loci and across populations.(Table S3).9][30][31][32] We note here that, although the distribution of estimated nine-locus haplotypes in the data set is too sparse to allow formal disease association analysis, the extended haplotype Regardless of whether the observed T1D associations with HLA-DPB1*04:02 and HLA-B*27:05 are direct or indirect effects (attributable to LD), diagnostic, polymorphic single nucleotide polymorphisms (SNPs) from these alleles could be incorporated in the design of a simple, population-specific diagnostic tool to aid T1D risk prediction or differential diagnosis of diabetes type in Malian individuals.

HLA locus
Full HLA genotyping is much too cumbersome and expensive to be practical as a widespread predictive and diagnostic tool.To create a more practical test, SNP panels have been developed that include selected SNPs in the HLA region and additional SNPs in multiple non-HLA polymorphic loci that have much smaller effects on T1D susceptibility. 33Data from these panels create what is known as a genetic risk score (GRS).5][36][37][38] Genetic diversity among populations, especially in the HLA region, makes the creation of a "one size fits all" T1D predictive test difficult.Incorporation of population-specific results like those observed for HLA-DPB1*04:02 and -B*27:05 in this study could refine and increase the predictive value of a predictive test by informing design of an assay specifically targeted to the population being tested without the need for full HLA genotyping.Commonly, protein variants are defined by one or a small number of variations in the gene sequence that may lead to one (or several) amino acid differences in the resulting peptides encoded by the gene.The HLA region is the most highly polymorphic region of the human genome, and most loci have a large number of polymorphisms, leading to the many thousands of alleles identified to date.While most association studies are based on full, named alleles, one alternate method is to individually examine each polymorphic amino acid position within the genes.Table S5 shows all variants from the 11 classical HLA loci genotyped in this study.After correction for multiple comparisons, all significantly associated positions are limited to class II loci.Figure 1 shows associated positions and Table 6 lists significantly-associated variants.
None of the potential T1D associations for class I polymorphic positions survive correction for multiple comparisons; they are not shown in Figure 1 or listed in Table 6.However, examination of all the polymorphic amino acid positions in the class I loci (Table S5) reveals qualitative differences in the HLA-C locus, which generally exhibits little or no association with T1D, compared to the HLA-A and HLA-B loci.Before correction, HLA-C has approximately twice as many nominally-significant amino acid associations ( p ≤ 0.05) than HLA-A or HLA-B (A = 12, B = 13, C = 24).Not surprisingly, the majority of the associated positions in HLA-A (10 of 12) and HLA-B (13 of 13) are located in exons 2 and 3, which are the exons that encode the peptide-binding groove of the HLA antigen.In contrast, only 7 of 24 polymorphic amino acid positions for HLA-C are in exons 2 and 3, with 17 located in other exons, especially exon 5 (n = 9), which encodes the transmembrane portion of the protein.Notably, one of the polymorphic positions revealed as significant before correction (p(nc) = 0.008), HLA-C position 309, was reported over a decade ago to play an important role in affecting Natural Killer (NK) cell function. 39These data may suggest that the role of the HLA-C locus in T1D risk could differ from that of the other classical HLA antigens, perhaps by virtue of interaction with the innate immune system.HLA-C is the dominant ligand for Killer cell immunoglobulin-like receptors (KIR) on natural killer (NK) cells. 40HLA-C can be categorized into two groups that bind different sets of KIR and direct NK function. 41,424][45][46][47][48][49] No clear picture has yet emerged to elucidate the role of HLA-C interaction with KIR in the context of T1D risk.The extreme polymorphism of both the HLA and the KIR regions of the genome, as well as the expense of genotyping these loci, combine to make these studies difficult.Effects of HLA and KIR interaction depend on both the group(s) of the HLA-C alleles present and the particular KIR expressed in a given individual, greatly increasing the complexity of analyses and, potentially, requiring large data sets due to reduced statistical power.However, the HLA-C data reported here provide a good rationale for further study of HLA-C in larger data sets.
This study provided a good overview of HLA and T1D risk in an understudied population.It demonstrated a strong association between HLA class II alleles and T1D and provided potential diagnostic targets to aid T1D prediction and diagnosis.It suggested that HLA class I may contribute to T1D risk not only by classical antigen presentation to the adaptive immune system but also by interaction with the innate immune system.As higherresolution data become available from more populations, comparisons among the study results may uncover previously elusive mechanisms of how HLA confers strong risk for T1D.In addition, expanded and more comprehensive data should lead to efficient, cost-effective screening tools for T1D prediction and differential diagnosis.

AUTHOR CONTRIBUTIONS
Stéphane Besançon and Assa Traore Sidibé provided samples and clinical data for the study.Graham D. Ogle organized the collaboration and the procurement of samples for genotyping.Jannetje Kooij and Maaike Rijkers coordinated the genotyping.Fereshte Dadkhodaie, Helma de Bruin, and Harper R. N. Martin generated experimental data.Erik H. Rozemuller performed and coordinated analysis of the primary data.Steven J. Mack designed and performed all association analyses of the genotyping data and co-wrote the manuscript.Janelle A. Noble conceived and coordinated the project, analyzed primary data, and wrote the manuscript.No artificial intelligence systems were applied in the writing of the paper or for the work described.
Alleles in HLA loci other than HLA-DRB1 and HLA-DQB1 that are significantly associated* with T1D.
T A B L E 4Note: Total n = total numbers of alleles from control individuals and T1D individuals for a given HLA locus.Abbreviations: 95% CI, 95% confidence interval; DRB3/4/5, DRB3, DRB4, and DRB5, which are separate loci but segregate and are analyzed as if they were alleles of a single locus; fcase, frequency of alleles in T1D individuals; fcontrol, frequency of alleles in control individuals; n, number; OR, odds ratio.*HLA-A,-B,-DRB3/4/5, -DQA1, and -DPB1 were all significantly T1D associated at the locus level (see Table1); thus, significance threshold is p ≤ 0.05.HLA-C did not reach statistical significance at the overall locus level for T1D association.Significance for HLA-C is 0.05/17 ≤ 0.0029.HLA-DPA1 did not reach statistical significance at the overall locus level for T1D association.Significance for HLA-DPA1 is 0.05/6 ≤ 0.008.