Deciphering the fine nucleotide diversity of full HLA class I and class II genes in a well‐documented population from sub‐Saharan Africa

With the aim to understand how next‐generation sequencing (NGS) improves both our assessment of genetic variation within populations and our knowledge on HLA molecular evolution, we sequenced and analysed 8 HLA loci in a well‐documented population from sub‐Saharan Africa (Mandenka). The results of full‐gene NGS‐MiSeq sequencing compared with those obtained by traditional typing techniques or limited sequencing strategies showed that segregating sites located outside exon 2 are crucial to describe not only class I but also class II population diversity. A comprehensive analysis of exons 2, 3, 4 and 5 nucleotide diversity at the 8 HLA loci revealed remarkable differences among these gene regions, notably a greater variation concentrated in the antigen recognition sites of class I exons 3 and some class II exons 2, likely associated with their peptide‐presentation function, a lower diversity of HLA‐C exon 3, possibly related to its role as a KIR ligand, and a peculiar molecular diversity of HLA‐A exon 2, revealing demographic signals. Based on full‐length HLA sequences, we also propose that the most frequent DRB1 allele in the studied population, DRB1*13:04, emerged from an allelic conversion involving 3 potential alleles as donors and DRB1*11:02:01 as recipient. Finally, our analysis revealed a high occurrence of the DRB1*13:04‐DQA1*05:05:01‐DQB1*03:19 haplotype, possibly resulting from a selective sweep due to protection to Onchorcerca volvulus, a prevalent pathogen in West Africa. This study unveils highly relevant information on the molecular evolution of HLA genes in relation to their immune function, calling for similar analyses in other populations living in contrasting environments.


PCR-SSO typings of class I and II genes
HLA class II (DRB1, DQB1 and DPB1) typings were performed by locus-and group-specific PCR amplification, followed by direct hybridization with sequence-specific oligonucleotide (SSO) probes on nylon filters. HLA-DRB1 and -DQB1 SSO probes were designed 3,4 in order to discriminate the alleles assigned by the HLA Nomenclature Committee in 1991.
For HLA class I (A, B, C) typings, samples were first tested in 1996 by direct PCR-SSO hybridization, using locally designed probes 5,6,7 . In 2001 new HLA-A and -B typings were performed using the reverse PCR-SSO hybridization protocol of the 13th International Histocompatibility and Immunogenetics Workshop (IHIW, the reverse line strip system having been specifically developed for the IHIW Anthropology component) that comprised 139 probes able to discriminate 341 alleles.
Tagging and multiplexing methods designed by Galan et al 8 for Roche NGS-454 sequencing were used. For this high-throughput sequencing, adaptors are required for the emPCR and 454 GS-FLX pyrosquencing using Lib-L Titanium Series reagents. Thus, the primers used (Table S1) were modified by adding a 30 bp Titanium adaptor to the 5'-end (CCATCTCATCCCTGCGTGTCTCCGACTCAG) and a short sequence of 7-mer oligonucleotidic tag. A total of 32 forward and 21 reverse tags were designed in order to allow the generation of 182 unique combinations of forward and reverse primers and to sequence as many individually tagged amplicons at the same time. Three sequencing runs were performed.  Due to primer design, exon 2 was sequenced from position 5 to position 245 for DRB1, and from 17 to 258 for DQB1; for DQA1 the sequences encompass the last 50bp of intron 1 (7 different haplotypes) to position 217 (lacking the last 32pb of the exon); for DPB1, the sequences encompass the last 19 bp of intron 1, the complete exon 2 and the first 2 bp of intron 2 (but no variability was observed in the intronic regions).
PCR preparations were performed in 96-well plates, under a sterile hood to avoid contamination by other DNA. For every plate, a well was filled with water and PCR mix to check the absence of contamination after amplification. The fusion-primers were diluted in de-ionized contamination-free water in order to have a concentration of 0,5 µM of each primer. Amplifications were made in a 10µl reaction volume containing 5 µl of QIAGEN Multiplex Kit Buffer (QIAGEN), 2 µl of de-ionized DNAand RNA-free water, 1 µl of each of the diluted primers and 1 µl of DNA. The thermocycling conditions used for the amplification began with 1 cycle of 15 min at 95°C followed by 40 cycles of 20 sec at 95°C (denaturation step), 45 sec at 55°C (annealing step) and 60 sec at 72°C (extension). A last step at 72°C was done at the end during 10 minutes.
After amplification, PCR products were checked on a 2% agarose gel using a loading dye (Promega) or directly on a 2% E-gel (E-gel 96 2% with SYBR safe, Invitrogen). A total of 4µl (more for samples with low amplification signals) of each PCR product of the entire plate were then collected in a single tube. Plates were mixed two by two in singles tubes. To ensure the same amplification between the tubes, amplifications were tested on a 2% agarose gel. Finally, the tubes were mixed together, the quantity of each depending on their amplification signals. To control that the amplification between the 4 genes was the same, products of the 4 tubes were run on a 2% agarose gel. The final 4 tubes were sent to Beckman Coulter Genomics to be sequenced.
Sequences were delivered as Fasta and FastQ files. Reads were filtered using Mothur 9 with a minimal PhredScore of 30. These data were explored using SESAME Barcode 10 . SESAME software processes the DNA sequences obtained through next-generation sequencing and allows the identification of multiplexed samples by the tags and a reference sequence.

NGS-MiSeq sequencing of full class I and class II genes
Two complementary techniques were used to sequence the full HLA genes at both the Geneva Hospital and Stanford University.

Geneva Hospital
We used the Holotype HLA X2 kit (Omixon Biocomputing Ltd, Budapest, Hungary) in combination with the Illumina MiSeq platform to type 58 individuals for 7 HLA loci (A, B, C, DRB1, DQA1, DQB1 and DPB1). Long-range PCR amplification was done for each locus. HLA-A, B, C, DQA1 and DQB1 were amplified over their entire length, from 5'UTR to 3'UTR (DQB1 needed 2 different amplifications). HLA-DRB1 was amplified from intron 1 to intron 4 and HLA-DPB1 from intron 1 to 3'UTR. The primers' mixes were provided by Omixon (Holotype HLA X2 Kit). The dNTP (10mM), the LongRange PCR Buffer (10x) and the LongRange PCR Enzyme Mix (LongRange PCR kit, QIAGEN) were used for the amplification. The amplicon sizes were verified through 2% agarose gel electrophoresis. The amplicons were quantified by qPCR, using the Quantifluor dsDNA System (Promega). The samples were then diluted to reach approximately the same final concentration and all the loci of each individual were pooled. The library preparation was done using the Holotype HLA X2 kit (Omixon), which provides buffers and enzymes for the whole process. The amplicons generated by the long range PCR were first enzymatically fragmented. The shared ends produced by this enzyme were repaired and the adaptors were ligated to the ends of the fragments. A final pool was created, containing fragments of every locus for all the individuals. A clean-up and concentration step was performed using AMPure XP Beads (Beckman Coulter). A size selection of the pool was done using a Pippin Prep (Sage Science) in order to select a range of DNA fragment sizes. The library was finally quantified by qPCR, using the reagents from the Library Quantification Kit -Illumina/universal (KAPA biosystems). The denatured library was loaded at 9pM in the MiSeq. The FASTQ files generated by the MiSeq were processed by the software HLA Twin v1.1.1 (Omixon).

Stanford University
A total of 65 DNA samples were typed for 8 HLA loci (A, B, C, DPA1, DPB1, DQA1, DQB1, DRB1) using a NGS typing method developed by Sirona Genomics (Immucor, Inc, Norcross, GA, USA) and performed following the manufacturer's instructions. Long range PCR amplified the entire genes of all class I loci (5'UTR to 3'UTR) and key regions of class II loci: DPA1: 5'UTR to intron 4; DPB1: intron 1 to intron 4; DQA1: 5'UTR to intron 4; DQB1: 5'UTR to intron 5. The PCR amplification reaction contained 100 ng of genomic DNA, Sirona PCR master mix containing enzymes and primers specific for each HLA locus. The thermal cycling parameters for all genes were: initial denaturation 94°C/2 min, followed by 35 cycles at 94°C/30 sec, 60°C/30 sec, 66°C/7 min 30 sec, followed by a final extension step at 66°C for 10 min. PCR were performed using Veriti Thermal Cyclers (Applied Biosystems/Thermo Fisher Scientific, Waltham, MA, USA). PCR products were quantified using a PicoGreen assay (Invitrogen/Thermo Fisher Scientific, Waltham, MA, USA) with a Victor X plate reader (Perkin Elmer, Waltham, MA, USA). PCR products for all genes were pooled in optimal molar amounts, and purified using Agencourt AMPure XP beads (Beckman Coulter, Fullerton, CA). Barcoded sample libraries were prepared by enzymatic cleavage into 300-500 bp fragments, purification using Agencourt AMPure beads, followed by enzymatic end repair to remove dNTPs overhangs, incorporation of deoxynucleotide dAMP to blunt ended 3' ends, followed by ligation of a unique index 'barcode' adaptor to each pooled sample. All adaptor ligated samples were pooled into a single tube, purified using Agencourt AMPure XP beads and DNA fragments were size selected for 400-500 bp fragments using the HLA alleles were assigned using the Sirona Genomics NGS alignment software, which uses two complementary informatics strategies to analyse to make genotyping calls. The first strategy utilizes Expectation Maximization to rank computed allele candidates based on mapping metrics. Coverage is calculated from competitive alignment of paired-end NGS reads with all HLA reference sequences in the IMGT/HLA database v3.22 11 (http://www.ebi.ac.uk/imgt/hla) and reference sequences generated 'internally' by Sirona/Immucor. The second strategy utilizes a dynamic phasing algorithm to assemble reads and construct one or two phased consensus sequences by de novo assembly of mapped paired-end sequences. These consensus sequences are then aligned to the HLA allele database to find the best fit.