HLA‐G whole gene amplification reveals linkage disequilibrium between the HLA‐G 3′UTR and coding sequence

Polymorphic sites in the HLA‐G gene may influence the expression and function of the protein. Knowledge of the association between high‐resolution HLA‐G alleles and 3′ untranslated region (3′UTR) haplotypes is useful for studies on the role of HLA‐G in transplantation, pregnancy, and cancer. We developed a next‐generation sequencing (NGS)‐based typing assay that enables full phasing over the whole HLA‐G gene sequence, including the 3′UTR region. DNA from 171 mother‐child pairs (342 samples) was studied for: (a) HLA‐G allele information by the NGSgo‐AmpX HLA‐G assay, (b) 3′UTR haplotype information by an in‐house developed sequence‐based typing method of a 699/713 base pair region in the 3′UTR, and (c) the fully phased HLA‐G gene sequence, by combining primers from both assays. Mother‐to‐child inheritance allowed internal verification of newly identified alleles and of the association between coding and UTR regions. The NGSgo workflow, compatible with Illumina platforms, was employed, and data were interpreted using NGSengine software. In 99.4% of all alleles analyzed, the extended typing was consistent with the separate allele and 3′UTR typing methods. After repeated analysis of the four samples that showed a discrepancy, consistency reached 100%. A high linkage disequilibrium between IPD‐IMGT/HLA Database‐defined HLA‐G alleles and the extended 3′UTR region was identified (D′ = 0.994, P < .0001). Strong associations were found particularly between HLA‐G*01:04 and UTR‐3, between HLA‐G*01:01:03 and UTR‐7, and between HLA‐G*01:03:01 and UTR‐5 (for all: r = 1). Six novel HLA‐G alleles and three novel 3′UTR haplotype variants were identified, of which three and one, respectively, were verified in the offspring.


| BACKGROUND
The human leukocyte antigen-G (HLA-G) is a non-classical major histocompatibility complex class Ib glycoprotein with immunoregulatory properties. HLA-G was first described in the placenta. 1 In a healthy pregnancy, the maternal immune system must accept the semiallogeneic fetal tissue. HLA-G is expressed by fetal extravillous trophoblasts in the placenta, where it is thought to reinforce the immune tolerance of maternal immune cells toward the fetus. 2,3 Low levels of soluble HLA-G have been associated with pregnancy complications. 4-7 HLA-G may also be involved in mechanisms of immune tolerance and allograft acceptance after transplantation, 8,9 and in immune escape by tumors. 9 In contrast to the classical HLA genes (HLA-A, -B, -C), HLA-G is far less polymorphic. An extensive study has been published on HLA-G gene structure and haplotype diversity in populations worldwide. 10 Genetic variation in both the coding sequence and the 3′ untranslated region (3′UTR) of the HLA-G gene may affect the level and function of the gene product. Certain HLA-G genotype variants are associated with soluble HLA-G (sHLA-G) levels in plasma 11 and with the functionality of the protein. 12 The 3′UTR is targeted by microRNAs, 13 which can reduce expression at the mRNA and/or protein level. Polymorphisms in the 3′UTR, such as the one at +3142, 14 may influence the efficiency of microRNA binding and, consequently, the level of HLA-G expression. The 14-bp insertion/deletion and +3187A polymorphisms influence the stability of the HLA-G mRNA and the expression of HLA-G. 15-17 Castelli and colleagues composed 3′UTR haplotypes on the basis of eight polymorphisms in the HLA-G 3′UTR. 18 Several studies have shown that the 3′UTR haplotype is related to (soluble) HLA-G levels. 19,20
Next-generation sequencing (NGS) methods have made it easier to obtain nucleotide sequence information for the full HLA-G gene. 21,22 Because of our interest in the possible functional consequences of HLA-G DNA variants, we performed extended sequencing to obtain information on combinations of the allele sequence and the 3′UTR of HLA-G. For this, we studied mother-child pairs from a population mostly of European ancestry. Because one haplotype of each child is inherited from the mother, this family design increases the reliability with which novel alleles and associations between the coding sequence and the UTR can be identified.
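Because a 3′UTR haplotype is defined by the alleles found together on one chromosome at a fixed set of polymorphic sites, assigning a haplotype from phased sequence data amounts to a table lookup. The Python sketch below illustrates this under loudly stated assumptions: it uses only three of the sites mentioned above (the 14-bp insertion/deletion, +3142, and +3187) rather than the eight of the published scheme, and the haplotype names and allele definitions (`HAP-A` to `HAP-C`) are hypothetical placeholders, not the published UTR-1, UTR-2, ... definitions. The function name `assign_haplotype` is likewise our own.

```python
from typing import Dict, Optional, Tuple

# Illustrative subset of 3'UTR sites (the published scheme uses eight sites).
SITES: Tuple[str, ...] = ("14bp_indel", "+3142", "+3187")

# HYPOTHETICAL haplotype definitions: name -> allele at each site in SITES.
# These are placeholders, not the published UTR-1, UTR-2, ... definitions.
HAPLOTYPE_DEFINITIONS: Dict[str, Tuple[str, ...]] = {
    "HAP-A": ("del", "C", "G"),
    "HAP-B": ("ins", "G", "A"),
    "HAP-C": ("del", "G", "A"),
}


def assign_haplotype(phased_alleles: Dict[str, str]) -> Optional[str]:
    """Return the haplotype matching all sites on one chromosome, or None.

    `phased_alleles` must describe a single chromosome (phased data);
    mixing alleles from the two homologous chromosomes gives wrong answers.
    """
    observed = tuple(phased_alleles[site] for site in SITES)
    for name, definition in HAPLOTYPE_DEFINITIONS.items():
        if definition == observed:
            return name
    return None  # no exact match: candidate novel 3'UTR haplotype variant


if __name__ == "__main__":
    print(assign_haplotype({"14bp_indel": "ins", "+3142": "G", "+3187": "A"}))
```

An observation that matches no definition is returned as `None`, which is how a candidate novel 3′UTR haplotype variant would surface in a scheme like this; the lookup is only meaningful when the alleles at all sites are known to lie on the same chromosome.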

| Cohort
For an initial screen, 218 mother-child pairs were selected from a Norwegian pregnancy biobank after informed consent, as described in a previous publication. 23 The study was approved by the Regional Committee for Medical and Health Research Ethics in South-Eastern Norway and was performed in accordance with the principles of the Declaration of Helsinki. DNA was isolated from maternal and fetal EDTA plasma samples. 23 Of the 218 mother-child pairs included, 171 complete pairs were eventually available for which typing of the HLA-G coding sequence was obtained.

| Polymerase chain reaction (PCR)
PCRs were carried out in a total volume of 15 μL containing 12.5 ng of genomic DNA, 0.3 pmol/μL of forward primers, 0.3 pmol/μL of reverse primers, 0.04 U of GoTaq (Promega), 0.2 mM dNTPs, 4 mM MgCl₂, 15 mM (NH₄)₂SO₄, 50 mM Tris-HCl, 0.05 mM EDTA, 0.01% gelatin, and 1 mM β-mercaptoethanol. The PCR program consisted of an initial 2 minutes at 94 °C, followed by 10 cycles of 15 seconds at 94 °C and 45 seconds at 68 °C, and then by 25 cycles of 15 seconds at 94 °C, 45 seconds at 64 °C, and 30 seconds at 72 °C. The program ended with 3 minutes at 72 °C. For enzymatic cleanup, 15 μL of PCR product was treated with 2 μL of ExoSAP-IT (Applied Biosystems, Ottawa, Canada), and reactions were incubated for 30 minutes at 37 °C and then for 20 minutes at 80 °C.
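The cycling program above can be written down as a small data structure, which makes it easy to check the total programmed time before a run. This Python sketch is illustrative only: the `Stage` class and the helper names are our own, and the computed time covers the programmed holds only, not instrument-dependent ramping or the ExoSAP-IT incubation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    """One block of a PCR program: (temperature °C, hold seconds) steps repeated `cycles` times."""
    steps: List[Tuple[float, int]]
    cycles: int = 1

    def hold_seconds(self) -> int:
        """Total programmed hold time of this stage, in seconds."""
        return self.cycles * sum(seconds for _, seconds in self.steps)


# The program as described in the text.
PROGRAM: List[Stage] = [
    Stage([(94, 120)]),                                # initial denaturation, 2 min
    Stage([(94, 15), (68, 45)], cycles=10),            # 10 two-step cycles
    Stage([(94, 15), (64, 45), (72, 30)], cycles=25),  # 25 three-step cycles
    Stage([(72, 180)]),                                # final extension, 3 min
]


def total_hold_seconds(program: List[Stage]) -> int:
    """Sum of all programmed holds (ramp times are not included)."""
    return sum(stage.hold_seconds() for stage in program)


def amplification_cycles(program: List[Stage]) -> int:
    """Count cycles in multi-step (cycling) stages."""
    return sum(stage.cycles for stage in program if len(stage.steps) > 1)


if __name__ == "__main__":
    seconds = total_hold_seconds(PROGRAM)
    print(f"{amplification_cycles(PROGRAM)} cycles, {seconds} s ({seconds / 60:.1f} min)")
```

For the program as described, the holds add up to 120 + 10 × 60 + 25 × 90 + 180 = 3150 seconds (52.5 minutes) over 35 amplification cycles; actual run time on a given thermocycler will be longer because of ramping.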
For simultaneous assessment of HLA-G alleles and 3′UTR haplotypes, we performed extended typing, in which the forward primer from the NGSgo-AmpX assay was combined with the reverse primer from the in-house developed SBT assay (see Figure 1). The technical protocol was similar to the NGS methodology used to determine the HLA-G alleles.
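One way to see why the combined primer pair is needed is to check which amplicon covers a given position. The sketch below uses the amplicon boundaries stated in the Figure 1 caption and the 3′UTR positions mentioned in this article (+3142, +3187, +3227); the dictionary and function names are hypothetical, and we assume inclusive boundaries in the gene-relative numbering used in the text.

```python
from typing import Dict, List, Tuple

# Amplicon boundaries from the Figure 1 caption (inclusive, gene-relative numbering).
AMPLICONS: Dict[str, Tuple[int, int]] = {
    "NGSgo-AmpX": (-57, 2727),
    "3'UTR SBT": (2940, 3559),
    "extended": (-57, 3559),  # AmpX forward primer + SBT reverse primer
}


def covering_amplicons(position: int) -> List[str]:
    """Names of the amplicons whose span includes `position`."""
    return [name for name, (start, end) in AMPLICONS.items() if start <= position <= end]


if __name__ == "__main__":
    for pos in (100, 2800, 3142, 3187, 3227):
        print(f"+{pos}: {covering_amplicons(pos)}")
```

Positions in the 3′UTR such as +3142 fall outside the NGSgo-AmpX amplicon, and positions between +2727 and +2940 are covered only by the extended amplicon. Only a single molecule spanning both regions links a coding allele and a 3′UTR haplotype on the same chromosome, which is what the separate assays cannot provide.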

| Linkage disequilibrium
To calculate the overall linkage disequilibrium (LD) between the HLA-G coding sequence and the 3′UTR, PyPop 0.7.0 software (California) 26 was used. The delta (D), relative delta (D′), and P value were used as measures of the strength of association. D′ ranges from 0 to 1, with 1 indicating maximal association. A positive D in combination with a D′ approaching 1 and a significant P value denotes high LD. For LD between individual pairs of HLA-G alleles and 3′UTR haplotypes, D was calculated together with the Pearson correlation coefficient (r) to determine the strength of the association.
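The LD measures named above can be illustrated with a short Python sketch that works on phased haplotypes, each chromosome given as a pair of coding allele and 3′UTR haplotype. It uses Lewontin's normalization for the pairwise D′ and, for the overall multi-allelic value, Hedrick's weighted average of |D′| over all allele pairs. We assume this corresponds to the overall D′ reported by PyPop, but the sketch is not PyPop code, does not compute a P value, and its function names are our own.

```python
import math
from collections import Counter
from typing import List, Tuple

Haplotype = Tuple[str, str]  # (HLA-G allele, 3'UTR haplotype) on one chromosome


def pairwise_ld(haplotypes: List[Haplotype], a: str, b: str) -> Tuple[float, float, float]:
    """Return (D, D', r) for allele `a` at locus 1 and haplotype `b` at locus 2."""
    n = len(haplotypes)
    p_a = sum(1 for x, _ in haplotypes if x == a) / n
    p_b = sum(1 for _, y in haplotypes if y == b) / n
    p_ab = sum(1 for x, y in haplotypes if x == a and y == b) / n
    d = p_ab - p_a * p_b
    # Lewontin's normalization: divide by the largest |D| possible given p_a and p_b.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # Pearson correlation between the two allele-presence indicators.
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    r = d / denom if denom > 0 else 0.0
    return d, d_prime, r


def global_d_prime(haplotypes: List[Haplotype]) -> float:
    """Hedrick's overall D': sum of p_i * q_j * |D'_ij| over all allele pairs."""
    n = len(haplotypes)
    freq_1 = Counter(x for x, _ in haplotypes)
    freq_2 = Counter(y for _, y in haplotypes)
    total = 0.0
    for a, count_a in freq_1.items():
        for b, count_b in freq_2.items():
            _, d_prime, _ = pairwise_ld(haplotypes, a, b)
            total += (count_a / n) * (count_b / n) * abs(d_prime)
    return total
```

For example, with toy data of three chromosomes carrying a~x, one a~y, and four b~y, `pairwise_ld` gives D = 0.1875 and D′ = 1 for the pair (a, x), but r = √0.6 ≈ 0.77, because allele a also occurs with haplotype y. This is why r = 1, as reported for individual allele/UTR pairs, is a stricter criterion than D′ = 1: it requires that each allele occurs only with its partner haplotype and vice versa.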

| RESULTS
For 171 mother-child pairs, the HLA-G coding sequence, the HLA-G 3′UTR, and the extended HLA-G region were all evaluated. For the remaining 47 pairs, at least one of these assessments could not be adequately performed, mostly because of an insufficient amount of DNA. Because the cohort consisted of family samples, the 171 complete mother-child pairs provided an opportunity to check the reliability of the extended typing strategy and the robustness of newly found alleles, provided that the allele in question was inherited by the child (that is, not derived from the father).
For haplotype frequencies, we report only information from the mothers. The frequencies of the combined HLA-G coding sequence and 3′UTR haplotypes for the 342 available maternal chromosomes are summarized in Table 1. We found nine new genotype variants of HLA-G: six new HLA-G alleles and three new 3′UTR haplotype variants (using IPD-IMGT/HLA Database version 3.34). Of these, four HLA-G alleles and one 3′UTR haplotype variant could be confirmed in the offspring because they were inherited from mother to child. The sequences of the six new HLA-G alleles have been registered in GenBank and the IPD-IMGT/HLA Database, and their official sequence names, assigned by the WHO Nomenclature Committee, have been incorporated in Table 1.
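Tabulating combined haplotype frequencies from maternal chromosomes is a simple count. The Python sketch below, with a hypothetical function name and toy input rather than study data, joins each 3′UTR haplotype and HLA-G allele with `~` (the notation used in this article), counts the combinations, and reports them as percentages of all chromosomes, sorted from most to least frequent as described for Table 1.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def haplotype_frequencies(chromosomes: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, float]]:
    """Return (haplotype, count, percent of chromosomes), most frequent first.

    Each chromosome is a (3'UTR haplotype, HLA-G allele) pair.
    """
    labels = [f"{utr}~{allele}" for utr, allele in chromosomes]
    counts = Counter(labels)
    n = len(labels)
    return [(label, count, 100.0 * count / n) for label, count in counts.most_common()]


if __name__ == "__main__":
    # Toy input (two mothers, two chromosomes each), NOT study data.
    toy = [("UTR-1", "G*01:01"), ("UTR-3", "G*01:04"),
           ("UTR-1", "G*01:01"), ("UTR-5", "G*01:03:01")]
    for label, count, pct in haplotype_frequencies(toy):
        print(f"{label}\t{count}\t{pct:.1f}%")
```

Only maternal chromosomes are passed in, so each mother contributes exactly two entries (342 chromosomes for 171 mothers); including the children as well would count maternally transmitted haplotypes twice.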
For 684 chromosomes (342 maternal, 342 child), we could assess the consistency between the extended HLA-G sequencing approach and the separate sequencing reactions that targeted either the coding sequence or the 3′UTR region. Consistent results were found for 680 alleles (99.4%). Four discrepancies were encountered. In two cases, one in a maternal and one in a child DNA sample, extended sequencing gave a homozygous typing, whereas the separate allele typing and 3′UTR typing both gave a heterozygous result. In a third sample (mother), extended typing gave G*01:01:05, whereas allele typing gave G*01:04:01:01. In a fourth sample (child), extended typing of a non-maternally inherited allele gave UTR-4~G*01:01:05, whereas the separate typing methods gave UTR-5/G*01:01:01:05. Extended typing was performed again on the four DNA samples showing discrepancies between the technical approaches, and this time the sequencing results matched between the extended sequencing approach and the two separate sequencing assays.
LD analysis was performed, and a strong overall association was found between the HLA-G coding sequence and the 3′UTR (D = 0.066, D′ = 0.994, P < .0001). As shown in Table 2, particularly strong associations were found between HLA-G*01:04 and UTR-3, between HLA-G*01:01:03 and UTR-7, and between HLA-G*01:03:01 and UTR-5 (for each pair: positive D value and r = 1).
Figure 1. Full phasing over the whole HLA-G gene sequence by extended next-generation sequencing. HLA-G allele typing was acquired by an NGSgo-AmpX assay on an amplicon that starts at −57 and ends at +2727. A separate, in-house developed sequencing assay targeted the 3′UTR region and generated an amplicon between +2940 and +3559. For simultaneous assessment of the HLA-G allele and 3′UTR haplotype, the forward primer (starting at −57) from the NGSgo-AmpX assay was combined with the reverse primer (at +3559) from the in-house developed sequencing assay.
Table 1 note: HLA-G haplotypes from 342 maternal chromosomes are listed in order from highest to lowest frequency. The frequencies in the last column represent percentages of the total number of chromosomes analyzed. ᵃ Haplotypes were composed on the basis of individual polymorphisms in the 3′UTR region, according to the approach described previously. 18 UTR-18 is similar to UTR-6, except for one position at +3227. 25
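Because the separate coding-sequence and 3′UTR assays are unphased, the consistency check described above can compare only the multisets of alleles and UTR haplotypes implied by the phased extended typing. A minimal Python sketch, with hypothetical names, assuming that both methods report names at the same resolution and that a homozygous extended call is recorded as the same haplotype twice:

```python
from collections import Counter
from typing import List, Tuple

Haplotype = Tuple[str, str]  # (3'UTR haplotype, HLA-G allele), phased


def is_concordant(extended: List[Haplotype], alleles: List[str], utrs: List[str]) -> bool:
    """Compare phased extended typing with the two unphased separate typings.

    extended: the two haplotypes from extended typing; a homozygous call is
              given as the same haplotype twice.
    alleles:  the two HLA-G alleles from coding-sequence typing.
    utrs:     the two 3'UTR haplotypes from 3'UTR typing.
    Only allele and UTR multisets are compared, because the separate
    typings carry no phase information.
    """
    ext_utrs = Counter(utr for utr, _ in extended)
    ext_alleles = Counter(allele for _, allele in extended)
    return ext_utrs == Counter(utrs) and ext_alleles == Counter(alleles)


def percent(consistent: int, total: int) -> float:
    """Consistency as a percentage."""
    return 100.0 * consistent / total


if __name__ == "__main__":
    # Toy sample: extended typing missed one haplotype (homozygous call).
    extended = [("UTR-1", "G*01:01"), ("UTR-1", "G*01:01")]
    print(is_concordant(extended, ["G*01:01", "G*01:04"], ["UTR-1", "UTR-3"]))  # False
    print(f"{percent(680, 684):.1f}%")  # 99.4%
```

A homozygous extended call in a sample that both separate assays type as heterozygous, as in the first two discrepancies, fails this check. Note that the check cannot detect a wrong phase when both multisets agree; phase itself is what the mother-child inheritance helps to confirm.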

| DISCUSSION
We performed sequencing on mother-child pairs to obtain the fully phased whole HLA-G gene sequence, including information on both the allele and the 3′UTR region. In 99.4% of all alleles analyzed, the extended typing method gave results consistent with the separate allele and 3′UTR typing methods. Strong associations were found between HLA-G*01:04 and UTR-3, between HLA-G*01:01:03 and UTR-7, and between HLA-G*01:03:01 and UTR-5. We identified nine novel HLA-G variants (in either the coding sequence or the 3′UTR), five of which could be verified in the offspring.
A particularly strong association was found between the HLA-G*01:04 allele and UTR-3. Interestingly, this combination has been related to low levels of soluble HLA-G in the serum of lung transplant recipients and to adverse transplant outcome. 27 In contrast to G*01:01 and G*01:03, G*01:04 exhibits an altered peptide anchor motif and, when expressed on cells, confers increased protection against cytotoxicity by natural killer cells. 12
Table 2. Associations between pairs of HLA-G alleles and 3′UTR haplotypes
Recently, an NGS-based approach was published to obtain information on the full HLA-G gene, including the 5′UTR, the coding sequence, and the 3′UTR. 22 In the current study a similar approach was followed, but we had the advantage of investigating mother-child combinations, which allowed us to internally validate several novel HLA-G alleles and 3′UTR haplotype variants. Furthermore, we could validate the consistency of the extended typing method against the separately typed coding and 3′UTR sequences. A limitation of our study is that the 5′UTR region was not included in the analysis. The current frequencies in the Norwegian population for UTR-1~HLA-G*01:01 (34.2%) and UTR-3~HLA-G*01:04 (11.4%) are comparable to those described in a French population (33.6% and 15.5%, respectively). 19 In other populations of European ancestry, the frequency of both UTR-3 and HLA-G*01:04 was 8.5%. 10 It should be noted, however, that the prevalence of combined allele/UTR haplotypes, as described in our study, may differ in other ethnic groups.
The four discrepancies that we encountered between the extended typing results and the typing of the separate regions are probably attributable to incidental imperfections in the extended typing technique. The first two cases may be due to an allele missed by extended typing because of a failure in PCR amplification. Cases three and four may be due to contamination or carryover between reactions. Repeating the extended typing on the four DNA samples gave results compatible with those of the two separate sequencing assays.
We used the mother-child pairs to identify novel alleles in the mother and to verify them in the child. Obviously, this was possible only in cases where there was considerable confidence that the particular allele was inherited from mother to child. In 18.7% of mother-child pairs, the child was homozygous or the mother and child had the same genotype, which complicated full segregation analysis. Furthermore, we did not have genetic information from the fathers.
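The segregation logic above can be sketched as a small classification of each mother-child pair. This Python sketch treats each combined allele/UTR haplotype as an opaque label (here, hypothetical labels H1, H2, ...). Following the text, pairs in which the child is homozygous or has the same genotype as the mother are classified as ambiguous; a pair with no shared haplotype is flagged as inconsistent; otherwise the child haplotype shared with the mother is taken as maternal. The function name and status labels are our own.

```python
from typing import Optional, Sequence, Tuple


def maternal_transmission(
    mother: Sequence[str], child: Sequence[str]
) -> Tuple[str, Optional[str], Optional[str]]:
    """Classify a mother-child pair of combined HLA-G haplotypes.

    Returns (status, maternal, paternal); status is "informative",
    "ambiguous" or "inconsistent". Haplotypes are only split for
    informative pairs.
    """
    shared = [h for h in child if h in mother]
    if not shared:
        # No haplotype in common: e.g. a typing error or a sample mix-up.
        return ("inconsistent", None, None)
    c1, c2 = child
    if c1 == c2 or set(child) == set(mother):
        # Homozygous child, or child with the same genotype as the mother.
        return ("ambiguous", None, None)
    # Exactly one child haplotype is present in the mother: that one is maternal.
    maternal = shared[0]
    paternal = c2 if maternal == c1 else c1
    return ("informative", maternal, paternal)


if __name__ == "__main__":
    print(maternal_transmission(["H1", "H2"], ["H2", "H3"]))  # ('informative', 'H2', 'H3')
    print(maternal_transmission(["H1", "H2"], ["H1", "H2"]))  # ('ambiguous', None, None)
```

Under this scheme, a novel allele seen in a mother would count as verified only when the pair is informative and the maternally inherited haplotype carries that allele; without paternal typing, the paternal haplotype is simply inferred as the remaining one.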
The sequencing strategy described here for HLA-G may be translated to the classical HLA genes. The functional relevance of polymorphisms in the coding alleles of classical HLA genes is widely acknowledged, and complementary UTR sequence information could provide an extra layer of information on the expression level and functionality of the HLA molecules. Indeed, genetic variation in the UTR of HLA-A is associated with expression. 28 Variation in the 3′UTR of HLA-C affects cell surface expression of the gene product and, consequently, viral control. 29,30 The expression level of HLA-C also influences immunogenicity in incompatible transplantation settings. 31 Increasing knowledge of genetic variation in the UTRs of both non-classical and classical HLA genes will require agreement on appropriate nomenclature and inclusion of these regions in the IPD-IMGT/HLA Database.
In conclusion, we performed extended sequencing of the HLA-G gene and described both combinations of coding-sequence alleles with 3′UTR haplotypes and novel genotype variants of the HLA-G gene.