Chronic hepatitis B (CHB) is a major global health issue, particularly in some developing countries. It can lead to cirrhosis, hepatic failure, and hepatocellular carcinoma. According to a national epidemiological survey for hepatitis B by the Chinese Ministry of Health in 2006, the hepatitis B surface antigen (HBsAg) seropositive rate was 7.18% for the general population aged between 1 and 59 years (http://www.chinacdc.cn/dcbg/200804/t20080423_34870.htm), whereas in Guangdong province, where our study was conducted, the seropositive rate was 15.46% for the same-age population as indicated by a survey in 2009.1 In the search for genetic variants predisposing to CHB, numerous candidate gene studies have been performed. Based on the “common-variant common-disease” hypothesis,2 two genome-wide association studies have been conducted and reported common variants in the HLA-DP and HLA-DQ loci predisposing to CHB.3, 4 However, the possible roles of rare genetic variants (minor allele frequency <0.05) remain undescribed. Next-generation sequencing technology, which is progressing rapidly, has proven an effective way to interrogate the whole exome (exome sequencing) or the whole genome (whole genome sequencing) and to uncover rare variants.5 In this study, exome sequencing was performed in a discovery group of patients and controls. Candidate genetic variants were selected and their association with CHB was tested in a case-control study. The results were further examined by structural analyses of the mutant proteins. Gene expression studies were also performed for one associated gene, transmembrane protein 2.
Chronic hepatitis B (CHB) is a major global health issue. The role of rare genetic variants in CHB has not been elucidated. We aimed to identify rare allelic variants predisposing to CHB. We performed exome sequencing in 50 CHB patients who had no identifiable risk factors for CHB and 40 controls who were healthy and hepatitis B surface antibody-positive, but had never received hepatitis B vaccination. We selected six rare variant alleles and followed up their association with disease status by Sanger sequencing in a case-control study comprising 1,728 CHB patients and 1,636 healthy controls. The latter had either not been immunized with hepatitis B vaccine or had uncertain vaccination status. Our results showed that transmembrane protein 2 p.Ser1254Asn, interferon alpha 2 p.Ala120Thr, its regulator NLR family member X1 p.Arg707Cys, and complement component 2 p.Glu318Asp were associated with CHB, with P values of <1.0 × 10⁻⁷, 2.76 × 10⁻⁵, 5.08 × 10⁻⁵, and 2.78 × 10⁻⁴ and odds ratios (ORs) of 2.45, 4.08, 2.34, and 1.97, respectively. The combined P value was <2.0 × 10⁻¹⁶. As there has been no indication of immunological functions for the associated gene, transmembrane protein 2, we further studied its expression by immunohistochemistry, real-time polymerase chain reaction, and western blotting. Our results showed that it was strongly expressed by healthy hepatocytes, but its expression was reduced in liver tissues with CHB and in hepatitis B viral (HBV) genome-containing HepG2.2.15 cells, as compared with healthy liver tissues and non-HBV genome-containing HepG2 cells (P = 0.022 and 0.0036, respectively). Conclusion: We identified four missense mutations associated with CHB. Our results provide evidence for rare inborn genetic defects that contribute to increased host susceptibility to CHB. (HEPATOLOGY 2012;56:1661–1670)
Materials and Methods
Subjects in the Case-Control Study.
Diagnosis of CHB was based on seropositivity for HBsAg and persistent or recurrent elevated levels of serum alanine aminotransferase (ALT) for longer than 6 months, as defined by the criteria of the Chinese Society of Hepatology and Chinese Society of Infectious Diseases, the Chinese Medical Association.6 In all, 1,728 CHB patients were recruited from the Department of Infectious Diseases, 3rd Affiliated Hospital, Sun Yat-sen University between May 2008 and March 2011. Those who were coinfected with hepatitis A, hepatitis C, or hepatitis E viruses were excluded. Subjects who were pregnant, were infected with human immunodeficiency virus (HIV), were alcohol or drug abusers, or had autoimmune diseases were also excluded. All patients were unrelated and of Chinese Han ethnicity. Controls were recruited from volunteers in the same city with the same ethnicity and in the same time period. Controls were required to be healthy, as confirmed by medical examination (normal liver enzymes, negative HBsAg and hepatitis B e antigen [HBeAg], and normal liver ultrasound), and to have had no previous hepatitis B virus (HBV) immunization or an uncertain vaccination history. All controls were unrelated. This yielded 1,636 control subjects (Supporting Table 1).
In our association study, candidate genetic variants were first confirmed by Sanger sequencing and then examined in 500 cases and 500 controls drawn at random from the 1,728 cases and 1,636 controls. We proceeded to analysis of the whole cohort only when a P value ≤0.05 or an odds ratio (OR) ≥1.5 was obtained in this initial test of 500 cases versus 500 controls. Variants meeting neither criterion were discarded.
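The two-stage screening rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `allelic_odds_ratio` and `passes_screen` are hypothetical, and the odds ratio computed here is the simple unadjusted allelic estimate with a Wald 95% confidence interval, not the sex- and age-adjusted logistic regression estimate used in the study.

```python
from math import exp, log, sqrt


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other):
    """Unadjusted odds ratio and Wald 95% CI from a 2x2 table of allele counts."""
    cells = [case_risk, case_other, ctrl_risk, ctrl_other]
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero.
    if any(count == 0 for count in cells):
        cells = [count + 0.5 for count in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - 1.96 * se_log_or)
    upper = exp(log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, (lower, upper)


def passes_screen(p_value, odds_ratio, p_cut=0.05, or_cut=1.5):
    """A variant goes forward if it meets EITHER screening criterion."""
    return p_value <= p_cut or odds_ratio >= or_cut
```

With hypothetical counts of 30 risk alleles among 1,000 case chromosomes and 15 among 1,000 control chromosomes, `allelic_odds_ratio(30, 970, 15, 985)` gives an OR of about 2.03, so the variant would pass the OR ≥1.5 criterion whatever its P value.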
Subjects in the Discovery Group for Exome Sequencing and Candidate Selection.
This group comprised 50 CHB patients and 40 controls (not included in the case-control study of 1,728 cases and 1,636 controls). Exome sequencing was performed in this group to identify rare sequence variants and to select candidate variants for the case-control study. To maximize our chance of discovering variants contributing to CHB susceptibility, we applied the “extreme phenotype comparison” approach5 in addition to the inclusion criteria mentioned above. We hypothesized that patients without identifiable common risk factors for CHB might be regarded as “susceptible” individuals. We therefore selected 50 CHB patients who had no history of mother-to-child transmission, blood transfusion, administration of blood products, unsafe injection, or an HBsAg-positive sexual partner. We also hypothesized that, given the very high HBsAg seropositive rate in this province,1 individuals who had evidence of previous exposure to HBV, but had remained healthy, might be classifiable as “resistant” individuals who could be used as controls. From among them we selected 40 controls who were seropositive for HBsAb, but had no hepatitis B vaccination history and were healthy, as confirmed by annual medical examination for the last 5 consecutive years (Supporting Table 2).
Written informed consent was obtained from all participants. The project was approved by the Ethics Committee of the University for Human Study and was conducted according to the principles of the 1975 Declaration of Helsinki.
Exome Sequencing and Candidate Selection in the Discovery Group
Exome sequences were captured with the NimbleGen 2.1M array targeting 34 Mb of the human genome, containing 180,000 coding exons and 551 miRNA genes (http://www.nimblegen.com). The enriched library was sequenced on the Illumina HiSeq 2000. Sequencing reads were aligned to NCBI build 36.3. After removal of duplicate reads, the average sequencing depth per sample was 43×. Of the targeted bases, 93.54% had coverage of ≥8× and a genotype quality score ≥30. Single nucleotide variations (SNVs) were identified (“called”) by SOAPsnp (http://soap.genomics.org.cn/index.html) and SAMtools (http://samtools.sourceforge.net). Small insertions and deletions (indels) were called by the programs Dindel, Mpileup group, and Mpileup individual (http://www.sanger.ac.uk/resources/software/dindel/). Variants were annotated with information from the Consensus Coding Sequences Database at the NCBI.
In selecting candidate genetic variants, we modified various existing protocols for Mendelian gene discovery.7 Rare variants were identified by comparing their frequencies with the Chinese Han data in HapMap (August 2010 release) (http://hapmap.ncbi.nlm.nih.gov/) and with our in-house data. Variations absent from both the HapMap Chinese Han data and our in-house data were treated as rare variants, as HapMap already included data from 248 Chinese Han subjects and our in-house data covered another 100 subjects. We reasoned that the following criteria might provide the most effective approach: (1) rare variants that were not shared by sequenced cases and controls and that appeared more frequently (i.e., gave higher “call counts”); (2) variants predicted to be deleterious to gene function (e.g., truncating or missense mutations affecting highly conserved amino acids, and/or variants expected to be damaging according to PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/) or SIFT (http://sift.jcvi.org/www/SIFT_chr_coords_submit.html)); (3) variants in genes with known antiviral/immune functions. Using these criteria, we performed two rounds of selection. In the first round we focused on rare variants that appeared more frequently and were predicted to be more deleterious, regardless of their known functions. In the second round we focused on the known functions of the genes in combination with call counts. We first selected genes involved in immunity by comparing the genes carrying the variants with two databases, Gene Ontology (http://wiki.geneontology.org/index.php/Immunologically_Important_Genes) and Ensembl (http://www.ensembl.org/). We then further selected those with functions more likely to be involved in CHB, on the basis of existing knowledge, and with higher call counts. No indels passed these selection criteria. The processes for SNVs are detailed in Supporting Fig. 1A,B and related footnotes.
To ensure accuracy, all genotypes were determined by Sanger sequencing on the ABI 3730XL DNA analyzer, using a BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA) (primer sequences available on request). To identify whether there was subpopulation structure, five markers whose allele frequencies differ between the northern and southern Han Chinese subpopulations8 were tested in 600 cases and 600 controls randomly taken from the cohort. These were typed by TaqMan assay (Applied Biosystems) (primer and probe sequences available on request).
Tests for Association, Effect Size Estimation.
Logistic regression was used to examine the association between the SNVs and CHB status with adjustment for sex and age. In the model, an SNV is entered as an explanatory variable, coded as 0, 1, or 2 for the number of copies of the minor allele in the SNV genotype, and case-control status is coded as the dichotomous (1, 0) response variable. In addition to P values based on asymptotic theory, the adaptive permutation option of PLINK9 (with a maximum of 10,000,000 permutations per single nucleotide polymorphism [SNP]) was used to calculate empirical P values in the logistic regression model. To examine the cumulative effects of the four loci, we collapsed the four SNVs into one explanatory variable by counting the total number of risk alleles, as identified in the individual locus analysis (actual range 0-3), in subjects who had complete genotype data for the four loci. The resulting data were analyzed as a 4 × 2 table with Fisher's exact test, and also by logistic regression analysis of CHB status against the number of risk alleles adjusted for age and sex, using R.
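The additive genotype coding and the collapsed risk-allele count described above can be illustrated with a short sketch. It is only a sketch under stated assumptions: the function names, the data layout (a genotype as a pair of allele letters), and the allele letters themselves are hypothetical placeholders, not the actual nucleotide changes or the authors' R code. Subjects with any missing genotype are dropped, mirroring the restriction to subjects with complete data for the four loci.

```python
from collections import Counter


def minor_allele_copies(genotype, minor_allele):
    """Additive coding: number of copies (0, 1, or 2) of the minor allele."""
    return sum(1 for allele in genotype if allele == minor_allele)


def risk_allele_total(genotypes, risk_alleles):
    """Total risk alleles across all loci, or None if any locus is untyped."""
    total = 0
    for locus, risk_allele in risk_alleles.items():
        genotype = genotypes.get(locus)
        if genotype is None:
            return None  # incomplete genotype data: subject is excluded
        total += minor_allele_copies(genotype, risk_allele)
    return total


def tabulate_by_risk_count(subjects, risk_alleles):
    """Count subjects by (risk allele total, case status)."""
    table = Counter()
    for subject in subjects:
        total = risk_allele_total(subject["genotypes"], risk_alleles)
        if total is not None:
            table[(total, subject["is_case"])] += 1
    return table


# Hypothetical data: "A" stands in for the risk allele at every locus.
RISK_ALLELES = {"TMEM2": "A", "IFNA2": "A", "NLRX1": "A", "C2": "A"}
SUBJECTS = [
    {"is_case": 1, "genotypes": {"TMEM2": ("G", "A"), "IFNA2": ("G", "G"),
                                 "NLRX1": ("G", "A"), "C2": ("G", "G")}},
    {"is_case": 0, "genotypes": {"TMEM2": ("G", "G"), "IFNA2": ("G", "G"),
                                 "NLRX1": ("G", "G"), "C2": ("G", "G")}},
    {"is_case": 1, "genotypes": {"TMEM2": ("A", "A"), "IFNA2": ("G", "G"),
                                 "NLRX1": None, "C2": ("G", "G")}},
]
TABLE = tabulate_by_risk_count(SUBJECTS, RISK_ALLELES)
```

The resulting counts form the rows of the risk-allele-number by case-status table, which could then be passed to an exact test or a regression routine of the reader's choice.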
Tests for Population Stratification.
The population structure was examined by the Hardy-Weinberg equilibrium test and an allelic association (Pearson chi-square) test between cases and controls, as described by Sokal and Rohlf.10 The Z-score test proposed by Lee11 and chi-square test of Pritchard and Rosenberg12 were applied to test for population stratification using all five SNPs.
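As an illustration of the Hardy-Weinberg equilibrium test mentioned above, the following sketch computes a chi-square goodness-of-fit statistic from genotype counts at a biallelic marker. The chi-square form with 1 degree of freedom is one common version of the test and is an assumption here; the exact procedure the authors followed may differ, and the function name `hwe_chi_square` is hypothetical.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Takes counts of the three genotypes at a biallelic marker and returns
    the chi-square statistic and its P value (1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value
```

Genotype counts of 25, 50, and 25 match the Hardy-Weinberg expectation exactly and give a statistic of 0, whereas counts of 50, 0, and 50 (no heterozygotes) give a statistic of 100 and a vanishingly small P value.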
Structural Modeling of the Mutant Proteins
To elucidate the potential molecular effects of the discovered mutations, modeling of their encoded proteins was performed using Discovery Studio 3.0 (Accelrys, San Diego, CA). A homology model of interferon alpha 2 (IFNA2) was constructed with the MODELLER module using the NMR structure of IFNA2a (PDB ID: 1ITF) and the crystal structure of IFNA2b (PDB ID: 1RH2) as templates. The refined model of IFNA2 was validated by the VERIFY-3D program, and the model of the IFNA2 p.Ala120Thr mutant was built on this refined model. For NLR family member X1 (NLRX1), a recently reported crystal structure (PDB ID: 3UN9)13 served as the template for modeling the p.Arg707Cys mutation. The complement component 2 (C2) p.Glu318Asp mutation model was built on the crystal structure reported by Milder et al. (PDB ID: 2I6Q).14 It is not feasible to model the mutation in transmembrane protein 2 (TMEM2) at present, as very little is known of the structure of TMEM2 or its homologous proteins.
Expression of TMEM2
TMEM2 p.Ser1254Asn was found to be associated with CHB, but with no indications of immunological function of the wildtype protein. We therefore performed expression studies. Immunohistochemistry was performed on formalin-fixed and paraffin-embedded healthy liver tissues from 12 individuals, with polyclonal rabbit antihuman TMEM2 antibody (Aviva Systems Biology, San Diego, CA). The sections were incubated with the first antibody at 1:40-1:160 dilution at 4°C overnight. The second, peroxidase-labeled goat antirabbit/mouse antibody (Dako K5007, Carpinteria, CA) was applied to the sections for 30 minutes at 37°C and the sections were developed with Diaminobenzidine (DAB) solution. The staining was replicated in healthy liver tissues from another six subjects using rabbit polyclonal antibody to human TMEM2 from a different company (Jin Tiancheng, Beijing, China). Negative controls were performed with phosphate-buffered saline (PBS) replacing the first antibody preparation.
Real-Time Polymerase Chain Reaction (PCR) and Western Blotting.
Real-time PCR was performed in liver tissues from three CHB patients and normal liver tissues from three subjects who underwent surgical ablation of hemangioma in the liver. The latter had normal liver function (normal ALT, aspartate aminotransferase [AST], and total bilirubin) and were negative for HBsAg and HBeAg. Total RNA was extracted. Real-time PCR for TMEM2 was performed using the primers 5′-GGAGATATGCTCCGTCTGACC-3′ and 5′-CATCTGACTTGCCATACAAGGT-3′; the primers for glyceraldehyde-3-phosphate dehydrogenase (GAPDH) were 5′-CCATCTTCCAGGAGCGAGA-3′ and 5′-TGGTTCACACCCATGACGAA-3′. Real-time PCR was also performed with the same primers in two cell lines: (1) HepG2.2.15, containing the complete HBV genome and capable of stable HBV expression and replication in the culture system,15 and (2) a non-HBV-containing HepG2 cell line (ATCC, Manassas, VA). The two cell lines were maintained in the exponential growth phase in Dulbecco's Modified Eagle's Medium (DMEM) (Life Technologies, Carlsbad, CA), supplemented with 10% fetal bovine serum, 100 units/mL penicillin, and 0.1% (w/v) streptomycin. The mean and standard error (SE) were calculated from three independent experiments.
Western blotting was also performed on the two cell lines. After lysis of the harvested HepG2 and HepG2.2.15 cells and gel electrophoresis, the rabbit antihuman TMEM2 antibody (Aviva Systems Biology) and mouse antihuman GAPDH monoclonal antibody (Kang Chen Biotech, Shanghai, China) were applied for detection of the proteins.
Results
Apart from meeting all the criteria described in the “candidate selection” section above, TMEM2 p.Ser1254Asn was selected in the first round, as it had 14 “calls,” the most prevalent in the discovery group (Supporting Fig. 1A; Fig. 1). In the second selection round, four variants in the “case” group were selected: (1) IFNA2 p.Ala120Thr was selected by virtue of the known anti-HBV function of its wildtype; (2) NLRX1 p.Arg707Cys was selected because of its known function as a regulator in several antiviral pathways including those for production of type I interferon16; (3) Interleukin 1 receptor, type II (IL1R2) p.Arg372Trp was chosen because of its function in viral infection; (4) C2 p.Glu318Asp had the highest call count (six calls) among the genes concerned with immunity in exome sequenced cases and this mutation occurred at a normally highly conserved codon. In the control group endoplasmic reticulum aminopeptidase 1 (ERAP1) p.Pro184Arg was selected as it had the highest call counts (six calls) among the genes involved in immunity in exome sequenced controls and because of its central role in peptide trimming, a step required for the generation of most HLA class I-binding peptides17 (Supporting Fig. 1B; Fig. 1).
Associations of these variants were first tested in the 500 cases and 500 controls drawn at random from the whole cohort. Four variants, TMEM2 p.Ser1254Asn, IFNA2 p.Ala120Thr, NLRX1 p.Arg707Cys, and C2 p.Glu318Asp, passed the test and were further studied in the whole cohort, whereas IL1R2 p.Arg372Trp and ERAP1 p.Pro184Arg were discarded after this initial test (Supporting Table 3). The four retained variants achieved statistically significant association in the whole cohort after Bonferroni adjustment for six independent tests, whether assessed by asymptotic or empirical P values (Table 1). In all, 1,487 cases and 1,611 controls had complete genotyping data for the four loci. When the four SNVs were combined in these cases and controls, the number of risk alleles was strongly associated with CHB status (P < 2.0 × 10⁻¹⁶) (Table 2). None of the five SNPs selected to examine hidden population structure in our samples was significant in the tests of Hardy-Weinberg equilibrium and allelic association (Supporting Table 4). The P values of the tests for population stratification using all five SNPs, as proposed by Lee11 and by Pritchard and Rosenberg,12 were 0.21 (Z score = 0.80) and 0.79 (χ² = 2.35), respectively. These results provided no evidence for differences in genetic background between cases and controls, suggesting that spurious association due to population structure was unlikely (Supporting Table 4). Our Sanger sequencing in the control subjects also showed that the four SNVs had minor allele frequencies of 0.003-0.036, confirming their rare variant status.
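The Bonferroni adjustment for six tests can be checked with simple arithmetic. The sketch below takes the P values reported in the abstract for the four associated variants and compares them with the adjusted threshold of 0.05/6. The function names are hypothetical, the significance level of 0.05 is an assumption (the text does not state it explicitly at this point), and the TMEM2 entry uses the reported upper bound because only "<1.0 × 10⁻⁷" is given.

```python
ALPHA = 0.05   # assumed family-wise significance level
N_TESTS = 6    # six candidate variants were tested


def bonferroni_threshold(alpha=ALPHA, n_tests=N_TESTS):
    """Per-test significance threshold after Bonferroni adjustment."""
    return alpha / n_tests


def significant_after_bonferroni(p_value, alpha=ALPHA, n_tests=N_TESTS):
    """True if the P value is below the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# P values as reported in the abstract; TMEM2 is an upper bound (< 1.0e-7).
REPORTED_P = {
    "TMEM2 p.Ser1254Asn": 1.0e-7,
    "IFNA2 p.Ala120Thr": 2.76e-5,
    "NLRX1 p.Arg707Cys": 5.08e-5,
    "C2 p.Glu318Asp": 2.78e-4,
}

RESULTS = {name: significant_after_bonferroni(p) for name, p in REPORTED_P.items()}
```

The adjusted threshold is about 0.0083, and all four reported P values fall well below it, consistent with the statement that the associations remain significant after adjustment.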
Table 1
|Chr.|Position*|Gene|RA†|Allele Counts‡ (Cases)|Allele Counts‡ (Controls)|Odds Ratios [95% CI]§|P Values (Asymptotic)|P Values (Empirical¶)|
Table 2
|Risk Allele Number|Subject Counts (Cases)|Subject Counts (Controls)|Odds Ratio [95% CI]†|P Values‡ (Fisher)|P Values‡ (Logistic)|
The accuracy of Sanger sequencing also enabled us to extract data on individuals in the whole cohort who carried more than one of the above associated mutations or who were homozygous for any of the four mutations. The results are shown in Table 3. Such detailed information is usually unavailable from genome-wide association studies.
The IFNA2 mutation p.Ala120Thr substitutes hydrophobic alanine with hydrophilic threonine, altering the hydrophobic nature of that region. Our modeling further suggests that a hydrogen bond would be formed between the newly introduced OH of Thr120 and the main chain carbonyl oxygen at the adjacent Asn116. Residue 120 is located at the C-terminus of helix C. The substitution at this position may affect the conformation of helix C and subsequently trigger movement of the connected loop region. A recently released crystal structure of IFN α-5 complexed with IFN-α/β binding protein C12R (PDB code: 3OQ3)18 has demonstrated that helix C is needed for the interaction between IFN α-5 and the binding protein. The key role of residue 120 in helix C for partner protein binding is also supported by the crystal structure of human growth hormone with its receptor,19 which has a similar structure. Therefore, a mutation at 120 from alanine to threonine may affect the interaction of IFNA2 with its receptor. In addition, in the wildtype protein the neighboring residue Cys121 forms a disulfide bridge with Cys24 (Cys24-Cys121) between helices A and C. The mutation at residue 120, therefore, may perturb the disulfide bridge and subsequently the structure of helix A (Fig. 2A).
The NLRX1 p.Arg707Cys variant also occurs at a highly conserved residue. The crystal structure of the C-terminal fragment of human NLRX1 shows that Arg707 is located in the leucine-rich-repeat 1 (LRR1) region between a β strand and an α helix. The substitution of a large, basic arginine by a medium-sized, polar cysteine introduces a significant change in electrostatic potential around that exposed region (Fig. 2B). Consequently, this may affect the activity of the protein.
The C2 variant p.Glu318Asp is located on the hydrophilic side of the α helix (Fig. 2C). The model suggests that this substitution is not likely to influence the intramolecular interaction significantly. However, whether or not this site could affect interactions with other proteins is unknown.
Expression of TMEM2.
Immunohistochemistry with antihuman wildtype TMEM2 antibodies in healthy liver sections from 12 individuals detected strong, discrete, and granular cytoplasmic staining. The staining was further replicated with a different anti-TMEM2 antibody in liver sections from six additional individuals (Fig. 3A). Real-time PCR showed that the CHB liver tissues expressed TMEM2 mRNA at reduced levels compared with healthy liver tissues (Fig. 3B), and that the HBV genome-containing HepG2.2.15 cell line expressed it at reduced levels compared with HepG2 cells devoid of the HBV genome (Fig. 3C) (P = 0.022 and 0.0036, respectively). Western blotting revealed reduced protein expression in the HBV genome-containing HepG2.2.15 cell line when compared with HepG2 cells devoid of the HBV genome (Fig. 3D).
Discussion
We have identified four rare missense mutations associated with CHB. To the best of our knowledge this is the first article reporting rare genetic variants associated with CHB and, furthermore, all the genes and their mutations are novel to CHB. During our investigation the mutations of IFNA2 p.Ala120Thr and NLRX1 p.Arg707Cys had not been in the HapMap and dbSNP 133 build (http://www.ncbi.nlm.nih.gov/projects/SNP/), although they appeared later in the dbSNP 134/135 builds as SNPs with no indication for their biological significance. The TMEM2 variant p.Ser1254Asn was entered in the dbSNP133 during our investigation with no indication of its immunological function. C2 p.Glu318Asp is reported in the literature,20 but not with regard to HBV infection.
The association of IFNA2 p.Ala120Thr with CHB produced the highest OR (4.08) of the genes tested. Interferons have potent activity against many viruses, including HBV,21 as evidenced by their efficacy in CHB therapy. We have found no reports of coding variations of interferons being associated with CHB. Codon 120 where the alanine to threonine substitution occurs is believed to be the key residue for ligand and receptor binding (see Results).19 Our analysis also suggests that this variation may change the conformation of helix C, which could thereby initiate relocation of the connected loop region and interfere with formation of the disulfide bridge (Cys24-Cys121) between helices A and C (Fig. 2A). Such a structural change would be likely to diminish the efficacy of wildtype interferon in CHB, pointing to a possible antiviral contribution of type I IFN to the resolution of chronic HBV infection.
NLRX1 is believed to function as a negative regulator of the ancient mitochondrial antiviral response.22, 23 The mechanism is believed to operate through the retinoic acid-inducible gene I (RIG-I) and Toll-like receptor (TLR) signaling pathways, depressing production of type I interferons and nuclear factor-kappa B (NF-κB).22, 23 However, it has also been reported that NLRX1 plays a proinflammatory role by amplifying the reactive oxygen species induced by the NF-κB and JNK pathways.24 Notwithstanding these differences of opinion, our findings support a role for NLRX1 in combating CHB infection. The mutant gene product may evoke a more potent inflammatory response, thereby contributing to CHB pathogenesis.
C2 is part of the membrane attack unit of complement C4b2a3b that causes cell lysis. Its anti-infective role is supported by a previous observation that carriers of the same mutation have higher mortality rates and more complications of infection.20 Our study is the first to show an association of this variant with CHB, suggesting that an unimpaired complement system may play an important, although as yet unexplained, role in defense against CHB infection.
The TMEM2 p.Ser1254Asn variant yielded the most significant P value (<1.0 × 10⁻⁷) of all the SNVs tested. This protein is considered to belong to the transmembrane protein superfamily.25 It has been shown that four members of this family mediate innate immune responses and restrict infection by H1N1 influenza virus, West Nile virus, Dengue virus, SARS-CoV, and filoviruses upon stimulation by interferons. Hence they are named the interferon-inducible transmembrane proteins 1, 2, 3, and 5.26, 27 Although there are currently no published indications of immunological function for TMEM2, our study demonstrates an association with CHB. Immunohistochemistry revealed strong, discrete, and granular cytoplasmic staining in healthy hepatocytes, suggesting that TMEM2 may be a normal constitutive membrane component of hepatocellular organelles. Reduced expression of TMEM2 in the CHB liver tissues and in the cell line containing the HBV genome suggests that TMEM2 could be a contributor to innate or adaptive antiviral immunity. The missense mutation may reduce or abolish its antiviral function and thereby predispose carriers to CHB. However, in the absence of plausible explanatory mechanisms we cannot exclude the possibility that the mutant allele we have identified may show association with CHB by virtue of linkage disequilibrium with another mutant gene of greater intrinsic significance.
Considering the antiviral function of IFNA2, the regulatory role of NLRX1 in inducing type I interferon and NF-κB, and the roles of C2 in immunity, we suggest that these mutations could be causal for enhanced susceptibility to CHB. We also speculate that the mutant IFNA2 may have reduced antiviral function and that the mutant NLRX1 may evoke a more potent inflammatory response, contributing to CHB pathogenesis. The exact roles of TMEM2 p.Ser1254Asn and C2 p.Glu318Asp in CHB warrant further investigation.
The accuracy of Sanger sequencing enables us to extract data on individuals who carry more than one of the four associated mutations, providing direct evidence for the oligogenic or polygenic nature of susceptibility to CHB infection in these individuals. Taken together, our study provides compelling evidence for several rare inborn genetic defects contributing to host susceptibility to CHB. Our results also show that it is feasible to use exome sequencing to investigate the genes underlying complex diseases and underline its advantage in identifying variants with clear functional implications. The strategy we developed in this investigation suggests an approach to investigation of other complex diseases: perform exome sequencing in a discovery group comprising “susceptible” cases along with “resistant” controls; select rare variants by knowledge of the genes' known functions and their frequencies in the discovery group; and finally test their association with disease in the whole experimental cohort. Our results also show the potential for novel therapies and personalized medicine, such as administration of interferon to those who bear disabling mutations in their interferon genes.