Host genetic factors associated with hepatocellular carcinoma in patients with hepatitis C virus infection: A systematic review

Summary Hepatitis C virus (HCV)‐infected patients are at risk of developing hepatocellular carcinoma (HCC). Individuals at heightened risk could be targeted by intensive follow‐up surveillance. We have conducted a systematic review of the literature to identify host genetic predisposition to HCC in HCV‐infected patients. A comprehensive search of Medline and Embase databases was performed, and the strength of evidence of associations for each gene on development of HCC was evaluated. We identified 166 relevant studies, relating to 137 different genes, or combinations thereof. Seventeen genes were classified as having “good” evidence of an association, a significant association was observed for 37 genes but this finding had not yet been replicated, 56 genes had mixed or limited evidence of an association, and 27 genes showed no association. IFNL3/4, TNF‐α and PNPLA3 genes had the most evidence of an association. There was, however, considerable heterogeneity in study design and data quality. In conclusion, we identified a number of genes with evidence of association with HCC, but also a need for more standardized approaches to address this clinically critical question. It is important to consider the underlying mechanism of these relationships and which are confounded by the presence of other HCC risk factors and response to therapy. We also identified many genes where the evidence of association is contradictory or requires replication, as well as a number where associations have been studied but no evidence found. These findings should help to direct future studies on host genetic predisposition to HCC in HCV‐infected patients.


Chronic hepatitis C virus (HCV) infection is one of the aetiological factors underlying the development of hepatocellular carcinoma (HCC), a commonly observed tumour whose incidence is increasing. 1 Typically, HCC develops after sustained liver damage, where disease usually progresses from mild to severe fibrosis, then cirrhosis and eventually to HCC. Only a proportion of patients with cirrhosis will develop HCC (1%-7% per year). 2 The question of which HCV-infected patients are at risk of developing HCC is an important one, especially in the era of effective direct-acting antiviral (DAA) oral treatments. 3 Even if most patients are likely to become free of HCV through DAA therapy, it remains important to assess which patients need continuing surveillance for early detection and treatment of HCC and other liver disease. While there are many risk factors for HCC, including alcohol use, viral hepatitis and metabolic diseases, host genetics are likely to play a crucial role. Knowledge of host genetics therefore could add discriminatory value to risk prediction tools, allowing better stratification and personalized assessment of optimal long-term management, thereby increasing the efficacy of surveillance programmes.
This systematic literature review identified publications which examine the association between specific human genes/single nucleotide polymorphisms (SNPs) and the occurrence of HCC. We have aggregated and graded the evidence for each studied gene according to the strength of evidence for an association with HCC. As cirrhosis itself is a risk factor for HCC development, we also included reports of the association between host genetics and cirrhosis and fibrosis progression. Our aim in undertaking this task was purely utilitarian, that is, with a view to developing appropriate genetic testing to aid in patient stratification and clinical management, as opposed to a mechanistic approach aiming to shed light on the molecular processes underlying the pathogenesis of HCC. We identify a large number of human genes which have been studied in relation to HCV and HCC, and classify these genes according to the strength of evidence of an association.

| Search strategy
We conducted a comprehensive literature search in Ovid MEDLINE and Embase databases for relevant papers, using the search strategy listed in Appendix S4, divided into the following categories: hepatocellular carcinoma, hepatitis C, genetics and risk/associations. The search strategy was developed by two researchers and checked by another independently.
To refine the search, review papers were excluded and studies were limited to the English language. Studies were limited to humans.

| Study selection
Studies were included if they:
• Included patients infected with HCV,
• Evaluated associations between germline polymorphisms and HCC, cirrhosis or fibrosis,
• Had a case-control design or
• Were a relevant meta-analysis.

Study eligibility was assessed by two independent reviewers, firstly at the title level, then through abstract assessment and then by assessing the full texts. Reviewer discrepancies were resolved with the assistance of a third reviewer.

| Data extraction
We extracted odds or hazard ratios from the selected papers, along with pertinent details of the study, including SNP reference number where available (rs#), sample size, data on adjustment or matching, the outcome studied and the comparison/control group used. Where multiple genes and/or SNPs were analysed within the same study, we extracted data for each gene/SNP. Similarly, if a study used multiple outcomes (ie HCC, cirrhosis or fibrosis), we extracted results for each of these. In cases where there were multiple comparison groups (eg HCC patients being compared with both HCV-infected patients and healthy controls), we extracted data for the HCV-infected comparison. Full data tables are available in Appendix S3 (here: https://figshare.com/s/2ffc9030826df2fe150e).
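The extraction rule for multiple comparison groups can be illustrated with a minimal Python sketch. The record fields, the `preferred_results` helper and the label strings are hypothetical names chosen for illustration; they are not taken from the review's data tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractedResult:
    """One extracted row (field names are illustrative assumptions)."""
    gene: str
    snp: Optional[str]   # rs number, where the paper reports one
    outcome: str         # "HCC", "cirrhosis" or "fibrosis"
    comparison: str      # e.g. "HCV-infected" or "healthy"
    effect: float        # odds or hazard ratio as reported
    sample_size: int


def preferred_results(results):
    """Keep one row per (gene, SNP, outcome), preferring the
    HCV-infected comparison when several comparisons were reported."""
    chosen = {}
    for r in results:
        key = (r.gene, r.snp, r.outcome)
        current = chosen.get(key)
        if current is None or (
            r.comparison == "HCV-infected"
            and current.comparison != "HCV-infected"
        ):
            chosen[key] = r
    return list(chosen.values())
```

Each gene/SNP and each outcome keeps its own row, so a study reporting several SNPs or outcomes still contributes one row per combination.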

| Classification of studies-strength of evidence
In order to provide an indication of the strength of evidence for each gene, all studies relating to each gene were collated together and each gene was classified into one of four categories. These categories were determined a priori and broadly were based upon genes having had a significant association which was replicated in a separate study and an absence of substantive disagreement.
Criteria for strength of evidence:
• Genes/SNPs with strong evidence
  ○ ≥2 studies with a significant association (as defined by each study) with HCC, and
  ○ Absence of substantial disagreement/negative studies or
  ○ Positive meta-analysis;
• Studies in need of replication
  ○ Only one HCC study performed, but with a significant association found;
• Genes/SNPs with some/mixed evidence
  ○ Multiple studies carried out but with disagreement or
  ○ Positive studies, but only in cirrhosis or fibrosis, not HCC;
• Genes/SNPs with no evidence
  ○ Only negative studies or
  ○ Negative meta-analysis (regardless of other studies).
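The criteria above can be sketched as a small decision function. This is an illustrative reading, not the procedure the reviewers actually ran: "substantial disagreement" was a matter of judgement in the review, and here it is approximated by the assumption that negative HCC studies are at least as numerous as positive ones. The sketch also assumes that only meta-analyses with HCC as the outcome count towards the meta-analysis criteria. The `Study` class and the category labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    outcome: str                 # "HCC", "cirrhosis" or "fibrosis"
    significant: bool            # significance as defined by the study itself
    meta_analysis: bool = False


def classify_gene(studies):
    """Assign a gene to one of the four evidence categories."""
    hcc_metas = [s for s in studies if s.meta_analysis and s.outcome == "HCC"]
    original = [s for s in studies if not s.meta_analysis]
    hcc = [s for s in original if s.outcome == "HCC"]
    hcc_pos = [s for s in hcc if s.significant]
    hcc_neg = [s for s in hcc if not s.significant]
    other_pos = [s for s in original if s.outcome != "HCC" and s.significant]

    # Meta-analyses take precedence over individual studies.
    if any(m.significant for m in hcc_metas):
        return "strong"
    if hcc_metas:  # only negative meta-analyses
        return "no evidence"

    # Assumption: disagreement is "substantial" when negative HCC
    # studies are at least as numerous as positive ones.
    if len(hcc_pos) >= 2 and len(hcc_neg) < len(hcc_pos):
        return "strong"
    if len(hcc) == 1 and hcc_pos:
        return "needs replication"
    if hcc_pos or other_pos:
        return "some/mixed"
    return "no evidence"
```

For example, a gene with two positive HCC studies and no negative ones is classified as "strong", whereas a gene whose only positive findings relate to fibrosis falls into "some/mixed".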

| Identified study characteristics
The search criteria identified 1668 unique studies for review (Figure 1). After review, 166 studies investigating the association of host genetic polymorphisms with fibrosis progression, cirrhosis or HCC development in chronic HCV-infected patients were selected as being relevant. One hundred and fifty-six of these were original research articles, of which 150 were candidate gene studies, and 6 were genome-wide association studies (GWAS). The remaining studies (10) were meta-analyses. HCC was used as an outcome in 90 studies, while 45 used cirrhosis and 41 used fibrosis (some studies had multiple outcomes). The 166 selected papers are listed in Appendix S2.
There was considerable heterogeneity between studies in a number of parameters: various comparison groups were used, the most common being HCV-infected patients without HCC, cirrhosis or fibrosis (74.7% of all studies). These comparison groups differed substantially between studies using cirrhosis/fibrosis as an outcome (almost all of which compared with HCV-infected patients, 92.1%) and studies using HCC as an outcome, where healthy subjects (21.1%) and HCV cirrhosis patients (10.0%) were more frequently used.

| Genes/SNPs with strong evidence
We identified 17 genes with strong evidence of an association with HCC in HCV-infected patients (Table 1 and Appendix S3 available at https://figshare.com/s/2ffc9030826df2fe150e), although even within this group the level of evidence varied substantially. The number of HCC studies for each gene ranged from 2 to 7 (median 3), while for 8 of these genes, there were additional studies which found an association with cirrhosis or fibrosis. A total of 42 original studies involving HCC as an outcome made up the evidence base for these genes. An additional 25 studies involved cirrhosis or fibrosis as an outcome. A large majority of these studies used HCV-infected patients (with or without cirrhosis) as a control group, although some (which we identify below) used healthy subjects as controls. Table 5 indicates which studies used HCV-infected patients with cirrhosis but without HCC as their control group. Matching and adjustment were highly variable between studies, and this is best explored using the data in Appendix S3. The total number of patients stud-

| ALDH2
The ALDH2 gene encodes the mitochondrial aldehyde dehydrogenase enzyme. Two reports studied ALDH2, both of which used HCC as the outcome and were candidate gene studies, comprising a total of 638 patients. Kato et al 4 found a significant association in 170 patients, with controls being age- and gender-matched HCV-infected cirrhosis patients. Alcohol was not adjusted for in this analysis but was in the second paper, 5 where a significant association was determined in 468 patients and the comparison group was HCV-infected patients without cirrhosis.

| CAT
The antioxidant enzyme catalase is encoded by the CAT gene. A total of 482 patients were studied across two different studies, both of which found an association with the SNP rs1001179. 6,7 There was a large disparity in odds ratio between these two studies (13.6 and 1.74), which may be explained in part by the different comparison groups used (noncirrhotic and cirrhotic HCV-infected patients, respectively).

| EGF
Three original studies were identified investigating the rs4444903 SNP of the EGF gene, two studying HCC 8,9 (578 patients in total) and one studying fibrosis, with a variety of comparison groups (including HCV-infected and HCV-related cirrhosis patients). The EGF gene encodes epidermal growth factor. All studies found a significant association with the outcome, although for one, 9 the association was only significant for the A/A to A/G comparison (and not for A/A to G/G). Additionally, a meta-analysis 10 describing comparisons between HCC patients and both patients with liver disease and healthy controls found a significant overall effect.

| GSTM1
A deletion of the GSTM1 gene (which encodes the glutathione S-transferase mu 1 protein) was associated with HCC in two studies. 11,12 A relatively small total number of patients (n = 189) were included across these two studies, and the applicability to HCV is potentially questionable, given that only subsets of the patients in both studies were HCV-infected, with the remaining patients having cirrhosis caused by other aetiologies.

| GSTT1
GSTT1 (encoding glutathione S-transferase theta 1) is related to GSTM1, and indeed, the same studies 11,12 (and same 189 patients) which identified an association with the GSTM1 gene deletion also found a similar association between GSTT1 gene deletion and HCC. The same caveats regarding applicability to HCV also apply.

| HLA
While not a single gene, the HLA gene complex (which encodes the major histocompatibility complex) has been widely studied with respect to its association with HCC, cirrhosis and fibrosis. Three of the HLA genes HLA-Bw4, 13 HLA-B18 14 and HLA-DR11 14 were reported to be associated with HCC in two studies by the same group, which investigated a total of 293 patients. An additional four studies found associations between various HLA genes/alleles and cirrhosis, including two candidate gene studies and a GWAS.

| HLA-Bw4+KIR3DS1
In addition to the evidence of HLA genes associated with HCC alone, three studies 13,15,16 (with a total of 776 patients) identified the pairing of HLA-Bw4 and KIR3DS1 (which encodes a transmembrane glycoprotein expressed by natural killer cells) in combination as being associated with HCC.

| HSPA1B
Two studies (with a total of 666 patients) identified an association between HSPA1B (which encodes a heat-shock protein) and HCC in HCV-infected patients. 17,18 One study of 366 patients 18

| IFNL3/IFNL4
The IFNL3 (previously IL-28B) and IFNL4 genes encode the interferon lambda 3 and lambda 4 cytokines, respectively. More studies were identified for IFNL3/IFNL4 than any other gene, with seven original studies (with a total of 3,154 patients) and three meta-analyses using HCC as an outcome and six using cirrhosis or fibrosis. Additionally, one study found a significant association with cirrhosis, although most studies using cirrhosis or fibrosis as an outcome did not find an association. Four of the studies were meta-analyses, with three of these focusing on rs12979860 and HCC 26-28 and one focusing on fibrosis. Of these, two meta-analyses found a positive association between rs12979860 and HCC 26,27 and one found an association between fibrosis and rs12979860.

| IL-1b
The IL-1b gene encodes the cytokine interleukin-1 beta. Four studies determined a significant association between IL-1b and HCC in HCV-infected patients, 29-32 whereas two studies investigated the association with cirrhosis and fibrosis and found no such association. A total of 1226 patients were included in the HCC studies: one study used healthy controls, 29 two used HCV-infected controls, 31,33 and the remaining one used controls with HCV-related cirrhosis. Most of these studies also assessed or adjusted for differences in factors such as HCV genotype and alcohol use.

| MDM2
Three studies were identified for MDM2 (mouse double minute 2 homolog, a regulator of the p53 tumour suppressor), all using HCC as an outcome, 5,34,35 with a total of 745 patients. Of the two original studies, one found a positive association; 5 the one that did not had a small sample size and used healthy subjects as the control group. The remaining study, 35 a meta-analysis, found a significant association in the HCV-infected subgroup.

| MICA
[...] All studies found at least one significant association within the MICA gene, although some tested multiple SNPs and found associations in some but not all of the tested SNPs. 36,37 All studies used HCC as an outcome and one also found an association with cirrhosis.

| MnSOD
The MnSOD gene encodes the manganese-dependent superoxide dismutase mitochondrial protein. Four relatively small studies (total 755 patients) investigated the association between HCC and MnSOD. 6,7,39,40 All studies used an HCV-infected comparison population, and three of the studies found a significant association. 6,39,40 The remaining study did not, perhaps due to difference in ethnicities between this and the other studies (Caucasian rather than North African).

| PNPLA3
Nine original studies and one meta-analysis studied the PNPLA3 gene [...] while three did not. This may be due to these studies being performed in a different population (Japanese) than the other studies.

TABLE 3 Genes identified as having some or mixed evidence of an association with HCC, in patients with HCV. Results for individual studies, including hazard/odds ratios, significance, sample size and inclusion/exclusion criteria can be found in Appendix S3.

| TGF-β1
TGF-β1 is a gene encoding transforming growth factor beta 1. Six studies were identified containing 1356 patients. Of these, just two (with 705 patients) 45,46 used HCC as an outcome and both of these found a significant association using HCV-infected controls. All but one of the studies using cirrhosis or fibrosis as an outcome found significant associations.

| TNF-α
The TNF-α gene encodes the tumour necrosis factor alpha cytokine. Ten studies examined TNF-α SNPs, with four of them focusing on HCC

| UGT1A7
UDP glucuronosyltransferase 1 family, polypeptide A7, is encoded by the UGT1A7 gene. Five of the six studies which focused on UGT1A7 used HCC as an outcome, 33,49-52 and all but one of these 49 found a significant association. A total of 1450 patients made up these studies, which used a variety of control groups, including HCV-infected patients and, in three studies, healthy controls. The one study which used cirrhosis as an outcome (and healthy subjects as the control) also found a positive association.

| Studies in need of replication
We identified 37 genes which had a positive association with HCC in a single study, but with no confirmatory data from additional studies ( Table 2

| Genes/SNPs with some/mixed evidence
A total of 53 genes or gene combinations were classified as having some or mixed evidence of an association with HCC, for a number of reasons (Table 3 and Appendix S3 available at https://figshare.com/s/2ffc9030826df2fe150e). For example, some genes, such as FAS, had only studies relating to cirrhosis and/or fibrosis. Others, such as CYP2E1, had one study finding a positive association with HCC, but others which failed to replicate this association. The total number of patients studied for each gene ranged from 30 to 6472 (median 1223), while the median sample size for each individual study (not including meta-analyses) was 296 (range 30-6472).

| Genes/SNPs with only negative studies
There were 27 genes identified where none of the studies found a significant association between the gene and liver outcomes (Table 4 and Appendix S3 available at https://figshare.com/s/2ffc9030826df2fe150e). All but two of the genes (CYP1A2 and CYP2R1) had just one study associated with them, although some looked at multiple SNPs within the same gene. Seventeen of the studies investigated the association with HCC and 13 with cirrhosis or fibrosis. The median sample size for each individual study (not including meta-analyses) was 262 (range 61-5604).

| DISCUSSION
The most striking features of this review are firstly the wide breadth, [...] Table 5. There was also substantial variation in the degree of adjustment for confounding factors. Age and sex adjustment was common, and alcohol intake was also used (although it is difficult to collect reliable alcohol data). Relatively few studies were [...]

TABLE 5 As Table 1, but restricted to studies that use patients with HCV cirrhosis as a control group

There is a clinical need to identify patients who are at increased risk of HCC in a population of HCV-infected patients. Given this, it is conceivable that the genes identified as having strong evidence of an association could, with appropriate further investigation and validation, be combined into a genetic risk score for HCC in HCV-infected patients. The cirrhosis risk score 7 (CRS7) aims to do this for cirrhosis and fibrosis, 53,54 although we did not find any significant associations between CRS7 and HCC in our search. Combining a putative HCC risk score with other clinical data could create a composite score with a powerful ability to stratify and predict risk, and thereby impact on patient care pathways.
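One common way to build such a score, offered here only as an illustrative sketch and not as a method proposed or validated in this review, is an additive model in which each SNP contributes its risk-allele count weighted by the natural logarithm of its odds ratio. The SNP identifiers and odds ratios passed to the function are hypothetical inputs; any real weights would need to come from validated studies.

```python
import math


def genetic_risk_score(risk_allele_counts, odds_ratios):
    """Additive score: sum over SNPs of (risk-allele count x ln(odds ratio)).

    risk_allele_counts: dict mapping SNP id -> 0, 1 or 2 risk alleles
    odds_ratios: dict mapping SNP id -> per-allele odds ratio (assumed known)
    """
    score = 0.0
    for snp, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {snp} must be 0, 1 or 2")
        score += count * math.log(odds_ratios[snp])
    return score
```

A score of this kind could then enter a composite model alongside clinical variables, although the confounding issues discussed in this section would have to be addressed before any weights are fixed.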
We also identified 37 genes that had some evidence of an association, but without replication in an independent study. These genes therefore have potential (but not yet appropriately determined) utility in a composite score as described above. [...]

Equally, we did not aim to determine any definitive causality between the genes identified in this study and the liver disease outcomes, taking a utilitarian, not mechanistic approach to the data. Some of the described genetic associations may arise from direct effects of the SNPs on some aspect of carcinogenesis.
However, others may arise in the context of potential confounding effects, with potential mechanisms other than direct induction of HCC. For example, genes such as IFNL3 and IFNL4 are known to be associated with treatment response in those treated with interferon regimens. 55 For these genes, therefore, the causal mechanism for the association with HCC incidence may be related to the duration of infection, rather than any direct effect on liver disease progression or hepatocarcinogenesis. The IFNL3/IFNL4 studies featured here were broadly not able to adjust for this duration of infection. In the new era of DAA treatments, it is possible that these associations will cease to exist as response to interferon therapy becomes less relevant. Another example of potential confounding is PNPLA3, which has previously been linked to alcoholic liver disease. 56 Although some studies adjusted for alcohol consumption, it is almost certain that at least some of the mechanism of association with HCV is due to confounding by alcoholic liver disease. These alternative mechanisms do not necessarily detract from the gene's utility as a predictor of HCC, but must be carefully considered in any modelling.
We included the evidence surrounding cirrhosis and fibrosis within this review as the typical disease course tends to progress from fibrosis to cirrhosis and eventually HCC. However, only a small percentage of patients who have cirrhosis are likely to get HCC (1%-7% per year). 2 This means that genes associated with cirrhosis and fibrosis do not necessarily translate to risk of HCC as well. It may be that different types of genes, such as those involved in cell cycle regulation, are more likely to be associated with HCC, while other genes are likely to be involved in the pathogenesis of fibrosis and cirrhosis.
It is crucial that evidence for each gene should be assessed in detail before drawing conclusions as to their utility for risk stratification. This systematic review of the evidence surrounding the link between host genetic factors and HCC has produced a significant aggregation of the evidence available within the literature addressing this topic. We have identified a number of methodological difficulties in studies relating to this issue and the need for a more rigorous and systematic approach to identification and validation of candidate genes. We have also identified a number of host SNPs with at least some validated evidence of an association with HCC in HCV-infected patients, which we hope will act as a springboard to further studies and the ultimate identification of a clinically useful host genetic signature enabling stratification of individual patient risk of HCC development.

ACKNOWLEDGEMENTS
This study was funded by the MRC Stratified Medicine Programme STOP-HCV Award MR/K01532X/1.

AUTHORS' CONTRIBUTIONS
WLI and AJW conceived the review. CJP and AJW performed the literature review. All 4 named authors performed data analysis, constructed and critically reviewed the manuscript. STOP-HCV provided the funding for the study.