Effect of common genetic variants on the risk of cirrhosis in non‐alcoholic fatty liver disease during 20 years of follow‐up

Abstract Background and Aims Several genotypes associate with a worse histopathological profile in patients with non‐alcoholic fatty liver disease (NAFLD). Whether genotypes impact long‐term outcomes is unclear. We investigated the importance of PNPLA3, TM6SF2, MBOAT7 and GCKR genotype for the development of severe outcomes in NAFLD. Method DNA samples were collected from 546 patients with NAFLD. Advanced fibrosis was diagnosed by liver biopsy or elastography. Non‐alcoholic steatohepatitis (NASH) was histologically defined. Additionally, 5396 controls matched for age, sex and municipality were identified from population‐based registers. Events of severe liver disease and all‐cause mortality were collected from national registries. Hazard ratios (HRs) adjusted for age, sex, body mass index and type 2 diabetes were estimated with Cox regression. Results In NAFLD, the G/G genotype of PNPLA3 was associated with a higher prevalence of NASH at baseline (odds ratio [OR] 3.67, 95% CI = 1.66–8.08), but not with advanced fibrosis (OR 1.81, 95% CI = 0.79–4.14). After up to 40 years of follow‐up, the PNPLA3 G/G genotype was associated with a higher rate of severe liver disease (adjusted hazard ratio [aHR] 2.27, 95% CI = 1.15–4.47) compared with the C/C variant. NAFLD patients developed cirrhosis at a higher rate than controls (aHR 9.00, 95% CI = 6.85–11.83). The PNPLA3 G/G genotype accentuated this rate (aHR 23.32, 95% CI = 9.14–59.47). Overall mortality was not affected by any genetic variant. Conclusion The PNPLA3 G/G genotype is associated with an increased rate of cirrhosis in NAFLD. Our results suggest that assessment of the PNPLA3 genotype is of clinical relevance in patients with NAFLD to individualize monitoring and therapeutic strategies.


| INTRODUCTION
Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease globally with an estimated prevalence of 25%. 1 The increased prevalence of NAFLD is linked to a similar increase in obesity and type 2 diabetes mellitus (T2D). 2,3 The histopathological spectrum of NAFLD ranges from simple steatosis to non-alcoholic steatohepatitis (NASH), fibrosis and cirrhosis. 4 There is evidence that genetic factors significantly affect the risk of such disease progression in NAFLD. [5][6][7] A large body of evidence supports associations for several genetic variants with different traits of NAFLD, including histologically defined NASH, fibrosis and cirrhosis. [8][9][10][11][12][13][14][15][16][17][18][19][20] Some of the most studied genetic vari-

Furthermore, since the initial discovery of an association between these four genes and NAFLD severity, several studies have investigated their combined effect on the risk of progressive disease in NAFLD. In a study on 1515 patients with histologically determined NAFLD, a polygenic risk score (PRS) was constructed. This PRS correlated with hepatic fat content, NASH and liver fibrosis. 21 However, most of the evidence stems from studies with a cross-sectional design, and previous longitudinal cohort studies either did not include patients with histologically defined NAFLD or had a short follow-up time. [22][23][24] In the present study, we performed a longitudinal multicentre study to investigate how genetic variants of PNPLA3, TM6SF2, MBOAT7 and GCKR affect the risk of developing severe liver disease, cardiovascular disease (CVD) and overall mortality in a cohort of well-characterized patients with NAFLD and extensive follow-up.

| The NAFLD population
A total of 901 individuals diagnosed with NAFLD at the Karolinska University Hospital or Linköping University Hospital, both Sweden, between 1974 and 2019 were screened for inclusion. Subjects were collected from an ongoing study, Fatty Liver In Sweden 2 (FLIS-2, n = 62) and from two previously described study cohorts, Fatty liver in Sweden 1 (FLIS-1, n = 143) and long-term follow-up of NAFLD (LTU, n = 696). 25,26 The NAFLD diagnosis was established by standard clinical investigation, that is, confirmed steatosis on imaging or liver biopsy in combination with alcohol consumption of less than 140 g/week for women and 210 g/week for men and the exclusion of any other chronic liver disease or steatogenic medications. Patients with concurrent liver diseases such as alcohol-related liver disease, chronic viral hepatitis B and C were excluded. Samples for DNA analysis could be collected from 592 NAFLD subjects, 302 from living and 290 from deceased subjects. In total, 546 samples were successfully characterized for PNPLA3 allele type, 523 for TM6SF2, 532 for MBOAT7 and 535 for GCKR. Figure 1A describes the inclusion of study participants. Blood tests, anthropometric measures and data on comorbidities were collected at baseline.

| Reference population and first-degree relatives
For each subject with NAFLD, up to 10 reference individuals from the Swedish general population, matched for age, sex and municipality at the year of diagnosis, were identified by linkage to the Total Population Register. 27 This matched cohort was subsequently linked to several other national registers (described below), allowing for identification of outcomes and registry-based covariates. After exclusion of reference individuals with a diagnosis of alcohol use disorders (n = 221), other chronic liver diseases (n = 105) or NAFLD before baseline (n = 2), the reference population consisted of 5234 individuals. A total of 421 first-degree siblings of the NAFLD patients were also identified in the Total Population Register. Of these, one was excluded due to a diagnosis of NAFLD before baseline and seven due to other liver diseases. Thus, 413 siblings were included in the analysis (Figure 1B).

Lay summary
Studies show that certain common genetic variants are linked to an increased risk of inflammation and scarring of the liver (fibrosis) in people who have non-alcoholic fatty liver disease (NAFLD). In this study, we gathered information about four genetic variants in a group of 546 persons with NAFLD, each matched to up to 10 individuals from the general Swedish population. Information on whether someone developed cirrhosis, developed cardiovascular disease or died was collected from national patient registers, and we were able to follow the group over a period of 20 years. We found that NAFLD patients with a specific variant of a gene called PNPLA3 had a higher risk of developing cirrhosis. The risk was increased both compared with that of NAFLD patients with the normal gene variant and compared with the general population.

| NASH and liver fibrosis
Liver biopsy had been performed in 496 subjects (90.8%). The NAFLD activity score (NAS) was calculated by summing the degree of steatosis (0-3), lobular inflammation (0-3) and hepatocellular ballooning (0-2) according to Kleiner et al. 28 The fatty liver inhibition of progression algorithm was used to define the presence of NASH. 29 Fibrosis stage was assessed according to the classification by Kleiner. 28 In subjects for whom liver biopsy was not available, transient elastography (Fibroscan®, Echosens) was used to categorize fibrosis. A liver stiffness of ≥15 kPa was defined as advanced fibrosis.

Rev. D.0) and as previously described. 30 Briefly, the FFPE slides were deparaffinized and digested with protease. The samples were then loaded on pure link columns. The DNA bound to the column was eluted into a fresh tube, while the RNA-containing flow-through was applied to a new column, treated with DNase and then eluted from the filter to recover RNA.

| Calculation of genetic risk score
We used the model previously described by Dongiovanni et al. to calculate a PRS. 21 The score sums the number of risk alleles across the four investigated variants, each weighted by its individual effect size estimated by linear regression, and was shown to correlate strongly with histopathological traits of NASH and with fibrosis stage. 21
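As a rough illustration of how a weighted score of this kind is computed, the sketch below multiplies the number of risk alleles at each variant by a per-variant weight and sums the products. The weights, the dictionary `EXAMPLE_WEIGHTS` and the function name `polygenic_risk_score` are placeholders chosen for illustration; they are not the coefficients published by Dongiovanni et al.

```python
# Hypothetical per-allele weights (placeholders, NOT the published coefficients).
EXAMPLE_WEIGHTS = {
    "PNPLA3": 0.3,
    "TM6SF2": 0.3,
    "MBOAT7": 0.1,
    "GCKR": 0.1,
}


def polygenic_risk_score(risk_allele_counts, weights=EXAMPLE_WEIGHTS):
    """Weighted sum of risk-allele counts (0, 1 or 2 per variant)."""
    score = 0.0
    for gene, weight in weights.items():
        count = risk_allele_counts[gene]
        if count not in (0, 1, 2):
            raise ValueError(f"{gene}: allele count must be 0, 1 or 2, got {count}")
        score += weight * count
    return score


# Example: PNPLA3 G/G homozygote, heterozygous for GCKR, wild type otherwise.
example = {"PNPLA3": 2, "TM6SF2": 0, "MBOAT7": 0, "GCKR": 1}
print(round(polygenic_risk_score(example), 3))
```

With real coefficients in place of the placeholders, the same function would give one score per genotyped subject, which could then enter a regression model as a continuous covariate.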

| Outcomes
2.4.1 | Severe liver disease, cardiovascular events and overall mortality

Every individual residing in Sweden has a unique personal identity number that is linked to several registers, including the National Patient Register, 32,33 the Swedish Cancer Register 34 and the Causes of Death Register. 35 For both the NAFLD cohort and the reference population, we received data from these registers on diagnoses of other liver diseases than NAFLD, cirrhosis, decompensation events, liver transplantation, cardiovascular events, alcohol- and drug-associated disorders, hepatocellular carcinoma (HCC) and date and cause of death. All subjects were followed from the date of inclusion until the date of an outcome, a censoring event, or 31 December 2019. Censoring events in the analysis of overall mortality were emigration, liver transplantation or diagnosis of liver diseases other than NAFLD after the index date. In the analyses of CVD and severe liver disease, death not due to the outcome (CVD- or liver-related, respectively) was also considered a censoring event. The ICD-8, ICD-9 and ICD-10 systems were used to define outcomes in the registries. Cardiovascular events were defined as acute ischaemic heart disease or acute cerebrovascular disease. Severe liver disease was defined as a diagnosis of cirrhosis, decompensation with ascites, oesophageal varices, hepatic encephalopathy, portal hypertension, hepatorenal syndrome or HCC. The ICD codes used to define outcomes are shown in Table S1.

FIGURE 1 Flowchart describing patient selection and exclusion criteria in the NAFLD cohorts (A) and matched reference population and first-degree relatives (B). Abbreviations: ALD, alcohol-related liver disease; FLIS-1 and 2, the Fatty Liver in Sweden Study 1 and 2; LTU, the Long-term follow-up of NAFLD study; NAFLD, non-alcoholic fatty liver disease.
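The follow-up rule described in this section (from inclusion until the first of an outcome, a censoring event or 31 December 2019) can be sketched as follows. The function name `follow_up`, its arguments and the choice to count an outcome dated on the same day as a censoring event as observed are assumptions made for illustration, not details taken from the study protocol.

```python
from datetime import date

STUDY_END = date(2019, 12, 31)  # administrative end of follow-up stated in the text


def follow_up(inclusion, outcome_date=None, censoring_dates=()):
    """Return (years_of_follow_up, event_observed) for one subject.

    Follow-up ends at the earliest of the outcome, any censoring event
    (e.g. emigration or liver transplantation) or the study end date.
    """
    candidates = [STUDY_END] + [d for d in censoring_dates if d is not None]
    censor_at = min(candidates)
    if outcome_date is not None and outcome_date <= censor_at:
        # Assumption: an outcome on the censoring date still counts as an event.
        end, event = outcome_date, True
    else:
        end, event = censor_at, False
    years = (end - inclusion).days / 365.25
    return years, event


# Subject censored at emigration before a later diagnosis: no event is counted.
print(follow_up(date(2000, 1, 1), date(2015, 1, 1), [date(2012, 6, 1)]))
```

Which dates go into `censoring_dates` depends on the outcome being analysed; for CVD and severe liver disease, death from another cause would be added to the list.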

| Statistical analysis
Baseline characteristics of the NAFLD cohort were calculated using summary statistics and are shown as median values with interquartile ranges (IQR) for continuous parameters or as total numbers and percentages for categorical parameters. The association between allele type and the presence of NASH and advanced fibrosis (stage 0-2 vs stage 3-4, or less or more than 15 kPa in those with only Fibroscan assessment of fibrosis) at baseline was calculated only in patients with NAFLD (as there were no genetic data in reference individuals or siblings), using logistic regression presented as odds ratios (OR) with 95% confidence intervals (CI). In a second model, we adjusted for age, sex, body mass index (BMI) and T2D at baseline.
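For a single binary exposure, the crude odds ratio from logistic regression equals the odds ratio of the corresponding 2x2 table, and a Wald interval can be computed by hand. The sketch below shows that calculation; the function name `odds_ratio_wald_ci` is hypothetical and the counts are made up for illustration, not taken from this study.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald 95% CI from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of the log odds ratio
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Made-up counts, for illustration only (not data from this study).
or_, lower, upper = odds_ratio_wald_ci(a=30, b=20, c=60, d=120)
print(f"OR {or_:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

The adjusted odds ratios need a multivariable model and cannot be obtained from a single table in this way.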
Cox regression was used to estimate rates of severe liver disease, cardiovascular events and overall mortality respectively. In the NAFLD population, we compared subgroups of patients with NAFLD per allele type, using the wild type as the reference group. The primary regression model was unadjusted, whereas the second model was adjusted for age, sex, T2D and BMI. As fibrosis might not be a confounder, but rather a mediator, we did not adjust for fibrosis in the model. Instead, we stratified the cohort on the presence of advanced fibrosis at baseline and examined the rates of each outcome separately in these strata. However, as a sensitivity analysis, we additionally adjusted the regression model for advanced fibrosis.
Separately, we investigated rates of outcomes in NAFLD compared with matched controls per allele type (e.g. comparing patients with PNPLA3 G/G to matched controls and PNPLA3 C/C to matched controls, separately). This Cox model was conditioned on the matching factors (age, sex, calendar year of diagnosis and municipality). Estimates from the Cox models are presented as hazard ratios (HRs) with 95% CIs. The aim of the study was to examine the etiological association between genotypes and outcomes. Therefore, we did not use a competing risk framework as the main analysis. Cox regression is preferred when the research question concerns etiological associations, whereas the competing risk framework is preferred when calculating cumulative incidence. 36 However, as a second sensitivity analysis, we performed a competing risk analysis using the Fine-Gray regression model, in which death from causes other than severe chronic liver disease was defined as the competing risk.
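The distinction between the two frameworks can be stated in standard textbook notation. In the sketch below, T is the time to the first event, D the type of that event and k the event of interest; these symbols are introduced here for illustration and do not appear in the original text.

```latex
% Cox proportional hazards model; hazard ratio for a one-unit change in covariate x_j
h(t \mid x) = h_0(t)\,\exp(\beta^{\top} x), \qquad \mathrm{HR}_j = \exp(\beta_j)

% Cause-specific hazard: subjects leave the risk set after a competing event
h_k(t) = \lim_{\Delta t \to 0}
  \frac{P(t \le T < t + \Delta t,\ D = k \mid T \ge t)}{\Delta t}

% Fine-Gray subdistribution hazard: subjects stay in the risk set after a competing event
\lambda_k(t) = \lim_{\Delta t \to 0}
  \frac{P\bigl(t \le T < t + \Delta t,\ D = k \mid T \ge t \ \text{or}\ (T < t,\ D \ne k)\bigr)}{\Delta t}
```

Censoring deaths from other causes in a Cox model targets the cause-specific hazard, whereas the Fine-Gray model targets the subdistribution hazard, which maps directly onto cumulative incidence.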

| Ethical considerations
The study was approved by the regional ethics committee in  Table 1.

| Association between allele type and severity of NAFLD at baseline
In the NAFLD cohort, the G/G genotype of PNPLA3 was associated with a higher risk of NASH in both crude (OR 3.42, 95% CI = 1.68-6.95) and adjusted analyses (aOR 3.67, 95% CI = 1.66-8.08), while no association was seen between any allele type of TM6SF2, MBOAT7 or GCKR and the presence of NASH. The PRS was associated with a higher risk of NASH in both crude and adjusted analyses (OR 3.59 per unit increase, 95% CI = 1.53-8.43; aOR 3.81, 95% CI = 1.48-9.81); however, this was largely driven by the PNPLA3 component. No association was seen between PNPLA3, TM6SF2, MBOAT7, GCKR or the PRS and advanced fibrosis (Table 2).

| Association between allele type and rate of development of severe liver disease in NAFLD
In the analysis restricted to the NAFLD cohort, over a median follow-up of 19.6 (0.1-40.0) years, 78 events of severe liver disease were observed, of which there were 35 cases of liver decompensation and 20 cases of HCC. The G/G genotype of PNPLA3 was associated with a higher rate of developing severe liver disease compared with the C/C genotype in both the crude analysis (HR 2.14, 95% CI = 1.17-3.91) and when adjusting for age, sex, T2D and BMI (aHR 2.27, 95% CI = 1.15-4.47) (Table 3).

| Association between allele type, rate of CVD and risk for overall mortality in NAFLD
A total of 195 subjects (35.7%) in the NAFLD cohort had a CVD outcome. The incidence rate of cardiovascular events did not differ significantly between any allele type of PNPLA3, TM6SF2, MBOAT7 or GCKR (Table 4). During follow-up, 255 (46.7%) of the subjects in the NAFLD cohort died. No significant association was seen between any genotype or the PRS and higher risk of mortality. The same finding was seen in patients without advanced fibrosis at baseline (Table 5 and Figure 3).

| Association between allele type and rate of severe liver disease compared with the reference population and siblings
In total, 5234 controls were matched to the 546 NAFLD subjects with PNPLA3 data. A total of 94 subjects in the reference population (1.8%) developed severe liver disease during follow-up and 74 (13.5%) in the NAFLD cohort. Overall, subjects with NAFLD exhibited a higher rate of severe liver disease compared with the reference population (HR 9.00, 95% CI = 6.85-11.83). The rate was further increased among carriers of the G/G genotype of PNPLA3 (HR C/C vs controls: 6.79, 95% CI = 4.02-11.48; HR G/G vs controls: 23.32, 95% CI = 9.14-59.47; p interaction = 0.017). For TM6SF2, GCKR, or MBOAT7, no association between genotype and rate of severe liver disease was observed (Table 6).
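As a back-of-the-envelope check on how much the G/G estimate differs from the C/C estimate, the standard error of each log hazard ratio can be recovered from its reported 95% CI and the two estimates compared on the log scale. This is a simplified approximation that treats the two estimates as independent; it is not the interaction model used in the study and need not reproduce the reported p value. The function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_hr_and_se(hr, lower, upper):
    """Recover log(HR) and its standard error from a reported 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z_95)


def ratio_of_hazard_ratios(hr_a, ci_a, hr_b, ci_b):
    """Ratio HR_a / HR_b with a z statistic and two-sided p value,
    assuming the two estimates are independent."""
    log_a, se_a = log_hr_and_se(hr_a, *ci_a)
    log_b, se_b = log_hr_and_se(hr_b, *ci_b)
    diff = log_a - log_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return math.exp(diff), z, p


# Reported estimates versus matched controls: G/G carriers and C/C carriers.
ratio, z, p = ratio_of_hazard_ratios(23.32, (9.14, 59.47), 6.79, (4.02, 11.48))
print(f"ratio of HRs = {ratio:.2f}, z = {z:.2f}, p = {p:.3f}")
```

The wide interval around the G/G estimate dominates the standard error of the difference, which reflects the small number of G/G carriers.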
Severe liver disease among siblings was rare (n = 18) and, therefore, no meaningful analyses could be performed. (Table S4). Data on events of CVD in siblings were not available because of restrictions in data access from the national registers, so no analysis was possible.

| DISCUSSION
In this large cohort study, we found that the PNPLA3 G/G genotype was associated with a more than twofold higher rate of development of cirrhosis in NAFLD. The finding was consistent both in comparison with other patients with NAFLD who carried the C/C genotype and in comparison with reference individuals from the general population. Our results support a link between the G/G genotype in PNPLA3 and risk for the development of both cirrhosis and HCC in NAFLD. However, the association was restricted to patients without advanced fibrosis at baseline. This suggests that determining the PNPLA3 genotype to estimate the risk of future liver-related events is of interest mainly for patients who are diagnosed with NAFLD early after onset. For patients who have already developed advanced fibrosis at diagnosis, the disease-promoting effect of the G/G genotype seems to be of less importance.
The results are consistent with previous findings. In a recent study on more than 80 000 obese individuals from the UK biobank with similar outcome measures, the PNPLA3 C > G allele was shown to increase the risk of severe liver disease 1.6-fold. 37 In addition, in a recent study on 471 NAFLD patients prospectively enrolled and followed for a median of 5.4 years, a twofold increased risk of severe liver disease among carriers of the C > G allele of PNPLA3 was reported. 38 Our data extend these findings by a much longer follow-up and by active comparison with the general population.
The effect of the PNPLA3 G/G genotype on the risk of developing severe liver disease was more pronounced for patients without advanced fibrosis at baseline. Fibrosis is the most important predictor of long-term prognosis in NAFLD. 26 A plausible reason could be that in patients who have already developed advanced fibrosis, the additive effect on progression caused by genetic traits is of lower importance. Another explanation could be that the subgroup with advanced fibrosis was small and that the analysis was underpowered.
The PNPLA3 G/G genotype was also associated with the presence of NASH at baseline. Our results are consistent with previous evidence supporting this relationship. 39 However, we found no association between PNPLA3 genotype and advanced fibrosis at baseline. This contrasts with several previous studies that link PNPLA3 genotype to advanced fibrosis. 40,41 The lack of association could have the same explanation as in our survival analysis, namely low statistical power. Only 14.5% of histologically characterized subjects with NAFLD in our cohort had fibrosis stage 3 or 4 at baseline, compared with 35% in a previous study with a similar setting. 42 However, this also suggests that our cohort is less likely to suffer from selection bias.
We found that the PRS correlated with the presence of NASH but not with advanced fibrosis. This is partly consistent with the original study that first demonstrated that the PRS was predictive of both NASH and advanced fibrosis. 21 In our study, the PRS did not correlate with an increased risk of development of severe liver disease. We believe this discrepancy between previous reports and the present results is mainly explained by a lack of statistical power. The weight of each genotype in the original model that defines the PRS was based on the association with steatosis for each gene. 21 The model was developed in a cohort of >9000 individuals. Only a minority of NAFLD patients with simple steatosis develop advanced liver disease during their lifetime. Therefore, our cohort of 546 genotyped NAFLD subjects was probably too small to provide enough statistical power to detect genetic variants with lesser effects on the long-term prognosis in steatosis.

TABLE 2 Associations between genotype or PRS and NAFLD severity (NASH and advanced fibrosis) at baseline using logistic regression
Abbreviations: aOR, adjusted odds ratio; BMI, body mass index; CI, confidence interval; NAFLD, non-alcoholic fatty liver disease; NASH, non-alcoholic steatohepatitis; OR, odds ratio; PRS, polygenic risk score.
a Adjusted for age, sex, BMI and type 2 diabetes.

TABLE 3 Hazard ratios from Cox regression analyses of associations between genotypes and development of severe liver disease in the NAFLD cohort
Abbreviations: aHR, adjusted hazard ratio; BMI, body mass index; CI, confidence interval; HR, hazard ratio; NAFLD, non-alcoholic fatty liver disease; PRS, polygenic risk score.
a Adjusted for age, sex, type 2 diabetes mellitus and BMI.

FIGURE 2 Kaplan-Meier estimate of cirrhosis-free survival time in a subgroup of NAFLD patients without advanced fibrosis (F0-2 or <15 kPa). p = log rank test for the G/G genotype versus the C/C genotype.
None of the PNPLA3, TM6SF2, MBOAT7 or GCKR genotypes nor the PRS were associated with increased mortality. In a population-based study of more than 19 000 individuals from the US National Health and Nutrition Examination Survey, the C > G allele of PNPLA3 was associated with a 1.3-fold risk of overall mortality and a 20-fold risk of liver-specific death. 22 However, our study and other studies based on well-defined NAFLD cohorts have not been able to verify these results. 38 Unlike some previous results, our study did not find any significant associations between TM6SF2, GCKR, or MBOAT7 and NAFLD severity.
Since the first reports describing TM6SF2 and its suspected role in disease progression of NAFLD, 12

this association. In a Japanese study, TM6SF2 genotype was not associated with histological severity. 42 In a study of subjects with histologically characterized NAFLD and healthy controls, the TM6SF2 genotype was associated with a higher prevalence of NAFLD but not with liver fibrosis or NASH. 44 Unlike previous reports, we found no association between mutations in TM6SF2 and a reduced risk for CVD events. 45 Although the evidence for an association between the TM6SF2 gene and hepatic steatosis is robust, the disease-promoting effect regarding long-term development of liver cirrhosis or HCC remains unclear. The present study was not able to further establish such a correlation.

TABLE 5 Hazard ratios from Cox regression analyses for associations between genotypes and overall mortality in the NAFLD cohort
The GCKR rs1260326 variant causes a loss-of-function of a regulatory enzyme that leads to increased de novo hepatic lipogenesis, thus increasing the risk of steatosis. However, simultaneous increased activity of intracellular glucokinase also leads to lower insulin resistance. Insulin resistance is itself a main driver of both NAFLD development and disease progression, and exactly how this dual mechanism affects NAFLD pathophysiology is not completely known. 46,47 The evidence for the importance of the MBOAT7 gene is also divergent. Although an association with an increased risk of liver steatosis compared with non-carriers can be established, recent studies have not been able to replicate the evidence for the impact of MBOAT7 polymorphisms on disease severity in NAFLD. 48,49 In summary, the evidence for a link between the GCKR and MBOAT7 genes and advanced histological disease or long-term liver-related events in NAFLD is lacking. Our results could not further strengthen such a correlation.

FIGURE 3 Kaplan-Meier estimate of transplant-free survival time comparing carriers of the C/C, C/G and G/G genotypes of the PNPLA3 gene. p is for significance of log rank test.

TABLE 6 Associations between genotype and development of severe liver disease in the NAFLD cohort compared with reference individuals matched for age, sex and municipality
Our study has several strengths. The cohort consists of well-characterized NAFLD subjects, for the majority of whom (91%) a liver biopsy was available. The identification of a matched reference population ensures a reliable comparison of outcomes with the general population. The follow-up time of up to 40 years ensures that liver-related outcomes could be detected even in a slowly progressing liver disease such as NAFLD. 50 The use of nation-wide registers leads to minimal loss to follow-up. The ICD codes used to define severe liver disease in our study have recently been validated and found to be highly accurate. 51
Some limitations should be acknowledged. The NAFLD cohort consisted of patients recruited from a clinical setting where they had been diagnosed with NAFLD owing to clinical symptoms or findings. Hence, there is a risk of selection bias, which could mean that the NAFLD cases in our cohort had more advanced liver disease compared with people with undiagnosed NAFLD found in the general population. This means that a disease-promoting genotype, such as PNPLA3 G/G, could be overrepresented in our material, which would affect the external validity of our results. This is also suggested by the fact that 16% of the NAFLD cohort carried the PNPLA3 G/G genotype compared with approximately 5% previously reported in studies of the general population. 52 The cohort with available DNA data was limited to 546 subjects, which could mean that the study was underpowered to examine rare outcomes such as severe liver disease for these genetic variants. However, this implies that the effect size of such associations might be limited and of little clinical relevance. There were no detailed data on the reference individuals or siblings. Still, such a comparison is only possible in register-based cohorts and can be a valuable addition to the field. Registry data can be less accurate in detecting alcohol use disorder since it is not always formally diagnosed. Therefore, there is a risk of misclassification bias in that alcohol use could be less frequently identified in the reference population, which was not under active clinical care. This could lead to an imbalance between groups. However, such a bias would dilute the effect size towards the null and would not lead to false-positive results. Living subjects were able to decline to participate in the study, which was not the case for deceased subjects. This method for inclusion could lead to an over-representation of deceased individuals and a selection bias. In 34 subjects, transient elastography was used to categorize fibrosis. This non-invasive method is well established for clinical use but is not as accurate as liver biopsy, which is considered the gold standard for detecting and grading fibrosis. 53 However, the prognostic value of biopsy and elastography is similar. 54 Our method for analysing DNA in tissue from liver biopsies of deceased subjects did not follow the same protocol as for living subjects, whose DNA was collected by blood tests. Any differences in accuracy between the two methods, due to contamination or other complications, cannot be ruled out.
Moreover, the study was planned and initiated before the discovery of other potentially interesting SNPs, such as those in the HSD17B13 gene. 55 Future studies are needed to examine the long-term effect of such genes on outcomes.

| CONCLUSION
The G/G genotype (I148M) of the PNPLA3 gene was associated with a higher rate of progression to severe liver disease in patients with NAFLD and particularly in patients without advanced fibrosis at diagnosis. Genotyping of the PNPLA3 gene might be of clinical importance when tailoring future surveillance programs for patients with NAFLD.

ETHICAL APPROVAL
The study was approved by the regional ethics committee in Stockholm,