Genomics and complex liver disease: Challenges and opportunities


  • Brian D. Juran,

    1. Division of Gastroenterology and Hepatology, Center for Basic Research in Digestive Diseases, Mayo Clinic College of Medicine, Rochester, MN
  • Konstantinos N. Lazaridis

    Corresponding author
    1. Division of Gastroenterology and Hepatology, Center for Basic Research in Digestive Diseases, Mayo Clinic College of Medicine, Rochester, MN
    • Center for Basic Research in Digestive Diseases, Mayo Clinic College of Medicine, 200 First Street SW, Rochester, MN 55905
    • fax: 507-284-0762

  • Potential conflict of interest: Nothing to report.


The concept of genetic susceptibility in the contribution to human disease is not new. What is new is the emerging ability of the field of genomics to detect, assess, and interpret genetic variation in the study of susceptibility to development of disease. Deciphering the human genome sequence and the publication of the human haplotype map are key elements of this effort. However, we are only beginning to understand the contribution of genetic predisposition to complex liver disease through its interaction with environmental risk factors. In the coming decade, we anticipate the development of human studies to better dissect the genotype/phenotype relationship of complex liver diseases. This endeavor will require large, well-phenotyped patient populations of each disease of interest and proper study designs aimed at answering important questions of hepatic disease prognosis, pathogenesis, and treatment. Teamwork between patients, physicians, and genomics scientists can ensure that this opportunity leads to important biological discoveries and improved treatment of complex disease. (HEPATOLOGY 2006;44:1380–1390.)

The complex diseases of the liver represent the majority of cases encountered in clinical hepatology. From common ones such as nonalcoholic fatty liver disease to rare ones such as primary sclerosing cholangitis, these diseases develop as a result of interaction between the human genome and the environment. Key to understanding this concept is the realization that individual genetic variants of the human genome are neither sufficient nor necessary for complex disease development and instead act as disease risk (i.e., susceptibility) factors. This is in contrast to Mendelian disorders, for which the genetic variant is usually causative of disease. The impetus of the human genome project and subsequent endeavors has led to a greater understanding of the genome's involvement in complex disease. Despite the recent progress in human genomics, the current effort in complex liver disease genetics is lacking.

We herein provide an overview of the concepts involved with complex disease and the field of genomics, review the current state of genomics research in select complex liver diseases, and discuss the opportunities and challenges we face in applying genomics to achieve a better understanding of the pathogenesis and therapy of complex disease of the liver.

The Intricacy of Complex Disease

Complex Disease Genetics.

The vast majority of human diseases are genetically complex. Complex diseases are multifactorial, the result of interplay between genes and the environment.1, 2 Thus, the strong correspondence between genotype and disease phenotype characteristic of Mendelian disorders is not present in complex disease.3 This lack of accord makes complex diseases widely diverse in penetrance, phenotype, progression, and response to treatment. Because of this inherent heterogeneity, it is often useful to conceptualize and study complex disorders as a series of complex disease traits (Fig. 1). Defining and exploiting these traits is essential if we wish to dissect the genetic contributions to etiology and pathogenesis of complex disease.

Figure 1.

Disease traits can help elucidate the genetics of complex disease. Complex diseases display wide diversity in phenotype, penetrance, progression, and response to treatment, and thus it is advantageous to conceptualize and study complex disorders as a series of disease traits. These could be the presence/absence of an associated diagnostic marker (e.g., antimitochondrial antibody in PBC) or comorbid disease (e.g., inflammatory bowel disease in PSC), a previously determined risk factor (e.g., central obesity in NAFLD), or an associated quantitative value (e.g., level of ALT in chronic hepatitis C). Indeed, each complex disorder will display more than one trait, either specific to the disease or shared among many others. Furthermore, multiple genes may be involved with each discrete trait, and could potentially overlap among traits. Defining and exploiting these traits is the key to dissecting the genetic contributions to etiology and pathogenesis of complex disease.

Currently, the best means for quantifying the genetic and environmental influences on complex disease is comparison of disease concordance between monozygotic (i.e., identical) and dizygotic (i.e., fraternal) twins.4 Because monozygotic twins share 100% of their DNA, disease concordance is suggestive of genetic influence and conversely, discordance illustrates the extent of environmental effect (Fig. 2). In addition to twin studies, familial aggregation provides a means to estimate the level of genetic influence in complex diseases.2 Because family members are more likely to share genetic material among themselves than with the general population, they are also more likely to share genetic characteristics associated with increased disease risk. However, family members (especially close siblings) share environmental exposure that may also contribute to familial disease aggregation. The risk of disease development in the siblings of affected individuals is defined by the relative risk ratio of a sibling (λs). The λs is calculated as the prevalence of a complex disease among siblings of affected individuals divided by the prevalence of the disease in the population at large.2 Generally, the higher this number is, the greater the evidence of a genetic component to disease. However, caution should be taken when considering λs values, because they may be misleading due to a substantial shared environment effect or inaccurate data on disease prevalence in the population.
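The two quantities described above can be written out in a few lines. The following is a minimal Python sketch, not a method taken from the cited studies: the function names are hypothetical, the concordance shown is the simple pairwise form (pairs with both twins affected divided by pairs with at least one twin affected), and all input numbers are invented for illustration.

```python
def sibling_relative_risk(sibling_prevalence, population_prevalence):
    """lambda_s: prevalence in siblings of affected individuals / population prevalence."""
    if population_prevalence <= 0:
        raise ValueError("population prevalence must be positive")
    return sibling_prevalence / population_prevalence


def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Share of twin pairs with at least one affected twin in which both twins are affected."""
    total = concordant_pairs + discordant_pairs
    if total == 0:
        raise ValueError("no twin pairs supplied")
    return concordant_pairs / total


# Hypothetical inputs, for illustration only (not data from the article).
lambda_s = sibling_relative_risk(sibling_prevalence=0.004,
                                 population_prevalence=0.0004)
mz = pairwise_concordance(concordant_pairs=5, discordant_pairs=15)
dz = pairwise_concordance(concordant_pairs=1, discordant_pairs=19)

print(f"lambda_s = {lambda_s:.1f}")
print(f"MZ concordance = {mz:.2f}, DZ concordance = {dz:.2f}")
```

A monozygotic concordance well above the dizygotic one, as in these made-up numbers, is the pattern that suggests a genetic contribution; the same caveats about shared environment and uncertain population prevalence apply to whatever real inputs are used.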

Figure 2.

Disease concordance in monozygotic twins suggests strength of genetic effect. Because monozygotic twins share the same DNA sequence, high concordance of complex disease development between them indicates a large genetic effect, whereas low concordance suggests that environmental effects may be more important. Concordance in dizygotic twins is expected to be less than in monozygotic twins regarding the genetic effect and is often assessed to control for environmental exposure. The concordance rates in monozygotic and dizygotic twins for five complex diseases encountered in gastrointestinal and hepatology clinical practice are shown: Crohn's disease and ulcerative colitis,80, 81 gastroesophageal reflux disease,82 primary biliary cirrhosis,66 and celiac disease.83 UC, ulcerative colitis; GERD, gastroesophageal reflux disease; PBC, primary biliary cirrhosis; MZ, monozygotic; DZ, dizygotic.

Gene–Environment Interaction.

The contribution of environment–genome interactions to complex disease is a valid concept; however, our ability to assess them is nebulous at best. For instance, our exposures to this external environment are both involuntary (e.g., chemicals and microorganisms) and voluntary (e.g., lifestyle choices such as alcohol consumption) and often act through the genome to modulate susceptibility to the development of complex diseases (Fig. 3). The consequence of environmental exposure on the eventuation of complex disorders is widely varied and is illustrated by the observation that some individuals with extensive exposure to environmental risk do not develop disease, whereas other individuals with minimal risk exposure do (e.g., some obese sedentary individuals do not develop nonalcoholic fatty liver disease, whereas some lean active people do). Such variable environmental effects are thought to be driven by genetic variation. Evaluation of these gene–environment interactions and elucidation of their role in complex disease pathogenesis is one of the major challenges we face.

Figure 3.

The genome modulates environmentally induced disease susceptibility. As free living beings, we are constantly exposed to the environment in which we exist. This entails exposure to toxic chemicals such as heavy metals, pesticides, and industrial chemicals; lifestyle choices including alcohol, smoking, diet, and exercise; and infection by micro-organisms, to name a few. Many different environmental components have been associated with increased risk of complex disease development (e.g., heavy alcohol consumption increases risk of cirrhosis), a phenomenon that is thought to occur largely through gene interaction. Thus, the variable effect of most environmental exposures on disease development is often attributed to individual genomic variation. Indeed, several specific genetic variants have been shown to either potentiate or diminish complex disease susceptibility when assessed in a population. PAH, polycyclic aromatic hydrocarbons; PCB, polychlorinated biphenyls.


Abbreviations: SNP, single nucleotide polymorphism; ALD, alcoholic liver disease; NAFLD, nonalcoholic fatty liver disease; AIH, autoimmune hepatitis; PBC, primary biliary cirrhosis; PSC, primary sclerosing cholangitis.


Human genomics is the field of study that seeks to understand the structure and function of the entire human genome. The impetus for this discipline occurred just over 50 years ago when James Watson and Francis Crick reported the double helical structure of DNA,5 and many years of subsequent work have recently come to fruition with the conclusion of the human genome project6, 7 and publication of the human haplotype map.

Sequence Variation of the Human Genome.

The sequence of the human genome was determined from but a handful of people,6, 7 and consequently does not identify the millions of genetic variations (i.e., polymorphisms) that distinguish each of us as a unique individual.7 These genomic DNA variations take many forms, such as single nucleotide polymorphisms (SNPs), microsatellite repeats, insertions, and deletions. The National Center for Biotechnology Information maintains the most comprehensive public database of genomic variation, which to date has cataloged over 12 million human variants. Most widespread among these are SNPs, single base positions in the genome where alternate nucleotides exist, accounting for over 90% of human polymorphic loci.7

To some extent, the location in the genome at which a SNP occurs suggests its potential for functional significance and its likelihood of affecting a disease or its trait.8 For instance, SNPs in a gene coding sequence could result in premature termination (nonsense SNPs) or amino acid substitution (nonsynonymous SNPs), potentially resulting in altered protein function, and thus are highly likely to display a phenotypic effect. Furthermore, SNPs in splice site recognition, promoter, or enhancer sequences have the potential to alter gene expression, splicing, or stability, and therefore may exhibit an effect on phenotype. SNPs outside of these regions, such as those residing in large introns (intronic SNPs) and between genes (intergenic SNPs), are thought to be far less likely to affect genome function. However, it has been suggested that a large portion of this intergenic sequence is under positive selection and thus potentially functional.9

Genomic Diversity and Haplotype Blocks.

Even though the human genome harbors millions of common genetic variants, humans display relatively limited genetic diversity, owing to the young age of our species. Diversity of common variation is primarily driven by random mating and meiotic recombination.10 Because our species has transmitted its genetic material over a relatively small number of generations, contemporary chromosomes share common regions of variation. These shared regions of variation are said to be in linkage disequilibrium. Recombination occurs more readily in certain regions of the genome (i.e., recombination hot spots),11, 12 and as a result the pattern of linkage disequilibrium across the genome is inconsistent. These regions of high recombination frequency (i.e., low linkage disequilibrium) flank regions of low recombination frequency (i.e., high linkage disequilibrium), forming block-like structures known as haplotype blocks.12, 13 The limited allelic diversity of these haplotype blocks has the potential to streamline genomic studies aimed at elucidating the mechanisms of complex disease.14, 15 To facilitate such studies, the International Human Haplotype Map (HapMap) project was undertaken to assess the structure of human haplotype blocks and identify small groups of tagging SNPs that are informative for the variation in a larger set. The current release of the HapMap project data uses over 3 million SNPs to assess human haplotype structure in 269 individuals from 4 racial/ethnic groups and is available on the Internet. The most important observation of the HapMap effort is the confirmation of significant redundancy among local SNPs, which suggests that extensive genomic variation information can be elucidated using a subset of tagging SNPs (Fig. 4).11, 16

Figure 4.

This figure illustrates a simplistic example of using tag-SNPs to reduce the amount of genotyping necessary to assess the common genetic variants in a population. From this small region of the genome containing seven SNPs (upper panel), the genotyping of three of them (lower panel) is all that is necessary to capture the extent of genetic variation. SNPs, single nucleotide polymorphisms.
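The idea behind Fig. 4 can be sketched in code. The Python example below is an illustrative assumption rather than the HapMap project's own procedure: it measures pairwise linkage disequilibrium with the r² statistic and then picks tag-SNPs greedily, at each step choosing the SNP that captures the most still-uncaptured SNPs at r² of at least 0.8. The haplotypes, the 0.8 threshold, and the function names are all hypothetical.

```python
def r_squared(a, b):
    """r^2 between two biallelic SNPs, each a list of 0/1 alleles across haplotypes."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:          # a monomorphic SNP carries no LD information
        return 0.0
    d = p_ab - p_a * p_b    # linkage disequilibrium coefficient D
    return d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Return indices of tag-SNPs that capture every SNP at r^2 >= threshold."""
    n_snps = len(haplotypes[0])
    columns = [[h[i] for h in haplotypes] for i in range(n_snps)]
    # For each SNP, the set of SNPs (itself included) that it captures.
    covers = {
        i: {j for j in range(n_snps)
            if i == j or r_squared(columns[i], columns[j]) >= threshold}
        for i in range(n_snps)
    }
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Pick the SNP capturing the most untagged SNPs (lowest index on ties).
        best = max(range(n_snps), key=lambda i: len(covers[i] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Six hypothetical haplotypes over seven SNPs (0 = one allele, 1 = the other).
rows = ["0000000", "0001100", "0000011", "1111111", "1110000", "1111100"]
haplotypes = [[int(c) for c in row] for row in rows]
tags = greedy_tag_snps(haplotypes)
print("tag-SNP indices:", tags)
```

In this toy region, SNPs 0 to 2, 3 to 4, and 5 to 6 are perfectly correlated within each group, so genotyping one SNP from each group captures all seven, mirroring the seven-to-three reduction described in the figure.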

Genomic Study of Complex Traits

Although we are certainly far from a unified theory on the inner workings of the human genome, the knowledge gleaned from our initial efforts at sequencing and cataloging its variation provides the opportunity to begin exploring its role in modulating traits of complex liver disease. Using the familial- and population-based approaches (Fig. 5) described below, we are able to identify simple relationships between complex disease traits and genetic variants. Although these common single-variant and haplotypic associations are likely to be weak and, therefore, poor predictors of disease development,17 their identification will shed light on disease processes and provide a foundation for more advanced inquiry.

Figure 5.

Family- and population-based studies are used to identify complex disease susceptibility genes. Current approaches to identify such genes are largely based on genetic linkage in multigenerational families or genetic association in the population. The use of familial linkage is limited for the study of complex disorders due to disease heterogeneity and necessity to define a model of inheritance prior to analysis. Population-based genetic association strategies have great ability to detect small genetic associations when sample numbers are large, but they are particularly susceptible to false positive findings. Often a combination of the two approaches will be employed.


Genetic linkage analysis uses familial observation to map disease-related genomic loci. This approach exploits the nature of recombination during meiosis; genes located close to each other on the same chromosome will be inherited together more often than those located farther apart. Thus, those genetic markers segregating with the disease are assumed to be located near, or linked to, the disease gene. Linkage analysis is used in the study of both Mendelian and complex diseases. Parametric (i.e., model-based) linkage analysis is generally limited to the study of Mendelian diseases, because it requires the specification of an inheritance model (e.g., autosomal dominant) and an estimation of the disease allele frequency, elements that are not readily assumable for most complex disorders.18 However, these approaches can be applied to subsets of families who display Mendelian inheritance of complex diseases or their traits.

More useful in the study of complex disorders are nonparametric (i.e., model-free) linkage approaches. These focus primarily on affected sibling pairs (though they can be applied to affected individuals of other relation) and test for excess sharing of alleles that are identical by descent (i.e., identical alleles inherited from the same common ancestor).18 Owing to the limited number of meioses available in familial groups, the loci identified by linkage-based approaches are generally quite large (5-10 Mb)19 and require an extensive fine-mapping effort to pinpoint the susceptibility gene of a complex disease.
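One simple way to test for the excess allele sharing described above is the so-called mean test for affected sibling pairs. The Python sketch below assumes, for simplicity, that the number of alleles shared identical by descent (0, 1, or 2) is known without error for every pair at the marker of interest; under the null hypothesis of no linkage these occur with probabilities 1/4, 1/2, and 1/4, giving an expected sharing proportion of 0.5 with variance 0.125 per pair. The function name and the pair counts are hypothetical.

```python
import math


def asp_mean_test(n_ibd0, n_ibd1, n_ibd2):
    """Mean test for excess IBD sharing in affected sibling pairs.

    Inputs are the numbers of pairs sharing 0, 1, and 2 alleles identical
    by descent. Returns (mean sharing proportion, z statistic, one-sided p).
    """
    n_pairs = n_ibd0 + n_ibd1 + n_ibd2
    if n_pairs == 0:
        raise ValueError("no sibling pairs supplied")
    # Proportion of alleles shared IBD: 0, 0.5, or 1 per pair.
    mean_sharing = (0.5 * n_ibd1 + 1.0 * n_ibd2) / n_pairs
    # Null variance of the per-pair proportion is 0.125.
    se_null = math.sqrt(0.125 / n_pairs)
    z = (mean_sharing - 0.5) / se_null
    # One-sided p value from the standard normal upper tail.
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return mean_sharing, z, p_one_sided


# Hypothetical counts for 100 affected sibling pairs at one marker.
mean_sharing, z, p = asp_mean_test(n_ibd0=15, n_ibd1=50, n_ibd2=35)
print(f"mean IBD sharing = {mean_sharing:.2f}, z = {z:.2f}, one-sided p = {p:.4f}")
```

In practice IBD status is usually inferred with uncertainty from marker genotypes, and dedicated linkage software handles that inference; the sketch only shows the logic of comparing observed sharing against the 50% expected by chance.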


In contrast to linkage-based approaches that use recent familial meiosis to identify disease genes, association studies take advantage of the linkage disequilibrium generated by meiotic recombination throughout our ancestral history.20 Frequently these are population-based studies between unrelated cases and matched controls lacking the disease or trait of interest. However, family-based approaches using discordant sibling pairs or father–mother–offspring trios (e.g., transmission disequilibrium test) as well as methods to assess quantitative traits (e.g., quantitative trait locus mapping) are also employed.21 Association studies have been widely applied to the study of genetic variation in candidate genes selected for their potential involvement in specific disease processes (e.g., genes known to regulate the immune system investigated in autoimmune conditions), and with recent technological advances stemming from the human genome project, genome-wide association studies using hundreds of thousands to millions of SNPs spread across the genome have become reality.22, 23

Association studies have great ability to detect small effects of genetic variation when the sample sizes are large (>500).24 A widely used statistical method to assess association is the χ2 test of independence, which assumes the null hypothesis of no association between the disease trait and a marker locus, most frequently at a 5% type I error rate.25 Thus, if the P value corresponding to the χ2 statistic is considered significant, the null hypothesis is rejected and an association is surmised. Depending on the marker being tested (often a SNP within a gene), the detected association could directly affect the phenotype (i.e., susceptibility variant) or may be linked to the true phenotypic effector.26
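As an illustration of the test described above, the following Python sketch computes the Pearson χ2 statistic (1 degree of freedom, no continuity correction) for a 2 × 2 table of allele counts in cases and controls. The function name and the counts are hypothetical; the P value uses the identity that, for 1 degree of freedom, the upper tail of the χ2 distribution equals erfc(sqrt(χ2/2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Rows are cases and controls; columns are the two alleles at a marker.
    Returns (chi-square statistic, P value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 500 cases and 500 controls (1,000 chromosomes each).
chi2, p_value = chi_square_2x2(a=300, b=700,   # cases: risk allele, other allele
                               c=240, d=760)   # controls: risk allele, other allele
print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}")
print("reject null at 5%:", p_value < 0.05)
```

With these invented counts the null hypothesis of no association would be rejected at the 5% level, which, as the text notes, says nothing by itself about whether the marker is the susceptibility variant or merely linked to it.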

Population-based association studies are particularly susceptible to false positive findings due to population stratification, selection bias, and multiple testing. Replication of the initial association results in an independent dataset is the best method for verifying the research findings, and great care should be taken to ensure that the follow-up study is adequately powered.27 Finally, family-based association designs that lack stratification biases offer an alternative to examine and confirm positive observations of case–control studies.20
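The multiple-testing problem mentioned above can be made concrete with a short calculation. The Python sketch below uses the Bonferroni correction, which is one common and conservative choice and is named here as an illustrative assumption rather than a method prescribed by the text; the function names, the number of SNPs, and the P values are hypothetical.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of nominally significant results if every null is true."""
    return alpha * n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold keeping the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_after_bonferroni(p_values, n_tests, alpha=0.05):
    """Flag each P value that survives the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [p <= threshold for p in p_values]


# Hypothetical genome-wide scan of 500,000 SNPs tested at a nominal 5% level.
n_snps = 500_000
false_hits = expected_false_positives(0.05, n_snps)
threshold = bonferroni_threshold(0.05, n_snps)
flags = significant_after_bonferroni([3e-9, 2e-6, 0.01], n_tests=n_snps)

print(f"expected chance findings at P < 0.05: {false_hits:.0f}")
print(f"Bonferroni per-SNP threshold: {threshold:.1e}")
print("survive correction:", flags)
```

The calculation shows why an uncorrected 5% threshold is unusable at genome-wide scale, and why replication in an independent, adequately powered dataset remains the stronger safeguard.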

As genotyping and statistical methods improve, association studies will continue to demonstrate high investigative potential in the future. The main obstacles to overcome will remain the prospective collection of large sets of cases, their family members, and well-matched controls—a particularly daunting challenge for those studying rare complex diseases.

Genetics of Complex Liver Diseases

A multitude of complex diseases affect the liver, ranging from those that are increasingly common to those that are quite rare. The common complex liver diseases affect vast numbers of the adult population, and a large portion of the risk to their development can be assigned to environmental exposure, particularly to the rapidly escalating “Western lifestyle” characterized by overconsumption and lack of physical activity. Conversely, the risk of developing rare complex liver disorders is likely more dependent on genetic background than on environment. However, it must be stressed that these are generalizations applicable to the overall etiology of these diseases in the population; indeed, individual risk of disease development is apt to cover a wide spectrum between environment and genetics for both common and rare disorders, and a large number of potentially predisposing gene variants should be assumed. Below we review the current state of knowledge regarding the genetics of selected complex hepatic diseases, focusing on two themes: common liver diseases and rare liver disorders. Although genetic background undoubtedly plays a role in the development of hepatocellular carcinoma and outcome of infection with hepatitis B and C virus, coverage of these topics is better served by stand-alone review.

Paradigms of Common Complex Liver Diseases

Alcoholic Liver Disease.

Alcoholic liver disease (ALD) is among a small number of complex diseases that by definition displays an absolute requirement for a voluntary environmental exposure: the consumption of alcohol. ALD presents first as fatty liver and progresses to alcoholic hepatitis and eventually alcoholic cirrhosis, which affects an estimated 360,000 to 800,000 people in the United States.28 The risk of developing ALD is related somewhat to the extent of alcohol consumption; however, the required dosage for significant risk elevation remains unclear.29 Other risk factors for ALD development include female sex, obesity, diabetes, hereditary hemochromatosis, and hepatitis C virus infection.30

The genetics of ALD development involves an inherited predisposition to alcohol dependence as well as the development of liver injury as a result of it. Family studies have established an important role of genetics in alcohol dependence, but the existence of subtypes of disease that are difficult to classify and confounding by broader addictive/compulsive phenotypes31 have hampered the effort to identify associated genes. To date, only two genes have demonstrated significant involvement. The alcohol dehydrogenase ADH1B*1 allele was found to be associated with a nearly threefold increase in alcohol dependence risk by meta-analysis32 and the aldehyde dehydrogenase ALDH2*2 allele was found to confer a 10-fold reduction of alcohol dependence risk.33 Both genes are involved with alcohol metabolism, and the reported associations have been identified in Asian populations.

Although most heavy drinkers develop fatty liver, only a minority progress to more serious liver disease, suggesting that some other environmental or genetic factors are required for disease advancement. Evidence for the involvement of genetics in the progression of alcoholic fatty liver to advanced ALD comes from a twin study that found the concordance rate of alcoholic cirrhosis to be significantly higher in monozygotic twins compared with dizygotic twins (16.9% vs. 5.3%, respectively).34 Study of genes involved with alcohol metabolism (e.g., the alcohol and aldehyde dehydrogenases and cytochrome P450 2E1), as well as genes involved with inflammation (e.g., tumor necrosis factor α and interleukin-10), have been inconclusive, with several allelic associations detected but not verified in follow-up studies.29 Perhaps the most compelling genetic finding for advanced ALD risk involves the immune regulatory cytotoxic T lymphocyte antigen-4 gene, in which homozygosity for the A49G polymorphism was found to confer a significant risk of alcoholic cirrhosis (OR 3.5; P = .03) in Italians.35 However, this finding has yet to be confirmed in follow-up studies.

Nonalcoholic Fatty Liver Disease.

Nonalcoholic fatty liver disease (NAFLD) refers to a spectrum of liver disease ranging from fatty liver through cirrhosis as in ALD, and is diagnosed after ruling out other causes, in particular hepatitis infection and alcohol abuse.36, 37 While many individuals probably have an overlap of ALD and NAFLD,37 it is evident that NAFLD is strongly associated with the metabolic syndrome, and the common risk factors for its development are type 2 diabetes, central obesity, and hypertriglyceridemia.36, 37 The prevalence of NAFLD is reaching epidemic proportions as the problem of obesity rises. Indeed, it is estimated that 17% to 33% of Americans are affected.36

Although the majority of individuals with metabolic syndrome characteristics likely have early-stage NAFLD, only a small subset will develop advanced disease,38 similar to the situation seen in ALD. Evidence of genetic factors involved with the progression of NAFLD to cirrhosis has been suggested by reports of familial clustering39, 40 and by ethnically related differences in the prevalence of NAFLD-associated cirrhosis.41, 42 Several genes have been suggested as potential candidates for advanced NAFLD progression, including those involved with fat deposition, oxidant stress, insulin sensitivity, inflammation, and fibrosis.38 However, as with ALD, reported NAFLD gene associations have thus far been inconclusive, the result of small studies that have not yet been verified.

Cholesterol Gallstone Disease.

Cholesterol gallstones are present in 10% of the United States population, and the treatment cost is in excess of $6 billion annually.43 The development of cholesterol gallstones is the result of disrupted homeostasis between cholesterol, bile salt, and phospholipid levels in the bile—most often the result of a high-calorie/fat, low-fiber “Western” diet. Risk factors for cholesterol gallstone disease include obesity and type 2 diabetes; progression to serious disease occurs only in a subset of individuals.44

Despite the distinct role of diet in the development of gallstones, there is convincing evidence for a strong role of genetics in the pathogenesis of cholesterol gallstone disease. In a Swedish study involving over 43,000 twin pairs, the concordance rate for gallstones was twice as high in monozygotic twins compared with dizygotic twins (12% vs. 6%, respectively).45 Furthermore, cholesterol gallstones are two to five times more common in first-degree relatives of patients with known gallstones compared with stone-free controls.46 Finally, ethnic differences in gallstone prevalence, most notably in the vulnerable Native American populations of North and South America, who are also highly susceptible to the development of metabolic syndrome and type 2 diabetes,47 provide strong evidence for the role of genetics in cholesterol gallstone disease development and demonstrate the overlap of risk factors inherent in common disease.

Although the evidence that gallstone formation and pathogenesis are largely determined by genetics is quite compelling, the identification of significantly associated human genes is thus far lacking. There has been success in the identification of genes causing rare monogenic forms of cholesterol gallstone disease, including ABCB4 (phosphatidylcholine transporter),48 ABCB11 (bile salt export pump),49 and CYP7A1 (cholesterol 7α-hydroxylase).50 However, efforts to identify gene variants involved with development of the common polygenic form of disease, focused primarily on polymorphisms of apolipoprotein E, have yielded ambiguous results.44

Paradigms of Rare Complex Liver Diseases

Autoimmune Hepatitis.

Autoimmune hepatitis (AIH) is a chronic periportal hepatitis affecting all age groups, characterized by elevation of liver aminotransferases, presence of autoantibodies, and hyper–gamma-globulinemia. The prevalence of AIH in the Caucasian populations of North America and Western Europe is estimated to be 50 to 200 cases per million, with women representing approximately 70% of cases.51 There are currently two classifications of AIH based on detection of particular autoantibody patterns.51 AIH type 1 is the most common form and is characterized by the presence of antinuclear and/or anti–smooth muscle antibodies. Type 2 AIH is defined by the presence of liver–kidney microsomal antibodies and is rare in adult disease but more prevalent in childhood, representing up to 30% of cases.

Because AIH is a relatively rare disease, epidemiological studies of multiply affected families and twin pairs are lacking. The vast majority of data regarding the genetic susceptibility to AIH comes from case–control association strategies focused on the HLA genes of the major histocompatibility complex, residing on the short arm of chromosome 6.52 Studies from Northern Europe and the United States have revealed a marked genetic association of the HLA-DRB1*0301 and HLA-DRB1*0401 alleles with increased risk of type 1 AIH in Caucasian populations.53, 54 Moreover, type 1 AIH is associated with the HLA-DRB1*0405-DQB1*0401 haplotype in Japanese,55 the HLA-DRB1*1301 allele in South Americans,56 and the HLA-DRB1*0404 allele in Mexicans.57 Altogether, these findings emphasize the importance of ethnic origin as related to the HLA-mediated susceptibility of AIH. In addition to the disease associations demonstrated in the HLA region, preliminary studies have implicated the involvement of non-HLA loci with susceptibility to AIH. These include the immune modulator cytotoxic T lymphocyte antigen-4,58 proinflammatory cytokines,59 Fas,60 and CD45.61 Further study is needed to confirm these associations.

Primary Biliary Cirrhosis.

Primary biliary cirrhosis (PBC) is a chronic cholestatic liver disease involving immunomediated destruction of the intrahepatic bile ducts, leading eventually to cirrhosis and liver failure. PBC primarily affects women and is characterized by the presence of disease-specific antimitochondrial antibodies.62 In the United States, the prevalence of PBC is approximately 400 cases per million.63

The involvement of familial predisposition and potential genetic influence on the etiology of PBC has long been appreciated, with early studies reporting familial PBC incidence between 1.5% and 6.4%. Evidence for genetic involvement was demonstrated in a geographically based study from the United Kingdom, in which the relative risk of a sibling (λs) to develop PBC was found to be 10.5 times that of the general population.64 Recently, a large case–control study in the United States indicated that having a first-degree relative with PBC was significantly associated with risk of PBC development (OR 10.736, 95% CI 4.227-27.268).65 Additional evidence pointing to a strong genetic component of PBC comes from a twin study in which the concordance rate for PBC between monozygotic twins was found to be 63%.66
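Odds ratios and confidence intervals of the kind quoted above are commonly derived from a 2 × 2 table of exposure by disease status. The Python sketch below uses the standard log-based (Woolf) interval; it is an illustration of the calculation only, not a reconstruction of the cited study, whose estimate may have come from a different model. The function name and all counts are hypothetical.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio with a log-based (Woolf) confidence interval.

    z = 1.96 gives an approximate 95% interval. All four counts must be positive.
    """
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: "exposed" means having an affected first-degree relative.
odds_ratio, lower, upper = odds_ratio_with_ci(exposed_cases=60, unexposed_cases=940,
                                              exposed_controls=6, unexposed_controls=994)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

The wide interval produced by these invented counts reflects how few controls are exposed, which is typical when the exposure is a rare event such as an affected relative.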

Numerous genetic association studies have been performed for PBC, most of which have focused on the major histocompatibility complex region, in particular the HLA genes. A number of studies have reported the association of the HLA-DRB1*08 allele with PBC in Caucasians.67 Recently, a large study of 412 United Kingdom and 80 Italian PBC patients and controls confirmed and expanded on these reports, finding the HLA-DRB1*0801 allele to be significantly associated with disease, and DRB1*13 to be protective in both population groups.68 A handful of genetic variants in PBC candidate genes have also been investigated; however, study sizes have in general been too small to detect any but the most significant associations. The most promising of these studies comes from the United Kingdom, which identified the association of a coding SNP (49 A/G) in the cytotoxic T lymphocyte antigen-4 gene with PBC susceptibility using 200 cases and 200 controls.58 This finding awaits confirmation.

Primary Sclerosing Cholangitis.

Primary sclerosing cholangitis (PSC) is a progressive, chronic, cholestatic disease of unknown origin characterized by inflammation and fibrosis of the intrahepatic and/or extrahepatic bile ducts, advancing to cirrhosis. An estimated 70% to 80% of PSC patients have concurrent inflammatory bowel disease, most commonly ulcerative colitis.69 Epidemiological studies from the United States and the United Kingdom estimate the overall prevalence of PSC to be 136 cases per million.70, 71 In contrast with AIH and PBC, men represent the majority of PSC cases (50% to 70%).69

Several individual cases of familial PSC have been reported, and a study from Scandinavia determined the prevalence of PSC in first-degree relatives of PSC patients to be 0.7%,72 nearly 70 times that of the general population. However, large studies of familial risk and twin studies are lacking. To date, the best evidence for genetic involvement in PSC comes from disease association of HLA haplotypes. Indeed, the HLA A1B8-TNFA*2-DRB3*0101-DRB1*0301-DQB1*0201, DRB3*0101-DRB1*1301-DQB1*0603, and DRB1*1501-DQB1*0602 haplotypes are associated with increased susceptibility to PSC.73 In addition, variants of non-HLA genes from the major histocompatibility complex locus have been reported to be associated with risk of PSC, including a polymorphism in the promoter region of tumor necrosis factor α (−308 G/A)74 and the *008 allele of the MICA gene.75 Finally, several variants in non–major histocompatibility complex genes have been associated with PSC. These include the −1171 5A allele of the stromelysin (MMP-3) gene,76 a 32-bp deletion in the chemokine receptor 5 (CCR5) gene,77 and homozygosity for the E469 variant of the intercellular adhesion molecule 1 (ICAM1) gene, which was suggested to be protective.78

Future Directions

The Next Steps.

The efforts put forth to date provide a basis for the study of complex liver disease genetics, but together amount to only the “tip of the iceberg” in the level of understanding that will be required if genomic studies are to globally impact the way in which we approach the treatment of these diseases. Increasing our knowledge of genome sequence functionality will take us one step further in this endeavor by providing a basis for the prediction of genetic variant–induced consequences outside of those nonsense and nonsynonymous polymorphisms for which this approach is currently feasible.

Linkage and association analyses provide a means to identify individual genes and genetic variants that are potentially involved with complex liver diseases. However, the knowledge of susceptibility variants and relation to phenotype often does not in itself explain the mechanistic process underlying the pathogenesis of the observed disease or trait. Indeed, even those genetic variants that are found to be highly associated with complex traits are present in a notable portion of the control population. Thus, new strategies to identify phenotypic effects stemming from variation at multiple genetic loci are being developed.79

To further dissect the contribution of these variations to complex phenotypes and diseases, the functional mechanisms underlying the contextual use and network interactions of the gene of interest must be systematically determined. A conceptual framework for this systems-based interrogation is provided by the myriad biological data generated through years of cellular and genetic research using reductionist molecular and biochemical approaches. Commercially available software packages have recently made it feasible to leverage this extensive base of existing knowledge to build models of pathway and network interactions, onto which data from emerging high-throughput technologies, such as gene transcript and protein expression profiling, can be mapped. Although still in their infancy, these experimental and analytical technologies are apt to facilitate a better understanding of the phenotypic effects of individual genetic variants and their combinations by helping to identify the genes and environmental components with which they interact.

Opportunities and Challenges.

Human genomics offers the opportunity to better define the molecular taxonomy of complex diseases and eventually to predict their natural course and even response to pharmacological treatment. For example, reclassification of hepatocellular carcinoma based on gene expression profiling of the tumor (i.e., molecular signatures) will allow a better understanding of the pathways and networks leading to malignant transformation of the hepatocyte. Ultimately, such endeavors will advance the approaches for its early detection, prevention, and treatment.

The challenges we face in genomic research of complex liver diseases are significant. First, considerable effort needs to be made in the expansion and standardization of phenotypic classification of complex liver diseases. The emerging potential to subclassify diseased populations based on increasing numbers of disease trait phenotypes—be they clinical, transcriptomic, or proteomic in nature—offers an unprecedented opportunity to dissect the genetic components of disease. To this end, standardization of phenotypic classification of these diseases is critical to diminish the poor interobserver agreement between studies that currently plagues the identification and confirmation of genotype–phenotype associations. Second, marked increases in the size of participant registries and biospecimen banks are necessary to adequately power the search for genetic variants associated with complex liver diseases. Finally, new methods of accurately and comprehensively assessing an individual's history of environmental exposure will be required to better identify and understand the disease-related consequences of gene–environment interaction. For many complex liver disorders, these steps will require cooperation between groups who, in the current system, are competing with each other for the same scarce resources.
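
To give a sense of scale for the second point, the following is a minimal sketch of a textbook sample size approximation for an allelic case–control comparison (a two-sided two-proportion z-test). It is an illustration under stated assumptions, not a method taken from any study cited here: the function names, the 10% control allele frequency, the significance level of 0.05, the 80% power, and the odds ratios are all hypothetical choices, and no correction for multiple testing is applied.

```python
import math
from statistics import NormalDist

def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by a control frequency and an allelic odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)

def alleles_per_group(control_freq, odds_ratio, alpha=0.05, power=0.80):
    """Approximate number of alleles needed per group (two-proportion z-test)."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)

# Hypothetical scenario: a variant carried on 10% of control chromosomes.
for odds_ratio in (3.0, 2.0, 1.5, 1.2):
    n_alleles = alleles_per_group(control_freq=0.10, odds_ratio=odds_ratio)
    # Each participant contributes two alleles at an autosomal locus.
    n_people = math.ceil(n_alleles / 2)
    print(f"OR {odds_ratio}: about {n_people} cases and as many controls")
```

Under these assumptions the required numbers grow steeply as the odds ratio approaches 1, and a stricter significance threshold, as needed when many variants are tested, would raise them further.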

Genomic research requires the teamwork of a diverse group of experts including the clinician, wet-bench researcher, genetic epidemiologist, statistical geneticist, and bioinformatics expert, none of whom may fully appreciate the contribution of the others. It is thus apparent that the traditional approach of individual laboratories led by a single principal investigator may not be feasible for this kind of investigation. Last but not least in genomic-based research is the protection of human participants: patients, unaffected family members, and unrelated healthy controls. These study members are the key component of genomic research, and their legal rights need to be protected if we endeavor to continue and expand on these lines of inquiry to the point at which a global effect on human health is realized.


In the era of the genome, there is opportunity to improve the prognostication, elucidate the pathogenesis, and devise new treatments for complex liver diseases. Despite many challenges, a focus on the human genome, along with its interaction with the environment, is the cornerstone to greater knowledge regarding the pathogenesis of complex liver disease. Such an integrated approach has the potential to yield ground-breaking discoveries that translate into more effective therapies.