1. Top of page
  2. Abstract
  3. Patients and Methods
  4. Results
  5. Discussion
  6. Acknowledgments
  7. References
  8. Supporting Information

Approximately 60%-80% of patients with primary sclerosing cholangitis (PSC) have concurrent ulcerative colitis (UC). Previous genome-wide association studies (GWAS) in PSC have detected a number of susceptibility loci that also show associations in UC and other immune-mediated diseases. We systematically compared genetic associations in PSC with genotype data from UC patients to detect new susceptibility loci for PSC. We performed combined analyses of GWAS for PSC and UC comprising 392 PSC cases, 987 UC cases, and 2,977 controls and followed up top association signals in an additional 1,012 PSC cases, 4,444 UC cases, and 11,659 controls. We discovered novel genome-wide significant associations with PSC at 2q37 [rs3749171 at G-protein-coupled receptor 35 (GPR35); P = 3.0 × 10⁻⁹ in the overall study population; combined odds ratio (OR) and 95% confidence interval (CI) = 1.39 (1.24-1.55)] and at 18q21 [rs1452787 at transcription factor 4 (TCF4); P = 2.61 × 10⁻⁸; OR (95% CI) = 0.75 (0.68-0.83)]. In addition, several suggestive PSC associations were detected. The GPR35 rs3749171 is a missense single nucleotide polymorphism resulting in a threonine-to-methionine substitution. Structural modeling showed that rs3749171 is located in the third transmembrane helix of GPR35 and could possibly alter efficiency of signaling through the GPR35 receptor. Conclusion: By refining the analysis of a PSC GWAS by parallel assessments in a UC GWAS, we were able to detect two novel risk loci at genome-wide significance levels. GPR35 shows associations in both UC and PSC, whereas TCF4 represents a PSC risk locus not associated with UC. Both loci may represent previously unexplored aspects of PSC pathogenesis. (HEPATOLOGY 2013;58:1074–1083)


Abbreviations: CARD9, caspase-recruitment domain family, member 9; CI, confidence interval; CON1/2, control panel 1/2; GPR35, G-protein-coupled receptor 35; GRAIL, Gene Relationships Across Implicated Loci; GWAS, genome-wide association study; IBD, inflammatory bowel disease; KYNA, kynurenic acid; LD, linkage disequilibrium; MHC, major histocompatibility complex; OR, odds ratio; PD, phenotype difference; PDC, plasmacytoid dendritic cell; PSC, primary sclerosing cholangitis; QC, quality control; Q-Q plot, quantile-quantile plot; REL, v-rel reticuloendotheliosis viral oncogene homolog; SNP, single nucleotide polymorphism; TCF4, transcription factor 4; UC, ulcerative colitis; vs., versus.

Primary sclerosing cholangitis (PSC) is a chronic cholestatic liver disease of unknown etiology. There is no effective medical therapy, and PSC is a common indication for liver transplantation.[1] Pathogenetic insight is required to guide the development of novel therapies. An important clinical feature of PSC is the frequent occurrence of extrahepatic comorbidities.[2] Most common is inflammatory bowel disease (IBD), which is reported in 60%-80% of PSC patients of Northern European descent.[1] According to commonly accepted criteria, the IBD diagnosis in PSC is compatible with ulcerative colitis (UC) in 80%-90% of patients, whereas the remaining 10%-20% is classified as Crohn's disease or indeterminate IBD. Conversely, PSC occurs in UC patients with an estimated frequency of 2.5%-7.5%.

PSC and UC are complex genetic traits. Shared genetic predisposition is suggested by the observation that PSC and UC co-occur within families, and heritability estimates demonstrate that siblings of PSC and UC patients are approximately 9-39 and 6-9 times more likely to develop PSC and UC, respectively, than the overall population.[3, 4] It also appears that first-degree relatives of PSC patients are at an increased risk of UC (relative risk of UC in siblings of PSC patients is approximately eight times that of the overall population).[3] Along with the high prevalence of UC in PSC patients, this suggests that shared genetic susceptibility factors are likely to exist between the two disease entities. Genome-wide association studies (GWAS) in UC have contributed to the identification of 47 susceptibility loci.[5] Genetic studies in PSC have been smaller because of lower disease prevalence. Of 10 established risk loci in PSC, six (6p21, 3p21, 2q35, interleukin [IL]2/IL21, caspase-recruitment domain family, member 9 [CARD9], and v-rel reticuloendotheliosis viral oncogene homolog [REL]) have also been detected in UC.[5] These genetic findings, together with the clinical observations, suggest that both shared and nonshared genetic risk factors are present in PSC and UC.

In several other diseases, systematic reanalyses of GWAS have yielded new findings. Likewise, for closely related phenotypes, unbiased strategies for defining shared genetic susceptibility have proven successful.[9] In the present study, we hypothesized that yet unrevealed unique PSC-, UC-, and shared PSC-UC risk loci exist. We therefore applied several strategies to systematically interrogate and integrate two GWAS in PSC and UC, aiming to identify novel genetic risk loci in PSC and UC.

Patients and Methods

Study Subjects

Diagnosis of PSC was based on standard clinical, biochemical, cholangiographic, and histological criteria.[1] Diagnosis of UC was based on commonly accepted clinical, radiological, endoscopic, and histological criteria.[10] The discovery panel (panel A) included 392 PSC cases, 987 UC cases, and, in total, 2,977 healthy controls, all of German descent to minimize population stratification in the analysis (Table 1A). To keep the association tests fully independent, the discovery control panel was divided into two subsets: control panel 1 (CON1) and control panel 2 (CON2) (Table 1). The PSC and UC replication panels (panels B and C, respectively) comprised, in total, 495 German and 517 Scandinavian PSC cases and 471 German, 254 Norwegian, 1,046 Belgian/Dutch, and 2,673 UK UC cases, as well as 11,659 healthy controls from Germany, Norway, Belgium, The Netherlands, and the UK (Table 1B,C). The recruitment of study subjects is described in detail in the Supporting Materials.

Table 1. Case/Control Panels Summary

(A) Panel A: GWAS Discovery Panels for PSC and UC That Served as a Basis for Six Different Analysis Strategies (for Listing of Analyses, see Table 2 and Supporting Table 1)

| Discovery Panel | Disease | No. of Cases | No. of Controls | GWAS Platform | Abbreviation |
|---|---|---|---|---|---|
| Germany | PSC | 392 | | Affymetrix 6.0 | PSC |
| Germany | UC | 987 | | Affymetrix 6.0 | UC |
| Germany (PopGen) | HC | | 1,207 | Affymetrix 6.0 | CON1 |
| Germany (KORA S4) | HC | | 1,770 | Affymetrix 6.0 | CON2 |
| Total | | 1,379 | 2,977 | | |

(B) Panel B: Replication Panel for PSC

| Replication Region | Disease | No. of Cases | No. of Controls | Genotyping Platform |
|---|---|---|---|---|
| Germany | PSC | 495 | 1,656 | Sequenom iPlex/TaqMan |
| Norway/Sweden | PSC | 517 | 1,038 | Sequenom iPlex/TaqMan |
| Total | | 1,012 | 2,694 | |

(C) Panel C: Replication Panel for UC

| Replication Region | Disease | No. of Cases | No. of Controls | Genotyping Platform |
|---|---|---|---|---|
| Germany | UC | 471 | 1,789 | Sequenom iPlex/TaqMan |
| Norway | UC | 254 | 267 | Sequenom iPlex/TaqMan |
| Belgium/The Netherlands | UC | 1,046 | 1,663 | Sequenom iPlex/TaqMan |
| United Kingdom | UC | 2,673 | 5,246 | Affymetrix 6.0 (1000k) |
| Total | | 4,444 | 8,965 | |

Numbers of samples are given after initial quality control. Abbreviation: abbreviation for case/control panels used in subsequent association analyses (see also Table 2 and Supporting Table 1).

Written informed consent was obtained from all study participants. Study protocols were approved by the ethics committees of all the recruitment centers, as well as the Medical Faculty of the Christian-Albrechts-University (Kiel, Germany) and the Regional Committee for Medical and Health Research Ethics in South-Eastern Norway (S-93178 and S-08872b).

Genome-Wide Genotyping and Single-Nucleotide Polymorphism Data Processing

Genome-wide single-nucleotide polymorphism (SNP) genotyping in the discovery panel was performed using the Affymetrix Genome-Wide Human SNP Array 6.0 (Affymetrix, Santa Clara, CA). For details on genome-wide SNP genotyping, genotype calling, quality control (QC), and imputation, see the Supporting Methods in the Supporting Materials and Fig. 1, left column.


Figure 1. Study workflow. First (SNP data processing), we prepared four high-quality index GWAS datasets comprising 392 PSC cases, 987 UC cases and 2,977 healthy controls (Table 1A, panel A, and Supporting Table 1). For each of these four GWAS datasets, we applied the same quality measures. Second (SNP selection strategies), we employed six analysis strategies (for listing of analyses, see Table 2 and Supporting Table 1) to systematically search for unique, opposite, and shared risk alleles associated with PSC and UC. Last (replication), we performed replication genotyping in a European panel of 1,012 PSC cases, 4,444 UC cases, and 11,659 healthy control individuals (Table 1B,C, panel B,C).


Association Testing of Genome-wide Data

To conduct further analyses and prioritize SNPs for replication genotyping, four index GWAS datasets incorporating different subsets of cases and/or controls from the discovery panels were prepared for association analysis (Supporting Table 1): (1) 389 PSC cases plus 1,207 controls from CON1 and 1,629 controls from CON2 (hereafter referred to as PSC vs. CON1+CON2, including 1,279,891 SNPs); (2) 987 UC cases plus 1,200 controls from CON1 and 1,768 controls from CON2 (UC vs. CON1+CON2, including 1,276,696 SNPs); (3) 385 PSC cases plus 1,196 controls from CON1 (PSC vs. CON1, including 1,290,710 SNPs); and (4) 984 UC cases plus 1,770 controls from CON2 (UC vs. CON2, including 1,286,473 SNPs). The same QC measures as for the single discovery panels were applied to each of these four GWAS datasets (see Supporting Methods); the numbers of cases, controls, and SNPs given above are after QC.

Association analysis of the genotyped and imputed SNPs was performed separately on the four index GWAS datasets using the PLINK software (version 1.07) framework for logistic regression of dosage data (Fig. 1, middle column).[11] To control for potential confounding by population stratification, the regression analyses were adjusted for the top 10 eigenvectors from EIGENSTRAT.[12]

Selection of SNPs for Replication

With the overall aim of detecting novel unique PSC-, UC-, and shared PSC-UC risk loci, we applied several strategies to increase the likelihood of exposing true positive risk loci and prioritize and extract SNPs for further replication from the four index GWAS datasets available. In brief, six analysis strategies were applied (strategies A-F) (Table 2).

Table 2. Summary of Analysis Strategies

| Analysis | Type of Analysis | Aim | Index GWAS Used | No. of SNPs | λ | SNP Selection Criteria for Replication |
|---|---|---|---|---|---|---|
| (A) | Standard association analysis | PSC risk alleles | PSC vs. CON1+CON2 | 1,279,891 | 1.051 | P < 1 × 10⁻⁴; established UC[5] SNPs with P < 0.01 |
| (B) | Standard association analysis | UC risk alleles | UC vs. CON1+CON2 | 1,276,696 | 1.098 | P < 1 × 10⁻⁴; established PSC[6] SNPs with P < 0.01 |
| (C) | PD analysis | PSC exclusive risk alleles | PSC vs. CON1 and UC vs. CON2 | 1,141,335 | NA | CI-based analysis |
| (D) | PD analysis | UC exclusive risk alleles | PSC vs. CON1 and UC vs. CON2 | 1,141,335 | NA | CI-based analysis |
| (E) | Same-effect meta-analysis | Shared risk alleles | PSC vs. CON1 and UC vs. CON2 | 1,141,335 | 1.040 | P < 1 × 10⁻⁴; P_{PSC vs. CON1} < 0.05; P_{UC vs. CON2} < 0.05 |
| (F) | Opposite-effect meta-analysis | Alleles with opposite effect on risk | PSC vs. CON1 and UC vs. CON2 | 1,141,335 | 1.047 | P < 1 × 10⁻⁴; P_{PSC vs. CON1} < 0.05; P_{UC vs. CON2} < 0.05 |

To systematically search for exclusive, opposite, and shared risk alleles associated with PSC and UC, we employed six different analysis strategies (see text for details). λ: Genomic inflation factor λ is based on the median (0.455) of the 1-df chi-square distribution[17]; NA, combined λ not available. SNP selection criteria for replication genotyping: (A) The most strongly associated SNP with P < 1 × 10⁻⁴ from each associated locus was selected, together with previously established UC risk SNPs[5] at P < 0.01. (B) The most strongly associated SNP with P < 1 × 10⁻⁴ from each associated locus was selected, together with previously established PSC risk SNPs.[6] (C and D) For details on SNP selection in the CI-based analysis, see Supporting Methods in the Supporting Materials. (E and F) SNPs achieving nominal significance in each of the single-disease GWAS (P_{PSC vs. CON1} < 0.05; P_{UC vs. CON2} < 0.05) as well as being associated in the combined-phenotype association meta-analysis at P < 1 × 10⁻⁴ were selected.

First, we selected SNPs for replication based on P-value thresholds from the results of the standard GWAS performed on the separate GWAS datasets 1 (PSC vs. CON1+CON2) (strategy A) and 2 (UC vs. CON1+CON2) (strategy B), with the aim of finding new PSC and UC susceptibility loci, respectively, that could have been missed in the original GWAS performed on largely the same case panels,[6, 13] whether for technical reasons or because a different reference was used for imputation. Compared with the original studies, the design of these repeated GWAS included improved SNP genotype quality through recalling of SNP genotypes with Beaglecall software (v1.0.1),[14] improved genotype imputation quality through use of a larger set of HapMap3 reference haplotypes, and, for GWAS dataset 1, a potential reduction of stratification and batch effects by including only German cases and controls in the discovery panel.

Second, we assessed the two GWAS datasets, 3 (PSC vs. CON1) and 4 (UC vs. CON2), in parallel and developed a novel phenotype difference (PD) confidence interval (CI)-based analysis. The aim of introducing a confidence-based analysis was to eliminate the challenges introduced by the P-value–based SNP-selection method applied in strategies A and B, because P-values are biased by panel size and potential differences in allele frequencies and effect sizes in PSC vs. UC. This analysis was performed separately for PSC (strategy C) and UC (strategy D), with the aim of detecting additional new exclusive PSC and UC risk loci, respectively.
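To make the idea of a CI-based comparison concrete, the following Python sketch shows one possible decision rule. It is not the PD analysis used in this study, whose criteria are given only in the Supporting Methods; the rule, the function names, and the example numbers are all hypothetical and serve only to show how interval comparison can replace a P-value threshold.

```python
import math

def log_or_ci(odds_ratio, se_log_or, z=1.96):
    """Return the (lower, upper) 95% CI of the log odds ratio."""
    beta = math.log(odds_ratio)
    return beta - z * se_log_or, beta + z * se_log_or

def is_exclusive(or_a, se_a, or_b, se_b):
    """Hypothetical rule (an assumption, not the paper's criterion):
    disease A shows an effect (its CI excludes 0 on the log scale)
    and its CI does not overlap the CI estimated for disease B."""
    lo_a, hi_a = log_or_ci(or_a, se_a)
    lo_b, hi_b = log_or_ci(or_b, se_b)
    a_nonzero = lo_a > 0 or hi_a < 0
    disjoint = hi_a < lo_b or hi_b < lo_a
    return a_nonzero and disjoint

# Made-up inputs: a protective effect in disease A, no effect in disease B.
flag_exclusive = is_exclusive(0.64, 0.10, 1.00, 0.05)
# Made-up inputs: similar risk effects in both diseases.
flag_shared = is_exclusive(1.30, 0.08, 1.20, 0.05)
```

Because the rule compares intervals on the effect-size scale, a larger panel narrows its own interval but does not by itself push a SNP over a fixed P-value cutoff.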

Last, we combined the two GWAS datasets, 3 (PSC vs. CON1) and 4 (UC vs. CON2), to increase the power to detect new shared PSC-UC risk loci. First, we performed a standard meta-analysis in which summary statistics of the two GWAS datasets were combined across the two phenotypes, so that top associated SNPs in the meta-analysis would have the same direction of effect in PSC and UC (strategy E). Conversely, the same two GWAS datasets were combined, but minor and major alleles of all SNPs in GWAS dataset 4 (UC vs. CON2) were flipped to simulate an opposite direction of effect. Then, a meta-analysis was performed to detect shared PSC-UC risk loci with opposite direction of effect in PSC and UC (strategy F).
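The effect of flipping alleles in one dataset can be sketched in a few lines of Python. The sketch assumes a fixed-effect, inverse-variance weighted meta-analysis on the log-OR scale; the text does not state the weighting scheme, so this is an assumption, and the function names and input values are hypothetical. Swapping minor and major alleles is equivalent to negating the log odds ratio.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis of log odds ratios.
    Returns (combined beta, combined SE, two-sided P-value)."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal P-value
    return beta, se, p

def same_effect(beta_psc, se_psc, beta_uc, se_uc):
    """Strategy E style: combine both datasets as they are."""
    return fixed_effect_meta([beta_psc, beta_uc], [se_psc, se_uc])

def opposite_effect(beta_psc, se_psc, beta_uc, se_uc):
    """Strategy F style: flipping alleles in the UC dataset negates its log OR."""
    return fixed_effect_meta([beta_psc, -beta_uc], [se_psc, se_uc])

# Made-up SNP: risk allele in PSC (log OR 0.3), protective in UC (log OR -0.3).
beta_same, _, p_same = same_effect(0.3, 0.1, -0.3, 0.1)
beta_opp, _, p_opp = opposite_effect(0.3, 0.1, -0.3, 0.1)
```

For this made-up SNP the two effects cancel in the same-effect analysis, whereas the opposite-effect analysis accumulates evidence from both datasets.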

We only followed up SNPs outside of the major histocompatibility complex (MHC) (defined here as position 25-34 million base pairs on chromosome 6p21). For detailed information on how the six different analysis strategies were performed, which criteria were set for selecting SNPs for replication and number of SNPs selected for further replication from each strategy, see Table 2 and Fig. 1 (middle column) as well as the Supporting Methods in the Supporting Materials.
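The MHC exclusion amounts to a simple positional filter. The Python sketch below assumes SNP records with string chromosome labels and base-pair positions, and treats the 25-34 Mb boundaries as inclusive; the record layout, the function names, and the example positions are hypothetical.

```python
MHC_CHROM = "6"
MHC_START_BP = 25_000_000  # 25 Mb, as defined in the text
MHC_END_BP = 34_000_000    # 34 Mb, as defined in the text

def in_mhc(chrom, pos_bp):
    """True if the position lies in the MHC window (inclusive bounds assumed)."""
    return str(chrom) == MHC_CHROM and MHC_START_BP <= pos_bp <= MHC_END_BP

def drop_mhc(snps):
    """Keep only SNP records located outside the MHC window."""
    return [s for s in snps if not in_mhc(s["chrom"], s["pos"])]

# Illustrative records; positions are placeholders, not taken from the study.
example_snps = [
    {"id": "snp_chr2", "chrom": "2", "pos": 241_250_000},
    {"id": "snp_mhc", "chrom": "6", "pos": 30_000_000},
    {"id": "snp_chr6_outside", "chrom": "6", "pos": 40_000_000},
]
kept = drop_mhc(example_snps)
```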

Altogether, 89 SNPs (83 unique SNPs, because six overlapping SNPs were put forward from different strategies) were selected for replication based on the six analytical strategies described above.

Replication Genotyping and QC

Replication genotyping was performed using the Sequenom iPlex system or TaqMan technology (Applied Biosystems, Foster City, CA). We successfully genotyped all SNPs selected for replication in PSC and UC, except for rs2427828, for which genotyping failed in the PSC replication panel. (For details on QC, see Supporting Methods in the Supporting Materials.)

The total replication panel consisted of 1,012 PSC samples, 1,771 UC samples, and 6,413 healthy controls from Germany, Scandinavia, Belgium, and The Netherlands (Fig. 1, right column). In addition, we utilized imputed genotype data from 2,673 UC cases and 5,246 controls from the UK from a previously published GWAS.[15] In total, this included genotype data of 1,012 PSC cases, 4,444 UC cases, and 11,659 healthy controls for replication analysis (Table 1B,C).

For the 51 SNPs from strategies A and B (SNPs PSC = 30, UC = 21) and the 29 SNPs from strategies C and D (SNPs PSC = 25, UC = 4), the entire replication panel was split into a PSC replication panel (panel B, 1,012 PSC samples and 2,694 controls; Table 1B) and a UC replication panel (panel C, 4,444 UC samples and 8,965 controls; Table 1C), where the PSC and UC SNPs were genotyped exclusively in the PSC and UC replication panels, respectively (Fig. 1). For replication genotyping and analysis of the four SNPs from strategy E/same-effect meta-analysis and five SNPs from strategy F/opposite-effect meta-analysis, the total combined replication dataset was used.

P-values for each replication panel (P_Repl) of PSC (Table 1B, panel B) and UC (Table 1C, panel C), respectively, were calculated using the chi-square test (df = 1) option in PLINK. P-values for combined replication and discovery panels (P_{GWAS+Repl}) were calculated for PSC and UC separately using the meta-analysis option in PLINK. We used the commonly accepted threshold of 5 × 10⁻⁸ for joint P-values to define statistical significance.
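As an illustration of a 1-df chi-square test on allele counts, the Python sketch below computes the Pearson statistic for a 2 × 2 table (cases/controls by allele) without continuity correction and compares the P-value with the genome-wide threshold. It is a generic textbook calculation, not PLINK's code, and the allele counts are hypothetical values chosen only for the example.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # threshold stated in the text

def allelic_chi_square(case_a1, case_a2, ctrl_a1, ctrl_a2):
    """Pearson chi-square (1 df, no continuity correction) on allele counts.
    Returns (chi-square statistic, P-value)."""
    n = case_a1 + case_a2 + ctrl_a1 + ctrl_a2
    numerator = n * (case_a1 * ctrl_a2 - case_a2 * ctrl_a1) ** 2
    denominator = ((case_a1 + case_a2) * (ctrl_a1 + ctrl_a2)
                   * (case_a1 + ctrl_a1) * (case_a2 + ctrl_a2))
    chi2 = numerator / denominator
    # Survival function of the 1-df chi-square distribution.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Hypothetical counts: 1,012 cases (2,024 alleles), 2,694 controls (5,388 alleles).
chi2, p = allelic_chi_square(466, 1558, 970, 4418)
is_genome_wide = p < GENOME_WIDE_ALPHA
```

With these made-up counts the association is clearly nominally significant in a single panel yet does not reach 5 × 10⁻⁸ on its own, which is why joint P-values across discovery and replication are used to declare significance.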

GRAIL (Gene Relationships Across Implicated Loci) Pathway Analysis

We used an established statistical genomics method, Gene Relationships Across Implicated Loci (GRAIL),[16] which applies statistical text mining to PubMed abstracts to quantify the degree of relatedness between genes within shared PSC and UC susceptibility loci (Supporting Methods in the Supporting Materials).


Results

Quality Control

For the standard GWAS analyses of PSC patients vs. all healthy controls (PSC vs. CON1+CON2; strategy A) and UC patients vs. all healthy controls (UC vs. CON1+CON2; strategy B), 1,279,891 and 1,276,696 quality-controlled autosomal imputed SNP markers, respectively, were available for association analysis (Table 2 and Supporting Table 1). Genetic heterogeneity was found to be low, with estimated genomic inflation factors (λ) of 1.051 and 1.098, respectively (Supporting Table 1).[17] Quantile-quantile (Q-Q) plots of GWAS test statistics showed an excess of significant associations in the tail of the distribution (Supporting Fig. 1, first column). When the extended MHC region was excluded (Supporting Fig. 1, second column), this excess was strongly reduced. Manhattan plots for strategies A and B are shown in Supporting Fig. 2.
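The genomic inflation factor described in the Table 2 footnote (median test statistic divided by 0.455, the median of the 1-df chi-square distribution) can be sketched in Python as follows. The conversion from P-values to chi-square statistics and the function names are assumptions made for this illustration; the study's own calculation may have started from the test statistics directly.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.455  # median of the 1-df chi-square distribution (Table 2 footnote)

def p_to_chi2(p):
    """Convert a two-sided P-value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile; the sign is irrelevant once squared
    return z * z

def genomic_inflation(p_values):
    """Lambda = median observed chi-square / expected median under the null."""
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN

# Evenly spaced P-values mimic a null distribution; halving them mimics inflation.
null_ps = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(null_ps)
lambda_inflated = genomic_inflation([p / 2.0 for p in null_ps])
```

A value near 1, as for the evenly spaced P-values here, indicates little inflation; values such as the reported 1.051 and 1.098 indicate a modest excess over the null expectation.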

For strategies C-F, 1,141,335 quality-controlled autosomal imputed SNP markers overlapped for the panels analyzed and were available for comparative association analysis of the German PSC discovery panel and control panel 1 (PSC vs. CON1) and the German UC discovery panel and control panel 2 (UC vs. CON2) (Table 2 and Supporting Table 1). We observed low genomic inflation for PSC vs. CON1 (λ = 1.013) and UC vs. CON2 (λ = 1.058) (Table 2 and Supporting Table 1), with significant associations mainly resulting from associations from the extended MHC region (for Q-Q plots, see Supporting Fig. 3). Manhattan plots for strategies C-F are shown in Supporting Fig. 4.

Association Results

For strategy A, among the 30 SNPs extracted from the GWAS analysis of PSC vs. CON1+CON2 and replicated in the PSC replication panel (panel B), three achieved genome-wide significance (P < 5 × 10⁻⁸) in the combined PSC analysis.

The most significant PSC associations were observed at 2q37.3 for SNP rs4676410 [P_GWAS = 4.0 × 10⁻⁵; P_Repl = 7.0 × 10⁻⁶; P_{GWAS+Repl} = 2.4 × 10⁻⁹; combined OR (95% CI) = 1.38 (1.24-1.53)] and SNP rs3749171 [P_GWAS = 3.8 × 10⁻⁵; P_Repl = 1.0 × 10⁻⁵; P_{GWAS+Repl} = 3.0 × 10⁻⁹; combined OR (95% CI) = 1.39 (1.24-1.55)] (Table 3 and Supporting Table 2). These SNPs are highly correlated (r² = 0.83; D′ = 0.97) according to the HapMap phase 3 CEU data. rs4676410 is located in an intronic region of the G-protein-coupled receptor 35 gene (GPR35), whereas rs3749171 is a coding SNP located in the 3′ exon of GPR35. Structural modeling showed that the residue affected by this threonine to methionine missense variant is found in the third transmembrane helix of GPR35 (Supporting Fig. 5).
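The two LD measures quoted for the GPR35 SNPs are linked through the allele frequencies, which the following Python sketch illustrates. It assumes positive D (the two minor alleles tend to sit on the same haplotype) and, because the HapMap CEU frequencies are not given in the text, substitutes the control minor-allele frequencies from Table 3 (0.20 and 0.18); the function name is hypothetical and the result is only an approximate consistency check.

```python
def r2_from_dprime(p_a, p_b, d_prime):
    """r^2 implied by D' and the minor-allele frequencies of two SNPs.
    Assumes D > 0, for which D_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)."""
    d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    d = d_prime * d_max
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))

# rs4676410 and rs3749171: control allele frequencies from Table 3, D' from the text.
r2_gpr35 = r2_from_dprime(0.20, 0.18, 0.97)
```

Under these assumptions the implied r² comes out close to the reported 0.83, showing how a high D′ combined with similar allele frequencies translates into a high r².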

Table 3. Newly Identified PSC Risk Loci

Panels (cases/controls): PSC GWAS (389/2,836); PSC Replication (1,012/2,694); PSC GWAS and Replication (1,401/5,530).

| dbSNP ID | Chr | Left-right (Mb) | A1 | A2 | AF A1, cases | AF A1, controls | Gene | Function | GWAS P | GWAS OR (95% CI) | Replication P | Replication OR (95% CI) | GWAS and Replication P | GWAS and Replication OR (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs4676410* | 2q37.3 | 241.20-241.31 | A | G | 0.25 | 0.20 | GPR35 | Intronic | 4.03 × 10⁻⁵ | 1.61 (1.28-2.02) | 6.99 × 10⁻⁶ | 1.31 (1.17-1.48) | 2.43 × 10⁻⁹ | 1.38 (1.24-1.53) |
| rs3749171* | 2q37.3 | 241.20-241.31 | T | C | 0.23 | 0.18 | GPR35 | Missense | 3.84 × 10⁻⁵ | 1.61 (1.28-2.02) | 1.04 × 10⁻⁵ | 1.32 (1.17-1.50) | 2.99 × 10⁻⁹ | 1.39 (1.24-1.55) |
| rs1452787* | 18q21.2 | 51.19-51.69 | G | A | 0.23 | 0.28 | TCF4 | Intronic | 5.86 × 10⁻⁶ | 0.64 (0.53-0.80) | 1.52 × 10⁻⁴ | 0.79 (0.70-0.89) | 2.61 × 10⁻⁸ | 0.75 (0.68-0.83) |
| rs1031831 | 18q21.2 | 51.19-51.69 | A | C | 0.28 | 0.34 | TCF4 | Intergenic | 4.79 × 10⁻⁶ | 0.67 (0.56-0.79) | 2.40 × 10⁻³ | 0.84 (0.75-0.94) | 3.17 × 10⁻⁷ | 0.78 (0.71-0.86) |

dbSNP ID, index SNP marker (National Center for Biotechnology Information's dbSNP build v130); Chr, chromosome of marker; Left-right, association boundaries for each index SNP (see Patients and Methods); A1, minor allele; A2, major allele; AF, allele frequency estimated from allele dosages in the GWAS and discrete allele frequencies from Replication; Gene, candidate gene in the region; Function, predicted function relative to candidate gene; P/OR, P-value and corresponding OR and 95% CI with respect to minor allele for PSC vs. CON1+CON2 GWAS (see also Table 2), PSC replication analysis (Table 1B, panel B), and combined analysis of PSC vs. CON1+CON2 GWAS and PSC replication. For each panel, numbers of cases/controls are displayed in parentheses. Markers meeting genome-wide significance (P < 5 × 10⁻⁸) are marked with an asterisk. For ORs and case/control allele frequencies of associated SNPs in the individual PSC discovery and replication panels, see Supporting Table 2. For full results of replication analysis, see Supporting Tables 3 and 4.

The second significantly associated PSC locus was found at 18q21.2 for SNP rs1452787 [P_GWAS = 5.9 × 10⁻⁶; P_Repl = 1.5 × 10⁻⁴; P_{GWAS+Repl} = 2.6 × 10⁻⁸; combined OR (95% CI) = 0.75 (0.68-0.83)]. In addition, another SNP in the same region, rs1031831, showed strong suggestive evidence of association [P_GWAS = 4.8 × 10⁻⁶; P_Repl = 0.0024; P_{GWAS+Repl} = 3.2 × 10⁻⁷; combined OR (95% CI) = 0.78 (0.71-0.86)] (Table 3 and Supporting Table 2). These SNPs are moderately correlated (r² = 0.55; D′ = 0.84). rs1452787 is located in intron 3 of the gene encoding transcription factor 4 (TCF4), whereas rs1031831 is located shortly upstream of TCF4.
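As a rough plausibility check of the combined estimates in Table 3, the Python sketch below recombines the published GWAS and replication odds ratios for rs3749171. It assumes a fixed-effect, inverse-variance weighted combination and recovers each standard error from the rounded 95% CI; both are assumptions, the function names are hypothetical, and because the inputs are rounded the result can only approximate the published values.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error of the log OR recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)

def combine(estimates, z=1.96):
    """Fixed-effect inverse-variance combination of (OR, lower, upper) tuples.
    Returns (combined OR, CI lower, CI upper, two-sided P-value)."""
    num = den = 0.0
    for odds_ratio, lower, upper in estimates:
        weight = 1.0 / se_from_ci(lower, upper, z) ** 2
        num += weight * math.log(odds_ratio)
        den += weight
    beta, se = num / den, math.sqrt(1.0 / den)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se), p

# rs3749171: GWAS and replication OR (95% CI) as printed in Table 3.
combined_or, ci_low, ci_high, p_combined = combine(
    [(1.61, 1.28, 2.02), (1.32, 1.17, 1.50)]
)
```

The recombined odds ratio falls within rounding distance of the published 1.39 (1.24-1.55), and the P-value is below the genome-wide threshold, consistent with the reported result.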

From strategy A, suggestive evidence for association with PSC (P_Repl < 0.05) was found at five additional loci (see Supporting Table 3A). For strategy B, none of the 21 SNPs extracted from the GWAS analysis of UC vs. CON1+CON2 met genome-wide significance in the combined analysis of UC GWAS and UC replication data, but one SNP replicated with nominal significance (P_Repl < 0.05) in the UC replication panel (panel C) (Supporting Table 3B). For strategy C, among the 25 SNPs tested from the PSC-exclusive PD analysis (Supporting Table 4A), four SNPs achieved nominal significance in the PSC replication panel, but none of these achieved a genome-wide significance level in the combined analysis of the discovery and replication panels. The strongest association was observed for rs1452787 at 18q21.2 (P_{GWAS+Repl} = 7.12 × 10⁻⁸), which represents the same SNP at 18q21.2 (TCF4) that achieved genome-wide significance in strategy A. For strategy D, none of the four SNPs tested from the UC-exclusive PD analysis (Supporting Table 4B) replicated with nominal significance. For strategies E and F, of the nine SNPs selected from the same- and opposite-effect meta-analyses (Supporting Table 4C,D), one achieved nominal significance in the PSC replication panel.

The full association results of the 51 SNPs taken on to replication in strategies A and B are shown in Supporting Table 3A,B, whereas the association results of the 38 SNPs taken on to replication in strategies C-F are shown in Supporting Table 4A-D. In addition to the three GPR35 and TCF4 SNPs, 11 SNPs replicated at nominal significance.

GRAIL Pathway Analysis

The GRAIL analysis (Supporting Fig. 6) showed that the majority of loci shared between PSC and UC involve genomic regions that play a general role in regulating the innate and adaptive immune system (IL2/IL21, CARD9, and REL) and are also shared with multiple other diseases of chronic inflammatory character.[18]


Discussion

In this study, we performed an extensive integrated analysis of SNP data from PSC and UC GWAS and subsequent independent replication, comprising a total panel of 1,404 PSC patients, 5,431 UC patients, and 14,636 controls. We identified two novel genome-wide significant susceptibility loci for PSC at 2q37.3 and at 18q21.2. Associations at 2q37.3 are detectable in both PSC and UC, whereas 18q21.2 seems to represent a PSC risk locus not associated with UC.

The lead SNPs (rs3749171 and rs4676410) at 2q37.3 reside within the GPR35 gene, and the association peak and linkage disequilibrium (LD) pattern highlight this gene as the most plausible candidate gene at this locus (Fig. 2A). In UC, these SNPs showed suggestive evidence of association in the UC GWAS panel (P_GWAS = 7.3 × 10⁻³ for rs3749171; P_GWAS = 7.1 × 10⁻³ for rs4676410). Furthermore, in the UC GWAS meta-analysis,[5] rs3749171 showed strong suggestive evidence for association [P_meta = 6.9 × 10⁻⁵; OR (95% CI) = 1.11 (1.05-1.18)], whereas another SNP, rs4676406 (inline image = 0.13), shortly downstream of GPR35, was associated [P_meta = 8.3 × 10⁻¹¹; OR (95% CI) = 1.14 (1.09-1.18)].[5] At 18q21.2, the second locus meeting the genome-wide significance threshold in this analysis, the lead SNP (rs1452787) is located in an intron of the TCF4 gene (Fig. 2B). The association signal is broad, but based on moderate LD and lack of other functionally interesting genes in the region, TCF4 is the only reasonable candidate gene. In the UC GWAS panel, rs1452787 did not show any association with UC (P_{UC vs. CON1+CON2} = 0.35). No association for this SNP was observed in the UC GWAS meta-analysis[5] (P_meta = 0.46). Together with the hit for rs1452787 in the PSC-exclusive analysis (PD), this may suggest that 18q21.2 represents a PSC risk locus not associated with UC.


Figure 2. Regional association plots of the new PSC risk loci. (A) PSC risk locus at 2q37.3 (GPR35). (B) PSC risk locus at 18q21.2 (TCF4). Shown are the −log₁₀ P-values from the analysis of PSC vs. CON1+CON2 and UC vs. CON1+CON2 (see also Table 2 and Supporting Table 1) with regard to the physical location of markers. Imputation of SNP genotypes is based on the HapMap3 reference (see Patients and Methods); diamonds represent genotyped SNP markers, and circles represent imputed SNP markers. Blue-filled circle: lead SNP of PSC vs. CON1+CON2; other filled diamonds/circles: analyzed SNPs of PSC vs. CON1+CON2, where the fill color corresponds to the strength of linkage disequilibrium (r²) with the lead SNP (for color coding, see legend in the upper right corner of each plot); green triangles: analyzed SNPs of UC vs. CON1+CON2; line: recombination intensity (cM/Mb). Positions and gene annotations are according to National Center for Biotechnology Information's build 36 (hg18).


The GPR35 rs3749171 SNP is non-synonymous, leading to a threonine to methionine shift at Thr3.44. This residue is not conserved throughout mammals, and the mutation is therefore unlikely to completely abolish receptor function, but it may still alter the efficiency of signaling through the GPR35 receptor. GPR35 belongs to the G-protein-coupled receptor family, whose members are membrane proteins mediating a wide range of physiological processes.[19] The exact functions of GPR35 are not known, but the receptor is predominantly expressed by intestinal crypt enterocytes, as well as by several subpopulations of immune cells.[20, 21] GPR35 has been shown to function as a receptor for kynurenic acid (KYNA), an intermediate in the tryptophan metabolic pathway (kynurenine pathway). KYNA concentrations are high in bile and intestinal contents and increase during inflammation.[20, 22] Interestingly, elevated plasma levels of KYNA have been reported in patients with IBD.[23] Nothing is known about the expression and function of GPR35 in PSC, but based on the available data, it may be speculated to influence the regulation of inflammation in both the gastrointestinal and biliary tract.

TCF4 encodes a transcription factor involved in cell differentiation and growth. Analysis of Tcf4−/− knockout mice has revealed that TCF4 deficiency leads to a partial block in early B- and T-cell development[24, 25] and also results in blocked development of plasmacytoid dendritic cells (PDCs) and impaired type 1 interferon secretion from PDCs upon stimulation with “virus resembling” unmethylated DNA.[26] TCF4 functions thus fit with recent genetic findings in PSC implicating several genes involved in T- and B-cell biology and, in particular, T-cell development.[18] The involvement of PDCs could also be speculated to represent a link between possible triggering viral or bacterial agents and PSC. Of interest is that the TCF4 gene is located in a chromosomal region affected by loss of heterozygosity in 70% of colorectal cancers,[27] to which PSC patients with IBD are particularly prone.

Several statistical strategies were applied to detect new shared and nonshared risk alleles in PSC and UC in the present study. The strongest associations achieving genome-wide significance were all detected through the standard PSC-specific GWAS and replication analysis. The identification of two novel genome-wide significant susceptibility loci in PSC by utilizing a previous PSC GWAS case dataset highlights the potential for novel findings by performing systematic reanalysis of genome-wide association data in PSC in light of associated clinical phenotypes. The most likely reasons these two associations were missed in the primary study are a different reference for imputation, different prioritizing strategies for selection of SNPs for replication, and other technical aspects. In the previous PSC GWAS,[6] the GPR35 SNP (rs3749171) was not successfully imputed. The intronic GPR35 SNP (rs4676410) showed suggestive evidence of association in the GWAS (P_GWAS = 4.6 × 10⁻⁵), but was not taken forward for replication. The TCF4 SNP genotyped in the replication phase of the previous GWAS (rs12458015) did not reach statistical significance (P_Repl = 0.17) and is moderately correlated with rs1452787 (r² = 0.64).[6]

Because of the low prevalence (approximately 1 in 10,000),[1] study populations available for genetic studies in PSC are smaller than those for UC. The resulting lack of statistical power in the overall PSC population was the motivating factor for applying refined analysis approaches to existing datasets. The PD CI analysis represents a novel statistical approach intending to identify disease-specific associations. It reduces the challenges introduced by using P-value thresholds when comparing associations in different conditions, because the P-values are influenced by different panel sizes, and the significance thresholds will also vary between SNPs with different allele frequencies. Principal component analysis had previously shown that German and Scandinavian PSC populations differ,[6] and in the PD CI analysis, it was not possible to incorporate this information. We thus moved the Scandinavian PSC cases and Scandinavian controls from the discovery panel to the PSC replication analysis. In addition to the GPR35 and TCF4 outcome of the other analysis strategies, the PD CI analysis yielded several suggestive PSC-specific risk variants, which should be investigated further in larger study populations. Not surprisingly, given the size of the UC panels of the present study, compared with previous analyses,[5] no novel susceptibility loci were detected for UC. Notwithstanding issues related to statistical power, the overall yield of the study of two novel susceptibility loci for PSC at genome-wide significance levels underscores the opportunity of refined analysis of the growing number of available GWAS datasets in closely related conditions.[28, 29]

It is now generally recognized that immune-related diseases share a significant number of genetic susceptibility loci (pleiotropy).[30] This is of interest because it may indicate that therapeutic options are transferable between diseases. The currently known genetic overlap between PSC and UC implicates six loci (6p21, 3p21, 2q35, IL2/IL21, CARD9, and REL), in addition to the shared association at the GPR35 locus identified in the present study. The 6p21, 3p21, IL2/IL21, CARD9, and REL loci are also implicated in several other immune-related diseases. In contrast, the current association at GPR35 and the previously reported associations at 2q35 are, so far, confined to PSC and UC and may thus point to pathogenic mechanisms of specific relevance to these related disorders. Overall, the shared genetic susceptibility between PSC and UC seems to reflect loci involved in the regulation of the innate and adaptive immune system, in addition to loci more specifically involved in regulating the immune defense of the intestinal and biliary epithelium.

In summary, we have identified two novel genetic risk loci for PSC, of which one overlaps with UC. Associations at GPR35 and TCF4 may represent previously unexplored aspects of PSC pathogenesis.


Acknowledgments

The authors thank all individuals with PSC or UC, their families and physicians, and all healthy controls for their participation. The authors acknowledge the cooperation of the German Crohn and Colitis Foundation (Deutsche Morbus Crohn und Colitis Vereinigung e.V.), the German Ministry of Education and Research (BMBF) competence network “IBD,” and the contributing gastroenterologists. The authors thank Drs. Wolfgang Kreisel, Thomas Berg, and Rainer Günther for contributing German PSC patients. Benedicte A. Lie and the Norwegian Bone Marrow Donor Registry at Oslo University Hospital, Rikshospitalet, Oslo, are acknowledged for contributing the healthy Norwegian control population. The authors thank the Wellcome Trust Case-Control Consortium for the access to the UC case/control data. The authors acknowledge use of DNA from the 1958 British Birth Cohort collection (courtesy of R. Jones, S. Ring, W. McArdle, and M. Pembrey). The project received infrastructure support through the Research Computing Services at the University of Oslo and the DFG Cluster of Excellence “Inflammation at Interfaces” (


References

  • 1
    Karlsen TH, Schrumpf E, Boberg KM. Update on primary sclerosing cholangitis. Dig Liver Dis 2010;42:390-400.
  • 2
    Saarinen S, Olerup O, Broome U. Increased frequency of autoimmune diseases in patients with primary sclerosing cholangitis. Am J Gastroenterol 2000;95:3195-3199.
  • 3
    Bergquist A, Montgomery SM, Bahmanyar S, Olsson R, Danielsson A, Lindgren S, et al. Increased risk of primary sclerosing cholangitis and ulcerative colitis in first-degree relatives of patients with primary sclerosing cholangitis. Clin Gastroenterol Hepatol 2008;6:939-943.
  • 4
    Orholm M, Munkholm P, Langholz E, Nielsen OH, Sorensen TI, Binder V. Familial occurrence of inflammatory bowel disease. N Engl J Med 1991;324:84-88.
  • 5
    Anderson CA, Boucher G, Lees CW, Franke A, D'Amato M, Taylor KD, et al. Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nat Genet 2011;43:246-252.
  • 6
    Melum E, Franke A, Schramm C, Weismuller TJ, Gotthardt DN, Offner FA, et al. Genome-wide association analysis in primary sclerosing cholangitis identifies two non-HLA susceptibility loci. Nat Genet 2011;43:17-19.
  • 7
    Janse M, Lamberts LE, Franke L, Raychaudhuri S, Ellinghaus E, Muri Boberg K, et al. Three ulcerative colitis susceptibility loci are associated with primary sclerosing cholangitis and indicate a role for IL2, REL, and CARD9. Hepatology 2011;53:1977-1985.
  • 8
    Karlsen TH, Franke A, Melum E, Kaser A, Hov JR, Balschun T, et al. Genome-wide association analysis in primary sclerosing cholangitis. Gastroenterology 2010;138:1102-1111.
  • 9
    Zhernakova A, Stahl EA, Trynka G, Raychaudhuri S, Festen EA, Franke L, et al. Meta-analysis of genome-wide association studies in celiac disease and rheumatoid arthritis identifies fourteen non-HLA shared loci. PLoS Genet 2011;7:e1002004.
  • 10
    Lennard-Jones JE. Classification of inflammatory bowel disease. Scand J Gastroenterol Suppl 1989;170:2-6; discussion, 16-19.
  • 11
    Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007;81:559-575.
  • 12
    Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006;38:904-909.
  • 13
    Franke A, Balschun T, Sina C, Ellinghaus D, Hasler R, Mayr G, et al. Genome-wide association study for ulcerative colitis identifies risk loci at 7q22 and 22q13 (IL17REL). Nat Genet 2010;42:292-294.
  • 14
    Browning BL, Yu Z. Simultaneous genotype calling and haplotype phasing improves genotype accuracy and reduces false-positive associations for genome-wide association studies. Am J Hum Genet 2009;85:847-861.
  • 15
    Barrett JC, Lee JC, Lees CW, Prescott NJ, Anderson CA, Phillips A, et al. Genome-wide association study of ulcerative colitis identifies three new susceptibility loci, including the HNF4A region. Nat Genet 2009;41:1330-1334.
  • 16
    Raychaudhuri S, Plenge RM, Rossin EJ, Ng AC, Purcell SM, Sklar P, et al. Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet 2009;5:e1000534.
  • 17
    Devlin B, Roeder K. Genomic control for association studies. Biometrics 1999;55:997-1004.
  • 18
    Karlsen TH, Kaser A. Deciphering the genetic predisposition to primary sclerosing cholangitis. Semin Liver Dis 2011;31:188-207.
  • 19
    O'Dowd BF, Nguyen T, Marchese A, Cheng R, Lynch KR, Heng HH, et al. Discovery of three novel G-protein-coupled receptor genes. Genomics 1998;47:310-313.
  • 20
    Wang J, Simonavicius N, Wu X, Swaminath G, Reagan J, Tian H, Ling L. Kynurenic acid as a ligand for orphan G protein-coupled receptor GPR35. J Biol Chem 2006;281:22021-22028.
  • 21
    Fallarini S, Magliulo L, Paoletti T, de Lalla C, Lombardi G. Expression of functional GPR35 in human iNKT cells. Biochem Biophys Res Commun 2010;398:420-425.
  • 22
    Paluszkiewicz P, Zgrajka W, Saran T, Schabowski J, Piedra JL, Fedkiv O, et al. High concentration of kynurenic acid in bile and pancreatic juice. Amino Acids 2009;37:637-641.
  • 23
    Forrest CM, Youd P, Kennedy A, Gould SR, Darlington LG, Stone TW. Purine, kynurenine, neopterin, and lipid peroxidation levels in inflammatory bowel disease. J Biomed Sci 2002;9:436-442.
  • 24
    Zhuang Y, Cheng P, Weintraub H. B-lymphocyte development is regulated by the combined dosage of three basic helix-loop-helix genes, E2A, E2-2, and HEB. Mol Cell Biol 1996;16:2898-2905.
  • 25
    Bergqvist I, Eriksson M, Saarikettu J, Eriksson B, Corneliussen B, Grundstrom T, Holmberg D. The basic helix-loop-helix transcription factor E2-2 is involved in T lymphocyte development. Eur J Immunol 2000;30:2857-2863.
  • 26
    Cisse B, Caton ML, Lehner M, Maeda T, Scheu S, Locksley R, et al. Transcription factor E2-2 is an essential and specific regulator of plasmacytoid dendritic cell development. Cell 2008;135:37-48.
  • 27
    Herbst A, Bommer GT, Kriegl L, Jung A, Behrens A, Csanadi E, et al. ITF-2 is disrupted via allelic loss of chromosome 18q21, and ITF-2B expression is lost at the adenoma-carcinoma transition. Gastroenterology 2009;137:639-648, 648.e631-639.
  • 28
    Festen EA, Goyette P, Green T, Boucher G, Beauchamp C, Trynka G, et al. A meta-analysis of genome-wide association scans identifies IL18RAP, PTPN2, TAGAP, and PUS10 as shared risk loci for Crohn's disease and celiac disease. PLoS Genet 2011;7:e1001283.
  • 29
    Ellinghaus D, Ellinghaus E, Nair RP, Stuart PE, Esko T, Metspalu A, et al. Combined analysis of genome-wide association studies for Crohn disease and psoriasis identifies seven shared susceptibility loci. Am J Hum Genet 2012;90:636-647.
  • 30
    Zhernakova A, van Diemen CC, Wijmenga C. Detecting shared pathogenesis from the shared genetics of immune-related diseases. Nat Rev Genet 2009;10:43-55.

Supporting Information

Additional Supporting Information may be found in the online version of this article.

hep25977-sup-0001-suppinfo.doc (3672K): Supporting Information