A novel susceptibility locus in MST1 and gene‐gene interaction network for Crohn's disease in the Chinese population

Abstract The incidence of Crohn's disease is increasing in many Asian countries, but considerable differences in genetic susceptibility have been reported between Western and Asian populations. This study aimed to fine‐map 23 previously reported Crohn's disease genes and identify their interactions in the Chinese population by Illumina‐based targeted capture sequencing. The A>G polymorphism at rs144982232 in MST1 showed the most significant association (P = 1.78 × 10−5; odds ratio = 4.87). JAK2 rs1159782 (T>C) was also strongly associated with Crohn's disease (P = 2.34 × 10−4; odds ratio = 3.72). Gene‐gene interaction analysis revealed significant interactions between MST1 and other susceptibility genes, including NOD2, MUC19 and ATG16L1, in contributing to Crohn's disease risk. The main genetic associations and gene‐gene interactions were verified using the ImmunoChip data set. In conclusion, a novel susceptibility locus in MST1 was identified. Our analysis suggests that MST1 might interact with key susceptibility genes involved in autophagy and bacterial recognition. These findings provide insight into the genetic architecture of Crohn's disease in the Chinese population and may partially explain the disparity of genetic signals in Crohn's disease susceptibility across different ethnic populations by highlighting the contribution of gene‐gene interactions.

Genetic susceptibility, gut microbiota and environmental factors act synergistically in the pathogenesis of Crohn's disease. Although more than 140 susceptibility loci of Crohn's disease in Caucasians have been identified by genome-wide association studies (GWASs) and meta-analyses, [4][5][6][7][8][9] considerable differences in genetic susceptibility to Crohn's disease have been reported between Western and Asian populations. Moreover, the heritability of Crohn's disease in Asian populations has not been fully explained. 5,6 In particular, the well-established Caucasian Crohn's disease susceptibility genes, such as NOD2, ATG16L1 and PTPN, showed a lack of association in Asian populations. 6,8,[10][11][12][13][14][15] Inconsistent results on IL23R and IRGM were also reported. 16,17 Recent genetic studies in Korean and Japanese populations further revealed new Crohn's disease susceptibility loci (eg rs11235604 in ATG16L2 and rs7329174 in ELF1) that were not significantly associated with disease status in Western populations. [18][19][20] These discrepancies may be in part related to heterogeneity in effect size (eg TNFSF15 and ATG16L), differences in risk allele frequency at some of the loci (eg CARD15/NOD2) or altered gene-microbiota and gene-gene interactions across different populations. 21 Collectively, these findings indicate that the genetic architecture underlying Crohn's disease risk differs across ethnicities.
The impact of new loci underlying susceptibility to Crohn's disease cannot be determined until causal variants are identified by fine mapping via directed sequencing. Moreover, it is imperative to determine whether Crohn's disease susceptibility genes identified in Europeans are also associated with disease state in populations of non-European ancestry. 22 To determine whether genes previously reported in Caucasian populations contribute to Crohn's disease in the Chinese population, and to estimate their effect sizes, we performed fine-mapping analysis using next-generation targeted capture sequencing.
Moreover, as interactions among multiple genes could affect the patients' disease phenotype, we aimed to identify interactions among the targeted captured genes to provide insight into the genetics of Crohn's disease.

| ImmunoChip data set
The design and genotyping of the ImmunoChip have been previously described. 21 In brief, the ImmunoChip is an Illumina Infinium microarray [...] replication of all nominally associated SNPs (P < .001) from the index GWAS scans and fine mapping of 186 loci associated at genome-wide significance with at least 1 of the 12 index immune-mediated diseases. The chip also contains around 3000 SNPs added as part of the Wellcome Trust Case Control Consortium 2 (WTCCC2) project replication phase. Genotype data were extracted for 531 Hong Kong Chinese subjects from the ImmunoChip data set. Quality control was performed as described. 21 The cohort includes 235 controls and 531 IBD cases, including 388 patients with Crohn's disease.

| Statistical analysis
The SNPs from targeted sequencing had low to rare minor allele frequencies (MAFs). The sequence kernel association test (SKAT) is an effective method for detecting association between sequencing data and disease phenotypes. 23,24 The method uses a linear mixed model and performs a variance component score test. 24 For epistasis evaluation, a robust W-test was used to evaluate SNP-SNP interactions. 25 The W-test tests for a difference between case and control groups in the genotype distributions formed by a SNP pair. The test statistic follows a chi-squared distribution whose degrees of freedom are estimated from the data by bootstrap. The method is therefore able to correct for bias in distributions due to complicated genetic architecture and to return robust estimates. 25 The SKAT and W-test were conducted using R packages. 23,25 The LocusZoom tool was used to draw regional Manhattan plots of SNPs, providing a detailed view of the P-value distribution within a gene. 26 A SNP or an interaction pair was considered significant if its P-value was smaller than the Bonferroni-corrected threshold at an alpha of 5%. Expression quantitative trait loci (eQTL) analysis was carried out using the Genotype-Tissue Expression database. 27
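The Bonferroni criterion described above can be illustrated with a minimal sketch. It is not the authors' R workflow: the function names are hypothetical, the SNP count of 2046 is the size of the genotype data set before quality control (the number of tests actually corrected for may differ), and testing every possible SNP pair is an assumption made only for illustration.

```python
from math import comb


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value, n_tests, alpha=0.05):
    """True if p_value is below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(n_tests, alpha)


# Illustrative count only (assumption): SNPs in the data set before QC.
n_snps = 2046

# Single-SNP tests: one test per SNP.
single_snp_threshold = bonferroni_threshold(n_snps)

# Pairwise interaction tests: one test per unordered SNP pair
# (assumes every pair is tested, which the text does not state).
n_pairs = comb(n_snps, 2)
pair_threshold = bonferroni_threshold(n_pairs)

print(f"single-SNP threshold: {single_snp_threshold:.3g}")
print(f"pairwise threshold:   {pair_threshold:.3g} over {n_pairs} pairs")
```

Because the number of SNP pairs grows quadratically with the number of SNPs, the pairwise threshold is far stricter than the single-SNP threshold.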

| Power calculation
The power of an association study depends on the sample size, the effect size of a variant and its allele frequency. Assuming findings from a previously validated SNP (rs2241880) in ATG16L1 with an odds ratio of 0.69 and a minor allele frequency of 45%, our study had 86.2% power to detect such a variant with an α-error rate of 5%. Alternatively, our study had at least 80% power to detect a variant with an odds ratio of 1.5 at a MAF of 20%.
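How sample size, effect size and allele frequency combine into power can be sketched with a normal approximation to a two-sided allelic test. This is an illustrative approximation, not the calculation used in the study, so it need not reproduce the 86.2% figure exactly. Assumptions: the quoted MAF is taken as the allele frequency in controls, the odds ratio is per allele, and the sample sizes are the 262 cases and 323 controls reported in the patient characteristics. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def allelic_test_power(n_cases, n_controls, control_freq, odds_ratio, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test on allele counts."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    # Each subject contributes two alleles.
    se = sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    z_effect = abs(p1 - p0) / se
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Probability of rejecting in either tail under the alternative.
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)


# Scenario 1: odds ratio 0.69 at a MAF of 45% (rs2241880 in ATG16L1).
power_atg16l1 = allelic_test_power(262, 323, control_freq=0.45, odds_ratio=0.69)

# Scenario 2: odds ratio 1.5 at a MAF of 20%.
power_generic = allelic_test_power(262, 323, control_freq=0.20, odds_ratio=1.5)

print(f"approximate power, scenario 1: {power_atg16l1:.3f}")
print(f"approximate power, scenario 2: {power_generic:.3f}")
```

Under these assumptions both scenarios give a power above 80%, in line with the statements above, although the exact values depend on the genetic model and software used.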

| Patient characteristics, quality control and SNP calling
A total of 262 patients with Crohn's disease and 323 controls were included. The mean age was 43.6 and 55.9 years in the case and control groups, respectively. About half of the subjects were female (45.9%). Table S1 summarizes the basic characteristics of the cases and controls. DNA samples were collected from all subjects for targeted capture sequencing, generating a genotype data set of 2046 SNPs. Four subjects were excluded because of empty data files. Targeted capture of all DNA samples was completed with an average sequencing depth (on target) of >50 and a coverage of >99.7% (Table S2). When calling genotypes, a missing value was coded if the genotype quality was less than 20. Quality control of the genotype data was then conducted with the following exclusion criteria: (i) a percentage of missing genotypes greater than 5%, (ii) SNPs with no variance, and (iii) SNPs for which the P-value of the test for Hardy-Weinberg equilibrium (HWE) was smaller than 0.05 after Bonferroni correction 28 (Table S3).
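The three exclusion criteria can be sketched as a per-SNP filter. This is a simplified illustration rather than the pipeline used in the study: genotypes are assumed to be coded 0, 1 or 2 (copies of one allele) with `None` for a missing call, the missingness criterion is applied per SNP, and HWE is assessed with a 1-degree-of-freedom chi-squared goodness-of-fit test, which may differ from the test the authors used. All names are hypothetical.

```python
import math
from collections import Counter


def hwe_pvalue(n_hom_a, n_het, n_hom_b):
    """P-value of the 1-df chi-squared goodness-of-fit test for HWE."""
    n = n_hom_a + n_het + n_hom_b
    p = (2 * n_hom_a + n_het) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("monomorphic SNP: HWE test is undefined")
    observed = (n_hom_a, n_het, n_hom_b)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def snp_passes_qc(genotypes, n_snps_tested, max_missing=0.05, alpha=0.05):
    """Apply the missingness, no-variance and HWE filters to one SNP."""
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if missing_rate > max_missing:
        return False  # criterion (i): too many missing genotypes
    counts = Counter(called)
    if len(counts) < 2:
        return False  # criterion (ii): no variance in the genotypes
    p_hwe = hwe_pvalue(counts[0], counts[1], counts[2])
    # criterion (iii): HWE P-value below the Bonferroni-corrected threshold
    return p_hwe >= alpha / n_snps_tested


# Toy genotype vectors (invented for illustration, not study data).
in_equilibrium = [0] * 25 + [1] * 50 + [2] * 25
no_heterozygotes = [0] * 50 + [2] * 50
print(snp_passes_qc(in_equilibrium, n_snps_tested=2046))
print(snp_passes_qc(no_heterozygotes, n_snps_tested=2046))
```

In practice the HWE filter is often evaluated in controls only, since true disease associations can distort genotype proportions in cases; the text does not say which subjects were used.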

| Novel associations of MST1 rs144982232 with Crohn's disease
Sequence kernel association test analysis identified one locus, rs144982232 in MST1 (A>G, P = 1.78 × 10−5, odds ratio = 4.87), which was significantly associated with Crohn's disease after controlling for multiple testing by the Bonferroni method (Tables 1 and S6). The odds of Crohn's disease for individuals with a G allele at this locus were 4.87 times those of individuals with an A allele. The regional association plot of MST1 is shown in [...] (Table S4) (Table S5).
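As a reminder of how an allelic odds ratio is obtained from a 2 × 2 table of allele counts, the following sketch computes the odds ratio with a Woolf (log-based) 95% confidence interval. The counts are invented for illustration and are not the study data, and the function name is hypothetical; the odds ratio of 4.87 reported above is not necessarily derived this way.

```python
from math import exp, log, sqrt


def odds_ratio_with_ci(case_risk, case_ref, control_risk, control_ref, z=1.96):
    """Odds ratio and Woolf confidence interval from allele counts.

    case_risk / case_ref: risk and reference allele counts in cases.
    control_risk / control_ref: the same counts in controls.
    All four counts must be positive.
    """
    odds_ratio = (case_risk * control_ref) / (case_ref * control_risk)
    se_log_or = sqrt(1 / case_risk + 1 / case_ref + 1 / control_risk + 1 / control_ref)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, chosen only to make the arithmetic easy to follow.
or_value, ci_low, ci_high = odds_ratio_with_ci(
    case_risk=20, case_ref=80, control_risk=10, control_ref=90
)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

For rare variants, where some cells of the table are small, the log-based interval becomes wide and exact or regression-based estimates are generally preferred.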

| DISCUSSION
Although the prevalence and incidence of Crohn's disease are higher in Western countries, they continue to rise in Asia, especially in China. It is anticipated that the number of cases of IBD in Asia might overtake that of the Western world by 2025. 29 In this study, we fine-mapped 23 known Crohn's disease susceptibility genes to identify the causal variants and delineate the relative contribution of these variants to Crohn's disease in the Chinese population. Identification of causal variants is key to understanding the molecular mechanisms by which disease susceptibility genes contribute to pathogenesis, as well as to formulating novel therapeutic strategies. A major advantage of using targeted capture sequencing for fine mapping is the directed focus on genes of interest. Therefore, unlike GWAS, our study was not restricted by the conventional genome-wide significance threshold, because fewer tests were performed.
In the targeted captured regions, the synonymous SNP (ie rs144982232) in MST1 was most significantly associated with [...] 35 The mechanism of this potential causal variant for Crohn's disease was controversial. 36,37 One study showed that the R689C polymorphism had no impact on the ability of MST1 to bind to or signal through RON, whereas carriers of the 689C polymorphism had lower concentrations of MST1 in their serum, which could possibly increase Crohn's disease risk. 36 However, another study showed that the affinity to RON of MST1 with the 689C polymorphism was approximately 10-fold lower than that of wild-type MST1 and that the thermal stability of the mutant MST1 was slightly lower than that of wild-type MST1. 37

| SUPPORTING INFORMATION
Additional Supporting Information may be found online in the supporting information tab for this article.