Family-based analysis identified CD2 as a susceptibility gene for primary open angle glaucoma in Chinese Han population

Primary open angle glaucoma (POAG) is characterized by optic disc cupping and irreversible loss of retinal ganglion cells. Few genes have been detected that influence POAG susceptibility and little is known about its genetic architecture. In this study, we employed exome sequencing on three members from a high frequency POAG family to identify the risk factors of POAG in Chinese population. Text-mining method was applied to identify genes associated with glaucoma in literature, and protein–protein interaction networks were constructed. Furthermore, reverse transcription PCR and Western blot were performed to confirm the differential gene expression. Six genes, baculoviral inhibitors of apoptosis protein repeat containing 6 (BIRC6), CD2, luteinizing hormone/choriogonadotropin receptor (LHCGR), polycystic kidney and hepatic disease gene 1 (PKHD1), phenylalanine hydroxylase (PAH) and fucosyltransferase 7 (FUT7), which might be associated with POAG, were identified. Both the mRNA expression levels and protein expression levels of HSP27 were increased in astrocytes from POAG patients compared with those from normal control, suggesting that mutation in CD2 might pose a risk for POAG in Chinese population. In conclusion, novel rare variants detected by exome sequencing may hold the key to unravelling the remaining contribution of genetics to complex diseases such as POAG.


Introduction
Primary open angle glaucoma, characterized by optic disc cupping and irreversible loss of retinal ganglion cells, is the most common form of glaucoma, affecting 1-2% of the population over age 40 [1]. Evidence suggest that black race, older age, severe myopia, family history, elevated intraocular pressure (IOP) and low diastolic perfusion pressure are risk factors for POAG [2][3][4]. In recent years, some researchers suggest that diabetes mellitus, helicobacter pylori infection, elevated systolic blood pressure and migraine may also be associated with the risk of POAG [5,6]. Although POAG is thought to be a genetically heterogeneous disorder that results from the interaction of multiple genes and environmental influences, the pathogenic mechanism of POAG is unknown.
Familial aggregation and heritable tendencies of POAG have long been recognized for this complex and heterogeneous disorder [4,7,8]. At least 20 genetic loci have been found to be associated with POAG. Among them, 14 chromosomal loci were designated GLC1A to GLC1N [9][10][11][12][13]. In addition, other genetic loci for POAG were continued to be identified in different populations in these years, such as 15q22-q24 in Chinese population [14], 2q33-q34 and 10p12-p13 in African descent [15]. However, mutations in these regions are responsible for a small fraction of POAG cases. For example, GLC1A (myocilin) is responsible for about 36% of juvenile-onset and 2-4% of adult-onset POAG cases [16,17]. Mutations in optineurin (OPTN; arguably, the second POAG gene) were initially found in 16.7% of families with hereditary and adult-onset POAG [18]. Additional loci or genes involved in the development of POAG are still needed to be identified.
Exome sequencing is a technically feasible and cost-effective methods, which has been used to identify new causal mutations and genes for unresolved rare disorders in recent years [19][20][21]. In this study, we used exome sequencing to search for the risk factors for POAG in Chinese population. By combining with bioinformatics methods, six candidate genes for POAG were identified. Furthermore, experimental verifications were performed for one of these genes.

Materials and methods Families
The clinical samples in this study are extracted from a Li family existing in our hospital. This family is collected from ChongQing province in China and is characterized by an unusual high frequency of POAG (Fig. 1). Detailed phenotypic information is collected at the time of enrolment and put into a database in a de-identified manner for the facilitation of genetic association studies. In this family, 16 persons suffered from POAG, including six females and 10 males. The morbidity is 27.5%. From the database information, we selected members of 303, 305 and 607 for the exome sequencing phase of the study. Member 303 (female) has no blood relationship with member 305 (male) and member 607 (male). Member 305 is the ancestor of member 607.
All participants provided written informed consent. The project was approved by the medical ethics committee of Daping hospital of 3rd military medical university and all procedures were conducted according to the tenets of the Declaration of Helsinki.
Exome sequencing, quality control, alignment and variant calling Genomic DNA was extracted from whole blood using a DNA extraction kit (Qiagen, Valencia, CA, USA). Samples were exome sequenced with paired-end mapping, which is a large-scale genome-sequencing method Fig. 1 The glaucoma genetic map for Li family. '/' indicates that persons had passed away; '○' represents female; '□' represents male. The sequence marked in red represents the susceptibility genes inherited from his (her) ancestor. The persons marked in black represent glaucoma sufferer, and persons without black marker do not suffer from glaucoma.  to identify structural variants (SVs)~3 kb or larger that combines the rescue and capture of paired ends of 3-kb fragments, massive 454 sequencing and a computational approach to map DNA reads onto a reference genome [22]. Sequencing was performed according to standard protocols. The targeted exonic regions of all sequenced samples were sequenced to an average coverage of 64.39 AE 13, which translated to at least 59 coverage of~86% of the Human Genome Organizationdefined protein-coding regions in each participant. The raw sequence data were preformed quality control checks with fastQC software (http:// www.bioinformatics.babraham.ac.uk/projects/download.html) to ensure that there are no hidden problems, which might be more difficult to detect at a later stage. Paired-end reads were aligned to the human reference genome (National Center for Biotechnology Information Build 19) with Burrows-Wheeler Aligner software [23] with a threshold of q = 20. Single-nucleotide polymorphism (SNP) and insertion and deletion (INDEL) calling were performed with GATK unified genotyper program. The SNPs or INDELs with minimum read depth <10 and maximum read depth >20,000 were filtered out. Besides, the quality score >20 for SNP calling and quality score >50 for INDELs calling were selected as cut-off criteria. Annovar [24] was used for annotating variants identified from the sequence data. This software provides each variant with a genomic context (non-synonymous or splice-site coding, gene name, transcript, associated gene ontology [GO] term, etc.).

Identification of genes associated with glaucoma using texting mining
The text-mining technique has become a critical technique for identify genes related to diseases. At the same time, the PubMed database (http:// www.ncbi.nlm.nih.gov/pubmed), which comprises more than 21 million citations for biomedical literature, offers an enriched source for us to explore the genes across human disease [25]. To identify the genes associated with glaucoma, we first mined from the PubMed database using perl program with keywords 'gene symbol' and 'glaucoma' [26]. Then, the articles associated with glaucoma were screened manually.

Protein-protein interaction (PPI) network construction
The human protein reference database (HPRD) is a database of curated proteomic information pertaining to human proteins [27]. The biological general repository for interaction data sets (BioGRID) is a curated biological database of physical and genetic interactions available at http:// www.thebiogrid.org [28]. Total 39,240 pairs of PPIs were collected from HRPD and 379,426 pairs of PPIs were collected from BioGRID. By combining the two databases, total 326,119 unique PPI pairs were collected.
The putative genes were mapped to the PPI data collected from HPRD and BioGRID databases. The relationships among putative genes and their neighbour genes were reserved and visualized using Cytoscape [29].

Identification of differentially expressed genes (DEGs) in glaucoma
The gene expression profile of GSE2378 was downloaded from gene expression omnibus (GEO) database, which was based on the platform of Affymetrix Human genome U95 array and the platform of U95 version 2 array. This data set was deposited by Hernandez et al. [30]. Total 15 genechips were available, including eight genechips from optic nerve head astrocytes of donors with glaucoma and seven genechips from optic nerve head astrocytes of donors without glaucoma.
We first converted the probe-level data in probe cell intensity files into expression measures. For each sample, the expression values of all probes for a given gene were reduced to a single value by taking the average expression value. And then, we imputed missing data [31] and performed robust multichip averaging normalization [32]. The linear models for microarray data (LIMMA) package [33] in R statistical language [34] was used to identify DEGs in astrocytes from patients with glaucoma and without glaucoma. To circumvent the multi-test problem, which might induce too much false-positive results, the Benjamini-Hochberg method [35] was used to adjust the raw P values into false discovery rate (FDR). Fold change (i.e. fold increase or fold decrease) was calculated between experimental and control groups. The FDR less than 0.05 and fold change >1.5 or <À1.5 were selected as cut-off criteria.

Functional enrichment analysis based on GO and pathway
The database for annotation, visualization and integrated discovery (DAVID) provides a comprehensive set of functional annotation tools for investigators to understand biological meaning behind large list of genes [36]. In this study, we inputted each group of genes into DAVID v6.7 to search for over-representation in GO categories and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways.

Cell culture
Optic nerve head astrocytes, isolated from POAG patients (POAG cells) and healthy controls (Normal cells), were purchased from Shanghai Peace eye hospital (Shanghai, China). Primary cultures of astrocytes were prepared from human cadaver eyes with POAG or without eye disease obtained within 24 hrs post-mortem. Astrocytes were cultured as described in previous studies [37]. Briefly, cells were maintained in DMEM supplemented with 10% foetal bovine serum (Solarbio, Beijing, China), 100 lg/ml penicillin (Sangon, Shanghai, China) and 100 lg/ml streptomycin (Sangon). Cells were propagated until passage three at 37°C in a humidified atmosphere containing 5% CO 2 . Cell medium was replaced every 2-3 days.

Semiquantitative RT-PCR
Total RNA derived from normal cells and POAG cells transfected with CD2 was extracted using TRIzol (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. For each individual, 5 lg of total RNA was reversely transcribed into cDNA using the oligo(dT) 18 primer and M-MLV reverse transcriptase (Promega, Fitchburg, WI, USA) according to the manufacturer's instructions. Real-time quantitative PCR was used to determine the HSP27 transcripts. b-actin was used at the same time as an internal control. Primer sequences for HSP27 and b-actin were shown in Table 1 PCR reactions were performed in a 25 ll reaction mixture containing 2.5 ll 109 Taq DNA polymerase buffer, 1.0 ll Taq DNA polymerase, 2.0 ll 25 mM MgCl 2 , 0.5 ll dNTP, 2.0 ll forward and reverse primers, 2.0 ll cDNA samples and 15 ll sterile water. Reactions were performed through the following conditions: one cycle of 94°C for 5 min. followed by 30 cycles of (94°C 30 sec., 60°C 40 sec., 72°C 30 sec.). Each sample was run in duplicate, and the expression level of each gene was analysed on computer (SDS2.3 software; ABI, Shanghai, China).

SDS-PAGE analysis
SDS-PAGE was performed according to the method of Yan et al. [38] with some modifications in the electrophoresis apparatus and conditions. Five millilitre 12% resolving gel (30% acrylamide, 1.5 M Tris-Hcl, pH 8.8, 10% SDS, 10% ammonium persulfate and TEMED) and 2 ml 5% stacking gel (30% acrylamide solution, 1.0 M Tris-Hcl, pH 6.8, 10% SDS, 10% ammonium persulfate and TEMED) were prepared according to standard protocol. A total of 200 ll non-induced culture or 50 ll induced culture was centrifuged at 13,000 9 g under 4°C for 3 min. Then, the precipitation was re-suspended in 20 ll distilled water and mixed with an equal volume of 29 SDS-PAGE buffer (10% SDS, 0.02% bromophenol blue, 1.5 M Tris-Hcl, pH 8.8, 40% glycerin), followed by placing in a boiling water bath for 3-5 min. After centrifugation (12,000 r.p.m., 10 min.), samples were electrophoresed with 12% gel and mini-cell apparatus at 15 mA for 3 hrs and stained with Coomassi brilliant blue G-250 for 24 hrs. A GS-800 calibrated densitometer was used to scan the gel.

Protein extraction and western blot analysis
Total cellular protein was collected from cultured cells in a lysis buffer containing the following: 10 mM Tris-HCl, 0.5% NP40, 150 mM NaCl, 0.5% sodium deoxycholate, 0.1% SDS, 0.2 mM PMSF in ethanol, 1 lg/ ml aprotinin, 4 lg/ml pepstatin, 10 lg/ml leupeptin and 1 mM sodium orthovanadate (10 ll/ml). Protein concentration was measured using a commercial system (Dc Protein Assay System; Bio-Rad Laboratories, Richmond, CA, USA). Cellular lysate was separated on denaturing polyacrylamide gels and then transferred by electrophoresis to nitrocellulose membranes. The blots were incubated in 5% non-fat dry milk in PBS containing 0.1% Tween-20 (PBS-T) for 1 hr at room temperature, incubated overnight at 4°C with primary antibodies (anti-HSP 27, 1:1000 dilution). Then, the blots were rinsed three times with PBS-T, and detected with horseradish peroxidase-conjugated secondary goat IgG antimouse antibody (Bio-Rad). b-actin served as an internal control.

Results
Exome sequencing and quality control The three samples were exome sequenced with paired-end mapping and then the sequencing results were performed quality control with fast QC software. Figure 2 shows the quality scores across all bases before and after quality control. We could find that the sequence quality was increased after quality control, with the lowest value larger than 20 at 3′ end. The high quality of sequence would guarantee the reliability of further analysis.

Variant calling and annotating
After exome sequencing, alignment and a series of quality control steps, we isolated tens of thousands of SNPs and thousands of IN-DELs from the three samples (Table 2, upper part). Then, Annovar was used for annotating these variants identified from the sequence data (Table 2, lower part). We could find that the variants distributed widely on gene, such as exonic, UTR 3, UTR5.

Screening of glaucoma-related SNPs and INDELs combined with text mining
A total of 1368 genes, which were related with glaucoma, were mining from NCBI PUBMED. Then, we obtained the intersection of the 1368 genes from text mining and the SNPs and INDELs screened above. As a result, we obtained 20, 13, 8 overlapping mutations on the exonic position in sample 607, 303 and 305 respectively. We further focused on the exonic mutations in sample 607, because sample 607 is a POAG sufferer. As shown in Table 3, seven genes developed non-synonymous mutations on the exonic position in sample 607, including ENSG00000073910 (FRY), ENSG00000115760 BIRC6, ENSG0000011 6824 (CD2), ENSG00000138039 LHCGR, ENSG00000170927 PKHD1, ENSG00000171759 PAH and ENSG00000180549 FUT7. In addition to FRY, the remaining six genes specifically existed in sample 607.

PPI network construction
To investigate the mutated genes in a systemic way, we mapped the six mutated genes to HPRD and BioGRID databases and constructed PPI networks using Cytoscape (Fig. 3). From Figure 3, proteins of six mutated genes have interactions with other proteins, in which CD2 has the most interactions.

Identification of DEGs based on DNA microarray
Public microarray data set of GSE2378 was downloaded from GEO database, and the DEGs between glaucoma patients and normal con- Fig. 2 The result of quality control. The central red line is the median value; the yellow box represents the inter-quartile range (25-75%); the upper and lower whiskers represent the 10% and 90% points; the blue line represents the mean quality; the y-axis on the graph shows the quality scores. The higher the score, the better the base call. The background of the graph divides the y-axis into very good quality calls (green), calls of reasonable quality (orange) and calls of poor quality (red).

Functional enrichment analysis and comparison
The database for annotation, visualization and integrated discovery was used to annotate the biological functions of the genes from PPI networks and the DEGs from the microarray data respectively. At a P value of 0.1, the genes from PPI networks were enriched in 280 GO categories and 12 KEGG pathways, and the DEGs from the microarray data were enriched in 201 GO categories and 14 KEGG pathways.
To verify our results, we performed comparisons of the enriched GO categories and pathways. Both a total of 38 GO categories and 1 KEGG pathway were enriched in the two sets of data (Table 4).

Sequence alignment of CD2 gene
From the above analysis, we hypothesized that CD2 might be a susceptible gene for POAG. Therefore, we performed sequence alignment of CD2 gene using DNAman (Fig. 4). A missense mutation at position 596 was observed, resulting in a glutamine being substituted for a tryptophan, and a same sense mutation at position 666 was observed.

SDS-PAGE analysis
The recombinant plasmids pCD2-1 (normal cells) and pCD2-2 (POAG cells) were transformed into Escherichia coli. After induced expression by IPTG, SDS-PAGE was performed with the samples transfected CD2 gene. The non-transfected samples served as control. From Figure 5, lane 2 and lane 3 have specific protein bands at 45 kD, whereas lane 1 has no protein band at 45 kD. The molecular weight of this protein is agreed with predicted result (42 kD). This result suggests that the CD2 was induced to express in E. coli.

Semiquantitative RT-PCR
HSP27 was used as a marker to detect the role of CD2 in POAG cells and normal cells. From Figure 6A, we could find that the expression levels of HSP 27 in POAG cells were higher than those in normal cells at 24, 48 and 72 hrs respectively. Optical density analysis of electrophoretic bands showed that the expression level of HSP 27 in POAG cells increased by 68%, 46.1% and 70.6% compared with those in normal cells (Fig. 6B).  Table 3 The non-synonymous mutation on the exonic position in the three samples  Fig. 3 The protein-protein interaction (PPI) network in which the proteins of the mutated gene and its neighbour proteins are involved. The yellow nodes are the proteins of mutated genes and the pink nodes are their interactive proteins. Table 4 The overlapping GO categories and pathway enriched from our data and microarray data from GEO (only list the top 10 GO categories)

Western blot
To further verify our result at protein level, we performed western blot analysis using protein from POAG cells and normal cells respectively. From Figure 7A, the protein expression levels of HSP27 in POAG cells were higher than those in normal cells at 24, 48 and 72 hrs after pCD2 transfection. Optical density analysis showed that the expression levels of HSP27 in POAG cells were increased by 187%, 169% and 173% compared with those in normal cells (Fig. 7B).

Discussion
In this study, we conducted exome sequencing on three members from a high-frequency POAG family in Chinese Han population. Tens of thousands of SNPs and INDELs were isolated. By combining with the text-mining results, six genes developing non-synonymous mutations on the exonic position in sample 607 were identified, including  BIRC6, CD2, LHCGR, PKHD1, PAH and FUT7. Some of these genes have been suggested to be related with POAG by previous studies [39], and some of them have not been reported. Furthermore, we selected CD2 to perform further analysis. Sequence alignment analysis showed a missense mutation at position 596 and a same sense mutation at position 666. RT-PCR and Western blot results suggest that expression levels of HSP 27 in POAG cells were higher than those in normal cells at 24, 48 and 72 hrs after pCD2 transfection at both mRNA level and protein level.
In this study, mutation in BIRC6 is non-synonymous and specifically existed in sample 607, suggesting that it plays an important role in POAG. Baculoviral IAP repeat containing 6 is an ubiquitin-carrier protein, which inhibits apoptosis by facilitating the degradation of apoptotic proteins by ubiquitination [40]. Our result is in accordance with a previous study, which identified two genes of the unfolded protein response that harbour POAG risk alleles, BIRC6 and PDIA5 in Salt Lake City and San Diego populations [39]. From the PPI network constructed by BIRC6 (Fig. 3), we could find that BIRC6 was interacted with two downstream cell death proteases, CASP (Caspase)3 and CASP9. Similarly, another member in IAP family, BIRC4, is reported to be a direct inhibitor of caspase-3 and -8 in a rat model of glaucoma [41]. We speculate that apoptotic cell death mediated by BIRC6 might impair the ability of the anterior chamber to regulate IOP and lead to POAG resulting from ocular hypertension.
Luteinizing hormone/choriogonadotropin receptor belongs to the G-protein-coupled receptor 1 family. So far, no studies have reported LHCGR mutation in POAG. However, this gene is localized to the short arm of chromosome 2 (2p.21), which is designated GLC3A. Interestingly, several loci in this region have been identified to be related with POAG in several ethnic groups. For example, Melki et al. suggested that CYP1B1 (cytochrome P450, family 1, subfamily B, polypeptide 1) mutations in GLC3A might pose a significant risk for early-onset POAG in French patients and might also modify glaucoma phenotype in patients who do not carry a MYOC mutation [42]. Therefore, we speculated that LHCGR might also pose a risk for POAG in Chinese population; however, further studies are still needed to confirm our speculation.
Mutations in other genes, such as CD2, PKHD1, PAH and FUT7, have not been reported to be related with risk for POAG so far. Therefore, we selected CD2 to perform experimentally verification, because of its higher degree in PPI network (Fig. 3) and frequently enrichment in the overlapping GO categories (Table 4).
In this study, we detected the expression levels of HSP27 in RT-PCR and Western blot. HSP27 is involved in stress resistance and actin organization and translocates from the cytoplasm to the nucleus upon stress induction. It has been shown to play a role in sensory neuron survival following peripheral nerve axotomy [43]. Krueger-Naug et al. demonstrated that HSP27 is detected in retinal ganglion cells, while not present at detectable levels in the control rats by immunohistochemical methods [44]. Li et al. showed that HSP 27 is up-regulated in retinas from ocular hypertensive rats and suggested that up-regulation of HSP27 is a gene-specific event associated with elevated IOP [45]. On the basis of these studies, we chose HSP27 as a marker to monitor the role of CD2 in POAG cells. As expected, both the mRNA expression levels and protein expression levels of HSP27 were increased in POAG cells compared with those in normal cells, suggesting that mutation in CD2 might pose a risk for POAG in Chinese population.
In conclusion, this study employed exome sequencing to identify the risk factors of POAG in Chinese population. Six genes, BIRC6, CD2, LHCGR, PKHD1, PAH and FUT7, were selected by combining with textmining and microarray analysis. Furthermore, role of mutation in CD2 for POAG was confirmed by RT-PCR and Western blot. This study provides a candidate gene list that is likely to include real POAG risk factors. However, limitation of our study includes the small sample size. Further studies based on larger sample size are needed to confirm our results.