Exome data clouds the pathogenicity of genetic variants in Pulmonary Arterial Hypertension

Abstract. Background: We aimed to compile the previously reported PAH-associated missense and nonsense variants and to evaluate their pathogenicity. Methods: The Human Gene Mutation Database, PubMed, and Google Scholar were searched for previously reported PAH-associated genes and variants. Thereafter, both the Exome Sequencing Project and the Exome Aggregation Consortium, serving as the background population, were searched for the previously reported PAH-associated missense and nonsense variants. The pathogenicity of the previously reported PAH-associated missense variants was evaluated using four in silico prediction tools. Results: In total, 14 PAH-associated genes and 180 missense and nonsense variants were gathered. BMPR2, the most frequently reported gene, accounts for 135 of the 180 missense and nonsense variants. The Exome Sequencing Project contained 9, and the Exome Aggregation Consortium 25, of the 180 PAH-associated missense and nonsense variants. Based on allele frequency in the background population and on prediction analysis, TOPBP1 and ENG are unlikely to be monogenic causes of PAH. Conclusion: This is the first evaluation of previously reported PAH-associated missense and nonsense variants. BMPR2 was identified as the major gene among the 14 PAH-associated genes. Based on our findings, ENG and TOPBP1 are not likely to be monogenic causes of PAH.

We aimed to provide an encyclopedia of all previously published PAH-associated genes and variants, and to further evaluate the pathogenicity of each variant by performing a comprehensive in silico prediction analysis and by investigating the frequency of each PAH-associated variant in two large online exome databases.

| MATERIALS AND METHODS
The Human Gene Mutation Database (HGMD), PubMed, and Google Scholar were searched for previously published PAH-associated genes and variants up to October 2016. The following queries were used: (('pulmonary arterial hypertension' (MeSH)) or (pulmonary arterial hypertension)), (('genetics' (MeSH)) or (genetic)), (('mutation' (MeSH)) or (mutation)), and (('variants' (MeSH)) or (variants)). We reviewed all identified genetic variants for published data from functional and familial cosegregation studies. To establish a solid baseline, familial cosegregation was defined as at least two genotype-positive family members having the same phenotype. The HUGO Gene Nomenclature Committee (HGNC) database was used for the standard nomenclature of human genes (HGNC database of human gene names | HUGO Gene Nomenclature Committee). The publicly available Ensembl genome database was used to locate variants in the genome and to determine the amino acid changes in the protein-coding regions of genes (Ensembl Genome Browser 85).

| Exome sequencing project
In the Exome Sequencing Project (ESP), next-generation sequencing of all protein-coding regions was carried out in 6,503 individuals, African American (n = 2,203) and European American (n = 4,300), drawn from different population studies (Exome Variant Server). Clinical data were not available. The ESP was searched for previously published PAH-associated variants. Because the ESP lacks data on variants located in promoters, introns, and untranslated regions, variants in these regions were not included in the present study.
All previously identified PAH-associated variants in our investigation were subdivided into two groups: those identified in the ESP (ESP-positive) and those not identified in the ESP (ESP-negative).

| The exome aggregation consortium
The Exome Aggregation Consortium (ExAC) comprises exome sequencing data from 60,706 unrelated individuals (ExAC Browser). African/African American (n = 5,203), Latino (n = 5,789), East Asian (n = 4,327), Finnish (n = 3,307), Non-Finnish European (n = 33,370), South Asian (n = 8,256), and other (n = 454) populations are represented in the ExAC database (ExAC Browser). Like the ESP, ExAC covers only the protein-coding regions of the human genome; its data were collected as part of several exome studies of populations with specific diseases (ExAC Browser). The PAH-associated variants were subdivided into those identified in ExAC (ExAC-positive) and those not identified in ExAC (ExAC-negative).
The ESP and ExAC databases are considered the background population in this study.
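The subdivision into database-positive and database-negative variants amounts to a set-membership check. The following is a minimal sketch of that idea, not the procedure actually used in the study; the `Variant` class and the `partition_by_database` function are hypothetical names introduced for illustration. The four example variants are those whose ESP and ExAC status is stated explicitly in the Discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal record: gene symbol plus protein-level change."""
    gene: str
    protein_change: str


def partition_by_database(variants, database_variants):
    """Return (positive, negative): variants found / not found in the database."""
    found = set(database_variants)
    positive = [v for v in variants if v in found]
    negative = [v for v in variants if v not in found]
    return positive, negative


# Four variants whose database status is stated in the text.
eng_g214s = Variant("ENG", "p.G214S")
eng_g545s = Variant("ENG", "p.G545S")
smad4_n13s = Variant("SMAD4", "p.N13S")
smad1_v3a = Variant("SMAD1", "p.V3A")
reported = [eng_g214s, eng_g545s, smad4_n13s, smad1_v3a]

# Database contents restricted to these four variants:
# both ENG variants are in ESP and ExAC, SMAD4 p.N13S is in ExAC only,
# and SMAD1 p.V3A is in neither.
esp_hits = {eng_g214s, eng_g545s}
exac_hits = {eng_g214s, eng_g545s, smad4_n13s}

esp_positive, esp_negative = partition_by_database(reported, esp_hits)
exac_positive, exac_negative = partition_by_database(reported, exac_hits)
```

In practice a variant would more likely be matched on genomic coordinates and alleles than on the protein-level name; the protein change is used here only to keep the sketch short.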

| In silico prediction analysis
The functional effects of all missense variants were assessed using four prediction tools: conservation across species, Grantham score, PolyPhen-2 (Polymorphism Phenotyping v2), and SIFT (Sorting Intolerant from Tolerant, v5.1.1). Data on conservation across species were obtained from Ensembl, and each variant was classified as occurring at a position with no substitutions (conserved/pathogenic) or ≥1 substitutions (not conserved/benign). Grantham physicochemical values were calculated using the Grantham amino acid difference matrix. We defined a value above 100 as radical (pathogenic) and a value under 100 as conservative (benign). Using PolyPhen-2, each variant was labeled "probably damaging", "possibly damaging", or "benign". Variants labeled "probably damaging" or "possibly damaging" were considered "damaging" (pathogenic) in our analysis. Finally, SIFT classified variants as "tolerant" (benign) or "damaging" (pathogenic). In a final analysis combining all prediction tools, a variant was considered pathogenic if ≥3 in silico prediction tools determined it to be pathogenic, as previously described (Giudicessi et al., 2012). Variants predicted pathogenic by only 1 or 2 tools were considered variants of uncertain significance (VUS).
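The consensus rule described above can be sketched as a simple vote count over the four tools. This is an illustrative sketch only: the function names are hypothetical, the inputs are assumed to have been retrieved beforehand from Ensembl, the Grantham matrix, PolyPhen-2, and SIFT, and two points the text leaves open are filled by assumption (a Grantham score of exactly 100 is treated as conservative, and a variant with no pathogenic votes is labeled benign).

```python
GRANTHAM_RADICAL_CUTOFF = 100  # scores above this are treated as radical
POLYPHEN_DAMAGING = {"probably damaging", "possibly damaging"}


def pathogenic_votes(substitutions_across_species, grantham_score,
                     polyphen_label, sift_label):
    """Return one boolean per tool; True means the tool calls the variant pathogenic."""
    return {
        # Conserved position (no substitutions across species) counts as pathogenic.
        "conservation": substitutions_across_species == 0,
        # Assumption: a score of exactly 100 falls on the conservative side.
        "grantham": grantham_score > GRANTHAM_RADICAL_CUTOFF,
        # Both "probably" and "possibly" damaging are pooled as damaging.
        "polyphen2": polyphen_label.lower() in POLYPHEN_DAMAGING,
        "sift": sift_label.lower() == "damaging",
    }


def classify(votes):
    """Apply the >=3-of-4 consensus rule to the per-tool votes."""
    n_pathogenic = sum(votes.values())
    if n_pathogenic >= 3:
        return "pathogenic"
    if n_pathogenic >= 1:
        return "VUS"
    # Assumption: the text does not name the zero-vote case; "benign" is used here.
    return "benign"


# Hypothetical inputs for illustration; not taken from the study's tables.
three_of_four = classify(pathogenic_votes(0, 180, "probably damaging", "tolerant"))
two_of_four = classify(pathogenic_votes(2, 56, "possibly damaging", "damaging"))
none_of_four = classify(pathogenic_votes(3, 56, "benign", "tolerant"))
```

Keeping the per-tool votes in a dictionary, instead of returning only the final label, makes it easy to report which tools disagreed for a given variant.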

| RESULTS
To date, 14 genes and 180 missense/nonsense variants have been reported as PAH-associated (Table 1).
We performed a prediction analysis of only the missense PAH-associated variants (n = 126), because nonsense variants are considered damaging by nature, owing to premature termination of translation. In this analysis, 76 of the 126 missense PAH-associated variants were predicted to be pathogenic (Table 2). Of the 86 PAH-associated missense variants in BMPR2, 52 were predicted to be pathogenic. Prediction analyses for all PAH-associated genes are available in Table 2.
By investigating the frequency of the PAH-associated variants in the two large population databases, we found that the ESP contained 9 of the 180 variants (Table 3), while ExAC contained 25 PAH-associated missense/nonsense variants (Tables 3 and 4).
In the most frequently reported PAH-associated gene, BMPR2, we found 2 ESP-positive and 12 ExAC-positive variants among its 135 variants (Tables 3 and 4).
In the literature, functional studies had been performed for 29 of the 180 PAH-associated variants (Table 5), assessing the functional properties of the resulting protein in vivo and/or in vitro. With the exception of two variants in BMPR1B (p.Ser160Asn [rs149589961] and p.Phe392Leu), all functional studies showed that the mutated proteins displayed a loss-of-function phenotype (Table 5).

| DISCUSSION
In this novel study, we provide clinicians with the first comprehensive evaluation tool for the genetic diagnosis of PAH, by evaluating the allele frequency of previously reported PAH-associated variants in two large background population databases (ESP and ExAC) and by adding an in silico prediction analysis based on an established conservative method (Abbasi et al., 2016; Jabbari et al., 2013; Risgaard et al., 2013). Surprisingly, we identified very limited data on familial cosegregation in the literature; consequently, familial cosegregation could contribute only little to our evaluation. This, however, is consistent with and supports our finding that the identified ESP- and ExAC-positive variants may not be monogenic causes of PAH. Pathogenic PAH-associated variants in BMPR2 show reduced and gender-dependent penetrance (Austin, Loyd, & Phillips, 1993). Therefore, the ESP and ExAC databases most likely include unaffected heterozygous carriers. Penetrance information for pathogenic PAH-associated variants in ACVRL1, KCNK3, CAV1, SMAD9, and BMPR1B is unknown (Austin et al., 1993).

TABLE 1 Overview of PAH-associated genes and variants (columns: Gene, Missense variants, Nonsense variants)

Our investigation supports the view that BMPR2 is of major importance in the development of heritable and idiopathic PAH (Simonneau et al., 2013; Soubrier et al., 2013). According to our findings, BMPR2 accounted for 75% (135 of 180) of the previously reported missense/nonsense PAH-associated variants. Familial cosegregation was identified for only three variants in BMPR2 (p.W13*, p.E386V, and p.K512T) (Fu et al., 2008; Hamid et al., 2010; Machado et al., 2006). Prediction analysis of the BMPR2 missense variants (n = 86), requiring agreement of ≥3 of 4 in silico prediction tools, indicated that only 60.4% of the variants (n = 52) were predicted to be pathogenic. The incidence of PAH in the general population is estimated at 2.4 to 25 cases per million per year (Gaine & Rubin, 1998; Humbert et al., 2006; Ling et al., 2012). In total, 12 variants in BMPR2 were identified in the ESP and ExAC databases (Tables 3 and 4). This means that 9% (12 of 135) of the previously identified PAH-associated variants in BMPR2 were found in the background population.
Given the incidence of PAH in the general population, this is an expected frequency of PAH-associated BMPR2 variants in the background population. The 12 identified functional studies on BMPR2 variants revealed that all mutated proteins had a loss-of-function phenotype (Table 5). Taken together, these findings point to a pivotal role of BMPR2 in the pathogenesis of PAH.
In contrast, we found all the three PAH-associated missense variants (p.  Tables 3 and 4; de Jesus Perez et al., 2014). In our prediction analysis, only p.R309C was predicted to be pathogenic. Because all three PAH-associated variants in TOPBP1 have a high allele frequency in the background population, TOPBP1 is not likely to be a monogenic cause of PAH. No functional studies have been reported on these variants; such studies are needed to clarify the effect of TOPBP1 variants and the possible role of TOPBP1 as a modifier gene in the pathogenesis of PAH. Furthermore, no familial cosegregation has been reported that would support TOPBP1 as a monogenic cause of PAH (de Jesus Perez et al., 2014).
In 2013, the 5th World Symposium on Pulmonary Hypertension established ENG as a PAH-associated gene, since two missense PAH-associated variants (p.G214S [rs150932144] and p.G545S [rs142896669]) had been reported (Simonneau et al., 2013). In our analysis, both variants were predicted to be VUS (Table 2), which calls their pathogenicity in the etiology of PAH into question. Furthermore, both p.G214S and p.G545S were present in the ESP and ExAC databases (Tables 3 and 4). The allele frequency of p.G214S was 0.0002 in the ESP and 0.0001 in ExAC. The p.G545S variant was found with an allele frequency of 0.0006 in the ESP and 0.0005 in ExAC (Tables 3 and 4). Although ENG is known as a PAH-associated gene, our data and analysis do not support ENG variants as a monogenic cause or as one of the major causes in the pathogenesis of PAH. A comprehensive functional study is needed to determine the exact effect of the reported amino acid changes in ENG and the effect of p.G214S and p.G545S on the expression level of the protein.
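To give a rough sense of scale for the ENG allele frequencies quoted above, an allele frequency can be converted into an expected number of carriers per million individuals. The sketch below assumes Hardy-Weinberg proportions, which is an assumption of this illustration and not a step reported in the study; the function name `carriers_per_million` is hypothetical.

```python
def carriers_per_million(allele_frequency):
    """Expected carriers (heterozygous or homozygous) per million individuals.

    Assumes Hardy-Weinberg proportions: the fraction of individuals carrying
    at least one copy of the allele is 1 - (1 - p)^2.
    """
    carrier_fraction = 1.0 - (1.0 - allele_frequency) ** 2
    return carrier_fraction * 1_000_000


# Allele frequencies of the two ENG variants as quoted in the text.
eng_allele_frequencies = {
    ("p.G214S", "ESP"): 0.0002,
    ("p.G214S", "ExAC"): 0.0001,
    ("p.G545S", "ESP"): 0.0006,
    ("p.G545S", "ExAC"): 0.0005,
}

expected_carriers = {
    key: carriers_per_million(frequency)
    for key, frequency in eng_allele_frequencies.items()
}

for (variant, database), carriers in expected_carriers.items():
    print(f"{variant} in {database}: about {carriers:.0f} carriers per million")
```

Under this assumption the carrier frequencies are in the range of hundreds to roughly a thousand per million. A carrier frequency is not the same quantity as an annual disease incidence, so the two should be compared only qualitatively.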
In the literature, we found five PAH-associated variants in SMAD genes: SMAD1 (p.V3A), SMAD4 (p.N13S), and SMAD9 (p.K43E, p.C202*, and p.R294*). None of the five was found in the ESP, but three (p.N13S, p.K43E, and p.R294*) were found in ExAC (Table 4). Using pulmonary artery smooth muscle cells (PASMCs), Nasim et al. demonstrated that the amino acid substitutions p.V3A in SMAD1, p.N13S in SMAD4, and p.K43E in SMAD9 resulted in reduced signaling activity in vitro (Nasim et al., 2011). Another functional study analyzed the function of p.C202* in SMAD9 (alias: SMAD8) using COS1 cells (a fibroblast-like cell line). This study revealed that the mutated protein was unable to interact with SMAD4 (Tables 2 and 5; Shintani, Yagi, Nakayama, Saji, & Matsuoka, 2009). Although p.V3A and p.N13S were predicted to be VUS, the results of the functional studies (loss of function) support a role for these variants in the pathogenesis of PAH.
In the ACVRL1 gene, 16 missense variants and one nonsense variant were reported (Table 1). Thirteen of the missense variants (81.25%) were predicted to be pathogenic (Table 2). None of these variants was identified in the ESP or ExAC databases.
One in vitro functional study used NIH 3T3 fibroblasts and COS-7 cells to analyze the protein expression of three PAH-associated variants (p.L381P, p.R484Q, and p.R484W) (Ricard et al., 2010). The study reported that p.R484Q and p.R484W were inactive in the transactivation step (Ricard et al., 2010). The mutated p.L381P protein did not respond to stimulation with bone morphogenetic protein 9 (BMP9), that is, it showed a loss of function (Ricard et al., 2010). These findings support a role for mutated ACVRL1 proteins in the pathogenesis of PAH, despite the lack of data on familial cosegregation.
The two PAH-associated variants in BMPR1B were investigated in a functional study using COS1 cells (Chida et al., 2012). The authors showed that the amino acid changes p.F392L and p.S160N increased the activation of the proteins above the wild-type level (gain of function) (Chida et al., 2012). Based on the results of this functional study, p.S160N and p.F392L are unlikely to be an important cause of PAH. Furthermore, p.S160N and p.F392L were predicted to be VUS, which is in line with the result of the functional study (Table 2; Chida et al., 2012).
To describe the function of all six variants in KCNK3, Ma et al. performed a functional analysis using COS-7 cells (Ma et al., 2013). The mutated proteins showed a loss of ion-channel function (Ma et al., 2013). In support of these results, our in silico analysis predicted that all PAH-associated variants in KCNK3 except p.V221L were pathogenic (Table 2).
Burg et al. analyzed the mutated proteins resulting from p.E211D and p.G182R in KCNA5 (Burg, Platoshyn, Tsigelny, Lozano-Ruiz, & Rana, 2010). In an in vitro study, they compared the function of the mutated proteins with that of the wild type using human embryonic kidney cells (HEK-293) and COS-1 cells (Burg et al., 2010). They found that the mutated proteins accelerated the inactivation of the voltage-gated K+ (Kv) channels, which have an important role in regulating PASMCs (Burg et al., 2010). These findings support a role for p.E211D and p.G182R in KCNA5 as an uncommon cause of PAH, although both variants were predicted to be VUS (Tables 2, 3 and 4).
Song et al. (2016) identified p.Y311* as a heterozygous mutation in EIF2AK4 in a patient with heritable or idiopathic PAH. The p.Y311* variant in EIF2AK4 was not present in the ESP or ExAC. Functional characterization of p.Y311* by a protein-expression study, together with cosegregation analysis in a pedigree, would help establish the role of p.Y311* in EIF2AK4 in the pathogenesis of PAH.

| CONCLUSION
To our knowledge, this is the first evaluation of previously reported rare PAH-associated genes and variants. In the literature, we found 14 genes and 180 missense/nonsense variants. BMPR2 was identified as the most important and most commonly reported cause of PAH.
Prediction analysis and the allele frequency of the PAH-associated variants in TOPBP1 and ENG in the background population suggest that these variants are unlikely to be monogenic causes of PAH. Further functional studies are required to clarify the function of the mutated proteins.