Identification of 99% of CFTR gene mutations in Bulgarian‐, Bulgarian Turk‐, and Roma cystic fibrosis patients

Abstract Background The spectrum and frequencies of CFTR mutations causing cystic fibrosis (CF) vary among different populations in Europe and beyond. Methods We identified 98.9% of all CFTR mutations in a representative cohort of 140 CF patients comprising 107 Bulgarian‐ (BG), 17 BG Turk‐, and 16 BG Roma cases. The compiled clinical and genotype dataset includes 110 previously analyzed patients and 30 cases currently analyzed for rare CFTR variants by massively parallel sequencing of the entire CFTR coding region and adjacent introns, combined with the analysis of intra‐CFTR rearrangements. Results Altogether 53 different mutations were observed, of which 15 were newly identified in the BG CF population. Comparison of clinical and laboratory data between individual BG ethnic groups showed that BG Roma have a worse nutritional status and are younger than other CF patients, and that the spectrum of mutations differs between the groups. Conclusion This collaborative study improves genetic counselling in BG, facilitates the introduction of multitier CF neonatal screening, and fosters public health measures for the improvement of care in the Roma CF population.

The at-birth prevalence of cystic fibrosis (CF) in Bulgaria (BG) was estimated by epidemiological methods at 1:3,600 live births (Savov, 2011). In 2017, this estimate was substantiated by 20 newly clinically diagnosed cases among a total of 64,359 live births (data from the national BG CF registry, BGCFR; this study). Nevertheless, the updated at-birth prevalence of CF is likely inaccurate, since BG has not yet implemented a nationwide CF neonatal screening program (CFNBS) (National Centre of Public Health & Analyses, 2014).
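As a worked check of the figures quoted above, the 2017 registry counts can be converted into a "1 in N" at-birth prevalence. This is a minimal sketch; the function name `one_in_n` is illustrative and not part of the study.

```python
def one_in_n(cases: int, live_births: int) -> float:
    """Return N such that the at-birth prevalence is 1:N."""
    if cases <= 0:
        raise ValueError("at least one case is needed to compute a prevalence")
    return live_births / cases


# Counts quoted in the text for 2017: 20 new cases among 64,359 live births.
n_2017 = one_in_n(20, 64_359)
print(f"2017 registry data: 1:{n_2017:,.0f} live births")
print("Earlier epidemiological estimate: 1:3,600 live births")
```

The 2017 counts give roughly 1:3,200, which is of the same order as the earlier estimate of 1:3,600. A single year of clinically diagnosed cases is a small sample, so the figure is only indicative.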
Population genetic studies provided evidence that BG share about 45% of their genetic variation with the Balto-Slavic populations. In addition, the second half of the BG "genetic legacy" is of Mediterranean origin, with minor influences from the Caucasus, the Middle East, and North Africa (Hellenthal et al., 2014). Furthermore, throughout its history BG underwent multiple immigration waves, mainly from present-day Turkey (TK) and Greek Thrace. The last nationwide census (2011) reports three major self-reported ethnic groups comprising BG (85%), Bulgarian Turks (BGTK; 8.8%), and Bulgarian Roma (BGRM; 4.9%) within a country of 7 million inhabitants (National Statistical Institute of the Republic of Bulgaria, 2011).
As of December 2018, 201 CF patients were reported in the BGCFR, more than half of whom are regularly followed up at the University Hospital Alexandrovska (Sofia). This university centre takes care of about two thirds of all known cases in BG and runs the BGCFR. Thirty cases in which one or both CFTR gene mutations in trans remained unidentified following the initial screening for common population-specific CFTR mutations, carried out by the collaborating National Genetic Laboratory in Sofia (Angelicheva et al., 1997; Savov, 2011; Savov et al., 1995), were examined in collaboration with the Department of Biology and Medical Genetics (Prague, Czech Republic; CZ). There, the complete analysis of the CFTR gene coding region, including analysis of intra-CFTR rearrangements and of adjacent intronic sequences, was performed according to an established methodology (Křenková et al., 2013).
The aim of this study was to report the distribution of CF-causing mutations in a representative group of BG CF patients, divided according to their ethnicity and thus representing the constitutive BG, BGTK, and BGRM populations.
This study supersedes previously published limited reports (Angelicheva et al., 1997; Savov, 2011; Savov et al., 1995) in terms of nationwide representativeness, the overall number of patients examined, and the comprehensiveness of CFTR gene molecular genetic analysis by massively parallel sequencing (MPS) complemented by intra-CFTR rearrangement analysis. We have also carried out genotype-phenotype correlations stratified by individual BG subpopulations.

METHODS
The clinical diagnosis of CF was established in 140 unrelated patients from the three major BG ethnic groups (BG, 107; BGTK, 17; and BGRM, 16 cases) according to clinical and laboratory consensus diagnostic criteria (Farrell et al., 2008). An outline of their key demographic, clinical, and laboratory characteristics according to BGCFR data (2017) is presented in Table 1, and their geographic origin is shown in Figure 1. Initially, all cases were examined for the most common "European" CF mutations using "in-house" methods and Sanger DNA sequencing of selected CFTR exons. This combined approach led to the identification of both CFTR mutations in approximately 80% of all cases, as previously reported (Angelicheva et al., 1997; Savov, 2011; Savov et al., 1995). The GenBank reference sequence and version number for the gene studied was NM_000492.3 (CFTR).
In this study, the 30 patients drawn from the three BG ethnic groups in whom one or both CF alleles remained unidentified were subjected to a "cascade" mutation screening approach. First, we used a panel of the 50 most common CFTR variants in European-derived populations (Elucigene CF-EU ver.2™; Elucigene, UK), followed by MPS-based analysis of the entire CFTR coding region, adjacent splice site junctions, and several introns using a locus-specific library preparation assay (CFTR NGS assay™; Devyser, Sweden; www.devyser.com). MPS was performed on the MiSeq System™ (Illumina, USA; www.Illumina.com). Bioinformatic analysis was carried out using the SOPHiA Platform for Hereditary Disorders™ online software (www.sophiagenetics.com). Where applicable, positive cases were confirmed by targeted Sanger DNA sequencing on an ABI 3130xl DNA Analyser™ (ThermoFisher, USA; www.thermofisher.com). Multiplex ligation-dependent probe amplification (MLPA) analysis of intra-CFTR rearrangements and copy number variation was performed with the SALSA MLPA P091 CFTR Assay™, followed by analysis of raw data in the proprietary software Coffalyser.Net™ (MRC-Holland, The Netherlands; www.MRC-holland.com). The linkage phase of detected mutations was established by testing less common mutations or suspected complex CFTR alleles in the index case's parents (data not shown). Variant pathogenicity was assessed according to the CFTR2 database (www.cftr2.org), and detected BG mutations were submitted to it in return where applicable. This study was approved by the respective ethics committees of the collaborating CZ and BG academic institutions, and BG CF patients consented to CFTR genotyping.

Figure 1 visually supports the representativeness of the studied cohort and indicates that there is no regional bias. The number of patients from individual BG regions corresponds to the relative population density and the respective census data of their domicile.
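The tiered logic of the "cascade" screening described in the Methods can be illustrated with a short sketch. It rests on assumptions that the text does not state: that a later tier is run only while fewer than two alleles are identified, and that each tier can be modelled as a function returning newly detected alleles. The tier names, the function `cascade_screen`, and the placeholder calls (`variant_A`, `variant_X`) are hypothetical and do not represent study data or any vendor software.

```python
from typing import Callable, List, Tuple

# A tier maps a patient ID to the alleles it newly detects (hypothetical model).
Tier = Tuple[str, Callable[[str], List[str]]]


def cascade_screen(patient_id: str, known_alleles: List[str],
                   tiers: List[Tier]) -> Tuple[List[str], List[str]]:
    """Run tiers in order until two CFTR alleles are identified.

    Assumption: testing stops as soon as both alleles are known.
    """
    alleles = list(known_alleles)
    tiers_run: List[str] = []
    for name, detect in tiers:
        if len(alleles) >= 2:
            break
        tiers_run.append(name)
        missing = 2 - len(alleles)
        alleles.extend(detect(patient_id)[:missing])
    return alleles, tiers_run


# Placeholder results per tier; these are NOT real genotypes.
toy_calls = {
    "commercial_panel": {"P1": []},
    "mps": {"P1": ["variant_X"]},
    "mlpa": {"P1": []},
}
tiers: List[Tier] = [
    (name, lambda pid, calls=calls: calls.get(pid, []))
    for name, calls in toy_calls.items()
]

alleles, tiers_run = cascade_screen("P1", ["variant_A"], tiers)
print(alleles, tiers_run)
```

In this toy run the second allele is found at the MPS tier, so the MLPA tier is skipped. A patient with no call in any tier passes through all three tiers and keeps an unidentified allele.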
All CF mutations are associated with the classical form of the disease (Table 1; additional detailed clinical and laboratory data are available upon request). The key clinical characteristics of CF patients of BG versus BGTK origin were not significantly different (Table 1). [...] A significant difference is also apparent when BG versus BGRM F508del homozygous patients are compared (Table 1). Table 2 combines genotyping data from a total of 140 BG CF patients of BG, BGTK, and BGRM origin drawn from the previous reports (n = 110) (Makukh et al., 2010; Orenti et al., 2018; Radivojevic et al., 2004) with those generated in this study (n = 30; formatted in italics). We detected a total of 53 different CFTR variants located throughout all CFTR exons, with only 17 being present at a frequency of over 1%. Approximately half of all variants observed (n = 28) were private, since they were detected only within a single family. Three novel mutations were detected according to the data from the CF Mutation Database (www.genet.sickkids.on.ca/app; Accessed January 12, 2019). Among all tested cases with the classical form of the disease, only 3 alleles remained unknown (1.07%; Table 2). The population spectra of mutations in the three BG constitutive patient cohorts are presented in Table 2.

Table footnotes: … (1), dilated cardiomyopathy (2), hydrocephalus (1), and brain aneurysm (1).
d : The patient's mother had a Bulgarian ancestor, but she self-identifies as being of Roma origin.
e : The patient's parents had a Turkish ancestor, but both self-identified as being of Roma origin.
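The percentages reported in this section follow directly from the allele and variant counts. The sketch below reproduces them, assuming two CFTR alleles per patient; the variable names are illustrative only.

```python
patients = 140
alleles = 2 * patients           # two CFTR alleles per patient -> 280
unknown_alleles = 3              # alleles that remained unidentified

unknown_pct = 100 * unknown_alleles / alleles
detected_pct = 100 - unknown_pct

distinct_variants = 53
private_variants = 28            # seen in a single family only
private_pct = 100 * private_variants / distinct_variants

print(f"Unidentified alleles: {unknown_pct:.2f}%")
print(f"Identified alleles:   {detected_pct:.2f}%")
print(f"Private variants:     {private_pct:.1f}% of distinct variants")
```

With 3 of 280 alleles unidentified, the detection rate is about 98.9%, in agreement with the figure given in the Abstract. The 28 private variants make up slightly more than half of the 53 distinct variants.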

DISCUSSION
This study presents a comprehensive overview of the CFTR mutation distribution in a representative cohort of 140 unrelated Bulgarian CF patients (i.e., proportionally representing BG, BGTK, and BGRM populations) originating from all regions of the country (Figure 1).
The lack of significant differences in the course of CF between the BG and BGTK populations generally reflects their equal access to medical care. In contrast, the BGRM CF population, despite being clinically diagnosed at an early age, is much younger and has a worse nutritional status, most likely due to higher infant/childhood mortality. This issue also reflects their generally lower socioeconomic status (Georgiev, Tomova, Grekova, & Kanev, 2001) and the observed relatively worse compliance with therapy compared to BG and BGTK CF families (Table 1). Thus, this study provides a basis for a nationwide public health initiative to improve the quality of care in BGRM, not only in CF.
Using MPS-based sequencing, we have identified 98.93% of all CF-causing mutations (Table 2; legacy mutation nomenclature is further used in the Discussion) in the combined cohort of 140 cases. In this regard, 15 variants that were not previously reported (Angelicheva et al., 1997; Bobadilla et al., 2002; Savov, 2011; Savov et al., 1995) were identified, as well as three complex alleles (in 4 patients), in accordance with previous publications (Savov et al., 1995) (Table 2). We now comply with the diagnostic standards stipulated by recent ECFS Best Practice Guidelines and can confidently implement multitier CFNBS involving DNA testing (Castellani et al., 2018).
The observed differences between the frequencies of individual CFTR variants in the BG and BGTK populations could not be statistically assessed due to the low number of BGTK cases (Table 2). Although c.1040G>C, p.Arg347Pro was the second most common CFTR variant in BGTK patients, it is generally less common in TK proper (Bobadilla et al., 2002). Although, according to previous publications (Bobadilla et al., 2002; Savov, 2011), all BGRM patients were reported to be c.1521_1523delCTT, p.Phe508del homozygous, we identified two compound heterozygous patients who retrospectively acknowledged BG and BGTK admixture. The three patients in whom one allele remained undetected have the classical form of CF with mean sweat chloride concentrations over 60 mM, which suggests that pathogenic CFTR variants may be present in nonexamined CFTR introns or that other molecular mechanisms are involved that are not covered by the utilized assays and/or bioinformatic algorithms (Chen et al., 2008; Lee et al., 2017).
The lower frequency of the predominant c.1521_1523delCTT, p.Phe508del variant reflects its European North-to-Southeast gradient, whereas the marked allelic heterogeneity is in line with previous reports demonstrating its higher rates in S. European populations (Bobadilla et al., 2002) and with the high sensitivity/specificity of the applied CFTR genotyping approach. The c.3909C>G, p.Asn1303Lys variant, which is the second most frequent one in BG, is commonly found in the adjacent Greek and TK CF populations (Bobadilla et al., 2002; Orenti et al., 2018). The third most prevalent variant, c.1624G>T, p.Gly542*, is typical for populations around the Mediterranean and is rather frequent in neighboring Greece (Kanavakis et al., 2003) and N. Macedonia (Orenti et al., 2018). The fourth most common variant, c.2052_2053insA, p.Gln685Thrfs*4, is rather common in W. Ukraine (Makukh et al., 2010) and in E. Hungary (Ivády et al., 2014), but is underrepresented in neighboring CF populations (Bobadilla et al., 2002; Kanavakis et al., 2003; Radivojevic et al., 2004).

In summary, our data provide a strong basis for the improvement of DNA diagnostics of CF, foster the provision of reproductive choice in preconception, preimplantation, and/or prenatal DNA testing, facilitate the introduction of multitier CFNBS, and will eventually provide patient stratification for the implementation of CFTR modulator therapy (Mitchell, Jones, & Barry, 2018).

FIGURE 1 Regional origin of examined Bulgarian-, Bulgarian Turk-, and Bulgarian Roma CF patients. Legend: Regional CF patient distribution (BG •, BGTK + and BGRM □) is based on postal codes of their domicile. Respective population density in BG according to Eurostat data (ec.europa.eu/eurostat and www.nsi.bg/sites/default/files/files/data/table/BG_grid_POP_1K_2011_poster_0.pdf; Accessed January 12, 2019).

TABLE 2 … (Orenti et al., 2018; Makukh et al., 2010; Radivojevic et al., 2004) with this study. The three novel variants are underlined.
a : complex CFTR allele.

ACKNOWLEDGMENTS