Genomic disparity impacts variant classification of cancer susceptibility genes in Turkish breast cancer patients

Abstract Objective The Turkish genome is underrepresented in large genomic databases. This study aims to evaluate the effect of allele frequency in the Turkish population in determining the clinical utility of germline findings in breast cancer, including invasive lobular carcinoma (ILC), mixed invasive ductal and lobular carcinoma (IDC‐L), and ductal carcinoma (DC). Methods Two clinic‐based cohorts from the Umraniye Research and Training Hospital (URTH) were used in this study: a cohort consisting of 132 women with breast cancer and a non‐cancer cohort consisting of 492 participants. The evaluation of the germline landscape was performed by analysis of 27 cancer genes. The frequency and type of variants in the breast cancer cohort were compared to those in the non‐cancer cohort to investigate the effect of population genetics. The variant allele frequencies in Turkish Variome and gnomAD were statistically evaluated. Results The genetic analysis identified 121 variants in the breast cancer cohort (actionable = 32, VUS = 89) and 223 variants in the non‐cancer cohort (actionable = 25, VUS = 188). The occurrence of 21 variants in both cohorts suggested a possible population genetic effect. Evaluation of the allele frequency of 121 variants from the breast cancer cohort showed that 22% had a significantly higher value in Turkish Variome compared to gnomAD (p < 0.0001, 95% CI), with a mean 60‐fold difference (range 1.37–354.4). After adjusting for variant allele frequency using the ancestry‐appropriate database, 6.7% (5/75) of VUS were reclassified to likely benign. Conclusion To our knowledge, this is the first study of population genetic effects in breast cancer subtypes in Turkish women. Our findings underscore the need for a large genomic database representing Turkish population‐specific variants. They further highlight the significance of the ancestry‐appropriate population database for accurate variant assessment in clinical settings.


| INTRODUCTION
Breast carcinoma is the leading cause of cancer mortality in women in Turkey, largely due to a lack of breast cancer awareness and delayed diagnosis. [1] The most common histological type of breast cancer is ductal carcinoma (DC), [2] followed by lobular carcinoma (LC). [3] Women diagnosed with LC are more likely to have bilateral breast carcinoma and a first-degree relative affected with breast carcinoma. [4] Although there are well-known breast cancer predisposition genes, most studies have been performed primarily on ductal breast carcinoma. [5] The genetic susceptibility to lobular breast carcinoma is relatively unknown. [6] CDH1 is the only well-known gene associated with increased invasive lobular carcinoma (ILC) risk. [7,8] Recently, CHEK2 and BRCA2 have been reported in association with ILC. [9,10] While studies have evaluated the genetic etiology of constitutional breast cancer in the Turkish population, [11,12] the effect of population genetics on the clinical evaluation of breast cancer in Turkish individuals remains unknown.
The Turkish population is genetically unique due to a high rate of consanguineous marriage and admixture, [13][14][15] which in turn can greatly affect the frequency of alleles in this population. The accurate assessment of allele frequency is critical in variant classification and the determination of pathogenicity. Despite its value, the Turkish genome is underrepresented in publicly available databases. gnomAD, currently the largest genome database (v2 n = 125,748 exomes and 15,708 genomes, v3.1 n = 76,156 genomes), does not contain sub-population data from Turkish individuals. [16] In the Greater Middle East (GME) Variome, only 12% (140/1111) of samples sequenced for the database are from the Turkish population. [17] Turkish Variome is the only genome database of the Turkish population, and it contains only 3362 combined exomes and genomes. [14]
The effect of genomic inequity in population databases on patients' clinical management is well documented. Genomic inequity refers to an unequal opportunity for genetic testing or database generation in many geographical locations, due to economic and other limiting reasons, which in turn leads to the underrepresentation of many ethnic populations in large public genome databases. Recent studies highlighted the role of inadequate representation of population-specific variants in current databases in the misclassification of variants in disease. [18,19] The underrepresentation of non-European variants in databases has also been shown to result in a high percentage of variants of uncertain significance (VUS). [20,21] These limitations can further perpetuate healthcare disparities by leading to inaccurate diagnoses.
This study aims to provide insight into the genetic background of invasive ductal carcinoma (IDC) and ILC patients from the Turkish population, focusing on germline findings and clinicopathological features. We hypothesized that some features of germline variants in our patient populations might be a function of genomic structure specific to the Turkish population. We addressed the effect of population-specific allele frequency in the clinical assessment of germline sequence variants by performing a comparative evaluation of genetic variations in breast cancer and a non-cancer control group from Turkey. The study highlights the importance of population-specific databases to promote the unbiased application of genomic medicine.

KEYWORDS
breast cancer, cancer genetics, molecular genetics, next-generation sequencing

| METHODS
To test this hypothesis of the population effect on germline variations, we compared the germline findings of cancer and non-cancer cohorts and compared the population allele frequency of the variants obtained from gnomAD with those obtained from Turkish Variome. The specific methods are described below.

| Breast cancer cohort
The breast cancer cohort included 43 patients with ILC, 69 with IDC, 3 with ductal carcinoma in situ (DCIS), and 17 with mixed invasive ductal and lobular carcinoma (IDC-L). IDC-L is poorly studied; as per the WHO classification, a specialized histologic pattern occupies 50%-90% of the tumor area and a non-specialized pattern occupies 10%-49%. [22] Data were collected from patients evaluated at Umraniye Research and Training Hospital (URTH) in 2020 and 2021. The age range of the breast cancer cohort was 24-80 years. Before genetic testing was ordered, patients' medical and family histories were evaluated by clinical geneticists according to the current National Comprehensive Cancer Network (NCCN) guidelines. [23] Patients with breast cancer who met the NCCN criteria were enrolled in the study. Patients without breast cancer and those who did not consent to genetic testing as part of routine clinical practice were excluded from this study. Cascade testing for the families was offered and performed whenever possible, as part of routine clinical management. Informed consent was obtained from the patients as part of routine clinical care.

| Non-cancer cohort
The non-cancer control cohort comprised a total of 492 participants. The participants were recruited as part of a different study at URTH between 2016 and 2017. The inclusion criterion was the absence of personal and family history of cancer, as assessed by the clinical geneticist and research associate before genetic testing. None of these individuals had a personal or family history of cancer, but some presented with hypertension and diabetes mellitus. The probands in the non-cancer control cohort were unrelated. The age range of individuals in the non-cancer cohort was between 62 and 104 years. Participants without genetic panel test results were excluded from this study. Cascade testing and routine clinical management were offered to participants in the non-cancer cohort with any unexpected pathogenic (P)/likely pathogenic (LP) findings. Participants consented to a broad research study per institutional ethical policy at URTH.

Turkish participants
A genetic variant database for all 624 participants (non-cancer n = 492, breast cancer n = 132) was generated. For the cancer cohort, personal cancer history, demographic information, three-generation family cancer history, and histopathological features of the carcinoma types were included. For all participants, details of germline variants and supporting evidence were included in the database. ACMG classification [26] was performed for all variants, and designations of P, LP, and VUS were included. The gnomAD database (v2.1.1 n = 125,748 exomes and 15,708 genomes) was used to investigate allele frequency in large population databases. [16] Turkish Variome (n = 2589 exomes and n = 773 genomes) was used to investigate allele frequency more specifically in the Turkish population. [14]

| Query of database
The genes identified in the cancer cohort were compared with the non-cancer cohort to evaluate the population genetics effect. In addition, the occurrence of each variant from the cancer cohort was cross-checked with the non-cancer cohort. The denoted variant allele frequencies of the VUS in Turkish Variome and gnomAD were analyzed. These variants were reevaluated for variant classification with consideration of their allele frequencies in these databases.
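The cross-check between cohorts can be illustrated with a minimal sketch. It assumes each variant is identified by its gene symbol and coding-level HGVS name; the `Variant` type, the `shared_variants` helper, and the placeholder variants are hypothetical and are not taken from the study database.

```python
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    """One reported germline variant (hypothetical record layout)."""
    gene: str
    hgvs_c: str
    classification: str  # "P", "LP", or "VUS"


def shared_variants(cancer: List[Variant],
                    non_cancer: List[Variant]) -> List[Tuple[str, str]]:
    """Return the sorted (gene, hgvs_c) keys that occur in both cohorts."""
    cancer_keys = {(v.gene, v.hgvs_c) for v in cancer}
    non_cancer_keys = {(v.gene, v.hgvs_c) for v in non_cancer}
    return sorted(cancer_keys & non_cancer_keys)


# Placeholder variants for illustration only; these are not study data.
cancer_cohort = [
    Variant("GENE_A", "c.100A>G", "VUS"),
    Variant("GENE_B", "c.200C>T", "LP"),
    Variant("GENE_A", "c.100A>G", "VUS"),  # same variant seen in a second patient
]
non_cancer_cohort = [
    Variant("GENE_A", "c.100A>G", "VUS"),
    Variant("GENE_C", "c.300G>A", "VUS"),
]

overlap = shared_variants(cancer_cohort, non_cancer_cohort)
print(overlap)
```

Keying on gene plus HGVS name is one possible choice; a genomic coordinate key (chromosome, position, reference allele, alternate allele) would serve equally well if transcript versions differ between test dates.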

| Statistical analysis
The allele frequency of all 121 variants from the breast cancer groups was obtained from Turkish Variome and gnomAD. The differences in allele frequencies in the two groups were measured using an unpaired t-test in Prism (GraphPad, version 9.5). The two-tailed p value was calculated with a 95% confidence interval.
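For readers who want to see the arithmetic behind an unpaired t-test, the following is a minimal sketch using only the Python standard library. It assumes the pooled-variance (Student) form; the text does not state whether a Welch correction was applied in Prism. The function name `unpaired_t` and the allele frequency values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def unpaired_t(sample_a: Sequence[float],
               sample_b: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for a pooled-variance unpaired t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    t_stat = (mean(sample_a) - mean(sample_b)) / std_err
    return t_stat, df


# Hypothetical allele frequencies for illustration only; not study data.
turkish_variome_af = [0.0040, 0.0120, 0.0009, 0.0200]
gnomad_af = [0.0001, 0.0003, 0.00002, 0.0004]

t_stat, df = unpaired_t(turkish_variome_af, gnomad_af)
print(f"t = {t_stat:.3f}, df = {df}")
```

The two-tailed p value then follows from the t distribution with `df` degrees of freedom. The standard library has no t-distribution function, so in practice a routine such as SciPy's `scipy.stats.ttest_ind` would be used to obtain both the statistic and the p value.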

| Patient demographics and clinicopathologic characteristics
A total of 60 patients were histopathologically diagnosed with ILC (n = 43) and IDC-L (n = 17). All patients were female, with a median age of 44 (range 24-80 years) (Table 1). In four patients, multiple primary carcinomas were detected in addition to lobular breast carcinoma. Nine patients presented with bilateral breast carcinoma. The ER and PR positivity rates were 83.3% and 81.6%, respectively, whereas 16.6% of the ILC and IDC-L cases were HER2 positive (Table 1). Of 60 patients, 36 had available E-cadherin test results, of which 34 were negative. Family history assessment showed that 39 patients (65%) had a positive family history of cancer, including breast cancer, in first-degree relatives. In 38 patients (63.3%) the family history extended to second-degree relatives, and in 11 patients (18.3%) to third-degree relatives (Tables S1 and S2).
Patients with ductal breast carcinoma (IDC n = 69, DCIS n = 3) had a median age of 45 (26-70 years); 20 of these patients were younger than 40 years old (Table 1). The ER, PR, and HER-2 positivity rates were 73.6%, 70.8%, and 32%, respectively (Table 1). The evaluation of family history showed that 57 (79.2%) patients had a positive family history of cancer, including breast cancer, in first-degree relatives. In 64 (88.8%) patients, the positive family history extended to second-degree family members, and in 28 patients (38.9%) to third-degree relatives (Table S3).
The non-cancer group included 492 participants, 308 of whom were female and 184 male. The average age of participants was 75, ranging from 65 to 104 years old. All participants were negative for any personal and family history of cancer (Table S4).

| Genomic analysis of breast cancer cohort
The genetic assessment of 43 females with ILC showed that four patients (9%) had actionable variants in CDH1, BRCA2, MUTYH, RAD50, and CHEK2 (Table S1). There were no recurring variants among these patients, except for CHEK2 c.1427C>T, a known common risk factor. There were 13 VUS reported in this group, most commonly in BRCA2, followed by APC, CHEK2, and PALB2 (Figure 1). In our IDC-L cohort, two patients harbored P/LP variants in MSH2 and PALB2. Nine VUS were identified in the IDC-L cohort (Table S2). The most common gene harboring VUS was ATM, followed by BRCA2 (Figure 1).
Actionable results among the ILC, IDC-L, and IDC cohorts were observed in the BRCA2, PALB2, MUTYH, RAD50, CDH1, and MSH2 genes. A more diverse set of affected genes was observed in the IDC cohort (Table S3). Notably, BRCA1 was only affected in the IDC group. At the variant level, there were no common VUS among these breast cancer groups (Table S5), except for CHEK2 c.1427C>T (classified as a risk factor) and CHEK2 c.549G>C (VUS). In total, 75 non-recurrent VUS were found in the breast cancer cohort (Table S5). The distribution of genes with VUS designation was similar in the three breast cancer groups (Figure 1). The occurrence of VUS in these three breast cancer subtypes might be a function of population genetic structure. The distribution and frequency of the VUS from the breast cancer group were therefore evaluated against those from the non-cancer cohort collected from the same center, presumably all representing the Turkish population at large.

| The analysis of population genetics effects
We hypothesized that the occurrence of the germline variants in our patients might be a function of genomic structure specific to the Turkish population. To test this hypothesis, we compared the germline findings identified in the cancer and non-cancer cohorts. At the variant level, 21 variants were common between the breast cancer and non-cancer cohorts (Table S5).
To assess the effect of population genetic structure on a larger scale, the allele frequency of VUS was assessed using both gnomAD and Turkish Variome. gnomAD is the largest population genome database, but it does not contain Turkish subpopulation data. Turkish Variome, while much smaller, is a genome database generated from Turkish genomic non-cancer data. In the ILC patient group, the aggregated allele frequency of VUS in the Turkish Variome was 19 times larger than the value in gnomAD; in IDC-L and IDC patients, this difference was 1.36 times and 6 times, respectively (Figure 2). Altogether, of the 75 VUS in the breast cancer groups, 25 variants (33%) had a higher allele frequency in the Turkish Variome than in gnomAD (Figure 3). A total of 22 VUS had an allele frequency reported in both gnomAD and Turkish Variome; for all of these variants, the allele frequency was greater in Turkish Variome than in gnomAD (on average 60 times, ranging from 1.37 to 354.4 times) (Table S5). A total of 5/25 P/LP variants were reported in both gnomAD and Turkish Variome; all had a greater frequency in Turkish Variome than in gnomAD (on average 49 times, ranging from 2.2 to 161.74 times). Overall, the allele frequency of 27 variants (VUS n = 22, P/LP n = 5) was significantly higher in Turkish Variome than in gnomAD (p < 0.0001, 95% CI, Figure 4).
We further set out to investigate the effect of the population allele database on variant classification. We conducted ACMG classifications using the allele frequencies in both Turkish Variome and gnomAD. PM2, the criterion applied when a variant is absent from or extremely rare in population databases, was invoked for 74 variants based on values in gnomAD (Table S5). Using allele frequency in the Turkish Variome, however, only 59 variants met the PM2 criterion and 2 met the BS1 criterion, due to a more common occurrence of variants in the Turkish Variome. Overall, this affected the classification of 6.7% (5/75) of variants, as they were downgraded from VUS to likely benign (Table S5). The variants were in the EPCAM, RAD50, STK11, and MRE11 genes. The allelic frequencies of the actionable variants were below the PM2 threshold in both Turkish Variome and gnomAD (Table S6).
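The frequency-based step of this comparison can be sketched as follows. This is an illustration only: the two cut-offs are hypothetical placeholders (the thresholds used in the study are not stated in the text), the helper names are invented, and a real ACMG classification combines these frequency criteria with many other lines of evidence.

```python
from typing import Optional

# Hypothetical cut-offs for illustration; laboratories set these per gene and disorder.
PM2_MAX_AF = 0.0001  # at or below this (or absent): variant counts as extremely rare
BS1_MIN_AF = 0.01    # at or above this: more common than expected for the disorder


def fold_difference(af_population: Optional[float],
                    af_reference: Optional[float]) -> Optional[float]:
    """Ratio of allele frequencies, or None when either value is missing or the reference is 0."""
    if af_population is None or af_reference is None or af_reference == 0:
        return None
    return af_population / af_reference


def frequency_criterion(allele_frequency: Optional[float]) -> str:
    """Return the frequency-based ACMG criterion suggested by one database."""
    if allele_frequency is None or allele_frequency <= PM2_MAX_AF:
        return "PM2"   # absent or extremely rare
    if allele_frequency >= BS1_MIN_AF:
        return "BS1"   # too common for the disorder
    return "none"      # neither frequency criterion applies


# Hypothetical variant: rare in the large reference database, common in the
# ancestry-matched database.
af_reference_db = 0.00005
af_ancestry_db = 0.012

print(frequency_criterion(af_reference_db))   # criterion from the reference database
print(frequency_criterion(af_ancestry_db))    # criterion from the ancestry-matched database
print(fold_difference(af_ancestry_db, af_reference_db))
```

In this sketch the same variant receives pathogenic-supporting evidence from one database and benign-supporting evidence from the other, which is the mechanism by which an ancestry-appropriate database can shift a VUS toward likely benign.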

| DISCUSSION
We addressed two goals in this study. First, we aimed to assess the population-specific germline etiology of lobular and ductal breast carcinoma in the Turkish population. To that end, we performed a comprehensive investigation of patients with lobular and ductal breast carcinomas from the Turkish population by assessing germline genetic, clinical, and histopathological features. Second, we investigated the effect of population genetics in variant classification by assessing germline variants detected in 132 patients with all subtypes of breast cancer and those detected in 492 individuals in the non-cancer group.
Histopathological features of our breast cancer cohort from the Turkish population were similar to those reported previously in non-Turkish individuals. ILC and IDC are generally ER-positive, PR-positive, and HER-2-negative. [27] Although the current studies on IDC-L characteristics are scarce and consist of small cohort sizes, the immunohistochemical profile of IDC-L is found to be similar to ILC. [28] The lack of E-cadherin expression is the hallmark of ILC and IDC-L. [29] In concordance with these studies, the majority of our patients with ILC, IDC-L, or IDC were ER-positive, PR-positive, and HER-2-negative. As expected, E-cadherin was negative in 34 out of 36 patients with available E-cadherin test results in our ILC and IDC-L cohorts. Almost all studies showed that ER and PR positivity is more likely in ILC and IDC-L than in IDC. [27] This was in line with the higher ER and PR positive rates among ILC/IDC-L patients than IDC/DCIS patients in our cohort. The bilaterality of breast cancer was 15% and 4% in the ILC/IDC-L and DCIS/IDC cohorts, respectively, and these results are consistent with previous studies defining lobular breast carcinoma as a possible risk factor for the onset of bilateral breast cancer. [30]
Around 10%-15% of breast cancers are familial, and this rate is shown to be higher in ILCs. [4] In our Turkish breast cancer cohort, 27% of our patients with ILC or IDC-L had a first-degree relative affected by breast cancer. Germline variants of CDH1 are described in ILC. [31,32] However, we only detected one patient with a germline pathogenic CDH1 variant, which might stem from the small patient number in our ILC cohort. Previous studies have reported CDH1 and BRCA2 (and not BRCA1 and CHEK2 c.470T>C) to be strongly associated with ILC. [10,33] Despite the small sample size, we observed similar findings; a BRCA2 pathogenic variant, but not BRCA1 or CHEK2 c.470T>C variants, was present in our ILC cohort. The BRCA1, BRCA2, TP53, CHEK2, ATM, and PALB2 genes are well-known for predisposition to IDC. [5] In our IDC/DCIS cohort, in addition to these genes, we identified P and LP variants in MSH6 and MLH1. The risk association between these genes and breast cancer is not well established. [34]
The population allele frequency is a main criterion for assessing the pathogenicity of variants per the ACMG guideline. In the currently available genome databases, 59%-94% of sequence data are obtained from European individuals. [16,35,36] In one study of the Brazilian population, 207,621 variants identified in the cohort were not reported in major publicly available genomic databases. [37] Therefore, these databases may not adequately represent the genomic diversity of underrepresented populations, such as the Turkish population. The genomic diversity of the Turkish population is well-documented. Every subregion of Turkey has a diverse admixture, mainly consisting of four population groups: Europe, Balkan, Caucasus, and GME. [14] Inbreeding also plays a crucial role in the genetics of the Middle Eastern populations. [13] Turkey's average consanguineous marriage rate is 21.1%. [38] This high rate of consanguinity may increase the occurrence of excessively rare variants, [14] as well as the occurrence of common pathogenic or polymorphic variants in Turkish individuals compared to other populations. In our non-cancer control cohort, 6% of the variants were present in the breast cancer cohort. This observation might be due to the population-specific structure. However, the common occurrence of these variants could also be explained by reduced or incomplete penetrance of alleles. Further large-scale studies are required to delineate the underlying effect of population genetics in cancer cohorts.
Currently, Turkish Variome is the only publicly available database generated with the genome data obtained from Turkish individuals. [14] It includes exome and genome data from non-cancer patients with amyotrophic lateral sclerosis, ataxia, delayed sleep phase disorder, essential tremor, obesity, Parkinson's disease, polycystic ovarian syndrome, and neurological and immunological disorders. [14] Despite the smaller size of the Turkish Variome compared to gnomAD, in our cancer cohort, 27/121 variants had a significantly higher allele frequency in the Turkish Variome compared to gnomAD (p < 0.0001, 95% CI), with ratios ranging from 1.37 to 354 times. The allele frequency in the Turkish population could potentially be higher when assessed in a larger genome database generated with sequencing data from healthy Turkish individuals. This highlights the critical role of population genetic structure in genetic medicine. It also emphasizes the need for a comprehensive genomic variant database specific to the Turkish population for the clinical evaluation of patients with Turkish backgrounds.
The accurate assessment of the clinical utility of variants is crucial in patient management. In our cancer cohort, 6.7% (5/75) of VUS were reclassified to likely benign by using Turkish Variome. Although downgrading VUS to likely benign (LB) might not often affect patient management, this reclassification holds clinical importance. [39] [41][42] Patients with reported VUS experience increased anxiety and even unnecessary procedures. [43,44] [46][47] As genome sequence databases grow in volume, VUS can be more accurately assessed and perhaps reclassified. In a large study on the prevalence of variant reclassification, 91.2% of VUS assessed between 2006 and 2018 were downgraded to likely benign or benign. [48] The reclassifications are feasible due to the accumulation of larger genome databases and additional supporting evidence. [51][52][53] Therefore, large prospective genetic studies are needed to assess the larger reclassification of variants in Turkish and other populations. Furthermore, the development of large gnomAD-like databases representing the Turkish population and making such knowledge bases publicly available is a crucial next step in addressing the current limitations. Future directions should include the assembly of aggregate allele data from healthy individuals as well as different disease groups, such as cancer and neurology. This disease- and population-specific collation of databases would allow the use of accurate allele frequency values in variant classification and, therefore, more informed clinical management of patients with specific genetic conditions. Data sharing is equally crucial in addressing genomic disparity. Making population-specific genome aggregate databases publicly and freely accessible will allow clinicians and researchers to utilize the variant data.
While this study underscores the importance of ancestry-appropriate databases in genomic medicine, several limitations should be acknowledged. Individuals without a cancer diagnosis but with a strong family cancer history could not be included in the study. The patient cohort in the cancer genetics center at URTH largely represents those who are referred to this center for cancer evaluation. Therefore, most patients have a personal history of cancer, with or without a family history. Another limitation may stem from the timeline when the personal and family cancer history of the non-cancer cohort was collected. Given that data were collected in 2016-2017, some of this information might have changed over time.
Additionally, the time difference in testing for the non-cancer and cancer cohorts was about 4-5 years. Although the number of genes in the testing panel remained the same, the informatics and annotation tools were updated to newer versions over time. To account for possible changes in annotations, all germline variants in this study were assessed for population allele frequency at the time of this study.
In conclusion, we comprehensively investigated the genetic susceptibility of breast cancer subtypes. To our knowledge, no other studies have investigated these subtypes of breast cancer in the Turkish population. This study highlights the role of a population-specific genome database in the genetic assessment of breast cancer in the Turkish population. The statistically significant difference in variant frequencies between gnomAD and Turkish Variome indicates the importance of creating and using ancestry-appropriate genomic databases to decrease inequity in genomic medicine. Further, it shows the potential effect of genomic databases on clinical management by decreasing the reported VUS ratio. More globally, it presents the need for creating larger genomic databases from genetically underrepresented populations for patient care in addressing healthcare disparities in genomic medicine.

ACKNOWLEDGMENTS
We would like to thank our patients for their participation in our study. This project has been financially supported by "The Istanbul Development Agency (ISTKA)" under project number YNY2016/144.

FIGURE 1 Distribution of cancer genes harboring reported variants in patients diagnosed with (A) ILC, (B) IDC-L, and (C) IDC, and (D) in the non-cancer cohort. Orange represents genes commonly affected in all four groups. Gray represents shared genes between (A), (C), and (D). Yellow represents shared genes between (B), (C), and (D). Green and blue represent common genes between (A) and (D) and between (C) and (D), respectively. Red represents a unique gene in the non-cancer control cohort.

FIGURE 2 The range of population allele frequencies of VUS in cancer genes identified in three breast cancer subgroups. The ratio of variant allele frequency in Turkish Variome over gnomAD was 19 in invasive lobular carcinoma (ILC), 1.36 in mixed invasive ductal and lobular carcinoma (IDC-L), and 6 in invasive ductal carcinoma (IDC). The green bar represents variant allele frequency in the Turkish Variome. The dotted bar represents variant allele frequency in gnomAD.

FIGURE 3 List of variants of uncertain significance (VUS) in three breast cancer cohorts and associated allele frequencies. The allele frequency obtained from Turkish Variome is shown in gray, and that from gnomAD is shown in blue.

FIGURE 4 Comparative analysis of variant allele frequency values in gnomAD and Turkish Variome. The differences in allele frequencies of 121 variants in the two groups were measured using an unpaired t-test in Prism (GraphPad, version 9.5). The two-tailed p value was calculated with a 95% confidence interval.

TABLE 1 Demographic and tumor characteristics of patients with breast carcinoma.
Column groups: invasive lobular carcinoma/mixed invasive ductal and lobular carcinoma of the breast; ductal breast carcinoma.
Abbreviations: ER, estrogen receptor; HER2, human epidermal growth factor receptor 2; IDC-L, mixed invasive ductal and lobular carcinoma of the breast; ILC, invasive lobular carcinoma; PR, progesterone receptor.