Estimation of carrier frequencies utilizing the gnomAD database for ACMG recommended carrier screening and Finnish disease heritage conditions in non‐Finnish European, Finnish, and Ashkenazi Jewish populations

The American College of Medical Genetics and Genomics (ACMG) recommends offering Tier 3 carrier screening to pregnant patients and those planning a pregnancy for conditions with a carrier frequency of ≥1/200 (96 genes for autosomal recessive [AR] conditions). Certain AR conditions, referred to as the Finnish disease heritage (FINDIS), have a higher prevalence in Finland than elsewhere. Data from gnomAD v2.1 were extracted to assess carrier frequencies for ACMG‐recommended AR and FINDIS AR and X‐linked genes in Finnish, non‐Finnish European, and Ashkenazi Jewish populations. The following variants were considered: ClinVar pathogenic or likely pathogenic, loss‐of‐function, and Finnish founder variants. Gene carrier (GCR), cumulative carrier (CCR), and at‐risk couple rates (ACR) were estimated. In the Finnish population, 47 genes had a GCR of ≥0.5%. CCRs were 52.7% (Finnish), 48.9% (non‐Finnish European), and 58.3% (Ashkenazi Jewish), whereas ACRs were 1.4%, 0.93%, and 2.3%, respectively. Approximately 141 affected children with analyzed AR conditions are estimated to be born in Finland annually. Eighteen genes causing FINDIS conditions had a GCR of ≥0.5% in the Finnish population but were absent from the ACMG Tier 3 gene list. Two genes (RECQL4 and RMRP) had a GCR of ≥0.5% in either the non‐Finnish European or the Ashkenazi Jewish population. The results highlight the need for careful curation of carrier screening panels.

Historically, carrier frequencies in the Finnish population have been estimated from incidence data or from the relatively small numbers of Finnish individuals available at the time of gene and founder variant identification (Pastinen et al., 2001; Peltonen et al., 1999).
Current large projects such as the Genome Aggregation Database (gnomAD) include a large number of Finnish individuals: gnomAD v2.1 has exome or genome data for 12,562 Finnish individuals, and gnomAD v3 has genome data for 5255 Finnish individuals (Karczewski et al., 2020). gnomAD v2.1 consists mainly of exome data, with a smaller proportion of individuals having genome data, but has a larger population sample size, whereas gnomAD v3.1 is based on genome data but has a smaller total sample size than v2.1.
Identifying at-risk couples, who are at risk of having a child with a severe disease, would allow for genetic counseling, family planning, and preparing for investigations of the newborn after birth if needed for early treatment. Increased reproductive choices via family planning could include prenatal diagnosis, preimplantation genetic testing, utilizing a gamete donor, adoption, or deciding not to have children (Gregg et al., 2021). There are data suggesting that offering carrier screening could also reduce financial costs in health services (Schofield et al., 2023). Many countries have adopted carrier screening for specific populations, such as the Ashkenazi Jewish population, in which certain severe recessive conditions were relatively frequent. With carrier screening and prenatal/preimplantation genetic testing, a decrease in the prevalence of the tested severe disorders has been noted (Singer & Sagi-Dain, 2020).
Recently, Guo and Gregg (2019) utilized the gnomAD exome data to estimate carrier rates using variants classified as pathogenic or likely pathogenic in the ClinVar database (excluding variants with conflicting interpretations) across six major ancestries for 415 genes associated with severe recessive conditions. However, they excluded the Finnish subpopulation from their initial analysis. They estimated that from 32.6% (East Asian) to 62.9% (Ashkenazi Jewish) of individuals are variant carriers in at least one of the 415 genes. A recent article also evaluated the Finnish population using automated ClinVar variant filtering for a pre-selected gene list (Schmitz et al., 2022). However, manual curation is required for establishing accurate carrier frequencies, because some Finnish founder variants do not have ClinVar (Landrum et al., 2018) classifications (TRIM37 founder variant), have conflicting classifications in ClinVar (MKS1 founder variant, NM_017777.4(MKS1):c.1408-34_1408-6del, ClinVar variation ID: 188400, conflicting interpretation, 7 pathogenic, 1 variant of uncertain significance [VUS]), are enriched in Finnish populations and rare elsewhere (GLE1, HYLS1), or are located in an untranslated region (UTR) (SLC26A2 founder variant NM_000112.4(SLC26A2):c.-26+2T>C).

The American College of Medical Genetics and Genomics (ACMG) recently published a practice resource for carrier screening (Gregg et al., 2021). They propose that screening for conditions with a severe or moderate phenotype and with a combined carrier frequency ≥1/200 (0.005) (Tier 3) should be offered to all pregnant patients and those planning a pregnancy. The ACMG group recommended 96 AR genes for carrier screening in Tier 3 in the US population, and for the X-linked conditions, 18 genes were considered appropriate for carrier screening. Some of the genes (SMN1, FMR1, and HBA1/HBA2) have variants that require either a tailored bioinformatics pipeline (SMN1 deletion) or complementary methods (repeat expansion variant in FMR1, common deletions in HBA1/HBA2).
Australia is piloting carrier screening for couples in the Mackenzie's Mission research project with an even broader gene list covering childhood-onset life-limiting or disabling conditions and conditions where early diagnosis and intervention would substantially change outcomes. The final gene list for Mackenzie's Mission currently includes 1300 genes (Kirk et al., 2021).

| Genes
gnomAD v2.1.1 data were extracted for the genes causing AR and X-linked conditions belonging to the Finnish disease heritage (Kestilä et al., 2010) and for the autosomal genes suggested for carrier screening by ACMG (Gregg et al., 2021). The ACMG-suggested X-linked conditions were excluded from the analysis, as the estimates would likely be significant underestimations owing to the limitations of gnomAD data in detecting repeat expansions, single or multiexon deletions/duplications, and structural variants, and to the likelihood of de novo variants in X-linked conditions. The known Finnish founder variants for the Finnish disease heritage conditions, often referred to as Fin-major and Fin-minor, were manually identified from gnomAD v2.1.1 data (Table 1). SLC26A2 c.-26+2T>C is in the UTR, and data for this variant were retrieved from gnomAD v3.1.2. The gnomAD v2.1 dataset contains data from 125,748 exomes and 15,708 whole genomes, all mapped to the GRCh37/hg19 reference sequence (Karczewski et al., 2020). This v2.1 dataset includes exome data for 10,824 Finnish individuals and genome data for 1738 Finnish individuals, totaling 12,562 Finnish individuals. The previous carrier frequency estimation for each variant was obtained from a review by Kestilä et al. (2010).
The final gene list consisted of 123 genes causing AR conditions (Table S1) and two genes causing X-linked conditions (Table S2). Out of the total 125 genes, 34 belonged to the Finnish disease heritage.
Notably, some well-known Finnish founder variants either do not have classifications in ClinVar (June 2022 data) (loss-of-function [LOF] variant in TRIM37) or have conflicting interpretations (MKS1).
TABLE 1 Carrier frequencies for the known Finnish founder variants reported in Finnish disease heritage conditions in gnomAD database.

| Variants
The variants were filtered according to the following principles. All variants reported as pathogenic or likely pathogenic (P/LP) in ClinVar, LOF variants in gnomAD (VEP annotations stop_gain, frameshift_variant, splice_acceptor_variant, splice_donor_variant), as well as Finnish founder variants, were included. LOF variants with lc_lof, lcr, lof_flag, mnv, benign, or likely benign tags were excluded. P/LP variants with lc_lof and mnv tags were excluded. All Finnish founder variants were included, regardless of their tags. LOF and Finnish founder variants curated as a VUS were included. LOF variants with more than 20 heterozygous individuals and with a conflicting interpretation, uncertain significance, or no ClinVar submission were evaluated manually to check the effect in different transcript(s) and to assess relevance based on literature and database findings. This led to the exclusion of 11 LOF variants because of their uncertain effect; in particular, PCDH15 had many LOF variants located in the last exon of the MANE Select transcript that are not coding variants in other transcripts (Table S3).
Furthermore, all variants with 10+ homozygotes were excluded; if homozygosity is common, the variant is probably not pathogenic when severe congenital, infantile, or juvenile conditions are in question. For X-linked conditions, the pathogenicity of all variants with 10+ hemizygotes was reviewed from the literature. We also excluded one variant in an AR gene after reviewing the literature (VPS13B: c.9406-2A>T).
As some variants with conflicting interpretations in ClinVar could still be well-known disease-causing variants or presumed pathogenic variants, we also evaluated the most current ClinVar classifications (October/November 2023) for variants in genes causing AR disorders that had a variant allele frequency >0.003 in the Finnish, non-Finnish European, or Ashkenazi Jewish population in gnomAD v2.1 data. This approach led to the review of 745 variants with an original conflicting interpretation in ClinVar data. For X-linked genes, manual review was performed for three variants with conflicting interpretations in ClinVar. Most variants had conflicting interpretations owing to classifications of uncertain significance, likely benign, and benign, and were not included. After manual review, 16 variants in AR genes that had a majority of ClinVar submissions with a pathogenic/likely pathogenic classification and literature data suggesting a disease-causing role were added to the carrier frequency calculations. At the same time, two variants (presumed LOF variants) with uncertain or conflicting interpretations in ClinVar, as well as one variant previously classified as likely pathogenic but now interpreted as benign, were excluded from the analyses (Table S3).
The final variant list consisted of 9274 variants causing AR conditions (Table S4) and 12 variants causing X-linked conditions (Table S5).
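The automated part of the filtering principles described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Variant` record and its field names are hypothetical (they are not gnomAD column names), the order in which the rules are applied is assumed, and the manual literature review steps are not modeled.

```python
from dataclasses import dataclass

# Consequence labels copied from the text (VEP itself spells the first one "stop_gained").
LOF_CONSEQUENCES = {
    "stop_gain",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}
# Tags that exclude a LOF variant, and the smaller set that excludes a P/LP variant.
LOF_EXCLUDING_TAGS = {"lc_lof", "lcr", "lof_flag", "mnv", "benign", "likely_benign"}
PLP_EXCLUDING_TAGS = {"lc_lof", "mnv"}
MAX_HOMOZYGOTES = 10  # variants with 10 or more homozygotes are dropped


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are illustrative only."""
    consequence: str
    clinvar_plp: bool          # reported pathogenic / likely pathogenic in ClinVar
    is_finnish_founder: bool   # manually identified Finnish founder variant
    tags: frozenset = frozenset()
    homozygotes: int = 0


def keep_variant(v: Variant) -> bool:
    """Return True if the variant passes the automated inclusion rules."""
    # Assumption: the homozygote cut-off is applied before any other rule.
    if v.homozygotes >= MAX_HOMOZYGOTES:
        return False
    # Founder variants are kept regardless of their tags.
    if v.is_finnish_founder:
        return True
    # Assumption: a P/LP classification takes precedence over the LOF rule.
    if v.clinvar_plp:
        return not (v.tags & PLP_EXCLUDING_TAGS)
    if v.consequence in LOF_CONSEQUENCES:
        return not (v.tags & LOF_EXCLUDING_TAGS)
    return False
```

A variant that passes `keep_variant` would still be subject to the manual review steps described above, which rely on literature and database evidence and cannot be reduced to a rule of this kind.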

| Editorial policies and ethical considerations
Ethical approval was not required, as the study used only open-source data. No individuals can be identified from the data.

| Methods
The calculations of this study were implemented in Microsoft Excel for Mac version 16.63.1. Figure 1 was created with Excel and Figure 2 in RStudio version 2022.12.0+353. All formulas used were taken directly or adapted from another article (Guo & Gregg, 2019), except for the estimated number of affected children.
Initially, variant carrier rates (VCR) were estimated for each gene variant, and the results were used to estimate gene carrier rates (GCR). The formula of VCR for AR variants was virtually the same as in the article of Guo and Gregg (2019), but the number of homozygotes was multiplied by two for an accurate number of alleles. The formula for X-linked variants is presented here. The formula for GCR was the same as in the article of Guo and Gregg (2019).
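As a hedged illustration of the two steps just described, the sketch below computes a VCR for an AR variant and combines VCRs into a GCR. The exact formulas are not reproduced in this text, so both functions are assumptions: the VCR is taken as the number of heterozygous individuals (allele count minus twice the homozygote count) divided by the number of individuals (allele number divided by two), and the GCR is taken as the probability of carrying at least one variant in the gene, treating variants as independent. Function and parameter names are illustrative.

```python
from math import prod


def variant_carrier_rate(allele_count: int, allele_number: int, homozygotes: int) -> float:
    """Assumed VCR for an AR variant: heterozygous individuals / all individuals."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    # Each homozygote contributes two alleles, hence the factor of two.
    heterozygotes = allele_count - 2 * homozygotes
    individuals = allele_number / 2
    return heterozygotes / individuals


def gene_carrier_rate(variant_carrier_rates) -> float:
    """Assumed GCR: probability of carrying at least one variant in the gene."""
    return 1.0 - prod(1.0 - vcr for vcr in variant_carrier_rates)
```

For example, a variant with an allele count of 12, one homozygote, and an allele number of 2000 gives 10 heterozygotes among 1000 individuals, that is, a VCR of 1.0%.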
The GCR was utilized to estimate the cumulative carrier rate (CCR) and at-risk couple rate (ACR). Guo and Gregg (2019) presented the formula of CCR for both AR and X-linked genes in their article. The formula of ACR for AR genes is virtually the same as in the article of Guo and Gregg (2019). The formula of ACR for X-linked conditions is [equation missing in source].
The estimated number of affected children (NAC) to be diagnosed or born per year in Finland was calculated from the ACR of the AR conditions and the rounded annual number of births in Finland in 2021, 50,000, as provided by the official statistics (Tilastokeskus, 2021).
The estimated frequency of affected children should be one-fourth of the condition's ACR by the Mendelian inheritance principles:
\[ \mathrm{NAC} = \frac{\mathrm{ACR}}{4} \times \mathrm{BR} \]
where BR is the annual number of births.
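The chain from GCR to CCR, ACR, and NAC can be sketched as follows. The CCR and ACR functions are assumptions in the spirit of Guo and Gregg (2019), not formulas quoted from this text: genes are treated as independent, and a couple is considered at risk for an AR gene when both partners carry a variant in that gene. Only the NAC step (one-fourth of the ACR times the annual number of births) is stated in the text itself. All function names are illustrative.

```python
from math import prod


def cumulative_carrier_rate(gene_carrier_rates) -> float:
    """Assumed CCR: probability of being a carrier for at least one gene."""
    return 1.0 - prod(1.0 - gcr for gcr in gene_carrier_rates)


def at_risk_couple_rate(gene_carrier_rates) -> float:
    """Assumed AR ACR: both partners carry a variant in the same gene, for at least one gene."""
    return 1.0 - prod(1.0 - gcr * gcr for gcr in gene_carrier_rates)


def affected_children_per_year(acr: float, annual_births: float) -> float:
    """NAC as described in the text: one-fourth of the ACR times the annual births."""
    return acr / 4 * annual_births


# With the AR ACR reported for the Finnish population (1.13%) and 50,000 births,
# this gives about 141 affected children per year.
nac_finland = affected_children_per_year(0.0113, 50_000)
```

The squared GCR in `at_risk_couple_rate` assumes random mating within the population; it would underestimate the risk in settings with consanguinity or strong regional clustering of founder variants.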

| RESULTS
According to GCR, three subgroups were formed: GCR of ≥1.0%, GCR of ≥0.5%, and GCR of ≥0.1%. The purpose of the subgroups is to illustrate how different criteria would affect the number of genes screened and the efficiency of screening.
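A minimal sketch of how such threshold-based subgroups could be tabulated is given below. The gene names and GCR values are hypothetical placeholders, not results from this study; only the three cut-offs come from the text.

```python
# Cut-offs from the text, expressed as fractions: 1.0%, 0.5%, and 0.1%.
THRESHOLDS = {"gcr_ge_1.0pct": 0.010, "gcr_ge_0.5pct": 0.005, "gcr_ge_0.1pct": 0.001}


def genes_meeting_threshold(gcr_by_gene: dict, cutoff: float) -> list:
    """Return the genes whose GCR is at or above the cut-off, sorted by name."""
    return sorted(gene for gene, gcr in gcr_by_gene.items() if gcr >= cutoff)


def subgroup_sizes(gcr_by_gene: dict) -> dict:
    """Count the genes in each (nested) subgroup."""
    return {
        label: len(genes_meeting_threshold(gcr_by_gene, cutoff))
        for label, cutoff in THRESHOLDS.items()
    }


# Hypothetical GCR values for illustration only.
example_gcrs = {"GENE_A": 0.021, "GENE_B": 0.007, "GENE_C": 0.0012, "GENE_D": 0.0004}
sizes = subgroup_sizes(example_gcrs)
```

Because the subgroups are nested, every gene counted at the 1.0% cut-off is also counted at the 0.5% and 0.1% cut-offs.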

| Finnish disease heritage conditions
The CCR for the Finnish founder variants for Finnish disease heritage conditions (Table 1) was 28.0%. The Finnish founder variants covered 93.5% of the CCR for Finnish disease heritage conditions in the Finnish population.
Most genes linked to Finnish disease heritage conditions (27 out of 34) had a GCR of ≥0.5% in the Finnish population, and all of these were AR genes. For 26 genes, the carrier frequency of the Finnish founder variants alone was enough to bring the GCR to ≥0.5% in the Finnish population. The Finnish population carried distinctly more known or likely pathogenic variants in genes causing Finnish disease heritage conditions than the non-Finnish European or Ashkenazi Jewish populations.
The CCR considering both AR and X-linked Finnish disease heritage conditions was 30.0% in the Finnish population, 8.8% in the non-Finnish European population, and 5.8% in the Ashkenazi Jewish population (Table 1).
Altogether, 18 genes causing childhood-onset moderate or severe Finnish disease heritage diseases had a GCR of ≥0.5% in the Finnish population but were absent from the ACMG Tier 3 gene list (Table S6). Two of these genes (RECQL4 and RMRP) had a GCR of ≥0.5% in either the non-Finnish European or the Ashkenazi Jewish population. These 18 genes should be included at least in carrier panels designed for the Finnish population.

| Autosomal recessive conditions
When assessing the GCRs per gene, most of the carrier burden was contributed by variants with a pathogenic or likely pathogenic classification in ClinVar (Table S6). Presumed LOF variants with no ClinVar submission, a conflicting interpretation, or a VUS classification generally had a modest effect on the GCRs.
Most of the analyzed genes had a GCR of ≥0.1% in every population studied (Table S6). The AR conditions with the highest carrier frequencies were quite different among the populations, and only five genes had a GCR of ≥1.0% in all studied populations: CFTR, CYP21A2, GJB2, PMM2, and TYR. In some genes, the Finnish and Ashkenazi Jewish populations did not carry any known or likely pathogenic variants, whereas carrier findings for all genes were observed in non-Finnish Europeans (Figure 1 and Table S6).
A GCR of ≥0.5% in the Finnish population was observed in 47 genes (Figure 2 and Table 2). There was variation in carrier frequencies among the studied subpopulations. However, there were also similarities: out of the 47 genes, the non-Finnish European and Ashkenazi Jewish populations had a GCR of ≥0.5% in 22 and 19 genes, respectively.
The genes with a GCR of ≥1.0% formed at least half of the CCR in each population (Figure 1). The CCRs in AR conditions in the Finnish, non-Finnish European, and Ashkenazi Jewish populations were 52.6%, 48.9%, and 58.3%, respectively (Figure 1 and Table S6). The CCRs of Finnish disease heritage AR conditions in each population were 29.8%, 8.8%, and 5.8%, respectively.
The estimated ACRs in AR conditions for the Finnish, non-Finnish European, and Ashkenazi Jewish populations were 1.13%, 0.89%, and 2.27%, respectively. In the Finnish population, approximately 141 children per year would be affected with one of the AR conditions studied (Table S6).

| X-chromosomal conditions
Carrier frequencies for X-linked conditions would be underestimated with the available gnomAD data, both because of the limitations of the data and because of the prevalence of de novo mutations in X-linked conditions. Thus, the genes that ACMG suggested for X-linked conditions (Gregg et al., 2021) were excluded from the analysis. The estimates were calculated for two X-linked Finnish disease heritage genes (CHM and RS1), as their most common pathogenic variants are sequence variants and can thus be analyzed with gnomAD sequence variant data. The CCRs for XX-individuals and the ACRs in the Finnish, non-Finnish European, and Ashkenazi Jewish populations were 0.24%, 0.037%, and 0.00%, respectively (Table 3).

| Genes with non-analyzable most common variants
For CLN3, CSTB, and TYROBP, short-read next-generation sequencing data do not allow detection of the Finnish founder variants, which are either exon-level or larger deletions (CLN3 and TYROBP) or a repeat expansion variant (CSTB). Similarly, proper carrier frequency estimation could not be performed for HBA1, HBA2, SMN1, and FXN because the relevant variants are exon-level deletions, repeat expansions, or inversions within a gene (Table 4).

| DISCUSSION
In total, CCRs were 52.7%, 48.9%, and 58.3% in Finnish, non-Finnish European, and Ashkenazi Jewish populations, respectively. ACRs were 1.4%, 0.93%, and 2.3%, respectively. There were 47, 44, and 45 genes with GCR of ≥0.5% in Finnish, non-Finnish European, and Ashkenazi Jewish populations, respectively. During the reviewing process, more extensive gnomAD v4.0 data were released: if the analyses were executed with v4.0 data, some of the estimates would likely change. However, the scale would probably be approximately the same in the analyzed populations.
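The totals quoted in this paragraph differ slightly from the AR-only values given in the Results (for example, a CCR of 52.7% versus 52.6% and an ACR of 1.4% versus 1.13% in the Finnish population). One plausible reading, which the text does not state explicitly, is that the totals combine the AR and X-linked estimates as independent events. The sketch below checks that reading against the reported figures; the combination rule and all names are assumptions.

```python
def combine_independent(rate_a: float, rate_b: float) -> float:
    """Probability that at least one of two independent events occurs."""
    return 1.0 - (1.0 - rate_a) * (1.0 - rate_b)


# Values reported in the Results: AR conditions, then X-linked (CHM and RS1).
AR_CCR = {"FIN": 0.526, "NFE": 0.489, "ASJ": 0.583}
AR_ACR = {"FIN": 0.0113, "NFE": 0.0089, "ASJ": 0.0227}
XL_RATE = {"FIN": 0.0024, "NFE": 0.00037, "ASJ": 0.0}  # same value reported for CCR and ACR

# Assumed totals across AR and X-linked conditions.
total_ccr = {pop: combine_independent(AR_CCR[pop], XL_RATE[pop]) for pop in AR_CCR}
total_acr = {pop: combine_independent(AR_ACR[pop], XL_RATE[pop]) for pop in AR_ACR}

for pop in AR_CCR:
    print(f"{pop}: CCR {total_ccr[pop]:.1%}, ACR {total_acr[pop]:.2%}")
```

Under this assumption the combined values round to the totals quoted above, which suggests, but does not prove, that the totals include the two X-linked genes.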
The estimated carrier frequencies for each population most likely underestimate the true carrier frequency, as some variant types are not available for analysis in gnomAD sequence variant data, and we did not attempt to classify variants with uncertain significance in ClinVar or without a ClinVar classification. Thus, some true disease-causing variants are not taken into account; conversely, some variants annotated as LOF might not ultimately be disease-causing. Because of the threshold of a maximum of 10 homozygous individuals per variant, some milder/hypomorphic variants are also not included in the analysis. For example, in the Finnish disease heritage, two genes (CLN3 and CSTB) causing AR conditions had a carrier frequency of <0.1%. Because the most common types of variant in CLN3 (small exon-level deletions) and CSTB (repeat expansion) cannot be correctly analyzed from gnomAD sequence variant data, the actual GCR may be higher. In the literature, the carrier rate has been estimated to be 1.4% for CLN3 and 1.5% for CSTB (Kestilä et al., 2010). A similar example is SMN1: previously published results suggest a carrier frequency of 1.98% for Caucasian individuals (Westemeyer et al., 2020). We did not find research studying the carrier frequencies of these genes in the Finnish population with as high a number of participants. One gene (PCDH15) had quite a high number of variants annotated as LOF, and this gene also has multiple transcripts, which makes the interpretation and classification of variants challenging. Thus, the carrier frequency estimates for PCDH15 should be considered with caution.
The results of this study are consistent with earlier similar studies (Fridman et al., 2021; Guo & Gregg, 2019; Schmitz et al., 2022). However, these carrier frequency estimates are likely to vary according to the samples and technologies used and might differ if gnomAD version 4.0 data had been used. In an IVF study of mostly Caucasian individuals, 766 couples were tested, and 2.6% were at increased risk of having an affected child (Capalbo et al., 2021). In a previous study in Finland, one in three individuals in the Finnish population was estimated to be a carrier. However, the number of studied individuals and variants was low: 2151 samples were analyzed, and 31 variants were monitored (Pastinen et al., 2001).
The GCRs estimated by Zhu et al. (2022) were, for some of the genes, quite dissimilar from those in this study. For instance, for GALT, the GCR was estimated to be 12.2%, whereas our estimate was 0.024% in the Finnish population (Table S6). A possible explanation is that the GALT Duarte variant (NM_000155.2(GALT):c.-119_-116delGTCA, ClinVar variation ID: 25111) was most likely included in their study (Zhu et al., 2022), while it was excluded from our analysis.
Most children with Duarte galactosemia do not present developmental differences when compared with controls, even if exposed to dairy products (Carlock et al., 2019). Thus, including this type of hypomorphic variant in carrier screening might not be beneficial. Another example could be MKS1: the Finnish founder variant (NM_017777.4(MKS1):c.1408-34_1408-6del) was excluded for having conflicting classifications in ClinVar, thus causing the estimated GCR to be low (Zhu et al., 2022).
There are some differences between the estimates in this study and previous estimates for Finnish founder variants in Finnish disease heritage genes. Mostly, the magnitudes are the same. However, for example, in LCT, the prior estimate was less than half of what is now proposed (Table 1). The dissimilarity is presumably partially caused by how the previous estimates were formed: they were often based on the number of affected children and the variants they had (Pastinen et al., 2001; Peltonen et al., 1999). The number of studied controls has also been low, usually only a couple of hundred Finnish people (Pastinen et al., 2001). Some founder variants are also enriched in specific geographic regions within Finland, such as northern epilepsy (CLN8) in Kainuu, located in northeastern Finland (Norio, 2003). No such detailed analysis is possible utilizing gnomAD.
TABLE 3 X-linked carrier frequencies for sequence variants.
Countries with known founder effects should study the disorder spectrum and carrier rates with focused effort rather than solely relying on international guidelines. The apparent reason for not including most of the Finnish disease heritage genes in the ACMG guidelines was that Finns were not considered to contribute to the US population in a significant manner. Eighteen genes causing childhood-onset Finnish disease heritage disorders had a GCR of ≥0.5% in the Finnish population and were not present in the ACMG-suggested carrier-screening list. They should be included at least in carrier panels designed for the Finnish population. RECQL4, with a GCR of ≥0.5% in both the non-Finnish European and Ashkenazi Jewish populations, and RMRP, with a GCR of ≥0.5% in the Ashkenazi Jewish population, should be considered for inclusion in pan-ethnic carrier-screening panels.
When considering the effect of carrying recessive pathogenic variants, focusing on the ACR rather than on the CCR is essential. In real life, most, if not all, people carry at least one recessive pathogenic or LOF variant in their genome (Fridman et al., 2021). Even though being a carrier is common, the estimated number of children expected to be born annually with the AR conditions analyzed in this study is small, <150, compared with the annual number of births in Finland (50,000 births in 2021). This estimate does not take into account that not all children are born to parents of Finnish ancestry. However, an ACR of 0.93%-2.3% in the studied populations means that there are many couples at risk of having an affected child. Only those couples who are carriers of either the same variant or different variants in the same gene are at 25% risk of having a child affected with an AR condition. Considering this, in Tier 3 carrier screening it would be more practical to report carrier status only when both parents are carriers of variants in the same gene. The same has been suggested in earlier studies (Birnie et al., 2021).
Currently, in Finland, follow-up examinations during pregnancy are offered if the risk of a fetus having a chromosomal abnormality is 1:250 or higher in combined prenatal screening, as stated by the government decree on screenings (Valtioneuvoston asetus seulonnoista, 2011). According to this study, the ACR in Finland is around 1:73, raising the question of whether Tier 3 carrier screening should also be offered. Ethical and practical issues should be considered before Tier 3 carrier screening can be provided, such as which genes and variants to include and how to organize and communicate about the screening. These issues are not discussed in this study.
The Finnish population would probably view population carrier screening positively (Hietala et al., 1995). The reception will also depend on communication with patients and the careful design of the screening protocol. One possible approach is concentrating on a limited but broad enough gene and variant panel with explicit criteria.
Other strategies have also been implemented, such as in Mackenzie's Mission in Australia (Kirk et al., 2021). The reproductive decisions of at-risk couples might be affected by the number and severity of the screened conditions (Wang et al., 2023). Given the results and literature reviewed in this study, a carrier-screening pilot in Finland as a research study would give valuable insight into the feasibility, uptake, outcomes, and target population opinions. The pilot could include designing the process, recruiting volunteers, analyzing outcomes, and collecting feedback. It could also be beneficial to perform an analysis similar to this study for other Online Mendelian Inheritance in Man (OMIM) genes not recommended by ACMG (Gregg et al., 2021), in case new genes with a GCR of ≥0.5% arise.
FIGURE 1 Carrier frequencies in the gnomAD v2.1 database for AR conditions in Finnish disease heritage conditions and ACMG suggested AR conditions.
FIGURE 2 Genes with gene carrier rate (GCR) ≥0.5% in the Finnish population (FIN). GCRs for the non-Finnish European (NFE) and Ashkenazi Jewish (ASJ) populations are shown for comparison.
TABLE 2 Genes with GCR >0.005 in the Finnish population. Abbreviations: ACMG, American College of Medical Genetics and Genomics; AR, autosomal recessive; GCR, gene carrier rate.
TABLE 4 Conditions where utilizing gnomAD is not possible due to the most common type of variant being non-analyzable from sequence variant data.