β‐Thalassemia pathogenic variants in a cohort of children from the East African coast

Abstract Background β‐Thalassemia is rare in sub‐Saharan Africa. Previous studies have suggested that it is limited to specific parts of West Africa. Based on hemoglobin A2 (HbA2) concentrations measured by HPLC, we recently speculated that β‐thalassemia might also be present on the East African coast of Kenya. Here, we follow this up using molecular methods. Methods We used raised HbA2 values (> 4.0% of total Hb) to target all HbAA members of a cohort study in Kilifi, Kenya, for HBB sequencing for β‐thalassemia (n = 99) together with a sample of HbAA subjects with lower HbA2 levels. Because HbA2 values are artifactually raised in subjects carrying sickle hemoglobin (HbS), we sequenced all participants with an HPLC pattern showing HbS without HbA (n = 116) and a sample with a pattern showing both HbA and HbS. Results Overall, we identified 83 carriers of four separate β‐thalassemia pathogenic variants: three β0‐thalassemia variants [CD22 (GAA→TAA), initiation codon (ATG→ACG), and IVS1‐3ʹ end del 25bp] and one β+‐thalassemia variant [IVS‐I‐110 (G→A)]. We estimated the minimum allele frequency of all variants combined within the study population at 0.3%. Conclusions β‐Thalassemia is present in Kilifi, Kenya, an observation that has implications for the diagnosis and clinical care of children from the East Africa region.


| BACKGROUND
β-Thalassemia is a heterogeneous group of genetic disorders caused by pathogenic variants in HBB that lead to reduced (β+) or absent (β0) synthesis of β-globin (Weatherall & Clegg, 2002). These variants are examples of balanced polymorphisms, selection having been driven through a survival advantage against Plasmodium falciparum malaria in heterozygotes at the expense of the early mortality of homozygotes from intractable anemia (Williams & Weatherall, 2012). Such selection has resulted in current carrier frequencies of 1%-20% in a number of populations including the Mediterranean, the Middle East, India, Southern China, and parts of the Far East (Galanello & Origa, 2010; De Sanctis et al., 2017; Weatherall & Clegg, 2002). In contrast to other red blood cell disorders that have been selected in a similar way, β-thalassemia is believed to be generally rare within most of sub-Saharan Africa. Although carrier frequencies of 0.8%-1.7% have been reported in limited parts of Nigeria (Esan, 1970) and Ghana (Weatherall & Clegg, 2002; Weatherall et al., 1971), higher frequencies (of up to 9%) are limited to specific ethno-linguistic groups within Liberia (Willcox, 1975), and occurrences within East and Central Africa have been largely limited to case reports (McGann et al., 2018).
Recently, we reported that an unexpected proportion of children whom we recruited into the REACH trial (NCT01966731), through which we are investigating the use of hydroxyurea among children with sickle cell disease in four African countries, were compound heterozygotes for HbS and β0-thalassemia (HbS/β0-thalassemia; McGann et al., 2018). Seven percent of participants from our site in Kilifi, Kenya, were carriers of two different β0-thalassemia pathogenic variants. Separately, we have also reported that a small proportion of unselected children recruited to a genetics cohort in Kilifi had suggestive evidence for heterozygous β0-thalassemia based on HPLC-derived raised HbA2 values (Macharia et al., 2019). Here, we have investigated this observation further, through sequencing studies in a subset of children from the latter study.
By extrapolation from the β-thalassemia allele frequencies in each of the HbA2 categories, we estimated the allele frequency among members of the Kilifi Genetic Birth Cohort Study overall at 95 of 31,154 (0.3%; Table 2). Assuming Hardy-Weinberg equilibrium, this equates to an approximate birth prevalence for β-thalassemia heterozygotes and homozygotes of six in 1,000 and one in 100,000, respectively. Given that, on the basis of our HPLC data, the allele frequency for the βs pathogenic variant was approximately 8%, we estimate that the birth prevalence for HbSS within this population approximates to one in 100 and that for HbS/β-thalassemia to one in 1,000, meaning that approximately 10% of all cases of sickle cell disease within our study population are due to HbS/β0-thalassemia.
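The Hardy-Weinberg reasoning above can be illustrated with a minimal Python sketch. It assumes a simple three-allele model at HBB (normal βA, βs, and all β-thalassemia variants pooled) and random mating; the function name and dictionary keys are hypothetical and are not part of the study's analysis. Because the figures quoted in the text are rounded, the output should be read only as an order-of-magnitude check.

```python
def hardy_weinberg(q_thal: float, q_s: float) -> dict:
    """Expected genotype frequencies at HBB under Hardy-Weinberg equilibrium.

    q_thal: combined allele frequency of all beta-thalassemia variants
    q_s:    allele frequency of the sickle (beta-S) variant
    Assumption: only three allele classes exist (beta-A, beta-S, beta-thal).
    """
    p_a = 1.0 - q_thal - q_s  # frequency of the normal beta-A allele
    return {
        "thal_carrier": 2 * p_a * q_thal,  # beta-A / beta-thal heterozygotes
        "thal_homozygote": q_thal ** 2,    # two beta-thal alleles
        "HbSS": q_s ** 2,                  # two beta-S alleles
        "HbS_thal": 2 * q_s * q_thal,      # compound heterozygotes
    }


# Allele frequencies as reported in the text: 95 of 31,154 and about 8%.
freqs = hardy_weinberg(q_thal=95 / 31154, q_s=0.08)
for genotype, f in freqs.items():
    print(f"{genotype}: about 1 in {round(1 / f):,}")

# Share of sickle cell disease genotypes attributable to HbS/beta-thalassemia.
share = freqs["HbS_thal"] / (freqs["HbS_thal"] + freqs["HbSS"])
print(f"HbS/beta-thal share of SCD: {share:.1%}")
```

The sketch makes the dependence explicit: each expected genotype frequency is a product of the two allele frequencies involved, doubled for heterozygous combinations.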
We examined the origin of the β-thalassemia chromosomes we identified by constructing a haplotype map using three common variants: rs12788013, rs1609812, and rs713040. These were in tight linkage disequilibrium within a single block (Dʹ > 0.95) and were at overall minor allele frequencies of 10%, 13%, and 14%, respectively. A total of six haplotypes were identified across this LD block (Table 3 and Figure S1). The CCT haplotype was more common in those with the CD22 (GAA→TAA) pathogenic variant, occurring at a frequency of 49%. Conversely, the GTC haplotype was more frequent in those without a β-thalassemia pathogenic variant and in those with other β-thalassemia pathogenic variants, occurring at the following frequencies: (i) no pathogenic variant 88.9%; (ii) initiation codon (ATG→ACG) 100%; (iii) IVS1-3ʹ end del 25bp 91.7%; and (iv) IVS-I-110 (G→A) 100% (Table 3). On investigating the origin of participants, 47 of 55 (85%) infants with the CD22 (GAA→TAA) pathogenic variant were from the Chonyi ethnolinguistic group, while all but one (19/20; 95%) of those with the initiation codon (ATG→ACG) variant were of non-Chonyi origin (Figure S3).
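For readers unfamiliar with the Dʹ statistic quoted above, the following Python sketch shows how it is defined for a pair of biallelic loci. It assumes that phased haplotype counts are already available, which is a simplification: Haploview estimates haplotype frequencies from unphased genotypes, and this sketch does not reproduce that step. The function name, argument names, and counts are hypothetical and are not study data.

```python
def d_prime(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """Absolute D' between two biallelic loci from phased haplotype counts.

    n_AB, n_Ab, n_aB, n_ab: counts of the four possible two-locus haplotypes.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at locus 2

    d = p_AB - p_A * p_B  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    # D' rescales D by its maximum attainable magnitude given allele frequencies.
    return abs(d) / d_max if d_max > 0 else 0.0


# Illustrative counts only (hypothetical).
print(f"D' = {d_prime(86, 1, 1, 12):.2f}")
```

A value close to 1 indicates that at least one of the four possible haplotypes is almost absent, which is the sense in which the three variants form a single block.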
Finally, we used the estimated allele numbers from within the HbAA subgroup to calculate the sensitivity, specificity, and positive (PPV) and negative (NPV) predictive values for the diagnosis of heterozygous β-thalassemia on the basis of our various HbA2 cutoffs. A threshold of ≥ 3.5% gave values of 100%, 98%, 21%, and 100%, respectively, while one of ≥ 4% was associated with values of 91%, 100%, 75%, and 100%, respectively (Table 4). When all participants were included in this analysis, including those with presumed HbAS and presumed HbSS, specificity and PPV estimates for an HbA2 threshold of ≥ 3.5% dropped substantially to 86% and 4%, respectively, and those for a threshold of ≥ 4% fell to 94% and 9%, respectively (Table S1).
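The four screening metrics reported above follow from a standard 2 × 2 table of test result against true carrier status. A minimal Python sketch is given below; the function name and the counts are hypothetical illustrations and are not taken from Table 4 or Table S1.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Screening-test metrics from a 2 x 2 table.

    tp: carriers at or above the HbA2 threshold (true positives)
    fp: non-carriers at or above the threshold (false positives)
    fn: carriers below the threshold (false negatives)
    tn: non-carriers below the threshold (true negatives)
    """
    return {
        "sensitivity": tp / (tp + fn),  # carriers correctly flagged
        "specificity": tn / (tn + fp),  # non-carriers correctly cleared
        "ppv": tp / (tp + fp),          # flagged subjects who are carriers
        "npv": tn / (tn + fn),          # cleared subjects who are non-carriers
    }


# Illustrative counts only (hypothetical, not study data).
metrics = diagnostic_metrics(tp=40, fp=10, fn=10, tn=940)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because PPV has the false positives in its denominator, it is far more sensitive than specificity to a rise in false positives when carriers are rare, which is consistent with the sharp fall in PPV once participants with HbS were included.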

| DISCUSSION
With the exception of North Africa, where both the prevalence and causal pathogenic variants have been well-described previously (Agouti, Badens, Abouyoub, Levy, & Bennani, 2008;Habib & Book, 1982;Hamamy & Al-Allawi, 2013), it is generally thought that β-thalassemia is rare throughout most of the rest of the continent. In the present study, we screened members of the Kilifi Genetic Birth Cohort Study, recruited on the coast of Kenya, for β-thalassemia. We detected four different β-thalassemia pathogenic variants which together affected approximately 0.6% of all study participants.
In a recent report, we showed that HbA2 values were consistent with heterozygous β-thalassemia in a total of 0.8% of Kilifi Genetic Birth Cohort members with an HPLC pattern suggestive of HbAA (Macharia et al., 2019). In the current study, we focused on a number of subgroups within this cohort which we selected on the basis of HbA2 value ranges that reflected their relative likelihood of being carriers for β-thalassemia. Among HbAA participants, we found that an HbA2 value of ≥ 4% was associated with a sensitivity of 91% and a PPV of 75% for β-thalassemia heterozygosity. Nevertheless, despite this high sensitivity, in keeping with previous reports (Gasperini et al., 1993), we found no β-thalassemia alleles in a quarter of this subgroup. This could be explained by the presence of rare pathogenic variants in the δ-globin gene or by other causes of elevated HbA2, which include megaloblastic anemia, HIV infection, and hypothyroidism (Weatherall & Clegg, 2002), none of which we investigated in the current study.
When considering the entire population, including those with HPLC phenotypes consistent with either HbAS or HbSS, the use of an HbA2 level of ≥ 4% resulted in a considerably higher false positivity rate, reflected in a PPV of only 9%. This is almost certainly explained by the presence of glycated HbS and adducts associated with HbS, which have been shown to co-elute with HbA2 and to result in artifactually raised HbA2 values when using the BioRad Variant system (Macharia et al., 2019; Suh, Krauss, & Bures, 1996). For this reason, we sequenced all individuals with an HPLC pattern suggestive of HbSS, in whom we found that 10% were compound heterozygotes for both HbS and β0-thalassemia. When these subjects were typed for the rs334 pathogenic variant by PCR, the results were indicative of HbAS, emphasizing the need for confirmatory testing of suspected SCD by more than one diagnostic method. As anticipated, we found no occurrences of β-thalassemia among those participants who displayed an HPLC pattern that included both HbA and HbS. As β-globin production is abolished on chromosomes carrying β0-thalassemia alleles, the only β-globin that would be produced by a β0-thalassemia carrier would be encoded by the complementary chromosome and, as a result, only one form of β-globin (either HbA or HbS) would be visible on HPLC. While the same is not true for β+-thalassemia, where a variable amount of residual β-globin would be produced, only two children displayed HPLC patterns in which the quantity of HbS exceeded that of HbA. In both these cases this difference was only marginal and was in the presence of high levels of HbF, probably reflecting the accuracy limits of the HPLC method. No significant differences were seen in mean % HbA2 when comparing β-thalassemia pathogenic variants to each other (p = .25).
To date, more than 300 β-thalassemia pathogenic variants have been identified worldwide (Kountouris et al., 2014), although approximately 90% of cases are caused by only 40 (Thein, 2018). The majority are point mutations or small deletions or insertions that affect HBB function at the transcriptional, post-transcriptional, or translational stage (De Sanctis et al., 2017; Thein, 2018; Weatherall & Clegg, 2002). Pathogenic variants differ by geographical region, with only one or two accounting for > 50% of cases within any given region (Weatherall & Clegg, 2002). In North Africa, for example, IVS-I-110 G→A is the most common pathogenic variant, which together with CD39 C→T, IVS1-1 G→A, and IVS1-6 T→C is responsible for over 60% of cases (Bennani et al., 1993; Douzi et al., 2015; Elmezayen, Kotb, Sadek, & Abdalla, 2015; Weatherall & Clegg, 2002). The cluster of pathogenic variants defining β-thalassemia in Kilifi, observed through this study, is dominated by two β0-thalassemia variants, CD22 (GAA→TAA; 66.3%) and initiation codon (ATG→ACG; 24.1%), while the remaining cases were explained by one additional β0-thalassemia variant (IVS1-3ʹ end del 25bp) and a single β+-thalassemia variant [IVS-I-110 (G→A)]. Most are rare in other populations. For example, with the exception of our own previous study (McGann et al., 2018), to the best of our knowledge the main pathogenic variant we found, CD22 (GAA→TAA), has only previously been reported in one individual from the Reunion Republic (Ghanem et al., 1992) in whom the clinical outcome was not described. Similarly, to date, ATG→ACG has only been reported in three members of one family from Yugoslavia (Wildmann et al., 1993), two family members of Swiss origin (Beris, Darbellay, Speiser, Kirchner, & Miescher, 1993), and two family members of Russian origin (Molchanova, 1998). Notably, in all three studies it was reported that HbA2 levels were higher than those commonly seen in other β-thalassemia pathogenic variants.
In this study, we identified 18 β-thalassemia carriers and two HbS/β0-thalassemia subjects with this variant in whom HbA2 levels were also higher than in carriers of other variants, although this observation did not reach statistical significance. Finally, the third most common pathogenic variant observed in this study was IVS1-3ʹ end del 25bp. First identified by Orkin and colleagues (Orkin et al., 1983) in a patient of Indian origin, this pathogenic variant has since been found to be common in a number of Middle Eastern populations. The highest frequencies have been reported in Bahrain, where in one study it accounted for 36% of all β-thalassemia alleles (Jassim, Al-Arrayed, Al-Mukharraq, Merghoub, & Krishnamoorthy, 2000). However, in other Middle Eastern countries the frequencies of this pathogenic variant are not as high, being 7.3% in Kuwait (Adekile et al., 1994), 9.5% in the United Arab Emirates (el-Kalla & Mathews, 1997), and 12.9% in Saudi Arabia (el-Hazmi, Al-Swailem, & Warsy, 1995). In the current study IVS1-3ʹ end del 25bp explained 6.8% of the β-thalassemia alleles, and was the cause of HbS/β0-thalassemia HbS/β-thalassemia results from coinheritance of both a β-thalassemia and a βs pathogenic variant on contralateral chromosomes. Although the clinical manifestations of HbS/β-thalassemia are thought to be generally similar to those of HbSS, they can vary depending on the type of β-thalassemia mutation that is co-inherited, which in turn varies from one region to another (Steinberg, Forget, Higgs, & Weatherall, 2009). In particular, depending on the amount of β-globin produced, individuals with HbS/β+-thalassemia can have a milder form of SCD than those with HbS/β0-thalassemia or HbSS (Jha, Mishra, Verma, Pandey, & Lakkakula, 2018; Serjeant, Sommereux, Stevenson, Mason, & Serjeant, 1979; Yadav et al., 2016).
Describing the disease phenotypes that are associated with these mutations is important because it can inform guidelines on the better management of individuals suffering from these conditions. In the current cohort we identified 10 individuals with HbS/β0-thalassemia; a full description of their clinical phenotype is an aim of ongoing work.
In our haplotype analysis, we found that CD22 (GAA→TAA) was strongly linked to the CCT haplotype, which is different from the background haplotype of the general population (GTC). This suggests that this pathogenic variant may have arrived in Kilifi through gene flow or population migration and that it is unlikely to have arisen on the chromosomal background of the current population. By contrast, the remaining β-thalassemia pathogenic variants were exclusively linked with the native haplotype, suggesting that these pathogenic variants may have occurred de novo within this population. Examination of the ethnic background on which both the different pathogenic variants and haplotypes were found suggests that both relate to specific subpopulations within the cohort as a whole. Further work will be required to investigate the origins of these pathogenic variants in more detail.
On the basis of our current study, we estimate that approximately 0.6% of the Kilifi population carry a β-thalassemia pathogenic variant, which means that some cases of β-thalassemia major should be seen. Nevertheless, we have not previously identified a suspected case within our clinical practice, and, to the best of our knowledge, β-thalassemia major has not been reported from anywhere else in the East Africa region. The most likely explanations for this discrepancy relate to the low predicted birth rate for homozygotes, historically high rates of both all-cause child mortality and severe acute anemia (Macharia et al., 2019), and the lack of diagnostic facilities. Under these circumstances, it seems likely that most homozygotes will have been dying in early life without coming to medical attention. Whatever the explanation, as both malaria and all-cause mortality decline, it is likely that affected children will increasingly survive to the point at which their disease will be recognizable and, as such, clinicians within the region should learn more about β-thalassemia including appropriate methods for its diagnosis and treatment.
Our study has several weaknesses, of which the most important were the age range of the children studied and the method by which blood was taken. Switching to adult patterns of hemoglobin production may well have been incomplete, particularly in the youngest subgroup, making the classification of HbA2 values potentially misleading. In addition, the volumes of blood collected did not allow us to refine our screening strategy to take account of data from complete blood counts or to investigate children for alternative causes of raised HbA2 values, which include megaloblastic anemia, HIV infection, and hypothyroidism (Weatherall & Clegg, 2002). Nevertheless, these weaknesses do not detract from the central message of our study, which is that β-thalassemia is present in a region in which it was previously not known to occur. While deficiencies in the study design might have led us to miss some cases, our study still provides a minimum estimate of the true prevalence of β-thalassemia within the study population.
In conclusion, we have described the prevalence and spectrum of β-thalassemia in a cohort of children on the coast of Kenya, a region in which this condition has not been previously recognized. In addition, we show the sensitivity of various HbA2 threshold values as a method for β-thalassemia screening in this population and speculate on the possible origin of the pathogenic variants we have identified. Our study has implications for the clinical care of children from the East Africa region.

| Study design
The study was conducted among members of the Kilifi Genetic Birth Cohort Study that was designed to investigate the impact of host genetic factors on a range of common child health outcomes. The methods for recruitment to this study have been described in detail previously (Macharia et al., 2019; Williams et al., 2009). Briefly, infants 3-12 months of age who were born in the area served by the Kilifi Health and Demographic Surveillance System (KHDSS) between January 2006 and April 2010 were eligible for inclusion in the study (Scott et al., 2012). Capillary blood samples collected from the heel were used for HPLC analysis and DNA extraction. The method and volumes collected did not allow for the conduct of additional tests, including full blood count analysis. The proportions of specific types of hemoglobin were assayed at enrolment on a Variant Classic™ HPLC analyser using the β-thalassemia Short Program (BioRad, Hercules, CA, USA). HbA2 values were within the β-thalassemia range in a small proportion of recruited children (Macharia et al., 2019), a group that forms the primary focus for the current study.

| Study participants
First, among participants with an HPLC pattern suggestive of HbAA (a pattern showing the presence of HbA and the absence of any abnormal variants), we sequenced subjects selected on the basis of their recruitment HbA2 values as follows. We sequenced all children with HbA2 values of ≥ 4.0%, a group in whom we expected to find a high proportion of β-thalassemia carriers on the basis of published reports (Van Delft et al., 2009; Mosca, Paleari, Ivaldi, Galanello, & Giordano, 2009). We also sequenced a random sample of 110 participants whose values were between 3.5% and 3.9%, a cutoff associated with a high sensitivity but low specificity for β-thalassemia (Van Delft et al., 2009; Mosca et al., 2009). Finally, we sequenced a random sample of 114 participants whose values were < 3.5%, in whom β-thalassemia should be rare. Next, because the interpretation of HbA2 values in relation to the diagnosis of β-thalassemia is less clear in the presence of HbS (Suh et al., 1996), we sequenced all participants with an HPLC pattern suggestive of HbSS and a subset of participants with HPLC patterns suggestive of HbAS as follows: (1) a random sample of ~100 participants with each of the HbA2 cutoffs outlined for HbAA subjects above; (2) all participants with HPLC patterns that showed both HbA and HbS but in whom the concentration of HbS exceeded that of HbA, a pattern consistent with a diagnosis of HbS/β+-thalassemia (Weatherall & Clegg, 2002).

| DNA extraction and HBB sequencing
Genomic DNA was extracted for all members of the Kilifi Genetic Birth Cohort Study from capillary blood samples collected into EDTA within 7 days of their recruitment using an ABI PRISM 6100 Nucleic Acid PrepStation™ (Applied Biosystems). For the current study, β-globin gene sequencing was conducted on selected participants using samples that had been stored at −80°C. We sequenced amplicons derived by PCR with HBB primers on an ABI 3730xl sequencer using the BigDye Terminator Cycle Sequencing Kit (Applied Biosystems), as described in detail previously (Clark & Thein, 2004). In brief, our sequencing method covered a 1.8 kb region that included the 5ʹ promoter, 5ʹ and 3ʹ untranslated regions, exons 1-3, and the intervening sequence I and II regions flanking exons 2 and 3.

| Statistical analysis
Differences in mean HbA2 percentages between different β-thalassemia pathogenic variants were analyzed using one-way ANOVA. p values of < 0.05 were considered statistically significant. All statistical analyses were performed using R Version 3.1.1 (R Foundation for Statistical Computing; The, 2017). Linkage disequilibrium analysis and the construction of haplotypes were carried out using Haploview Version 4.2 (Barrett, Fry, Maller, & Daly, 2005).
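As an illustration of the one-way ANOVA described above, the following Python sketch computes the F statistic from first principles using only the standard library. It is not the authors' R code; the function name is hypothetical and the HbA2 values are invented for illustration. The p value, which would be obtained from the F distribution with the returned degrees of freedom, is not computed here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of measurements per group
            (for example, HbA2 percentages per pathogenic variant).
    """
    k = len(groups)                    # number of groups
    n = sum(len(g) for g in groups)    # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical HbA2 percentages for three variants (not study data).
example = [[5.1, 5.4, 4.9], [5.6, 5.8, 5.2], [4.8, 5.0, 5.3]]
f_stat, df1, df2 = one_way_anova_f(example)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

The F statistic is the ratio of the between-group to the within-group mean square, so values near 1 indicate that group means differ no more than expected from within-group scatter.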

| AUTHOR CONTRIBUTIONS
AWM, SU, CMN, REW, and TNW designed the study and conducted the literature review. GM collected the clinical data, while AWM, JM, MT, GN, and EN assisted with sample preparation and analysis. AWM analyzed data and all the authors helped to interpret the data. AWM, TNW, and SU wrote the first draft of the paper. All the authors contributed to editing the final version.

ACKNOWLEDGMENTS
The study was funded by Senior Research Fellowships awarded to TNW (091758 and 202800) and by core support to the KEMRI-Wellcome Trust Research Programme in Kilifi, Kenya (203077), all awarded by the Wellcome Trust. AWM is supported through the DELTAS Africa Initiative [DEL-15-003]. The DELTAS Africa Initiative is an independent funding scheme of the African Academy of Sciences Alliance for Accelerating Excellence in Science in Africa (AESA) and supported by the New Partnership for Africa's Development Planning and Coordinating Agency (NEPAD Agency) with funding from the Wellcome Trust [107769/Z/10/Z] and the UK government. We thank all the members of staff who helped with data and sample collection and processing at the Kilifi County Hospital and the KEMRI-Wellcome Trust Research Programme, Kilifi, and the study participants and their parents for participating in this study. In particular, we thank the field staff engaged in recruitment to the Kilifi Genetic Birth Cohort Study and in the conduct of the Kilifi Health and Demographic Surveillance System. This paper is published with permission from the Director of the Kenya Medical Research Institute.

CONFLICT OF INTEREST
None of the authors have any conflict of interest to declare.

ETHICS
Individual written informed consent was provided by the parents of all study participants. Ethical approval for the study was granted by the Kenya Medical Research Institute/National Ethical Review Committee in Nairobi, Kenya (reference number SCC 1058).

DATA AVAILABILITY STATEMENT
The datasets generated and analyzed during the current study are not publicly available because specific permission for public deposition was not obtained at the time of informed consent but are available from the corresponding author on reasonable request.