Next‐generation sequencing improves molecular epidemiological characterization of thalassemia in Chenzhou Region, P.R. China

Abstract Objectives Thalassemia is a highly prevalent monogenic inherited disease in southern China. It is important to collect epidemiological data comprehensively for proper prevention and treatment. Methods In this study, blood samples collected from 15 807 residents of Chenzhou were primarily screened by hematological tests. A total of 3973 samples of suspected thalassemia carriers were further characterized by combined next‐generation sequencing (NGS) and Gap‐PCR. Results In total, 1704 subjects were diagnosed as thalassemia carriers with a total prevalence rate of 10.78%, including 943 α‐thalassemia carriers, 708 β‐thalassemia carriers, and 53 composite α and β‐thalassemia carriers. The prevalence rates of α‐thalassemia, β‐thalassemia, and composite α and β‐thalassemia were 5.97%, 4.48%, and 0.34%, respectively. Meanwhile, we characterized 19 α‐thalassemia variations and 21 β‐thalassemia variations in thalassemia carriers. Approximately 2.88% of thalassemia carriers would be missed by traditional genetic analysis. In addition, four novel thalassemia mutations and one novel abnormal hemoglobin mutation were identified. Conclusions Our data suggest a high prevalence of thalassemia and a diverse spectrum of thalassemia‐associated variations in Chenzhou. Also, combined NGS and Gap‐PCR is an effective thalassemia screening method. Our findings might be helpful for prevention and treatment of thalassemia in this region.


| INTRODUCTION
Thalassemia is one of the most prevalent monogenic inherited disorders in the world. Approximately 5% of the population worldwide are thalassemia carriers. Due to population growth in recent decades, the number of births affected by thalassemia is increasing, especially in developing and low-income regions. 1,2 Thalassemia is characterized by reduced or absent production of one of the subunits of hemoglobin. The majority of adult hemoglobin is composed of two α-globin and two β-globin subunits, while fetal hemoglobin is composed of two α-globin and two γ-globin subunits.
Thalassemia mainly consists of α- and β-thalassemia. In α-thalassemia, because of the absent or reduced production of α-globin chains, excess β chains or γ chains form non-functional tetramers, which are called hemoglobin H and hemoglobin Bart's, respectively. Hemoglobin H can form inclusion bodies that are harmful to erythrocytes. In contrast, β-thalassemia is caused by absent or reduced production of β-globin chains. Hence, erythrocytes are damaged by insoluble aggregates formed by excess free α-globin chains. 1,2,4
Clinically, thalassemia has variable manifestations ranging from the absence of symptoms to fatal disease. Thalassemia is mainly classified as thalassemia trait, thalassemia intermedia, and thalassemia major according to clinical severity. Individuals in the latter two subgroups are also diagnosed as thalassemia patients. The phenotypic severity of the disease mainly correlates with the degree of imbalance of α:non-α chains. 5 Although the prognosis for thalassemia has markedly improved, lifelong care is required in many cases. 6,7 Proper treatment imposes a substantial financial burden on patients as well as on society in high-prevalence areas. 11 Accordingly, prevention of births with thalassemia is particularly important. Comprehensive molecular epidemiological data on the disease are necessary for proper prevention and treatment.
At present, combined reverse dot blot (RDB) and Gap-PCR is the most commonly used method for identifying thalassemia mutations. 12 The major limitation of these methods is that only common variations can be identified. Therefore, novel technology is needed to screen mutations comprehensively. Recently, next-generation sequencing (NGS) was used to screen thalassemia carriers in a few studies in China. 13,14 These studies indicated that the spectrum of thalassemia-associated variations was much broader than previously reported and suggested that NGS was an effective method for screening thalassemia-associated variations to facilitate diagnosis.
Thalassemia is prevalent in tropical and subtropical regions, including South China. Chenzhou is the southernmost city of Hunan Province, People's Republic of China, and sits on the border of Hunan and Guangdong provinces. In China, Guangdong and Guangxi provinces have the highest prevalence of thalassemia. Our previous results showed a high prevalence rate of thalassemia in Chenzhou by RDB and Gap-PCR. 16 However, the spectrum of thalassemia variations was not comprehensive. We speculated that many types of thalassemia variations could have been missed. Here, for the first time, we combined NGS and Gap-PCR to screen thalassemia variations in Chenzhou Region, in order to assess the thalassemia variation burden comprehensively and to evaluate the potential application of this approach in preventing births with thalassemia.

| Primary hematological screening
Peripheral venous blood samples were obtained from all participants.
All samples were primarily screened with routine blood examination and/or hemoglobin electrophoresis. Subjects were considered suspected thalassemia carriers if any of the following criteria was met: (a) mean corpuscular volume (MCV) <82 fL and/or mean corpuscular hemoglobin (MCH) <27 pg, (b) Hb A2 concentration <2.5% and Hb F concentration <2%, or (c) Hb A2 concentration >3.5% and Hb F concentration >3.5%. Suspected thalassemia carriers were subjected to further genetic analysis.
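The screening rule above can be expressed as a small decision function. The following is only an illustrative sketch, not the authors' software: the function name `is_suspected_carrier` and its parameter names are hypothetical, and treating a missing measurement as "criterion not met" is an assumption made here because subjects received routine blood examination and/or hemoglobin electrophoresis, so not every value is necessarily available.

```python
from typing import Optional


def is_suspected_carrier(
    mcv_fl: Optional[float] = None,    # mean corpuscular volume, fL
    mch_pg: Optional[float] = None,    # mean corpuscular hemoglobin, pg
    hba2_pct: Optional[float] = None,  # Hb A2 concentration, %
    hbf_pct: Optional[float] = None,   # Hb F concentration, %
) -> bool:
    """Return True if any of criteria (a), (b), or (c) from the text is met.

    Assumption (not stated in the source): a missing value (None) means the
    corresponding criterion cannot be evaluated and is treated as not met.
    """
    # (a) MCV < 82 fL and/or MCH < 27 pg
    criterion_a = (mcv_fl is not None and mcv_fl < 82) or (
        mch_pg is not None and mch_pg < 27
    )

    # (b) and (c) need both electrophoresis values
    have_electrophoresis = hba2_pct is not None and hbf_pct is not None
    # (b) Hb A2 < 2.5% and Hb F < 2%
    criterion_b = have_electrophoresis and hba2_pct < 2.5 and hbf_pct < 2.0
    # (c) Hb A2 > 3.5% and Hb F > 3.5%
    criterion_c = have_electrophoresis and hba2_pct > 3.5 and hbf_pct > 3.5

    return bool(criterion_a or criterion_b or criterion_c)
```

For example, `is_suspected_carrier(mcv_fl=75, mch_pg=29)` flags the subject through criterion (a) alone, whereas a subject with MCV 90 fL, MCH 30 pg, Hb A2 3.0%, and Hb F 1.0% meets none of the criteria.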

| DNA extraction
Genomic DNA of suspected thalassemia carriers was extracted from whole blood using QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany). The concentration of DNA samples was quantified by

| NGS screening
The full length of HBA1, HBA2, and HBB was amplified by PCR. The amplicons spanned all the exons and introns of the HBA1, HBA2, and HBB genes, which ensured that most thalassemia-associated mutations and CNVs in the HbVar database could be detected. Sequencing libraries were constructed according to the Illumina HiSeq sequencing library preparation protocol. These libraries were then paired-end sequenced for 100 base pairs (PE100) on an Illumina HiSeq 2000 machine. The bioinformatic analysis protocol for identifying hemoglobin gene variations was described previously. 14 All variations were validated with Sanger sequencing.

| Thalassemia carriers found by NGS
In total, 15 807 subjects were primarily screened by hematological examinations and 3973 suspected subjects were further analyzed by NGS and Gap-PCR. Among these subjects, 1704 were diagnosed as thalassemia carriers, including 943 α-thalassemia carriers, 708 β-thalassemia carriers, and 53 composite α- and β-thalassemia carriers. The overall prevalence rate of thalassemia in Chenzhou was 10.78%, and the prevalence rates of α- and β-thalassemia were 5.97% and 4.48%, respectively. In addition, the rate of composite α- and β-thalassemia was determined for the first time in Chenzhou; it was found in 0.34% of all subjects.
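The prevalence rates reported above follow directly from the carrier counts and the number of screened subjects. The short sketch below reproduces that arithmetic from the figures given in the text; the variable names and the helper `prevalence_pct` are illustrative only, and rounding to two decimal places is assumed from the way the rates are reported.

```python
# Counts taken from the text above.
SCREENED = 15807  # subjects in the primary hematological screening
CARRIERS = {
    "alpha": 943,      # alpha-thalassemia carriers
    "beta": 708,       # beta-thalassemia carriers
    "composite": 53,   # composite alpha- and beta-thalassemia carriers
}


def prevalence_pct(count: int, total: int = SCREENED) -> float:
    """Prevalence as a percentage of all screened subjects, to 2 decimals."""
    return round(100 * count / total, 2)


total_carriers = sum(CARRIERS.values())  # 943 + 708 + 53
rates = {group: prevalence_pct(n) for group, n in CARRIERS.items()}
overall_rate = prevalence_pct(total_carriers)

for group, rate in rates.items():
    print(f"{group}: {rate}%")
print(f"overall: {overall_rate}%")
```

Note that the denominator is the full screened population rather than the 3973 suspected carriers who underwent genetic analysis, which is how the reported rates are defined.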
Among 996 carriers with α-thalassemia variations, we identified 19 different variations with 30 distinct genotypes in this study (Table 1). In this cohort, we also found 21 β-thalassemia mutations and 32 genotypes in 761 subjects (Table 2). Fifty-three subjects were carriers with both α- and β-globin variations (Table 3). Among these carriers, 83.02% of genotypes
In addition, 13 abnormal hemoglobin variants were identified in 35 subjects with a carrier rate of 0.22% (Table 4). Among these subjects, nine subjects and five subjects were simultaneously affected by α-thalassemia mutations and β-thalassemia mutations, respectively. Three rare abnormal hemoglobin variants, Hb Zurich-Langstrasse, Hb Yusa, and Hb Genova, were reported for the first

| Characterization of novel mutations
In this study, four novel thalassemia mutations were identified by NGS in five probands and were further confirmed by Sanger sequencing (Figure 1). Among these mutations, two novel α-thalassemia mutations were identified in three individuals. The mutations and their hematological parameters are listed in Table 5. The mutation HBA1:c.2T>C, a novel mutation of the translation initiation codon of the α1-globin gene, was found in a woman with reduced MCV and MCH. The mutation HBA2:c.6_7insTG, which results in a polypeptide completely different from the original α-globin peptide, was observed in two unrelated individuals, both of whom had reduced MCV and MCH. One of them was a woman who had two children. Direct DNA sequencing showed that this mutation was inherited from her mother and that her two children were both heterozygous for it. All of them showed slightly reduced MCV and MCH.
Two novel β-thalassemia mutations were discovered, and the molecular and hematological parameters of each allele are listed in Table 5. These two mutations were both frameshift mutations.

| DISCUSSION
There is a high frequency of thalassemia in southern China, particularly in Guangdong, Guangxi, and Hainan provinces. 18,19 The percentage of thalassemia carriers is relatively low in most regions of Hunan Province. 21 In some high-prevalence regions, this disorder has been neglected by the provincial health system because of limited molecular epidemiological data.
In this study, we report comprehensive molecular epidemiological data on thalassemia in Chenzhou Region for the first time. Our data confirmed that Chenzhou had the highest overall prevalence rate of thalassemia (10.78%) in Hunan Province, which was slightly higher than our previous result (10.00%). 16 Meanwhile, the rate of α-thalassemia was slightly higher, and the rate of β-thalassemia slightly lower, than in our previous results. These changes were probably caused by different genetic screening methods and/or population mobility and migration. The rate of thalassemia in Chenzhou was significantly higher than the average of Hunan Province (4.18%) 22,23 and was closer to that of Guangdong Province and southern Jiangxi Province. 19,24 Notably, the rates of α- and β-thalassemia in Chenzhou had a special distribution pattern. Generally, α-thalassemia occurs at a much higher frequency than β-thalassemia; however, the rate of α-thalassemia carriers (5.97%) was close to that of β-thalassemia carriers (4.48%) in Chenzhou Region. This distribution pattern was consistent with the data from Changsha Region in Hunan Province and indicated that Chenzhou had a lower rate of α-thalassemia and a relatively higher rate of β-thalassemia compared with the surrounding provinces of southern China. 25 Moreover, the rate of composite α- and β-thalassemia (0.34%) in Chenzhou was newly determined in this study.
In contrast to the six variations and 10 genotypes identified in our previous study, 16 we identified 30 distinct α-thalassemia genotypes with 19 different variations here. Among α-thalassemia genotypes, the most common subtype was --SEA /αα, with a remarkable proportion of 67.87%. This proportion was similar to that of Changsha Region in Hunan Province and higher than that of surrounding provinces. 21,25 Apart from these common variation types, a series of rare and novel variations were identified. IVS-I-117 (G>A) is a rare mutation that was first reported in an Indian population. 26 HBA2:c.95+5_95+28delGGCTCCCTCCCCTGCTCCGACCCG had been reported only once, in Malaysia. 27 Interestingly, HBA2:c.184A>T(Lys>End) is a recently reported novel mutation that was also found by NGS in Guangdong Province, China. 28 Furthermore, we identified two novel α-thalassemia-associated mutations in this study, HBA1:c.2T>C and HBA2:c.6_7insTG. For β-thalassemia, we identified 21 variations with 32 genotypes in this cohort, whereas only 13 mutations and 13 genotypes were identified in our previous study. 16 Of the β-thalassemia genotypes, Codons 41/42 (-TTCT)/β N and IVS-II-654 (C>T)/β N were the two most frequent subtypes, accounting for 65.31% of the genotypes. The ranking order of the two major mutations differed from our previous result. 16 We assumed that it was probably due to popula-
In conclusion, we demonstrated the great diversity of thalassemia-associated variations and a high prevalence of thalassemia in Chenzhou Region by applying combined NGS and Gap-PCR technology. The rare and novel variations of this region were identified for the first time. Our findings may be useful for the prevention and treatment of thalassemia in this region and in other high-prevalence areas. Our updated epidemiological data may also draw the attention of local governments to the severity of this disorder, so that more public funds might be allocated for prevention. Limitations of our study, such as carriers possibly missed during primary screening and complex variations, will be addressed in future work.

CONFLICT OF INTEREST
The authors declare no conflict of interest.