Corresponding author: Thessalia Papasavva, Molecular Genetics Thalassaemia Department, The Cyprus Institute of Neurology and Genetics, 6, Int'l Airport Ave., Agios Dometios, Nicosia, 1683, Cyprus. Tel: +357 22 392664; Fax: +357 22 392615; E-mail: email@example.com
β-Thalassaemia is one of the commonest autosomal recessive single-gene disorders worldwide. Prenatal tests use invasive methods, posing a risk for the pregnancy itself. Development of a noninvasive prenatal diagnostic method is, therefore, of paramount importance. The aim of the present study is to identify highly heterozygous, informative single-nucleotide polymorphisms (SNPs) suitable for the development of noninvasive prenatal diagnosis (NIPD) of β-thalassaemia. SNP genotyping analysis was performed on 75 random samples from the Cypriot population for 140 SNPs across the β-globin cluster. Shortlisted, highly heterozygous SNPs were then examined in 101 carrier families for their applicability in the noninvasive detection of paternally inherited alleles. Forty-nine SNPs displayed at least 6% heterozygosity and were selected for NIPD analysis, revealing 72.28% of the carrier families to be eligible for qualitative SNP-based NIPD, and 91.09% for quantitative detection. Moreover, inference of haplotypes showed predominant haplotypes and many subhaplotypes with sufficient prevalence for diagnostic exploitation. SNP-based analyses are sensitive and specific for the detection of the paternally inherited allele in maternal plasma. This study provides proof of concept for this approach, highlighting its superiority to NIPD based on single markers and thus providing a blueprint for the general development of noninvasive prenatal diagnostic assays for β-thalassaemia.
β-Thalassaemia is one of the commonest autosomal recessive single-gene disorders worldwide (Weatherall, 2004). In Cyprus, about 12% of the population are carriers (Kyrri et al., manuscript accepted, Haemoglobin), with the IVSI-110 mutation representing 79.8% of the total (Baysal et al., 1992). Currently, fetal genetic material for prenatal diagnosis is sampled by invasive procedures (e.g. chorionic villus sampling or amniocentesis), which are associated with a significant risk of induced abortion (Tabor et al., 1986; Alfirevic et al., 2003). The discovery of cell-free fetal DNA in maternal plasma has opened up new avenues for noninvasive prenatal diagnosis (NIPD, Lo et al., 1997; Lo et al., 1998), but poses a technical challenge, since fetal DNA represents a minor population in maternal plasma and, moreover, is highly fragmented (Chan et al., 2004; Li et al., 2004). Recent studies showed that fetal DNA comprises an average of 10% of free DNA in maternal plasma (Lun et al., 2008a), thus facilitating the development of NIPD approaches. However, approaches that will permit the reliable detection of single-gene mutations or single-nucleotide polymorphisms (SNPs) using cell-free fetal DNA in maternal plasma are still in development. Over the last few years, a number of different strategies have been investigated to accomplish the noninvasive detection of β-thalassaemia using maternal plasma. Allele-specific real-time polymerase chain reaction (PCR) is one of the first approaches that have been used to exclude paternal mutations in the maternal circulation (Chiu et al., 2002). Preferential detection of fetal alleles was achieved through initial enrichment of fetal DNA (Li et al., 2005; Li et al., 2009), whereas others enhanced the production of the mutated fetal allele by employing either peptide nucleic acid (PNA) probes (Galbiati et al., 2008) or COLD PCR (Galbiati et al., 2011). 
In the specific case of β-thalassaemia, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has also been investigated (Li et al., 2009). Moreover, Lun et al. employed digital size selection and investigated relative mutation dosage (RMD) for the NIPD of β-thalassaemia (Lun et al., 2008b). Our group employed the APEX/thalassochip approach, based on the detection of polymorphic SNPs, in order to successfully identify the paternally inherited allele of the fetus in the maternal plasma (Papasavva et al., 2006; Papasavva et al., 2008), whereas more recently Phylipsen et al. employed pyrophosphorolysis-activated polymerization (PAP) in combination with melting curve analysis (MCA) using polymorphic SNPs to detect the paternal allele in maternal plasma of a β-thalassaemia carrier (Phylipsen et al., 2012). The popular NIPD approach of replicating traditional detection of primary mutations in a noninvasive setting has limited scope for future applications, particularly in Cyprus and in other communities where most couples share the same mutation, thus making it impossible to differentiate the maternal from the paternal allele. A more reliable and universal approach to NIPD for β-thalassaemia and other monogenic disorders would be the detection of multiple, redundant features linked to the primary mutations.
In this study, we therefore explore the possibility of exploiting polymorphic SNPs and haplotype analysis for the NIPD of β-thalassaemia. Multiple SNPs can be used to differentiate the maternal from the paternal allele, providing redundancy (Ratip et al., 1997) and thus minimizing the risk of misdiagnosis even if both parents share the same primary mutation.
The aims of this project are (i) the identification of high-heterozygosity SNPs by genotyping the β-globin cluster on random samples from the Greek Cypriot population, (ii) shortlisting and categorization of informative SNPs for carrier couples and their fetus and (iii) the assessment of shortlisted SNPs as markers for β-thalassaemia NIPD.
Material and Methods
The study was approved by the Cyprus National Bioethics Committee and the Ethics Committee of the Cyprus Institute of Neurology and Genetics. All participants gave informed written consent.
Sample Collection and Processing
To identify high-heterozygosity SNPs in the Greek Cypriot population, peripheral blood samples were collected from 75 random nonthalassaemic individuals. To determine informative SNPs for NIPD, samples were collected from 101 family trios, each consisting of peripheral blood from the carrier parents and the corresponding fetal (chorionic villus) sample. These family trios were referred to us for prenatal diagnosis between 2004 and 2009. Genomic DNA was extracted from all blood samples using the Puregene Blood Core Kit C (Qiagen, Germantown, MD, USA) according to the manufacturer's instructions.
SNP Selection and Genotyping on Random Samples
NCBI dbSNP (Sayers et al., 2011) and the ENSEMBL genome browser (Kersey et al., 2011) were used in combination for the selection of SNPs located throughout the β-globin cluster on chromosome 11. These SNPs were then genotyped in the 75 random samples, using genomic DNA and multiplexed PCR, followed by a primer extension reaction coupled with the iPLEX chemistry according to the manufacturer's instructions and subsequent MALDI-TOF MS (all Sequenom GmbH, Hamburg, Germany). Two programs, ProxSNP and PreXTEND, were used to identify proximal SNPs, to check for known effects that might cause assays to fail or to produce misleading results, and to select unique and specific PCR primers. All PCR primers and extension primers were designed by Sequenom GmbH.
Determination of Suitable SNPs for NIPD
Once the degree of heterozygosity for all SNPs analyzed was determined, a panel of those with at least 6% heterozygosity was selected to be used for NIPD. The cutoff value for heterozygosity was chosen based on previous studies, in which cutoff values between 5% and 10% showed relatively good statistical power to detect genetic association (Stram, 2004; Wang et al., 2005; De La Vega et al., 2006), and in order to maximize the number of SNPs available for further analysis. In order to ascertain the suitability of the identified SNPs for NIPD and to estimate the percentage of families that would benefit, 101 couples at risk for β-thalassaemia were genotyped for the selected panel.
A total of 101 families from the carrier population were analyzed, each comprising mother, father and fetus. The families were typed for seven β-thalassaemia mutations and 44 SNPs located on the β-globin locus. The trio data were analyzed using PHASE v2.1.1 with options -P1, -X20 and -x5, which, respectively, specify that the data are in the form of family trios, allow extra run-time in the final optimization steps for more accurate haplotype- and population-frequency estimates and perform five independent analyses to select the one with the best goodness of fit for the final output. The resulting haplotype predictions were clustered by calculating a custom distance matrix for the allele sequences and performing hierarchical clustering (R version 2.14.1).
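The clustering step can be illustrated with a minimal sketch. The custom distance measure and the linkage method used in R are not specified here, so the sketch assumes a normalized Hamming distance between allele sequences and average linkage. The function names, the example haplotype strings and the cut height are hypothetical and serve illustration only.

```python
def hamming_fraction(a, b):
    """Fraction of SNP positions at which two haplotypes differ (assumed distance)."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same SNP positions")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_haplotypes(haplotypes, cut_height):
    """Naive average-linkage agglomerative clustering.

    Merging stops once the closest pair of clusters is further apart than
    cut_height, which corresponds to cutting the dendrogram at that height.
    Returns clusters as lists of indices into `haplotypes`.
    """
    clusters = [[i] for i in range(len(haplotypes))]

    def average_distance(c1, c2):
        total = sum(hamming_fraction(haplotypes[i], haplotypes[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cut_height:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical phased haplotypes over ten SNP positions.
example_haplotypes = ["AACGTTAGCA", "AACGTTAGCG", "GGTACCTGAT", "GGTACCTGAA"]
groups = cluster_haplotypes(example_haplotypes, cut_height=0.2)
print(groups)
```

With real data, the input would be the haplotype predictions produced by PHASE, and a different distance or linkage choice may change the grouping.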
Two hundred sixty-four SNPs located throughout the β-globin cluster on chromosome 11 were selected for analysis and were refined using the program ProxSNP, so that 124 SNPs were rejected and 140 SNPs were selected for subsequent genotype analysis. The relevant amplification primers were designed using PreXTEND, and multiplex reactions designed using MassARRAY® Designer (Sequenom, Inc., San Diego, CA, USA), so that the 140 SNPs resulted in a 35plex, a 34plex, a 33plex, a 28plex and a 10plex for genotype analysis (Table S1). After data acquisition, 13 SNPs turned out to have failed the analysis; therefore subsequent evaluations were performed for the remaining 127 SNPs, only (Table S2). For these, the degree of heterozygosity, i.e. frequency of the heterozygous SNP in the random sample population, was determined (Table 1). Forty-nine of 127 SNPs analyzed, positioned between nucleotide positions 52,118 and 82,429 on reference sequence NG_000007 (http://www.ncbi.nlm.nih.gov/nuccore/NG_000007), have a degree of heterozygosity equal to or higher than 6% (Table 2) and were therefore selected for the determination of their suitability for NIPD (Fig. 1).
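The shortlisting by degree of heterozygosity described above can be sketched as follows. This is a minimal illustration, assuming genotypes encoded as two-character strings and failed calls marked as `None` and excluded from the denominator. The encoding, the function names and the example SNP identifiers are hypothetical.

```python
def degree_of_heterozygosity(genotypes):
    """Percentage of successfully typed samples that are heterozygous.

    Returns None if the SNP failed in every sample.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    heterozygous = sum(1 for g in called if g[0] != g[1])
    return 100.0 * heterozygous / len(called)


def shortlist(genotype_table, cutoff=6.0):
    """Keep SNPs whose degree of heterozygosity is at least `cutoff` percent."""
    selected = {}
    for snp, genotypes in genotype_table.items():
        doh = degree_of_heterozygosity(genotypes)
        if doh is not None and doh >= cutoff:
            selected[snp] = doh
    return selected


# Hypothetical genotypes for 75 random samples per SNP.
example = {
    "snp_a": ["AG"] * 9 + ["AA"] * 60 + ["GG"] * 6,  # 9 of 75 heterozygous
    "snp_b": ["CT"] * 3 + ["CC"] * 72,               # 3 of 75 heterozygous
    "snp_c": [None] * 75,                            # failed assay
}
selected = shortlist(example)
print(selected)
```

In this example only `snp_a` passes the 6% cutoff; `snp_b` falls below it and `snp_c` is dropped as a failed assay.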
Table 1. Degree of heterozygosity (DoH) of the 127 SNPs analyzed for 75 random samples. Columns: DoH range (%); No. of SNPs. DoH ranges listed: 50 to <63, 40 to <50, 20 to <40, 10 to <20, 6 to <10, 1 to <6.
Table 2. Panel of 49 high-heterozygosity SNPs selected for the determination of informative SNPs for NIPD. Columns: Heterozygous (A/B), no. of samples; Allele A, no. of samples; Allele B, no. of samples.
SNPs were categorized according to their position relative to the β-globin gene and pseudogene sequences, as follows: [A] γ–Ψβ intergenic region, [B] Ψβ intragenic region, [C] Ψβ–δ intergenic region, [D] δ–β intergenic region, [E] β intragenic region and [F] post-β distal region. Nucleotide positions on the human genome are indicated relative to the 5′ end of reference sequence NG_000007. NA, not available; DoH, degree of heterozygosity.
Identification of Suitable SNPs
In order to ascertain the suitability of the identified SNPs for NIPD and to determine the percentage of the families that would benefit, the informative SNPs for each family were identified. One hundred one couples at risk for β-thalassaemia carrying seven different mutations were genotyped for the 49 highly heterozygous SNPs. More specifically, our population sample consisted of 404 chromosomes, of which 202 were normal and 202 were β-thalassaemia chromosomes. DNA samples from parent couples and the corresponding fetal chorionic villus samples were randomly assigned to positions on the 96-well plate and genotyped blindly for the highly heterozygous SNPs. Of 49 SNPs, six failed to be analyzed, so that our genotyping data for the sample population comprise 43 SNPs.
Informative SNPs are essential to determine the paternally inherited allele and occur if a mother is homozygous for an allele (e.g. A/A) for which the father is heterozygous (e.g. A/B). In that case, the SNP will allow the determination of the paternal allele and hence inference of the phase, normal or β-thalassaemic. If the father is homozygous for the alternative allele (e.g. B/B), then the SNP is not used to determine the phase of the allele, but to confirm the presence of the paternal allele in the maternal sample. When the informative SNPs for 101 couples at risk for β-thalassaemia were ascertained (Fig. 2, dark grey), the analysis showed that 72.28% of the families have at least three informative SNPs, thus fulfilling our criteria for an accurate NIPD with internal redundancy. For the same set of samples, single markers would have allowed much lower coverage of carrier couples, and without the benefit of inherent redundancy of markers. For instance, analysis of the most common informative single SNP, rs3813727, would only have been informative for 30.7% of families examined, and analysis of disease-causing mutations alone would only have allowed analysis of 26.7% of families, since 73.3% of couples share the same mutation. However, our analysis revealed that 27.72% of the families at risk have fewer than three informative SNPs and are consequently unsuitable for qualitative NIPD. Therefore, there is a strong need to have a large number of highly heterozygous SNPs, so that more families will fulfil the criteria for NIPD based on the qualitative detection of a diagnostic SNP signature.
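The qualitative criteria above can be expressed as a short sketch. It assumes biallelic SNPs encoded as two-character genotype strings, and it assumes that only phase-informative SNPs count towards the threshold of three, which is one possible reading of the criteria. The function names and example genotypes are hypothetical.

```python
def is_homozygous(genotype):
    return genotype[0] == genotype[1]


def classify_qualitative(mother, father):
    """Classify one SNP for qualitative detection of the paternal allele.

    "phase":    mother homozygous, father heterozygous (e.g. A/A and A/B)
    "presence": mother homozygous, father homozygous for the other allele
    "none":     all other parental genotype combinations
    """
    if not is_homozygous(mother):
        return "none"
    if not is_homozygous(father):
        return "phase"
    if father[0] != mother[0]:
        return "presence"
    return "none"


def eligible_for_qualitative_nipd(couple, minimum=3):
    """True if the couple has at least `minimum` phase-informative SNPs."""
    informative = [snp for snp, (mother, father) in couple.items()
                   if classify_qualitative(mother, father) == "phase"]
    return len(informative) >= minimum


# Hypothetical parental genotypes as (mother, father) per SNP.
couple = {
    "snp_1": ("AA", "AG"),
    "snp_2": ("CC", "CT"),
    "snp_3": ("GG", "GG"),
    "snp_4": ("TT", "CC"),
    "snp_5": ("AG", "AA"),
    "snp_6": ("CC", "CG"),
}
print(eligible_for_qualitative_nipd(couple))
```

Here `snp_1`, `snp_2` and `snp_6` are phase-informative, so this hypothetical couple meets the threshold of three.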
A means of increasing the number of families that would benefit from NIPD would be a relative quantification of SNPs in maternal blood, rather than their qualitative detection. To this end, one could adapt and expand the RMD approach for use with multiple SNPs. RMD analysis is based on the detectability of allelic over- or underrepresentation of sequences in plasma DNA, compared to the genotype of the mother, and was first described and developed by Lun et al. for the detection of CD41/42 (-CTTT) and haemoglobin E mutations on the HBB locus (Lun et al., 2008b). In this vein, suitable SNPs for RMD detection are present when the mother is heterozygous for the SNP (A/B), regardless of the genotype of the father. In addition, informative SNPs for quantitative analyses also result if the mother is homozygous for one allele (A/A) and the father is either heterozygous (A/B) or homozygous for the alternative allele (B/B). Based on these considerations, we determined the proportion of the 43 SNPs that was suitable for quantitative RMD-based genotyping of the 101 carrier couples under study (Fig. 2, medium grey). The analysis showed that 91.09% of families could be analyzed based on this approach, with only 8.91% of families showing fewer than three suitable SNPs, and thus too few to allow NIPD with internal redundancy. Moreover, if one uses a combination of both qualitative and quantitative approaches, 99.01% of families could be analyzed for NIPD, with only 0.99% having insufficient suitable SNPs for analysis (Fig. 2, light grey).
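The suitability rule for quantitative analysis can be sketched in the same way. The sketch assumes biallelic SNPs encoded as two-character genotype strings and applies the same threshold of three SNPs per family. The function names and example families are hypothetical, and the sketch does not attempt to model how the qualitative and quantitative approaches were combined.

```python
def suitable_for_quantitative(mother, father):
    """True if a SNP meets the criteria for quantitative (RMD-like) analysis.

    Suitable when the mother is heterozygous, or when the mother is homozygous
    and the father is heterozygous or homozygous for the alternative allele.
    """
    if mother[0] != mother[1]:
        return True
    father_heterozygous = father[0] != father[1]
    return father_heterozygous or father[0] != mother[0]


def percent_eligible_families(families, minimum=3):
    """Percentage of families with at least `minimum` suitable SNPs."""
    eligible = 0
    for snps in families:
        suitable = sum(1 for mother, father in snps
                       if suitable_for_quantitative(mother, father))
        if suitable >= minimum:
            eligible += 1
    return 100.0 * eligible / len(families)


# Two hypothetical families, each a list of (mother, father) genotypes.
families = [
    [("AG", "AA"), ("CC", "CT"), ("GG", "TT"), ("AA", "AA")],  # 3 suitable SNPs
    [("AA", "AA"), ("CC", "CC"), ("AG", "GG"), ("TT", "TT")],  # 1 suitable SNP
]
print(percent_eligible_families(families))
```

Under this rule, a SNP is excluded only when both parents are homozygous for the same allele, which is why more SNPs qualify than under purely qualitative detection.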
Haplotype Analysis and Association with β-globin Gene Mutations
In addition to the identification of potentially informative and suitable SNPs in the Cypriot population, we also aimed to extrapolate prevailing haplotypes to investigate the linkage disequilibrium between individual SNPs and hence their usefulness in NIPD. The genotypes of the mother and father were thus phased against the genotypes of their children in order to obtain estimates of haplotypes in the carrier population of Cyprus. The inferred haplotypes were then clustered hierarchically to characterize haplotype subgroups in the parent population (Fig. S1). Seventy-four predicted haplotypes fell below a frequency of 0.25% (i.e. below 1 in 404 haplotypes) and were therefore statistically not represented in the parent population of 202 individuals. Although a haplotype with a lower predicted frequency might still be present in the sample and one with a higher predicted frequency might still be absent, 0.25% appeared the least arbitrary cutoff point to choose for a statistical analysis of inferred haplotypes. Any plainly impossible haplotypes (such as a co-occurrence of different primary mutations within the same haplotype) were effectively excluded using this cutoff (Fig. S1). The 48 other, high-prevalence haplotypes with a predicted frequency of at least 0.25% together accounted for 88% of all haplotype frequencies and were clustered independently (Fig. 3). Seven major groups of haplotypes were observed at a branch height of 0.2 for these high-prevalence haplotypes. One major cluster (Fig. 3 cluster 5) accounted for 56% of all haplotype frequencies in the test population and contained 10 of the 11 high-prevalence haplotypes containing the IVSI-110 mutation (rs35004220 g/a), with one of these haplotypes predicted to account on its own for 31.37%. 
Statistically, this haplotype alone is thus responsible for 9.8% (31.37% × 31.37%) of couples whose haplotype phased with the disease allele will be identical and who will therefore not be distinguishable with the given set of SNPs by purely qualitative detection.
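The figure of 9.8% follows from squaring the predicted haplotype frequency, which treats the disease-linked haplotypes of the two partners as independent draws from the carrier population. A minimal sketch of this arithmetic, with a hypothetical function name:

```python
def identical_haplotype_probability(frequency):
    """Probability that both partners carry the same specific haplotype,
    assuming the two haplotypes are drawn independently."""
    return frequency ** 2


# 31.37% is the predicted frequency of the most common IVSI-110 haplotype.
probability = identical_haplotype_probability(0.3137)
print(f"{probability:.1%}")  # prints 9.8%
```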
Prevalence, migratory spread, severity and expense of currently available and mostly palliative treatment of thalassaemia make its prevention a global health care priority (Lederer et al., 2009). Therefore, the potential impact of routine NIPD for β-thalassaemia is immense, making the development of a reliable NIPD assay of paramount importance. The detection of paternally inherited disease alleles of the fetus provides potentially robust identification of healthy fetal genotypes. With the aim of increasing the reliability of an NIPD assay for β-thalassaemia and extending the range of testable families, we investigated the use of multiple SNPs for the identification of the paternal allele and its potential association with the mutation. In the current study, we analyzed 75 samples from the Greek Cypriot population for 127 SNPs and determined the degree of heterozygosity for each SNP. Those SNPs with the highest heterozygosity were shortlisted for the identification of informative SNPs per family, with a view to using them as NIPD markers. There is no clear-cut consensus on the number of SNPs required for the clinical prediction of genotypes, as this cutoff critically depends on the reliability of SNP detection and on their linkage disequilibrium with the trait under study and for the population at hand (Doescher et al., 2012; Huang et al., 2012; Phylipsen et al., 2012). For the sake of this analysis, we assume that three SNPs constitute the minimum acceptable under any scenario, providing merely twofold redundancy for failure of SNP detection, while still comparing favourably to commonly accepted diagnostic approaches for the direct detection of primary mutations, which provide no redundancy (Vrettou et al., 2003; Baris et al., 2010). Our study showed that more than two-thirds (72.28%) of the couples examined had three or more informative SNPs and as a result might benefit from the proposed NIPD approach, given improvements in SNP detection technology.
This provides an unprecedented endorsement for the potential of SNP analysis in the effective and accurate NIPD of β-thalassaemia. Statistically, for half of the couples suitable for testing (36.14%), detection of a normal paternally inherited allele would be expected and invasive prenatal diagnosis could be avoided. As a limitation for the set of SNP markers investigated in this study, 14.85% of the families had only one or two and 12.87% of families had no SNPs informative for NIPD. Our analysis, based on initially 127 SNPs and with a lenient threshold of three informative SNPs, thus excluded almost one-third of families tested, demonstrating the importance of having a large set of SNPs for genotyping when using conventional detection and analysis methods.
Based on the same threshold, however, coverage of families for our own data set would be greatly extended by using alternative analysis approaches. When using a quantitative approach analogous to RMD, more SNPs fulfil the inclusion criteria, to the extent that based on our analysis 91.09% of families would benefit from this approach with only 8.91% of the families excluded. In cases where both approaches, qualitative and quantitative, are applied complementarily, almost all families (99.01%) can be analyzed for NIPD. RMD is based on the quantitative discrimination of small imbalances in concentrations between mutant and wild-type alleles in maternal plasma (Lun et al., 2008b). Therefore, the quantitative RMD-based deduction of the fetal genotype currently relies on technology at the cutting edge of the field, of high precision and analytical power. This present limitation is set to be overcome with further development and a standardization of methods and, like conventional qualitative detection, will be far more robust when based on multiple SNPs than when relying on detecting a single primary mutation.
Indeed, in line with previous publications (Ratip et al., 1997), the current work has shown for β-thalassaemia in Cyprus that in comparison to methods based on single markers, analysis drawing on multiple SNPs has markedly increased statistical power. Moreover, the suggested approach is scalable, and with the ongoing identification of additional SNPs in our laboratory, will enable the differentiation of the maternal from the paternal haplotype with a higher level of accuracy and for a greater proportion of carrier couples. Importantly, many of the SNPs identified as informative in the present study are already routinely used in our laboratory in experimental NIPD analyses (Papasavva et al., 2006, 2008 and unpublished data), thus paving the way from the present conceptual study towards actual clinical application of our findings.
Island populations and those with a high level of consanguinity generally pose challenges to haplotype analyses, as they show a lower genetic diversity and hence fewer informative SNPs. In line with this, the estimated haplotypes in the carrier population of Cyprus, although numerous, mostly have low frequencies, with only seven haplotypes accounting for the majority of population diversity, so that the observed alleles between partners were often similar in the families under study. Moreover, the SNPs close to and co-segregating with the primary mutations are not helpful in the inference of haplotypes. To achieve better coverage of genetically homogeneous carrier populations, we therefore propose, in addition to quantitative and global detection technology, the inclusion of further markers for NIPD. Prime candidates currently under investigation are further SNPs covered in the present study and highlighted as genetically diverse in existing databases holding data for Mediterranean and European haplotypes (e.g. http://www.ensembl.org).
In the absence of alternatives, inclusion of additional SNPs is the logical choice for reliable NIPD of β-thalassaemia. Conventional wisdom has it that a balance needs to be struck between universality and therapeutic certainty on the one hand and cost of a routinely applicable test on the other. It is becoming apparent, however, that with the development and fall in price of high-throughput detection methods, such as next-generation sequencing technology, the cost of including additional SNPs and the reliability of quantitative (RMD-analogous) analyses might soon become marginal concerns. Although there are still many technical barriers and the cost is considerable, paired-end massively parallel sequencing has been used to measure relative haplotype dosage by analyzing thousands of SNPs for deducing the maternal inheritance of the fetus (Lo et al., 2010), while more recently the whole genome sequence of a fetus was determined noninvasively by sequencing (Fan et al., 2012; Kitzman et al., 2012). Other, more mundane problems persist, however, inherent to any prenatal diagnosis. For the purposes of the present study and inference of parental haplotypes, our analysis could rely on the availability of the fetal sample, in addition to the parental samples. In a clinical setting, NIPD analysis using SNPs would need to draw on additional family members in order to detect the haplotype and phase the paternally inherited allele, owing to the absence of information on the fetal genotype. In a real-life scenario, however, grandparental samples are often unavailable, thus posing a practical limitation to haplotype-based NIPD analyses. A possible means of overcoming this limitation is direct molecular haplotyping, an approach that does not rely on pedigree data and does not require previous amplification of the entire genomic region containing the selected markers (Ding & Cantor, 2003).
The advent of rapid and cost-effective techniques for DNA sequencing has made the measurement of the fetal genome easier. Digital PCR strategies have been used to successfully deduce the parental haplotypes of β-thalassaemia carriers, obviating the need for more family members, although the method is laborious and depends on the distribution of SNPs across the analyzed region (Lam et al., 2012). Moreover, the determination of the whole fetal genome has been demonstrated without the need for the paternal DNA (Fan et al., 2012; Kitzman et al., 2012).
This work demonstrates the feasibility of using SNP-based analyses for NIPD in families with β-thalassaemia and outlines the current limitations and necessary improvements for this method. The identification of further diverse and potentially informative SNPs for the target population of this study will move the results presented here further towards a clinical application of NIPD for β-thalassaemia. Moreover, the principle of using multiple heterozygous SNPs for NIPD of monogenic diseases, also in target populations of low genetic diversity, can be equally applied to other disorders, so that the proof of concept provided here might serve to encourage development of corresponding SNP-based NIPD approaches for other disorders. A future aim is the study of the frequency and applicability of the identified SNP markers in other neighbouring countries.
We thank Dr. Petros Kountouris for contributing statistical analysis. We thank Mrs. Elena Kyriacou for her formatting assistance.