Genetic determinants of haemolysis in sickle cell anaemia

Errata

This article is corrected by:

  1. Errata: Corrigendum Volume 166, Issue 3, 468, Article first published online: 9 July 2014

  • Drs. Gladwin and Steinberg contributed equally to this manuscript.
  • For details of the Walk-PHaSST Investigators, see Appendix I.

Correspondence: Martin H. Steinberg, Department of Medicine, Boston University School of Medicine, 72 E. Concord St., Boston, MA 02118, USA.

E-mail: mhsteinb@bu.edu

Summary

Haemolytic anaemia is variable among patients with sickle cell anaemia and can be estimated by reticulocyte count, lactate dehydrogenase, aspartate aminotransferase and bilirubin levels. Using principal component analysis of these measurements we computed a haemolytic score that we used as a subphenotype in a genome-wide association study. We identified in one cohort and replicated in two additional cohorts the association of a single nucleotide polymorphism in NPRL3 (rs7203560; chr16p13·3) (P = 6·04 × 10⁻⁷). This association was validated by targeted genotyping in a fourth independent cohort. The HBA1/HBA2 regulatory elements, hypersensitive sites (HS)-33, HS-40 and HS-48, are located in introns of NPRL3. Rs7203560 was in perfect linkage disequilibrium (LD) with rs9926112 (r² = 1) and in strong LD with rs7197554 (r² = 0·75) and rs13336641 (r² = 0·77); the latter is located between the HS-33 and HS-40 sites and next to a CTCF binding site. The minor allele for rs7203560 was associated with the −α3·7 thalassaemia gene deletion. When adjusting for HbF and α thalassaemia, the association of NPRL3 with the haemolytic score was significant (P = 0·00375) and remained significant when examining only cases without gene deletion α thalassaemia (P = 0·02463). Perhaps by independently down-regulating expression of the HBA1/HBA2 genes, variants of the HBA1/HBA2 gene regulatory loci, tagged by rs7203560, reduce haemolysis in sickle cell anaemia.

The phenotype of sickle cell anaemia is caused by sickle vasoocclusion and haemolytic anaemia (Kato et al, 2007). Haemolysis in this disease has been associated with complications that could result in part from vascular nitric oxide (NO) depletion due to scavenging by free plasma haemoglobin (Gladwin et al, 2004; Morris et al, 2005; Taylor et al, 2008). Plasma haemoglobin is a specific marker of intravascular haemolysis and red cell survival studies are the definitive measurement of haemolysis. These tests are rarely done, however, and are not available in large cohorts. Haemolysis can instead be estimated by the reticulocyte count, lactate dehydrogenase (LDH), aspartate aminotransferase (AST) and bilirubin levels, all of which are commonly measured in cohort studies, although none is specific for haemolysis (Kato et al, 2007; Hebbel, 2011).

High concentrations of foetal haemoglobin (HbF) decrease the polymerization tendency of sickle haemoglobin (HbS) (reviewed in Akinsheye et al, 2011). In sickle cell anaemia with concurrent α thalassaemia, the concentration of HbS in sickle erythrocytes is reduced, decreasing its polymerization tendency (reviewed in Steinberg & Embury, 1986). Both high HbF and α thalassaemia increase the lifespan of the sickle erythrocyte (De Ceulaer et al, 1983; Steinberg & Sebastiani, 2012). However, other genes might also affect cell lifespan.

We estimated the severity of haemolysis using a principal component analysis of the commonly measured markers of haemolysis (Gordeuk et al, 2009; Minniti et al, 2009). The development of such a component resolves the problem of dealing with correlated predictors in multivariate analyses and allows for adjustment of important confounders, such as gender, hydroxycarbamide use, age, haemoglobin levels, HbF, α thalassaemia and site variability in laboratory assay protocols and standards. The haemolytic score was associated with intravascular haemolysis as measured by plasma haemoglobin and red cell microparticles and had a significant inverse relationship to total haemoglobin levels, HbF and α thalassaemia (Nouraie et al, 2012). Therefore, the score can be used as a robust population-based continuous measure of haemolytic rate and is suited for studies of genetic factors that regulate the intensity of haemolytic anaemia in sickle cell anaemia. We used this score as a phenotype in a genome-wide association study (GWAS).

Methods

Study participants

The patient cohort used for discovery consisted of 1 117 patients from the Cooperative Study of Sickle Cell Disease (CSSCD; NCT00005277) homozygous for the HbS gene or with HbS-β0 thalassaemia. Four hundred and forty-nine patients from the Pulmonary Hypertension and Sickle Cell Disease with Sildenafil Therapy (Walk-PHaSST) study (NCT00492531) and 296 patients from the Pulmonary Hypertension and the Hypoxic Response in Sickle Cell Disease (PUSH) study (NCT00495638), with haemoglobin phenotypes similar to those of the discovery set, were used for replication of the discovery findings. The demographics of these studies have been described (Gaston & Rosse, 1982; Dham et al, 2009; Machado et al, 2011). For further validation, targeted genotyping was performed in a fourth cohort of 213 additional patients with sickle cell disease from London, UK. These studies were approved by the Institutional Review Boards of each participating institution.

Phenotype

We used principal component analysis to derive a haemolytic score from reticulocyte count, LDH, AST and bilirubin using the most informative component as previously suggested (Gordeuk et al, 2009; Minniti et al, 2009). This score is a linear combination of the 4 haemolytic variables with mean of 0.
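
A minimal sketch of how such a score could be computed is given below. It standardises four log-transformed markers and projects each patient onto the first principal component, which is found here by power iteration. The marker values are hypothetical, and the function names, the standardisation step and the sign convention are assumptions made for illustration rather than the pipeline used in the study.

```python
import math

def standardise(column):
    """Centre a list of values to mean 0 and scale to unit sample variance."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]

def first_component(rows, iterations=500):
    """Loadings of the first principal component, found by power iteration."""
    n, p = len(rows), len(rows[0])
    # Sample covariance matrix of the (already centred) columns.
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def haemolytic_score(markers):
    """markers: dict of marker name -> list of log-transformed values."""
    columns = [standardise(markers[name]) for name in sorted(markers)]
    rows = list(zip(*columns))
    loadings = first_component(rows)
    # Assumed sign convention: higher marker values give a higher score.
    if sum(loadings) < 0:
        loadings = [-x for x in loadings]
    return [sum(l * x for l, x in zip(loadings, row)) for row in rows]

# Hypothetical log-transformed values for six patients (illustration only).
example = {
    "reticulocyte": [2.1, 2.5, 2.3, 2.8, 1.9, 2.6],
    "ldh":          [5.8, 6.2, 6.0, 6.4, 5.7, 6.1],
    "ast":          [3.5, 3.9, 3.7, 4.1, 3.4, 3.8],
    "bilirubin":    [0.8, 1.4, 1.1, 1.7, 0.6, 1.3],
}
scores = haemolytic_score(example)
```

Because each column is centred before projection, the resulting scores have a mean of 0, in line with the description above.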

Serum total bilirubin, LDH, AST, and reticulocyte counts were measured using automated chemical and haematologic analysers. For the CSSCD patients, longitudinal bilirubin measurements were collected from phases 1, 2, and 3 of the study. Only steady state measurements 4 months removed from blood transfusion were used. The longitudinal measurements of these study patients were analysed using a Bayesian hierarchical mixed model that included a random effect per patient to account for the repeated measurements, as well as random intercept and age effects that were allowed to vary with the clinics. The random intercept and age effects were used to remove the between-site systematic differences. A Markov chain Monte Carlo method in OpenBUGS was used to estimate the predicted values, and log-transformed median predicted values for each of the 4 haemolytic markers were used in the principal component analysis (Milton et al, 2012). For the PUSH and Walk-PHaSST patients, log-transformed baseline values were used in the principal component analysis. Patient characteristics for the discovery, replication and validation cohorts are shown in Table 1.

Table 1. Patient characteristics

| Variable | CSSCD overall (n = 1,117) | CSSCD M (n = 585) | CSSCD F (n = 532) | Walk-PHaSST(a) overall (n = 449) | Walk-PHaSST(a) M (n = 207) | Walk-PHaSST(a) F (n = 242) |
| --- | --- | --- | --- | --- | --- | --- |
| Age | 17·86 (11·71) | 18·85 (11·98) | 16·79 (11·31) | 37·20 (13·20) | 35·81 (12·94) | 38·35 (13·34) |
| Haemoglobin | 2·11 (0·14) | 2·10 (0·13) | 2·12 (0·15) | 2·24 (0·69) | 2·26 (0·80) | 2·23 (0·55) |
| Haematocrit | 3·18 (0·15) | 3·17 (0·14) | 3·18 (0·17) | 3·08 (2·51) | 3·12 (2·54) | 3·01 (2·51) |
| HbF | 1·80 (0·52) | 1·86 (0·51) | 1·73 (0·52) | 1·64 (1·48) | 1·57 (1·49) | 1·69 (1·46) |
| α thalassaemia | 370 (33·3%) | 189 (32·3%) | 181 (34%) | 129 (30·1%) | 56 (28·4%) | 73 (31·6%) |
| AST | 3·74 (0·29) | 3·81 (0·27) | 3·67 (0·20) | 3·80 (1·35) | 3·91 (0·53) | 3·70 (0·50) |
| LDH | 6·02 (0·22) | 5·98 (0·21) | 6·06 (0·23) | 6·08 (0·59) | 6·00 (0·59) | 6·16 (0·96) |
| Reticulocyte | 2·36 (0·40) | 2·37 (0·39) | 2·35 (0·40) | 2·11 (0·73) | 2·15 (0·73) | 2·08 (0·73) |
| Bilirubin | 1·13 (0·51) | 1·19 (0·51) | 1·08 (0·51) | 2·48 (1·35) | 2·67 (1·38) | 2·35 (1·32) |
| Haemolytic score(b) | −2·97 × 10⁻¹⁸ (1·08) | 0·20 (1·53) | −0·22 (1·49) | −2·18 × 10⁻¹⁶ (1·00) | 0·29 (1·59) | −0·23 (1·51) |

| Variable | PUSH overall (n = 296) | PUSH M (n = 140) | PUSH F (n = 156) | London overall (n = 213) | London M (n = 94) | London F (n = 119) |
| --- | --- | --- | --- | --- | --- | --- |
| Age | 11·85 (5·42) | 12·04 (5·25) | 11·65 (5·61) | 33·02 (10·89) | 32·30 (10·55) | 33·59 (11·17) |
| Haemoglobin | 2·22 (0·57) | 2·23 (0·69) | 2·22 (0·39) | 2·11 (0·18) | 2·17 (0·18) | 2·08 (0·16) |
| Haematocrit | 3·28 (1·69) | 3·28 (1·81) | 3·28 (1·52) |  |  |  |
| HbF | 2·20 (1·98) | 2·09 (1·96) | 2·32 (1·97) | 1·79 (0·48) | 1·67 (0·47) | 1·89 (0·46) |
| α thalassaemia | 38 (32·2%) | 51 (43·22%) | 29 (24·58%) | 75 (35%) | 28 (30%) | 47 (39%) |
| AST | 3·85 (0·40) | 3·89 (0·40) | 3·81 (0·39) | 3·71 (0·32) | 3·78 (0·33) | 3·66 (0·31) |
| LDH | 6·17 (0·48) | 6·17 (0·43) | 6·17 (0·53) | 6·04 (0·32) | 6·04 (0·35) | 6·04 (0·31) |
| Reticulocyte | 2·17 (0·73) | 2·17 (0·80) | 2·17 (0·67) | 5·81 (0·43) | 5·89 (0·32) | 5·74 (0·48) |
| Bilirubin | 0·99 (0·65) | 1·08 (0·70) | 0·91 (0·58) | 3·86 (0·56) | 4·00 (0·58) | 3·75 (0·52) |
| Haemolytic score(b) | −7·17 × 10⁻¹⁶ (1·08) | 0·21 (1·62) | −0·22 (1·50) | 1·82 × 10⁻¹⁶ (1·06) | 0·29 (1·37) | −0·23 (1·32) |

Summary statistics of patient characteristics in the CSSCD discovery cohort, the PUSH and Walk-PHaSST replication cohorts and the London validation cohort. For each cohort, the first column reports statistics (mean and standard deviation, or frequencies) for all patients included in the analysis and the second and third columns report statistics stratified by gender. Aspartate transaminase (AST), lactate dehydrogenase (LDH), reticulocyte count, bilirubin, haemoglobin and haematocrit are reported as log-transformed values in the Walk-PHaSST and PUSH cohorts and as sex-, age- and clinic-adjusted values for the CSSCD cohort. HbF values are reported as cubic root transformed values in the Walk-PHaSST and PUSH cohorts and as sex-, age- and clinic-adjusted values for the CSSCD cohort. M, male; F, female.

(a) Walk-PHaSST variables measured in SI units.

(b) The haemolytic score is a linear combination of the 4 haemolytic variables with mean of 0. Males have higher values than females in all 4 cohorts studied.

Heritability of haemolysis

To examine heritability of the 4 haemolytic markers and the haemolytic score in the CSSCD population, we examined their correlation in 90 sibling pairs that could be identified by identity by descent (IBD) analysis in PLINK (Purcell et al, 2007) using the genome-wide single nucleotide polymorphism (SNP) data. As a comparison, we randomly selected 200 unrelated pairs 1 000 times and computed the average correlation of the 4 haemolytic markers and the haemolytic score among unrelated individuals.
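
The unrelated-pair comparison can be sketched as follows, assuming a list of per-patient values for one trait. The helper names and the simulated values are hypothetical; the sketch only illustrates drawing random pairs repeatedly and averaging the Pearson correlation, and it does not reproduce the IBD step performed in PLINK.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_unrelated_correlation(values, n_pairs=200, n_repeats=1000, seed=0):
    """Average correlation over repeated draws of random pairs of individuals."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_repeats):
        first, second = [], []
        for _ in range(n_pairs):
            i, j = rng.sample(range(len(values)), 2)  # two distinct people
            first.append(values[i])
            second.append(values[j])
        total += pearson(first, second)
    return total / n_repeats

# Simulated trait values for 500 unrelated individuals (illustration only).
_rng = random.Random(1)
simulated_scores = [_rng.gauss(0.0, 1.0) for _ in range(500)]
baseline = mean_unrelated_correlation(simulated_scores, n_repeats=100)
```

The sibling-pair correlation would be obtained by calling `pearson` on the two siblings' values; a value clearly above the unrelated baseline is what the comparison looks for.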

Genotyping

DNA from the CSSCD, PUSH and Walk-PHaSST samples that formed the discovery and replication cohorts was genotyped using Illumina Human610-Quad SNP arrays (Illumina, San Diego, CA, USA) with approximately 600 000 SNPs. All samples were processed according to the manufacturer's protocol and BeadStudio software was used to make genotype calls utilizing the Illumina pre-defined clusters. Samples with less than a 95% call rate were removed and SNPs with a call rate <97·5% were re-clustered. After re-clustering, SNPs with call rates >97·5%, cluster separation score >0·25, excess heterozygosity between −0·10 and 0·10, and minor allele frequency >5% were retained in the analysis. We used the genome-wide identity by descent analysis in PLINK to discover unknown relatedness. Pairs with IBD measurements greater than 0·2 were deemed to be related and related subjects within individual or different studies were removed. We also removed samples with inconsistent gender findings defined by heterozygosity of the X chromosome and gender recorded in the database.
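
The quality control thresholds above can be written as a simple filter. This is an illustrative sketch only: the class and field names are hypothetical and do not correspond to BeadStudio or PLINK output columns, and treating the heterozygosity bounds as exclusive is an assumption, since the text does not say whether the limits are included.

```python
from dataclasses import dataclass

@dataclass
class SnpMetrics:
    call_rate: float               # proportion of samples with a genotype call
    cluster_separation: float      # cluster separation score
    excess_heterozygosity: float   # excess heterozygosity estimate
    minor_allele_frequency: float  # MAF as a proportion

def sample_passes_qc(sample_call_rate):
    """Samples with a call rate below 95% are removed."""
    return sample_call_rate >= 0.95

def snp_passes_qc(m):
    """Apply the post-re-clustering SNP thresholds described in the text."""
    return (m.call_rate > 0.975
            and m.cluster_separation > 0.25
            and -0.10 < m.excess_heterozygosity < 0.10  # bounds assumed exclusive
            and m.minor_allele_frequency > 0.05)

# Hypothetical SNPs: the second fails on minor allele frequency.
common_snp = SnpMetrics(0.99, 0.40, 0.02, 0.21)
rare_snp = SnpMetrics(0.99, 0.40, 0.02, 0.03)
```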

Samples from the validation cohort were genotyped using predesigned TaqMan SNP genotyping assays according to standard Applied Biosystems protocol (Applied Biosystems, Carlsbad, CA, USA). For each sample, 10 ng DNA was dried in a 384-well plate, to which was added a 5 μl reaction volume (2·5 μl TaqMan universal polymerase chain reaction master mix, 2·375 μl Sigma water, 0·125 μl assay). The samples were run on the ABI7900 under the following thermal cycling conditions: 50°C for 2 min, 95°C for 10 min, 40 cycles of 95°C for 15 s and 60°C for 1 min. For each assay run, 12 control samples and 3 non-template controls were included. Genotype data were exported from Sequence Detection System (SDS) 2·2 software (Applied Biosystems).

In all studies, the presence of gene deletion α thalassaemia was directly ascertained by restriction endonuclease analysis or by multiplex polymerase chain reactions (Dozy et al, 1979; Liu et al, 2000).

Analysis

The association between haemolytic score and each SNP was tested with multiple linear regression adjusting for age and sex using an additive model in PLINK (Purcell et al, 2007). Age and gender were both included as covariates as both were significantly associated with the haemolytic score (age P-value = 3·40 × 10⁻¹¹, sex P-value = 1·74 × 10⁻¹¹). The minor allele was used as the coded allele in the additive model. To assess population stratification in the CSSCD cohort the genomic control lambda factor was calculated using PLINK. A meta-analysis was also performed with all 4 cohorts with Meta Analysis Helper (METAL) using the inverse variance weighting method (http://www.sph.umich.edu/csg/abecasis/metal). To determine if there was an association between our top genetic variant and α thalassaemia, a χ² test was implemented in the CSSCD cohort.
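
As an illustration of the additive model, the sketch below fits the haemolytic score on the count of minor alleles (0, 1 or 2) together with age and sex by ordinary least squares. It is not PLINK's implementation: the function names are hypothetical, the eight patients are invented, and the scores are generated without noise from assumed coefficients so that the fit can be checked by eye. Only the coefficients are returned; standard errors and P-values are omitted.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def additive_model(score, genotype, age, sex):
    """Least-squares fit of score ~ intercept + genotype + age + sex."""
    rows = [[1.0, float(g), float(a), float(s)]
            for g, a, s in zip(genotype, age, sex)]
    p = 4
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, score)) for i in range(p)]
    return dict(zip(["intercept", "snp", "age", "sex"], solve(xtx, xty)))

# Hypothetical patients; genotype is the number of minor (coded) alleles.
age = [5, 12, 18, 25, 31, 40, 9, 22]
sex = [1, 0, 1, 0, 1, 0, 0, 1]
genotype = [0, 1, 0, 2, 0, 1, 0, 0]
# Noise-free scores built from assumed coefficients (illustration only).
score = [1.0 + 0.02 * a + 0.4 * s - 0.44 * g
         for a, s, g in zip(age, sex, genotype)]
fit = additive_model(score, genotype, age, sex)
```

A negative `fit["snp"]` corresponds to a lower haemolytic score for each additional copy of the coded allele.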

To determine the exact coordinates of the hypersensitive site region in chromosome 16, we used the Basic Local Alignment Search Tool (BLAST; Altschul et al, 1990) and VISTA (Mayor et al, 2000) to compare and align sequences in regions containing the regulatory regions of the α-globin cluster using methods described by Hughes et al (2005). Phylogenetic conservation at, or near, the hypersensitive sites was also determined (Hughes et al, 2005).

Results

Patient characteristics

Table 1 shows demographics, haematological data, laboratory measures of haemolysis and the derived haemolytic score for all 4 cohorts. Males had a higher haemolytic score compared with females. Participants in the Walk-PHaSST study were adults and had the highest mean age, patients in the PUSH study were children and had the lowest mean age, and CSSCD cases included both adults and children and had intermediate ages. There was a significant association between age and the haemolytic score (r = 0·195, P-value = 3·40 × 10⁻¹¹) and this was adjusted for in the analysis. CSSCD cases did not take hydroxycarbamide, whereas 58% of Walk-PHaSST and 44% of PUSH patients were treated with hydroxycarbamide.

The first principal component explained 67·4% of the total variance and was used as the haemolytic score in the CSSCD cohort. The haemolytic score increased as each of the 4 markers of haemolysis increased (Fig S1). This consistency of effect across the 4 haemolytic markers indicates that the first principal component is a good marker of haemolysis.

Heritability analysis

In the 90 sibling pairs identified in the CSSCD there was a significant positive correlation (r = 0·24, P-value = 0·02) of haemolytic score, whereas there was no correlation among randomly selected unrelated pairs, suggesting that the haemolytic score is heritable. Table 2 shows similar heritability for LDH, AST, bilirubin and reticulocyte count.

Table 2. Heritability analysis for haemolytic markers and haemolytic score

| Clinical variable | Sibling pairs: correlation | Sibling pairs: P-value | Unrelated pairs: correlation | Unrelated pairs: P-value |
| --- | --- | --- | --- | --- |
| AST | 0·559 | 3·94 × 10⁻¹⁰ | 0·002 | 0·4982 |
| LDH | 0·307 | 0·0048 | −0·0013 | 0·4978 |
| Serum bilirubin | 0·304 | 0·0036 | −0·0054 | 0·4866 |
| Reticulocyte count | 0·495 | 6·01 × 10⁻⁸ | 0·0013 | 0·512 |
| Haemolytic score | 0·24 | 0·02271 | 0·0002 | 0·5051 |

AST, aspartate transaminase; LDH, lactate dehydrogenase.

Genome-wide association study and targeted genotyping

After quality control there were 569 554 SNPs left for the GWAS. There was no significant correlation between the top 10 principal components and the haemolytic score, suggesting no evidence for confounding by population stratification.

The Manhattan plot summarizing the results of the GWAS is shown in Fig 1 and the results for the most significantly associated SNPs are summarized in Table 3. The QQ plot in Fig 2 shows no inflation and a genomic lambda factor of 1·01 was calculated, indicating that there is no confounding due to population stratification. Although no SNP met the 10⁻⁸ level of genome-wide significance, rs7203560 in NPRL3 was associated with haemolytic score with a P-value of 6·04 × 10⁻⁷. Rs7203560 lies ~30 kb upstream from HS-33. The minor allele of this NPRL3 SNP shows a protective effect; as the number of minor alleles increases the haemolytic score decreases, reflecting decreased haemolysis. These results were replicated in the Walk-PHaSST cohort (β = −0·52, P-value = 0·0143).
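
For readers unfamiliar with the genomic lambda factor, the sketch below shows one common way to compute it: each P-value is converted to a 1-degree-of-freedom χ² statistic and the median is divided by the median of the χ² distribution expected under the null (about 0·455). The function name and the example P-values are hypothetical, and the sketch is not PLINK's code; it assumes every P-value is greater than 0 and at most 1.

```python
from statistics import NormalDist, median

MEDIAN_CHI2_1DF = 0.4549364  # median of a chi-square distribution with 1 df

def genomic_lambda(p_values):
    """Median-based genomic inflation factor from two-sided P-values."""
    std_normal = NormalDist()
    # A two-sided P-value p corresponds to a z-score at the p/2 quantile,
    # and z squared follows a 1-df chi-square distribution under the null.
    chi2 = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / MEDIAN_CHI2_1DF

# Evenly spaced P-values mimic a GWAS with no inflation (illustration only).
n = 1001
uniform_p = [(i + 0.5) / n for i in range(n)]
lam = genomic_lambda(uniform_p)

# Systematically smaller P-values give a lambda above 1.
inflated = genomic_lambda([p ** 2 for p in uniform_p])
```

A value close to 1, such as the 1·01 reported here, indicates that the test statistics are not systematically inflated.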

Table 3. SNPs associated with haemolytic score

| SNP | Chr | Gene | bp | Coded allele | Non-coded allele | MAF | Meta-analysis β | Meta-analysis SE | Meta-analysis P-value | CSSCD β | CSSCD SE | CSSCD P-value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rs7203560 | 16 | NPRL3 | 184390 | C | A | 0·07 | −0·44 | 0·07 | 2·06 × 10⁻⁹ | −0·44 | 0·09 | 6·04 × 10⁻⁷ |
| rs7948471 | 11 | OR51I2,OR51I1 | 5471746 | A | G | 0·21 | −0·26 | 0·04 | 3·03 × 10⁻¹⁰ | −0·21 | 0·05 | 5·87 × 10⁻⁵ |
| rs7938426 | 11 | OR51I2,OR51I1 | 5471832 | G | A | 0·21 | −0·25 | 0·04 | 1·09 × 10⁻⁸ | −0·21 | 0·05 | 6·08 × 10⁻⁵ |
| rs2445284 | 11 | OR51L1 | 5029703 | G | A | 0·05 | −0·82 | 0·07 | 1·34 × 10⁻²⁹ | −0·42 | 0·10 | 7·39 × 10⁻⁵ |

| SNP | Walk-PHaSST β | Walk-PHaSST SE | Walk-PHaSST P-value | PUSH β | PUSH SE | PUSH P-value | London β | London SE | London P-value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rs7203560 | −0·53 | 0·22 | 0·014 | −0·13 | 0·31 | 0·672 | −0·42 | 0·20 | 0·0367 |
| rs7948471 | −0·35 | 0·11 | 0·002 | −0·39 | 0·14 | 0·004 | −0·46 | 0·15 | 0·017 |
| rs7938426 | −0·37 | 0·11 | 0·0012 | −0·39 | 0·14 | 0·004 | NA | NA | NA |
| rs2445284 | −1·32 | 0·15 | 1·60 × 10⁻¹⁷ | −1·75 | 0·17 | 3·01 × 10⁻²¹ | 0·41 | 0·28 | 0·146 |

Single nucleotide polymorphisms (SNPs) meeting the significance threshold (5 × 10⁻⁴) in the CSSCD study that replicate in the three independent cohorts. The table reports the SNP identifier from dbSNP, chromosome (Chr), physical coordinates [human genome (hg)19], the coded allele in PLINK (also the minor allele) and the non-coded allele, the minor allele frequency (MAF), the gene clusters where the SNP is located, and the regression coefficient (β), standard error (SE) and P-value in each study. Additive models of association were used in all studies adjusting for age and gender. bp denotes the position of the SNP according to hg19 coordinates. The variant information (Chr, gene, bp, alleles and MAF) applies to both parts of the table.
Figure 1.

Manhattan Plot of Haemolytic Score. Manhattan plot summarizing the results of the genome-wide association study of haemolytic score with minor allele frequency >0·05.

Figure 2.

QQ plot from CSSCD for single nucleotide polymorphisms with minor allele frequency >0·05.

Genome-wide data were not available for the London cohort, whose members were largely of African/West African/Afro-Caribbean origin. In this validation cohort, rs7203560 was associated with lower haemolytic score (P-value = 0·03674). Rs7203560 was not as highly associated with the 4 haemolytic markers individually after adjusting for age and gender in the CSSCD cohort (Table 4). These results suggest that using a summary trait (haemolytic score) gives more power to detect variants that are associated with all 4 of the haemolytic variables with consistent effects than performing 4 individual GWAS.

Table 4. Association of rs7203560 with four haemolytic markers

| Haemolytic marker | β | Standard error | P-value |
| --- | --- | --- | --- |
| Reticulocyte count | −0·15 | 0·03 | 5·12 × 10⁻⁶ |
| Serum bilirubin | −0·09 | 0·04 | 0·0099 |
| LDH | −0·07 | 0·02 | 0·0003 |
| AST | −0·07 | 0·02 | 0·0026 |

The association of rs7203560 with each of the 4 haemolytic markers individually using an additive model and adjusting for age and gender using data from the CSSCD.

AST, aspartate transaminase; LDH, lactate dehydrogenase.

Four SNPs in olfactory receptor (OR) genes on chromosome 11p, in OR51L2 (rs1391617: β = −0·17, P-value = 0·0003), OR51L1 (rs2445284: β = −0·42, P-value = 7·39 × 10⁻⁵) and OR51L1/OR51L2 (rs7938426: β = −0·21, P-value = 6·08 × 10⁻⁵; rs7948471: β = −0·21, P-value = 5·87 × 10⁻⁵), were associated with the haemolytic score, although they did not reach the generally accepted level for genome-wide significance. These results were replicated in the Walk-PHaSST and PUSH cohorts and rs7948471 in OR51L1/OR51L2 was validated in the London cohort (Table 3).

When a meta-analysis of genome-wide data was done, rs7203560 and the OR gene polymorphisms all met genome-wide significance levels (Table 3).
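
The inverse variance weighting used for the meta-analysis can be illustrated with the per-cohort estimates for rs7203560 from Table 3. The sketch below is a fixed-effects calculation written for this example, not METAL itself, and the function name is hypothetical. Because the tabulated β and SE values are rounded, the pooled result should land close to, but not exactly on, the published meta-analysis estimate.

```python
import math

def inverse_variance_meta(estimates):
    """Fixed-effects pooling of (beta, standard_error) pairs, one per cohort."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    # Two-sided P-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value

# rs7203560 estimates (beta, SE) as printed in Table 3:
# CSSCD, Walk-PHaSST, PUSH and London.
rs7203560 = [(-0.44, 0.09), (-0.53, 0.22), (-0.13, 0.31), (-0.42, 0.20)]
pooled_beta, pooled_se, pooled_p = inverse_variance_meta(rs7203560)
```

Cohorts with smaller standard errors receive larger weights, so the CSSCD estimate dominates the pooled effect.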

Two SNPs located in BCL11A (rs766432: β = 0·22, P-value = 9·27 × 10⁻⁷; rs10195871: β = 0·18, P-value = 9·87 × 10⁻⁵) that were associated with HbF in sickle cell anaemia in a previous GWAS were also associated with haemolytic score (Solovieff et al, 2010).

Eight SNPs had regression coefficients in the same direction in the CSSCD cohort and the Walk-PHaSST and PUSH cohorts. The results for the annotated SNPs are shown in Table 3.

Linkage disequilibrium patterns

We hypothesized that rs7203560 is a marker for genetic variants in or near the hypersensitive site regions that are known to regulate expression of the HBA1/HBA2 genes (Fig 3).

Figure 3.

Location of the α-globin regulatory elements (blue brackets) within introns of NPRL3 and the linkage disequilibrium patterns of rs7203560 according to HapMap. Rs7203560 is circled in yellow. Circled in red are single nucleotide polymorphisms (SNPs) for which it was possible to calculate linkage disequilibrium. The allele frequencies for the SNPs are as follows: 0·139 for rs7203560, 0·175 for rs13336641, 0·175 for rs9926112 and 0·219 for rs7197554.

Using Haploview (Barrett et al, 2005), we found that rs7203560 was in perfect linkage disequilibrium (LD) with rs9926112 (r² = 1) and in strong LD with rs7197554 (r² = 0·75) and rs13336641 (r² = 0·77). SNPs rs7197554 and rs9926112 are located approximately 10 kb upstream from the HS-33 site and rs13336641 is located approximately 3–4 kb upstream from HS-40 and right at the edge of CTCF and other transcription factor binding sites (ENCODE data). While we were able to determine that rs7203560 was in high LD with SNPs near the HS regions, we were unable to determine if our top SNP was in LD with SNPs in the HS regions due to the low minor allele frequencies of these SNPs. Resequencing is required to determine if the haemolytic score is associated with novel variants in the HS region.
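
The r² measure quoted above can be illustrated with a small calculation from haplotype counts. The sketch assumes phased haplotypes are already available, whereas in practice haplotype frequencies generally have to be estimated from unphased genotypes; the function name and the counts are hypothetical.

```python
def ld_r_squared(both_minor, first_only, second_only, neither):
    """r-squared between two biallelic SNPs from phased haplotype counts.

    both_minor:  haplotypes carrying the minor allele at both SNPs
    first_only:  minor allele at the first SNP only
    second_only: minor allele at the second SNP only
    neither:     major allele at both SNPs
    """
    total = both_minor + first_only + second_only + neither
    p_first = (both_minor + first_only) / total
    p_second = (both_minor + second_only) / total
    # D is the excess of the double-minor haplotype over independence.
    d = both_minor / total - p_first * p_second
    denominator = p_first * (1 - p_first) * p_second * (1 - p_second)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denominator

# Hypothetical counts: the minor alleles always travel together (perfect LD).
perfect = ld_r_squared(14, 0, 0, 86)
# Hypothetical counts: alleles combine independently (no LD).
independent = ld_r_squared(25, 25, 25, 25)
# Hypothetical counts: partial LD.
partial = ld_r_squared(10, 5, 5, 80)
```

Perfect LD (r² = 1) requires the two SNPs to have the same allele frequency, which is why rs7203560 and a perfectly linked SNP would carry identical association information in the same sample.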

Effects of α thalassaemia and HbF

As the number of minor alleles of rs7203560 increases from 0 to 1 to 2, the odds of at least one HBA1/HBA2 deletion (−α3·7) increases (0·33 to 2·18 to 3, respectively; χ²₂ = 97·32, P-value <0·0001). The odds of having at least one HBA1/HBA2 deletion is 0·39 for individuals without a copy of the rs7203560 protective allele and is 2·23 for individuals with a copy of the protective allele (χ²₁ = 106·15, P-value <0·0001). To determine if the association seen between rs7203560 and the haemolytic score was a result of gene deletion α thalassaemia, a 2-way analysis of variance was performed including the interaction between rs7203560 and α thalassaemia. The interaction was not significant (χ²₁ = 0·23, P-value = 0·4623) and the lack of interaction shows that α thalassaemia does not modify the association between the haemolytic score and rs7203560.
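
To make the odds and χ² statistics concrete, the sketch below computes both for a 2 × 2 table of allele carrier status against deletion status. The counts are hypothetical and were chosen only to give odds of a similar size to those reported; they are not the study data, and the function names are illustrative.

```python
import math

def odds(n_with, n_without):
    """Odds of an outcome: number with it divided by number without it."""
    return n_with / n_without

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    total = a + b + c + d
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of a 1-df chi-square distribution.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts (illustration only).
carriers_with_deletion, carriers_without = 70, 30
noncarriers_with_deletion, noncarriers_without = 280, 720

odds_carriers = odds(carriers_with_deletion, carriers_without)
odds_noncarriers = odds(noncarriers_with_deletion, noncarriers_without)
statistic, p_value = chi_square_2x2(carriers_with_deletion, carriers_without,
                                    noncarriers_with_deletion, noncarriers_without)
```

Odds above 1 mean that the deletion is more likely than not in that group, which is the pattern described for carriers of the protective allele.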

There was a significant association between HbF and the haemolytic score after adjusting for α thalassaemia status (β = −0·04, P-value = 4·00 × 10⁻¹³). After adjusting for HbF and α thalassaemia in a GWAS, rs7203560 remained significantly associated with the haemolytic score (β = −0·28, P-value = 0·00375) indicating that the association between this SNP and the haemolytic score is not solely due to gene deletion α thalassaemia and HbF. When we removed patients who had at least one HBA1/HBA2 deletion the association between rs7203560 and the haemolytic score remained significant (P-value = 0·02463).

Discussion

A composite variable derived from readily available markers of haemolysis was used as a subphenotype of sickle cell anaemia in a GWAS. This approach to estimating haemolysis was recently validated by showing that the haemolytic score was significantly associated with plasma red blood cell microparticles and cell-free haemoglobin concentration, both of which are more direct markers of intravascular haemolysis in sickle cell disease (Cannan, 1958; Reiter et al, 2002; Nouraie et al, 2012). As the baseline rate of haemolysis in an individual with sickle cell anaemia is stable over time, it is a suitable subphenotype in a genetic association study (Taylor et al, 2008).

Some complications of sickle cell anaemia more often appear in patients with high rates of haemolysis (Kato et al, 2007; Taylor et al, 2008). Among these are stroke, leg ulcers, sickle nephropathy, priapism and sickle vasculopathy, estimated by the tricuspid regurgitant jet velocity (TRV). All have been used as subphenotypes in genetic association studies (reviewed in Steinberg & Sebastiani, 2012). α Thalassaemia reduces the risk of all these complications, while the degree of protection afforded by a high HbF level is less clear (Steinberg & Sebastiani, 2012). Insufficient power was present to detect an association of rs7203560 with individual disease subphenotypes associated with haemolysis because of the very small numbers of patients having these complications who were homozygous for the minor allele (data not shown).

Focused genotyping of candidate genetic modifiers of selected subphenotypes has shown that polymorphisms in TGFBR3 and a few other genes are associated with the haemolysis-associated subphenotypes. These results have not been replicated by GWAS. This is likely to be a result of the small effects of these polymorphisms on a subphenotype, which mean that a very large number of cases is required to meet the stringent significance levels needed when hundreds of thousands of comparisons are being made. It has not yet been possible to assemble cohorts of the requisite size in a rare disorder like sickle cell anaemia.

Rs7203560, a SNP in the first intron of NPRL3 (Nitrogen Permease Regulator-Like 3; C16orf35; chr 16p13·3), was most significantly associated with haemolytic score. NPRL3 is highly conserved and upstream of the human HBA1/HBA2 gene cluster in all vertebrates examined (Hughes et al, 2005). Its functions are unknown, although its deletion causes embryonic lethality from multiple cardiovascular defects (Kowalczyk et al, 2012). Within introns of NPRL3 lie the HBA1/HBA2 gene regulatory elements, HS-48, HS-40 and HS-33, that are required for HBA1/HBA2 gene expression (Hughes et al, 2005). SNPs very close to the HBA1/HBA2 regulatory elements are in strong LD with rs7203560. Also located between HS-33 and HS-40, adjacent to rs13336641, is a CTCF binding site. CTCF is a conserved zinc finger protein with many regulatory functions (Phillips & Corces, 2009). We therefore hypothesize that rs7203560 is a marker for one or more variants in or near the HBA1/HBA2 gene regulatory elements that down-regulate HBA1/HBA2 gene expression, causing a mild α thalassaemia-like effect. The functional basis for this effect is unknown, although it has recently become apparent that polymorphisms in DNase I hypersensitive sites and other regulatory regions play a critical role in defining gene expression (Maurano et al, 2012). The minor allele frequency (MAF) of rs7203560 is 0·139 in the HapMap Yoruban population; however, it has a MAF of 0 in the HapMap Central European population. This suggests that, like gene deletion α thalassaemia, genetic elements marked by the rs7203560 SNP could confer a selective advantage in populations with a high prevalence of malaria.

A meta-analysis of 24 167 individuals of European ancestry and 14 700 Japanese found SNPs in ITFG3 that were associated with mean corpuscular volume (MCV) and mean corpuscular haemoglobin (MCH) (Ganesh et al, 2009; Kamatani et al, 2010). ITFG3 (integrin alpha FG-GAP repeat containing 3; C16orf9; 16p13·3) is downstream of HBA1 and 41·2 kb from rs7203560. Down-regulating HBA1/HBA2 gene expression should reduce MCV; however, because of the relatively small number of patients we studied we were unable to show a relationship between rs7203560 and MCV.

Six haplotypes of the HS-40 region have been described and haplotypes A and D are most prevalent in individuals of African origin (Harteveld et al, 2002). The common −α3·7 deletion is found on several of these haplotypes. Whether or not HS-40 haplotypes affect HBA1/HBA2 gene expression, as reflected by the α/β globin biosynthesis ratios, or alter the properties of erythrocytes, estimated by MCV and MCH, has not been definitively determined. Expression of a luciferase reporter in K562 cells was greatest when the enhancer was the 350 bp core element of the A haplotype of HS-40 (Ribeiro et al, 2009). The relationship of HS-40 haplotypes to rs7203560 is unknown, and haplotypes of other HBA1/HBA2 gene regulatory elements have yet to be described.

In conclusion, a SNP in NPRL3 was independently associated with haemolysis in sickle cell anaemia and was in LD with regions of the genome that regulate HBA1/HBA2 gene expression. Perhaps by independently down-regulating expression of the HBA1/HBA2 genes, variants of the HBA1/HBA2 gene regulatory loci, tagged by the NPRL3 SNP, rs7203560, reduce haemolysis in sickle cell anaemia. As intravascular haemolysis plays an important role in the pathophysiology of sickle cell disease and other diseases where chronic haemolysis is present, like malaria, and in the lesion associated with blood storage, understanding further the genetic basis of red cell destruction has potential therapeutic implications (Rother et al, 2005; Gladwin et al, 2012).

Acknowledgements

This work was supported by National Institutes of Health Grants R01 HL87681 (MHS), RC2 L101212 (MHS), 5T32 HL007501 (JNM), 2R25 HL003679-8 (VRG), R01 HL079912 (VRG), 2M01 RR10284-10 (VRG), R01HL098032 (MTG), R01HL096973 (MTG), P01HL103455 (MTG), the Institute for Transfusion Medicine and the Hemophilia Center of Western Pennsylvania (MTG), and Medical Research Grant (UK) G0001249 ID 56477 (SLT). Monika Kowalczyk and Douglas Higgs kindly provided their unpublished results and very important input.

Conflict-of-interest disclosure

The authors declare no competing financial interests.

Author contributions

JNM and PS analysed the data; CTB, EM, ED and HR performed laboratory analyses; VRG, GRK, CM, JT, AC, LL-J, SR and OC contributed patients to the studies; JNM, EM, YZ, MN, PS, SLT, MTG and MHS analysed and interpreted data and wrote and edited the manuscript.

Appendix I

Walk-PHaSST Investigators: DB Badesch1, RJ Barst2, OL Castro3, JSR Gibbs4, RE Girgis5, MT Gladwin6,7, JC Goldsmith8, VR Gordeuk3, KL Hassell1, GJ Kato9, L Krishnamurti10, S Lanzkron5, JA Little11, RF Machado12, CR Morris13, M Nouraie3, O Onyekwere3, EB Rosenzweig2, V Sachdev14, DE Schraufnagel12, MA Waclawiw15, R Woolson16, NA Yovetich16.

1University of Colorado HSC, Denver, CO; 2Columbia University, New York, NY; 3Howard University, Washington, DC; 4National Heart & Lung Institute, Imperial College London, and Hammersmith Hospital, London; 5Johns Hopkins University, Baltimore, MD; 6Division of Pulmonary, Allergy and Critical Care Medicine and the 7Vascular Medicine Institute, at the University of Pittsburgh, Pittsburgh, PA; 8National Heart Lung and Blood Institute/NIH, Bethesda, MD (personal views that do not represent the Government); 9Cardiovascular and Pulmonary Branch, NHLBI, Bethesda, MD; 10Children's Hospital of Pittsburgh, Pittsburgh, PA; 11Albert Einstein College of Medicine, Bronx, NY; 12University of Illinois, Chicago, IL; 13Children's Hospital & Research Center Oakland, Oakland, CA; 14Translational Medicine Branch, NHLBI, Bethesda, MD; 15Office of Biostatistics Research, NHLBI, Bethesda, MD (personal views that do not represent the Government); 16Rho, Inc., Chapel Hill, NC.

Walk-PHaSST Intramural NHLBI staff: Research nurses: C Seamon; A Chi; W Coles; Pulmonologists: S Alam; Haematologists: J Taylor; C Minniti; Protocol Management: MK Hall.

Walk-PHaSST Children's Hospital & Research Center Oakland, Oakland, CA staff: L Lavrisha; W Hagar; H Rosenfeld; Echocardiography lab: C Brenneman; S Sidenko; C Birdsall; W Li; M St. Peter; C Brenneman.

National Heart & Lung Institute, Imperial College London and Hammersmith Hospital, London: Central Middlesex Hospital staff: Ki Anie, G Cho; S Davies; A Gilmore; Hammersmith Hospital staff: M Layton; I Cabrita; G Mahalingam; S Meehan; G Addis; St. Thomas Hospital staff: J Howard; C Woodley.
