Disclosures: The authors received an honorarium from the Publisher for preparation of this article.
Correspondence: Professor Jenefer Blackwell, Telethon Institute for Child Health Research, PO Box 855, West Perth, Western Australia 6872, Australia (e-mail: email@example.com).
Ninety per cent of the 500 000 annual new cases of visceral leishmaniasis (VL) occur in India/Bangladesh/Nepal, Sudan and Brazil. Importantly, 80–90% of human infections are sub-clinical or asymptomatic, usually associated with strong cell-mediated immunity. Understanding the environmental and genetic risk factors that determine why two people with the same exposure to infection differ in susceptibility could provide important leads for improved therapies. Recent research using candidate gene association analysis and genome-wide linkage studies (GWLS) in collections of families from Sudan, Brazil and India has identified a number of genes/regions related to environmental risk factors (e.g. iron), as well as genes that determine type 1 vs. type 2 cellular immune responses. However, until now all of the allelic association studies carried out have been underpowered to find genes of small effect sizes (odds ratios or OR < 2), and GWLS using multicase pedigrees have only been powered to find single major genes, or at best oligogenic control. The accumulation of large DNA banks from India and Brazil now makes it possible to undertake genome-wide association studies (GWAS), which are ongoing as part of phase 2 of the Wellcome Trust Case Control Consortium. Data from this analysis should seed research into novel genes and mechanisms that influence susceptibility to VL.
The parasitic disease visceral leishmaniasis (VL), caused by protozoa of the Leishmania donovani species complex (L. donovani, L. archibaldi, L. infantum/chagasi), is associated with liver, spleen and lymph gland enlargement, fever, weight loss and anaemia, and is fatal unless treated. Ninety per cent of the 500 000 annual new cases occur in India/Bangladesh/Nepal, Sudan and Brazil (http://www.who.int/inf-fs/en/fact116.html). Importantly, 80–90% of human infections are sub-clinical or asymptomatic, usually associated with strong cell-mediated immunity (positive skin-test delayed type hypersensitivity (DTH+); lymphocyte proliferation; interferon-γ T-cell response) to leishmanial antigen (1–4). Understanding the environmental and genetic risk factors that determine why two people with the same exposure to infection differ in susceptibility could provide important leads for improved therapies. Indeed, the intersection of studies on human genetic variation with gene expression studies has the power to influence one of the major bottlenecks in drug development, that of choosing the best targets that represent key points of therapeutic intervention (5). One of the major aims of genetic studies is to identify genes/mechanisms/pathways that contribute to the pathogenesis of disease, for example, by influencing trafficking or survival of the parasite in host macrophages, or the development of a protective type 1 immune response. Genetics can also provide concrete evidence for the role of modifiable environmental variables (e.g. iron, as might be indicated by an association between polymorphism at SLC11A1 and susceptibility to intramacrophage pathogens, cf. below) in determining disease outcome (6). Knowledge gained through both avenues can translate into improved interventions.
Although most genetic studies of VL undertaken to date have been underpowered, some common genes have emerged by studying different populations, while founder effects and population sub-structure in Africa have provided an interesting avenue to identifying new genes that contribute to susceptibility to VL. These studies are highlighted here. For the future, more recent advances in study design and technology promise to provide the power to examine candidate genes with confidence, and to find novel genes influencing the complex phenotypes of VL or DTH response using genome-wide association studies (GWAS).
APPROACHES TO GENETIC STUDIES
Traditional approaches to genetic analysis of complex diseases have included allelic association analysis of candidate genes and linkage analysis using multicase families. The linkage test, usually reported as a LOD score (logarithm of the odds for linkage), is based on genetic recombination events in families and maps disease susceptibility genes into intervals of 10–20 centimorgans (cM; ~10–20 Mb). This approach has generally been used to undertake genome-wide linkage scans (GWLS), i.e. to search for new regions of the genome carrying susceptibility loci, the first such study being to look for genes controlling the complex disease type 1 diabetes (7). Such studies typically genotype all members of multicase families for 400–500 highly polymorphic microsatellite markers spaced at 10–20 cM intervals across the genome. Due to the multiple testing problem related to studying large numbers of polymorphic markers, Lander and Kruglyak (8) proposed a classification for reporting the results of genome-wide scan data based on the number of times one would expect to see a result at random in a dense, complete genome scan. The thresholds they propose are as follows: ‘suggestive linkage’, where statistical evidence would be expected to occur one time at random in a genome scan; ‘significant linkage’ 0·05 times; ‘highly significant linkage’ 0·001 times; and ‘confirmed linkage’, where significant linkage from an initial scan has been confirmed with a nominal P-value of ≤ 0·01 in a second independent study. The first three categories correspond to point-wise significance levels of 7 × 10⁻⁴, 2 × 10⁻⁵ and 3 × 10⁻⁷ (LOD scores 2·2, 3·6 and 5·4). Although some authors consider these thresholds to be over-conservative (9), they serve as a guide to evaluate the significance associated with the nominal point-wise P-values reported by most authors (cf. VL studies below).
Allelic association tests determine direct association between alleles at particular loci, e.g. a candidate gene, or haplotypes of closely linked markers (i.e. markers in linkage disequilibrium or LD with each other) and a disease phenotype. Until recently, this approach was used largely to analyse candidate genes. However, with the advent of technologies that allow upwards of 500 000 single nucleotide polymorphisms (SNPs) to be assayed simultaneously, the so-called ‘SNP-chip’ technology, GWAS have become possible (cf. below). Allelic association is measured over smaller intervals, usually < 1 Mb depending on the extent of LD in the population under investigation. For example, LD generally extends over larger intervals in Caucasian compared to African populations (10). Allelic association studies can be undertaken using either population-based sampling (e.g. case-control), or family-based collections of case-parent trios. Logistic regression analysis is usually used to analyse case-control samples, facilitating adjustment for data on environmental variables. Family-based allelic association tests (e.g. FBAT (11)) based on the transmission disequilibrium test (TDT) (12), which looks for a bias in transmission of alleles from heterozygous parents to affected offspring, are used to analyse case-parent trios/families. Robust tests can be applied to data from multicase families for association testing, taking pedigree or family clustering or known linkage to a region into account. Case-control sampling can be a problem in ethnically admixed populations, where mis-matching of cases and controls can lead to type I errors. The TDT approach, which uses family-based controls, is therefore preferable in ethnically admixed populations. A case/pseudo-control strategy (13) and conditional logistic regression analysis can also be used for trios, where the case is the actual genotype transmitted from parents to the affected offspring, and the pseudo-controls are the 1–3 genotypes (depending on phase) that could have been transmitted. 
This allows for easy adjustment for data on environmental variables, and extension of the analysis to determine whether multiple loci/SNPs within a gene show independent main effects, or whether one SNP carries all of the information for that gene (i.e. is a haplotype tagging SNP or tag-SNP for markers in LD across the region of the gene associated with disease).
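As an illustration of the transmission disequilibrium test described in this section, the following minimal Python sketch computes the classical McNemar-type TDT statistic, (b − c)²/(b + c), from counts of transmitted and untransmitted alleles in heterozygous parents. The function name `tdt` and the example counts are hypothetical and are not taken from any of the studies reviewed here; software such as FBAT implements more general tests.

```python
import math


def tdt(transmitted, untransmitted):
    """Classical TDT for one allele.

    transmitted / untransmitted: number of times heterozygous parents
    did / did not pass the allele to an affected child.
    Returns (chi-square on 1 df, P-value).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only
chi2, p_value = tdt(60, 40)
print(f"TDT chi2 = {chi2:.2f}, P = {p_value:.4f}")
```

Because only heterozygous parents contribute, the test is conditioned on parental genotypes, which is why it is robust to the population stratification that can affect case-control designs.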
COMPLEXITY AND HERITABILITY OF LEISHMANIASIS SUSCEPTIBILITY
Studies in mice (reviewed in Ref. (14)) provided early support for a strong genetic component to susceptibility to L. donovani infection. In this defined model system, it was possible to demonstrate that different genes control innate vs. adaptive immunity, as expected for a complex disease. In humans, studies based on ethnic differences (15,16), familial aggregation and segregation analysis (15,17–19), and a high relative risk (λ2S = 34) of disease in further siblings of affected sibling pairs (18) support a genetic hypothesis, with longitudinal studies showing a strong interplay between environmental and host factors during outbreaks (1,19,20). Segregation analyses undertaken following total population surveys that measured DTH responses in Peru (21) and Brazil (22) support multifactorial genetic control over a sporadic model. We recently undertook a GWLS for the quantitative DTH response in families from Natal, Brazil (23). Familial correlations were estimated using the fcor program of the SAGE (Statistical Analysis for Genetic Epidemiology) software package (version 4·8). Heritability (h²) was estimated from the sibling correlation r using the equation h² = 2r. In these families there were 440 sibling pairs with a correlation of 0·42, 212 grandparent-grandchild pairs with a correlation of 0·265, and 90 cousin pairs with a correlation of 0·13. This is consistent with genetic control of the DTH response, where first-degree relatives have a stronger correlation than second- or third-degree relatives. Estimated heritability of the DTH immune response was 84%, suggesting a substantial genetic component to variation in induration size, as determined by the DTH skin test.
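The heritability arithmetic above can be reproduced with a short Python sketch. It applies the sibling-correlation equation given in the text and then, as an added assumption not stated in the source, uses the standard coefficients of relationship under a purely additive model with no shared environment (1/2 for siblings, 1/4 for grandparent-grandchild pairs, 1/8 for first cousins) to show what correlations the estimate would predict for the other relative pairs. The function names are hypothetical.

```python
# Coefficients of relationship under a purely additive model
# (an assumption of this sketch, not taken from the source study).
RELATIONSHIP = {
    "sibling": 0.5,
    "grandparent-grandchild": 0.25,
    "first cousin": 0.125,
}


def heritability_from_sib_correlation(r_sib):
    """h2 = 2r, using the sibling correlation r as in the text."""
    return 2.0 * r_sib


def expected_correlations(h2):
    """Correlation predicted for each relative pair given h2."""
    return {pair: coeff * h2 for pair, coeff in RELATIONSHIP.items()}


h2 = heritability_from_sib_correlation(0.42)
for pair, r_exp in expected_correlations(h2).items():
    print(f"{pair}: expected r = {r_exp:.3f}")
```

Under these assumptions the observed correlations for grandparent-grandchild pairs (0·265) and cousin pairs (0·13) are of the same order as the predicted values, although shared household environment could inflate any of the observed figures.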
A number of labs, including our respective labs, have undertaken both candidate gene studies (24–33) and GWLS (23,34–36) for the VL phenotype (cf. below). In Brazil, we also analysed candidate genes/gene regions (27,32) using the DTH response as a qualitative trait and, as alluded to above, carried out a GWLS analysing DTH as a quantitative trait (QTL) (23). It should be noted, however, that all of the allelic association studies carried out to date were underpowered to find multiple genes of small effect sizes (odds ratios or OR < 2). Similarly, all GWLS using multicase pedigrees were only sufficiently powered to find single major loci, or at best oligogenic control.
THE ISSUE OF SAMPLE SIZE AND POWER
One major problem with all candidate gene studies for infectious diseases reported to date is that they were under-powered (reviewed in Ref. (37)). Until recently, this was a general problem in genetic analysis of complex disease, along with issues relating to study design and population history (reviewed in Refs. (38,39)). Small sample sizes also preclude definitive conclusions being drawn from a number of VL studies (Table 1) reporting no association for candidate genes (24,25,29,30,40,41). Figure 1 compares power to detect association at OR = 1·5 or OR = 2, given different risk allele frequencies, P-values and sample sizes. This shows that 500 trios or case-control pairs have little power to detect association for small effect sizes (OR = 1·5). Even with 1000 trios or case-control pairs, power is limited for risk alleles with frequency < 0·2 for an effect size (OR) 1·5, although low frequency (e.g. 0·10) risk alleles with larger effect sizes (OR > 2) may be detected. All published VL studies have sample sizes < 300 cases, severely limiting power even for hypothesis-driven candidate genes.
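To make the dependence of power on sample size, risk allele frequency, effect size and P-value threshold concrete, the sketch below uses a simple normal approximation for an allelic case-control test. It is not the method used to produce Figure 1. It assumes a multiplicative allelic odds ratio, equal numbers of cases and controls, a control allele frequency equal to the population frequency, and it covers case-control pairs only, not trios. The function name and example parameter values are hypothetical.

```python
from statistics import NormalDist


def allelic_power(risk_allele_freq, odds_ratio, n_pairs, alpha):
    """Approximate power of a two-sided allelic test.

    n_pairs cases and n_pairs controls contribute 2 * n_pairs
    alleles each. Normal approximation with unpooled variance;
    the negligible opposite tail is ignored.
    """
    p0 = risk_allele_freq
    # Allele frequency in cases implied by the allelic odds ratio
    p1 = odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)
    n_alleles = 2 * n_pairs
    se = ((p1 * (1.0 - p1) + p0 * (1.0 - p0)) / n_alleles) ** 0.5
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)


# Illustrative settings: OR = 1.5, risk allele frequency 0.2
for n in (300, 500, 1000, 2000):
    nominal = allelic_power(0.2, 1.5, n, 0.05)
    stringent = allelic_power(0.2, 1.5, n, 5e-7)
    print(f"n = {n}: power {nominal:.2f} at P = 0.05, "
          f"{stringent:.2f} at P = 5e-7")
```

The contrast between the two thresholds shows why a sample that is adequate for a single hypothesis-driven candidate SNP can be badly underpowered once a correction for many tests is applied.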
Table 1. Summary of reported linkages or associations between candidate genes and VL or DTH phenotypes. Adapted and updated from Ref. (37)
Abbreviations: aff = affected; Ca = case; Co = control; DTH = delayed type hypersensitivity; fam = nuclear families; INT = intron; LOD = log10 likelihood for linkage; msat = microsatellite; ns = not significant; OR = odds ratio; P = probability; Pc = probability corrected for multiple testing; PKDL = post Kala-azar dermal leishmaniasis; RR = relative risk; VL = visceral leishmaniasis.
A number of candidate gene studies have been reported for VL (Table 1) (24–31). Important amongst these are ones arising from murine studies, where both innate immunity under the control of the Slc11a1 (formerly Lsh/Ity/Bcg/Nramp1) gene (42,43) and acquired immunity directed by the major histocompatibility complex (H-2 in mice, HLA in man) (44) were shown to be important. The latter form part of a broader analysis of genes that control T helper 1 (Th1) vs. T helper 2 (Th2) immune responses.
The innate resistance gene SLC11A1
Recent interest has focused on the role of innate immunity in driving the adaptive immune response, particularly in relation to intra-macrophage pathogens. In mice, the archetypal innate resistance gene was first identified as a gene controlling VL caused by Leishmania donovani sensu stricto (reviewed in Ref. (45)). This gene, originally designated Lsh, Ity or Bcg, was also shown to influence innate resistance to Salmonella typhimurium, Mycobacterium bovis BCG, M. lepraemurium and M. intracellulare. Following its identification by positional cloning (46), it was renamed the natural resistance associated macrophage protein (Nramp1). This is now superseded by the functional designation solute carrier family 11a (proton-coupled divalent metal ion transporters) member 1 or Slc11a1, consistent with formal demonstration that the proteins encoded by murine Slc11a1 and human SLC11A1 function as proton/divalent cation (Fe²⁺, Zn²⁺ and Mn²⁺) antiporters (47,48). The protein localizes to the late endosomal/lysosomal compartment of macrophages (49) and has many pleiotropic effects on macrophage function (reviewed in Ref. (45)). In humans, SLC11A1 has been linked to genetic susceptibility to leprosy in Vietnam and to tuberculosis in Brazil and Aboriginal Canadians (reviewed in Ref. (50)). SLC11A1 is globally associated with TB, with both 5′ and 3′ polymorphisms contributing independently to disease risk (51–53). SLC11A1 is associated with HIV (54) and a wide range of autoimmune diseases in humans (reviewed in Ref. (50)).
Polymorphism at SLC11A1 has been linked (24,26) and associated (26) with VL in Sudan, the latter study by us demonstrating allelic association with 5′ (GTn, 274C/T, 469+14G/C) but not 3′ (D543N, 3′UTR TGTG, 3′UTR CAAA) markers within SLC11A1. To date, only the promoter GTn is known to be functional in regulating expression of SLC11A1 (55), modulated by SNPs at –237 bp (56) and –86 bp (H.S. Mohamed & J.M. Blackwell, unpublished) in the promoter region. The GTn repeat functions by binding hypoxia-inducible factor 1 alpha (HIF1α) to a sequence element within the repeat (57). Preliminary data from a subset of the Brazilian cohort also show association of VL with the 274C/T (χ² = 5; P = 0·03) and 469+14G/C (χ² = 4·28; P = 0·04) polymorphisms (S.E. Jamieson, J.M. Blackwell, M.E. Wilson, S.M. Jeronimo, unpublished). These data require robust analysis in a larger sample. So far the region of the genome that includes SLC11A1 has also failed to register suggestive or significant linkage on any GWLS (23,34–36), but this could also be due to lack of power. It will be of interest to see whether there are positive results for SLC11A1 in ongoing GWAS where much larger sample sizes are being examined for Brazil and India (cf. below).
Genes associated with Th1 vs. Th2 responses
Clinical VL is a complex disease phenotype, so we expect multiple genes to influence susceptibility to disease. In particular, genes that regulate induction of an adaptive T cell response will be important. In mice (44) we know, for example, that the right H-2 haplotype controlling adaptive immunity can overcome innate susceptibility caused by mutation at Slc11a1. Preliminary reports (Table 1) of failure to link class II (DR/DQ) and/or class III (TNFA, LTA, TNFa) HLA genes to VL in Brazil (30) or Sudan (24) were underpowered. Case-control studies in Iran (28) (52 cases; 222 controls) and Tunisia (29) (156 cases; 154 controls) reporting association with class I (A25, OR = 13·27, P = 0·004) and class II (DRB1*15*16, OR = 0·54, P = 0·04; DQB1*0201, OR = 0·46, P = 0·03) genes were also underpowered and not robust to multiple testing correction. Preliminary studies in Brazil show associations at TNFA when VL cases are compared to DTH+ (i.e. >5 mm induration) individuals, and a bias in transmission of alleles at TNFA (59 haplotype transmissions, P = 0·0265; 36 TNFA–308 bp transmissions, P = 0·0006) from heterozygous parents to DTH+ individuals in families (27). Again, the HLA region has failed to provide positive linkages on GWLS carried out to date, suggesting that this is not a major locus controlling susceptibility to VL (23,34–36). The HLA region requires robust analysis in a larger sample.
In murine leishmaniasis, polarization of the adaptive immune response down Th1 vs. Th2 pathways is associated with resistance and susceptibility in different mouse strains (58). The region on murine chromosome 11 that has conserved synteny with human chromosome 5q23-q33 and carries the genes encoding interleukin 4 (IL-4) and other type 2 cytokines is linked to visceralization of L. major in susceptible BALB/c mice (59). In humans IL-4 is detected in all Brazilian VL patient sera (60), and is at a 13-fold higher level in Indian VL patients compared with controls (61). Interestingly, nonexposed individuals also show differences in patterns of IFN-γ and/or IL-4 production upon stimulation with L. donovani antigens (62), suggesting an inherent bias in response. Candidate gene analysis of IL4 and IL9 within this cluster (Figure 2), whose cytokine products IL-4 and IL-9 mediate Th2 responses, provided evidence for association at IL4 but not IL9 in Sudan (25). We recently reported (32) multiple independent associations across this gene cluster for the DTH phenotype in Brazil (Figure 2). DTH was analysed as a qualitative trait, either as DTH+ (> 5 mm) or DTH– (< 5 mm; resident ≥ 3 years in an area with > 40% infection rate). No associations were observed for VL, which may reflect low power (107 VL trios). No association was observed between DTH+ (176 trios) and SNPs at IL4, but two independent associations were observed in separate LD blocks at LECT2 (OR 2·25; P = 0·005; 95% CI 1·28–3·97; Figure 2: markers L, M, N) and TGFBI (OR 1·94; P = 0·003; 95% CI 1·24–3·03; Figure 2: markers R, T, U). Independent associations were observed for DTH– (118 trios) and SNP rs2070874 (Figure 2: marker D) at IL4 (OR 3·14; P = 0·006; 95% CI 1·38–7·14) and SNP rs30740 (Figure 2: marker O) between LECT2 and TGFBI (OR 3·00; P = 0·042; 95% CI 1·04–8·65). The former is interesting in relation to the innate role that early IL-4 production plays in determining outcome of L. major infection in BALB/c mice. 
In relation to other genes in the region, LECT2 encodes the leucocyte cell-derived chemotaxin 2 that has neutrophil chemotactic activity. This could be relevant to the DTH+ phenotype, although its principal expression in human hepatocytes and endothelial cells of hepatic arteries, portal veins and central veins makes it an unlikely candidate. TGFBI encodes a protein whose transcript is upregulated by TGF-β in adenocarcinoma cells. The gene product keratoepithelin contains an N-terminal signal peptide and a C-terminal RGD motif similar to other adhesion proteins, and is expressed in many tissues. Expression in skin epithelial cells could contribute to DTH, but further candidates, SMAD5 and its antisense transcript DAMS, located ~100 kb distal to TGFBI, have not been studied. SMAD5 plays a critical role in the signalling by which TGF-β inhibits cell proliferation. As for SLC11A1 and HLA loci, the 5q23-q33 region has not been positive on GWLS carried out to date (23,34–36). Lack of power precludes speculation as to whether this represents a biological divergence between mouse and man, or simply the fact that we are comparing inbred mouse data with outbred human populations and underpowered human genetic studies. The 5q23-q33 results require robust validation and replication in larger samples, using a more complete set of tag-SNPs across the region.
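The odds ratios, confidence intervals and P-values quoted above for the 5q23-q33 cluster can be checked against one another with a short Python sketch. It assumes a Wald-type 95% interval that is symmetric on the log-odds scale, which may differ from the exact method used in the original analysis (32); the function name is hypothetical, and small discrepancies are expected because the published figures are rounded.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def p_from_or_and_ci(odds_ratio, ci_low, ci_high):
    """Recover the Wald z-statistic and two-sided P from an OR and 95% CI."""
    # Width of the interval on the log scale is 2 * Z_95 * SE
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# (label, OR, lower, upper) as reported in the text
reported = [
    ("LECT2, DTH+", 2.25, 1.28, 3.97),
    ("TGFBI, DTH+", 1.94, 1.24, 3.03),
    ("IL4 rs2070874, DTH-", 3.14, 1.38, 7.14),
    ("rs30740, DTH-", 3.00, 1.04, 8.65),
]
for label, odds_ratio, low, high in reported:
    z, p_value = p_from_or_and_ci(odds_ratio, low, high)
    print(f"{label}: z = {z:.2f}, P = {p_value:.4f}")
```

The wide intervals, particularly for rs30740, whose lower bound sits just above 1, illustrate how imprecise effect estimates are with fewer than 200 trios.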
GWLS FOR VL AND DTH PHENOTYPES
Despite limited power in GWLS for VL, evidence for strong genetic effects in local populations is found in Sudan that relate to recent migration, marital systems and population substructure. One study (34) reported genome-wide significance for a gene on chromosome 22q12 (LOD score 3·5, nominal P = 3 × 10⁻⁵, λS = 1·83 for all families; LOD score 3·9, nominal P = 1 × 10⁻⁵ if affected towards the beginning of an outbreak) using multicase families of VL from eastern Sudan. This could indicate important differences in innate immune mechanisms that might operate in a naïve population at the beginning of an outbreak, compared to acquired immune mechanisms important as the outbreak progresses. Follow-up studies in this population indicate that polymorphisms at IL2RB likely contribute to this peak of linkage (33). This is of interest in relation to GWAS data showing that polymorphism at IL2RB contributes to susceptibility to rheumatoid arthritis (63), another disease where regulatory T cells play an important role. Nevertheless, variation at IL2RB is likely to contribute only a minor component of linkage at 22q12 (33). Other important candidate genes in the region include NCF4 that encodes the p40 subunit of NADPH oxidase, CSF2RB encoding the receptor for GM-CSF and LIF encoding leukaemia inhibitory factor. Further studies are required to identify the major genes contributing to the 22q12 peak of linkage in this population in Sudan.
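The relationship between the LOD scores and nominal P-values quoted above can be sketched as follows. The conversion assumes that 2 ln(10) × LOD behaves as a chi-square statistic on one degree of freedom and that the test is one-sided, which is a common convention for allele-sharing linkage statistics but is an assumption of this sketch; the exact relationship depends on the linkage method used in each study. The function name is hypothetical.

```python
import math


def lod_to_pointwise_p(lod):
    """Approximate one-sided point-wise P-value for a LOD score.

    Assumes chi2 = 2 * ln(10) * LOD on 1 df, so z = sqrt(chi2) and
    P = 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)).
    """
    if lod < 0:
        raise ValueError("LOD score must be non-negative")
    return 0.5 * math.erfc(math.sqrt(math.log(10.0) * lod))


# LOD scores reported for 22q12, plus the Lander and Kruglyak thresholds
for lod in (3.5, 3.9, 2.2, 3.6, 5.4):
    print(f"LOD {lod}: P ~ {lod_to_pointwise_p(lod):.1e}")
```

Under these assumptions the function returns values close to the nominal P-values reported for the 22q12 peak, and close to the point-wise significance levels that correspond to LOD scores of 2·2, 3·6 and 5·4.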
We also reported (35) genome-wide significance for major susceptibility loci at D1S1568 on 1p22 (LOD score 5·65; nominal P = 1·72 × 10⁻⁷; empirical P < 1 × 10⁻⁵; λS = 5·1) and D6S281 on 6q27 (LOD score 3·74; nominal P = 1·68 × 10⁻⁵; empirical P < 1 × 10⁻⁴; λS = 2·3) using multicase families from villages in eastern Sudan. In this case, the linkages were Y-chromosome-lineage and village-specific. The results suggested strong lineage-specific genes within villages due to founder effect and consanguinity in recent immigrant populations. Fine mapping and identification of aetiological genes within these regions at chromosomes 1p22 and 6q27 has been carried out (M. Fakiola, M. Raju, J.M. Blackwell and colleagues, unpublished data) using a combination of dense tag-SNP genotyping and allelic association analysis across multiple biological candidate genes in these regions, re-sequencing, haplotype and LD analysis, in silico bioinformatics analysis, qualitative RT/PCR analysis of tissue and cellular expression (cell lines, including control vs. classically activated macrophages), and quantitative RT/PCR analysis of gene expression in splenic aspirates from VL patients in India compared to commercially available normal spleen. Allelic association data from India and Brazil (M. Fakiola, J.M. Blackwell et al., unpublished) identify the gene (DLL1) encoding delta notch ligand 1 as the aetiological gene under the 6q27 linkage peak. DLL1 is expressed in stromal cells and antigen presenting cells, in particular dendritic cells (64,65). Bone marrow stromal cells expressing DLL1 induce the emergence of T/NK cell precursors from human haematopoietic progenitors and are required for T cell lineage specification during early thymocyte development (66,67). Bone marrow stromal cells are targeted by L. donovani, particularly those with macrophage characteristics that support long-term growth of parasites and exhibit increased capacity for haematopoiesis (68). 
Antigen presenting cells also use Notch signalling to promote Th cell differentiation (69) with DLL1 on dendritic cells interacting with Notch3 of CD4+ T cells to induce the Th1 phenotype (69,70). Dll1 in mice can alter susceptibility to L. major in BALB/c mice by promoting a Th1 response (70). We found that DLL1 was strongly down-regulated in RNA from splenic aspirates of all VL patients compared to control spleen, consistent with a depressed Th1 response in VL patients. Re-sequencing failed to identify novel coding variants at DLL1 that could alter function. Bioinformatic analysis, similar to that undertaken previously (32), pinpoints putative regulatory variants in conserved noncoding sequence as the aetiological variants associated with disease.
At least nine biological candidate genes were identified under the 1p22 linkage peak (M. Raju, J.M. Blackwell et al., unpublished), including BCL10, DDAH1, TGFBR3, GLMN, GFI1, MTF2, DR1, GCLM and VCAM1, which we have investigated in detail using the strategy outlined above. TGFBR3, GLMN, GFI1, MTF2, DR1 and GCLM lie immediately beneath the peak of linkage, whereas BCL10/DDAH1 and VCAM1 lie ~7·5 Mb distal and proximal to the peak of linkage, respectively. The results show independent genetic and functional evidence for association between DDAH1 and VL, and between extended haplotypes across GFI1/MTF2/DR1/GCLM and VL. DDAH1 encoding dimethylarginine dimethylaminohydrolase 1 is of particular interest as an inhibitor of nitric oxide synthase activity (71). Although previous studies had identified the DDAH2 isoform encoded in the class II region of HLA at 6p21.3 as the isoform expressed in macrophages and inhibiting inducible nitric oxide synthase (71,72), we found that DDAH1 was expressed in mature macrophages and was strongly down-regulated in splenic aspirates from all VL patients compared to normal control spleen. DDAH1 lies between the markers D1S207 and D1S2766 for which the nominal P-values for linkage were in the range 3·04 × 10⁻⁵ < P < 5·8 × 10⁻⁴. Although not at the peak of linkage, these nominal P-values still fall within the Lander and Kruglyak (73) criteria for significant linkage on a genome-wide scan. Although a disease locus found in a family-based linkage study might normally be found immediately under the peak of linkage (74), the true susceptibility gene can be displaced by up to 10 cM (~10 Mb) from the linkage peak, particularly in smaller samples (75,76). Hence, DDAH1 cannot be discounted as the potential aetiological gene under the 1p22 linkage peak. 
For the biological candidates directly under the peak of linkage, the extended haplotype associations across GFI1/MTF2/DR1/GCLM are accompanied by some evidence for co-regulation at the mRNA level in splenic aspirates from patients. These genes are also of specific interest. GFI1 encoding growth factor independent 1 transcription factor has recently emerged as a major determinant driving Th2 differentiation (77), providing another potential gene in the important immunological pathway of Th1 vs. Th2 differentiation that may be associated with VL disease. MTF2 encoding metal-response element-binding transcription factor 2 is involved in the activation of metallothionein genes in response to heavy metal ions (78). MTFs coordinate the expression of genes such as those encoding glycine cleavage system H protein, metallothionein and zinc transporter-1, which are involved in zinc homeostasis and protection against metal toxicity and oxidative stresses (79). Recent microarray data show that metallothionein genes are amongst the most highly upregulated genes in monocyte-derived macrophages infected with L. chagasi (80). Together with evidence for the role of divalent cation homeostasis in regulating intra-macrophage pathogens like L. donovani provided by the innate resistance gene SLC11A1 (reviewed in Ref. (45)), this means that candidacy for MTF2 as a VL susceptibility gene is also strong. DR1 encodes the down-regulator of transcription 1 (also called negative cofactor 2β/NC2β) that inhibits transcription by binding to the TATA box binding protein (TBP) (81). CIITA, the transactivator of major histocompatibility complex class II molecules in antigen presenting cells, requires participation of, and is extremely sensitive to mutations in, TBP (82). GCLM encodes the regulatory light subunit of gamma-glutamylcysteine synthetase, also known as glutamate-cysteine ligase, which is the first rate-limiting enzyme in the de novo synthesis of tripeptide glutathione. 
Glutathione redox status plays an important role in induction of nitric oxide synthase (83) and hence production of antimicrobial nitric oxide (84), and GSH modulators have been shown to influence parasite loads in lesions and tissues during the course of L. major infection in susceptible BALB/c mice (85). Work is in progress to further validate these genes as genetic and functional candidates for VL susceptibility at the chromosome 1p22 peak of linkage.
Familial aggregation is also a feature of VL caused by L. chagasi in northeastern Brazil (17), providing a high relative risk (λ2S = 34) of disease in further siblings of affected sibling pairs (18). We undertook (36) a GWLS for susceptibility genes in this ethnically admixed population using 91 families including 215 affected relatives from four ethnically admixed peri-urban populations in northern Brazil. Not surprisingly (because of the ethnic admixture), the primary scan identified multiple regions at low significance level, with weak evidence for linkage retained at chromosomes 6q27 (LOD score 0·99, P = 0·016) and 17q21.3 (LOD score 1·67, P = 0·003) with refined mapping. The peak at 6q27 was coincident with the peak observed in Sudan, suggesting a common susceptibility gene now mapped in an extended sample of affected child-parent trios from the Brazilian study (M. Fakiola, J.M. Blackwell et al., unpublished data). The peak at 17q21.3 was within a cluster of immune response genes, multiple members of which had previously been shown by us to contribute to leprosy and tuberculosis in this region of Brazil (86). Initial analysis of SNPs in genes across the cluster (36) identified the chemokines CCL1 and CCL16 as genes associated with VL in Brazil, but the picture is likely to be more complex. Further analysis using a much larger sample now available in Brazil should provide the power to dissect out the genes contributing to VL susceptibility at 17q21.3. Output from the ongoing GWAS of VL and DTH phenotypes (cf. below) should contribute to this analysis.
As expected, GWLS for the VL phenotype in the ethnically admixed population of Brazil have not achieved genome-wide significance for the numbers of families used in these studies (23,36), nor have our recent studies of 59 Indian multicase pedigrees (77 nuclear families; 372 individuals) with 156 affected relatives (M. Fakiola, A. Mishra, J.M. Blackwell, S. Sundar and colleagues, unpublished data). The 59 families, comprising 32 Hindu and 27 Muslim pedigrees, underwent a 10-cM GWLS of 515 polymorphic microsatellite markers. Peaks of linkage at P < 0·01 were observed for both religious groups on chromosomes 2q12.2 and 11q14.2, with Hindu-specific peaks on 6p25.3 and 8p23 and Muslim-specific peaks on 1p13.1 and Xq23. These regions were further investigated by genotyping 65 additional Indian VL multicase families and 19 additional polymorphic microsatellites to provide a denser 2–5 cM refined map. Refined analysis that corrects for over-relatedness in the population retained evidence for linkage at 2q14.1 (D2S363; singlepoint LOD = 2·33; P = 5 × 10⁻⁴), at 6p25.1 (D6S1617; singlepoint LOD = 1·00; P = 0·016), at 8p23.1 (D8S516; singlepoint LOD = 1·52; P = 0·004), at 11q14.2 (D11S1780; singlepoint LOD = 2·53; P = 3 × 10⁻⁴) and at Xq23 (DXS8055; multipoint LOD = 1·58; P = 0·004). These results contribute novel population-specific candidate regions for ongoing studies into VL susceptibility in India. The GWAS underway using a much larger sample of VL cases and controls from India (cf. below) should validate (or not) these peaks of linkage and provide data for allelic association mapping and gene identification.
As mentioned above, we also undertook a GWLS using Brazilian families that segregate for both VL and DTH phenotypes (23). A total of 405 (385 autosomal, 20 sex chromosomal) microsatellite markers were genotyped for 1290 individuals from 191 pedigrees, including 188 VL cases. The VL phenotype was analysed as a qualitative trait, and DTH as a continuous quantitative trait based on the actual induration of the skin test (classically a 5-mm induration is considered a positive response). It should be noted that the study was much better powered to analyse the quantitative trait, so linkage of a region to DTH alone does not allow us to conclude that the region is not involved in the qualitative VL trait. Data were analysed using MERLIN (v1·0-alpha), which allowed for inclusion of discordant pairs and variance components (DTH only) (87). The strongest evidence for linkage for VL occurred at D9S1118 on chromosome 9q (LOD = 1·60; P = 0·003). This region was not positive for the DTH phenotype, indicating that a gene here might contribute to susceptibility to disease by a mechanism independent of the kind of cell-mediated immune response that the DTH reflects. Variance components analysis, also performed in MERLIN (88), included age and gender as covariates in the model. Evidence for linkage for DTH was found on chromosomes 2, 13, 15 and 19, the highest at D15S657 (LOD = 2·50; P = 0·0003) and D19S246 (LOD = 1·93; P = 0·0014). Refined mapping by genotyping of additional markers across the peaks of linkage for both the VL and DTH phenotypes is underway. Again, validation of these peaks of linkage and data for allelic association mapping and gene identification should be provided by the GWAS underway (cf. below) for VL and DTH phenotypes for Brazil.
Overall, the results of candidate gene studies and GWLS point to potential geographical and population differences in the genes controlling susceptibility to VL. This could reflect local adaptation of the parasite to host genetic background, or differences in selective pressure leading to different functional variants. However, until the issue of study power is resolved, we cannot speculate further on population-related differences in the genes so far identified as controlling susceptibility to VL.
GWAS FOR VL AND DTH PHENOTYPES
The application of SNP-chip based GWAS has been highly successful in rapidly increasing the number of loci that have been positively associated with complex diseases. For example, the WTCCC study published in 2007 (63) of 14 000 cases of seven common diseases and 3000 shared controls has itself identified 24 independent association signals at P < 5 × 10−7, nine of which were in Crohn's disease, three in rheumatoid arthritis, seven in type 1 diabetes, and three in type 2 diabetes. The increased problem of multiple testing associated with genotyping in excess of 500 000 SNPs in large numbers of cases and controls necessitated this stringent threshold P-value < 5 × 10−7 to achieve genome-wide significance. However, across all diseases, a large number of further signals, including 58 loci with single point P-values between 10−5 and 5 × 10−7, were identified which are likely to yield additional susceptibility loci. A number of papers providing validation of the original WTCCC data have already been published (89–96). The WTCCC study is but one of an increasing number of published GWAS for complex diseases (e.g. Refs. (97–113)) now catalogued at http://www.genome.gov/gwastudies/. As pointed out in the recent News feature on ‘Genetics by Numbers’ in Nature (114), identification of the initial SNP association on a primary GWAS is only the first step on the path to validating and identifying the aetiological genes and associated mechanisms of disease susceptibility. Some papers have demonstrated at least two independent aetiological variants at one locus (6q23) associated with rheumatoid arthritis, while regions like the HLA complex require intensive fine mapping. 
For example, following the WTCCC type 1 diabetes study, a combined total of 1729 polymorphisms across HLA were genotyped in > 6000 cases and controls across cohorts, and statistical methods of recursive partitioning and regression applied to pinpoint disease susceptibility to the MHC class I genes HLA-B and HLA-A (risk ratios > 1·5; P (combined) = 2·01 × 10−19 and 2·35 × 10−13, respectively) in addition to the established associations of the MHC class II HLA-DQB1 and HLA-DRB1 genes (115). This demonstrates the intensive mapping that must follow any primary GWAS to validate the SNP associations and identify the genes associated with disease, and the statistical power that can be achieved by genotyping large numbers of cases across multiple cohorts. Interestingly, the only GWAS performed in Africa so far (MalariaGen and WTCCC, Nature Genetics, in press) has found evidence for population substructure between geographically adjacent ethnic groups, making it impossible to impute the HbS causal SNP for malaria using current HapMap reference data because it has independently arisen on different haplotypic backgrounds in different parts of Africa.
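The multiple-testing burden noted above for a scan of roughly 500 000 SNPs can be made concrete with simple arithmetic. The sketch below computes the number of false positives expected under a global null hypothesis at a given per-SNP threshold, and, purely as a familiar reference point, the Bonferroni threshold for a 5% family-wise error rate. The function names are hypothetical, and Bonferroni is shown only for comparison: the text does not say that the WTCCC threshold of 5 × 10−7 was derived this way.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests passing `alpha` if no SNP is truly associated."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test threshold that bounds the family-wise error rate at `family_alpha`."""
    return family_alpha / n_tests


if __name__ == "__main__":
    n_snps = 500_000  # approximate number of SNPs, as quoted in the text
    for alpha in (0.05, 1e-5, 5e-7):
        count = expected_false_positives(n_snps, alpha)
        print(f"threshold {alpha:g}: about {count:g} false positives expected")
    print(f"Bonferroni threshold for {n_snps} tests: {bonferroni_threshold(n_snps):g}")
```

At a nominal threshold of 0·05 such a scan would be expected to yield tens of thousands of chance signals, whereas at 5 × 10−7 fewer than one is expected, which illustrates why signals between 10−5 and 5 × 10−7 need replication before they are accepted.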
These recent successes of GWAS for complex diseases using sufficiently powered large sample sizes (Figure 1) stimulated us to collect larger DNA banks that would permit this approach to be applied to the study of VL susceptibility. The accumulation of DNA from a total of 1217 VL cases in multicase families or trios/sibships from India (total sample 3630 individuals), together with 1000 separately ascertained unrelated age-, village- and sex-matched controls, and 626 VL cases and 1160 DTH+/900 DTH– individuals in families (total 2882 individuals with parents) from Brazil, underpinned a successful bid to phase two of the WTCCC. The strategy is to undertake SNP-chip analysis of 1000 genetically unrelated VL cases and 1000 unrelated controls from India, and a family-based GWAS using all the DNAs in Brazilian families, i.e. a total of 4880 SNP-chips. Positive hits in the Indian study will be validated using FBAT analysis of dense tagging SNPs in the 1217 VL cases in families to control for possible sub-structure in this population. Across both Indian and Brazilian studies, identification of novel susceptibility and resistance genes, and associated functional aetiological variants, based on these GWAS will involve validation of positive allelic association signals using dense tag-SNPs, re-sequencing, in silico bioinformatic analysis, and mRNA and protein expression analysis in clinical and experimental samples. The results of these GWAS should be in the public domain by late 2009, and will provide a wealth of new data that will seed many novel functional studies on mechanisms of disease that can be translated into better interventions for the future.
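To illustrate why sample size matters for genes of small effect, the following is a rough sketch of a power calculation for a case-control design of the size described above (1000 cases and 1000 controls), using a normal approximation to a two-sided allelic test. Everything other than the sample sizes and the 5 × 10−7 threshold is a hypothetical assumption for illustration: the control allele frequency, the odds ratios, the choice of test, and the function name `allelic_power`. It does not model the family-based Brazilian study, population sub-structure, or imperfect tagging of the causal variant.

```python
import math
from statistics import NormalDist


def allelic_power(n_cases: int, n_controls: int, control_freq: float,
                  odds_ratio: float, alpha: float) -> float:
    """Approximate power of a two-sided allelic test (normal approximation).

    `control_freq` is the risk-allele frequency in controls and `odds_ratio`
    is the per-allele odds ratio; both are illustrative inputs.
    """
    # Risk-allele frequency in cases implied by the allelic odds ratio.
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    case_freq = odds / (1.0 + odds)
    # Each diploid individual contributes two alleles.
    case_alleles = 2 * n_cases
    control_alleles = 2 * n_controls
    se = math.sqrt(case_freq * (1.0 - case_freq) / case_alleles
                   + control_freq * (1.0 - control_freq) / control_alleles)
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_effect = abs(case_freq - control_freq) / se
    # The negligible probability of rejecting in the wrong direction is ignored.
    return std_normal.cdf(z_effect - z_alpha)


if __name__ == "__main__":
    # Hypothetical control allele frequency of 0.3; odds ratios chosen for illustration.
    for odds_ratio in (1.2, 1.5, 2.0):
        power = allelic_power(1000, 1000, 0.3, odds_ratio, 5e-7)
        print(f"OR = {odds_ratio}: approximate power = {power:.2f}")
```

Comparing the calls shows how sharply power falls at a genome-wide threshold as the odds ratio approaches 1, and how it recovers as the numbers of cases and controls increase.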
We acknowledge the many members of our laboratories who have contributed to research reviewed here. We thank the communities in Sudan, Brazil and India who have participated in our studies. Our research is funded by The Wellcome Trust and the NIH.