Genomics of racial and ethnic disparities in childhood acute lymphoblastic leukemia


  • Joshua Yew-Suang Lim BSc,

    1. Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee
    2. Department of Pediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
  • Smita Bhatia MD, MPH,

    1. Department of Population Sciences, City of Hope, Duarte, California
  • Leslie L. Robison PhD,

    1. Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, Tennessee
  • Jun J. Yang PhD

    1. Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee
    • Corresponding author: Jun J. Yang, PhD, Pharmaceutical Sciences, MS313, Room I5104, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105-3678; Fax: (901) 595-8869;


  • We thank Sharon Naron for her editorial assistance.


Although the cure rates of childhood acute lymphoblastic leukemia (ALL) have improved dramatically in the past 40 years, not all children have benefited equally from this impressive progress. Racial and ethnic disparities in the incidence and treatment outcome of childhood ALL persist, with Hispanic children having an elevated risk of developing ALL and one of the lowest survival rates after ALL therapy. A critical barrier to progress is the lack of an understanding of the causes of ALL disparities, particularly racial and ethnic differences in ALL biology. In this review, the authors summarize the current knowledge on population variation in childhood ALL incidence and treatment outcome, discuss the contributing genetic and nongenetic variables, and highlight possible therapeutic interventions to mitigate disparities in ALL. Cancer 2014;120:955–962. © 2013 American Cancer Society.


Acute lymphoblastic leukemia (ALL) is the most common cancer in children, with approximately 3500 new cases each year in the United States.[1] Cure rates have improved dramatically over the past 40 years, attributed largely to risk-adapted combination chemotherapy.[2, 3] However, racial and ethnic disparities persist in both the incidence and treatment outcome of ALL. For example, Hispanic children have not only the highest incidence of ALL[4, 5] but also one of the lowest survival rates among US populations.[6-8] African American (AA) children are least likely to develop ALL,[5, 9] but they fare worse with ALL therapy than European Americans (EAs) and Asians.[6, 7, 10]

Although epidemiologic studies have repeatedly documented racial and ethnic differences in ALL incidence and outcome, the underlying causes remain poorly understood. Recent high-throughput genomic profiling of ALL (eg, genome-wide gene expression, DNA copy number, and single nucleotide polymorphism [SNP] genotype) has dramatically improved our understanding of the genetic landscape of this disease and has also led to the discovery of novel molecular markers for treatment individualization.[1, 3, 11, 12] In this review, we provide an overview of epidemiologic findings of racial and ethnic disparities in childhood ALL; and, more importantly, we discuss the plausible genomic basis of these variations in ALL incidence and treatment outcome (Fig. 1).

Figure 1.

Genetic and nongenetic factors that influence racial and ethnic disparities in childhood acute lymphoblastic leukemia (ALL) are illustrated.

Although there is a perception that race categorization is based more on physical appearance and ethnicity is more related to cultural identity, the exact distinctions between these 2 terms are not well defined. For the purpose of this review, we focus primarily on ALL disparities among the 5 population groups: EA, AA, Hispanic American (HA), Asian American (Asian or Pacific Islander), and Native American (American Indian/Alaska Native [NA]) in the United States. Self-reported race and ethnic categories are used throughout the article unless otherwise specified (eg, genetic ancestry). There is also increasing evidence for critical roles of epigenetic regulation in ALL pathogenesis,[13-15] but its relevance in racial and ethnic disparities of ALL has yet to be characterized and, thus, is not discussed here.

Race, Ethnicity, and Genetic Ancestry

The definition of race and ethnicity has a long and tumultuous history in medical research, and a focal point of contention is whether or not there is a biologic basis of racial and ethnic classification.[16, 17] As humans evolved in the past 100,000 years, genetic variation inevitably arose across the genome as a result either of random mutation or of selection imposed by environmental factors, forming the basis of interindividual variability in a wide range of phenotypic traits. During human diasporas before modern days, individuals were more likely to mate with one another if they lived in close proximity, and this assortative mating pattern is likely to be the driving force of genetic differences among geographically divided populations (eg, Africans, Europeans, and Asians).[18, 19] However, we commonly divide individuals into groups (or self-identify) on the basis of biologic (eg, physical appearance) or nonbiologic (eg, language) features, without regard to human genetics (ie, genetic ancestry). Therefore, depending on the criteria used, race and ethnicity can be completely genetics-based (genetic ancestry) or entirely nongenetic (language). This introduces enormous heterogeneity within self-reported racial and ethnic groups: eg, 2 individuals both self-identified as AA can have drastically different levels of African genetic ancestry. This is particularly problematic for Hispanics, arguably one of the most broadly defined ethnic groups in the United States. With various degrees of admixture among Europeans, Africans, and NAs, the genetic ancestry composition of Hispanics is extremely diverse.[20-22] For example, Hispanics in Florida are more likely to be of Cuban origin and have much higher African genetic ancestry compared with Hispanics in California of Mexican descent, in whom there are high levels of NA genetic ancestry. Conversely, replacing self-reported race or ethnicity with genetic ancestry can overlook potentially critical contributions of environmental or cultural factors. Therefore, it is prudent to recognize the limitations of self-reported race and ethnicity as well as those associated with genetic ancestry. The discussion of racial and ethnic disparities in cancer would not be comprehensive without considering both genetic and nongenetic features as well as the interactions between the two.

Racial and Ethnic Differences in Susceptibility to Childhood ALL

ALL incidence rate and molecular subtypes in different racial and ethnic groups

On the basis of the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) registries, the overall age-adjusted incidence of childhood ALL (diagnosed between ages 0 and 19 years) is 31.9 per million person-years,[23, 24] accounting for approximately 27% of all pediatric cancers and disproportionately affecting children between the ages of 2 and 5 years. Although age patterns are consistent across racial or ethnic groups, the incidence of ALL differs markedly. Children of African descent have a significantly lower incidence of ALL compared with EAs (14.8 vs 35.6 per million person-years, respectively),[4, 5] whereas childhood ALL is most common among HAs (40.9 per million person-years).[4, 5] A higher incidence rate was also reported for NAs, although with a relatively small sample size.[9]
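
To make the size of these differences concrete, the sketch below converts the incidence rates quoted above into rate ratios and rate differences relative to EAs. It is illustrative arithmetic only: the function names are hypothetical, and the inputs are the point estimates cited in the text, without confidence intervals.

```python
# Incidence of childhood ALL per million person-years, as quoted in the text.
RATES = {"AA": 14.8, "EA": 35.6, "HA": 40.9}


def rate_ratio(group, reference="EA"):
    """Incidence rate ratio of `group` relative to `reference`."""
    return RATES[group] / RATES[reference]


def rate_difference(group, reference="EA"):
    """Absolute difference in incidence (per million person-years)."""
    return RATES[group] - RATES[reference]


for group in ("AA", "HA"):
    print(
        f"{group} vs EA: ratio {rate_ratio(group):.2f}, "
        f"difference {rate_difference(group):+.1f} per million person-years"
    )
```

On these figures, the rate in AAs is roughly 0.42 times that in EAs, whereas the rate in HAs is roughly 1.15 times that in EAs.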

Although ALL is generally distinguished from other hematologic malignancies by over-representation of lymphoblast cells in the bone marrow, it can be further divided into subtypes with distinct immunophenotypes (B-cell ALL [B-ALL] or T-cell ALL [T-ALL]) and/or genetic abnormalities (chromosomal translocations: t[12,21] with ETV6-RUNX1 fusion, t[1,19] with TCF3-PBX1 fusion, t[9,22] with BCR-ABL1 fusion, MLL rearrangements at 11q23, numeric chromosomal gain [hyperdiploidy of 47-50 or ≥51 chromosomes] or loss [hypodiploidy of <45 chromosomes]). In a population-based survey using the SEER database of 4952 cases of childhood ALL, Kadan-Lottick et al observed a significant over-representation of T-ALL in AA children (9%) compared with EA children (5%).[7] Similarly, among 8447 children with ALL who were treated on Children's Cancer Group (CCG) protocols from 1983 to 1995, the incidence of T-ALL was 1.7-fold higher in AAs than in EAs.[6] In a study of 412 children with ALL who were consecutively treated at St. Jude Children's Research Hospital, a higher prevalence of T-ALL again was evident in AAs.[25] Similarly, TCF3-PBX1 fusion was also over-represented in AAs with ALL, whereas ploidy abnormalities and other translocation events were not.[25] In a small cohort of children with ALL in California (N = 53), ETV6-RUNX1 translocation was more common in EAs than in HAs,[26] but this was not validated in a larger national study of 2534 children with ALL.[27]

Genetic basis for racial and ethnic differences in ALL incidence

The etiology of ALL is likely to be complex, with genetic and environmental factors collectively contributing to leukemogenesis. Several congenital genetic abnormalities have been linked to predisposition to childhood ALL, lending support to a genetic basis for ALL susceptibility. For example, children with Down syndrome (constitutive chromosome 21 trisomy) are at a significantly elevated risk of developing acute leukemia,[28] particularly ALL with somatic cytokine receptor-like factor 2 (CRLF2) lesions.[29] Inherited interindividual genetic variations (ie, differences in DNA sequence between individuals) are common across the human genome and often are related to geographic ancestry of racial or ethnic groups.[19] Thus, genetic polymorphisms can contribute to racial and ethnic differences in ALL incidence if the frequency of a susceptibility variant differs by race or ethnicity and/or when genetic variants are associated with ALL in a population-specific manner.

The contribution of genetic variations in “candidate” pathways (eg, carcinogen metabolism, folate metabolism, DNA repair) to ALL susceptibility has been examined extensively over the past 2 decades, with inconsistent results. In a recent meta-analysis, 47 studies of 25 polymorphisms in 16 genes were summarized, and a statistically significant (P < .05), albeit modest, association with ALL susceptibility was observed for only 8 variants (eg, glutathione S-transferase μ1 [GSTM1] deletion; solute carrier family 19 member 1 [SLC19A1] guanine-to-adenine substitution at nucleotide 80 [G80A]), with an estimated false-positive probability of 20%.[30] In a similar pooled analysis of methylenetetrahydrofolate reductase (MTHFR) polymorphisms in 12 studies, a significant association was observed for the cytosine-to-thymine substitution at nucleotide 677 (C677T) variant but not for the adenine-to-cytosine substitution at nucleotide 1298 (A1298C) polymorphism.[31] Germline SNPs in the interleukin 12A (IL12A) gene and the major histocompatibility complex, class II, DP β1 (HLA-DP) gene also were linked to ALL risk in Hispanics,[32, 33] suggesting that immune modulation plays a role in ALL etiology. However, a comprehensive analysis of the major histocompatibility complex region in 824 patients with B-ALL and 4737 controls of European genetic ancestry did not reveal a statistically significant association between HLA variants and ALL susceptibility.[34]

Advances in high-throughput genotyping now allow genome-wide association studies (GWAS) to interrogate a large number of genetic variations across the entire human genome for associations with a variety of phenotypic traits. GWAS do not rely on prior knowledge of the disease biology but, instead, systematically examine genetic variants in an agnostic fashion. To date, GWAS of childhood ALL susceptibility have discovered 5 genomic loci at the genome-wide significance level (P < 5 × 10⁻⁸)[35-38]: AT-rich interactive domain 5B (ARID5B) (10q21.2); IKAROS family zinc finger 1 (IKZF1) (7p12.2); CCAAT/enhancer binding protein ε (CEBPE) (14q11.2); cyclin-dependent kinase inhibitor 2A (CDKN2A) (9p21.3); and BMI1 polycomb ring finger oncogene–phosphatidylinositol-5-phosphate 4-kinase, type II, α (BMI1-PIP4K2A) (10p12.31-12.2). Although these germline variants had never been linked to ALL before GWAS, there is compelling evidence implicating all 5 genes in the pathogenesis of ALL. For example, germline variants in ARID5B have the strongest association with ALL susceptibility across the genome, and the loss of Arid5b in mice leads to significant defects in lymphoid cell development.[39] IKZF1, an important transcription factor in all lymphoid lineages, is frequently targeted by copy number alterations in ALL blast cells (particularly in high-risk ALL), and IKZF1 deletion is associated with a poor prognosis.[40] Loss of CDKN2A/CDKN2B occurs in up to 40% of B-precursor ALL and is likely to contribute to cell cycle deregulation in leukemia.[41] CEBPE is related specifically to myeloid cell maturation and terminal differentiation,[42, 43] but intrachromosomal translocations involving the immunoglobulin heavy locus (IGH) and CEBPE genes also have been described in childhood ALL.[44] The remarkable convergence of germline ALL susceptibility loci and somatic aberrations on genes involved in lymphoid cell development, cell cycle control, and tumor suppression reinforces the contribution of these key pathways to leukemogenesis and also points to the possibility that inherited and acquired genetic variations act synergistically in the development of childhood ALL. It is noteworthy that, unlike the candidate gene studies, these loci that exhibit a significant genome-wide association with the risk of ALL are repeatedly validated by subsequent reports,[45-52] unequivocally establishing the importance of inherited genetic variations in ALL susceptibility.
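
The genome-wide significance level quoted above is conventionally read as a Bonferroni correction of a 0.05 type I error rate for roughly 1 million independent common variants. The following minimal sketch shows that arithmetic; the figure of 1 million tests is the usual convention and an assumption here, not a number reported by the cited studies, and the function names are hypothetical.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test P value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def is_genome_wide_significant(p_value, n_tests=1_000_000, alpha=0.05):
    """True if p_value passes the Bonferroni cutoff for n_tests tests.

    n_tests defaults to 1 million, the conventional (assumed) number of
    independent common variants in the human genome.
    """
    return p_value < bonferroni_threshold(n_tests, alpha)


threshold = bonferroni_threshold(1_000_000)
print(f"Genome-wide cutoff: {threshold:.1e}")
print(math.isclose(threshold, 5e-8))  # the conventional 5 x 10^-8 level
```

A candidate-gene study testing 25 polymorphisms would, by the same reasoning, need P < .002 rather than P < .05 to control the family-wise error rate, which may help explain why many candidate associations fail to replicate.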

The question naturally arises regarding whether the racial and ethnic pattern of ALL incidence could be explained by population differences in the frequency of the ALL-predisposing genetic variations. In AAs, allele frequency at ARID5B SNP rs10821936 differed significantly between ALL cases and non-ALL controls, confirming the association of the ARID5B variant with ALL susceptibility.[46] The risk variant also was substantially less common in individuals of African descent than in those with European ancestry, explaining an increase of 5.2 per million person-years in ALL incidence (ie, 30% of the observed racial difference).[46] Similarly, ARID5B SNP genotype was associated with ALL risk in HAs, and the frequency of the risk allele was highest in HAs, consistent with the higher ALL incidence in this population.[50] Additional genotyping at the ARID5B locus identified 5 and 3 susceptibility variants that were specific to EAs and HAs, respectively.[50] At the PIP4K2A locus, rs7088318 was associated with ALL risk across race and ethnicity, and the population differences in risk allele frequency paralleled racial and ethnic differences in ALL incidence.[37] In contrast, CEBPE SNPs were more strongly related to ALL risk in EAs, with variable effects in populations of non-European descent.[37] IKZF1 SNPs were associated with ALL susceptibility across racial and ethnic groups, with comparable risk allele frequency and, thus, little relation to racial and ethnic differences in ALL incidence.[37] Together, these findings consistently point to the ARID5B and PIP4K2A loci as important determinants of racial and ethnic differences in ALL susceptibility.
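
The reasoning that a difference in risk allele frequency can translate into a difference in incidence can be sketched with a simple model. The example below assumes Hardy-Weinberg genotype proportions and a multiplicative per-allele relative risk; both are modeling assumptions, and the allele frequencies and relative risk are hypothetical values chosen for illustration, not estimates from the cited studies.

```python
import math


def mean_relative_risk(risk_allele_freq, per_allele_rr):
    """Population-average risk relative to individuals with no risk allele.

    Under Hardy-Weinberg proportions, genotypes with 0, 1, and 2 risk
    alleles have frequencies (1-p)^2, 2p(1-p), and p^2. With relative
    risks 1, rr, and rr^2, the weighted average is (1 - p + p*rr)^2.
    """
    p = risk_allele_freq
    return (1 - p + p * per_allele_rr) ** 2


def expected_incidence_ratio(freq_a, freq_b, per_allele_rr):
    """Incidence ratio (population A vs B) attributable to this one variant."""
    return mean_relative_risk(freq_a, per_allele_rr) / mean_relative_risk(
        freq_b, per_allele_rr
    )


# Hypothetical inputs: the same per-allele effect in both populations,
# but a risk allele that is twice as common in population A.
high_freq, low_freq, rr = 0.4, 0.2, 1.5
ratio = expected_incidence_ratio(high_freq, low_freq, rr)
print(f"Expected incidence ratio from this variant alone: {ratio:.2f}")
```

With these illustrative inputs, the variant alone would produce an incidence ratio of about 1.19, which shows how a single common variant of modest effect can account for part, but rarely all, of an observed difference between populations.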

Nongenetic factors influencing racial and ethnic disparities in ALL incidence

In contrast to the remarkable advances in our understanding of the basic biology and treatment of childhood ALL, there has been little, if any, new information regarding an environmental etiology.[53] Despite large epidemiologic studies of childhood leukemia and a substantial volume of reports of largely unconfirmed associations, we are still left with a limited list of potential environmental leukemogenic exposures, eg, ionizing radiation, alkylating chemotherapeutic agents, and topoisomerase-II inhibitors.[53] Based on the distinct age-specific incidence pattern and international variations observed among populations of different social and economic development, a hypothesis was proposed for a role of infection in the etiology of childhood ALL.[54] Subsequently, considerable interest and research have focused on infection with a proposed mechanism of proliferative stress resulting from delayed exposure to infectious agents during infancy (Greaves hypothesis)[55] or population mixing (Kinlen hypothesis).[56]

It is unlikely that any major environmental factors are driving racial and ethnic differences in the occurrence of ALL given the paucity of reproducible associations within the context of the substantial amount of epidemiologic research.[53, 57, 58] Conversely, the possibility also exists that our ability to accurately document or quantify exposure remains inadequate. Given the genotypic and phenotypic heterogeneity of childhood ALL, focusing on distinct subsets of the disease may provide insights into environmental influences in ALL risk.[59-61]

Racial and Ethnic Differences in ALL Treatment Outcome

Racial and ethnic differences in survival after childhood ALL are well documented, with inferior treatment outcome in AAs and HAs compared with EAs. In the Pediatric Oncology Group ALL trials between 1981 and 1994 (N = 5086), AAs and HAs had greater excess mortality than EAs.[10] A retrospective study of 8447 children with newly diagnosed ALL who were treated on CCG protocols during the same period demonstrated similar racial and ethnic differences in 5-year event-free survival (Asians, 75.1% ± 3.5%; EAs, 72.8% ± 0.6%; HAs, 65.9% ± 1.5%; and AAs, 61.5% ± 2.2%).[6] These differences resulted mainly from differential risks of relapse, because nearly all children achieved clinical remission regardless of race/ethnicity. Racial and ethnic disparities in outcome remained significant even after adjusting for known clinical and molecular risk factors (eg, age, leukocyte count at diagnosis, sex, ALL lineage, molecular subtypes) and persisted across the treatment eras (1983-1989 and 1989-1995). However, in a single-institution study of children with ALL treated at St. Jude Children's Research Hospital, there was a significant gap in survival between AAs and EAs during the early treatment era (1962-1983), which became nonsignificant in more recent treatment protocols (1984-1992 and 1991-1998).[25, 62] Those authors concluded that, with equal access to contemporary, protocol-based ALL therapy, AA children fare as well as EA children. This is also reflected in a recent meta-analysis of 21,626 children treated on the Children's Oncology Group (COG) frontline ALL trials between 1990 and 2005, in which the absolute 5-year survival difference between AAs and EAs fell from 11% during 1990 to 1994 to 8.1% during 1995 to 1999 and, eventually, to 3.3% during 2000 to 2005.[8] In contrast, even with risk-adapted treatment regimens, HA children with ALL continued to fare worse than patients of European descent across treatment eras. Another population-based study using the National Cancer Institute SEER database (from 1988 to 2008) reported a continuing trend of worse treatment outcomes in AA, HA, and NA children but also noted poorer survival in Asian children with ALL (particularly Vietnamese and Filipinos).[63]

Genetic basis of racial and ethnic differences in ALL treatment outcome

Relapse is the primary cause of death in children with ALL, and racial and ethnic disparity in relapse can arise from differences in host disposition of and/or tumor response to antileukemic agents, which may be influenced by both inherited (germline) and acquired (somatic) genetic factors. Genome-wide studies comparing germline SNPs in global populations have revealed dramatic differences in the genetic make-up of racial and ethnic groups,[19] raising the possibility that these ancestry-related genetic variations contribute to racial and ethnic disparities in ALL treatment response. To test this hypothesis, we performed a near population-based GWAS that examined 444,044 genetic polymorphisms for their associations with ALL relapse in 2534 children with newly diagnosed disease from COG and St. Jude front-line protocols (Fig. 2A).[27] We first quantitatively assessed European, African, East Asian, and NA genetic ancestry in children with ALL by comparing genome-wide SNP genotype against established reference individuals who had clearly defined ancestral origin (eg, indigenous Maya, Nahua, Aymara, and Quechua populations as NA references[20]). Among the 4 ancestries that were evaluated, only NA genetic ancestry was associated significantly with the cumulative incidence of ALL relapse (Fig. 2B), independent of known prognostic factors (leukocyte count and age at diagnosis, ALL lineage, and molecular subtypes). It is important to note that NA ancestry retained a significant association with relapse even within self-declared EAs (for whom genetic ancestry was cryptic) (Fig. 2C), strongly arguing for a genetic basis for the elevated risk of leukemia relapse in HAs. Of particular clinical relevance, NA genetic ancestry was predictive of relapse after adjusting for minimal residual disease and even within the group of patients with no detectable residual leukemia at the end of induction therapy. A possible therapeutic intervention also was noted by the analysis of the COG P9904/9905 protocols, in which children were randomized to receive or not receive the delayed intensification therapy (ie, an 8-week multiagent treatment) (Fig. 2D,E). This additional phase of chemotherapy almost completely mitigated the poor prognosis associated with NA ancestry. Although the exact benefit of delayed intensification treatment in the context of ancestry needs to be examined in future ALL trials with diverse treatment regimens, these results illustrate the importance and feasibility of individualizing therapy on the basis of race and ethnicity-related genetic variation. Together, the findings from this study convey 2 very important messages: 1) there is a biologic basis for racial and ethnic disparities in ALL treatment outcome, and 2) race and ethnicity-related relapse risk can be overcome by more individualized chemotherapy.

Figure 2.

Genetic ancestry and the risk of relapse in childhood acute lymphoblastic leukemia (ALL) are illustrated. (A) The genetic ancestral composition of 2534 children with ALL is shown. Each patient's ancestry is indicated as a column, and the color represents the proportion of ancestry estimated for that patient (European, red; African, gray; Asian, green; Native American, blue). Genetic ancestry was estimated using the program STRUCTURE (Pritchard Laboratory, Stanford University, Stanford, Calif). Patients were clustered using the Ward clustering method based on dissimilarity in genetic ancestry, measured as 1 minus the pairwise correlation. (B-E) Higher levels of Native American (NA) ancestry were linked to an increased risk of relapse (B) in all patients, (C) within the self-reported European Americans, and (D) for those who did not receive delayed intensification, but (E) not within those who did receive delayed intensification in the Children's Oncology Group P9904/9905 trial. Although the cumulative incidence of relapse is plotted separately for patients with <10% (red) versus ≥10% (blue) Native American ancestry, all P values were estimated using a Fine and Gray cumulative incidence hazard regression model in which Native American ancestry was treated as a continuous variable. (Reproduced with permission from Yang JJ, Cheng C, Devidas M, et al. Ancestry and pharmacogenomics of relapse in acute lymphoblastic leukemia. Nat Genet. 2011;43:237-241.[27])
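
The dissimilarity measure named in the legend of Figure 2A can be illustrated with a short sketch. The example below computes 1 minus the Pearson correlation between the ancestry proportion vectors of 2 patients. The choice of Pearson correlation, the function names, and the patient values are all assumptions made for illustration; the code does not reproduce STRUCTURE or the Ward clustering step, which would take a matrix of such dissimilarities as input.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ancestry_dissimilarity(x, y):
    """1 minus the pairwise correlation: 0 for identical ancestry profiles."""
    return 1.0 - pearson(x, y)


# Hypothetical ancestry proportions, ordered as
# (European, African, Asian, Native American).
patients = {
    "p1": [0.90, 0.04, 0.03, 0.03],
    "p2": [0.88, 0.05, 0.03, 0.04],
    "p3": [0.45, 0.05, 0.02, 0.48],
}

for a, b in (("p1", "p2"), ("p1", "p3")):
    d = ancestry_dissimilarity(patients[a], patients[b])
    print(f"{a} vs {b}: dissimilarity {d:.3f}")
```

In this toy example, the 2 patients with predominantly European ancestry are nearly indistinguishable, whereas the patient with substantial NA ancestry is clearly separated from them; a hierarchical method such as Ward clustering would therefore place the former pair in the same cluster first.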

A spectrum of somatic chromosomal abnormalities has been described for ALL, as noted above, and some of these abnormalities are highly prognostic and are used to assign patients to treatment intensification.[3] There is little evidence that ETV6-RUNX1, TCF3-PBX1, BCR-ABL1, or MLL-rearranged ALL is disproportionately distributed between HA and EA children with ALL; and multivariate analyses indicate that racial and ethnic differences in relapse remain significant after adjusting for leukocyte count and age, ALL lineage and molecular subtypes, and/or minimal residual disease.[6, 27] More recently, global gene expression profiling of leukemia blasts identified up to 15% of childhood ALL with a transcription signature similar to that of Philadelphia chromosome (Ph)-positive ALL; thus, this novel subtype is termed Ph-like ALL.[40, 64-66] Up to 50% of Ph-like ALLs exhibit overexpression of CRLF2 as a result of balanced translocation or gene fusion and are associated with a very poor prognosis.[66] It is noteworthy that Ph-like ALL with CRLF2 lesions is significantly over-represented in self-reported Hispanic children (35.3%) compared with non-Hispanic children (7.1%),[66] plausibly contributing to the gap in treatment outcome between the 2 groups. In fact, our group recently identified germline variants in the GATA binding protein 3 (GATA3) gene that predispose children to Ph-like ALL and were associated with ALL relapse.[67] These variants are strikingly over-represented in patients with high levels of NA genetic ancestry, consistent with the inferior treatment outcome of Hispanic children with ALL.

Nongenetic factors that influence racial and ethnic disparities in ALL treatment outcomes

Whereas the discovery of genetic ancestry as a prognostic factor in ALL indicates a biologic basis of racial and ethnic disparities, it by no means precludes the contribution of nongenetic factors, eg, the timing of diagnosis, access to quality health care, and adherence to treatment.[68] In fact, the inclusion of socioeconomic status in the multivariate model substantially reduced the statistical significance of the difference in relapse between HA and EA children with ALL.[6]

Poor adherence to medication can negatively influence cancer treatment outcome and is of particular importance in childhood ALL therapy, which requires prolonged, daily, oral administration of antimetabolites (6-mercaptopurine [6MP] for up to 2 years). In a recent study, the mean oral 6MP adherence rate was significantly lower among patients who relapsed than among those in continuous complete remission (88.2% and 96.2%, respectively),[69] and a progressive increase in the risk of relapse was observed with decreasing levels of adherence. Adherence was significantly lower among self-reported HAs, for whom there was also a higher rate of relapse compared with EAs. In a multivariate model, the prognostic value of Hispanic ethnicity became nonsignificant after adjusting for 6MP adherence and socioeconomic status. However, when the analysis was restricted to patients with high medication compliance (≥90% adherence rate), relapse still was more common among HAs than among EAs, highlighting the contribution of genetic and biologic factors to the racial and ethnic gap in ALL treatment outcome. A comprehensive evaluation of both genetic and nongenetic factors is warranted in future studies to accurately ascertain the causes underlying the racial and ethnic disparities in ALL survival.


It is hard to over-emphasize the need to address racial and ethnic gaps in the incidence and outcome of childhood ALL, the most common pediatric cancer. Differences in ALL biology by race and ethnicity are poorly characterized, contributing to the persisting disparity despite overall improvement in cure rates. Recent genomic profiling of ALL described the genetic landscape of this disease with unprecedented resolution, discovering novel molecular markers associated with disease biology and prognosis. In particular, inherited genetic variants associated with NA ancestry were linked to the susceptibility and treatment outcome of childhood ALL, partly explaining higher ALL incidence and poorer survival among HAs. With these advances in cancer genomics and novel approaches to investigating nongenetic variables, the timing is opportune for comprehensive research of ALL disparity to finally close the racial and ethnic gaps in this catastrophic disease in children.


This work was supported by the National Institutes of Health grants U01GM92666 and RC4CA156449, and by the American Lebanese Syrian Associated Charities (ALSAC). J. J. Yang is an American Society of Hematology Scholar.


The authors made no disclosures.