Spatial clustering of childhood cancer in Great Britain during the period 1969–1993
Article first published online: 2 SEP 2008
Copyright © 2008 Wiley-Liss, Inc.
International Journal of Cancer
Volume 124, Issue 4, pages 932–936, 15 February 2009
How to Cite
McNally, R. J.Q., Alexander, F. E., Vincent, T. J. and Murphy, M. F.G. (2009), Spatial clustering of childhood cancer in Great Britain during the period 1969–1993. Int. J. Cancer, 124: 932–936. doi: 10.1002/ijc.23965
- Issue published online: 11 DEC 2008
- Accepted manuscript online: 2 SEP 2008 12:00AM EST
- Manuscript Accepted: 30 JUL 2008
- Manuscript Received: 29 FEB 2008
- North of England Children's Cancer Research (NECCR)
- Department of Health and the Scottish Ministers
- spatial clustering
The aetiology of childhood cancer is poorly understood. Both genetic and environmental factors are likely to be involved. The presence of spatial clustering is indicative of a very localized environmental component to aetiology. Spatial clustering is present when there are a small number of areas with greatly increased incidence or a large number of areas with moderately increased incidence. To determine whether localized environmental factors may play a part in childhood cancer aetiology, we tested for spatial clustering using a large set of national population-based data on cases diagnosed in Great Britain during 1969–1993. The Potthoff-Whittinghill method was used to test for extra-Poisson variation (EPV). A total of 32,323 cases were allocated to 10,444 wards using addresses at diagnosis. Analyses showed statistically significant evidence of clustering for acute lymphoblastic leukaemia (ALL) over the whole age range (estimate of EPV = 0.05, p = 0.002) and for ages 1–4 years (estimate of EPV = 0.03, p = 0.015). Soft tissue sarcoma (estimate of EPV = 0.03, p = 0.04) and Wilms tumours (estimate of EPV = 0.04, p = 0.007) also showed significant clustering. Clustering tended to persist across different time periods for cases of ALL (estimate of between-time-period EPV = 0.04, p = 0.003). In conclusion, we observed low-level spatial clustering that is attributable to a limited number of cases. This suggests that environmental factors, which in some locations display localized clustering, may be important aetiological agents in these diseases. For ALL and soft tissue sarcoma, but not Wilms tumour, common infectious agents may be likely candidates. © 2008 Wiley-Liss, Inc.
Both genetic and environmental components are thought to be involved in the aetiology of childhood cancer. A number of environmental risk factors have been implicated. These include infections, ionizing radiation, electromagnetic fields, chemical exposures, parental smoking, parental alcohol consumption, contaminated drinking water and hair dyes.1 Some of these exposures (mainly infections, ionizing radiation, electromagnetic fields, chemical exposures and contaminated drinking water) may exhibit geographical variation and if they have a role in aetiology then the distribution of cases may be predicted to exhibit spatial heterogeneity.
The present study is concerned with the detection of irregular spatial distributions of specific cancer types. A general irregular spatial distribution of cases that is not confined to 1 particular small area is known as ‘spatial clustering’. This sort of clustering could arise because there are a small number of areas with greatly increased incidence or a large number of areas with moderately increased incidence, persistent over time.
We have recently demonstrated space–time clustering amongst cases of leukaemia (especially ALL), soft tissue sarcomas and osteosarcomas using the same data set that is considered in the present study.2, 3 It is important to note that space–time clustering relates to a different pattern of occurrence to spatial clustering. Space–time clustering is observed when there are excess numbers of cases within small geographical locations over very limited periods of time.
A number of previous studies from parts of Europe, Australia and Hong Kong have found statistically significant spatial clustering amongst cases of childhood leukaemia.4–10 However, 3 studies from Sweden, north-west England and France showed little or no evidence of overall spatial clustering for childhood leukaemia.11–13 Another report from New Zealand also found little evidence of overall spatial clustering for childhood leukaemias or lymphomas.14 There have been only a few studies on other types of childhood cancer. In north-west England, spatial clustering was found amongst cases of Wilms tumour, but not for soft tissue sarcomas or CNS tumours.15, 16 It should be noted that there was some limited overlap between the data set from north-west England and the one considered in the present analysis. Furthermore, caution must be exercised when interpreting negative findings. It is entirely possible that negative findings may result from lack of statistical power, due to small numbers of cases.
A variety of statistical methods have been developed to test for spatial clustering. These include methods due to Potthoff and Whittinghill,17 Muirhead and Ball,18 Cuzick and Edwards19 and others that are reviewed in Alexander and Boyle.20 An empirical study of clustering tests has shown that the method due to Potthoff and Whittinghill was the most powerful against a number of alternative hypotheses.20
The aim of the present study was to test predictions of spatial clustering which might arise as a result of persistent and localized environmental exposures.
Material and methods
The study analyzed all children diagnosed with cancer in Great Britain between January 1, 1969 and December 31, 1993 and registered by the National Registry of Childhood Tumours (NRCT). The NRCT is a population-based registry covering the whole of England, Wales and Scotland.21 It includes records for nearly all children, aged 0–14 years, diagnosed with cancer from 1962 to the present day. Since 1993, childhood cancer registrations from Northern Ireland have been included in the national coverage of the NRCT. The data set analyzed here was restricted to the period 1969–1993 because reliable small area population data were not available prior to 1969 and, at the time of preparation (1999), the data set was only considered complete up to the end of 1993.
Cases were classified into diagnostic groups according to the International Classification of Diseases for Oncology (ICD-O).22 The following diagnostic groups were specified a priori for analysis: (i) leukaemia; (ii) acute lymphoblastic leukaemia (ALL); (iii) acute non-lymphocytic leukaemia (ANLL); (iv) lymphomas; (v) Hodgkin lymphoma (HL); (vi) non-Hodgkin lymphoma (NHL); (vii) central nervous system (CNS) tumours; (viii) soft tissue sarcomas; (ix) bone tumours; (x) sympathetic nervous system tumours; (xi) Wilms and other renal tumours; (xii) germ cell and related tumours; (xiii) all cancers except leukaemia and lymphoma; and (xiv) all cancers.
For each of 10,444 small areas, person-years by sex and age group (<1, 1–4, 5–9 and 10–14 years) were estimated using population values for each of the 1971, 1981 and 1991 censuses.3 Expected numbers of cases for each small area were computed by applying overall and sex- and age-specific rates for the relevant time periods (1969–1975, 1976–1985, 1986–1993) to the person-years at risk in each small area.
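The calculation of expected numbers described above can be sketched as follows. This is a minimal illustration only: the function name, the dictionary layout and all rates and person-years are hypothetical and do not come from the study.

```python
from collections import defaultdict

def expected_cases(person_years, rates):
    """Sum rate x person-years over all strata of each small area.

    person_years: {(area, period, sex, age_group): person-years at risk}
    rates:        {(period, sex, age_group): cases per person-year}
    """
    expected = defaultdict(float)
    for (area, period, sex, age_group), py in person_years.items():
        expected[area] += rates[(period, sex, age_group)] * py
    return dict(expected)

# Toy values, invented for illustration only.
rates = {
    ("1969-1975", "M", "1-4"): 8e-5,
    ("1969-1975", "F", "1-4"): 6e-5,
}
person_years = {
    ("ward_A", "1969-1975", "M", "1-4"): 2500.0,
    ("ward_A", "1969-1975", "F", "1-4"): 2500.0,
    ("ward_B", "1969-1975", "M", "1-4"): 1000.0,
}
expected = expected_cases(person_years, rates)
```

In a full analysis the same sum would run over every sex, age group and time period for each of the 10,444 small areas.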
The following hypotheses were tested.
- (a) Spatial clustering will be observed in specific diagnostic groups, notably leukaemia, ALL (especially the ‘childhood peak’ age range), HL in younger children (aged 0–9 years) and Wilms tumours.
- (b) Other childhood cancers will not show spatial clustering.
- (c) Spatial clustering of ALL will persist across different time periods.
- (d) The areas involved for clustering of cases of HL diagnosed in younger children (aged 0–9 years) will change over time.
- (e) Spatial clustering of ALL will involve cases from different age-groups.
- (f) Cases of HL diagnosed at different ages (0–9 years and 10–14 years) will not cluster together.
There has been a large amount of research in recent years into statistical methods for identifying localized clustering of disease. Alexander and Boyle have compared different methods, using simulated data for census wards.20 The method used here is based on the Potthoff-Whittinghill test.17, 18 This method was used previously in the EUROCLUS project, which analyzed childhood leukaemia data from various European countries.8, 9 Statistical significance was taken as p < 0.05 in all analyses.
The Potthoff-Whittinghill method
In the absence of clustering, the observed number of cancer cases in a geographical area should follow a Poisson distribution with mean equal to the expected number of cases in that area. If so, then the variance of the observed number of cases would equal the expected number of cases. The Potthoff-Whittinghill test assesses whether the ratio of the variance to the expected number of cases is greater than 1. If it is, then the data are overdispersed relative to the Poisson distribution, and relatively large numbers of cases arise in some areas more often than predicted under the Poisson distribution.
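In symbols, the null hypothesis and the test statistic can be written as below. The notation is introduced here for illustration and is not taken from the paper: O_i and E_i are the observed and expected numbers of cases in area i, K is the number of areas, and O_+ and E_+ are the totals. The statistic is given in a commonly quoted form; the exact scaling used in the analysis is not stated in the text.

```latex
\[
H_0:\; O_i \sim \mathrm{Poisson}(E_i)
\quad\Longrightarrow\quad
\operatorname{Var}(O_i) = \operatorname{E}[O_i] = E_i ,
\qquad
\frac{\operatorname{Var}(O_i)}{E_i} = 1 .
\]
\[
T_{\mathrm{PW}} = E_{+} \sum_{i=1}^{K} \frac{O_i\,(O_i - 1)}{E_i},
\qquad
E_{+} = \sum_{i=1}^{K} E_i ,
\qquad
\operatorname{E}\!\left[\,T_{\mathrm{PW}} \mid H_0,\, O_{+}\right] = O_{+}\,(O_{+} - 1) .
\]
```

The last equality follows because, conditional on the total O_+, the counts are multinomial with probabilities E_i / E_+, so that E[O_i(O_i − 1)] = O_+(O_+ − 1)(E_i / E_+)². Values of the statistic well above O_+(O_+ − 1) indicate overdispersion.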
Estimate of extra-Poisson variation
The magnitude of any overdispersion can be described by an estimate of the additional variation in the number of cases in each area, compared with that predicted under the Poisson distribution. This EPV, denoted β, would equal zero if the data were distributed as Poisson and would be greater than zero if the data were overdispersed. For example, an EPV (β) of 0.1 would represent 10% greater variation in the observed number of cases in each area than predicted if the data followed a Poisson distribution. The null distribution of the estimate of β was simulated by randomly allocating cases to each small area (with probability proportional to the childhood population of the ward). p values were calculated by comparing the observed estimate of EPV (β̂) with the simulated null distribution. In the interests of clarity, p values are presented only for statistically significant (p < 0.05) results. A 90% confidence interval for β̂, calculated using a normal approximation to its distribution, is used to describe the uncertainty in this estimate.20
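The simulation step can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the moment estimator in `epv_estimate` is a simple stand-in (the paper's estimator follows Alexander and Boyle20 and its exact form is not given here), and all function names, ward populations and the "observed" value are hypothetical.

```python
import random
from collections import Counter

def epv_estimate(observed, expected):
    """Moment estimate of beta: mean of (O - E)^2 / E over areas, minus 1.

    ASSUMPTION: illustrative estimator, not necessarily the paper's.
    """
    k = len(observed)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected)) / k - 1.0

def simulate_null(populations, n_cases, n_sims, rng):
    """Simulate the estimate under no clustering: each case is allocated
    to an area with probability proportional to the area's population."""
    total_pop = sum(populations)
    expected = [n_cases * p / total_pop for p in populations]
    areas = range(len(populations))
    null = []
    for _ in range(n_sims):
        counts = Counter(rng.choices(areas, weights=populations, k=n_cases))
        observed = [counts[i] for i in areas]
        null.append(epv_estimate(observed, expected))
    return null

def monte_carlo_p(observed_stat, null):
    """One-sided p value: share of simulated values >= the observed one,
    with a +1 correction so that p is never exactly zero."""
    exceed = sum(1 for s in null if s >= observed_stat)
    return (exceed + 1) / (len(null) + 1)

rng = random.Random(0)
populations = [rng.randint(200, 2000) for _ in range(200)]  # toy wards
null = simulate_null(populations, n_cases=300, n_sims=200, rng=rng)
p_value = monte_carlo_p(0.5, null)  # 0.5 is an arbitrary "observed" value
```

With real data, the first argument of `monte_carlo_p` would be the estimate computed from the observed ward counts, and the number of simulations would be chosen large enough to resolve the smallest p value of interest.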
To test prior hypotheses (c), (d), (e) and (f), additional analyses split the EPV (β) between and within subgroups (EPV between subgroups was denoted as β_B and EPV within subgroups as β_W). These subgroups were time of diagnosis (5 periods of 5 years) and age at diagnosis. The methodology used was described in Alexander and Boyle20 and derived from Muirhead and Ball.18
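The idea behind the split can be illustrated with a pair-counting identity: within one area, every ordered pair of cases either falls in the same subgroup or in two different subgroups. The sketch below shows only this identity. It is not the estimator of the between- and within-subgroup components used in the study, whose exact form is given in the cited references, and the function name and counts are hypothetical.

```python
def pair_counts(counts_by_subgroup):
    """Split the ordered case pairs in one area into within-subgroup
    and between-subgroup pairs.

    counts_by_subgroup: number of cases in the area for each subgroup
    (e.g. one entry per 5-year period, or one per age group).
    """
    total = sum(counts_by_subgroup)
    all_pairs = total * (total - 1)                         # O(O - 1)
    within = sum(c * (c - 1) for c in counts_by_subgroup)   # same subgroup
    between = all_pairs - within                            # different subgroups
    return within, between

# Toy ward with cases in five 5-year periods (invented numbers).
within, between = pair_counts([2, 0, 1, 0, 1])
```

An excess of between-subgroup pairs corresponds to cases from different periods (or age groups) sharing the same areas, which is the pattern a between-subgroup component is intended to capture.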
Results
The study included 32,323 cases of childhood cancer aged 0–14 years. Table I gives the total number of cases for each analysis group.
Table I. Number of cases in each analysis group
|Diagnostic group and age (years)|Number of cases|
|---|---|
|Leukaemia, ages 0–14|10,737|
|Leukaemia, ages 1–4|5,094|
|Leukaemia, ages 5–14|5,092|
|ALL, ages 0–14|8,687|
|ALL, ages 1–4|4,441|
|ALL, ages 5–14|3,906|
|ANLL, ages 0–14|1,737|
|Lymphomas, ages 0–14|3,308|
|HL, ages 0–14|1,364|
|HL, ages 0–9|487|
|NHL, ages 0–14|1,678|
|NHL, ages 0–9|1,027|
|CNS tumours, ages 0–14|7,473|
|Soft tissue sarcomas, ages 0–14|2,101|
|Bone tumours, ages 0–14|1,507|
|Sympathetic nervous system tumours, ages 0–14|2,111|
|Wilms and other renal tumours, ages 0–14|1,890|
|Germ cell and related tumours, ages 0–14|983|
|All cancers except leukaemia and lymphoma, ages 0–14|18,278|
|All cancers, ages 0–14|32,323|
Table II presents detailed results from the analyses for the entire time period 1969–1993 (testing hypotheses (a) and (b)). There was statistically significant spatial clustering for cases of leukaemia aged 0–14 years as a whole (β̂ = 0.045; 90% confidence interval [CI]: 0.02, 0.07; p = 0.004). This was attributable to statistically significant spatial clustering for cases of ALL (β̂ = 0.05; 90% CI: 0.025, 0.07; p = 0.002) but not for cases of ANLL (β̂ = 0; 90% CI: −0.02, 0.03; NS). Furthermore, the spatial clustering was only found for cases of ALL aged 1–4 years (β̂ = 0.03; 90% CI: 0.008, 0.05; p = 0.015) and not for cases of ALL aged 5–14 years (β̂ = −0.002; 90% CI: −0.025, 0.02; NS). Table II also shows statistically significant spatial clustering for soft tissue sarcomas (β̂ = 0.03; 90% CI: 0.003, 0.05; p = 0.04), Wilms and other renal tumours (β̂ = 0.04; 90% CI: 0.01, 0.06; p = 0.007), all cancers except leukaemia and lymphoma (β̂ = 0.04; 90% CI: 0.02, 0.06; p = 0.004) and all cancers (β̂ = 0.075; 90% CI: 0.05, 0.1; p = 0.0005).
Table II. Estimates of EPV for the entire study period, 1969–1993
|Diagnostic group and age (years)|Estimate of EPV, β̂ (90% CI)|p value|
|---|---|---|
|Leukaemia, ages 0–14|0.045 (0.02, 0.07)*|0.004*|
|Leukaemia, ages 1–4|0.03 (0.008, 0.05)*|0.02*|
|Leukaemia, ages 5–14|−0.008 (−0.03, 0.015)|NS|
|ALL, ages 0–14|0.05 (0.025, 0.07)*|0.002*|
|ALL, ages 1–4|0.03 (0.008, 0.05)*|0.015*|
|ALL, ages 5–14|−0.002 (−0.025, 0.02)|NS|
|ANLL, ages 0–14|0 (−0.02, 0.03)|NS|
|Lymphomas, ages 0–14|0.007 (−0.02, 0.03)|NS|
|HL, ages 0–14|0.007 (−0.02, 0.03)|NS|
|HL, ages 0–9|−0.015 (−0.04, 0.008)|NS|
|NHL, ages 0–14|0.01 (−0.009, 0.04)|NS|
|NHL, ages 0–9|0.02 (−0.003, 0.04)|NS|
|CNS tumours, ages 0–14|0.001 (−0.02, 0.02)|NS|
|Soft tissue sarcomas, ages 0–14|0.03 (0.003, 0.05)*|0.04*|
|Bone tumours, ages 0–14|0.008 (−0.015, 0.03)|NS|
|Sympathetic nervous system tumours, ages 0–14|−0.02 (−0.04, 0.003)|NS|
|Wilms and other renal tumours, ages 0–14|0.04 (0.01, 0.06)*|0.007*|
|Germ cell and related tumours, ages 0–14|0.004 (−0.02, 0.03)|NS|
|All cancers except leukaemia and lymphoma, ages 0–14|0.04 (0.02, 0.06)*|0.004*|
|All cancers, ages 0–14|0.075 (0.05, 0.1)*|0.0005*|
*Statistically significant (p < 0.05). NS, not statistically significant.
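The 90% confidence intervals in Table II rest on a normal approximation, which can be sketched as below. Standard errors are not reported in the text, so the value here is back-calculated from a published interval on the assumption that the interval is symmetric and only lightly rounded. The function names are hypothetical.

```python
from statistics import NormalDist

def normal_ci(estimate, std_error, level=0.90):
    """Two-sided confidence interval from a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.645 for 90%
    return estimate - z * std_error, estimate + z * std_error

def implied_std_error(lower, upper, level=0.90):
    """Standard error implied by a symmetric normal-approximation interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)

# Leukaemia, ages 0-14 (Table II): estimate 0.045, 90% CI 0.02 to 0.07.
se = implied_std_error(0.02, 0.07)
low, high = normal_ci(0.045, se)
```

The p values in the table come from the simulated null distribution rather than from this approximation, so the two need not agree exactly.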
Between- and within-time period components of EPV were analyzed for cases of ALL aged 0–14 and cases of HL aged 0–9 (testing hypotheses (c) and (d)). For ALL, the results showed that for 5-year time periods, the between-time-period component of EPV was much more important than the within-time-period contribution (β̂_B = 0.04, p = 0.003; β̂_W = 0.01, p = 0.047). In contrast, for HL, although not statistically significant, the results suggested that EPV within 5-year time periods (β̂_W = 0.006) was more important than EPV between 5-year time periods (β̂_B = −0.02). There was statistically significant EPV for cases of HL aged 0–9 for the period 1989–1993 only (β̂ = 0.025, p = 0.049).
To test hypotheses (e) and (f), between- and within-age group components of EPV were analyzed for cases of ALL (split into age-groups 0–4 and 5–14) and HL (split into age-groups 0–9 and 10–14). For ALL, the between-age group component of EPV was dominant (β̂_B = 0.03, p = 0.0005; β̂_W = 0.02, p = 0.047). In contrast, both the between- and within-age group components were small, and not statistically significant, for HL (β̂_B = 0.003; β̂_W = 0.004).
Discussion
The analyses presented here have been carried out using an optimal statistical method on high-quality population-based incidence data. It is the largest analysis of spatial clustering of childhood cancer ever published, based on 32,323 cases. It should be noted that the present study analyses the same data set that was considered in the eleventh report of the Committee on Medical Aspects of Radiation in the Environment (COMARE).3 However, further analyses are also presented in the current paper (which include examination of additional diagnostic groups and hypotheses). Spatial clustering has been particularly identified for cases of total leukaemia, ALL, soft tissue sarcomas, Wilms and other renal tumours, the group comprising all cancers except leukaemia and lymphoma, and the group comprising all cancers. Thus prior hypotheses (a) and (b) were only partly supported as HL did not cluster over the entire study period and there was clustering of soft tissue sarcoma. The results for total leukaemia, ALL and Wilms tumours were in agreement with previous studies from the UK and elsewhere.1, 4–10, 15, 23
For ALL there was a suggestion that whatever causes the cases to cluster persisted in individual areas over lengthy periods of time because of the greater contribution of between-time-period EPV (consistent with prior hypothesis (c)). In contrast, for HL clustering was particularly present in certain time periods but absent in other time periods and so any relevant aetiological factor appears transient because of the greater contribution of within-time-period EPV (consistent with prior hypothesis (d)). For ALL cross-clustering between younger and older cases suggested a common aetiological factor (consistent with prior hypothesis (e)). However, the lack of cross-clustering between younger and older cases for HL would point to there being different aetiologies for the 2 age groups (consistent with prior hypothesis (f)).
In all these analyses consideration has to be given to the possibility that apparent aggregations of cases are attributable to the fact that twins or siblings living at the same address are both affected by the same malignant disease. It is known that monozygotic co-twins of children with leukaemia, particularly at young ages, have a greatly increased risk of also being diagnosed with leukaemia.27 Other siblings of children with malignant disease also tend to have a slightly higher risk of being affected.28 Therefore, results have been confirmed by repeating the analyses with 1 of each pair of twins or siblings excluded (in any analysis for which both were eligible and resided in the same ward at diagnosis). This had very small effects on the results.
The analyses were repeated by calculating expected numbers of cases for each ward adjusted using the area-based Townsend deprivation score.29 However, there was little difference between the results of the adjusted and unadjusted analyses. We interpret the results of the spatial clustering analyses in conjunction with the results of the space–time clustering analyses of the same data set.2, 3 There are 4 possibilities: (i) both spatial and space–time clustering absent; (ii) spatial clustering present but space–time clustering absent; (iii) spatial clustering absent but space–time clustering present; and (iv) both spatial and space–time clustering present.
Possibility (ii) would occur if there are some small areas with prolonged high risk and others with prolonged low risk but few, if any, where this status varies over very short time intervals. This suggests that there is some environmental exposure that affects the high-risk areas and that is absent from the low-risk areas for large proportions of the time period. There are a number of plausible interpretations. For example, a temporary exposure in the high-risk areas that was followed by a highly variable latent period would show this pattern. Such an exposure could be an infectious agent or exposure to, for example, ionizing radiation or benzene.
Possibility (iii) would occur if many areas have temporary exposures leading to high risk of disease but exposed areas change their status frequently. This would be consistent with a series of mini-epidemics of infections followed by relatively constant latent periods. It may also be consistent with other changing environmental factors.
Possibility (iv) would occur if there is significant heterogeneity of risk between small areas but the simple picture of some being high-risk for the entire period and others low-risk does not apply. Thus some areas are high-risk for a substantial portion of the time period which may represent a sequence of discrete times within it. Some areas show similar patterns of low-risk. This pattern would be consistent with the existence of many small areas where causative exposure occurs commonly but is not permanent and the latent period does not vary sufficiently to dilute the effect of the temporal variability in exposure.
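The four possibilities amount to a two-way classification of the test outcomes, which can be written as a small lookup. The function name is hypothetical and the labels simply mirror the numbering (i) to (iv) used in the text.

```python
def clustering_pattern(spatial, space_time):
    """Map the two test outcomes to possibilities (i)-(iv) in the text."""
    patterns = {
        (False, False): "i",    # neither type of clustering
        (True, False): "ii",    # spatial only: prolonged high- and low-risk areas
        (False, True): "iii",   # space-time only: exposed areas change frequently
        (True, True): "iv",     # both: heterogeneous risk that also varies in time
    }
    return patterns[(bool(spatial), bool(space_time))]
```

For example, a diagnostic group with significant spatial clustering but no space–time clustering would be assigned to possibility (ii).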
Leukaemia shows significant spatial and space–time clustering. Results for total leukaemias are driven by those for ALL. This applies for the entire age range and the younger children (aged 1–4 years) but not those aged 5–14 years analyzed separately. The pattern is that described in (iv) above. The results are best interpreted as providing further evidence supporting the involvement of 1 or more common infections in aetiology. The observed pattern of persistent occurrence is consistent with a number of previously isolated small areas having subsequent high rates of sustained population mixing.30, 31
HL cases showed distinct differences between younger (0–9 years) and older (10–14 years) ages. There was only limited evidence of spatial clustering in the children aged 0–9 years. The areas involved in the clustering for this age-group changed over time. In these younger children we have the pattern described in (ii) above. This is most plausibly interpreted as spatial clustering identifying demographic situations where exposure to the ubiquitous Epstein-Barr virus (EBV) is likely to occur early.24 The absence of space–time and spatial clustering for the older children provides good evidence that the aetiology of HL in older children does not involve exposure to an infectious agent that predominates in some areas or that shows an epidemic pattern. This is consistent with the nature of EBV and with the idea that HL in older children may be regarded as the early tail of the young-adult incidence peak.32
Soft tissue sarcomas show statistically significant spatial and space–time clustering. This is the pattern described in (iv) above. Such a pattern of occurrence may be explained by an infectious agent or agents. Higher rates may be limited to small areas that have a special set of sociodemographic circumstances, leading to higher rates of new infections by an aetiological agent or, alternatively, greater susceptibility to its effect.
Wilms tumours show statistically significant spatial clustering but not space–time clustering, i.e. pattern (ii). This is consistent with a number of localized high-risk areas. There is little or no evidence linking Wilms tumour to infectious exposures. Inherited conditions, such as Denys-Drash syndrome and Beckwith-Wiedemann syndrome, have been associated with a much higher risk of developing Wilms tumour. However, a number of environmental pollutants have also been postulated to be involved in aetiology. These include hydrocarbons, lead, boron and pesticides.1 The present findings lend support to the involvement of such a localized environmental pollutant.
The groups that comprise all cancers and all cancers except leukaemia and lymphoma showed spatial but not space–time clustering. Because these groups are heterogeneous in their make-up, it is more difficult to draw firm conclusions. The role of chance in these latter findings certainly cannot be excluded.
Some limitations of the methodology need to be stressed. The method relies on accurate case and underlying small area population data. Whilst there are no known biases in the case data, any underestimation or overestimation of the denominator populations may bias the results. Boundary effects may also play a role. The analysis determines whether spatial clustering is apparent at ward level. Clustering at a much higher or lower level of aggregation may not be detected by this method. Furthermore, spatial autocorrelation is not taken into account. However, in spite of these caveats, spatial clustering was detected in very specific diagnostic groups and the results mostly supported the prior hypotheses that were derived from previous studies.
In summary, we have used a well-tested statistical method and high-quality data. The results are interpreted together with the findings of space–time clustering analyses. We found evidence of spatial clustering for cases of leukaemia as a whole, ALL, soft tissue sarcoma, Wilms and other renal tumours, all cancers except leukaemia and lymphoma and all cancers. Clustering tended to persist across different time periods for cases of ALL and mainly involved younger cases. We have observed a low level of spatial clustering that is attributable to a limited number of cases, since the estimates of EPV were all small (β̂ < 0.08). We interpret this as evidence that environmental factors, which in some locations display localized clustering, may be important agents in the aetiology of these diseases. For ALL and soft tissue sarcoma, 1 or more common infectious agents are likely candidates. For Wilms tumours, other environmental pollutants are more plausible.
We thank the anonymous referees for their most helpful and constructive comments on an earlier version of this article. We are also grateful to the North of England Children's Cancer Research (NECCR) fund for providing financial support for childhood cancer epidemiology research at Newcastle University. The Childhood Cancer Research Group (University of Oxford) receives Programme Grant support for its core functions from the Department of Health and the Scottish Ministers.
- 1. Epidemiology of childhood cancer. Lyon: IARC, 1999. IARC Scientific Publications, no. 149.
- 3. Committee on Medical Aspects of Radiation in the Environment (COMARE). Eleventh report - the distribution of childhood leukaemia and other childhood cancers in Great Britain, 1969–1993. Chilton, Didcot, Oxfordshire: Health Protection Agency, Radiation Protection Division, 2006.
- 4. Draper GJ, ed. The geographical epidemiology of childhood leukaemia and non-Hodgkin lymphomas in Great Britain, 1966–83. London: HMSO, 1991. Studies on Medical and Population Subjects, no. 53.
- 18. Contribution to the discussion at the Royal Statistical Society meeting on cancer near nuclear establishments. J R Stat Soc Ser A 1989; 152: 376.
- 19. Spatial clustering for inhomogeneous populations (with discussion). J R Stat Soc Ser B 1990; 52: 73–104.
- 20. Alexander FE, Boyle P, eds. Methods for investigating localized clustering of disease. Lyon: IARC, 1996. IARC Scientific Publications, no. 135.
- 22. Fritz A, Percy C, Jack A, Shanmugaratnam K, Sobin L, Parkin DM, Whelan S, eds. International Classification of Diseases for Oncology (ICD-O), 3rd ed. Geneva: World Health Organization, 2000.
- 29. Townsend P, Phillimore P, Beattie A, eds. Health and deprivation: inequality and the North. London: Croom Helm, 1987.