Keywords:

  • tuberculosis;
  • clustering;
  • cohort study;
  • systematic review;
  • epidemiology;
  • health systems evaluation

Summary

  1. Top of page
  2. Summary
  3. Introduction
  4. Methods
  5. Results
  6. Discussion
  7. Acknowledgements
  8. References
  9. Appendices

Objectives  The proportion of tuberculosis cases in a population that are clustered (i.e. share identical strains of Mycobacterium tuberculosis) reflects ongoing M. tuberculosis transmission. It varies markedly, but it is unclear how much of this variation reflects measurable differences in study design, setting and the patient population. We aimed to assess the relative impact of these factors and develop a tool to improve interpretation of the proportion clustered from an individual study.

Methods  We systematically reviewed all population-based TB clustering studies that used IS6110 RFLP as their main DNA fingerprinting technique. Meta-regression was used to see how much of the variation in the proportion clustered between studies could be explained by variables describing study design, setting and population. We compared expected clustering, based on study design and setting, with that observed.

Results  Forty-six studies were included. Just four factors related to study design and setting (study duration, sampling fraction, handling of low band strains and tuberculosis incidence) explained 28% of the variation in the proportion clustered. Additionally including average patient age and proportion foreign born explained 60% of the variation in clustering for industrialized countries. Comparison of expected and observed proportions showed that for some studies the expected proportion clustered differed strongly from that observed.

Conclusions  We were able to account for much of the variation in the proportion clustered. The comparison of expected and observed clustering allows for a more valid comparison of studies and provides a tool for identifying outliers that warrant further investigation.

Systematic review and meta-analysis of molecular epidemiology studies of tuberculosis: development of a new tool to aid interpretation

Introduction

The epidemiology of tuberculosis (TB) disease is complex. Once infected with Mycobacterium tuberculosis (Mtb), an individual may or may not develop active TB in the months or years to come, and may or may not be reinfected with Mtb during that time (Sutherland et al. 1982; Vynnycky & Fine 1997). This has complicated investigations of Mtb transmission and subsequent TB disease in populations.

DNA fingerprinting of Mtb strains has provided an important tool to enhance our understanding of TB epidemiology (Cohn & O’Brien 1998). By assuming that active TB cases are epidemiologically related if their Mtb strains have identical DNA fingerprints (i.e. are clustered), researchers have a method of assessing the proportion of cases involved in ongoing Mtb transmission (Glynn et al. 1999a,b). A reduction in the proportion clustered in an area over time is considered to be a sign of improved TB control (Cattamanchi et al. 2006).

The proportion clustered has shown extreme variation between study areas (Hermans et al. 1995; Bishai et al. 1998; Glynn et al. 2005). It is important to understand how much this depends on true variation in the proportion of TB that is due to recent transmission, and how much on other factors. The proportion clustered depends on the design of the study in which it is measured as well as on the epidemiology of TB in the area. Modelling and molecular epidemiological studies have shown that who is included in the study can have a major influence on measured clustering. Clustering is underestimated as the proportion of all TB cases in the region that are included (the sampling fraction) decreases (Glynn et al. 1999a,b). Clustering depends on study duration, increasing with longer duration up to a plateau after about 4 years (Jasmer et al. 1999; van Soolingen et al. 1999; Glynn et al. 2005). A clearly defined study area is important, as high levels of migration in and out of a poorly defined area will artificially decrease the proportion of cases found to be clustered. This problem is reduced when an area approximates to a complete and relatively isolated population such as a country or district (Glynn et al. 1999a,b). Immigration and emigration from an area can have a major influence, and will both lead to a failure to identify cases that are due to recent transmission. In general, any feature which means that the study population is not a complete sample of a closed population will tend to underestimate the true proportion clustered.

Overestimation of the proportion clustered is less likely, although theoretically possible with biased sampling and contact tracing. In areas where there is insufficient variation in strains, or predominance of strains with few bands, identical strains cannot be assumed always to reflect recent transmission. There is also variation between studies in the laboratory techniques used, and in the rigour of the definition of clustered strains (Burman et al. 1997; Glynn et al. 1999a,b).

The proportion of TB due to recent transmission in a population depends on the annual risk of infection (both currently and in the past); the age pattern of TB cases (since older individuals have a higher risk of reactivation disease); and possibly on other factors such as HIV infection, which could have different effects on the risk of disease following recent or past infection (Haas et al. 1999; Borgdorff et al. 2000; Bruchfeld et al. 2002; Murray 2002a,b; Glynn et al. 2005).

We performed a systematic literature review on all population-based studies on TB clustering that used IS6110 based RFLP as the main DNA fingerprinting technique, as this has been widely used as the standard since the early 1990s (van Embden et al. 1993). We assess the extent to which the variation in observed clustering can be explained by study design, study setting (the local epidemiology of TB) and study population. Additionally we develop a tool for interpreting a local proportion clustered in the context of a study’s design and setting.

This paper distinguishes itself from previous reviews by examining the variability of clustering on the population level, rather than collating results from studies with varying design and settings to attempt to identify individual-level risk factors for being part of a cluster (Fok et al. 2008; Nava-Aguilera et al. 2009).

Methods

Inclusion and exclusion criteria

Studies on TB clustering that used IS6110 RFLP as the main DNA fingerprinting technique were eligible for inclusion if they were population-based and reported clustering results (number of strains in RFLP analysis and proportion clustered) for more than 100 individuals. Studies were excluded if the sample population was not representative of the general population as this could bias the proportion clustered (e.g. prison population, drug resistant patients, outbreak studies).

In addition, the study area had to be suitable for making a valid estimate of local TB clustering. For example, inclusion of patients from a single hospital would be acceptable if all TB patients in the area are likely to go to that hospital, but not otherwise. In practice study areas had to minimally consist of a geographically defined urban or rural district. Overall, these criteria were necessary to reduce bias (which cannot be corrected for) in the dataset, allowing a valid examination of and correction for factors that influence the estimated proportion clustered.

There was no language restriction, and papers were translated by a fluent speaker as required.

Literature search

PubMed and Embase databases were searched for papers published between 1990 and November 2006. After maximizing the sensitivity of the search by using a private collection of relevant TB clustering papers, the following PubMed search query was used: ‘(IS6110 OR RFLP OR fingerprint* OR cluster* OR genotyp* OR Epidemiology, Molecular (MeSH) OR molecular typing OR transmission OR molecular epidemiological OR molecular epidemiology) AND (tuberculosis OR tb) AND [1990 : 3000 (PDAT)]’. A similar search query was used for Embase. The TB publications archive from the Centers for Disease Control and Prevention was accessed but yielded no additional papers (CDC 2006). After excluding duplicates, all titles were scanned twice by RH for possible relevance. The remaining abstracts were read and eligible full length papers were retrieved if possible.

Data extraction

From each paper RH collected information on study methods (secondary DNA typing methods, the cut-off points (if present) for Mtb strains with few IS6110 bands, matching of DNA fingerprints), the number of patients eligible and included in the final DNA fingerprint analysis, TB clustering results, characteristics of the study population (age, proportion males, HIV positive and foreign born) and TB disease (proportion of smear positive, extra pulmonary or drug resistant TB cases). In addition, we recorded TB incidence in the region. If more than one paper provided data on the same population and study period only the one with the longest study duration was included. Extracted data were checked by JG.

Measures of TB disease due to recent Mtb transmission

The main study outcome was the proportion of clustered TB cases, i.e. number of cases in clusters/total number of cases. The reported proportion clustered was used, although the strictness of the definition differed between papers: most required identical fingerprints, but some allowed one band difference. IS6110 RFLP is not sufficiently discriminatory in strains with low band numbers (usually five or fewer) (Glynn et al. 1999a,b). If sufficient information was provided, we either excluded these strains or recorded the overall proportion clustered when a secondary DNA typing technique [Spoligotyping, Polymorphic GC Sequencing (PGRS), Direct Repeats] was used for these low band number strains. The proportion of TB due to recent transmission was estimated as (number of clustered cases − number of clusters)/number of cases, the n − 1 method (Small et al. 1994). This assumes that each cluster consists of one source case, due to reactivation disease, the rest being due to recent infection.
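The two outcome measures defined above can be sketched in a few lines. This is only an illustration: the function names and the example cluster sizes are assumptions, not taken from any included study.

```python
def proportion_clustered(cluster_sizes, total_cases):
    """Cases that belong to any cluster, divided by all cases with a fingerprint."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    if any(size < 2 for size in cluster_sizes):
        raise ValueError("a cluster needs at least two cases")
    clustered = sum(cluster_sizes)
    if clustered > total_cases:
        raise ValueError("more clustered cases than total cases")
    return clustered / total_cases


def recent_transmission_n_minus_1(cluster_sizes, total_cases):
    """n - 1 method: one case per cluster is treated as the source (reactivation)."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    clustered = sum(cluster_sizes)
    return (clustered - len(cluster_sizes)) / total_cases


# Hypothetical study: 40 fingerprinted cases, three clusters of 5, 3 and 2 cases.
example_clusters = [5, 3, 2]
example_total = 40
clustered_share = proportion_clustered(example_clusters, example_total)
recent_share = recent_transmission_n_minus_1(example_clusters, example_total)
```

In this hypothetical example 10 of 40 cases are clustered, and removing one presumed source case per cluster leaves 7 of 40 cases attributed to recent transmission.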

Study design

Study duration was recorded in months, and cross-sectional surveys were assigned a duration of 0 months. If studies only allowed cases to cluster in a certain period after the source case, this clustering interval was used as study duration. We recorded the fraction of all culture positive (c+) TB cases (of all types) in the catchment area that had RFLP results available. We recorded whether the RFLP analysis included Mtb strains with low band numbers, and if so, whether a secondary DNA typing technique (e.g. spoligotyping or polymorphic GC sequencing) was applied for these (van Soolingen et al. 1993; Kamerbeek et al. 1997).

Study setting

When possible the TB incidence in the study region as reported in the paper was used. Otherwise the sampling fraction, reported TB incidence, study duration and study population size were combined to estimate the regional TB rate. If that was not possible, the scientific literature was searched for region specific estimates or we used the WHO Global Database to acquire a country wide estimate (WHO 2008). Studies were classified into low, middle and high burden TB areas (TB incidence ≤10, 11–50 and >50 TB cases per 100 000 per year respectively).
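The burden classification can be written as a small helper. The sketch below assumes the cut-offs as they appear in Table 2 (≤10, 11–50 and >50 cases per 100 000 per year). How non-integer values between 10 and 11 are assigned is our assumption, and the function name is illustrative.

```python
def classify_tb_burden(incidence_per_100k):
    """Assign a study area to a TB burden category by annual incidence per 100 000."""
    if incidence_per_100k < 0:
        raise ValueError("incidence cannot be negative")
    if incidence_per_100k <= 10:
        return "low"
    if incidence_per_100k <= 50:
        # Assumption: values just above 10 are treated as medium burden.
        return "medium"
    return "high"
```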

Study population

Studies from industrialized areas (e.g. Western Europe, North America) were grouped together. For these areas we recorded the proportion of TB cases that were foreign born. In these regions immigrants usually account for a substantial part of the TB case population. They are likely to have been infected with Mtb in their country of origin where the annual risk of infection is relatively high (van Soolingen et al. 1999; Cattamanchi et al. 2006; Dahle et al. 2007), and therefore often have unique (non-clustered) strains, adding to the diversity of Mtb strains in the population (Small et al. 1994; Hernandez-Garduno et al. 2002). However, through socio-demographic factors they could also be at a higher risk of being in a cluster, thus increasing the proportion clustered. Where available we also collected data on the proportion of cases with (a history of) homelessness or drug and alcohol abuse.

Age was recorded as the average for the TB case population; either as the reported mean or median age or, if only age strata were reported, through estimation of the median age within the age stratum that held the median observation. Gender was recorded as the proportion of males in the study population. As historical data on the annual risk of infection were not available for the majority of studies, the mean age of the TB case population can be used as a proxy, with a declining annual risk of infection shown by a high mean age of infection (Vynnycky et al. 2003). If at least 50% of all TB cases were systematically tested for HIV, we recorded the proportion HIV positive of those with test results. If the paper itself did not report the variable, an estimate was extracted from papers that reported on the same population.
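The text does not spell out how the median was estimated within the stratum holding the median observation. One common reading is linear interpolation within that stratum, sketched below. The interpolation rule, the function name and the example strata are all assumptions.

```python
def median_age_from_strata(strata):
    """Estimate a median from (lower_age, upper_age, count) strata sorted by age.

    Assumes cases are spread evenly within the stratum that holds the
    median observation (linear interpolation).
    """
    total = sum(count for _, _, count in strata)
    if total <= 0:
        raise ValueError("strata must contain at least one case")
    half = total / 2
    cumulative = 0
    for lower, upper, count in strata:
        if count > 0 and cumulative + count >= half:
            return lower + (half - cumulative) / count * (upper - lower)
        cumulative += count
    raise ValueError("median stratum not found")


# Hypothetical age distribution of 100 TB cases.
example_strata = [(15, 25, 10), (25, 45, 30), (45, 65, 40), (65, 85, 20)]
example_median = median_age_from_strata(example_strata)
```

Here the 50th case falls in the 45–65 stratum, a quarter of the way through its 40 cases, so the interpolated median is 50 years.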

Statistical analyses

Non-parametric tests (Wilcoxon rank sum) were used to compare studies from industrialized countries with those from other countries.

To assess the extent to which the variation in clustering seen can be explained by study design, study setting (the local epidemiology of TB) and study population we applied meta-regression (Sterne 2009). This technique uses multivariate regression to ascertain how well individual variables, or a combination of variables, can explain the between-study variation in the proportion clustered (the tau^2) (Berkey et al. 1995; Thompson & Higgins 2002).

Meta-regression assumes a linear association between the outcome (proportion clustered) and the independent variable as well as an approximately normal distribution of the residuals. The latter was checked through visual inspection of the residuals (scatterplots and histograms) and statistical tests (sktest in Stata version 10; Stata Corp LP., College Station, TX, USA). The choice between entering a variable as categorical or linear was dependent on its distribution, known epidemiological associations and the impact on the between study variation. A variable describing the standard error of the outcome (proportion clustered) is required for each study. We calculated this by taking the square root of ‘p*(1 − p)/N’, where ‘p’ stands for the proportion clustered in a study, and N the total number of study participants.
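The standard error described above is the usual binomial approximation, sqrt(p(1 − p)/N). A minimal sketch follows, with an illustrative function name and hypothetical input values.

```python
import math


def standard_error_of_proportion(p, n):
    """Binomial standard error of a proportion p observed among n participants."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1 - p) / n)


# Hypothetical study: 35% of 520 participants clustered.
example_se = standard_error_of_proportion(0.35, 520)
```

For this hypothetical study the standard error is about 0.021, i.e. roughly two percentage points. Note that this approximation returns zero when p is exactly 0 or 1.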

The effect of the study design and recorded variables was first examined through univariate meta-regression. To allow for potential negative and positive confounding due to high heterogeneity of variable values between studies, all variables were considered for the multivariate models. Final inclusion of a variable in a multivariate model was based on whether adding the variable had a significant impact (>2.5% reduction) on the between-study variation.
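The selection rule can be made concrete as follows. We assume here that the percentage of between-study variation explained is the relative reduction in tau^2 compared with a model without covariates, and that the 2.5% threshold is applied in percentage points. Both readings, the function names and the tau^2 values are assumptions.

```python
def percent_variation_explained(tau2_null, tau2_model):
    """Relative reduction in between-study variance (tau^2), in percent."""
    if tau2_null <= 0:
        raise ValueError("tau2_null must be positive")
    return 100.0 * (tau2_null - tau2_model) / tau2_null


def passes_inclusion_rule(explained_with, explained_without, threshold=2.5):
    """True if adding the variable raises explained variation by more than threshold points."""
    return (explained_with - explained_without) > threshold


# Hypothetical tau^2 values: 0.04 without covariates, 0.0288 with them.
example_explained = percent_variation_explained(0.04, 0.0288)
```

With the figures reported in the Results (28% explained with local TB incidence in the main model, 10% without it), the gain of 18 percentage points would clearly pass this rule.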

Four models were created. The main model was limited to variables describing study design and epidemiological setting (the local TB incidence), to ascertain to what extent these could reduce the variation in the proportion clustered.

The second model included variables describing the study population as well. The third model was limited to studies from industrialized countries, and included the proportion of TB cases that were foreign born. These models were designed to test to what extent the observed variation could be explained by known factors, and how much residual, unexplained variation would remain. The fourth model excluded local TB incidence from the main model.

All models were repeated with the proportion of cases due to recent transmission (n−1 method) as the outcome measure to test the robustness of the findings.

Tool to interpret local proportions clustered

The coefficients from the main model were applied to each study to acquire a predicted value for the proportion clustered based solely on the study’s design and local TB incidence. These estimates were compared with the observed values [(observed−expected)/expected × 100%] to provide a measure of how much the observed proportion clustered differed from its expected value. This correction for study design and setting provides a new perspective on the proportion clustered, and allows for better comparison between studies. Confidence intervals for the relative differences were estimated using the standard error of the predicted values (stdp option in Stata version 10).
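The relative difference defined above is simple arithmetic. The sketch below uses an illustrative function name and hypothetical input values.

```python
def relative_difference_percent(observed, expected):
    """(observed - expected) / expected x 100%, both given on the same scale."""
    if expected == 0:
        raise ZeroDivisionError("expected value must be non-zero")
    return (observed - expected) / expected * 100.0


# Hypothetical values: observed clustering twice, and half, the expected value.
twice_as_high = relative_difference_percent(72, 36)
half_as_high = relative_difference_percent(20, 40)
```

An observed proportion twice the expected value gives +100%, and one half the expected value gives −50%.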

Results

Systematic literature review

Our primary search yielded 11 654 records, and after selection (Figure 1) 46 papers were included for analysis.

Figure 1.  Flow diagram of systematic literature review process.

Descriptive analyses

The majority of studies (36) were performed in industrialized country settings (North America, Western Europe, Japan or Hong Kong); the others were done in sub-Saharan Africa (n = 3), South America (n = 4) and in South East Asia, Eastern Europe, and the Middle East (one each).

The Forest plot for all 46 included studies arranged by study location (Figure 2a) shows a large variation in the proportion clustered between studies (Q-test for heterogeneity chi square = 6622 (d.f. = 46), P < 0.0001), from 6% in Northern Bangladesh to 86% in Greenland. Included studies and their recorded variables are listed in Appendix 1.
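To put the heterogeneity statistic in context, Cochran's Q can be converted into the I² statistic, the share of total variation attributable to between-study heterogeneity. I² is not reported in the paper, so the calculation below is our own illustration using the Q and degrees of freedom quoted above. The function names and the two-study example are assumptions.

```python
def cochran_q(estimates, variances):
    """Cochran's Q: inverse-variance weighted squared deviations from the pooled estimate."""
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need equally long, non-empty inputs")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))


def i_squared_percent(q, df):
    """I^2 = max(0, (Q - df) / Q) x 100%."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical two-study example, and the Q statistic quoted in the text.
example_q = cochran_q([0.2, 0.4], [0.01, 0.01])
reported_i2 = i_squared_percent(6622, 46)
```

With Q = 6622 on 46 degrees of freedom, I² is above 99%, so almost none of the observed spread in the proportion clustered is attributable to sampling error alone.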

Figure 2.  Observed (left) and predicted (right) proportions clustered of 46 included studies. The predicted values for each study were acquired using the coefficients from the meta-regression model 1 (see Table 2, column 2) that included study duration, sampling fraction, handling of Mtb strains with low band numbers and local TB incidence. Appendix 1 holds the values for each study; see Appendix 2 for further illustration of the calculations. Box size and error bars indicate number of patients included in the study. *‘Other regions’ refers to sub Saharan Africa, South East Asia, South America, Eastern Europe and Middle East.

Table 1 shows that studies were diverse in design (e.g. study duration between 0 months (cross-sectional study) and >10 years) as well as setting (recorded local TB incidence between 1.7 and 304 cases/100 000/year). TB incidence differed strongly between industrialized settings and other settings (P < 0.001), whereas the proportion of HIV positive TB cases did not (P = 0.88). Insufficient data were available on homelessness and drug or alcohol abuse so these variables were not included in the analyses.

Table 1.   Summary of included studies by study region

Variables*                                       Industrialized countries    Other countries†
                                                 (N = 36)                    (N = 10)
Study design
 Study duration (months)                         48 (3–120)                  13 (0–87)
 Number of patients                              520 (114–4266)              372 (105–1029)
 Sampling fraction‡                              0.90 (0.30–1.00) n = 36     0.82 (0.002–1.00) n = 10
 Low band numbers included,
  n/N (% studies)                                13/36 (36)                  6/10 (60)
 Secondary DNA typing method used for
  low band numbers, n/N (% studies)              13/36 (36)                  3/10 (27)
Study setting
 TB rate in the region (n/100 000/year)          9 (2–185)                   90 (8–761)
Study population
 Average age (years)                             45 (30–69) n = 30           45 (33–55) n = 9
 Sex (% male)                                    65 (44–74) n = 28           59 (47–76) n = 8
 % Foreign born                                  44 (3–83) n = 28            Not recorded
 % HIV positive                                  16 (1–57) n = 21            16 (1–65) n = 4
 % Resistant to ≥1 drug                          10 (0–31) n = 17            10 (6–23) n = 5
TB transmission
 % clustered                                     35 (8–86)                   29 (6–72)
 % recent transmission                           25 (4–78) n = 34            20 (3–59)

N = total number of studies in TB burden category; n = number of studies used in cell; ‘%’ = proportion of all study participants.
*Median and range given unless otherwise indicated.
†Studies performed in sub-Saharan Africa (n = 3), South America (n = 4), South East Asia (n = 1), Eastern Europe (n = 1) and the Middle East (n = 1).
‡Sampling fraction is fraction of all culture positive TB patients in the study area with RFLP results available.

Meta-regression analyses

Figure 3 shows that the proportion clustered increased with increasing study duration, sampling fraction and TB incidence, decreased with increasing age and proportion foreign born (in industrialized countries) and changed little with increasing study size. In the univariate meta-regression analyses these associations were confirmed (Table 2). Some variables reduced the tau^2 by more than 10% (duration of study, handling of low band Mtb strains, age and country of birth of TB case population), whereas others had less or no effect.

Figure 3.  The association between the proportion clustered and recorded variables. Scatter plots show univariate associations between the proportion clustered and selected recorded variables. Vertical lines show the categories used in the meta-regression. Open diamonds signal outlier values for the recorded variable.

Table 2.   Meta-regression models: percentage explained between study variation and coefficients for change in the proportion clustered for variables describing study design, setting and population

                                      Univariate              Model 1: study design   Model 2: overall        Model 3: industrialized
                                      models*                 and setting (n = 46)    (n = 39)                countries† (n = 25)
Variation explained by model                                  28%                     36%                     60%

Study design
 Study duration (months)              11.7                    18.3                    5.6                     24.9
  0–12                                ref                     ref                     ref                     ref
  13–48                               −7.3 (−23 to 9)         −3.2 (−20 to 13)        −2.6 (−17 to 12)        21.1 (4 to 38)
  >48                                 11.5 (−5 to 28)         18.3 (2 to 35)          11.1 (−4 to 26)         28.1 (10 to 46)
 Sampling fraction (proportion of
 culture positive cases with RFLP)    2.2                     8.9                     8.7                     5.6
  0–0.50                              ref                     ref                     ref                     ref
  0.50–0.75                           22.2 (−4 to 48)         27.7 (0 to 55)          30.1 (5 to 55)          29.4 (−2 to 61)
  0.75–1                              18.9 (−11 to 49)        29.6 (6 to 53)          22.7 (0 to 45)          24.1 (−3 to 51)
 Low band strains                     10.8                    3.9                                             16.1
  Excluded                            ref                     ref                                             ref
  Included with secondary typing      9.1 (−5 to 23)          0.6 (−12 to 13)                                 3.3 (−9 to 16)
  Included, no secondary typing       21.8 (5 to 38)          18.8 (−1 to 39)                                 21.7 (5 to 38)
 Number of patients included          0
  100–200                             ref
  201–500                             −0.1 (−17 to 17)
  >500                                8.0 (−7 to 24)
 Matching of fingerprints             0
  Identical                           ref
  One-band difference                 1.6 (−21 to 18)
Study setting
 TB incidence in study area           0.8                     17.9                    3.8                     3.1
  Low (≤10/100 000/year)              ref                     ref                     ref                     ref
  Medium (11–50/100 000/year)         3.7 (−11 to 19)         17.9 (2 to 34)          9.8 (−4 to 23)          8.3 (−6 to 22)
  High (>50/100 000/year)             12.2 (4 to 28)          25.4 (9 to 41)          13.6 (−2.5 to 30)       16.4 (−14 to 47)
Study population
 Average age                          27.6                    NA                      7.9                     9.3
  Coefficient (95% CI)                −1.28 (−1.9 to −0.6)                            −0.83 (−1.6 to 0.04)    −0.9 (−1.9 to −0.02)
 % foreign born                       17.7                    NA                      NA                      63.7
  Coefficient (95% CI)                −0.36 (−0.6 to −0.1)                                                    −0.54 (−0.8 to −0.3)
 Sex (% male)                         0                       NA
  Coefficient (95% CI)                −18.9 (−102 to 65)
 % HIV positive                       9.5                     NA
  0–10                                ref
  10–25                               14.8 (−1 to 31)
  >25                                 12.7 (−3 to 28)
 % Resistant to ≥1 drug               0                       NA
  0–10                                ref
  10–20                               −10.0 (−34 to 14)
  >20                                 −5.9 (−35 to 23)
Constant (baseline value)                                     −12 (−40 to 17)         42 (−9 to 92)           46 (−16 to 109)

The number on each variable's own row shows the proportion of the heterogeneity (%) explained by that variable, calculated as the absolute reduction in explained variation when the variable is removed from the meta-regression model. Only variables that increased the overall explained variation by at least 2.5% were included in the multivariate models, otherwise a – is shown. The coefficients (95% CI) in the multivariate analysis show the difference in proportion clustered between categories (e.g. low vs. medium TB burden) or per unit increase (e.g. 1 year of average age) in the variable. The interpretation of the coefficients is further illustrated in Appendix 2.
The individual % explained variation per variable do not sum up to the overall % explained variation in the model. This is because of high levels of correlation between some explanatory variables.
NA – not applicable.
*Proportion of explained variation in the univariate analysis.
†Includes studies from Western Europe, North America, Hong Kong and Japan.

Study duration was entered as a categorical variable so its association with clustering could take any shape, including the one shown within populations where clustering increases with study duration, but reaches a plateau after 4 years (Glynn et al. 1999a,b; van Soolingen et al. 1999a,b; Glynn et al. 2005; Jasmer et al. 1999).

In the multivariate meta-regression model (Table 2, Model 1) 28% of between-study variation was explained by study duration, sampling fraction, handling of strains with low band numbers and local TB incidence. Most coefficients of the included variables were statistically significant at the 0.05 level, and the residuals were approximately normally distributed (P-value sktest = 0.46). In this model, the proportion clustered increased with study duration and sampling fraction. The model also showed that including strains with a low number of IS6110 bands increased the proportion clustered, unless secondary typing methods were applied. Additionally, study settings with high TB incidence reported higher proportions clustered.

Incorporating variables describing the study population further reduced the tau^2, explaining up to 60% of the between study variance (Table 2, Model 3). When studies for all countries were considered (Table 2, Model 2), the average age of the TB case population showed a strong negative association with the proportion clustered. In studies from industrialized settings, the proportion clustered decreased as the proportion foreign born increased (P-value coefficient <0.001). This negative association was found in 18 of the 21 studies from industrialized settings that reported the proportion of foreign born TB cases by cluster status, whereas only one study, from Italy, found a positive association (Matteelli et al. 2003).

Excluding local TB incidence from the main model (Model 4) reduced the explained variation from 28 to 10% as well as the precision of the coefficients. However, the direction and size of the coefficients remained similar.

None of the other variables we recorded had a relevant effect on the tau^2. Similar results were found using the proportion of cases due to recent Mtb transmission (estimated using the n−1 method) as the study outcome, rather than the total proportion clustered (results not shown).

Observed vs. expected proportion clustered

For each study the expected proportion clustered could be estimated from the coefficients of the main model. The results are shown in Figure 2b and the relative difference between the expected and observed values is shown in Figure 4. Appendix 2 illustrates these calculations. Figure 4 shows that high observed proportions clustered often lie close to their expected values [e.g. studies from Elche (Spain), Cape Town (South Africa) and Karonga (Malawi) (Ruiz Garcia et al. 2002; Verver et al. 2004; Glynn et al. 2005)]. However, for some studies the levels of reported clustering were twice as high as expected based on study design and setting alone. This applied to regions with apparent moderate as well as higher levels of clustering, for example Arkansas (about 40% clustered) and Gran Canaria (72% clustered) (Braden et al. 1997; Pena et al. 2003; Cave et al. 2005). On the other hand, studies from Vancouver, Japan and Bangladesh reported proportions clustered half as high as expected (Blenkush et al. 1996; Hernandez-Garduno et al. 2002; Storla et al. 2006).
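The expected proportion clustered for a study is the model constant plus the coefficient of each category the study falls into. The sketch below uses the Model 1 coefficients as we read them from Table 2. The column assignment in the flattened table, the category labels, the handling of category boundaries and the function name are assumptions, and the example study is hypothetical.

```python
# Model 1 coefficients (percentage points) as read from Table 2; reference categories are 0.
MODEL1_CONSTANT = -12.0
MODEL1_COEFFICIENTS = {
    "duration_months": {"0-12": 0.0, "13-48": -3.2, ">48": 18.3},
    "sampling_fraction": {"0-0.50": 0.0, "0.50-0.75": 27.7, "0.75-1": 29.6},
    "low_band_strains": {
        "excluded": 0.0,
        "included_secondary_typing": 0.6,
        "included_no_secondary_typing": 18.8,
    },
    "tb_incidence": {"low": 0.0, "medium": 17.9, "high": 25.4},
}


def expected_percent_clustered(duration_months, sampling_fraction,
                               low_band_strains, tb_incidence):
    """Sum the constant and the coefficient of each chosen category."""
    chosen = {
        "duration_months": duration_months,
        "sampling_fraction": sampling_fraction,
        "low_band_strains": low_band_strains,
        "tb_incidence": tb_incidence,
    }
    total = MODEL1_CONSTANT
    for variable, category in chosen.items():
        total += MODEL1_COEFFICIENTS[variable][category]
    return round(total, 1)


# Hypothetical study: >48 months, sampling fraction 0.75-1,
# low band strains excluded, low TB incidence.
example_expected = expected_percent_clustered(">48", "0.75-1", "excluded", "low")
```

For this hypothetical study the expected value is −12 + 18.3 + 29.6 = 35.9% clustered. A study in all reference categories would be assigned the constant itself, which is negative; this reflects the linear form of the model and the wide confidence interval of the constant.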

Figure 4.  Relative difference (in %) between observed and expected proportion clustered. *‘Other regions’ refers to sub-Saharan Africa, South East Asia, South America, Eastern Europe and the Middle East. Error lines indicate 95% confidence intervals of predicted values, calculated using the standard error of predictions.

Discussion

We show that although the proportion clustered varies widely between population-based studies, 28% of this variation can be explained by just four variables describing study design and setting. Models including the average age and immigrant status of the TB case population explained up to 60% of the between-study variation in industrialized countries.

The residual variation may be due to imprecision in the included variables, unmeasured factors and interactions between the included variables (for which there were insufficient studies to test). With the current level and detail of reporting of clustering studies, it seems likely that any explanatory model will contain many unknowns.

Although most coefficients of the main model were statistically significant at the 0.05 level, the confidence intervals were wide. This is in part due to the low number of included studies (25–46, depending on the model) and possibly the high heterogeneity in variable values. The latter is also suggested by Model 4: removing one explanatory variable (local TB incidence) has a large impact on the unexplained variation between studies as well as on the precision of the coefficients, without affecting their overall patterns.

This high heterogeneity is one of the main reasons we excluded studies that applied DNA fingerprinting techniques other than IS6110 RFLP as their main method of strain typing. The number of studies per technique is relatively low, and insufficient to correct for the added variation due to, for example, an unknown level of difference in molecular clocks of the markers used in each technique. However, with the recent standardisation and subsequent more widespread use of PCR-based Mycobacterial Interspersed Repetitive Unit-Variable Number of DNA Tandem Repeats (MIRU-VNTR) typing, a valid comparison between RFLP and MIRU-VNTR may become viable in the future (Supply et al. 2006).

Most associations found in this review are statistically strong and in line with observations made in individual epidemiological and modelling studies, thus giving our meta-regression models additional validity. The importance of study duration in clustering studies has been well documented through modelling (Glynn et al. 1999a,b) and epidemiological studies (Jasmer et al. 1999; van Soolingen et al. 1999; Glynn et al. 2005). Our results show that this is also important when comparing between studies. The effect of the sampling fraction has been predicted through modelling (Glynn et al. 1999a,b; Murray 2002a,b) and is intuitive, especially when the average cluster size is small: the ‘missing’ part of the population will lead to more clustered cases being classified as unique, thus underestimating the proportion clustered. The expected increases in clustering with local TB incidence and with younger age were also seen (van Soolingen et al. 1999; Cattamanchi et al. 2006; Dahle et al. 2007).
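
The intuition about sampling can be illustrated with a small calculation. This is a minimal sketch under a simplifying assumption that is ours, not that of the cited modelling studies: every case is sampled independently with the same probability, and the observed proportion is approximated by a ratio of expected counts. The function name and the cluster size distribution are hypothetical.

```python
def expected_proportion_clustered(cluster_sizes, sampling_fraction):
    """Approximate proportion clustered observed when each case is sampled
    independently with probability `sampling_fraction`.

    `cluster_sizes` lists the true size of every strain group; a unique
    strain is a group of size 1. A sampled case still appears clustered only
    if at least one other member of its true cluster is also sampled, which
    has probability 1 - (1 - f) ** (n - 1) for a cluster of size n.
    """
    f = sampling_fraction
    total_cases = sum(cluster_sizes)
    still_clustered = sum(n * (1 - (1 - f) ** (n - 1)) for n in cluster_sizes)
    # The factor f for "case is sampled" cancels between numerator and denominator.
    return still_clustered / total_cases


# Hypothetical population: 20 pairs, 5 clusters of 10 cases and 60 unique strains.
population = [2] * 20 + [10] * 5 + [1] * 60

for fraction in (1.0, 0.75, 0.5):
    observed = expected_proportion_clustered(population, fraction)
    print(f"sampling fraction {fraction:.2f}: proportion clustered {observed:.2f}")
```

With complete sampling the sketch returns the true proportion clustered (0.60 for this hypothetical population). Lower sampling fractions give lower values, and the loss comes mostly from the small clusters: a pair is broken whenever either member is missed, whereas a cluster of 10 almost always retains at least two sampled members.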

The strong negative association between the proportion of foreign-born TB cases and the proportion clustered, and the fact that 18 of 21 studies from industrialized countries reported the same statistically significant association at the individual level, both imply that on average foreign-born cases have a lower risk of being part of an identified cluster in a study setting.

Our comparison between observed and expected clustering highlights outliers. More clustering than expected could arise in situations where there is a low number of circulating Mtb strains and little population movement, so that identical strains could reflect transmission many years previously; this could account for the findings from Arkansas (Braden et al. 1997). TB outbreaks will increase clustering, which was the case in Gran Canaria where two Mtb strains were involved in 30% of all clustered TB cases (Pena et al. 2003).

Lower than expected clustering (based on Model 1) could reflect an old population with much disease due to reactivation, as in Japan where the average age was 69 years (Fujikane et al. 2004). If we apply Model 2, which includes age, to this study the expected clustering is estimated at 18%, which lies much closer to the observed value. The results from the Bangladesh study appear to be due to undersampling; only 111 of 1264 (9%) notified cases from the region and study period were confirmed by culture and thus potentially included in the clustering analysis (Storla et al. 2006). This effect is not included in our model: owing to limitations in the reporting of studies, the sampling fraction was calculated as a fraction of confirmed culture-positive cases rather than of all notified cases.

The comparison of observed and expected clustering also shows the degree to which high observed clustering can be explained: in Malawi, Cape Town and Greenland the high levels of clustering were largely due to the studies’ long duration, high sampling fraction and the high local TB rates (Soborg et al. 2001; Zolnir-Dovc et al. 2003; Thomsen et al. 2004; Verver et al. 2004; Glynn et al. 2005).

We would have liked to include more studies from high burden countries, but no further studies were available. However, the effect of study design factors is likely to be constant in different regions [as is the case with study duration (Jasmer et al. 1999; van Soolingen et al. 1999; Glynn et al. 2005)]. We did include 10 studies from areas with a high annual TB incidence (>50/100 000), which should make our results applicable to high burden settings.

Previous reviews have had limitations, either through not including all relevant studies or potential problems in the analysis (Fok et al. 2008; Houben et al. 2009; Nava-Aguilera et al. 2009). Also, rather than collating results from varying study designs and populations on the individuals’ risk of clustering, we chose to take a more public health-oriented approach by investigating the proportion clustered in populations while explicitly taking into account the differences in study design and setting.

We have focussed on the extent to which measured clustering can be explained by known factors, and the methods and associations presented here can be applied by researchers to acquire a new perspective on the proportion clustered, after adjusting for study design and setting. This will allow a more valid comparison between studies, highlight outliers and help researchers to assess their local levels of ongoing Mtb transmission.

Acknowledgements

We thank Sara Thomas for fruitful discussions on the systematic literature review and Simon Cousens, Jonathan Sterne and Chris Frost for their useful statistical advice.

References

  • Asgharzadeh M, Shahbabian K, Majidi J et al. (2006) IS6110 restriction fragment length polymorphism typing of Mycobacterium tuberculosis isolates from East Azerbaijan Province of Iran. Memorias do Instituto Oswaldo Cruz 101, 517–521.
  • Berkey CS, Hoaglin DC, Mosteller F & Colditz GA (1995) A random-effects regression model for meta-analysis. Statistics in Medicine 14, 395–411.
  • Bishai WR, Graham NMH, Harrington S et al. (1998) Molecular and geographic patterns of tuberculosis transmission after 15 years of directly observed therapy. The Journal of American Medical Association 280, 1679–1684.
  • Blackwood KS, Al-Azem A, Elliott LJ, Hershfield ES & Kabani AM (2003) Conventional and molecular epidemiology of Tuberculosis in Manitoba. BMC Infectious Diseases 3, 11.
  • Blackwood KS, Wolfe JN & Kabani AM (2004) Application of mycobacterial interspersed repetitive unit typing to Manitoba tuberculosis cases: can restriction fragment length polymorphism be forgotten? Journal of Clinical Microbiology 42, 5001–5006.
  • Blenkush M, Kunimoto D, Black W, Elwood RK & FitzGerald JM (1996) Evidence for TB clustering in Vancouver: results from pilot study using RFLP fingerprinting. Canada Communicable Disease Report 22, 49–51.
  • Borgdorff MW, Behr MA, Nagelkerke NJ, Hopewell PC & Small PM (2000) Transmission of tuberculosis in San Francisco and its association with immigration and ethnicity. International Journal of Tuberculosis and Lung Disease 4, 287–294.
  • Braden CR, Templeton GL, Cave MD et al. (1997) Interpretation of restriction fragment length polymorphism analysis of Mycobacterium tuberculosis isolates from a state with a large rural population. Journal of Infectious Diseases 175, 1446–1452.
  • Bruchfeld J, Aderaye G, Palme IB et al. (2002) Molecular epidemiology and drug resistance of Mycobacterium tuberculosis isolates from Ethiopian pulmonary tuberculosis patients with and without human immunodeficiency virus infection. Journal of Clinical Microbiology 40, 1636–1643.
  • Burgos M, DeRiemer K, Small PM, Hopewell PC & Daley CL (2003) Effect of drug resistance on the generation of secondary cases of tuberculosis. Journal of Infectious Diseases 188, 1878–1884.
  • Burman WJ, Reves RR, Hawkes AP et al. (1997) DNA fingerprinting with two probes decreases clustering of Mycobacterium tuberculosis. American Journal of Respiratory & Critical Care Medicine 155, 1140–1146.
  • Cattamanchi A, Hopewell PC, Gonzalez LC et al. (2006) A 13-year molecular epidemiological analysis of tuberculosis in San Francisco. International Journal of Tuberculosis and Lung Disease 10, 297–304.
  • Cave MD, Yang ZH, Stefanova R et al. (2005) Epidemiologic import of tuberculosis cases whose isolates have similar but not identical IS6110 restriction fragment length polymorphism patterns. Journal of Clinical Microbiology 43, 1228–1233.
  • CDC (2006) Centers for Disease Control and Prevention; TB-update list. Available at http://listmanager.aspensys.com/read/?forum=tb-update. Accessed November 2006.
  • Chan-Yeung M, Kam KM, Leung CC et al. (2006) Population-based prospective molecular and conventional epidemiological study of tuberculosis in Hong Kong. Respirology 11, 442–448.
  • Cohn DL & O’Brien RJ (1998) The use of restriction fragment length polymorphism (RFLP) analysis for epidemiological studies of tuberculosis in developing countries. International Journal of Tuberculosis and Lung Disease 2, 16–26.
  • Cowan LS, Diem L, Monson T et al. (2005) Evaluation of a two-step approach for large-scale, prospective genotyping of Mycobacterium tuberculosis isolates in the United States. Journal of Clinical Microbiology 43, 688–695.
  • Cronin WA, Golub JE, Magder LS et al. (2001) Epidemiologic usefulness of spoligotyping for secondary typing of Mycobacterium tuberculosis isolates with low copy numbers of IS6110. Journal of Clinical Microbiology 39, 3709–3711.
  • Dahle UR, Sandven P, Heldal E & Caugant DA (2001) Molecular epidemiology of Mycobacterium tuberculosis in Norway. Journal of Clinical Microbiology 39, 1802–1807.
  • Dahle UR, Sandven P, Heldal E & Caugant DA (2003) Continued low rates of transmission of Mycobacterium tuberculosis in Norway. Journal of Clinical Microbiology 41, 2968–2973.
  • Dahle UR, Eldholm V, Winje BA, Mannsaker T & Heldal E (2007) Impact of immigration on the molecular epidemiology of Mycobacterium tuberculosis in a low-incidence country. American Journal of Respiratory and Critical Care Medicine 176, 930–935.
  • Dale JW, Nor RM, Ramayah S, Tang TH & Zainuddin ZF (1999) Molecular epidemiology of tuberculosis in Malaysia. Journal of Clinical Microbiology 37, 1265–1268.
  • Das SD, Narayanan S, Hari L et al. (2005) Differentiation of highly prevalent IS6110 single-copy strains of Mycobacterium tuberculosis from a rural community in South India with an ongoing DOTS programme. Infection, Genetics & Evolution 5, 67–77.
  • De Bruyn G, Adams GJ, Teeter LD, Soini H, Musser JM & Graviss EA (2001) The contribution of ethnicity to Mycobacterium tuberculosis strain clustering. International Journal of Tuberculosis and Lung Disease 5, 633–641.
  • Diel R, Seidler A, Nienhaus A, Rusch-Gerdes S & Niemann S (2005) Occupational risk of tuberculosis transmission in a low incidence area. Respiratory Research 6, 35.
  • Van Embden JD, Cave MD, Crawford JT et al. (1993) Strain identification of Mycobacterium tuberculosis by DNA fingerprinting: recommendations for a standardized methodology. Journal of Clinical Microbiology 31, 406–409.
  • Fok A, Numata Y, Schulzer M & Fitzgerald MJ (2008) Risk factors for clustering of tuberculosis cases: a systematic review of population-based molecular epidemiology studies. International Journal of Tuberculosis and Lung Disease 12, 480–492.
  • Frieden TR, Woodley CL, Crawford JT, Lew D & Dooley SM (1996) The molecular epidemiology of tuberculosis in New York City: The importance of nosocomial transmission and laboratory error. Tubercle & Lung Disease 77, 407–413.
  • Fujikane T, Fujiuchi S, Yamazaki Y et al. (2004) Molecular epidemiology of tuberculosis in the north Hokkaido district of Japan. International Journal of Tuberculosis and Lung Disease 8, 39–44.
  • Glynn JR, Bauer J, De Boer AS et al. (1999a) Interpreting DNA fingerprint clusters of Mycobacterium tuberculosis. European Concerted Action on Molecular Epidemiology and Control of Tuberculosis. International Journal of Tuberculosis and Lung Disease 3, 1055–1060.
  • Glynn JR, Vynnycky E & Fine PE (1999b) Influence of sampling on estimates of clustering and recent transmission of Mycobacterium tuberculosis derived from DNA fingerprinting techniques. American Journal of Epidemiology 149, 366–371.
  • Glynn JR, Crampin AC, Yates MD et al. (2005) The importance of recent infection with Mycobacterium tuberculosis in an area with high HIV prevalence: a long-term molecular epidemiological study in Northern Malawi. Journal of Infectious Diseases 192, 480–487.
  • Haas WH, Engelmann G, Amthor B et al. (1999) Transmission dynamics of tuberculosis in a high-incidence country: prospective analysis by PCR DNA fingerprinting. Journal of Clinical Microbiology 37, 3975–3979.
  • Hermans PW, Messadi F, Guebrexabher H et al. (1995) Analysis of the population structure of Mycobacterium tuberculosis in Ethiopia, Tunisia, and The Netherlands: usefulness of DNA typing for global tuberculosis epidemiology. Journal of Infectious Diseases 171, 1504–1513.
  • Hernandez-Garduno E, Kunimoto D, Wang L et al. (2002) Predictors of clustering of tuberculosis in Greater Vancouver: a molecular epidemiologic study. CMAJ: Canadian Medical Association Journal 167, 349–352.
  • Houben RM, Glynn JR, Fok A, Numata Y, Schulzer M & Fitzgerald JM (2009) Systematic review and analysis of population-based molecular epidemiological studies. International Journal of Tuberculosis and Lung Disease 13, 275–276.
  • Jasmer RM, Hahn JA, Small PM et al. (1999) A molecular epidemiologic analysis of tuberculosis trends in San Francisco, 1991-1997. Annals of Internal Medicine 130, 971–978.
  • Jimenez-Corona ME, Garcia-Garcia L, DeRiemer K et al. (2006) Gender differentials of pulmonary tuberculosis transmission and reactivation in an endemic area. Thorax 61, 348–353.
  • Kamerbeek J, Schouls L, Kolk A et al. (1997) Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. Journal of Clinical Microbiology 35, 907–914.
  • Kempf MC, Dunlap NE, Lok KH, Benjamin WH Jr, Keenan NB & Kimerling ME (2005) Long-term molecular analysis of tuberculosis strains in Alabama, a state characterized by a largely indigenous, low-risk population. Journal of Clinical Microbiology 43, 870–878.
  • Kunimoto D, Sutherland K, Wooldrage K et al. (2004) Transmission characteristics of tuberculosis in the foreign-born and the Canadian-born populations of Alberta, Canada. International Journal of Tuberculosis and Lung Disease 8, 1213–1220.
  • Lari N, Rindi L, Sola C et al. (2005) Genetic diversity, determined on the basis of katG463 and gyrA95 polymorphisms, spoligotyping, and IS6110 typing, of Mycobacterium tuberculosis complex isolates from Italy. Journal of Clinical Microbiology 43, 1617–1624.
  • Lillebaek T, Dirksen A, Kok-Jensen A & Andersen AB (2004) A dominant Mycobacterium tuberculosis strain emerging in Denmark. International Journal of Tuberculosis and Lung Disease 8, 1001–1006.
  • Maguire H, Dale JW, McHugh TD et al. (2002) Molecular epidemiology of tuberculosis in London 1995–7 showing low rate of active transmission. Thorax 57, 617–622.
  • Matteelli A, Gori A, Pinsi G et al. (2003) Clustering of tuberculosis among Senegalese immigrants in Italy. International Journal of Tuberculosis & Lung Disease 7, 967–972.
  • Moro ML, Salamina G, Gori A et al. (2002) Two-year population-based molecular epidemiological study of tuberculosis transmission in the Metropolitan area of Milan, Italy. European Journal of Clinical Microbiology & Infectious Diseases 21, 114–122.
  • Murray M (2002a) Determinants of cluster distribution in the molecular epidemiology of tuberculosis. Proceedings of the National Academy of Sciences of the United States of America 99, 1538–1543.
  • Murray M (2002b) Sampling bias in the molecular epidemiology of tuberculosis. Emerging Infectious Diseases 8, 363–369.
  • Nava-Aguilera E, Andersson N, Harris E et al. (2009) Risk factors associated with recent transmission of tuberculosis: systematic review and meta-analysis. International Journal of Tuberculosis and Lung Disease 13, 17–26.
  • Park YK, Bai GH & Kim SJ (2000) Restriction fragment length polymorphism analysis of Mycobacterium tuberculosis isolated from countries in the western pacific region. Journal of Clinical Microbiology 38, 191–197.
  • Pena MJ, Caminero JA, Campos-Herrero MI et al. (2003) Epidemiology of tuberculosis on Gran Canaria: a 4 year population study using traditional and molecular approaches. Thorax 58, 618–622.
  • Pfyffer GE, Strassle A, Rose N, Wirth R, Brandli O & Shang H (1998) Transmission of tuberculosis in the metropolitan area of Zurich: A 3 year survey based on DNA fingerprinting. European Respiratory Journal 11, 804–808.
  • Ruiz Garcia M, Rodriguez JC, Navarro JF, Samper S, Martin C & Royo G (2002) Molecular epidemiology of tuberculosis in Elche, Spain: A 7-year study. Journal of Medical Microbiology 51, 273–277.
  • Samper S, Iglesias MJ, Rabanaque MJ et al. (1998) The molecular epidemiology of tuberculosis in Zaragoza, Spain: A retrospective epidemiological study in 1993. International Journal of Tuberculosis and Lung Disease 2, 281–287.
  • Scott AN, Menzies D, Tannenbaum TN et al. (2005) Sensitivities and specificities of spoligotyping and mycobacterial interspersed repetitive unit-variable-number tandem repeat typing methods for studying molecular epidemiology of tuberculosis. Journal of Clinical Microbiology 43, 89–94.
  • Sharnprapai S, Miller AC, Suruki R et al. (2002) Genotyping analyses of tuberculosis cases in U.S.-and foreign-born Massachusetts residents. Emerging Infectious Diseases 8, 1239–1245.
  • Small PM, Hopewell PC, Singh SP et al. (1994) The epidemiology of tuberculosis in San Francisco. A population-based study using conventional and molecular methods. New England Journal of Medicine 330, 1703–1709.
  • Soborg C, Soborg B, Pouelsen S, Pallisgaard G, Thybo S & Bauer J (2001) Doubling of the tuberculosis incidence in Greenland over an 8-year period (1990–1997). International Journal of Tuberculosis and Lung Disease 5, 257–265.
  • Van Soolingen D, De Haas PE, Hermans PW, Groenen PM & Van Embden JD (1993) Comparison of various repetitive DNA elements as genetic markers for strain differentiation and epidemiology of Mycobacterium tuberculosis. Journal of Clinical Microbiology 31, 1987–1995.
  • Van Soolingen D, Borgdorff MW, De Haas PE et al. (1999) Molecular epidemiology of tuberculosis in the Netherlands: a nationwide study from 1993 through 1997. Journal of Infectious Diseases 180, 726–736.
  • Sterne J (2009) Meta-Analysis in Stata: An Updated Collection from the Stata Journal. Stata Press, College Station, TX, 259 p.
  • Storla DG, Rahim Z, Islam MA et al. (2006) Heterogeneity of Mycobacterium tuberculosis isolates in Sunamganj District, Bangladesh. Scandinavian Journal of Infectious Diseases 38, 593–596.
  • Supply P, Allix C, Lesjean S et al. (2006) Proposal for standardization of optimized mycobacterial interspersed repetitive unit-variable-number tandem repeat typing of Mycobacterium tuberculosis. Journal of Clinical Microbiology 44, 4498–4510.
  • Sutherland I, Svandova E & Radhakrishna S (1982) The development of clinical tuberculosis following infection with tubercle bacilli. Tubercle 63, 255–268.
  • Thompson SG & Higgins JP (2002) How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine 21, 1559–1573.
  • Thomsen VO, Lillebaek T & Stenz F (2004) Tuberculosis in Greenland--current situation and future challenges. International Journal of Circumpolar Health 63(Suppl. 2), 225–229.
  • Vachee A, Vincent P, Savage C et al. (1999) Molecular epidemiology of tuberculosis in the Nord department of France during 1995. Tubercle & Lung Disease 79, 361–366.
  • Verver S, Warren RM, Munch Z et al. (2004) Transmission of tuberculosis in a high incidence urban community in South Africa. International Journal of Epidemiology 33, 351–357.
  • Vynnycky E & Fine PE (1997) The natural history of tuberculosis: the implications of age-dependent risks of disease and the role of reinfection. Epidemiology and Infection 119, 183–201.
  • Vynnycky E, Borgdorff MW, Van Soolingen D & Fine PE (2003) Annual Mycobacterium tuberculosis infection risk and interpretation of clustering statistics. Emerging Infectious Diseases 9, 176–183.
  • Weis SE, Pogoda JM, Yang Z et al. (2002) Transmission dynamics of tuberculosis in Tarrant county, Texas. American Journal of Respiratory and Critical Care Medicine 166, 36–42.
  • WHO. (2008) World Health Organization; Global TB database. Available at: http://www.who.int/tb/country/global_tb_database/en/index.html (accessed on 15 February 2008).
  • Wilkinson D, Pillay M, Crump J, Lombard C, Davies GR & Sturm AW (1997) Molecular epidemiology and transmission dynamics of Mycobacterium tuberculosis in rural Africa. Tropical Medicine & International Health 2, 747–753.
  • Zolnir-Dovc M, Poljak M, Erzen D & Sorli J (2003) Molecular epidemiology of tuberculosis in Slovenia: Results of a one-year (2001) nation-wide study. Scandinavian Journal of Infectious Diseases 35, 863–868.

Appendices

Appendix 1. Summary of studies included in meta-regression analysis

Column groups: Study location | Study design | Study setting & population | TB transmission
Columns: Study location (reference) | Study population and DNA fingerprinting methods* | Inclusion by IS6110 band number | Secondary DNA typing method (cut-off)† | Patients in DNA fingerprint analysis | Duration (months) | Sampling fraction‡ | Local TB incidence (n/100 000/year) | Average age (years) | HIV positive (%) | Sex (% male) | Foreign born (%) | Clustered (%) | Recent transmission (%)
  1. c = culture positive; ‘.’ = data not available or found; % = percentage of all study participants; Spol, Spoligotyping; PGRS, polymorphic GC–repetitive sequence typing; ‘no.’ = number; ‘∼’ = approximately.

  2. *Sample collection as reported by the authors.

  3. †Secondary DNA fingerprinting method used for clustering analysis, if number of IS6110 RFLP bands is below cut-off (shown in parentheses).

  4. ‡Sampling fraction was based on proportion of all c+ cases that had RFLP results available.

  5. §One band difference in IS6110 RFLP pattern was allowed between clustered strains.

  6. ¶All typed Mtb strains had >7 IS6110 bands in their RFLP pattern.

  7. **Cross sectional survey, duration set to 0 months.

  8. ††Time difference between first and secondary patients in cluster limited to 12 months.

Brescia (Italy) (Matteelli et al. 2003)All c + cases from Brescia province, ‘91–’97 AllSpol (<3)195840.3015471467302716
Denmark & Greenland (Lillebaek et al. 2004)Nearly all c + patients in Denmark & Greenland, ‘92–’01, 4% of all strains were second or third isolate from same patientAll.39361200.97440..5856.
Department Nord (France) (Vachee et al. 1999)Most (∼90%) c + TB cases in Department Nord, 1995§All.154120.6695477326189
Elche (Spain) (Ruiz Garcia et al. 2002)Sample of all c + diagnosed patients in Elche Health district, 1993–1999. No information reported about sampling method>4.141840.5999372970.5538
Gran Canaria (Spain) (Pena et al. 2003)All c + TB patients in Gran Canaria between 1993–1996>4.566480.792939166977258
Greenland (Denmark) (Soborg et al. 2001)All identified TB patients from Greenland, 1990–1997, 15 c + cases from study region were not notified, and not included in DNA fingerprint analysis>7¶.310960.9313030.53.8578
Greenland (Denmark) (Thomsen et al. 2004)All notified patients from Greenland, 1998–2002 No data on missed cases, ∼60% of notified cases were c+>7¶.198600.94185....86 
Hamburg (Germany) (Diel et al. 2005)All reported c + TB cases in Hamburg, 1997–2002 >4.848720.88164457 433425
London (United Kingdom) (Maguire et al. 2002)All c + TB patients in greater London area, July 1995–December 1996>4.2042300.763337 59802314
Milan (Italy) (Moro et al. 2002)All diagnosed TB cases from Milan metropolitan area residents, 1995–1997AllSpol (<5)581241.0013382263304128
Netherlands (van Soolingen et al. 1999)All reported cases in The Netherlands between 1993–1997AllPGRS (<5)4266600.75934461444635
Norway (Dahle et al. 2001) All diagnosed TB patients in Norway between 1999–2001>4.485360.927430071106
Norway (Dahle et al. 2003) All diagnosed TB patients in Norway, 1994–1998>4.619600.89545.54501511
Tuscany (Italy) (Lari et al. 2005) All c + TB cases in Tuscany, 2002§All.248121.0075011.373319
Zaragoza (Spain) (Samper et al. 1998) All c + TB patients in Zaragoza in 1993 ‘Nearly all samples’ went to two participating labs>4.226120.843244446943927
Zurich (Switzerland) (Pfyffer et al. 1998) Patients from the Zurich Canton, 1991–1993AllPGRS (<5)36136.12.1063511711
Alabama (USA) (Kempf et al. 2005 )All diagnosed TB cases from state of Alabama, 1994–2000>5.1136760.80856669.2825
Alberta (Canada) (Kunimoto et al. 2004) All c + TB cases from Alberta province, 1994–1998AllSpol (<6)573601.00645.48422014
Arkansas (USA) (Braden et al. 1997) All c + cases in Arkansas, 1992–1993§>5.192240.71762.6734230
Arkansas (USA) (Cave et al. 2005 )All c + TB cases in Arkansas, 1996–1999>6.419480.98756.6193928
Baltimore (USA) (Bishai et al. 1998) All c + cases TB reported in Baltimore City, 1994–1996AllPGRS (<7)182301.001554286934632
Denver (USA) (Burman et al. 1997) All c + TB cases from Denver metropolitan area, 1988–1994.>5.131660.633.1572482819
Houston (USA) (De Bruyn et al. 2001) All reported TB cases in Houston, 1995–1998AllSpol (<5)1139360.9120.1970706053
Manitoba (Canada) (Blackwood et al. 2004) All diagnosed TB cases in Manitoba province, 1992–1999§All.629961.00945.57306860
Manitoba (Canada) (Blackwood et al. 2004) All diagnosed TB cases Manitoba province, 2003All.126121.009....6556
Maryland (USA) (Cronin et al. 2001) All c + TB cases in Maryland, 1996–2000§AllSpol (<7)1172600.985451244463728
Massachusetts (USA) (Sharnprapai et al. 2002) All reported TB cases in Massachusetts July 1996–December 2000§AllSpol (<7)983540.954502757702819
Montreal (Canada) (Scott et al. 2005) All reported TB patients in Montreal, 1996–1998>5.347360.95104034558084
New York (USA) (Frieden et al. 1996) All c + TB cases in New York in April 1991>4.3440**0.8347392974223728
San Francisco (USA) (Burgos et al. 2003) All reported TB cases in San Francisco area, 1991–1999AllPGRS (<6)18001080.8435452069653828
Tarrant County (USA) (Weis et al. 2002) All c + TB patients resident in Tarrant County 1993–2000AllSpol (<7)488960.59645.67346050
Vancouver (Canada) (Blenkush et al. 1996) All c + cases in Vancouver area, 1992–1994All.114180.67652...128
Vancouver (Canada) (Hernandez-Garduno et al. 2002) All new c + TB cases in Greater Vancouver area, 1995–1999 five cases were excluded for having experienced a previous TB episodeAllSpol (<6)791510.98651554831712
Wisconsin (USA) (Cowan et al. 2005) All c + TB cases in Wisconsin, 2000–2003>6.200460.992....1610
Hokkaido (Japan) (Fujikane et al. 2004) All diagnosed patients in Hokkaido prefecture, 2001>5.207360.832069...84
Hong Kong (Chan-Yeung et al. 2006) All c + TB cases with residence on Hong Kong island, May 1999–April 2002AllPGRS (<6)1533360.6610858168633020
Cape Town (South Africa) (Verver et al. 2004) All patients diagnosed with TB that reported in and are residents of two high incidence urban communities of Cape Town. 1993–1998AllSpol (<5)797720.7876133.57.7258
Hlabisa (South Africa) (Wilkinson et al. 1997) All consecutive SS + cases in Hlabisa, a rural district in South Africa, May 1993–March 1994AllPGRS (<5)246111.00305363062.4529
Karonga (Malawi) (Glynn et al. 2005) All c + TB cases in Karonga district, 1995–2003>4.948870.8281336547.7259
Malaysia (Dale et al. 1999) Nationwide random sample of c + TB cases in Malaysia, 1993–1994>4.331240.035845...116
Republic of Korea (Park et al. 2000) Multistage stratified cluster sample of Korean population>4.1360**0.0029855.69.117
Sunamganj district (Bangladesh) (Storla et al. 2006) All SS + TB cases come from four sub-districts in Sunamganj district (Northern Bangladesh), November 2003–December 2004>4.106130.95111....63
Tirruvalur District (India) (Das et al. 2005) All DOTS notified TB cases in Tirruvallur district, 1999–2000>5.151190.7811145.76.1910
Veracruz (Mexico) (Jimenez-Corona et al. 2006) All c + patients in Veracruz state, March 1995–April 2003AllSpol (<6)62312††1.002844252.2518
Slovenia (Zolnir-Dovc et al. 2003)Nearly all (99.7%) of c + patients in Slovenia, 2001>4.304120.991955161243826
East Azerbadjan (Iran) (Asgharzadeh et al. 2006)All c + TB cases in East Azerbadjan September 2002–March 2003All.10570.82847.57.3323

Appendix 2. Calculation of relative difference in proportion clustered

Variable | Coefficient* | Study location (year of publication): Hokkaido (Fujikane et al. 2004) | Cape Town (Verver et al. 2004) | Arkansas (Braden et al. 1997)
Constant† | −12 | X | X | X
Study duration (months) | | | |
 0–12 | 0 | | |
 13–48 | −3.2 | X | | X
 >48 | 18.3 | | X |
Sampling fraction (proportion of culture positive cases included) | | | |
 0–0.50 | 0 | | |
 0.50–0.75 | 27.7 | | |
 0.75–1 | 29.6 | X | X | X
Low band strains | | | |
 Excluded | 0 | X | | X
 Included with secondary typing | 0.6 | | X |
 Included, no secondary typing | 25.4 | | |
TB burden in study area | | | |
 Low (≤10/100 000/year) | 0 | | | X
 Medium (11–50/100 000/year) | 17.9 | X | |
 High (>50/100 000/year) | 25.4 | | X |
Expected proportion clustered | | 32.3 | 61.9 | 14.4
Observed proportion clustered‡ | | 8 | 72 | 42
Relative difference§ | | −75.2% | +16.3% | +191.6%

  1. *Coefficients give the change in the expected proportion clustered for each category.

  2. †The baseline value of the proportion clustered.

  3. ‡As reported by the study. See Appendix 1 for details.

  4. §Relative difference is calculated as ((observed − expected)/expected) × 100%.
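
The arithmetic in Appendix 2 can be reproduced with a short script. This is a minimal sketch: the coefficients and the three worked examples are those listed in Appendix 2, while the dictionary keys, category labels and function names are our own hypothetical choices.

```python
# Coefficients (percentage points) of the main model, as listed in Appendix 2.
CONSTANT = -12.0
COEFFICIENTS = {
    "duration": {"0-12": 0.0, "13-48": -3.2, ">48": 18.3},
    "sampling_fraction": {"0-0.50": 0.0, "0.50-0.75": 27.7, "0.75-1": 29.6},
    "low_band_strains": {"excluded": 0.0, "secondary_typing": 0.6,
                         "no_secondary_typing": 25.4},
    "tb_burden": {"low": 0.0, "medium": 17.9, "high": 25.4},
}


def expected_clustered(categories):
    """Constant plus the coefficient of the category that applies for each variable."""
    return CONSTANT + sum(COEFFICIENTS[var][cat] for var, cat in categories.items())


def relative_difference(observed, expected):
    """((observed - expected) / expected) x 100%."""
    return (observed - expected) / expected * 100.0


# Category assignments and observed proportions clustered (%) from Appendix 2.
studies = {
    "Hokkaido": ({"duration": "13-48", "sampling_fraction": "0.75-1",
                  "low_band_strains": "excluded", "tb_burden": "medium"}, 8),
    "Cape Town": ({"duration": ">48", "sampling_fraction": "0.75-1",
                   "low_band_strains": "secondary_typing", "tb_burden": "high"}, 72),
    "Arkansas": ({"duration": "13-48", "sampling_fraction": "0.75-1",
                  "low_band_strains": "excluded", "tb_burden": "low"}, 42),
}

for name, (categories, observed) in studies.items():
    expected = expected_clustered(categories)
    diff = relative_difference(observed, expected)
    print(f"{name}: expected {expected:.1f}%, observed {observed}%, difference {diff:+.1f}%")
```

The expected values match the table (32.3, 61.9 and 14.4). For Arkansas the rounded coefficients give +191.7% rather than the +191.6% shown, presumably because the published figure was computed from unrounded coefficients.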