Before 1990, it was generally believed that, in low-incidence countries, most TB cases were attributable to endogenous reactivation of latent infection, and only a small proportion, in the order of 10%, would derive from recent transmission [3, 27]. On the basis of this assumption, progress towards elimination in low-incidence countries was predicted, as this would depend mainly on the prevalence of latent infection in older age cohorts and the natural replacement of this high-prevalence group by younger, less infected age cohorts . Two landmark studies in the 1990s overthrew this assumption on the basis of RFLP typing. In New York and San Francisco, >30% of TB cases were attributed to recent infection on the basis of clustering (different patients whose isolates had the same RFLP pattern) [6, 7].
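As a minimal sketch of how clustering in the sense used above can be computed, the snippet below counts a case as clustered when its fingerprint is shared with at least one other case. The function name and the pattern labels are hypothetical and for illustration only; they are not taken from the cited studies.

```python
from collections import Counter


def clustering_proportion(patterns):
    """Fraction of cases whose fingerprint is shared with at least one other case."""
    if not patterns:
        raise ValueError("at least one case is required")
    counts = Counter(patterns)
    # A case is clustered if its pattern occurs in two or more cases.
    clustered = sum(n for n in counts.values() if n >= 2)
    return clustered / len(patterns)


# Hypothetical fingerprints for ten cases (labels are made up for illustration).
example_patterns = ["A", "A", "A", "B", "B", "C", "D", "E", "F", "G"]
print(f"clustered proportion: {clustering_proportion(example_patterns):.0%}")
```

In this made-up example, five of the ten cases share a pattern with another case. Whether such a proportion reflects recent transmission depends on the caveats discussed in the following paragraphs.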
The initial studies on DNA fingerprinting raised questions about the suitability of RFLP clustering for measuring recent transmission. For instance, in a rural population in Arkansas, it proved impossible to identify recent epidemiological links between the majority of clustered cases, in particular among the elderly . On the other hand, with intensive follow-up in The Netherlands, an epidemiological link was either demonstrated or probable in up to 85% of clustered cases .
There are multiple reasons for an imperfect correspondence between RFLP clustering and epidemiological contact information . First, confirmation of contacts is expected to have limited sensitivity. Given the long incubation period of TB, recent transmission has been defined as transmission in the past 2–5 years [31, 32]. Epidemiological confirmation of all instances of airborne transmission over such a long period is a priori unlikely. Second, the intensity of efforts to identify epidemiological contact is likely to be important, and may be limited and variable under routine conditions. Third, although the rate of change of RFLP patterns supports its use for studying recent transmission , the RFLP pattern does not change exactly after the arbitrarily defined period of recent transmission, so some cases with identical fingerprints may be linked over longer periods, and some cases linked through recent transmission may not have isolates with identical fingerprints. Fourth, immigrants may introduce strains with (nearly) identical DNA fingerprints, which may not reflect recent transmission in the study area, but rather transmission or common strains in the country of origin. Fifth, sampling in time, space or at random was shown to lead to underestimation of clustering [34, 35]. In particular, cases in small clusters would run the risk of being misclassified as non-clustered. This precludes the use of clustering as an indicator of recent transmission in studies using small sampling fractions, e.g. national sample surveys. However, as sampling in space and time is unavoidable, and complete DNA fingerprinting results for all eligible TB cases are rarely obtained, this also served as a warning to interpret clustering statistics cautiously . Finally, the interpretation of clustering depends on factors such as the age distribution of TB cases (clustering may overestimate recent transmission in the elderly and underestimate it in the young) and the TB trend over time . 
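The effect of sampling on clustering described above can be illustrated with a small calculation. This is a sketch under a simplifying assumption that is not taken from the cited studies: every case is included in the sample independently with the same probability. Under that assumption, a sampled case from a true cluster of size n still appears clustered only if at least one of the other n - 1 cases is also sampled. The function name and the cluster sizes are hypothetical.

```python
def expected_observed_clustering(cluster_sizes, sampling_fraction):
    """Expected share of sampled cases that still appear clustered.

    Assumption (illustrative only): each case is sampled independently with
    probability `sampling_fraction`. The result is the ratio of the expected
    number of sampled cases that remain clustered to the expected number of
    sampled cases.
    """
    if not 0 < sampling_fraction <= 1:
        raise ValueError("sampling_fraction must be in (0, 1]")
    total_cases = sum(cluster_sizes)
    if total_cases == 0:
        raise ValueError("at least one case is required")
    miss = 1 - sampling_fraction  # probability that a given other case is not sampled
    still_clustered = sum(n * (1 - miss ** (n - 1)) for n in cluster_sizes)
    return still_clustered / total_cases


# Hypothetical population: 50 unique cases, 10 clusters of 2 and 6 clusters of 5,
# so half of all cases are truly clustered.
sizes = [1] * 50 + [2] * 10 + [5] * 6
for fraction in (1.0, 0.5, 0.1):
    share = expected_observed_clustering(sizes, fraction)
    print(f"sampling fraction {fraction:.0%}: expected clustering {share:.1%}")
```

In this hypothetical population, the expected clustering falls as the sampling fraction shrinks, and pairs are lost much faster than larger clusters, which is the misclassification of small clusters noted in the text.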
A meta-analysis found large variation in clustering between studies, which was indeed explained in part by study duration, sampling fraction, the occurrence of strains with low copy numbers, and TB incidence.
Risk factors for TB attributable to recent transmission
Risk factors for TB attributable to recent transmission include male sex, being a young adult, being native (vs. foreign-born), urban residence, alcohol and drug abuse, being homeless, being exposed in crowded settings, including prisons, and having pulmonary tuberculosis [6, 7, 38-42]. HIV and multidrug-resistant TB (MDR-TB) were found to be risk factors in some settings, but not in others . As risk factors were identified relative to the risk of TB not attributable to recent infection, care needs to be taken in the interpretation. For instance, the elderly in low-incidence countries have a much higher risk of TB attributable to remote infection than the young, so the proportion of TB in that age group attributable to recent infection may be expected to be smaller than among the young . However, young age is also, in absolute terms, a risk factor for recently transmitted TB. For instance, in The Netherlands, the vast majority of TB cases attributed to recent transmission were found among young secondary cases resulting from recent transmission from a young index case [43, 44].
Sampling bias might affect not only the clustering proportion but also the identification of risk factors for clustering. A mathematical model suggested that ORs for clustering would be underestimated as a result of sampling bias . However, a recent study showed that this sampling bias was very limited, unless extremely small samples were taken . Risk factors for clustering can thus be used to identify priority groups for contact investigations and intensified case-finding [46, 47].
Focusing contact investigations
With the introduction of molecular typing, it was hoped that the technique would contribute to improving contact investigations and outbreak detection. Indeed, DNA fingerprinting was shown to lead to the identification of epidemiological links, in particular in ‘non-traditional' settings, including bars and churches . Moreover, unsuspected outbreaks have been detected frequently [38, 49-52]. Molecular typing may also contribute to targeting contact investigations based on the characteristics of the first two cases of a cluster .
Conversely, RFLP typing has been used to reveal some of the limitations of conventional contact investigations in identifying recent transmission. For instance, molecular epidemiological findings suggested that contact investigations may be inadequate to prevent disease if contact occurs outside the household or the circle of close relatives and friends. In Rotterdam, molecular typing identified widespread transmission from multiple sources among drug users, thus showing the limitations of contact investigation in this high-risk population without molecular typing, and leading to an active case-finding programme [47, 53]. In various settings, a substantial proportion of household contacts were infected with a different strain than the index case: 30% in California and 54% in Cape Town; before molecular typing became available, these cases would have been attributed to transmission within the household. It is expected that replacing RFLP typing with the much faster VNTR typing method will further help in the targeting of contact investigations.
Among the first applications of RFLP typing was the identification of outbreaks of TB among hospitalized HIV-infected patients [56, 57]. Recognition of this risk has led to the inclusion of infection control as one of the ‘three I's’ for the control of TB among HIV-infected patients, the other two being intensified TB case-finding among HIV-infected patients and isoniazid preventive therapy.
Whereas nosocomial transmission of M. tuberculosis and Mycobacterium bovis is hazardous for HIV-infected patients, the risk to non-HIV-infected health workers and patients appears to be variable [59, 60], perhaps depending on differences between settings in patient populations and infection control practices. The risk of nosocomial transmission was highlighted by molecular epidemiological studies, but other approaches have also made an important contribution to this knowledge. In particular, an in vivo air-sampling model with exposure of guinea pigs demonstrated the high variability in infectiousness between patients [61, 62] and the impact of various control measures.
Before the advent of molecular tools, the risk of re-infection after curative treatment of TB was unclear. Quantifying this phenomenon is important for various reasons, including a better understanding of the role of acquired protective immunity and hence the prospects for more effective vaccines. Styblo suggested that the decline in incidence of TB among the elderly in The Netherlands over the course of the 20th century was attributable to a declining risk of TB caused by re-infection. Given the high annual risk of tuberculous infection at the beginning of the 20th century (>10% before 1910), the prevalence of latent infection was extremely high in these birth cohorts from adulthood onwards. The decreasing TB incidence in these birth cohorts over the years might be explained by a declining rate of disease attributable to re-infection. However, other authors did not consider re-infection to be important [68, 69], and an alternative hypothesis might explain the declining TB rates among the elderly as well: rates of reactivation from latent infection to disease might have declined over time, e.g. as a result of better nutrition. That reactivation rates vary strongly between settings was shown by a study from Hong Kong, which estimated that the rate of reactivation among elderly men was approximately 17 times higher in Hong Kong than in the UK.
A landmark study from Cape Town provided direct evidence of the importance of re-infection as a cause of recurrent TB after curative treatment . Among 16 patients with recurrent TB, 12 (75%) had a strain with a different RFLP pattern from that during the first episode, suggesting an extremely important role for re-infection in recurrent TB in a high-incidence setting. In this study, HIV results were not available, but the HIV prevalence was believed to be low. In studies measuring HIV status, recurrent TB resulting from re-infection was particularly common among HIV-infected patients [72, 73]. The risk among HIV-infected patients was lowered by antiretroviral therapy . A study on recurrent TB among HIV-infected children showed that recurrence was common, affecting 10% of children, and was attributable to both relapse and re-infection . In low-incidence settings, on the other hand, recurrent TB was much less common, and was rarely attributable to re-infection .
A later study from Cape Town suggested that some individuals may be particularly susceptible to TB, as the incidence of recurrent TB attributed to re-infection was higher than the incidence of a first episode of TB in the same population. This finding has since been confirmed elsewhere, both for HIV-infected and for HIV-uninfected individuals [78, 79].
Although recurrent TB resulting from re-infection may have limited relevance to TB control activities, it calls into question the role of protective immunity. Further immunological studies are needed to determine the role of protective immunity in TB and the implications for vaccine development.
In the early studies, RFLP patterns were generally interpreted as being derived from one strain, as, almost invariably, the intensities of all bands were equal, and mixtures of different bacterial populations reflected in two subsets of bands with different intensities were hardly observed. This was remarkable, because, in the 1990s, a significant proportion of TB patients in western countries already came from high-prevalence areas, where the probability of multiple infection may be considerable. In The Netherlands, where DNA fingerprinting has been conducted since 1993, the only indication of two mixed RFLP patterns with different intensities was traced back to long-term laboratory cross-contamination in a peripheral laboratory. A systematic search for RFLP patterns with single ‘vague’ (low-intensity) bands suggested that mixed infection might indeed occur. However, in RFLP analysis of single colonies from such isolates, the vague bands disappeared, and bacteria of individual colonies either had a normal-intensity band at the position of the vague band in the parental strains, or no band at all. Therefore, transposition of IS6110 in the genome of M. tuberculosis, and thus genetic drift in a part of the bacterial population, was a likely explanation. In purposely composed mixtures of strains with different RFLP patterns, it became clear that the limit for detection of a second strain was approximately 10% ‘foreign’ DNA. Similarly, mixtures of drug-resistant and susceptible strains have been recognized.
Mixed infection with different strains has also been identified with molecular techniques. Whereas, in low-incidence countries, the probability of multiple infection is expected to be low, in high-incidence countries this risk may be high. For instance, if the annual risk of infection were 4%, as has been observed in Cape Town, South Africa, it can be calculated that, at age 35 years, 24% of individuals would have escaped infection, 35% would have been infected once, and 41% would have been infected more than once, unless prior infection protected against re-infection. Evidence of multiple strains involved in TB disease has emerged in recent years. Of TB patients in Cape Town, 19% were infected with both a Beijing strain and a non-Beijing strain. Two studies in Taiwan found that, among TB patients, 3% and 11%, respectively, were infected with a Beijing strain and a non-Beijing strain [88, 89]. In Malawi, 3% of patients were infected with strains of the LAM and non-LAM lineages. This suggests that multiple infections are rather common among TB patients in high-prevalence settings. Owing to methodological limitations (multiple infection is demonstrated with genotype-specific PCR testing), the extent of multiple infection contributing to disease is likely to have been underestimated in studies thus far. On the other hand, given the risk of contamination in PCR, one could also argue that the problem of mixed infections is overestimated. Therefore, more research is needed to clarify this important issue. Both recurrent TB after re-infection and multiple infections call into question the role of protective adaptive immunity and the possibilities of developing effective vaccines [81, 82].
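The figures of 24%, 35% and 41% quoted above for an annual risk of infection of 4% are consistent with a simple binomial calculation. The sketch below assumes a constant annual risk, independence between years, at most one infection per year, and no protection from prior infection. These modelling assumptions and the function name are illustrative and are not stated in this form in the text.

```python
from math import comb


def infection_count_distribution(annual_risk, years):
    """Probabilities of zero, exactly one, and more than one infection.

    Assumptions (illustrative): a constant annual risk, independent years,
    at most one infection per year, and no protection from prior infection.
    """
    p_none = (1 - annual_risk) ** years
    p_once = comb(years, 1) * annual_risk * (1 - annual_risk) ** (years - 1)
    p_multiple = 1 - p_none - p_once
    return p_none, p_once, p_multiple


p_none, p_once, p_multiple = infection_count_distribution(0.04, 35)
print(f"never infected:          {p_none:.0%}")
print(f"infected once:           {p_once:.0%}")
print(f"infected more than once: {p_multiple:.0%}")
```

Under these assumptions, the rounded results match the percentages given in the text. Any protective effect of prior infection would lower the last figure.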
It has been known for a long time that the incubation period of TB may range from a few months to many years [91, 92]. The measurement of this was challenging, because determining the moment of infection is difficult, and a long follow-up is required, with a low risk per person necessitating large cohorts, and re-infection may occur during follow-up.
Some decades ago, follow-up studies were performed among contacts of infectious TB patients in the USA  and in a control group of a bacille Calmette–Guérin (BCG) vaccination trial among adolescents in the UK . Among those developing disease within 10 years, 50% did so within 2 years in the former studies and 82% in the latter. No risk factors for short incubation periods were identified.
In a recent molecular epidemiological study in The Netherlands, the incubation period distribution was determined among 1095 secondary cases attributed to 688 source cases whose isolates had identical RFLP patterns and for whom epidemiological contact had been reported . Of those developing TB within 15 years, 62% did so within 2 years. Risk factors for short incubation periods were young age, male sex, extrapulmonary TB, and not having had previous TB or preventive therapy . The latter two risk factors appear to be consistent with some role for adaptive protective immunity.
MDR-TB was an important problem during the re-emergence of TB in New York in the early 1990s , and is recognized as a serious threat in eastern Europe and Central Asia [11, 94, 95]. Since 2006, extensively drug-resistant TB (XDR-TB) has been recognized as a global problem [12, 96, 97], with an extremely high mortality among HIV-positive individuals [98, 99]. Recently, reports on totally drug-resistant TB have emerged in Iran and India [100, 101]. Methods to control MDR-TB are known: first, its emergence needs to be prevented by appropriate treatment of drug-susceptible TB ; and second, if present, MDR-TB needs to be treated adequately to prevent transmission, death, and the development of XDR-TB .
The prospects for the control of MDR-TB are unclear. An important uncertainty is the reproductive fitness of drug-resistant strains [103, 104]. One might expect drug-resistant strains to have increased reproductive fitness, as resistant cases are likely to be infectious for a longer time than susceptible ones, and because drug-resistant TB may occur preferentially among certain risk groups, such as HIV-infected individuals. On the other hand, as was reported in the 1950s and 1960s, on the basis of experiments in guinea pigs, the fitness of M. tuberculosis might be impaired if underlying mutations impact on the ability to withstand exposure to oxygen radicals [106, 107]. However, this will depend on the specific mutations. For instance, some mutations in the katG gene, such as the S315T mutation, will only reduce the expression of catalase/peroxidase, whereas others will stop its expression entirely. If drug resistance-conferring mutations reduce virulence, this effect may be undone through compensatory evolution [104, 109].
The incidence of drug-resistant TB has declined in some settings , sometimes even in the absence of specific control measures for drug-resistant TB , suggesting reduced reproductive fitness. However, this does not appear to apply to all forms . The ability to preserve fitness while becoming resistant may be associated with particular genotypes, such as the Beijing strain [113-115]. For instance, a large proportion of recently transmitted MDR-TB/XDR-TB strains in the European Union are of the Beijing genotype [116, 117].
RFLP clustering has been used to compare the relative fitness of drug-resistant and drug-susceptible strains. Isoniazid-resistant strains were less likely to be clustered [38, 118], but not if resistance was attributable to the katG gene S315T mutation [108, 119]. Moreover, a wider comparison suggested that the relative fitness of drug-resistant strains varies between settings . Overall, the reproductive fitness is likely to depend both on biological factors—such as loss of virulence and compensatory evolution—and on factors associated with the setting—such as speed and completeness of case detection, quality of drugs and drug regimens used, and systems to ensure treatment compliance .
The Beijing genotype was described in 1995 as the predominant genotype in the Beijing region , and since then in various Asian countries [e.g. ]. It was initially recognized on the basis of the characteristic IS6110 RFLP and spoligotyping pattern; later, the definition was refined . Mokrousov et al.  introduced the distinction between typical and atypical Beijing strains, and this facilitated studies on the evolutionary development of this genotype family [124, 125]. For instance, it has been suggested that the success of the more recent typical Beijing strain may be attributable to its ability to circumvent immune protection after BCG vaccination .
The emergence of Beijing strains was reported in various settings [113, 115, 126], e.g. in Vietnam , where it was associated with young age, in the Canary Islands , where an outbreak and fast spread were documented, in South Africa, where a strong increase was seen among young children , and in The Netherlands, where the incidence increased in association with immigration and among young natives . The lineage is observed all over the world, and is associated with drug resistance in various settings [113, 115, 126], including in eastern Europe, New York (where a side branch of the Beijing lineage was known under the name ‘W’ family or W strain ), and in South Africa .
Recently, a correlation was shown between MDR-TB and the Beijing genotype in Colombia . This may be alarming, as these strains have hardly been found in Latin America in the past. The strong association of the Beijing genotype with MDR-TB/XDR-TB in eastern Europe is reflected in the European Union, where the largest number of clustered patients with MDR-TB/XDR-TB were infected with one type of Beijing genotype strain [116, 117]. Nearly half of the MDR-TB/XDR-TB cases included in European surveillance were in clusters, and 85% of the transmitted cases were Beijing isolates not distinguishable with RFLP and VNTR typing. This is remarkable, because, of the susceptible isolates in Europe, only 6–7% are of this genotype.
Although various reasons for the emergence of the Beijing genotype have been proposed, including escape from BCG vaccination, an increased ability to acquire drug resistance without loss of fitness, and an increased virulence, further research is needed [114, 124]. If Beijing strains do indeed have selective advantages over other M. tuberculosis strains and have been emerging for a few decades, the time of divergence should be short. On the basis of WGS of three typical and three atypical Beijing strains from China, Vietnam, and South Africa, the typical Beijing strains from this widespread geographical area appeared to be genetically highly conserved, whereas the more ancestral atypical strains were much more diverse [17, 132] (Fig. 2). The 53 mutations that separate all typical Beijing strains from the atypical strains were, for the large part, traced to regulatory regions of the genome, and may influence the overall protein expression in typical strains. Recently, it has been found that some Beijing strains have a much higher mutation frequency, leading to rifampicin resistance . Moreover, a higher dose of rifampicin was needed to achieve 100% killing of Beijing genotype bacteria, suggesting that Beijing bacteria have higher intrinsic resistance against this drug .
Figure 2. Mutations in the regulatory network are associated with the recent clonal expansion of a dominant subclone of the Mycobacterium tuberculosis Beijing genotype. The hypothetical phylogenetic tree of the Beijing genotype strains of M. tuberculosis is shown. The atypical Beijing strains are genetically diverse. The typical strains presumably gained a selective advantage over the atypical strains, and started to spread recently. The currently isolated typical Beijing strains from a widespread geographical area are highly clonal, which may be related to an enhanced capacity to circumvent bacille Calmette–Guérin-induced immunity or to withstand treatment with antituberculosis drugs.
The first lineage of M. tuberculosis to be found was the Beijing genotype described in 1995 . It is considered to be one of the six main lineages distributed globally . The first definition of Beijing strains was based on their specific spoligotyping and IS6110 RFLP patterns , although both markers have serious limitations for studying the phylogeny of the M. tuberculosis complex. Insertion sequence IS6110 is, in fact, a mobile genomic element that utilizes preferential insertion sites, thus favouring convergent evolution. Nevertheless, IS6110 RFLP patterns, to a large degree, group M. tuberculosis isolates into genotype families, and this characterization is valuable for identifying, for instance, Beijing genotype strains . Spoligotyping also has been used extensively to study the phylogeography of the M. tuberculosis complex, and a huge database representing >39 000 isolates from 122 countries provided the first insights into the distribution of genotype families worldwide . Spoligotyping offers insufficient resolution in some genotype families, and convergent evolution has been noted in offspring of well-characterized strains .
Although VNTR typing was initially seen as a strain typing method, several studies have shown that the VNTR pattern is also a valuable phylogenetic marker, even though convergent evolution may occur occasionally.
The distribution of the six M. tuberculosis lineages appears to differ significantly by geographical area, with the largest variability in Africa . This is reflected in the names used by Gagneux et al. (Indo-Oceanic, East Asian, East African-Indian, Euro-American, West African lineage I, and West African lineage II) . The association between M. tuberculosis lineages and geographical areas has been observed among isolates from recent immigrants in low-incidence countries [138, 139], and is also emerging from many recent publications on lineage distributions in different geographical areas. These geographical associations are likely to be attributable, at least in part, to historical migration patterns and perhaps the origin of humankind, as described for Helicobacter pylori . It is interesting that Mycobacterium canettii, which is believed to be closely linked to the common ancestor of the M. tuberculosis complex [20, 141, 142], has its epicentre in the Horn of Africa, the geographical area where humankind presumably started its spread over the world.
There are various possible explanations for the association between lineage and geographical area. First, the spread of M. tuberculosis may have represented a series of population bottlenecks (founder effect). Moreover, co-evolution between the human host and M. tuberculosis may have played a role [21, 138]. In San Francisco, TB transmission was more common within than between ethnic groups , but this association may have been the result of social mixing rather than host–pathogen co-evolution.
It has been shown that polymorphisms in human susceptibility genes are associated with the clinical presentation and genotypes of M. tuberculosis infecting patients [143, 144]. Overall, the evidence for the role of the genotype of M. tuberculosis in transmissibility, pathogenicity and virulence in various human populations is limited, and is more extensive for some genotypes, such as the Beijing genotype , than for others.
Prospects of TB elimination and impact of immigration
In the late 1980s, TB elimination was expected to be achieved within decades in various low-incidence countries [3, 4]. Since then, progress has been slower than foreseen, owing to four main factors: temporary neglect of TB control; the emergence of HIV; increasing human migration; and the development of resistance against anti-TB drugs. The impact of neglect of TB control was most clearly observed in New York, where TB notification rates nearly tripled from 1978 to 1992, and then showed a 20% decline between 1992 and 1994 after the re-strengthening of control. At around the same time, the impact of the HIV epidemic on TB epidemiology became evident. HIV-infected individuals have a strongly increased risk of progressing from infection to disease, and HIV has led to a strongly increased TB incidence in Africa. Fortunately, the HIV epidemic in industrialized countries did not evolve into a generalized epidemic, but remained restricted to high-risk populations. Furthermore, the risk of TB in HIV-infected individuals was reduced after the introduction of highly active antiretroviral therapy in the 1990s.
Progress towards TB elimination was slowed down by immigration from high-incidence areas . In New York, most cases of TB among immigrants were attributed to reactivation of latent infection . In The Netherlands, a molecular epidemiological study showed a strong decline in the incidence of TB attributable to reactivation among the native population, from 170 cases in 1995 to 91 cases in 2005, more or less as predicted in 1990 . The decline in the number of index cases among foreign-born individuals was much less (from 250 to 222 cases). The risk of transmission from immigrants to the native population is generally low [149-151]. However, although the absolute risk is low, the proportion of secondary cases among the native population in The Netherlands attributed to foreign-born index cases increased from 29% in 1995 to 50% in 2005 .
Earlier studies and surveillance thus suggested that existing control programmes should be maintained for as long as the disease is not eliminated, that surveillance is of vital importance, and that most TB in low-incidence countries is found among the foreign-born. Molecular epidemiological studies have helped to quantify transmission from the foreign-born to the native population, and can thus be used to predict progress towards elimination. In order to accelerate progress towards TB elimination in low-incidence countries, these countries need to maintain programmes for TB control, use new tools as they become available, expand the use of preventive therapy in those with latent infection (primarily the elderly and the foreign-born), consider expanding screening for TB infection and disease , and support global TB control, as this is expected to be most effective in the long term, and may even be cost-effective in the short term .