Tracing isolates of bacterial species by multilocus variable number of tandem repeat analysis (MLVA)
Editor: Willem van Leeuwen
Correspondence: Alex van Belkum, Erasmus MC, Department of Medical Microbiology and Infectious Diseases, Dr Molewaterplein 40, 3015 GD, Rotterdam, The Netherlands. Tel.: +00 31 10 4635813; fax: +00 31 1014633875; e-mail: email@example.com
All bacterial genomes contain multiple loci of repetitive DNA. Repeat unit sizes and repeat sequences may vary when multiple loci are considered for different isolates of an individual microbial species. Moreover, it has been documented on many occasions that the number of repeat units per locus is a strain-defining parameter. Consequently, there is isolate-specificity in the number of repeats per locus when different strains of a given bacterial species are compared. The experimental assessment of this variability for a number of different loci has been called ‘multilocus variable number of tandem repeat analysis’ (MLVA). The approach can be supported or extended by locus-specific DNA sequencing for establishing mutations in the individual repeat units, which usually enhances the resolution of the approach considerably. Essentially, MLVA with or without supportive sequencing has been developed for all of the medically relevant bacterial species and can be used effectively for tracing outbreaks or other forms of bacterial dissemination. MLVA is a modern, timely and versatile bacterial typing methodology.
Microbial genome typing or DNA fingerprinting is important for the delineation of outbreaks of infectious diseases and for the universal tracing of virulent or multiresistant pathogens (van Belkum, 2003). There is a continuous search for technical and methodological improvements in this field, where timeliness, accuracy, database development and data exchange are the most important driving forces. Notably, every form of bacterial genotyping capitalises on the various genetic mechanisms that give rise to polymorphism in bacterial genomes (Ochman & Davalos, 2006). As an important example, many bacterial genes or intergenic regions contain loci of repetitive DNA which may vary among strains with respect to the number of repeat units present or their individual primary structure. These so-called ‘variable number of tandem repeat regions’ (VNTRs) have been identified in essentially all pro- and eukaryotic species and have been successfully used for identification purposes (King et al., 1997; Van Belkum et al., 1998; Jansen et al., 2002). An obvious class of such bacterial genes are those encoding microbial surface components recognising adhesive matrix molecules, or MSCRAMMs (Rivas et al., 2004). The Staphylococcus aureus gene encoding the immunoglobulin-binding protein A, for instance, harbours a region of repeat units that are each 21 nucleotides long. When various strains of S. aureus were investigated it appeared that the number of repeats may vary from one to c. 15. The study that detected this phenomenon also associated the variability with the putative epidemicity of S. aureus isolates (Frenay et al., 1996). Essentially, the first single-locus VNTR microbial typing test had thus been developed, and discussions on the molecular clock speed of bacterial VNTR evolution were initiated (van Belkum et al., 1997a, b). The availability of complete genome sequences facilitated further and highly systematic searches for the presence of repetitive DNA.
The first full inventories of repetitive DNA loci on a genome-wide scale were defined on the basis of the Haemophilus influenzae genome (Fleischmann et al., 1995). This genome was demonstrated to be crowded with stretches of repetitive DNA, and many of these mainly short-unit repeats were found to be located in genes involved in virulence modulation (Hood et al., 1996). Variation of repeat numbers led to the abrogation or stimulation of the expression of various genes in the bacterial lipooligosaccharide (LOS) biosynthesis locus, and a variety of other repeat-containing virulence genes, including hemolysins and DNA modification-restriction systems, were identified as well. Based on the H. influenzae genome sequence and the intrinsic variability of the repeats, the first genuinely multilocus variable number of tandem repeat analysis (MLVA) methodology was developed (van Belkum et al., 1997a, b). It appeared that the evolutionary clock speeds of different repeat loci were variable and that, hence, some repeats were better suited for epidemiological tracing of strains of H. influenzae than others. This is a general issue in the development of species-specific MLVA systems, with stability, clock speed and evolutionary pace as important quality parameters. From that time onwards, MLVA systems have been developed for practically all medically relevant bacterial species.
This MiniReview aims to briefly assess the current state of affairs concerning bacterial MLVA. The fundamental features of repeat variation will be discussed and technical improvements in reliably assessing MLVA scores will be presented. The main part of this review, however, will discuss MLVA systems that have recently been developed for some selected and clinically relevant bacterial species.
How and why does repetitive DNA vary?
Slipped strand mis-pairing (SSM) is the molecular model that best explains the variability of short sequence repeats (Torres-Cruz & Van der Woude, 2003). Stretches of relatively short arrays of repeat units, when being copied by the DNA polymerase, may engage in illegitimate basepairing. This forces the polymerase to introduce or delete individual repeat units. The frequency of such variations also depends on the accuracy of DNA repair systems (Caporale, 2003; Prunier & LeClercq, 2005). In microorganisms this repeat variability affects genome functioning: promoter structures may become (sub-) optimized, whereas, obviously, the coding potential of genes can become severely compromised. An important biological question is whether such effects are completely coincidental and dependent upon natural selection (contingency behaviour; Kroll et al., 1989) or whether this effect may be directional as well (Foster, 2004). This has not been fully explored and, hence, the ultimate answer is still lacking. However, it has been suggested on many occasions that repeat variation offers selective advantages (Wise et al., 2006). Microorganisms may enhance their biological versatility and adaptability by randomly playing around with their gene-coding potential and the level of expression of certain genes that are only beneficial in special circumstances (Henderson et al., 1999; Van der Woude & Baumler, 2004). For various human pathogens it has been shown that gene modification through repeat variation offers evolutionary advantages (Renders et al., 1999; Rivas et al., 2004; Zaleski et al., 2005).
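The effect of repeat gain or loss on the coding potential of a gene can be illustrated with a small calculation. The sketch below assumes that the repeat array lies inside an open reading frame and that slipped strand mis-pairing adds or removes whole repeat units; the function names and the four-nucleotide unit used in the usage note are illustrative choices, not taken from any of the cited studies.

```python
def reading_frame_shift(unit_len: int, units_before: int, units_after: int) -> int:
    """Return the shift (0, 1 or 2 nucleotides) of the downstream reading frame
    after the repeat array changes from units_before to units_after units.

    Assumption: the array sits inside a coding region and only whole units
    are gained or lost.
    """
    if unit_len <= 0:
        raise ValueError("unit_len must be positive")
    delta_bp = (units_after - units_before) * unit_len
    # Python's modulo is non-negative for a positive divisor, also for deletions.
    return delta_bp % 3


def stays_in_frame(unit_len: int, units_before: int, units_after: int) -> bool:
    """True if the change in repeat number preserves the reading frame."""
    return reading_frame_shift(unit_len, units_before, units_after) == 0
```

Under these assumptions, a hypothetical four-nucleotide unit switches the downstream frame with every single unit gained or lost (`stays_in_frame(4, 20, 21)` is false) and restores it only after three units, whereas repeat units whose length is a multiple of three, such as the 21-nucleotide units of the protein A gene, never disturb the frame.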
Technological improvements in assessing variation in individual repeats
Repeat-based genotyping tests are, historically speaking, Southern hybridization tests, and satellite DNA in a diversity of eukaryotes has provided the models for developing profiles of repeat DNA variation (Ugarkovic & Plohl, 2002). The first diagnostic assays involved human genotyping using enzymatically restricted, length-separated DNA fragments, blotted onto a membrane and probed with (radioactively labelled) repeat probes. The resulting DNA fingerprints could be compared, stored in databases and successfully used for forensic applications. To date, these tests have been almost completely replaced by PCR-based amplification assays, the advantages of which are speed, versatility, lower costs and easier-to-interpret results. The rapidly increasing knowledge of (whole genome) sequences, especially in microbiology with several hundreds of genome sequences available, allows for the design of specific primers aimed at the amplification of the repeat and some bordering consensus sequences. After PCR, length separation or mass assessment of amplimers is the single additional step required. This is also becoming increasingly sophisticated. Whereas initially agarose gel electrophoresis was the gold standard, now capillary sequencers or lab-on-a-chip systems can be employed, and sophisticated mass spectrometers can also be used for diagnostic purposes. The use of various fluorescent dyes allows for multiplex assays to be developed. A single DNA amplification test followed by reliable high-speed, high-throughput electrophoresis generates multiple binary and easily communicable parameters (i.e. the number of repeats per MLVA locus) and leads to reliable strain identification. One has to take care, however, because Taq polymerase may be as unreliable in copying repeats as the native DNA polymerase. Artefactual changes in repeat profiles due to PCR inaccuracy therefore need to be monitored carefully.
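The conversion of a measured amplimer size into the number of repeats per locus can be sketched as follows. This is a minimal illustration that assumes perfect repeats of uniform unit length, a known combined length of the flanking sequences included in the amplimer, and a sizing error within a chosen tolerance; the function name, the parameters and the sizes in the usage note are hypothetical.

```python
def repeat_units(amplicon_bp: float, flank_bp: int, unit_bp: int,
                 tolerance_bp: float = 1.0) -> int:
    """Estimate the number of repeat units in a PCR product.

    amplicon_bp  : measured fragment size (e.g. from capillary electrophoresis)
    flank_bp     : total length of primers plus flanking sequence (assumed known)
    unit_bp      : length of one repeat unit
    tolerance_bp : accepted deviation from a whole number of units
    """
    if unit_bp <= 0:
        raise ValueError("unit_bp must be positive")
    array_bp = amplicon_bp - flank_bp
    if array_bp < 0:
        raise ValueError("amplicon is shorter than its flanking sequence")
    units = round(array_bp / unit_bp)
    # Reject sizes that do not correspond to a whole number of repeat units.
    if abs(array_bp - units * unit_bp) > tolerance_bp:
        raise ValueError("fragment size does not match a whole number of units")
    return units
```

For a hypothetical locus with 21 bp units and 100 bp of flanking sequence, a measured fragment of 268.6 bp would be scored as eight repeats, whereas a fragment of 279 bp would be rejected for manual inspection rather than rounded to the nearest allele.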
MLVA for nasopharyngeal bacterial opportunists: the example of S. aureus
The genome of S. aureus harbours a variety of DNA repeats and their in-gene location is regularly updated (Buckling et al., 2005). The first real MLVA system for S. aureus was described by Sabat et al. (2003). Target genes included sdr, clfA, clfB, ssp and spa. The system involved multiplex PCRs and analysis of the amplimers on standard agarose gels. As such, analyses can also be performed in laboratories without sophisticated electrophoresis equipment. A drawback of the system was that the experimental output was monitored as a banding pattern; the individual numbers of repeats per locus could not be conveniently calculated. The coagulase gene repeats were excluded from the system because they change slowly and therefore added little to its resolution. The system was compared with pulsed field gel electrophoresis (PFGE), at that time the gold standard staphylococcal typing method. It performed well and the type cut-off could be defined at 70–80%, analogous with PFGE values of 75–90% (Malachova et al., 2005). Overall, strain clustering was highly congruent. An independent system was recently unveiled by a Swiss group (Francois et al., 2005). The advantages of the Swiss over the initial Polish system were the smaller number of PCR amplifications required, a (slightly) larger number of repeats analysed and the automated micro-capillary electrophoresis with the results being directly cluster-analysed. Various other repeat-based systems have been developed for S. aureus. These include the use of staphylococcal interspersed repeat units (Hardy et al., 2004, 2006). This, however, does not constitute a real MLVA system and will therefore not be discussed any further. More relevant is the use of spa sequencing. The spa gene is part of both MLVA systems described above, but it has been documented that spa repeat sequences by themselves already provide excellent resolving power among strains of S. aureus (Shopsin et al., 1999; Harmsen et al., 2003).
spa Typing correctly assigns staphylococcal strains to the appropriate phylogenetic groups and performs better than multi locus enzyme electrophoresis (MLEE) and PFGE, and it facilitates the detection of both macro- and micro-variation (Koreen et al., 2004). In combination with real-time DNA sequencing, an early warning system for the notification of the initiation of nosocomial outbreaks of S. aureus infections has been developed and put in operation (Mellmann et al., 2006). In all, both the MLVA systems and the spa sequencing approach are now slowly taking the place of PFGE when typing of S. aureus in reference centres is involved.
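The similarity cut-offs quoted above for banding patterns can be made concrete with a short example. The sketch below uses the Dice coefficient, which is a common choice for comparing banding patterns but is an assumption here, since the cited studies are not quoted on the coefficient they used. Bands are assumed to have been assigned to discrete size classes beforehand, and the band sizes in the usage note are invented.

```python
def dice_similarity(bands_a: set, bands_b: set) -> float:
    """Dice coefficient between two banding patterns given as sets of band sizes.

    Assumption: bands were already binned, so equal sizes mean matching bands.
    """
    if not bands_a and not bands_b:
        raise ValueError("at least one pattern must contain bands")
    shared = len(bands_a & bands_b)
    return 2 * shared / (len(bands_a) + len(bands_b))


def same_type(bands_a: set, bands_b: set, cutoff: float = 0.75) -> bool:
    """True if two patterns reach the chosen similarity cut-off (hypothetical default)."""
    return dice_similarity(bands_a, bands_b) >= cutoff
```

Two invented four-band patterns that share three bands, such as `{200, 350, 500, 720}` and `{200, 350, 500, 800}`, have a Dice similarity of 0.75 and would be assigned to the same type at a 75% cut-off but not at 80%.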
MLVA for Mycobacterium tuberculosis
Mycobacterium tuberculosis is one of the most prevalent pathogens worldwide, with four people dying every minute as a result of the infection. Multidrug resistance is becoming an increasingly important accessory problem. The epidemiology of tuberculosis has been difficult to study due to the clonality of the species: only limited amounts of DNA variation can be documented among isolates. The species is relatively young and specific methods for typing have been developed over the years. These include IS6110 hybridization tests and spoligotyping (Van Embden et al., 1993; Kremer et al., 1999). However, the advent of MLVA systems has led to a number of important initiatives and various detailed studies were undertaken (Frothingham & Meeker-O'Connell, 1998; Supply et al., 2000, 2001; Mazars et al., 2001; Cowan et al., 2002; Le Fleche et al., 2002; Skuce et al., 2002; Spurgiesz et al., 2003). The system presented by Spurgiesz et al. (2003) involved the initial identification of 87 repeat loci in the DNA sequence of the genome strain H37Rv. Nine of these proved to be variable (two to eight alleles) in a collection of 91 strains, mostly members of the Beijing family or having limited copies of IS6110 in their genome. Using six of the nine loci, seven distinct genotypes could be discriminated among 34 Beijing strains, while a five-locus approach resolved many of the IS6110 low copy number isolates. Recently, an MLVA system was also described for some of the other, also highly clonal species of mycobacteria (Harris et al., 2006). Mycobacterium avium ssp. paratuberculosis and M. avium ssp. avium were studied with respect to short sequence repeat variability and this has led to an adequate typing system for the more variable of the two, M. avium ssp. paratuberculosis.
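How the choice of loci determines the number of genotypes that can be discriminated may be illustrated as follows. The sketch groups isolates by their repeat numbers at a selected subset of loci; the isolate names, locus names and repeat numbers in the usage note are invented for illustration and do not come from the studies cited above.

```python
from collections import defaultdict


def genotypes(profiles: dict, loci: list) -> dict:
    """Group isolates that share identical repeat numbers at the selected loci.

    profiles : isolate name -> {locus name -> number of repeats}
    loci     : loci to include in the comparison
    Returns a mapping from genotype (tuple of repeat numbers) to isolate names.
    """
    groups = defaultdict(list)
    for isolate, profile in profiles.items():
        key = tuple(profile[locus] for locus in loci)
        groups[key].append(isolate)
    return dict(groups)
```

With four hypothetical isolates typed at loci A, B and C as (3, 5, 2), (3, 5, 4), (2, 5, 2) and (3, 5, 2), all three loci together resolve three genotypes, whereas loci A and B alone resolve only two. Adding or dropping a variable locus thus directly changes the resolution of the scheme.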
MLVA for putative agents of bioterrorism
Recent years have seen an intensified interest in agents of bioterrorism. The anthrax attacks in the US have prompted additional investigations into the molecular identification of strains of Bacillus anthracis (Jernigan et al., 2002; Keim & Smith, 2002) and simultaneously attention was given to molecular identification of other agents of bioterrorism. MLVA systems for B. anthracis have been put to work and their performance, given the highly clonal nature of the pathogen, is excellent. A simple PubMed search on ‘MLVA and B. anthracis’ already generates 13 hits, nine of which date from 2005 or 2006! The currently most sophisticated system involves 25 repeat loci which can be analysed in four multiplex PCRs followed by capillary electrophoresis (Lista et al., 2006). Several new branches on the anthrax phylogenetic tree were identified using this methodology. For Francisella tularensis, an agent of tularaemia, two different MLVA systems were developed. There is one based upon the genome sequence (Jernigan et al., 2002) which employs six genuine VNTR loci (Farlow et al., 2001). The system was able to distinguish strains belonging to the two different biovars in a concordant manner and outbreak tracing was also in agreement with previous studies using other, less sensitive and more laborious systems. Another approach was based on two short sequence tandem repeats (SSTR9 and SSTR16; Johansson et al., 2001). A high diversity index of 0.97 was established and, again, regional geographically related isolates could not be distinguished by the system. For Yersinia pestis, a third example of a putative bioterrorism agent, the need for an MLVA system was also high. Yersinia pestis is considered to be a ‘young’ pathogen and the plague-causing organism is characterized by a general lack of sequence diversity (Achtman et al., 1999). The MLVA system that was developed for Y. pestis was highly concordant with existing datasets as available for various strain collections (Klevytska et al., 2001). In short, successful new MLVA schemes are now available for most of the biowarfare agents. These systems are not only better than the previously existing ones, they also combine speed and versatility. The potential for field application of such tests is very high, especially when combined with portable real-time detection assays.
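A diversity index such as the value of 0.97 quoted above for F. tularensis summarises the discriminatory power of a typing scheme in a single number. The sketch below computes Simpson's index of diversity in the form commonly used for typing methods (often called the Hunter-Gaston index); that the cited study used exactly this formulation is an assumption. The index is the probability that two isolates drawn at random from the collection belong to different types.

```python
from collections import Counter


def diversity_index(types: list) -> float:
    """Simpson's index of diversity: D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)).

    types : one type label per isolate (e.g. an MLVA profile as a tuple)
    """
    n = len(types)
    if n < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(types)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1 - same_type_pairs / (n * (n - 1))
```

A collection in which every isolate has a unique type yields an index of 1, a collection consisting of a single type yields 0, and values close to 1 indicate that the scheme separates nearly all unrelated isolates.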
MLVA for enteric pathogens
The number of bacterial species capable of causing gastrointestinal infections is too large to be covered completely in this section. For Campylobacter jejuni, for instance, there has not yet been a description of an MLVA system, despite the fact that its genome is densely scattered with a diversity of short repeat loci suited for the development of such a system (Park et al., 2000; Coward et al., 2006). Hence, this section will be restricted to those three species for which the most advanced MLVA systems are now available. For Escherichia coli, a multitude of repeats has been traced and many of these have been documented to be inter-strain diverse (Gur-Arie et al., 2000). In addition, the repeat mutation mechanisms have been extensively studied for this bacterial species (Vogler et al., 2006) and within E. coli the repeat variation of bacteriophages has also been documented (Levinson & Gutman, 1987). An MLVA system has been described (Noller et al., 2006) but its clinical application has not yet been extensively studied. For the nosocomially important spore-forming bacterial species Clostridium difficile an MLVA system was recently described (Marsh et al., 2006). Seven different loci were identified through genome sequence scanning and pilot experiments revealed that the stability of the markers was adequate. In addition, restriction enzyme analysis of genomic DNA appeared to be concordant with the MLVA data. Outbreaks could be efficiently delineated. For Salmonella enterica ssp. enterica the most sophisticated ‘enteric’ MLVA system thus far has been described (Lindstedt et al., 2003). A set of eight tandem repeat regions was selected, with repeat units ranging in size between six and 189 bp and the number of units per region varying between four and 15. Fluorescently labelled products were separated by capillary electrophoresis for highly accurate sizing of the fragments.
The resolution of this system was much higher than that of PFGE, amplified fragment length polymorphism (AFLP) and integron profiling. In conclusion, although for some enteric bacterial species the MLVA tests are at different stages of development, the first successful systems are now available and awaiting implementation in, for instance, food production lines. We will then see what their genuine added value will be.
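The genome sequence scanning by which candidate MLVA loci are identified, as mentioned above for C. difficile, can be sketched in its simplest form as a search for perfect tandem repeats. The example below is a naive regular-expression scan for a fixed unit length and is not the procedure used in any of the cited studies; dedicated repeat-finding software also detects imperfect repeats and variable unit lengths. The function name, parameters and the short test sequence in the usage note are illustrative.

```python
import re


def find_tandem_repeats(seq: str, unit_len: int, min_units: int = 3) -> list:
    """Find perfect tandem repeats of a fixed unit length in a DNA sequence.

    Returns a list of (start position, repeat unit, number of units).
    Limitations: only perfect copies are detected, matches do not overlap,
    and a homopolymer run is reported as a repeat of a homopolymer unit.
    """
    if unit_len < 1 or min_units < 2:
        raise ValueError("unit_len must be >= 1 and min_units >= 2")
    # Group 1 captures one unit; the back-reference requires further copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_units - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits
```

Applied to the invented sequence `GGC` + five copies of `CAAT` + `TTG` with a unit length of four, the scan reports a single array of five units starting at position 3. Candidate loci found in this way still need to be tested on a strain collection for variability and stability.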
Lack of MLVA systems for viruses, parasites and fungi
Repeat targets for epidemiological typing have also been identified for nonbacterial organisms. Repeats are present in large numbers and great diversity in the genomes of parasitic protozoa (Wickstead et al., 2003). Some of these have been exploited for epidemiological typing purposes, although genuine MLVA systems that are clinically practical have not yet been described. The same holds true for viruses: although repetitive regions have been described (e.g. Epstein Barr Virus; Xue et al., 2003), not a single operational MLVA system has been described in the literature. Currently, genuine MLVA systems have not been described for fungi either. For the clinically most important species, Candida albicans and Aspergillus fumigatus, no literature on MLVA is available, although there have been reports on the variability of repetitive fungal DNA and its epidemiological and clinical relevance (Lott et al., 1999; Bart-Delabesse et al., 2001).
Since its recent development MLVA swiftly became an important new player in the field of microbial forensics (Budowle et al., 2005). Being based on the existence of variable repeat loci in microbial genomes and their facile and binary detection by PCR and subsequent fragment analysis, convenient methods for high-speed and high-throughput microbial strain identification have been made available. This has already had and will continue to have a major impact on defining microbial epidemiology, while the knowledge on the relevance of repeat variation for populations of bacteria will also be strongly advanced over the coming years. Care has to be taken to select those repeat regions that show adequate stability over the timeframe in which the epidemiological investigations take place.