Microbial multilocus sequence typing (MLST) is a coarse-grained approach for classifying micro-organisms that requires extensive gene sequencing, which limits its application in clinical and research laboratories. Multiplex assays targeting single-nucleotide polymorphisms (SNPs) might replace this laborious method by providing a SNaP profile. Here we describe mlst@snap, a fast, user-friendly and automated tool that converts thousands of MLST sequences or sequence types (STs) into SNaP profiles and vice versa. The software substantially speeds up the analysis of microbial population data and currently supports Aspergillus fumigatus, Burkholderia cenocepacia, Burkholderia cepacia, Burkholderia contaminans and Pseudomonas aeruginosa.
The current gold standard for microbial genotyping is multilocus sequence typing (MLST). The strategy is useful for studying microbial population structure, identifying infection sources and outbreaks, defining endemic populations and monitoring microbial evolution. It was first employed in 1998 and was successfully extended over the following years to dozens of prokaryotic and eukaryotic micro-organisms (Maiden 2006). MLST compares the nucleotide sequences of a set of housekeeping genes and assigns a sequence type (ST) to each new allelic profile. As a result, the method offers excellent reproducibility, high discriminatory power and unambiguous genomic analysis (Curran et al. 2004; Bain et al. 2007; Unemo & Dillon 2011), and its data are shared through international databases freely available online (http://www.mlst.net/ and http://pubmlst.org) (Jolley, Chan & Maiden 2004). Although MLST is a standard method, it remains labour-intensive, cumbersome and, because of its high cost, impractical for large-scale studies involving hundreds of isolates. Methods targeting single-nucleotide polymorphisms (SNPs) have recently been used to detect specific lineages and identify micro-organisms (Aydin et al. 2001; Dalmasso, Civera & Bottero 2009; Lomonaco et al. 2011). These mini-sequencing approaches use single-base extension primers in the presence of fluorescently labelled dideoxynucleoside triphosphates (ddNTPs) to interrogate a set of markers that represent site-specific genomic variation. Standardized multiplex assays targeting several SNPs were recently developed for the identification and genotyping of Aspergillus fumigatus (Caramalho et al. 2013), Burkholderia cenocepacia (Eusebio et al. 2013a), Burkholderia cepacia and Burkholderia contaminans (manuscript in preparation) and Pseudomonas aeruginosa (Eusebio et al. 2013b). The strategy will soon be extended to other micro-organisms.
These assays target the most informative polymorphisms in the MLST scheme and are a practical, inexpensive alternative to the gold standard method. The genomic positions were selected for their ability to differentiate STs or MLST profiles and for practical considerations such as primer design. Non-polymorphic positions were excluded, as were redundant polymorphic positions. The redundant positions can later replace any panel position that is dropped when MLST sequences are revised and/or STs are redefined. The final result is a SNaP profile: the nucleotides found at the selected polymorphic sites, listed in a fixed order. This profile facilitates and simplifies genomic analyses.
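The two selection criteria above (keep polymorphic sites, drop redundant ones) can be illustrated with a short Python 3 sketch. This is a minimal, hypothetical example and not the procedure used to design the published panels. It assumes aligned allele sequences of equal length for a single locus. It treats a site as redundant when that site splits the alleles into exactly the same groups as an earlier site. It ignores practical criteria such as primer design. The function names are illustrative only.

```python
from typing import Dict, List, Tuple


def partition_signature(column: Tuple[str, ...]) -> Tuple[int, ...]:
    """Relabel a column's bases by order of first appearance, so two sites
    that split the alleles into the same groups get the same signature."""
    labels: Dict[str, int] = {}
    return tuple(labels.setdefault(base, len(labels)) for base in column)


def select_informative_sites(alleles: Dict[str, str]) -> List[int]:
    """Return 0-based positions that are polymorphic and non-redundant.

    `alleles` maps a (hypothetical) allele identifier to its aligned sequence.
    """
    seqs = [s.upper() for s in alleles.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all allele sequences must be aligned to the same length")
    seen = set()
    selected = []
    for pos in range(length):
        column = tuple(s[pos] for s in seqs)
        if len(set(column)) < 2:
            continue  # non-polymorphic: every allele carries the same base
        signature = partition_signature(column)
        if signature in seen:
            continue  # redundant: same grouping of alleles as an earlier site
        seen.add(signature)
        selected.append(pos)
    return selected


print(select_informative_sites({"1": "ACGTA", "2": "ACGTT", "3": "GCGAT"}))  # [0, 4]
```

In the toy input, sites 1 and 2 are invariant. Site 3 groups the alleles exactly as site 0 does, so it is kept only as a potential replacement and not added to the panel.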
The aim of the present study was to develop freely available software that complements these recent multiplex assays for the detection, identification and genotyping of pathogenic microbial species. Such a tool removes the need for extensive MLST sequence analysis, alignment tools and exhaustive examination of the thousands of profiles available in online databases. The mlst@snap software is an automated conversion tool that simplifies routine profiling in the laboratory. It also facilitates microbial population analyses and encourages the dissemination of microbial genotyping strategies world-wide.
Design and implementation
Each ST is defined by sequencing seven standard gene fragments that are used to characterize microbial isolates. The most relevant polymorphic positions of the MLST sequences were previously selected for each micro-organism. A SNaP profile is obtained from a single multiplex amplification followed by a mini-sequencing reaction (Caramalho et al. 2013; Eusebio et al. 2013a,b). The program lets the user choose the micro-organism and then select one of its two main functions (see Figs 1 and 2 for a software overview, diagrams and algorithm details).
Conversion of MLST profiles into SNaP profiles
To obtain the SNaP profile of a specific isolate, the user can manually enter the ST profile, upload a text file with a list of ST profiles, or upload a text file with complete MLST sequence data. The tool rapidly converts thousands of STs or MLST sequences into the corresponding SNaP profiles without manual handling of the data. The software locates the SNaP positions in each gene and returns the final code that corresponds to the genotyping data.
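The MLST-to-SNaP conversion can be sketched in Python 3 as follows. This minimal example assumes the ST profile table and the allele sequences are already held in local dictionaries. The panel layout (an ordered list of locus names with 1-based positions), the locus names `geneA` and `geneB`, the toy sequences and all function names are hypothetical. The real panels are organism-specific and are defined in the published assays.

```python
from typing import Dict, List, Tuple

# Hypothetical panel: ordered (locus, 1-based position) pairs.
Panel = List[Tuple[str, int]]


def snap_from_sequences(sequences: Dict[str, str], panel: Panel) -> str:
    """Concatenate the nucleotide found at each panel position, in panel order."""
    profile = []
    for locus, pos in panel:
        seq = sequences[locus]
        if not 1 <= pos <= len(seq):
            raise ValueError(f"{locus}: position {pos} outside sequence of length {len(seq)}")
        profile.append(seq[pos - 1].upper())
    return "".join(profile)


def snap_from_st(st: int,
                 st_profiles: Dict[int, Dict[str, int]],
                 allele_seqs: Dict[Tuple[str, int], str],
                 panel: Panel) -> str:
    """Resolve an ST to its allele numbers, look up their sequences, then convert."""
    alleles = st_profiles[st]  # locus -> allele number
    sequences = {locus: allele_seqs[(locus, number)] for locus, number in alleles.items()}
    return snap_from_sequences(sequences, panel)


# Toy data for illustration only (not real MLST alleles).
PANEL: Panel = [("geneA", 2), ("geneA", 5), ("geneB", 1)]
ALLELES = {("geneA", 1): "ACGTA", ("geneA", 2): "AGGTT",
           ("geneB", 1): "TTAC", ("geneB", 2): "CTAC"}
PROFILES = {1: {"geneA": 1, "geneB": 1}, 2: {"geneA": 2, "geneB": 2}}

print(snap_from_st(1, PROFILES, ALLELES, PANEL))  # CAT
print(snap_from_st(2, PROFILES, ALLELES, PANEL))  # GTC
```

Both input routes share the same core. An uploaded sequence file feeds `snap_from_sequences` directly, while an ST number is first resolved to allele sequences.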
The software detects sequences that do not follow the standard format, alerts the user to check the data and reports which sequences violate the MLST formatting rules. Only sequences with the correct length and orientation are accepted for generating the final SNaP profile. The software also reports whether each sequence is new or has already been submitted to the database (for known sequences, the ST number is provided together with the SNaP profile).
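The length, alphabet and orientation checks can be sketched in Python 3 as shown below. The source does not say how mlst@snap detects orientation. This sketch uses an assumed heuristic: it compares the input with a reference allele of the same locus. If the reverse complement matches the reference better than the sequence as given, the sequence is flagged. All names are hypothetical.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def identity(a: str, b: str) -> float:
    """Fraction of identical bases between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def validate_sequence(locus: str, seq: str, reference: str) -> List[str]:
    """Return a list of problems; an empty list means the sequence is accepted.

    `reference` is any known allele of the same locus (an assumption of this sketch).
    """
    if not reference:
        raise ValueError("reference allele must not be empty")
    seq, reference = seq.strip().upper(), reference.upper()
    problems = []
    invalid = sorted(set(seq) - set("ACGT"))
    if invalid:
        problems.append(f"{locus}: invalid characters {''.join(invalid)}")
    if len(seq) != len(reference):
        problems.append(f"{locus}: length {len(seq)} differs from expected {len(reference)}")
    elif identity(reverse_complement(seq), reference) > identity(seq, reference):
        problems.append(f"{locus}: sequence appears to be in reverse orientation")
    return problems


print(validate_sequence("geneA", "AACCGGTTA", "AACCGGTTA"))  # []
print(validate_sequence("geneA", "TAACCGGTT", "AACCGGTTA"))
```

The function returns every problem at once instead of stopping at the first. This matches the goal of telling the user exactly which sequences need correcting.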
Conversion of SNaP profiles into MLST profiles
This function converts SNaP profiles into complete MLST sequences. The software connects to the MLST database and complements the SNaP profile with the remaining non-polymorphic and complementary sequence information available there. Users can manually enter the nucleotides at the polymorphic positions (SNaP profile) of a specific isolate or upload a text file containing a list of SNaP profiles. Hundreds of SNaP profiles can therefore be converted automatically into the corresponding MLST sequences. The final result is a list of sequences that conform to MLST standards and are ready for data analysis.
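The reverse conversion can be sketched in Python 3 by indexing every known ST under the SNaP code it produces. The mlst@snap software retrieves the complementary sequences from the online MLST database. This sketch instead assumes that data is already available as local dictionaries, and the panel, locus names and toy alleles are hypothetical. The panels were chosen to differentiate STs. If two STs nevertheless share a code, the sketch returns all of them rather than silently picking one.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Panel = List[Tuple[str, int]]  # hypothetical (locus, 1-based position) pairs


def build_snap_index(st_profiles: Dict[int, Dict[str, int]],
                     allele_seqs: Dict[Tuple[str, int], str],
                     panel: Panel) -> Dict[str, List[int]]:
    """Map every SNaP code to the list of STs that produce it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for st, alleles in st_profiles.items():
        code = "".join(allele_seqs[(locus, alleles[locus])][pos - 1].upper()
                       for locus, pos in panel)
        index[code].append(st)
    return dict(index)


def mlst_from_snap(snap: str,
                   index: Dict[str, List[int]],
                   st_profiles: Dict[int, Dict[str, int]],
                   allele_seqs: Dict[Tuple[str, int], str]) -> Dict[int, Dict[str, str]]:
    """Return the full allele sequences of every ST consistent with a SNaP code."""
    matches = {}
    for st in index.get(snap.strip().upper(), []):
        matches[st] = {locus: allele_seqs[(locus, number)]
                       for locus, number in st_profiles[st].items()}
    return matches


# Toy data for illustration only (not real MLST alleles).
PANEL: Panel = [("geneA", 2), ("geneA", 5), ("geneB", 1)]
ALLELES = {("geneA", 1): "ACGTA", ("geneA", 2): "AGGTT",
           ("geneB", 1): "TTAC", ("geneB", 2): "CTAC"}
PROFILES = {1: {"geneA": 1, "geneB": 1}, 2: {"geneA": 2, "geneB": 2}}

INDEX = build_snap_index(PROFILES, ALLELES, PANEL)
print(mlst_from_snap("cat", INDEX, PROFILES, ALLELES))  # {1: {'geneA': 'ACGTA', 'geneB': 'TTAC'}}
```

The index is built once per organism, so each subsequent lookup is a single dictionary access. This keeps batch conversion of hundreds of profiles fast.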
The approach was tested with in-house data and with additional data for A. fumigatus, B. cenocepacia, B. cepacia, B. contaminans and P. aeruginosa deposited in online MLST databases (http://www.mlst.net/). The software may be made available for the genomic data of other micro-organisms as soon as specific SNaP panels based on MLST information are designed.
The software encourages the submission of data, particularly new sequences, to the MLST platform. It queries the MLST databases before every analysis.
The algorithm was written in Python 2.5.2. It was tested on a MacBook Pro running Mac OS X 10.9, on Linux (Ubuntu 12.10) and on 32-bit and 64-bit Windows 7 systems. The interface was built with VisualWx, a tool designed to work with the Python programming language, and runs on Macintosh, Unix and Windows 7 systems.
The software readily converted STs or MLST sequences into SNaP profiles, and vice versa, and the output was suitable for large-scale microbial population analyses. It was tested with the genomic data of several hundred microbial isolates. Sequence alignments showed that it greatly expedites the analyses by simplifying and automating a previously manual and time-consuming process.
The algorithm starts by thoroughly checking the input data to verify that the information was entered correctly. If it was not, the program stops and alerts the user with a message specifying the type of error found, which makes it easier to correct the data and restart the analysis. When the data are correctly formatted, the program completes the requested analysis in less than 10 s. A tutorial is included and can be accessed directly from the software or consulted as supplementary information.
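Checking the input before any processing and aborting with a specific message can be sketched in Python 3 for an uploaded list of ST profiles. The file format (one `isolate<TAB>ST` pair per line, with `#` comments allowed), the exception class and the function name are all hypothetical. The source does not specify the mlst@snap input format. The sketch collects every error with its line number and then aborts, so that the user can fix the whole file in one pass.

```python
from typing import List, Tuple


class InputFormatError(ValueError):
    """Raised when an uploaded profile list cannot be parsed (hypothetical)."""


def parse_st_list(text: str) -> List[Tuple[str, int]]:
    """Parse 'isolate<TAB>ST' lines; collect every error, then abort if any."""
    records: List[Tuple[str, int]] = []
    errors: List[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) != 2:
            errors.append(f"line {lineno}: expected 2 tab-separated fields, found {len(fields)}")
            continue
        isolate, st_text = fields[0].strip(), fields[1].strip()
        try:
            st = int(st_text)
        except ValueError:
            errors.append(f"line {lineno}: ST '{st_text}' is not an integer")
            continue
        if st < 1:
            errors.append(f"line {lineno}: ST {st} must be a positive integer")
            continue
        records.append((isolate, st))
    if errors:
        raise InputFormatError("\n".join(errors))
    return records


try:
    parse_st_list("iso1\t5\niso2\tST7\n")
except InputFormatError as err:
    print(err)  # line 2: ST 'ST7' is not an integer
```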
The program connects to the online microbial MLST databases. It therefore always works with the latest and most complete collection of microbial data and takes into account all polymorphisms and genetic variation described world-wide. As a result, the software can check whether the entered information is already available in the online database or whether the user holds new data that should be added to the international database (in which case it suggests submission).
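The "known or new?" check can be sketched in Python 3 against a local copy of a locus's allele file in FASTA format. The download step is omitted because the source does not describe the database endpoints mlst@snap uses. The header style (for example `geneA_1`) and the function names are hypothetical.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}; sequences are upper-cased."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks).upper()
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # wrapped sequence lines; text before any header is ignored
    if header is not None:
        records[header] = "".join(chunks).upper()
    return records


def find_known_allele(seq: str, known: Dict[str, str]) -> Optional[str]:
    """Return the identifier of an identical known allele, or None if the sequence is new."""
    target = seq.strip().upper()
    for name, allele in known.items():
        if allele == target:
            return name
    return None


KNOWN = parse_fasta(">geneA_1\nACGT\nAC\n>geneA_2\nTTTT\n")
print(find_known_allele("acgtac", KNOWN))  # geneA_1
print(find_known_allele("GGGG", KNOWN))    # None, so the sequence would be suggested for submission
```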
Results from the training sets and validation
Figure 2C illustrates how the software converts MLST sequences into SNaP profiles and vice versa. The example uses five polymorphic positions selected from three MLST genes for three isolates. The polymorphic positions for A. fumigatus, B. cenocepacia, B. cepacia, B. contaminans and P. aeruginosa had previously been selected based on their genomic value and their methodological and technical suitability. Positions a to e in Fig. 2C yield the SNaP profile of each of the three isolates. Each SNaP profile can be converted back into MLST sequences with mlst@snap, as described in the ‘Design and implementation’ section. This conversion of MLST sequences into SNaP profiles and vice versa was applied to several hundred genomic sequences of A. fumigatus, B. cenocepacia, B. cepacia, B. contaminans and P. aeruginosa. The original genomic data were aligned with the sequences produced by mlst@snap using Geneious v4.7 (Biomatters Ltd, Auckland, New Zealand) and the BioEdit sequence alignment editor (available at http://www.ctu.edu.vn/~dvxe/Bioinformatic/Software/BioEdit.htm). The original sequence data were obtained from the MLST databases and from in-house databases. These in-house databases contain the genomic data generated when the SNP panels were developed in the laboratory (Caramalho et al. 2013; Eusebio et al. 2013a,b). In total, more than 2700 MLST sequences were compared with the results provided by mlst@snap. The output agreed completely (100% for all tested organisms) with the manually processed data. That is, each ST or complete set of MLST sequences was correctly associated with its SNaP profile, and vice versa.
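The agreement figure reported above can be expressed as a simple concordance calculation, sketched here in Python 3. The validation itself relied on alignments in Geneious and BioEdit. This sketch swaps in exact, case-insensitive string comparison of results keyed by isolate, such as SNaP codes. The isolate names, codes and function name are hypothetical.

```python
from typing import Dict, List, Tuple


def concordance(automated: Dict[str, str], manual: Dict[str, str]) -> Tuple[float, List[str]]:
    """Percentage agreement over isolates present in both sets, plus the mismatching isolates."""
    shared = sorted(set(automated) & set(manual))
    if not shared:
        raise ValueError("no isolates in common")
    mismatches = [iso for iso in shared
                  if automated[iso].strip().upper() != manual[iso].strip().upper()]
    agreement = 100.0 * (len(shared) - len(mismatches)) / len(shared)
    return agreement, mismatches


print(concordance({"iso1": "CAT", "iso2": "GTC"}, {"iso1": "cat", "iso2": "GTA"}))  # (50.0, ['iso2'])
```

A result of 100.0 with an empty mismatch list corresponds to the total agreement described above. Any listed isolate points directly to the record that needs inspection.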
We describe user-friendly software that automatically converts MLST alleles into SNaP profiles and SNaP profiles into complete MLST sequences (see Supplementary Figure S1). The algorithm was implemented as a program that allows fully automated analysis of hundreds of profiles or sequences available in online databases. This avoids exhaustive examination and alignment and thereby simplifies microbial population studies. In its current form, the program can already be applied to A. fumigatus, B. cenocepacia, B. cepacia, B. contaminans and P. aeruginosa. We intend to extend it to other bacteria and microbial eukaryotes as soon as the most relevant polymorphisms are described for each micro-organism (as has happened with MLST information). Selecting the most informative nucleotide positions may simplify the genomic characterization of microbial isolates, particularly in complex samples containing multiple strains. A simple and informative genotyping profile (SNaP profile) is obtained after a single multiplex PCR amplification and mini-sequencing reaction, and non-polymorphic, redundant and weakly polymorphic positions are discarded. Four assays are currently available: SNaPAfu for A. fumigatus (Caramalho et al. 2013), SNaPBcen for B. cenocepacia (Eusebio et al. 2013a), SNaPBceBcon for B. cepacia and B. contaminans (manuscript in preparation) and SNaPaer for P. aeruginosa (Eusebio et al. 2013b). These assays are practical, reproducible and sensitive alternatives to MLST that cost six to seven times less and improve the diagnosis and surveillance of these pathogens.
The software described here is a new tool for the microbial research community, allowing automated analysis of hundreds of genotypes without manual processing. We designed a graphical interface to assist extensive genomic and population analyses of micro-organisms, and these tasks now require much less time. The authors intend mlst@snap to be made available through the MLST databases (Jolley, Chan & Maiden 2004). They also intend it to become part of the laboratory routine of researchers and technicians, supporting the dissemination of microbial genotyping in laboratories world-wide. The software, manual, algorithm details, tutorial and example files are currently freely available at https://www.mediafire.com/folder/kc4fwabtb8lu1/setupMLST%40SNaP, as supplementary material to this manuscript and at http://www.portugene.com/MLSTSNaP.html.
The authors thank Gonçalo Oliveira, João Carneiro, Nádia Eusébio, Rita Caramalho and Tiago Pinheiro (University of Porto) for their valuable and critical suggestions on the software. The authors declare no conflicts of interest.