The challenge for first responders, physicians in the emergency room, public health personnel, as well as for food manufacturers, distributors and retailers is accurate and reliable identification of pathogenic agents and their corresponding diseases. This is the weakest point in biological agent detection capability today.
There is intense research into new molecular detection technologies that could detect, with high accuracy, the pathogens of concern to first responders. Sensors are needed for applications as varied as understanding the ecology of pathogenic micro-organisms, forensics, environmental sampling for detect-to-treat applications, biological sensors for ‘detect to warn’ in infrastructure protection, responses to reports of ‘suspicious powders’, and customs and borders enforcement, to cite a few examples.
The benefits of accurate detection include saving millions of dollars annually by reducing disruption of the workforce and the national economy and improving delivery of correct countermeasures to those who are most in need of the information to provide protective and/or response measures.
The availability of sensitive and cost-effective diagnostic methods is paramount to the success of biological agent detection systems for public health protection and across industry sectors. Currently available methods are generally costly and have evolved largely from diagnostic development and applications. These methods range from cell culture to antibody (Mason et al. 2003; Gessler et al. 2007), polymerase chain reaction (PCR; Christensen et al. 2006), microarray (Brodie et al. 2007), and sequencing approaches (Margulies et al. 2005). Partial genome sequencing and comparison with known sequence data, which requires a priori knowledge of the bioagent, is not effective, particularly since bioengineering makes it possible to modify organisms to be more infectious, better at avoiding immune responses or resistant to medical countermeasures (Jackson et al. 2001). Critical to the success of biothreat surveillance is the ability to screen for and detect multiple agents rapidly in a single reaction with minimal sample processing (Cirino et al. 2004).
Traditional microbial typing technologies employed for characterization of pathogenic micro-organisms and monitoring their global spread are often difficult to standardize, are poorly portable, and lack sufficient ease of use, throughput, and automation.
A survey published in 2006 reported that emergency and primary care physicians and their local health care systems were not well prepared to respond to potential disease outbreaks or biological attacks, and many respondents believed that more resources should be allocated to equipping a response system. These findings highlight the importance of expanding bioterrorism preparedness efforts to improve the public health system (Alexander et al. 2006). Other analyses of bioterrorism preparedness have recommended modernization of public health response to emergencies and investigations (The Century Foundation 2005). Similar analyses of technology gaps by groups such as the Department of Energy and the Heritage Foundation emphasize the need to improve emergency response, regional coordination and technologies (Heritage Foundation 2006).
While antibody-based approaches are very widespread, they have many well-recognized limitations, for example, poor quality control of antibodies. Antibody production depends, in part, on cell culture, and the lack of understanding of the extent of biological community complexity contributes to the limited use of methods relying on culture. Nonculturable organisms can only be identified by molecular genetic methods. Singleplex, multiplex and reverse transcriptase–PCR (RT–PCR) approaches have become standard methods for most laboratories. Furthermore, nucleic acid-based methods (PCR and sequencing) are more sensitive than antibody-based detection systems (Lim et al. 2005).
PCR-based methods have critical limitations: they depend on a priori knowledge of which sequence to detect in a sample, a dependence further complicated by recent demonstrations of greater-than-expected variability in genomic sequence. In addition, PCR probe sequence resolution erodes (Table 1, Detection methods comparison).
Table 1. Technology comparison
|Technology|Sensitivity|Specificity|Analytes detected|
|---|---|---|---|
|Microparticle immunoassay|Generally high (Ab quality control issues) (Ab continuous supply issues)|Depends on antibodies (Ab). Sometimes cross-species.|Proteins, sugars, DNA, bacteria, viruses, toxins|
|ELISA|Moderate|Depends on antibodies. Sometimes cross-species.|Proteins, sugars, DNA, bacteria, viruses, toxins|
|PCR|High|Very specific with certain primers. Phylogenetic assessment|DNA/RNA|
|DNA microarray|High|Extremely specific; 100s–1000s of targets identifiable|DNA/RNA|
|MALDI-TOF|High for some molecules|Very specific for small molecules, less for large|Masses of biomarker proteins|
A platform for genome identification of a specimen from any source must not only be sensitive and specific, but must also detect a variety of pathogens with high accuracy, including modified or previously uncharacterized agents. This challenge is daunting when identification must be achieved using nucleic acids in a complex sample matrix. The available devices and instruments are severely limited by the length of time required for analysis, the complexity of the process employed, and the lack of a systems approach, that is, from extraction to identification without a preconceived notion of what may be in the sample (hybridization surfaces, chips or microarrays). It is widely understood by microbial ecologists and, more recently, medical microbiologists that the microbial species encompasses significant variability with, perhaps, a core genome incorporated in a wider, more variable genome. Furthermore, the first described strain is designated the ‘type species,’ but it may not be (and probably is not) the median strain to serve as the reference strain for sequencing comparison (Colwell & Liston 1961a, b, c).
Specialists from the Federal Bureau of Investigation, experts from hospitals and university research centres and key opinion leaders from the private sector agree that the most effective approach for comprehensive genetic variation discovery is sequencing (Budowle et al. 2005). Rapid advances in biological engineering have dramatically impacted the design and capabilities of DNA sequencing tools. These have increased the number of base pairs sequenced per day more than 500-fold, while costs have decreased by three orders of magnitude.
Despite these advances, extension of sequencing technology should include the capacity to distinguish naturally occurring micro-organisms from intentionally distributed pathogens. This represents a considerable challenge because the diversity of the microbial world is effectively unknown. DNA sequencing has the capacity to address this need.
The emergence of non-Sanger sequencing as well as microarrays and flow cells has changed the DNA sequencing landscape. Development of a genome identification technology meeting the goal of a ‘$1000 genome’ is in progress, but the enabling innovation remains to be discovered. Very likely, the engineering challenges and hurdles will be overcome and the basic science for DNA processing and detection will eventually be completed. Given the trends of the past, it is highly probable that a ‘eureka’ vision will introduce new concepts within a few years.
The build-out of genome identification DNA sequencing technology in the form of practical instrumentation will be achieved by incorporating the critical requirements: accurate long reads, no dependency on template amplification, and the capability to manipulate terabytes of data to provide reliable and useful identification of genetic sequences within any unknown sample, whether clinical, environmental, or other type of specimen.
With advancement of DNA sequencing technology, molecular typing methods based on nucleic acid fingerprints or ‘mini-sequencing’ are currently being replaced by more sensitive genome-wide single nucleotide polymorphism-based methods, as exemplified by anthrax bacilli (Patil et al. 2001). De novo genome sequence determination is not the maximum capability of sequencing technology. The technology forecast includes extending the capability to perform metagenomic (environmental) sequencing. Rather than focusing on an individually isolated genome, environmental sequencing is aimed at sampling populations of genomes as found, for example, in bodies of water or different types of soil and within or on the surface of the human body. In this way, an analysis of sequence and gene diversity can be obtained from organisms that cannot be cultured using conventional techniques. Genome sequencing capability will facilitate the evaluation of ‘deep’ resequencing methods, which compare different sources of DNA across one or a few genes, for example, K-ras gene mutations associated with cancer.
Types of sequencing
The most widely used technology for DNA sequencing is a capillary electrophoresis (CE)-based system employing the Sanger method. Although chemistries and automation advances have made Sanger-based DNA sequencing easier and faster, the basic technology remains the same. An immense challenge is that of managing the variety and complexity of data types, the hierarchy of biology, and the inevitable need to acquire data by a wide variety of modalities. Inexpensive DNA sequencing will revolutionize medicine by making personalized treatments possible. Rapid genome sequencing is regarded as the next great frontier for science that will allow doctors to determine individual susceptibility to disease and the genetic links to cancer or cardiovascular disease.
Sanger sequencing is a method introduced by Frederick Sanger (Sanger et al. 1977). It is remarkable that Sanger sequencing-based methods are so ubiquitous and long-lived. The flexibility of the method has been its strongest asset. Furthermore, the ability to use genomic DNA directly, that is, cloneless libraries, greatly accelerated DNA analyses. Over the years, instrumentation for DNA sequencing has improved dramatically in terms of read length and throughput (Fig. 1).
DNA sequencing methods reaching the market include bead-based, microfluidic-based and microarray-based approaches as well as emerging concepts for sequencing, for example, nanopore-based sequencing (Rhee & Burns 2007). Unlike Sanger sequencing, the use of pyrosequencing for sequencing-by-synthesis does not require fluorescent labelling. Incorporation of each dNTP is accompanied by release of pyrophosphate, which is converted by sulfurylase into ATP. The ATP in turn drives the luciferase-catalysed conversion of luciferin to oxyluciferin, which releases light. However, asynchronous extensions may occur because of slight variations in the dispensing order of the dNTPs.
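The readout principle of pyrosequencing can be illustrated with a minimal sketch. It assumes an idealized reaction in which the light signal is exactly proportional to the number of identical bases incorporated per dispensation, with no asynchronous extension. The function names `pyrogram` and `decode`, the dispensation order `"TACG"` and the example read are hypothetical choices made for illustration, not part of any instrument's software.

```python
def pyrogram(read, dispensation_order="TACG", cycles=2):
    """Simulate idealized pyrosequencing signals for the strand being synthesized.

    Each dispensed nucleotide is incorporated as many times as it matches the
    next bases of `read`; the signal recorded is that incorporation count.
    (Assumption: perfectly synchronous extension and linear light response.)
    """
    signals = []
    pos = 0
    for _ in range(cycles):
        for base in dispensation_order:
            run = 0
            # A homopolymer run is incorporated within a single dispensation.
            while pos < len(read) and read[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))
    return signals


def decode(signals):
    """Rebuild the sequence from (base, signal) pairs."""
    return "".join(base * run for base, run in signals)


signals = pyrogram("TTAGGC")
print(signals)
print(decode(signals))
```

In this idealized model a zero signal simply means the dispensed nucleotide did not match the next template position; real instruments must additionally resolve homopolymer lengths from analogue light intensities.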
Bridge amplification in a flow cell represents another alternative to Sanger sequencing. It employs four-colour tagging, but uses forward-thinking, acid-labile reversible terminator chemistry.
Sequencing-by-hybridization defines a resequencing method. DNA microarrays are the modern, massively parallel version of classic molecular biology hybridization techniques. The technique permits analysis of genetic material (DNA) and monitoring of expression changes (RNA, in practice measured via cDNA) occurring in a biological sample under various conditions. Microarrays have been used successfully in various research areas including DNA sequencing. While microarray-based approaches enable high throughput, they are limited to genes present in reference databases and hence require pre-selected sequence information, which limits the DNA samples that can be interrogated (Fig. 2, DNA sequencing methods comparison).
The genomes of most organisms, from the simplest unicellular organism to more complex species, consist of a variety of genomic landscapes, each with a unique profile of genetic content, GC-richness, high-copy repeats and low-copy repeats (Bacolla et al. 2006). Particularly near regions of structurally important sequences, such as the centromere, the genomic landscape can become quite problematic for analysis. The primary factor contributing to the difficulty in studying these areas within the genome is that clusters of genomic duplications and unusual repeat structures often lie in close proximity. Consequently, sequence similarity-based methods of global genome assembly fail to assign the correct positions of duplicated sequences. Annotation and interpretation of data from current DNA sequencing technologies can be difficult, because artificial overlaps form, significant warping of working draft sequences occurs, and numerous gaps appear in the assembly, all of which reduce the overall quality and relevance of the assembly. These effects are further compounded by the absence of unique sequence-tagged sites (STS) or DNA landmarks within such regions and a general under-representation of such areas in clone-by-clone sequencing. Thus, current DNA sequencing instruments are challenged by the presence of large spans of duplicated sequence, which interfere with genome analysis.
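Why duplicated sequence defeats similarity-based placement can be shown with a small sketch. It assumes exact string matching as a stand-in for sequence similarity, and the toy genome, the read strings and the function names `repeated_kmers` and `placements` are all hypothetical illustrations rather than any assembler's actual procedure.

```python
from collections import Counter


def repeated_kmers(genome, k):
    """Return every k-mer that occurs more than once, with its count."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


def placements(genome, read):
    """Return every start position at which `read` matches the genome exactly."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


# Toy genome containing the same 7-base segment twice (a low-copy repeat).
genome = "ACGTTGCA" + "GATTACA" + "CCCGGG" + "GATTACA" + "TTAGC"

print(placements(genome, "CCCGGG"))    # read from unique sequence: one position
print(placements(genome, "GATTACA"))   # read from the repeat: two candidate positions
print(repeated_kmers(genome, 7))
```

A read that falls entirely inside the repeat matches both copies equally well, so similarity alone cannot say where it belongs; only reads long enough to span the repeat into unique flanking sequence resolve the ambiguity.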
With faster methods to collect data, more attention has turned to sequence annotation. Researchers can quickly generate sequence information and want to annotate their data. An annotated sequence provides a wealth of information about the organism that is not directly obvious from the sequence. It also acts as a standard, giving investigators the ability to work on the same basic gene structures and to compare findings. The annotation process is a challenging task, especially in light of the limited infrastructure and expertise.
New opportunities to develop faster and less expensive methods for sequencing DNA are pushing genome sequencing technology to include answers to:
- information on the nature and source of a sample
- effective data collection for comparison of samples (from known and probable locations)
- confidence in data comparison
- sample divergence from a common ancestor (the mechanism of the variation or the heterotachy of the mutation)
- genome-wide information analysis, including evolutionary distance to enable the measurement of the variation
- evidence of genetic engineering
- all or partial data exclusion as a contaminant or failure in sample handling.
A major obstacle to identification of micro-organisms is having a reference or comparison strain. For all named microbial species, there is a requirement for deposition of the culture in a collection so that others may have the reference strain. Research over the last 50 years has shown that the reference strain rarely is the ‘median’ strain and often is an outlier to the species. Furthermore, recent genome sequence data show that strains passaged several times in media will display single nucleotide polymorphism (SNP) differences and multiple isolates of a single species will not be identical base pair by base pair. This reality must be dealt with in identifying pathogens, especially isolates or even the nucleic acid from a natural environment.
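The effect of reference choice on identification can be sketched as follows. The example assumes the sequences are already aligned to equal length and counts only substitutions, ignoring insertions and deletions. The strain names, sequences and the functions `snp_positions` and `nearest_reference` are hypothetical and chosen only to illustrate the point.

```python
def snp_positions(reference, isolate):
    """Return the positions at which two pre-aligned sequences differ."""
    if len(reference) != len(isolate):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [i for i, (r, s) in enumerate(zip(reference, isolate)) if r != s]


def nearest_reference(isolate, references):
    """Return the name of the reference with the fewest SNP differences."""
    return min(references, key=lambda name: len(snp_positions(references[name], isolate)))


# Hypothetical aligned fragments from two reference strains of one species.
references = {
    "type_strain": "ACGTACGTAC",
    "field_strain": "ACGTTCGTAA",
}
isolate = "ACGTTCGAAA"

for name, seq in references.items():
    print(name, snp_positions(seq, isolate))
print(nearest_reference(isolate, references))
```

Here the isolate differs from the designated type strain at three positions but from the second strain at only one, which is the situation described above when the deposited reference is an outlier rather than the ‘median’ strain.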
The high-throughput nature of the sequencing device inevitably produces sequencing errors and limits data quality. The errors in standard genome sequencing projects can be reduced by applying efficient genome assembly techniques. Potential sequencing errors can also be further minimized by posterior computational processes (Gajer et al. 2004; Huse et al. 2007).
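How redundancy lets computation suppress random sequencing errors can be illustrated with a per-position majority vote over overlapping reads. This is a deliberately simple sketch, not the method of the cited studies: it assumes the reads are already aligned to the same coordinates and that errors are independent substitutions. The function name `consensus` and the example reads are hypothetical.

```python
from collections import Counter


def consensus(aligned_reads):
    """Call the most frequent base in each column of equal-length aligned reads."""
    if not aligned_reads:
        raise ValueError("at least one read is required")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to equal length")
    # zip(*reads) yields one tuple of bases per alignment column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads))


# Four reads covering the same region; two of them carry one error each.
reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAA"]
print(consensus(reads))
```

Because each error appears in only one read at its position, the majority base is recovered in every column; the approach breaks down when coverage is low or when errors are systematic rather than random.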