Applications of sequencing technology in clinical microbial infection

Abstract Infectious diseases are a type of disease caused by pathogenic microorganisms. Although the discovery of antibiotics changed the treatment of infectious diseases and reduced the mortality of bacterial infections, resistant bacterial strains have emerged. Anti‐infective therapy based on aetiological evidence is the gold standard for clinical treatment, but the time lag and low positive culture rate of traditional methods of pathogen diagnosis make it relatively difficult to obtain evidence of pathogens. Compared with traditional methods of pathogenic diagnosis, next‐generation and third‐generation sequencing technologies have many advantages in the detection of pathogenic microorganisms. In this review, we mainly introduce recent progress in research on pathogenic diagnostic technology and the applications of sequencing technology in the diagnosis of pathogenic microorganisms. This review provides new insights into the application of sequencing technology in the clinical diagnosis of microorganisms.

The discovery of antibiotics was a turning point in human history.
Regrettably, the effects of these miraculous drugs have gradually been lost with the rapid emergence of antibiotic-resistant strains. 5 Shortly after the introduction of penicillin in the 1940s, penicillin-resistant Staphylococcus aureus (S aureus) appeared; similarly, a strain of Mycobacterium tuberculosis resistant to streptomycin appeared shortly after the discovery of streptomycin. 6 In addition to bacteria, viruses and fungi are also common pathogenic microorganisms in the clinic.
A statistical analysis of several severe infectious diseases, such as tuberculosis, malaria and AIDS, has been conducted by the World Health Organization (WHO). According to the WHO, in 2017, 10 million new tuberculosis cases were reported, resulting in 1.6 million deaths. 7 There were 219 million malaria cases in 2017, and the death toll from malaria was 435 000. 8 A total of 36.9 million people were living with HIV worldwide in 2017, resulting in 940 000 deaths. 9 In addition, S aureus is one of the most common pathogens leading to human infection and the primary cause of bacteraemia, pneumonia, infective endocarditis, and skin and soft tissue-related infections. 10 The Centers for Disease Control and Prevention (CDC) in the USA has reported that the number of infections caused by S aureus is second only to the number of infections caused by Escherichia coli. Another common bacterium, Pseudomonas aeruginosa (P aeruginosa), a conditional pathogen, is one of the main causes of nosocomial infection; P aeruginosa can survive in a variety of environments, including surfaces in medical facilities, owing to its adaptability and antibiotic resistance. 11 Pseudomonas aeruginosa affects more than 2 million patients, causing approximately 90 000 deaths each year. 12 According to the China National Statistics Bureau, from 2013 to 2017, the number of infections and deaths due to class A and B notifiable infectious diseases in China fluctuated at approximately 3 million, but the number of deaths from these infectious diseases is increasing 13 (Figure 1).
In recent years, with increasing numbers of drug-resistant pathogens and growing misuse of antibiotics in the clinic, the issue of drug resistance has become increasingly serious. The detection of pathogenic microorganisms and drug resistance has become the primary procedure for clinical anti-infective treatment. Methicillin-resistant Staphylococcus aureus (MRSA) and vancomycin-resistant Enterococcus (VRE) are common and harmful antibiotic-resistant bacteria found in the clinic. Methicillin-resistant Staphylococcus aureus, which was first discovered in 1961, 14 is resistant to multiple antibiotics and found in 74% of patients with S aureus infection worldwide. 15 According to the CDC in the USA, approximately 5% of patients in hospitals in the USA carry MRSA in their nose or on their skin. Vancomycin is well known as the last line of defence against antibiotic-resistant bacteria, and VRE-related infections have had a serious impact on human health since they were first reported in the 1980s. 16 Enterococcus can cause a variety of clinical diseases, such as urinary tract infections, bacteraemia, endocarditis, and abdominal and pelvic infections. 17

| RESEARCH PROGRESS IN PATHOGENIC DIAGNOSTIC TECHNIQUES
Anti-infective therapy based on aetiological evidence, including the genus, species and drug resistance of a bacterium, is the gold standard for clinical treatment. Traditional methods of pathogenic microbial detection include culture and separation, biochemical and serological detection, immunology and nucleic acid detection. 18 Traditional aetiological diagnosis determines drug resistance by collecting clinical specimens, culturing positive pathogenic microorganisms and then conducting drug sensitivity tests. As a result, traditional aetiological diagnosis has several defects. First, obtaining a traditional aetiological diagnosis takes a long time, and reports on bacterial detection lag behind, so they cannot guide anti-infection treatment plans in time. Second, the positive detection rate with traditional aetiological diagnosis is low, and it is relatively difficult to obtain aetiological evidence. Third, in vitro culture medium is distinct from the environment and conditions of microbial growth in vivo and cannot fully reflect infection in vivo.
Fourth, some infectious diseases are not caused by the presence or excessive reproduction of a single pathogenic microorganism but by dysbiosis of the flora at the lesion site, resulting in an imbalance in the population of multiple microorganisms. Furthermore, different microorganisms adapt to different media and culture conditions, and traditional in vitro culture can obtain only the dominant growing microorganisms; however, many microorganisms in nature still cannot be cultured.
FIGURE 1 The number of infections and deaths due to class A and B notifiable infectious diseases in China from 2013 to 2017. The abscissa indicates the year, the left ordinate represents the number of infections, and the right ordinate represents the number of deaths
In recent years, with the spread of infectious diseases worldwide, the number of suspected infections has increased, and new pathogenic diagnostic technology is rapidly developing. Mass spectrometry for rapid microbial identification, 19 such as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS), is used to identify bacterial genera and species, 20 but the limitation of this technique is that accurate identification depends on a well-established database, and the organism identified with this technology must be a cultured microorganism. 21 Molecular diagnostic techniques, such as lateral-flow immunoassay (LFIA), combine microbial antigens with labelled antibodies, 22 and real-time polymerase chain reaction is used for the quantitative determination of specific microorganisms. 23,24 These detection techniques are limited to identifying the genera and species of specific pathogenic microbes.

| THE DEVELOPMENT OF SEQUENCING TECHNOLOGY AND ITS CURRENT STATUS
Sequencing technology is used to determine the primary structure of biomacromolecules such as nucleic acids, proteins and polysaccharides. The most common sequencing technique is nucleic acid sequencing, including DNA and RNA sequencing, which determines the order of nucleotides in nucleic acid sequences. DNA sequencing technology has evolved from first-generation DNA sequencing to the current fourth-generation DNA sequencing.
Maxam-Gilbert sequencing, Sanger dideoxy sequencing, fluorescence automated sequencing and hybrid sequencing are collectively known as first-generation DNA sequencing technology. In the mid- and late 1970s, sequencing technology began to gradually mature.
In 1977, Maxam and Gilbert established Maxam-Gilbert sequencing based on chemical cleavage. 25 In the same year, Sanger proposed the dideoxy chain termination method. 26 In 1986, Smith et al 27 described fluorescence-based automated sequencing.
With completion of the Human Genome Project, the throughput of traditional sequencing methods has been unable to meet genome sequencing needs. In the mid- and late 1990s, next-generation sequencing (NGS) emerged. Next-generation sequencing has the potential to accelerate biological and biomedical research, increase throughput and reduce production scale and labour costs. 28 It is suitable for not only genome sequencing and genome re-sequencing but also transcriptome analysis (RNA-Seq), the characterization of DNA-protein interaction (ChIP-sequencing) and epigenomics. 29
From 2008 to 2009, sequencing technology using different methods from those of the second-generation platform was first described as 'third-generation' sequencing. 30 Unlike second-generation sequencing, which requires long-chain DNA, third-generation sequencing machines, such as the HeliScope, 31 Nanopore 32 and PacBio 33 systems, read nucleotide sequences at the single-molecule level. Third-generation sequencing can produce longer reads than current sequencing technology, 34 and its portability and sequencing speed are also important advantages. 35 At present, third-generation sequencing technology still has some limitations, which include its reliance on the activity of high-cost DNA polymerase and an error rate that is much higher than that observed in NGS. 36
FIGURE 2 Comparison of the traditional method and next-generation sequencing (NGS)

| First-generation DNA sequencing
Sanger sequencing was the main sequencing technique used between 1975 and 2005 and the gold standard for all sequencing technologies. Ribosomal RNA, especially 16S rRNA and 23S rRNA, is the most useful phylogenetic marker for pathogenic diagnosis by Sanger sequencing. 37 The most commonly used molecule for microbial species identification is 16S rRNA, which is extremely abundant in bacteria. The 16S rRNA sequence is approximately 1.5 kb in length 38 and highly conserved in structure and function with multiple variable regions. 39 Therefore, the genera and species of different pathogenic bacteria can be identified by 16S rRNA sequencing.
Currently, 16S rRNA gene analysis of bacteria is primarily achieved by sequencing the variable regions of the 16S rRNA gene. 40 Single clones from microorganisms obtained by culture can be identified by Sanger sequencing after amplification with 16S universal primers. The full-length 16S rRNA gene in bacteria can be sequenced by Sanger sequencing, but the drawback of this strategy is that the bacterial culture (BC) must be pure; otherwise, the bacterium cannot be identified. The positive rate of clinical culture is very low, most microorganisms cannot be detected, and a mixture of 16S genes cannot be used to distinguish species. This time-consuming strategy of cultivation and sequencing cannot meet the timeliness required of clinical tests. All of the above defects seriously affect the scope of the application of first-generation DNA sequencing.

| Next-generation metagenomic DNA sequencing (mNGS)
Based on the popularity and maturity of NGS, professional companies are already providing clinical microbiological testing based on mNGS. Next-generation sequencing technology, which is represented by the Illumina system, is sequencing by synthesis that  Resistance and Virulence Markers (draft)' guideline. 45 In recent years, mNGS has been applied in clinical practice. The world's first case of pathogen diagnosis using NGS occurred in 2013.
A 14-year-old boy with severe combined immunodeficiency (SCID) was hospitalized for fever and headache several times over 4 months.
His doctors could not identify the cause of his disease by diagnostic examination, including brain biopsy. Finally, a Leptospira species that could cause encephalitis was discovered by NGS, after which doctors cured the boy with a large dose of penicillin. 46
Food-borne pathogens are a major cause of morbidity and mortality worldwide. Next-generation sequencing can be used for real-time sequencing during pathogenic microbial outbreaks. 47 For example, in a food poisoning incident in Nanjing, China, Salmonella Schwarzengrund was isolated from diarrhoeal patients, and NGS was used to confirm the cause of the outbreak and trace contamination. 48
The incidence of sepsis, a seriously life-threatening infection with a high fatality rate, has been increasing recently. 49 It is estimated that more than 30 million people worldwide are affected by sepsis each year, possibly leading to 6 million deaths. 50
The specific technical problems in the diagnosis of pathogens by mNGS include the following. First, mNGS is limited to roughly judging the species of pathogenic microorganisms and estimating the approximate proportion of those microorganisms, and the results of different tests or from different labs may vary. Second, mNGS cannot directly detect RNA viruses: the RNA must first be reverse transcribed to cDNA, 57 which takes extra time. At the same time, the reads from mNGS are relatively short, so it is difficult to obtain information such as the full-length antibiotic resistance gene sequences in pathogenic microorganisms, and it is impossible to associate the drug resistance gene with the corresponding pathogenic microorganism species. Moreover, short-read sequencing will introduce bias. 58,59 Third, the detection rate of some low-content intracellular bacteria, such as M tuberculosis, Legionella, Brucella and fungi with thick cell walls, is low.
Fourth, mNGS requires multiple rounds of PCR amplification in the process of library construction, so cross-contamination issues are prone to occur. Fifth, a certain proportion of the extracted metagenomic DNA is derived from dead bacteria, and mNGS cannot determine whether the detected sequence is from living or dead bacteria. Sixth, because of the high-throughput nature of mNGS, a sufficient number of samples are needed to run the sequencing machine, which means that sequencing with the machine cannot start at any time. Furthermore, the sequencing time is longer than that of other techniques; for example, to sequence using the Illumina system in 300 bp paired-end mode, the time from booting the sequencing system to data acquisition is more than 60 hours, while for infected patients, the need to obtain aetiological evidence is urgent. Therefore, the diagnostic results of mNGS have limited clinical reference value.
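The 'approximate proportion' mentioned in the first limitation is usually a relative abundance computed from classified read counts. The following is a minimal sketch of that arithmetic, assuming each read has already been assigned a taxon label by some upstream classifier. The function name, the `min_reads` threshold, the host label and all counts are hypothetical and chosen only for illustration; they are not data from any study cited here.

```python
from collections import Counter

def relative_abundance(read_labels, host_label="Homo sapiens", min_reads=1):
    """Return {taxon: fraction of retained microbial reads}.

    Host reads are ignored, and taxa supported by fewer than
    min_reads reads are dropped before the fractions are computed.
    """
    counts = Counter(label for label in read_labels if label != host_label)
    kept = {taxon: n for taxon, n in counts.items() if n >= min_reads}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in kept.items()}

# Hypothetical per-read labels: mostly host DNA plus a few microbial taxa.
labels = (
    ["Homo sapiens"] * 900
    + ["Klebsiella pneumoniae"] * 60
    + ["Escherichia coli"] * 30
    + ["Candida albicans"] * 10
    + ["Bacillus sp."] * 2  # low-count hit, possibly contamination
)

abundance = relative_abundance(labels, min_reads=5)
for taxon, fraction in sorted(abundance.items(), key=lambda kv: -kv[1]):
    print(f"{taxon}: {fraction:.1%}")
```

Because the fractions depend on which reads are classified and on thresholds such as `min_reads`, two laboratories can report different proportions for the same specimen, which is consistent with the variability noted above.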

| Long-read third-generation sequencing technology
The 16S rRNA gene sequence commonly used for microbial species identification is 1.5 kb in length 60 ; however, the longest read length of mNGS systems, such as Illumina and BGI sequencers, is only 300 bp. The reads from mNGS are short, and computer analysis based on short sequences cannot completely determine the real sequence. Therefore, both machines can identify microorganisms to only the genus level and not to the species level from 16S rRNA.
Metagenomic sequencing uses short reads to search the database for species identification, which is prone to mapping errors. 61 Long-read third-generation sequencing technology can produce reads longer than 1 Mb, which can be used not only for microbial metagenome sequencing but also to directly sequence full-length 16S rRNA and identify pathogenic bacteria to the species level.
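The difference between a 300 bp read and a full-length 16S read can be illustrated with a toy example. The sketch below builds two synthetic 1500-nucleotide sequences (random, not real 16S genes) that differ only at a few assumed positions beyond the first 300 bases, and compares positional identity for a short and a full-length read. Real pipelines align reads against curated reference databases rather than comparing positions directly; every name and position here is hypothetical.

```python
import random

def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)  # fixed seed so the example is reproducible
species_a = "".join(rng.choice("ACGT") for _ in range(1500))

# Hypothetical close relative: identical except for four substitutions
# located outside the first 300 bases.
variant_positions = [450, 700, 1010, 1300]
bases = list(species_a)
for pos in variant_positions:
    bases[pos] = "A" if bases[pos] != "A" else "C"
species_b = "".join(bases)

short_read = species_a[:300]  # what a 300 bp read might cover
full_read = species_a         # what a full-length read covers

short_vs_a = identity(short_read, species_a[:300])
short_vs_b = identity(short_read, species_b[:300])
full_vs_a = identity(full_read, species_a)
full_vs_b = identity(full_read, species_b)

print("short read:", short_vs_a, short_vs_b)  # equal, so the read is ambiguous
print("full read: ", full_vs_a, full_vs_b)    # differ, so species A is favoured
```

In this toy setting the short read matches both references equally well, while the full-length read separates them, which is the intuition behind genus-level versus species-level resolution.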
Third-generation sequencing technology represented by the Nanopore system was launched in 2015. Compared with NGS, third-generation sequencing produces longer reads, and the results can be obtained in a shorter time. The Nanopore system is an electrical signal-based sequencing technology. 62 Protein-based nanopores (microscopic pores, which essentially form channels on the membrane) are embedded in a synthetic membrane and immersed in electrophysiological solution to allow ion currents to pass through the nanopore. The current is disrupted when DNA or RNA molecules pass through the pore, causing a characteristic change in the current signal. In this process, the signal is analysed in real time to determine the base sequence of the DNA or RNA strand that is passing through the pore, which allows the entire DNA or RNA sequence to be determined. 63,64 Nanopore sequencing technology effectively addresses the defects of mNGS in the field of aetiological diagnosis, directly sequencing reads more than 1 Mb in length, 65 which allows the identification of species of pathogenic microorganisms by 16S rRNA sequencing.
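The idea of decoding bases from changes in the ion current can be sketched with a deliberately simplified toy model. It assumes that each base produces its own mean current level and occupies the pore for a fixed number of samples; both assumptions are hypothetical simplifications. In real devices the signal depends on several neighbouring nucleotides at once, dwell times vary, and base calling relies on trained statistical models, so the code below illustrates the principle only and does not describe the actual Nanopore algorithm. All numeric levels are invented.

```python
import random

# Hypothetical mean current levels in picoamperes (invented for illustration).
LEVELS = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}
DWELL = 10  # assumed fixed number of current samples recorded per base

def simulate_trace(sequence, noise_sd=2.0, seed=0):
    """Simulate a noisy current trace: DWELL Gaussian samples per base."""
    rng = random.Random(seed)
    return [
        rng.gauss(LEVELS[base], noise_sd)
        for base in sequence
        for _ in range(DWELL)
    ]

def call_bases(trace):
    """Average each DWELL-sized window and pick the base with the nearest level."""
    called = []
    for start in range(0, len(trace), DWELL):
        window = trace[start:start + DWELL]
        mean = sum(window) / len(window)
        called.append(min(LEVELS, key=lambda b: abs(LEVELS[b] - mean)))
    return "".join(called)

trace = simulate_trace("GATTACA")
decoded = call_bases(trace)
print(decoded)
```

Averaging over a window reduces the noise in each level estimate, which is why the toy decoder tolerates sample-level noise; the widely spaced levels used here make the task far easier than it is in practice.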
Nanopore sequencing technology eliminates the effort and time needed for reverse transcription because the Nanopore system can sequence RNA directly, without conversion to cDNA. 66 The throughput is very high, and the latest version of the Nanopore RNA direct sequencing chip can obtain 1 million full-length RNA sequences at a time. Sequencing results are available within a few hours.
to the doped bases are generated in the thin region, and each pulse has its own colour intensity and duration to enable identification of the base. 68,69 The cost of PacBio sequencing is very high, and there have been only a few reports of microbial metagenomics studies using PacBio sequencing. 64 However, taking 16S rRNA sequencing as an example, only 5000 circular consensus sequencing (CCS) reads are obtained per sample during sequencing. 70 For the same cost, the throughput screening efficiency of Nanopore direct RNA sequencing can be much higher.
The PacBio system is rarely used for infection diagnosis because of the very large size of the machine, the expensive hardware, cumbersome library construction and its inability to directly detect RNA.
At present, the Nanopore system is widely used.
Due to the above advantages, Nanopore sequencing technology has been widely used in the field of epidemic outbreak investigation over the past 2 years to detect infectious pathogens, antimicrobial drug resistance and other infectious areas of concern. For the rapid and real-time monitoring of outbreaks, researchers conducted
The time lag and low positive culture rate of traditional pathogen diagnosis make it relatively difficult to obtain evidence of pathogens. Metagenomic DNA sequencing is limited to roughly judging the microbial species and estimating its approximate proportion, and it is difficult to obtain information such as the full-length sequence of pathogenic microbial resistance genes. The diagnostic results of mNGS have limited reference value in clinical practice. Third-generation sequencing technology has many advantages that address the defects in NGS and traditional pathogenic diagnostic methods. Therefore, for the rapid diagnosis of clinical infectious microorganisms, the use of 'culture-independent' technology combined with third-generation sequencing must be a major developmental direction for clinical pathogenic microorganism diagnosis.

| Outlook
In