Rapid sequencing‐based diagnosis of infectious bacterial species from meningitis patients in Zambia

Abstract
Objectives: We have developed a portable system for the rapid determination of bacterial composition for the diagnosis of infectious diseases. Our system comprises a nanopore sequencer, MinION, and two laptop computers. To examine the accuracy and time efficiency of the system, we performed a proof-of-concept study detecting the causative bacteria in 11 meningitis patients in Zambia.
Methods: We extracted DNA from the cerebrospinal fluid sample of each patient and amplified the 16S rRNA gene region. A sequencing library was prepared, and the sequenced reads were processed concurrently to determine the bacterial composition using the minimap2 software and representative prokaryote genomes.
Results: The sequencing results for four of the six culture-positive samples were consistent with those of conventional culture-based methods. The dominant bacterial species in each of these samples were identified from the sequencing data within only 3 min. Major bacterial species were also detected in the other two culture-positive samples and in the five culture-negative samples, but their presence could not be confirmed. Overall, although only a small number of reads was obtained in a short sequencing run, the major bacterial species did not change with prolonged sequencing. In addition, the processing time strongly correlated with the number of sequencing reads used for the analysis.
Conclusion: Our results suggest that time-efficient analysis could be achieved by determining the number of sequencing reads required for the rapid diagnosis of infectious bacterial species according to the complexity of the bacterial community in a sample.


INTRODUCTION
Advances in DNA sequencing technology now make it possible to obtain DNA sequencing data in real time with the nanopore-based sequencer MinION (Oxford Nanopore Technologies, Oxford, UK). 1 Recently, the MinION system has been reported to perform remarkably well for rapid bacterial identification based on sequencing of full-length 16S rRNA gene amplicons. [2][3][4][5][6][7][8] Using this sequencer together with two laptop computers, we have developed a portable and rapid bacterial composition analysis system for the on-site diagnosis of infectious diseases. [2][3][4] Although this portable system successfully determined bacterial composition by sequencing 16S rRNA genes, two major challenges must be overcome to improve its utility: the quality of DNA sequencing and the speed of sequence similarity searches.
The first challenge is that the accuracy of nanopore sequencing is considerably lower (about 85% for 1D sequencing) 9 than that of more common next-generation sequencers such as Ion PGM (Thermo Fisher Scientific, MA, USA). However, because it produces long reads, MinION can detect bacterial species based on the full-length 16S rRNA gene, which improves the accuracy of species detection. [2][3][4][5][6][7][8] Moreover, nanopore sequencing technology is continuously being updated, improving both the quality and quantity of the output data. Therefore, newer MinION flow cells used with updated library preparation kits should help to address the quality issue. In addition, multiplex sequencing is now possible with DNA barcoding for 1D sequencing, which helps to reduce the cost of sequencing.
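To put the 85% figure on the scale used in fastq quality strings, the standard Phred definition relates a per-base error probability to a quality score. The conversion below is our own illustrative arithmetic, not a measurement from this study:

$$Q = -10\log_{10} P_{\mathrm{err}} = -10\log_{10}(1 - 0.85) = -10\log_{10}(0.15) \approx 8.2$$

By the same formula, a per-base accuracy of 99% corresponds to Q = 20.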
Second, we previously used two computer programs to identify bacterial species in a given sample: BLASTN 10 and Centrifuge 11 . On the one hand, BLASTN is a relatively sensitive sequence similarity search tool that is suitable for MinION reads, but it requires high computational power; thus, regular laptop PCs take a long time to process large amounts of sequence data. 4 On the other hand, Centrifuge is much faster, but its accuracy is considerably lower than that of BLASTN. 4 Recently, various programs have been released to handle nanopore sequencing data, including minimap 12 , minimap2 13 and minialign 14 . In particular, minimap2 can use compressed (gzipped) fasta files as the database, which saves storage space on a laptop PC. Therefore, considering both the speed and accuracy of sequence similarity searches, we adopted minimap2 for bacterial detection in our portable system.
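The storage advantage of a gzipped database comes from decompressing it on the fly while reading, so the uncompressed reference never has to sit on disk. A minimal sketch of that idea using Python's standard `gzip` module is shown below; it only illustrates the principle and is not part of minimap2 or of our pipeline, and the function name `read_fasta_gz` is hypothetical.

```python
import gzip
from typing import Iterator, List, Optional, Tuple


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (sequence ID, sequence) pairs from a gzip-compressed FASTA file.

    The file is decompressed on the fly while it is read, so the
    uncompressed database never has to be written to disk.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    with gzip.open(path, "rt") as handle:  # "rt": decompress and decode as text
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                parts = line[1:].split()
                header = parts[0] if parts else ""  # keep only the ID before the first space
                chunks = []
            else:
                chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)
```

Because records are yielded one at a time, memory use stays proportional to the longest single sequence rather than to the whole database.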
With these improvements, we recently updated our rapid diagnosis system for bacterial infection. 3,4 Using this system, we conducted sequencing analyses of mock bacterial samples 3 and aspiration pneumonia samples 4 to evaluate DNA preparation methods 3 and to identify a causative agent, 4 respectively. We successfully identified the bacterial species in both studies. However, the accuracy and time efficiency of the previous and current systems, in particular the performance of BLASTN versus minimap2, were not compared. More importantly, these studies were conducted in Japan, whereas this portable sequencing-based system for bacterial identification can also work in resource-poor countries. Therefore, in this study, we brought the system developed in Japan to Zambia and used it to identify the causative bacteria in meningitis patients.
Meningitis is an infectious disease of the central nervous system caused by bacterial or viral infection; it is associated with significant morbidity and mortality and often has severe consequences. 15 Approximately 4100 cases of bacterial meningitis are diagnosed in the United States each year, 500 of which are fatal. 16 The traditional diagnostic workup for meningitis consists of neuroimaging, cerebrospinal fluid analysis (cell counts, Gram staining, biochemical tests for glucose and protein, and cultures) and blood cultures. 17 Nosocomial bacterial meningitis is diagnosed on the basis of cerebrospinal fluid culture; thus, both aerobic and anaerobic culture techniques are obligatory. However, these cultures require prolonged incubation before a negative result can be confirmed, and results may be negative in infected patients who have previously received antimicrobial therapy. 17 Accordingly, securing a final diagnosis can take weeks or months of testing, and many cases remain unresolved, necessitating empirical treatment that may be ineffective or even harmful to the patient. 18 Therefore, the critical step in improving therapeutic effectiveness in meningitis is the accurate identification of the causative agents, which supports appropriate treatment decisions. 19 For example, metagenomic next-generation sequencing of cerebrospinal fluid or brain tissue can screen for nearly all potential central nervous system infectious agents and can also identify novel or unexpected pathogens. 20,21 In this study, we used our updated portable sequencing-based system, developed in Japan, in Zambia for the rapid diagnosis of bacterial species in a given DNA sample (Figure 1). We conducted 16S rRNA gene amplicon sequencing of samples from meningitis patients in Zambia and compared the performance of the system with the results of conventional culture-based methods.

RESULTS
Cerebrospinal fluid samples were obtained from 11 meningitis patients at the University Teaching Hospital in Zambia, where all experiments were performed except for some downstream computational analyses. DNA was extracted from each sample, and 16S rRNA amplicon libraries were constructed. Sequencing was performed on the MinION Mk1b system without an Internet connection (see the Methods for details). In total, 60,671 reads were obtained over 18 hours (h) of sequencing, 54,442 (89.7%) of which were assigned to the 12 samples using the SQK-RAB201 barcodes. The average length of the barcoded reads was 1407 base pairs (bp), close to the full length of the 16S rRNA gene. For each sample, we searched all MinION reads with minimap2 13 against 5,850 representative bacterial genomes (see Supplementary figure 1 and Supplementary tables 1-3) and predicted the bacterial species in each sample at different calculation times (Figure 2). Bacterial species accounting for > 10% of the reads in a sample at 18 h are listed in Table 1.
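The > 10% criterion used for Table 1 amounts to counting best-hit species per sample and keeping those above a threshold. The sketch below assumes the input is one best-hit species label per mapped read and that the denominator is the number of mapped reads; both are our assumptions, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def species_proportions(best_hits: Iterable[str]) -> Dict[str, float]:
    """Fraction of mapped reads assigned to each species in one sample."""
    counts = Counter(best_hits)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: n / total for species, n in counts.items()}


def major_species(best_hits: Iterable[str], threshold: float = 0.10) -> List[Tuple[str, float]]:
    """Species whose share exceeds `threshold`, largest share first."""
    props = species_proportions(best_hits)
    kept = [(species, p) for species, p in props.items() if p > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

For example, a sample with 19 reads assigned to one species and 1 read to another reports only the first species, at 95%.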
Sequencing results for four of the six culture-positive samples were consistent with those of conventional culture-based methods: Enterobacter hormaechei subsp. steigerwaltii (71%), Enterobacter hormaechei subsp. steigerwaltii (63%), Pseudomonas aeruginosa (95%) and Klebsiella pneumoniae subsp. pneumoniae (92%) were detected as the major bacterial species in Samples #2, #3, #4 and #5, respectively. Because we sequenced nearly the entire 16S rRNA gene, we could identify candidate causative bacteria at the species level in each sample (Table 1), which is difficult to achieve with the culture-based method. However, the other two culture-positive samples (Samples #1 and #6) yielded candidate causative bacteria different from those identified by the culture-based method. In addition, Samples #2 and #3 each contained another bacterial species with > 10% of reads: Klebsiella pneumoniae subsp. pneumoniae (10%) in Sample #2 and Acinetobacter indicus (13%) in Sample #3, suggesting that more than one bacterial species may be involved in these infections.
We also detected bacterial species in the five culture-negative samples (Table 1). For Sample #10, with 2,262 matched reads, Microbacterium chocolatum (54%) was detected as the major bacterial species, followed by Scytonema hofmannii (17%) and Psychrobacter urativorans (12%). For Sample #9, Stenotrophomonas maltophilia (57%) was detected as the major bacterial species, although this result was based on only seven mapped reads. For the other three samples, no single species accounted for > 50% of the matched reads. For Sample #7, reads were assigned to three Bacillus species [Bacillus thuringiensis (40%), Bacillus manliponensis (22%) and Bacillus anthracis (17%)], although the 16S rRNA sequences of these species are almost identical (97.9%). Indeed, more than 99% of the mapped reads (337 reads) corresponded to the Bacillus cereus group, which is an important causative agent of meningitis. 22 Nevertheless, we could not confirm whether these bacteria were present and responsible for the symptoms in each patient. In addition, two bacterial species, Cutibacterium acnes (75%) and Deinococcus proteolyticus (25%), were detected in the water control (Sample #12), although this finding was based on only four reads.
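When closely related species share nearly identical 16S rRNA sequences, as in Sample #7, read assignments can be summed at a higher taxonomic level. A minimal sketch of that roll-up is shown below; the hand-written group membership set is illustrative and incomplete (in practice it would come from a taxonomy database), and `group_fraction` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Set

# Illustrative, incomplete membership list; real use would derive group
# membership from a taxonomy database rather than a hand-written set.
BACILLUS_CEREUS_GROUP: Set[str] = {
    "Bacillus cereus",
    "Bacillus anthracis",
    "Bacillus thuringiensis",
    "Bacillus manliponensis",
}


def group_fraction(best_hits: Iterable[str], group: Set[str]) -> float:
    """Fraction of mapped reads whose best-hit species belongs to `group`."""
    counts = Counter(best_hits)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    in_group = sum(n for species, n in counts.items() if species in group)
    return in_group / total
```

Summing at the group level makes the result robust to essentially arbitrary splits among near-identical species.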
We also performed bacterial detection using BLASTN 10 with the following parameters: -word_size 9 -gapopen 2 -gapextend 2 -evalue 3.80e-2. BLASTN is generally considered more accurate than minimap2 13 for sequence similarity searches; however, the bacterial species detected using BLASTN were almost identical to those detected using minimap2, especially for the culture-positive samples (Samples #1-6; Table 1). In addition, comparing processing times, we found that BLASTN took approximately 5 to 37.5 times longer than minimap2, and that processing time increased with the number of reads (Supplementary figure 1 and Supplementary table 3). These results suggest that minimap2 is accurate enough to detect bacterial species from MinION 16S rRNA gene amplicon sequencing, while requiring substantially less time than BLASTN.
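For reproducibility, the BLASTN search parameters above can be collected into an argument list for the BLAST+ `blastn` program. The sketch below only builds the command; the query file, database name, output path and the tabular output format (`-outfmt 6`) are assumptions for illustration, not settings reported in this study.

```python
import shlex
from typing import List

# Search parameters reported in the text.
BLASTN_PARAMS = {
    "-word_size": "9",
    "-gapopen": "2",
    "-gapextend": "2",
    "-evalue": "3.80e-2",
}


def blastn_command(query_fasta: str, db_name: str, out_path: str) -> List[str]:
    """Build (but do not run) a blastn argument list with the reported parameters.

    db_name, out_path and the tabular output format are illustrative assumptions.
    """
    cmd = ["blastn", "-query", query_fasta, "-db", db_name,
           "-out", out_path, "-outfmt", "6"]
    for flag, value in BLASTN_PARAMS.items():
        cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    # Print the shell-escaped command; pass the list to subprocess.run to execute it.
    print(shlex.join(blastn_command("reads.fasta", "bacteria_db", "hits.tsv")))
```

Keeping the parameters in one dictionary makes it easy to time the same search at different read counts without retyping flags.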
*Only bacterial species accounting for > 10% of the total at 18 h of sequencing are listed.
We further analysed the time efficiency of our sequencing analysis. To assess the sequencing time required to identify the causative bacterial species, we compared sequencing data at nine time points (3 min, 5 min, 10 min, 30 min, 1 h, 3 h, 6 h, 12 h and 18 h) after the start of MinION sequencing. In all cases, the major bacterial species identified at each time point were already detected at the first time point (Figure 3). In particular, the major bacterial species were detected within only 3 min of sequencing for Samples #2-8 and #10. Moreover, despite the small number of reads processed in this short time, the major bacterial species remained the same throughout the run (up to 18 h).
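The time-point comparison can be expressed as repeatedly filtering reads by their acquisition time and recounting. The sketch below assumes each read is available as a pair of (seconds since the start of the run, best-hit species); how those start times are extracted from read metadata is not shown, and `top_species_over_time` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# The nine time points compared in the text, in minutes.
TIME_POINTS_MIN: Tuple[int, ...] = (3, 5, 10, 30, 60, 180, 360, 720, 1080)


def top_species_over_time(
    reads: List[Tuple[float, str]],
    time_points_min: Sequence[int] = TIME_POINTS_MIN,
) -> Dict[int, Optional[str]]:
    """Most frequent best-hit species among reads sequenced up to each time point.

    `reads` holds (seconds since run start, best-hit species) pairs.
    Returns None for time points with no reads yet.
    """
    result: Dict[int, Optional[str]] = {}
    for minutes in time_points_min:
        cutoff = minutes * 60.0
        counts = Counter(species for start, species in reads if start <= cutoff)
        result[minutes] = counts.most_common(1)[0][0] if counts else None
    return result
```

Because each cutoff includes all earlier reads, the counts are cumulative, matching the comparison of data accumulated from the beginning of the run.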
We also measured the time required for each calculation step (Table 2, Supplementary figure 1 and Supplementary tables 1-3). As shown in Table 2, the most time-consuming step of the computation is the sequence search (Step 3 in Figure 2). The total calculation time tended to increase with sequencing time (Figure 4a) and correlated strongly with the number of sequencing reads (Figure 4b).
This correlation can be explained by the fact that sequence search time depends on the number of query sequences. Therefore, in terms of calculation time, fewer sequences are advantageous; however, the accuracy of species determination decreases as the number of sequences decreases.
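The text does not state which correlation statistic was used for Figure 4b. Assuming the Pearson coefficient, the relationship between read count and processing time could be quantified with a short stdlib-only function such as the hypothetical `pearson_r` below.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ValueError for mismatched or too-short input; a constant
    sequence (zero variance) raises ZeroDivisionError.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

A value close to 1 indicates that processing time grows roughly linearly with the number of reads searched.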

DISCUSSION
We detected bacterial species in samples from 11 meningitis patients in Zambia using our updated portable sequencing system. In particular, the sequencing results for four of the six culture-positive patients were concordant with those of culture-based methods. Importantly, our sequence search could identify bacteria at the species level in a given sample, which cannot be achieved by conventional culture-based methods. However, such high resolution also carries a risk of false-positive detection. For example, Cutibacterium acnes was detected as a major bacterium in the water sample (Sample #12) used as a negative control. This species is commonly found on human skin 23 and was also detected in negative control samples (i.e. water) in our previous sequencing analyses. 24 In addition, Deinococcus species such as D. proteolyticus, the other species found in Sample #12, are commonly found in materials, surfaces and dust contaminated by humans and animals, as well as in soil and sewage. 25 Therefore, the bacterial species detected in Sample #12 most likely reflect human contamination. The very small number of reads in Sample #12 (four reads) further suggests that they originated from contamination during the procedure. However, DNA from such contaminating bacteria can also be amplified by polymerase chain reaction (PCR)-based methods. Moreover, because we lacked the equipment needed to quantify the exact amount of DNA, the number of reads varied widely among samples. Nevertheless, our results show that reasonably accurate predictions can be made even from a small number of reads (Figure 3). Therefore, bacterial identification in this study may not have been substantially affected by the uneven read numbers in multiplexed sequencing.
Streptococcus pneumoniae, Streptococcus agalactiae (group B Streptococcus), Neisseria meningitidis, Haemophilus influenzae and Escherichia coli (particularly the K1 serotype) are currently the most common bacterial pathogens causing acute meningitis in the United States. 16 However, Streptococcus, Neisseria and Haemophilus species were found in only two (Samples #1 and #10) of the 11 samples analysed in this study. Moreover, no Escherichia species were detected in our sequencing analysis. One potential reason for these differences is that the commonly reported pathogens were identified in patients in developed countries, where meningitis is mainly observed in elderly people. In this study, we analysed samples from patients in Zambia, whose background differs markedly from that of patients in developed countries, including a high prevalence of human immunodeficiency virus infection. 26 Another possibility is that these common pathogens are usually identified by culture-based methods, which can give results different from those of sequencing-based methods in some cases. Nevertheless, this observation highlights the need to further evaluate the bacterial pathogens causing meningitis in developing countries.
Our current experimental protocol targets the 16S rRNA gene; therefore, other pathogenic agents such as viruses, protists and fungi cannot be detected with the current sequencing-based method. In addition, the drug resistance status of the identified bacteria cannot be determined from 16S rRNA gene amplicon sequencing. Our computational system itself could potentially be applied for these purposes, because sequence searches can be performed at the genome scale. In addition, the recently developed base-calling software Guppy produces accurate sequences in less calculation time than Albacore (data not shown).
Therefore, the main strategy for overcoming these challenges may lie in the experimental steps. Compared with bacteria, it is generally more difficult to design PCR primers for viruses, because they lack universally shared genes (such as the 16S rRNA gene in bacteria) and have high mutation rates. Moreover, for eukaryotic pathogens, possible contamination with host DNA must also be resolved to ensure accurate determination of the pathogenic species. These are the next challenges to be tackled to improve the sequencing-based diagnosis of causative agents of infectious diseases.
Studies of sequencing-based diagnosis of meningitis are ongoing worldwide. [27][28][29] However, the sequencing methods and bioinformatics pipelines used for diagnosis differ among studies. This is because sequencing and bioinformatics technologies, including genome databases, have progressed quickly, making it difficult to establish standards. In addition, sequencing-based diagnosis usually costs more than conventional culture-based diagnosis, which makes such studies difficult to conduct, particularly in developing countries. Nevertheless, sequencing technology is still developing; for example, Oxford Nanopore Technologies recently released a cheaper flow cell called Flongle, which costs approximately one-tenth as much as a MinION flow cell. The price of sequencing equipment is expected to fall further with advances in technology.
In conclusion, we tested our improved rapid sequencing-based diagnosis system, based on 16S rRNA amplicon sequencing, for the identification of infectious bacterial species in 11 meningitis patients in a hospital laboratory in Zambia. The results for four of the six culture-positive patients were concordant between the sequencing-based and culture-based methods; however, the pathogens in two culture-positive and five culture-negative samples remained unclear. We found that using minimap2 reduced the calculation time for species identification without loss of accuracy, and that sequence search time depends on the number of query sequences processed. The number of sequencing reads required for the rapid diagnosis of infectious bacterial species should be determined according to the complexity of the bacterial community in a sample. For the practical application of sequencing-based diagnosis of infectious diseases, more cases need to be examined.

Data analysis
The data analysis procedure is outlined in Figure 2. Base calling and barcode sorting of the reads (fast5 files) were performed using Albacore software version 2.1.3 developed by Oxford Nanopore Technologies. The resulting fastq files were then converted to fasta format using our in-house script. Simple repetitive sequences were masked using the TanTan program 30 version 13 with default parameters. To remove human-derived reads, we searched each read against the human reference genome (GRCh38) using minimap2 with default parameters 13 ; unmatched reads were regarded as bacterial. A total of 5850 representative bacterial genome sequences stored in the GenomeSync database (http://genomesync.org) were used for the analysis (see Supplementary table 1). For each read, the species with the highest minimap2 score, which reflects alignment length, matches and mismatches, and gaps, was regarded as present in the sample. Taxa were determined using our in-house script based on the National Center for Biotechnology Information taxonomy database 31 and visualised using Krona charts. 32
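Our in-house scripts are not reproduced here. The sketch below illustrates three of the steps described above with the Python standard library only: fastq-to-fasta conversion, removal of reads that hit the human reference, and per-read best-hit selection from minimap2 output in PAF format. Assumptions: fastq records are unwrapped four-line records; the `AS:i:` alignment score tag stands in for the "minimap2 score" (minimap2 reports it only when base-level alignment is performed, e.g. with `-c`); mapping reference sequence names to species would need a separate lookup table, not shown. All function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Convert four-line FASTQ records to FASTA lines (quality strings dropped)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if not header.startswith("@") or seq is None or plus is None or qual is None:
            raise ValueError(f"malformed FASTQ record starting at {header!r}")
        yield ">" + header[1:]
        yield seq.rstrip("\n")


def non_human_reads(read_ids: Iterable[str], human_paf_lines: Iterable[str]) -> Set[str]:
    """Reads with no PAF hit against the human reference are treated as bacterial."""
    human_hits = {line.split("\t", 1)[0] for line in human_paf_lines if line.strip()}
    return {r for r in read_ids if r not in human_hits}


def best_hits_from_paf(paf_lines: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Highest-scoring target per query, ranked by the AS:i: tag of each PAF line."""
    best: Dict[str, Tuple[str, int]] = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # PAF has 12 mandatory columns
        query, target = fields[0], fields[5]
        score = next((int(tag[5:]) for tag in fields[12:] if tag.startswith("AS:i:")), None)
        if score is None:
            continue  # no alignment score reported for this line
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return best
```

Ties between equally scoring targets keep the first one encountered; a real pipeline might instead report such reads as ambiguous.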

Laptop computers
Two laptop computers were used for the analysis. One was used for MinION sequencing (OS, Windows 10; CPU, Intel Core i7 6700HQ; memory, 8 GB; storage, 960 GB SSD), and the other was used for base calling and barcode sorting (Albacore), fastq-to-fasta conversion (our in-house script), repeat masking (TanTan) and bacterial identification (minimap2) (OS, Ubuntu 16.04; CPU, Intel Core i7 6700K; memory, 32 GB; storage, 1 TB SSD) (Figure 1).