ON‐rep‐seq as a rapid and cost‐effective alternative to whole‐genome sequencing for species‐level identification and strain‐level discrimination of Listeria monocytogenes contamination in a salmon processing plant

Abstract Identification, source tracking, and surveillance of food pathogens are crucial for the food‐producing industry. Over the last decade, the techniques used for this have moved from conventional enrichment methods, through species‐specific detection by PCR, to sequencing‐based methods, with whole‐genome sequencing (WGS) being the ultimate method. However, WGS requires the right infrastructure, high computational power, and bioinformatics expertise. Therefore, there is a need for faster, more cost‐effective, and more user‐friendly methods. A newly developed method, ON‐rep‐seq, combines the classical rep‐PCR method with nanopore sequencing, resulting in a highly discriminating set of sequences that can be used for species identification and strain discrimination. This study is a real industry case from a salmon processing plant. Twenty putative Listeria monocytogenes isolates from routine sampling of processing equipment and products were analyzed by both ON‐rep‐seq and WGS to identify and differentiate them and to compare the strain‐level discriminatory power of ON‐rep‐seq with the different levels of analysis available from the WGS data. The analyses revealed three different strains among the isolates tested. The isolates of the most frequently detected strain (n = 15) were all detected in the problematic area of the processing plant. The strain‐level discrimination by ON‐rep‐seq was in full accordance with the interpretation of the WGS data. Our findings demonstrate that ON‐rep‐seq may serve as a primary screening method, as an alternative to WGS, for identification and strain‐level differentiation in the surveillance of potential pathogens in a food‐producing environment.


| INTRODUCTION
Intra-species variability exists in the bacterial genome (Abee et al., 2016; Sela et al., 2018), and strain-level discrimination of pathogens is therefore a key factor for the identification and subsequent elimination of a contamination source at a food processing plant. The significance of Listeria monocytogenes as a foodborne pathogen is well documented (Buchanan et al., 2017; Farber & Peterkin, 1991; Gandhi & Chikindas, 2007), and through the years different microbial typing methods, more or less labor-intensive, have been used to identify and differentiate this pathogen at the strain level (Jadhav et al., 2012; Wiedmann, 2002). During the last decades, developments in sequencing technologies and whole-genome sequencing (WGS) have rapidly changed bacterial strain identification in the food industry. WGS is becoming a more available and affordable molecular tool and has been proposed as the new primary typing tool for strain identification of L. monocytogenes. It has already been used successfully to investigate and characterize outbreaks of listeriosis (Jackson et al., 2016; Kvistholm Jensen et al., 2016; Schjørring et al., 2017). L. monocytogenes is a highly heterogeneous, omnipresent, psychrotolerant pathogen (Moura et al., 2016), able to survive and persist in food processing plants for years (Fagerlund et al., 2016). The possibility of L. monocytogenes contamination of food products from residual cells in the equipment represents a serious concern, especially in the ready-to-eat (RTE) food industry (EFSA, 2018; Fonnesbech Vogel et al., 2001).
Many food processing plants have therefore implemented a comprehensive testing regime to detect this pathogen in raw materials, the processing environment, equipment, and food products (Carpentier & Léna, 2012; European Commission, 2005). Whenever a food processing plant experiences frequent detection of L. monocytogenes, it raises the question of whether the contamination is due to a persistent strain or to transient strains. Identification at the strain level and source tracking are therefore crucial to recognize possible "hot spots" that harbor the pathogen.
Sequence-based typing, and in particular WGS, is proposed to replace pulsed-field gel electrophoresis (PFGE) as the primary typing method for L. monocytogenes as well as for other foodborne pathogens (Oakeson et al., 2017). Nevertheless, PFGE, multilocus sequence typing (MLST), and other typing methods will remain relevant techniques for smaller laboratories in the years to come (Neoh et al., 2019) because of the significant investments necessary to implement WGS in strain typing (Nouws et al., 2020).
In theory, WGS can differentiate strains at the single-nucleotide level, and it has a resolution superior to that of PFGE and MLST (Salipante et al., 2015; Stasiewicz et al., 2015). It is gaining support in outbreak investigation, surveillance, and source tracking of pathogenic bacteria (Nadon et al., 2017; Van Walle et al., 2018; Zhang et al., 2020). WGS analysis generated with the short-read technology offered by Illumina sequencing platforms is cost-effective and accurate and offers a low sequencing cost per base, but it has the limitations of short reads and challenging genome assembly (Kwong et al., 2015; Xu et al., 2020). An additional important drawback of WGS as a molecular tool for institutions lacking bioinformatics infrastructure and expertise is the comprehensive data analysis and interpretation required (Oakeson et al., 2017). There is, however, a variety of WGS data analysis pipelines available (Jagadeesan, Baert, et al., 2019; Quainoo et al., 2017), ranging from methods that require extensive bioinformatics expertise to commercial software packages, which can be challenging to use (Amézquita et al., 2020; Jagadeesan, Gerner-Smidt, et al., 2019). Nevertheless, studies have shown that source tracking with WGS data from L. monocytogenes was possible on these platforms with default settings (Jagadeesan, Gerner-Smidt, et al., 2019; Oakeson et al., 2017).
Third-generation sequencing technologies allow for long sequencing reads of single molecules, which simplifies the reconstruction of the molecules and the de novo assembly of genomes. One of the cheapest (~$1000) and most commonly used is the MinION sequencer, commercialized in 2014 by Oxford Nanopore Technologies (ONT) (Jain et al., 2015; Loman & Watson, 2015). In its early days, this technology was limited by a high error rate and relatively low throughput (Kilianski et al., 2015; van Dijk et al., 2018). Since then, the technology has matured significantly, with a reduced error rate and higher throughput (Karst et al., 2021). With ONT's latest release, Flongle, a $90 adapter for the transportable MinION sequencing platform, the sequencing cost has decreased considerably.
The classical fingerprinting method, repetitive sequence-based PCR (rep-PCR), was introduced by Versalovic et al. (1991) and has been shown to have discriminatory power equal to that of PFGE for subtyping Listeria monocytogenes (Chou & Wang, 2006). By combining rep-PCR with sequencing of the amplicons on the ONT sequencing platform, Krych et al. (2019) presented a new method called ON-rep-seq. This method combines the discriminative power of rep-PCR fingerprinting with access to the sequence information for each DNA fragment, which was previously known only as a band on a gel. This gives a set of highly discriminating sequences that allows for accurate taxonomic identification and, in many cases, strain-level differentiation (Krych et al., 2019).
This study aimed (1) to explore the use of ON-rep-seq as a screening method in a real industry case for identification and differentiation of putative L. monocytogenes isolated during routine sampling of processing equipment and products and (2) to evaluate the strain-level discrimination results against WGS.

| Sampling in processing plant and preparation of isolates
Routine sampling in the salmon processing plant was performed according to the company's guidelines. Environmental testing was performed at both fixed and rotational sampling points every day, before, during, and after the processing of the salmon. The samples were analyzed at the in-house laboratory of the processing plant following the iQ-Check Listeria spp. kit (Bio-Rad) procedure. All PCR-positive samples were plated on Rapid'L.mono agar plates (Bio-Rad). Material from all plates that contained colonies with typical characteristics of L. monocytogenes was frozen at −20°C and stored in the Microbank™ system (Pro-Lab Diagnostics) before being transported to NTNU and further stored at −80°C. Two gutting machine lines repeatedly tested positive for L. monocytogenes, and therefore 20 isolates deriving from different time points and places on these lines were selected for further investigation (Table 1).
Upon analysis, the isolates were propagated on Brain Heart Infusion agar (BHIA; CM1136) and repropagated at least twice. Their growth and appearance on Brilliance Listeria Differential agar (BLA; CM1080) were investigated after incubation at 37°C for 24 ± 2 h.
DNA extraction was performed using the Genomic Micro AX Bacteria+ Gravity-kit (102-100 M, A&A BIOTECHNOLOGY) according to the manufacturer's procedure, including the RNase treatment. The DNA was eluted in the neutralized elution buffer. DNA quality was checked on an agarose gel, and DNA concentrations were estimated by spectrophotometric measurement using a BioTek PowerWave XS, a Take3 plate, and Gen5 2.0 software. DNA (30 µl, ~40 ng/µl) was sent on ice by overnight shipment to the Novogene UK sequencing laboratory, and another 30 µl (~40 ng/µl) of DNA was subjected to ON-rep-seq sequencing at the University of Copenhagen, Denmark.

| Library construction and sequencing details
At the sequencing laboratory, DNA purity and integrity were checked again, and the DNA concentration was measured accurately with a Qubit® 3.0 fluorometer. The genomic DNA was randomly sheared into fragments of about 350 bp, and library construction was done using the NEBNext® DNA Library Prep Kit. End repair, dA-tailing, and ligation of the NEBNext® adapter were done before the fragments were PCR-enriched with P5 and indexed P7 oligos. The products were purified and quality-checked before sequencing. The sequencing strategy was paired-end sequencing with a read length of 150 bp at each end, performed on an Illumina® NovaSeq™ 6000 sequencing platform.
Base-calling was done with CASAVA v1.8 software, and the raw read dataset was subjected to quality filtering. To obtain high-quality reads, read pairs were removed if either read contained adapter contamination, had more than 10% uncertain nucleotides, or had low-quality nucleotides (base quality Q ≤ 5) constituting more than 50% of the read.
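The three filtering criteria described above can be sketched in a few lines of Python. This is a minimal illustration, not the sequencing provider's actual pipeline: the function names, the representation of a read as a `(sequence, qualities)` tuple, and the adapter string are assumptions made for the example. Only the thresholds (10% uncertain bases, Q ≤ 5, 50% of the read) come from the text.

```python
# Illustrative sketch only; thresholds follow the text, everything else is assumed.
ADAPTER = "AGATCGGAAGAGC"  # hypothetical adapter sequence used for illustration


def read_fails(seq, quals, adapter=ADAPTER,
               max_n_frac=0.10, low_q=5, max_low_q_frac=0.50):
    """Return True if a single read violates any of the three criteria."""
    if not seq or len(seq) != len(quals):
        return True  # empty or malformed reads are rejected in this sketch
    if adapter in seq:
        return True  # adapter contamination
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return True  # more than 10% uncertain nucleotides
    low = sum(1 for q in quals if q <= low_q)
    return low / len(quals) > max_low_q_frac  # more than 50% low-quality bases


def keep_pair(read1, read2, adapter=ADAPTER):
    """A pair is kept only if neither read fails; each read is (seq, quals)."""
    return not (read_fails(*read1, adapter=adapter)
                or read_fails(*read2, adapter=adapter))
```

In this sketch a pair is discarded as soon as one of its two reads fails, which matches the reading that the criteria apply to "either read".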

Isolate ID | Sampling point | Sampling date | ID Rapid'L.mono

| Genomic characterization based on WGS data
The whole-genome sequences were analyzed using the online web-based tools developed by the Center for Genomic Epidemiology (CGE, 2020). The high-quality reads from Illumina PE150 sequencing were used as templates and uploaded to the CGE server. The typing tool KmerFinder (Clausen et al., 2018; Hasman et al., 2014; Larsen et al., 2014) was used to identify the species based on k-mers (length = 16 bases), while MLST 2.0 was used to determine the sequence type based on the seven conventional MLST loci. For the 17 isolates identified as L. monocytogenes, the MLST configuration Listeria monocytogenes was chosen, and for the three isolates identified as L. innocua, the MLST configuration Listeria was chosen.
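The idea behind k-mer-based species identification can be illustrated with a toy example. The sketch below is not KmerFinder's algorithm or scoring scheme; it only shows the general principle of counting k-mers shared between a query and each reference and reporting the reference with the most hits. The function names and the short toy sequences are hypothetical; the default k of 16 is taken from the text.

```python
def kmers(seq, k=16):
    """Return the set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(query, references, k=16):
    """Score each reference by the number of k-mers it shares with the query.

    `references` maps a (hypothetical) species name to a sequence string.
    Returns the best-scoring name and the full score table.
    """
    query_kmers = kmers(query, k)
    scores = {name: len(query_kmers & kmers(ref, k))
              for name, ref in references.items()}
    return max(scores, key=scores.get), scores


# Toy usage with k = 4 so that the short example sequences produce k-mers.
toy_refs = {"species_A": "ACGTACGTTTGA", "species_B": "GGGCCCAAATTT"}
name, scores = best_match("CGTACGTT", toy_refs, k=4)
```

With real genomes the reference k-mer sets would be precomputed and indexed rather than rebuilt for every query, and ties or low scores would need explicit handling.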
Average nucleotide identity (ANI) is a measure of the nucleotide-level similarity between the genome sequences of two prokaryotic organisms. Here, the online ANI Calculator from ChunLab (Yoon et al., 2017), based on the OrthoANI algorithm, was used for pairwise comparisons of all the isolates in the dataset.
To show the relationship between the L. monocytogenes isolates, a phylogenetic tree based on SNPs was constructed using the CGE web tool CSI Phylogeny 1.4. Three reference genomes were included in the tree (Table 2). For better visualization, the result file in Newick format was uploaded to another web tool, iTOL (Letunic & Bork, 2019). The phylogenetic tree was rooted at the reference strain L. monocytogenes EGD-e.
Further on, genotypic characterization and phenotypic predictions were made on acquired antimicrobial resistance genes using

| Comparison to other published isolates by NCBI Pathogen Detection
The WGS data from each isolate was submitted to NCBI SRA.
Sequence data for pathogens submitted to SRA are regularly picked up by the NCBI Pathogen Detection project, assembled, and compared to all other assemblies in the same taxonomic group (NCBI, 2016). Isolates in the same SNP cluster differ by <50 SNPs, and within each cluster a phylogenetic tree is constructed based on a maximum compatibility algorithm (Cherry, 2017). The "Search and Highlight" function was used to find other isolates associated with salmon, fish, seafood, and food processing environments.

Rep-PCR-1 and nuclease-free water to a total volume of 25 μl.

Incorporation of ONT-compatible adapters was performed using dual-stage PCR, in which the first 3 cycles provide optimal annealing of the (GTG)5 regions and are followed by 10 further cycles: denaturation for 5 min; 3 cycles of 95°C for 30 s, 45°C for 1 min, and 62°C for 4 min; 10 cycles of 95°C for 30 s, 65°C for 1 min, and 72°C for 4 min; and final elongation at 72°C for 5 min. After Rep-PCR-2, samples were pooled using 10 μl of each sample. The pooled library was cleaned with AMPure XP beads (Beckman Coulter Genomics) in volumes of 100:50 μl, respectively. The bead pellet was washed with 80% ethanol and resuspended in 100 μl of nuclease-free water.
The pooled and bead-purified library was measured with the Qubit® dsDNA HS Assay Kit (Life Technologies), and 66 ng of the library was used as input to the End-prep step of the 1D amplicon by ligation protocol (ADE_9003_v108_revT_18Oct2016) with one adjustment: 80% ethanol instead of 70% was used for all washing steps.
The sequencing was performed using the R9.4.1 flow cell.
The phylogenetic tree based on SNPs supported the similarity of the L. monocytogenes isolates, which clustered in two different groups in perfect correlation with the MLST sequence type (Figure 1).

(Gilmour et al., 2010) and the rep26 sequence of PLGUG1 originally isolated from L. grayi (Kuenne et al., 2010)

The L. monocytogenes isolates in this study were assigned to two different SNP clusters: the group of 15 isolates was assigned to SNP Cluster PDS000032941.106 (393 isolates), while the group of two isolates was assigned to SNP Cluster PDS000025311.185 (1093 isolates).

| Species-level classification
Classification of corrected reads from the LCps of the 20 isolates identified 17 isolates as L. monocytogenes and three as L. innocua.

| Strain-level discrimination
The read length count profiles (LCps) from the sequenced Rep-PCR products showed three unique profiles among the selected isolates (Figures 4 and 5). Among the 17 L. monocytogenes isolates, two unique LCp clusters were distinguished, containing two and 15 isolates (Figure 5). No differences in LCps could be observed among the three L. innocua isolates (Figure 4).
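To make the notion of a read length count profile concrete, the sketch below builds a simple profile from a list of read lengths and compares two profiles. It is an assumption-laden illustration rather than the ON-rep-seq pipeline: the 10 bp bin size, the normalization to fractions, and the use of cosine similarity are choices made here for the example, and the function names are hypothetical.

```python
import math
from collections import Counter


def length_count_profile(read_lengths, bin_size=10):
    """Bin read lengths and return the fraction of reads falling in each bin."""
    counts = Counter((n // bin_size) * bin_size for n in read_lengths)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two profiles stored as {bin: fraction}."""
    dot = sum(v * q.get(b, 0.0) for b, v in p.items())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


# Toy read lengths for three hypothetical isolates.
isolate_a = length_count_profile([500, 505, 1200, 1203, 1209, 800])
isolate_b = length_count_profile([502, 507, 1201, 1204, 1208, 805])
isolate_c = length_count_profile([300, 305, 950])
```

Here `isolate_a` and `isolate_b` have peaks in the same length bins and would be grouped together, whereas `isolate_c` shares no bins with them. Real profiles would additionally need peak detection and tolerance for small shifts in read length.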

| DISCUSSION
Species-level and strain-level discrimination of microorganisms is essential for a food processing plant to track microbial contamination sources in the value chain. Intra-species variability exists in important characteristics such as virulence, pathogenicity, and drug resistance. Over a period of seven months, a bacterial isolate can change due to environmental conditions, and isolation and culturing can generate new SNPs (Allard et al., 2012); sequences from the same contamination source are therefore most likely not identical even though they are of the same origin (Pightling et al., 2018).
In this study, a set of 20 putative Listeria monocytogenes isolates from a salmon processing plant was identified to species level and differentiated down to strain level with ON-rep-seq, and the results were evaluated by WGS. The isolates, originally detected through routine sampling in the processing plant, were selected from different time points and sampling points in the processing facility, with a focus on two gutting machines where L. monocytogenes had repeatedly been detected. The ON-rep-seq method separated the isolates into three distinct groups with unique read length count profiles (LCps). The taxonomic classification performed on the consensus reads from each peak revealed that these groups were two different L. monocytogenes strains and one L. innocua strain. This differentiation is in agreement with our former work, where we described the relationship between unique LCps and associated strains (Krych et al., 2019). Testing novel methods on isolates from real industry cases is important, and in this study, ON-rep-seq was able to unravel differences and similarities between the isolates. Results such as the unique LCps differentiating between strains presented here will inform the quality control personnel at the processing plant that with high probability it is

(Nadon et al., 2017). In the aftermath of this, the use of WGS among food companies was discussed in an industry workshop in 2019 (Amézquita et al., 2020). One of the barriers discussed was the need to develop expertise in sequencing and bioinformatics, as well as concern about the computer infrastructure and data storage required (Amézquita et al., 2020).

FIGURE 4
LCps (read length count profiles) generated from 20 putative L. monocytogenes isolates sampled from a salmon processing plant. The curves are a function of read length and abundance, where the position of a peak on the x-axis corresponds to the length of the sequence and the height of the peak corresponds to its abundance. The four LCps at the top left are from reference strains analyzed in an earlier project (Krych et al., 2019). The two closely related strains EGD-e and LO28 have previously been shown to be indistinguishable from each other except by SNP analysis, as is the case here, where they have the same LCp. The two other strains, N53-1 and 12067, clearly show different profiles. Fifteen of the isolates analyzed in this study show the same LCp (blue) and are expected to be the same strain. Two of the isolates show an LCp (green) different from (c) but similar to each other, while three isolates show a third LCp (brown

Based on the WGS data, the isolates in this study were further characterized into sequence types (MLST). In agreement with the identical LCps from the ON-rep-seq analysis, the group of two L. monocytogenes isolates was identified as ST8 and the group of 15 L. monocytogenes isolates as ST37.
The two isolates of L. monocytogenes ST8 were originally detected in the filleting area: the first isolate, FS.171, from a salmon fillet and the second isolate, F1K2.353, in a filleting machine six months later. ST8 has earlier been linked to a multi-country outbreak of listeriosis in Denmark, Germany, and France in 2015-2018, which was due to the consumption of salmon products (EFSA, 2018). In addition, ST8 has been identified repeatedly over three years in a salmon processing plant in Denmark (Schmitz-Esser et al., 2015). In Norway, L. monocytogenes ST8 was frequently detected in one salmon slaughterhouse over 13 years (Fagerlund et al., 2016). All this demonstrates that L. monocytogenes of this ST can be persistent and can cause listeriosis. L. monocytogenes ST37 has been detected in both food products and food processing environments associated with meat, dairy, (Tomáštíková et al., 2019). It is, however, suspected to be a less persistent strain than ST8 (Muhterem-Uyar et al., 2018).
The phylogenetic analyses by both CSI Phylogeny and NCBI Pathogen Detection confirm the grouping of the isolates demonstrated by ON-rep-seq. The CSI Phylogeny SNP tree in Figure 1 indicates that the L. monocytogenes isolates cluster in two different groups in exact accordance with the ON-rep-seq LCps and the MLST sequence types. Additionally, both groups are somewhat different from the reference strain L. monocytogenes EGD-e. The three L. innocua isolates are not included in the SNP phylogenetic tree, as they are too distantly related to L. monocytogenes.
Clinical isolates of L. monocytogenes usually carry a fully functional inlA gene (Gorski et al., 2016). Different mutations in this gene can lead to premature stop codons (PMSCs) (Van Stelten et al., 2010), which have been identified in 45%-50% of food isolates analyzed (Upham et al., 2019; Van Stelten et al., 2010). Such mutations can indicate a lower pathogenic potential (Olier et al., 2005), and this gene has been suggested as a genetic marker for risk assessment (Upham et al., 2019).
In this study, all the L. monocytogenes isolates carried a full-length and predictably fully functional inlA gene, meaning that they must be considered a severe risk for human infection if they contaminate the food product.
In the analysis of pathogenicity done with PathogenFinder, all the isolates, including L. innocua, were predicted to be human pathogens. However, the prfA gene, coding for the positive regulatory factor (PrfA) of L. monocytogenes, was not present in the L. innocua isolates. This factor regulates and activates most of the known virulence genes by binding to a palindromic prfA recognition sequence located in the promoter region (Glaser et al., 2001; Greene & Freitag, 2003). This means that many of the genes involved in pathogenesis will not be expressed in these isolates even if they are present, and these isolates are therefore probably not pathogenic. The prfA gene was present in all L. monocytogenes isolates in full length and with 100% identity to the reference gene.
In this study, the isolates used for analysis were selected based on when and where they were detected in the processing plant and on their connection to the area with frequent Listeria detection. The anal-

There are exceptions, however, where white colonies were identified as L. monocytogenes and blue colonies were confirmed as L. innocua (Greenwood et al., 2005), as was also the case for some of the isolates in this study.

isolates in case of tracking and tracing of source contamination. It must be acknowledged that the more they test, the more they find, and for some processing plants this can lead to several hundred isolates a year. Performing WGS on hundreds of isolates is not feasible due to the costs, workload, data processing, and data storage needed (Amézquita et al., 2020; Jagadeesan, Gerner-Smidt, et al., 2019). Sequencing a small number of isolates in a tracing situation will be the most likely scenario, but selecting the most representative isolates for this might be a challenge. As demonstrated here, the ON-rep-seq method gives sufficient information for preliminary source tracking of pathogens in the food industry to serve as a screening method before WGS, and it can in some cases even serve as an alternative to WGS.
ON-rep-seq as a fast screening method offers much more accurate taxonomic identification than 16S rRNA gene sequencing, with simultaneous access to strain-level discrimination comparable to that obtained from WGS.

| CONCLUSION
With this study, we demonstrate that ON-rep-seq, the recently developed fingerprinting method combined with nanopore sequencing, is a promising, rapid, cost-effective, and less laborious alternative to whole-genome sequencing for species-level identification and strain-level discrimination of Listeria species.
From a set of 20 isolates, 17 L. monocytogenes and 3 L. innocua were identified and the L. monocytogenes isolates were further differentiated into two strains. The analysis done on WGS data showed the same, and no further differentiation of the isolates was obtained.
The material in this study is, however, very limited. To evaluate the discriminatory power of ON-rep-seq more thoroughly, a more diverse set of isolates will be necessary.

ACKNOWLEDGMENTS
The authors would like to thank students Kristoffer Johansen and Heidi Heejin Lee for performing parts of the practical work. This study was funded by the Norwegian University of Science and Technology (NTNU). Gunn Merethe B. Thomassen was supported by a Ph.D. grant from NTNU, as part of the OPTiMAT project.

CONFLICT OF INTEREST
None declared.

ETHICS STATEMENT
None required.

DATA AVAILABILITY STATEMENT
Sequence reads from whole-genome sequencing and ON-rep-seq are available at the NCBI repository under the BioProject number