Application of shotgun metagenomics sequencing and targeted sequence capture to detect circulating porcine viruses in the Dutch–German border region

Abstract Porcine viruses have been emerging in recent decades, threatening animal and human health, as well as the economic stability of pig farmers worldwide. Next-generation sequencing (NGS) can detect and characterize known and unknown viruses but has limited sensitivity when an unbiased approach, such as shotgun metagenomics sequencing, is used. To increase the sensitivity of NGS for virus detection, we applied and evaluated a broad viral targeted sequence capture (TSC) panel and compared it to an unbiased shotgun metagenomic approach. A cohort of 36 pooled porcine nasal swab and blood serum samples collected on both sides of the Dutch–German border region was evaluated. Overall, we detected 46 different viral species using TSC, compared to 40 viral species with the shotgun metagenomics approach. Furthermore, phylogenetic analysis of influenza A virus (FLUAV) genomes recovered from Germany revealed close similarity to a zoonotic influenza strain previously detected in the Netherlands. Although TSC introduced coverage bias within the detected viruses, it improved sensitivity, genome sequencing depth and contig length. In-depth characterization of the swine virome, coupled with the development of new enrichment techniques, can play a crucial role in the surveillance of circulating porcine viruses and emerging zoonotic pathogens.


INTRODUCTION
The emergence of new viral diseases poses a continuous threat to both animal and human health. Wildlife-borne diseases such as Lassa fever (Roberts, 2018) and West Nile fever (Vlaskamp et al., 2020), and livestock-associated diseases such as avian and swine influenza (Fraaij et al., 2016; Lam et al., 2015), have emerged in the past and caused significant epidemics or pandemics with serious repercussions. The increasing intensification of livestock farming has expanded contact not only at the human-wildlife-livestock interface but also within herds, raising the risk of transmission (Jones et al., 2013; Kwok et al., 2020).
Therefore, the surveillance of farms and the environment is critical for detecting (emerging) zoonotic infectious diseases.
Pigs are the most commonly studied farm animals as they are considered mixing vessels in the transmission of epidemic/pandemic viruses (Smith et al., 2009). The 2009 swine-origin H1N1 influenza A virus (FLUAV), which was derived from co-circulating FLUAV strains in swine, was initially transmitted to humans several months before the outbreak was identified (Smith et al., 2009). The results of several studies highlight the need for systematic surveillance of FLUAV in swine.
Additionally, these studies can provide evidence of reassortment of co-circulating viruses in swine, which can lead to the emergence of potentially pandemic viruses in humans (Nava et al., 2009). Pigs are also affected by several swine-specific viruses that can cause severe production losses, for example African swine fever virus (Taylor et al., 2020) and porcine reproductive and respiratory syndrome virus (PRRSV) (Balka et al., 2018). Lastly, characterizing and understanding the pig virome is also essential when assessing the safety of xenotransplantation (Denner, 2017).
Next-generation sequencing (NGS) has been used previously to identify and characterize viruses (Lizarazo et al., 2019). Shotgun metagenomics sequencing (SMg) refers to the untargeted sequencing of nucleic acids directly from the sample. SMg has the potential for broad-range detection, characterization and detailed taxonomic classification of pathogens, making it a promising tool within a One Health approach (Wylie et al., 2015). As such, SMg has been used to detect and characterize known and novel viruses affecting plants, humans and animals (Kwok et al., 2020; Palinski et al., 2017). Furthermore, SMg can detect co-infections and provide genomic data for epidemiological typing (Couto et al., 2018). However, because SMg is inherently nonspecific, it sequences host, environmental, pathogenic and non-pathogenic nucleic acids alike, which results in lower overall sensitivity than conventional methods such as real-time PCR (Quick et al., 2017). Sensitivity is therefore determined not only by the abundance of the target microorganisms but, even more so, by the background of host cells and other microbes (Couto et al., 2018). To improve the sensitivity of microbe detection, several pre- and post-lysis enrichment strategies have been described. Pre-lysis enrichment depends on the structural integrity of the microorganisms, as it involves targeted lysis of host cells followed by degradation of free nucleic acids (Hasan et al., 2016) and/or separation by centrifugation or filtration (Bellehumeur et al., 2015). Post-lysis enrichment steps include DNase treatment (Lizarazo et al., 2019), oligonucleotide bait probes (targeted sequence capture [TSC]) (Oba et al., 2018; Wylie et al., 2015), rRNA depletion and PCR amplicon sequencing (Quick et al., 2017).
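To make the dependence of sensitivity on host background concrete, the sketch below treats sequencing as random sampling of the library: the expected number of viral reads is the total read count multiplied by the viral fraction of the library, and a Poisson approximation gives the probability of observing at least a minimum number of viral reads. The read count, the viral fraction and the 20-fold enrichment factor are hypothetical values chosen for illustration, not measurements from this study.

```python
import math


def expected_viral_reads(total_reads: int, viral_fraction: float) -> float:
    """Expected viral reads when reads are sampled in proportion to library composition."""
    return total_reads * viral_fraction


def prob_detect(total_reads: int, viral_fraction: float, min_reads: int = 1) -> float:
    """Poisson approximation of P(at least `min_reads` viral reads are sequenced)."""
    lam = expected_viral_reads(total_reads, viral_fraction)
    # P(X >= k) = 1 - sum_{i=0}^{k-1} e^(-lam) * lam^i / i!
    p_below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(min_reads))
    return 1.0 - p_below


if __name__ == "__main__":
    reads = 1_000_000
    background_fraction = 1e-6                    # hypothetical: 1 viral molecule per million
    enriched_fraction = 20 * background_fraction  # hypothetical 20-fold enrichment
    for label, frac in [("unenriched", background_fraction), ("enriched", enriched_fraction)]:
        print(label, round(prob_detect(reads, frac, min_reads=3), 3))
```

In this simple model, raising the viral fraction of the library, which is what host depletion or probe capture aims to do, has the same effect on the expected viral read count as sequencing proportionally deeper.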
Oligonucleotide bait probes capture viral nucleic acids present in a sample by hybridization and have been reported to outperform other pre- and post-lysis enrichment methods in increasing the number of sequenced viral reads while maintaining viral diversity (Briese et al., 2015). We therefore selected viral TSC for evaluation in the sequencing of highly diverse pig samples.
Irvine, USA). Complementary DNA (cDNA) was generated using a sequence-independent single-primer amplification (SISPA) approach as described previously (Kafetzopoulou et al., 2018).

Sample collection, qPCR and nucleic acid isolation
Briefly, reverse transcription and synthesis of second-strand cDNA were performed as described (Greninger et al., 2015). Amplification of cDNA was performed as described (Kafetzopoulou et al., 2018) using Sol-Primer B (5′-GTTTCCCACTGGAGGATA-3′) and the following PCR
, keeping only contigs ≥200 bp. Assembly metrics were compared using QUAST v5 (Gurevich et al., 2013). MEGAHIT assemblies were mapped (80% identity, 80% length fraction, ignore unspecific reads) against an in-house viral database derived from complete genomes available in GenBank
CAT substitution model (Stamatakis, 2014) and rapid bootstrapping with 1000 replicates. The phylogenetic analysis was carried out on the freely available CIPRES Science Gateway v3.3 portal (www.phylo.org; Miller et al., 2012). The in silico Influenza Antiviral Resistance Risk Assessment was performed on www.fludb.org.
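The contig filtering and reference-mapping criteria above can be expressed as simple predicates. The sketch below keeps contigs of at least 200 bp and accepts a contig-to-reference hit only if at least 80% of the contig is aligned with at least 80% identity. How "ignore unspecific reads" was implemented in the software used is not described here; as an assumption, the hypothetical helper `assign_unique` treats a contig as unspecific when two or more references tie for the best passing hit and leaves it unassigned. All function names and the hit tuple layout are illustrative.

```python
from typing import Dict, Iterable, List, Optional, Tuple

MIN_CONTIG_LEN = 200        # contigs shorter than 200 bp are discarded
MIN_IDENTITY = 0.80         # at least 80% identity within the aligned region
MIN_LENGTH_FRACTION = 0.80  # at least 80% of the contig must be aligned

# (reference name, matching bases, aligned length, contig length): hypothetical layout
Hit = Tuple[str, int, int, int]


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record id: sequence}."""
    seqs: Dict[str, str] = {}
    name: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def filter_contigs(contigs: Dict[str, str], min_len: int = MIN_CONTIG_LEN) -> Dict[str, str]:
    """Keep only contigs with length >= min_len."""
    return {n: s for n, s in contigs.items() if len(s) >= min_len}


def passes_mapping(matches: int, aligned_len: int, contig_len: int) -> bool:
    """Apply the 80% identity and 80% length-fraction thresholds."""
    if aligned_len == 0 or contig_len == 0:
        return False
    identity = matches / aligned_len
    length_fraction = aligned_len / contig_len
    return identity >= MIN_IDENTITY and length_fraction >= MIN_LENGTH_FRACTION


def assign_unique(hits: List[Hit]) -> Optional[str]:
    """Best passing reference, or None if nothing passes or the best hit is tied (assumed rule)."""
    passing = [(ref, m) for ref, m, a, c in hits if passes_mapping(m, a, c)]
    if not passing:
        return None
    best = max(m for _, m in passing)
    top = [ref for ref, m in passing if m == best]
    return top[0] if len(top) == 1 else None
```

Requiring both thresholds prevents a short, highly identical fragment (high identity, low length fraction) from being counted as a confident viral hit.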

Impact of probe hybridization time on viral sensitivity
To set up an efficient viral enrichment strategy for ViroCap, hybridization times of 20 and 72 h were compared on a subset of 12 blood serum (BS) samples.

Comparison of viral sensitivity between SISPA and ViroCap
In total, 36 samples (32 BS and four nasal swab [NS] samples) were evaluated using SISPA and ViroCap to compare viral sensitivity. Using the k-mer-based online tool Taxonomer, a total of 87 viral species were detected with SISPA and 93 with ViroCap. Viruses detected within each herd and farm using read-based taxonomic classification are listed in Table S1. Additionally, ViroCap increased the overall viral read count 23.5-fold compared to the SISPA approach alone (Figure 3; Table S2). The relationship between FLUAV Ct values and the number of FLUAV reads is shown in Table S3; no significant association or correlation was found between these two parameters.
FIGURE 2 Impact of ViroCap hybridization times (20 and 72 h) on viral sensitivity compared to SISPA (n = 12 samples). The diagram highlights the most frequently detected viruses. Sequencing reads were analyzed with Taxonomer (full analysis) and normalized. Abbreviations: PERV, porcine endogenous retrovirus; PRRSV, porcine reproductive and respiratory syndrome virus; PPV, porcine parvovirus.
FIGURE 3 Viral reads (normalized) and fold changes between SISPA and ViroCap (n = 36 samples). Frequently detected viral genera in this study are shown. Values higher than 1 indicate increased sensitivity using ViroCap. Data were analyzed with Taxonomer (full analysis).
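The text does not state which statistic was used to assess the relationship between FLUAV Ct values and FLUAV read counts. As one common choice for such data, the sketch below computes a Spearman rank correlation in plain Python (average ranks for ties); it returns only the coefficient, not a p-value, and the Ct and read values in the example are hypothetical. Because a lower Ct indicates more viral RNA, a strongly negative coefficient would be expected if read counts tracked viral load.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values starting at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    ct_values = [22.1, 25.4, 28.0, 31.7, 34.2]  # hypothetical FLUAV Ct values
    fluav_reads = [300, 1200, 40, 90, 15]       # hypothetical FLUAV read counts
    print(round(spearman(ct_values, fluav_reads), 2))
```

A rank-based coefficient is used here because read counts are typically highly skewed, and ranks are insensitive to a few samples with very large counts.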
Overall, ViroCap yielded higher read counts in 16 of the 19 most abundantly detected viral genera in this study, with the largest fold change in rhadinoviruses (171.59-fold). Read counts decreased in the remaining three genera (Figure 3).
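A per-genus fold change of the kind plotted in Figure 3 can be computed as ViroCap-normalized reads divided by SISPA-normalized reads. The normalization used in the study is not specified in this section; the sketch assumes reads per million (RPM) of total sequenced reads, treats a genus absent from both libraries as unchanged (fold change 1.0), and uses hypothetical genus names and counts.

```python
import math
from typing import Dict, Tuple


def rpm(viral_reads: int, total_reads: int) -> float:
    """Reads per million: an assumed normalization for differing library sizes."""
    return viral_reads * 1e6 / total_reads


def fold_change(virocap_norm: float, sispa_norm: float) -> float:
    """ViroCap / SISPA; > 1 means more reads after capture.
    Returns inf if only ViroCap detected the genus, 1.0 if neither did."""
    if sispa_norm == 0:
        return math.inf if virocap_norm > 0 else 1.0
    return virocap_norm / sispa_norm


def summarize(counts: Dict[str, Tuple[int, int]],
              total_sispa: int, total_virocap: int) -> Dict[str, float]:
    """counts maps genus -> (SISPA reads, ViroCap reads); returns genus -> fold change."""
    return {
        genus: fold_change(rpm(vc, total_virocap), rpm(sp, total_sispa))
        for genus, (sp, vc) in counts.items()
    }


if __name__ == "__main__":
    counts = {  # hypothetical read counts per genus: (SISPA, ViroCap)
        "GenusA": (50, 500),
        "GenusB": (10, 30),
        "GenusC": (200, 100),
    }
    fc = summarize(counts, total_sispa=2_000_000, total_virocap=1_000_000)
    increased = [g for g, v in fc.items() if v > 1]
    print(fc, "increased in", len(increased), "of", len(fc), "genera")
```

Normalizing before dividing matters because the paired SISPA and ViroCap aliquots need not be sequenced to the same depth; raw read ratios would otherwise mix enrichment with differences in library size.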

De novo assembly of SISPA and ViroCap sequencing reads

Contig-based detection of clinically relevant pathogens
In Figure S1). Viruses detected in pigs with respiratory symptoms are listed in Table S5 (NS) and Table S6 (BS), and viruses detected in animals without symptoms are listed in Table S7 (BS). Additional complete or near-complete viral contigs obtained in this study are shown in Table S8.

PRRSV genome coverage
PRRSV was the most frequently detected virus after assembly, with 26 and 24 samples generating contigs with the ViroCap and SISPA approaches, respectively. Given its high abundance and clinical significance, PRRSV was evaluated in more detail. MEGAHIT assemblies were used because they produced the longest contigs (Figure 4). ViroCap increased both the number of reads and the average coverage in all 26 samples. However, although ViroCap generated more PRRSV reads, the resulting contigs were only slightly longer than those obtained by SISPA, indicating a coverage bias: the additional reads appear to have been concentrated in genome regions that were already covered rather than extending coverage to new regions. Figure 5a
TABLE 2 Overview of detected viruses (contig level) in the respective sample material and associated symptoms/pathology: red (blood serum), green (nasal swab) and blue (blood serum and nasal swab) (n = 36).
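Coverage bias of this kind can be quantified from per-position depth along a reference genome: capture may raise the mean depth many-fold while breadth (the fraction of positions covered) and the longest continuously covered stretch, which roughly bounds achievable contig length, barely change. The sketch below computes these summaries plus a coefficient of variation as an evenness measure; the depth profiles are hypothetical ten-position toy genomes, not PRRSV data.

```python
import statistics
from typing import Dict, Sequence


def longest_covered_run(depth: Sequence[int], min_depth: int = 1) -> int:
    """Length of the longest stretch of consecutive positions with depth >= min_depth."""
    best = current = 0
    for d in depth:
        current = current + 1 if d >= min_depth else 0
        best = max(best, current)
    return best


def coverage_summary(depth: Sequence[int], min_depth: int = 1) -> Dict[str, float]:
    n = len(depth)
    mean = sum(depth) / n
    return {
        "mean_depth": mean,
        "breadth": sum(1 for d in depth if d >= min_depth) / n,
        # coefficient of variation: higher values mean less even coverage
        "cv": statistics.pstdev(depth) / mean if mean else 0.0,
        "longest_run": longest_covered_run(depth, min_depth),
    }


if __name__ == "__main__":
    sispa = [2, 2, 2, 2, 2, 0, 0, 0, 0, 0]        # hypothetical depth profile
    virocap = [40, 40, 40, 40, 40, 1, 0, 0, 0, 0]  # hypothetical: ~20x deeper, same region
    for name, profile in [("SISPA", sispa), ("ViroCap", virocap)]:
        print(name, coverage_summary(profile))
```

In the toy profiles, mean depth rises about 20-fold while breadth and the longest covered run increase only slightly, mirroring the pattern of more reads but only slightly longer contigs.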

Phylogenetic analysis of PRRSV and FLUAV
ViroCap increased the sequencing depth of two clinically and economically significant viruses, PRRSV and FLUAV. In the following two case studies, we used high-quality MEGAHIT assemblies generated through ViroCap for epidemiologic analysis.
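The methods mention rapid bootstrapping with 1000 replicates. The idea behind bootstrap support is to resample alignment columns with replacement and count how often a result is recovered. The sketch below applies that resampling to a much simpler question than a full tree, namely which reference is closest to a query by uncorrected p-distance; RAxML's rapid bootstrap infers a complete maximum-likelihood tree for each replicate, which this code does not do. The sequences and the names `ref_NL` and `ref_DK` are hypothetical toy data.

```python
import random
from typing import Dict, List, Tuple


def p_distance(a: str, b: str, cols: List[int]) -> float:
    """Proportion of differing sites over the given alignment columns; gap columns are skipped."""
    compared = diffs = 0
    for i in cols:
        x, y = a[i], b[i]
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else float("inf")


def nearest(query: str, refs: Dict[str, str], cols: List[int]) -> str:
    """Reference with the smallest p-distance (the first one wins on ties)."""
    return min(refs, key=lambda name: p_distance(query, refs[name], cols))


def bootstrap_support(query: str, refs: Dict[str, str],
                      replicates: int = 1000, seed: int = 1) -> Tuple[str, float]:
    """Fraction of column-resampled replicates that recover the observed nearest reference."""
    length = len(query)
    observed = nearest(query, refs, list(range(length)))
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        # resample `length` alignment columns with replacement
        cols = [rng.randrange(length) for _ in range(length)]
        if nearest(query, refs, cols) == observed:
            hits += 1
    return observed, hits / replicates


if __name__ == "__main__":
    query = "ACGTACGTAC"
    refs = {"ref_NL": "ACGTACGTAA", "ref_DK": "TTGTACCTAA"}  # hypothetical aligned sequences
    print(bootstrap_support(query, refs))
```

Support near 1.0 means the grouping does not depend on a few particular columns; low support means that resampling the sites frequently changes which reference appears closest.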

DISCUSSION
The European Union (EU) is the world's second-biggest producer of pork after China and the biggest exporter of pork products (https://ec.europa.eu/info/food-farming-fisheries/animals-and-animal-products/animal-products/pork_en). The major production basin extends from Germany (specifically Nordrhein-Westfalen and Niedersachsen) to Belgium (Vlaams Gewest) and accounts for 30% of EU pigs (https://ec.europa.eu/eurostat/statistics-explained/pdfscache/3688.pdf). The large livestock population and density in areas such as these can facilitate disease transmission within herds and between livestock and humans (Kwok et al., 2020). Therefore, surveillance of livestock and the surrounding environment is central to the early detection of potential epidemic or pandemic pathogens of human and animal significance.
Rapid technological advances and the growing availability of NGS platforms are improving viral diagnostics, surveillance and transmission analysis directly from sample material. However, several wet-lab and computational (e-lab) hurdles remain, and sensitivity has been identified as the most pressing wet-lab issue (Greninger, 2018). Pre-lysis enrichment to increase sensitivity relies on the structural integrity of microorganisms (Hasan et al., 2016), which in practice requires fresh specimens; however, these are not always available or practical to obtain. We therefore compared a post-lysis enrichment technique, ViroCap, with shotgun metagenomics (with only a simple DNase treatment) to determine its ability to detect and characterize the pig virome.
To determine the impact of ViroCap on sensitivity, we used paired aliquots from the same sequencing library before and after ViroCap enrichment. ViroCap increased the number of viral reads significantly and improved virus detection at both the read and contig level. The increased sequencing depth of viral contigs improved single-nucleotide resolution for phylogenetic and antiviral resistance analyses. However, the larger number of viral reads obtained with ViroCap did not always result in longer viral contigs. Coverage bias of TSC methods has been reported previously (Naccache et al., 2016), as has the inability of capture probes to consistently yield whole genomes: probes can be less efficient at low viral abundance, and multiplexed TSC approaches are biased towards viruses present at high loads (Quick et al., 2017; Naccache et al., 2016). The use of short-read sequencing (2 × 76 bp) may also have contributed to shorter assemblies. Combining long-read sequencing platforms with ViroCap might reduce taxonomic misassignments in the future (Schuele et al., 2020). Targeted PCR amplification has been shown to yield whole genomes more consistently, but it depends on primers matching their targets and is therefore primarily suitable in outbreak scenarios such as Ebola (Deng et al., 2020) and SARS-CoV-2 (Meredith et al., 2020).
Read-based taxonomic approaches were prone to misassignments between closely related viruses, such as bat adenovirus and equine adenovirus. Viruses with high genetic diversity and frequent recombination, such as porcine astroviruses, were also misassigned. A contig-based approach improved taxonomic assignment but reduced sensitivity. An evaluation of different assemblers revealed that SPAdes yielded the highest number of viral contigs, whereas MEGAHIT yielded the longest contigs. Indeed, MEGAHIT was recently deemed one of the leading choices for metagenome assembly in the Critical Assessment of Metagenome Interpretation (CAMI) challenge (Sczyrba et al., 2017).
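Taxonomer is described earlier as a k-mer-based tool. The toy sketch below shows, in a deliberately simplified form, why k-mer matching struggles with closely related genomes: a read drawn from a region that two references share receives identical k-mer scores for both and cannot be assigned, whereas a read spanning a distinguishing variant can. This is not Taxonomer's actual algorithm; the reference names, sequences and k = 5 are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_read(read: str, refs: Dict[str, str],
                  k: int = 5) -> Tuple[Optional[str], Dict[str, int]]:
    """Assign a read to the reference sharing the most k-mers; None if tied or nothing is shared."""
    read_kmers = kmers(read, k)
    scores = {name: len(read_kmers & kmers(seq, k)) for name, seq in refs.items()}
    best = max(scores.values())
    winners = [name for name, s in scores.items() if s == best]
    label = winners[0] if best > 0 and len(winners) == 1 else None
    return label, scores


if __name__ == "__main__":
    refs = {
        "virus_A": "ATGGCGTACGTTAGCCA",  # hypothetical reference
        "virus_B": "ATGGCGTACGTTAGGCA",  # differs from virus_A at a single position
    }
    for read in ["GGCGTACGTT", "CGTTAGCCA"]:
        print(read, classify_read(read, refs))
```

The first read lies entirely in the shared region and stays unassigned; the second spans the single differing base and is assigned. Assembling reads into longer contigs raises the chance of spanning such variants, which is consistent with the contig-based approach improving taxonomic assignment.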
Important respiratory swine pathogens that were detected included PRRSV, FLUAV and porcine astrovirus (PoAstV). PoAstV genotypes 2–5 have been reported in pigs with diarrhoea or respiratory symptoms as well as in asymptomatic pigs, and co-infections with different genotypes are frequently reported (Lv et al., 2019). Astroviruses show wide genetic diversity in humans and animals, suggesting that they may be able to cross species barriers (Fischer et al., 2017). Several pig pathogens that can cause gastroenteric symptoms were also detected in NS samples from the same farm, such as swine norovirus, porcine kobuviruses, porcine sapelovirus and rotavirus. Curiously, diarrhoea was never listed as a symptom; therefore, the relevance of these viruses within these herds is unclear. Rotavirus was detected in two samples. A previous study of rotaviruses revealed potential transmission events between humans and pigs. However, to determine the zoonotic potential of this finding, more samples from both pigs and humans in the area would need to be screened. Nervous system-related viruses that were detected included bocaviruses, mamastrovirus 2 and 3, and porcine pestivirus 1. These viruses were frequently co-detected with PRRSV; however, the significance of this association remains to be determined.
To better understand the potential of metagenomics for clinical and public health, we studied two viruses in more detail, namely PRRSV and FLUAV.
Although we did not detect mixed clusters in the farms, the movement of piglets through trade is known to serve as a transmission route for PRRSV (Hanada et al., 2005).
Denmark is the leading exporter of piglets in the EU, exporting mainly to Poland and the Netherlands; the Netherlands, in turn, exports pigs mainly to Germany for slaughter (https://ec.europa.eu/eurostat/statistics-explained/pdfscache/3688.pdf). These intra-EU exchanges are reflected in the FLUAV tree, in which four closely related FLUAV strains from one German farm clustered together with strains from the Netherlands and Denmark. Interestingly, their closest neighbour was a strain from the Netherlands that was reported to have caused a severe acute respiratory infection in a child (Fraaij et al., 2016). At the time, the case was considered incidental and rare. However, the presence of these strains in pigs should be monitored continuously, as mutations (genetic drift) can accumulate and may give rise to strains with the potential to cause human epidemics or even pandemics (Nava et al., 2009). Zoonotic infections with swine influenza A H1avN1 were reported in Germany in 2020 (Dürrwald et al., 2020). Thus, generating (nearly) complete viral genomes directly from sample material could reveal strains that may have acquired antigenic changes increasing their zoonotic potential (Dürrwald et al., 2020). Although the infectivity of a particular viral strain does not determine the susceptibility of the host, complete viral genomes can support the in silico prediction of enhanced human receptor binding and specificity, which can then be tested experimentally in cells expressing human receptors (Schmier et al., 2015).
Limitations of this study include the pre-selection of farms based on their ability to enable the long-term monitoring of FLUAV, PRRSV and
Within our sample cohort, SPAdes was the best choice for detecting viruses, whereas MEGAHIT yielded the longest contigs. Understanding the swine virome and the potential zoonotic pathogens present within these crucial mixing vessels will improve preparedness for livestock disease outbreaks and for subsequent transmission to humans.

ACKNOWLEDGEMENTS
We

DATA AVAILABILITY STATEMENT
All sequencing data have been deposited in the Sequence Read Archive (SRA) under BioProject number PRJNA701157.

CONFLICT OF INTERESTS
John W. A. Rossen is employed by IDbyDNA. Silke Peter consults for IDbyDNA. This did not influence the interpretation of the data, the conclusions drawn or the drafting of the manuscript, and no support was obtained from IDbyDNA. All other authors declare no conflict of interest.

ETHICS
The sampling within the Food Protects project has been classified as an