Metabarcoding using nanopore long‐read sequencing for the unbiased characterization of apicomplexan haemoparasites

Apicomplexan haemoparasites generate significant morbidity and mortality in humans and other animals, particularly in many low‐to‐middle income countries. Malaria caused by Plasmodium remains responsible for some of the highest numbers of annual deaths of any human pathogen, whilst piroplasmids, such as Babesia and Theileria, can have immense negative economic effects through livestock loss. Diagnosing haemoparasites via traditional methods like microscopy is challenging due to low‐level and transient parasitaemia. PCR‐based diagnostics overcome these limitations by being both highly sensitive and specific, but they may be unable to accurately detect coinfections or identify novel species. In contrast, next‐generation sequencing (NGS)‐based methods can characterize all pathogens from a group of interest concurrently, although the short‐read platforms previously used have been limited in the taxonomic resolution achievable. Here, we used Oxford Nanopore Technologies' (ONT) long‐read MinION™ sequencer to conduct apicomplexan haemoparasite metabarcoding by sequencing the near full‐length 18S ribosomal RNA gene, demonstrating its ability to detect Babesia, Hepatozoon, Neospora, Plasmodium, Theileria and Toxoplasma species. This method was tested on blood‐extracted DNA from 100 dogs and the results benchmarked against qPCR and Illumina‐based metabarcoding. For two common haemoparasites, nanopore sequencing performed as well as qPCR (kappa agreement statistics ≥ 0.98), whilst also detecting one pathogen, Hepatozoon felis, missed by the other techniques. The long reads obtained by nanopore sequencing provide improved species‐level taxonomic resolution, whilst the method's broad applicability means it can be used to explore apicomplexan communities from diverse mammalian hosts, on a portable sequencer that easily permits adaptation to field use.

million deaths reported annually from as recently as 2010 (Murray et al., 2012). Many apicomplexan pathogens also afflict animals, including various species of Babesia and Theileria that cause severe disease in cattle, which in turn can generate devastating economic impacts through livestock morbidity and mortality, perpetuating cycles of poverty in many parts of the world (Bishop et al., 2004; Mans et al., 2015; Ozubek et al., 2020; Suarez & Noh, 2011).
The most important blood-borne apicomplexan parasites are vectorially transmitted by haematophagous arthropods, such as ticks and mosquitoes (Mans et al., 2015; Penzhorn, 2020; Sato, 2021). Among animals that live in close association with humans, dogs are an ideal species in which to investigate tools for exploring apicomplexan haemoparasite diversity, as they host a diverse array of species, including tick-transmitted Babesia that cause varying degrees of haemolytic anaemia and can prove fatal. The 'large' (Babesia canis, Babesia vogeli, Babesia rossi) and 'small' (Babesia vulpes, Babesia gibsoni) dog-infecting Babesia not only vary in geographical distribution and pathogenicity but also respond differently to specific treatments, making their accurate detection and diagnosis imperative (Davitkov et al., 2015; Huggins et al., 2023; Karasová et al., 2022; Penzhorn, 2020).
Haemoparasites of the genus Hepatozoon are also important canine pathogens, for example, Hepatozoon canis, which is found throughout the tropics and can generate clinical signs that overlap with those caused by Babesia (Baneth et al., 2003; Baneth & Allen, 2022). Many apicomplexan haemoparasites are also zoonotic, for example, Babesia microti and Babesia divergens. Whilst canines have not been confirmed as viable hosts for these Babesia, several reports in the literature implicate pet dogs in human babesiosis cases (El-Bahnasawy et al., 2011; Giudice et al., 2015; Solano-Gallego et al., 2016; Yabsley & Shock, 2013; Youngid et al., 2019; Zhou et al., 2014).
For the diagnosis of apicomplexans, such as tick-transmitted piroplasmids, microscopy on blood smears is still considered the 'gold standard' method (Salih et al., 2015). Nonetheless, the sensitivity of microscopy is low and morphological identification of closely related species, for example, within the 'large' and 'small' canine Babesia, may be difficult-to-impossible to achieve (Müller et al., 2010; Salih et al., 2015; Troskie et al., 2019). Serodiagnostic methods circumvent some of these issues; however, they also have drawbacks, such as an inability to discern past from current infections (Kaewkong et al., 2014; Kumar et al., 2022; Salih et al., 2015).
Diagnostic tools that use next-generation sequencing (NGS) bypass many of the issues that classical molecular methods face, as they can deep-sequence conserved genetic barcodes from a target group of organisms simultaneously, generating a 'metabarcode' of all the pathogens present (Huggins et al., 2019a, 2019b; Wahab et al., 2020).
Given the substantial impact haemoparasites have on the health of animals and humans, we set out to develop and validate a novel nanopore sequencing-based metabarcoding method for the holistic characterization of blood-borne apicomplexan pathogens on ONT's USB-sized MinION™ Mk1B sequencer. The proposed value of such a screening tool would be to explore apicomplexan haemoparasite diversity in regions of the world and/or host species with limited-to-no prior relevant data, where traditional molecular methods may be inadequate for detecting such diversity.
The diagnostic parameters of the developed nanopore sequencing method were validated using 100 canine blood samples from Cambodia, previously screened and found positive or negative for blood-borne apicomplexan haemoparasites using both a short-read NGS method (Illumina) and a multiplex qPCR assay (Huggins, Colella, et al., 2021; Huggins, Massetti, et al., 2021). Additionally, the developed method was tested on samples collected from diverse species and sample types, demonstrating a potential broad applicability to a wide range of different hosts and host-parasite systems.

| Sampling and DNA extraction
To validate the developed nanopore sequencing method, this study utilized a subset of 100 DNA samples sourced from whole blood collected from community dogs in Cambodia as part of a previous study (Huggins, Colella, et al., 2021) under The University of Melbourne Animal Ethics Committee Permit: 1814620.1.
In brief, following owner consent, dogs were restrained and whole blood was collected via venepuncture into ethylenediaminetetraacetic acid (EDTA) tubes. Samples were temporarily kept at 4°C in the field before being couriered at −20°C to the University of Melbourne, Australia. Two hundred microlitres of thawed whole blood was extracted with the DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany) using the manufacturer's protocol with a 30-min proteinase K digestion at 56°C and two final elution steps in 50 μL total eluent. Extracted DNA was kept at −20°C until use. Haemoparasite-positive control samples, listed in Table 1, that were used for validation of the nanopore method had their DNA extracted in a similar way.
All DNA extracts were quantified on a Qubit™ 4 Fluorometer (Thermo Fisher Scientific) using the dsDNA HS assay kit.

| Design and selection of apicomplexan 18S rRNA gene primers
To identify an ideal primer pair capable of amplifying 18S rRNA gene sequences from as wide a range of apicomplexan pathogens as possible, 18S rRNA sequences were imported into Geneious Prime® version 2023.0.4 (Geneious) and aligned (see Appendix S1 for species and GenBank accession numbers used). Primers from the literature designed for amplification of the apicomplexan 18S rRNA gene were added to the alignment, and novel primers were also developed using the online tool Primer3web version 4.1.0. Optimal primer pairs were selected on the basis that: their amplicon was large (> 1000 bp); their primer sequences could not bind to the 18S rRNA gene sequence of a range of host mammals (Appendix S1); their primer binding sites were conserved and therefore not reliant on substantial degeneracy in their sequences; and the primer melting temperatures (Tm) were between 60 and 64°C, with no more than 2°C Tm difference between the two primers. Primer pairs were also tested for possible coamplification of non-target sequences by running them through the University of California, Santa Cruz's in silico PCR (https://genome.ucsc.edu/cgi-bin/hgPcr), which found no off-target amplification against relevant reference genomes. Additionally, sequences were compared against the full NCBI database using blastn to shed further light on potential off-target amplification, with primers not selected if there was a high chance that off-target organisms could be found in mammalian blood.
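The numeric selection criteria above can be expressed as a simple filter. The sketch below is a minimal illustration rather than the authors' workflow: it assumes Tm values have already been computed (for example by Primer3), applies a hypothetical cap (MAX_DEGENERATE) on degenerate positions, and does not model the host-binding, in silico PCR or blastn checks. The PrimerPair class, the function names and the Tm values in the example are hypothetical; only the primer sequences and amplicon size come from the text.

```python
from dataclasses import dataclass

# IUPAC ambiguity codes counted as "degenerate" positions in a primer.
IUPAC_DEGENERATE = set("RYSWKMBDHVN")
# Hypothetical upper limit on degenerate positions; not a value from the study.
MAX_DEGENERATE = 2


@dataclass
class PrimerPair:
    name: str
    forward: str
    reverse: str
    tm_forward: float  # assumed precomputed, e.g. by Primer3
    tm_reverse: float
    amplicon_bp: int


def degenerate_positions(seq: str) -> int:
    """Count IUPAC ambiguity codes in a primer sequence."""
    return sum(1 for base in seq.upper() if base in IUPAC_DEGENERATE)


def passes_criteria(pair: PrimerPair) -> bool:
    """Apply the stated amplicon-size and Tm criteria plus a degeneracy cap."""
    tms = (pair.tm_forward, pair.tm_reverse)
    tm_in_range = all(60.0 <= tm <= 64.0 for tm in tms)
    tm_matched = abs(pair.tm_forward - pair.tm_reverse) <= 2.0
    long_amplicon = pair.amplicon_bp > 1000
    low_degeneracy = all(
        degenerate_positions(p) <= MAX_DEGENERATE for p in (pair.forward, pair.reverse)
    )
    return tm_in_range and tm_matched and long_amplicon and low_degeneracy


# Illustrative check: primer sequences from the text, Tm values hypothetical.
candidate = PrimerPair(
    name="APICO_F_Mod/Api_Illum_New-Rev",
    forward="GCCAGTAGTCATATGCTTGTCT",
    reverse="CTGTTATTGCCTYAAACTTCCTYG",
    tm_forward=61.5,
    tm_reverse=62.3,
    amplicon_bp=1450,
)
print(passes_criteria(candidate))  # True
```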
As a result of this selection process, five putative primer pairs were selected and first tested using conventional PCR (cPCR) on three Babesia spp., two Hepatozoon spp. and two apicomplexan-negative control samples, with the results visualized on 1.5% agarose gels stained with GelRed® (Biotium) using a ChemiDoc™ Imaging System (Bio-Rad). Taking these criteria into consideration, the primer pair chosen for use within the nanopore sequencing method was APICO_F_Mod: 5′-GCCAGTAGTCATATGCTTGTCT-3′, a primer modified from Pinheiro et al. (2020), and Api_Illum_New-Rev: 5′-CTGTTATTGCCTYAAACTTCCTYG-3′, a primer modified from Huggins et al. (2019a). This was the best primer pair at amplifying all apicomplexan haemoparasite controls, producing a single clear band of ~1450 bp with no cross-reaction on apicomplexan-negative controls.
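The reverse primer Api_Illum_New-Rev contains two Y positions (IUPAC code for C or T), so in practice it is a mixture of four distinct oligonucleotides. The short sketch below, using standard IUPAC codes and a hypothetical helper name, enumerates the concrete sequences encoded by a degenerate primer.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list:
    """Return every concrete oligonucleotide encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


variants = expand_degenerate("CTGTTATTGCCTYAAACTTCCTYG")
print(len(variants))  # 4
for variant in variants:
    print(variant)
```

Each added degenerate position multiplies the number of variants, which is one reason conserved, low-degeneracy binding sites were preferred.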

| Nanopore sequencing for apicomplexan 18S rRNA gene metabarcoding
The PCR Barcoding Expansion 1-96 (EXP-PBC096) with ONT's Ligation Sequencing Kit (SQK-LSK110) was used for library preparation for metabarcoding of the apicomplexan 18S rRNA gene on the MinION Mk1B sequencer (Oxford Nanopore Technologies). The protocol followed was 'Ligation sequencing amplicons - PCR barcoding (SQK-LSK110 with EXP-PBC096)' version: PBAC96_9114_v110_revK_10Nov2020, with some modifications to improve yield. Separate laboratory spaces were used for DNA extractions, processing of positive controls, first-round PCR and final deep-sequencing library preparation. The full library preparation protocol is available at protocols.io (dx.doi.org/10.17504/protocols.io.6qpvr4nqpgmk/v1) and in Appendix S2.
The final amount of sequencing library loaded was always between 20 and 50 fmol, that is, within the 5-50 fmol range recommended for loading onto R9.4.1 flow cells.
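Because library loading is specified in fmol while Qubit quantification gives a mass concentration, a conversion is needed. The sketch below assumes an average of 660 g/mol per base pair of double-stranded DNA (the same constant used in the copy-number formula later in the methods) and, for simplicity, ignores the extra length added by barcodes and sequencing adapters; the function names are hypothetical.

```python
# Average mass of one base pair of double-stranded DNA (g/mol); the same
# 660 g/mol constant appears in the copy-number formula used later.
BP_MASS_G_PER_MOL = 660.0


def ng_to_fmol(ng: float, length_bp: int) -> float:
    """Mass (ng) to molar amount (fmol): ng * 1e-9 g / (660 * bp) mol * 1e15 fmol/mol."""
    return ng * 1e6 / (BP_MASS_G_PER_MOL * length_bp)


def fmol_to_ng(fmol: float, length_bp: int) -> float:
    """Molar amount (fmol) to mass (ng), the inverse of ng_to_fmol."""
    return fmol * BP_MASS_G_PER_MOL * length_bp / 1e6


# Mass window for a ~1450 bp library (adapter/barcode length ignored)
low, high = fmol_to_ng(20, 1450), fmol_to_ng(50, 1450)
print(f"{low:.0f}-{high:.0f} ng")  # 19-48 ng
```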
For assay development and optimisation, samples were run in batches of 16, whilst for methodological comparison of the nanopore method using the 100 samples from Cambodian dogs, samples were run as a batch of 96 and a batch of 12. Batches for methodological comparisons were run with four no-template PCR negative controls, that is, nuclease-free water, and four positive controls comprising a uniquely identifiable 1486 bp gBlock synthetic DNA strand (Integrated DNA Technologies) of the 16S rRNA gene from Aliivibrio fischeri, as previously reported in Huggins et al. (2020). This positive control gBlock was an artificial construct consisting of the relevant 16S rRNA gene sequence flanked by the appropriate primer binding regions for the APICO_F_Mod and Api_Illum_New-Rev primers; its design can be seen in Appendix S3.
The two batches of amplicons used for methodological benchmarking and comparison were each run on new R9.4.1 flow cells (Oxford Nanopore Technologies), whilst smaller batches for assay optimisation and testing of positive controls were run on re-used R9.4.1 flow cells. Any re-used flow cells underwent a DNase clean-up using the EXP-WSH004 Flow Cell Wash Kit (Oxford Nanopore Technologies) to reduce the possibility of DNA contamination and carry-over from prior sequencing runs.
To ensure detection of a wide range of apicomplexan haemoparasites and pathogens of veterinary and human health importance, a diverse spectrum of positive control samples were sourced and tested. These samples had previously been characterized using cPCR and Sanger sequencing and were typically blood- or tissue-extracted DNA from host animals (Table 1).

Table 1 note: Classification information shows the top hit(s), identity and length of the relevant apicomplexan pathogen consensus sequence as generated by the NanoCLUST pipeline, when classified using blastn on our bespoke apicomplexan 18S rRNA gene database. Note that relevant NCBI accession numbers were taken from GenBank as they are not automatically outputted by NanoCLUST. Asterisk marks a species that has now been reclassified as Babesia vulpes (Baneth et al., 2015, 2019).

| Bioinformatics
The bioinformatic pipeline NanoCLUST was used as it corrects for the error rate of the ONT R9.4.1 flow cells used by constructing accurate 18S rRNA gene consensus sequences (Delahaye & Nicolas, 2021; Rodríguez-Pérez et al., 2021). Overall, the pipeline conducts multiple quality control, read clustering, polishing and consensus-forming steps, followed by classification of consensus sequences using blastn against a database of the user's choice (Rodríguez-Pérez et al., 2021). Specifically, NanoCLUST uses fastp (Chen et al., 2018) et al., 2017) and Medaka (https://github.com/nanoporetech/medaka) for polishing and finalized by taxonomic classification using blastn (Altschul et al., 1990). NanoCLUST has already demonstrated high accuracy in forming 16S rRNA gene consensus sequences for bacterial microbiome sequencing, returning consensus sequences that were 99.7%-100% identical to the relevant representative sequence from GenBank (Huggins et al., 2022; Rodríguez-Pérez et al., 2021).
To create an apicomplexan database from which NanoCLUST could correctly assign taxonomic classifications, all NCBI's GenBank

| Comparison of molecular assays
To benchmark the performance of the newly developed nanopore sequencing protocol for apicomplexan haemoparasite detection, its results were compared against those obtained by two other molecular methods. Blood samples from 100 Cambodian dogs, known to include a variety of apicomplexan haemoparasite-positive and -negative samples, were analysed using the developed nanopore method. These same samples were also tested using a previously validated multiplex qPCR for three canine apicomplexan haemoparasites, B. gibsoni, B. vogeli and H. canis (Huggins, Massetti, et al., 2021), and by an Illumina deep-sequencing-based metabarcoding method for apicomplexan pathogen detection (Huggins et al., 2019a). Both apicomplexan-positive and -negative samples were tested to provide insight into both the relative sensitivity and the specificity of the nanopore protocol, with testing of negative samples indicating whether the nanopore method could generate false positive results.
The qPCR and Illumina NGS methods followed the protocols detailed in the relevant publications (Huggins et al., 2019a; Huggins, Massetti, et al., 2021). Specifically, QIIME2 version 2020.11 (Bolyen et al., 2019) was used for processing of Illumina data, with cutadapt (Marcel, 2011) used to remove primer and adapter sequences before upload into the QIIME2 environment.
To generate an estimate for the limit of detection of the novel nanopore sequencing method, results were compared to those obtained by the apicomplexan multiplex qPCR. Serial dilutions of gBlock synthetic DNA targets (IDT) for the pathogens B. vogeli and H. canis targeted by the qPCR were produced as described by Huggins, Massetti, et al. (2021). From these, standard curves could be made, permitting the conversion of Cq values into an estimated DNA concentration for a given pathogen using the equation of the line of best fit (Huggins et al., 2022). Therefore, samples found positive to B. vogeli or H. canis by both qPCR and nanopore sequencing had their Cq value converted into a DNA concentration and then a copy number via the formula:

$$\text{Number of copies} = \frac{\text{Amount of DNA (ng)} \times 6.022 \times 10^{23}}{\text{Amplicon length (bp)} \times 10^{9} \times 660}$$
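The standard-curve step and the copy-number formula can be sketched as follows. This is a hypothetical illustration rather than the authors' analysis: it assumes a log-linear standard curve, Cq = slope × log10(ng) + intercept, fitted by ordinary least squares, and takes the amplicon length to be that of the quantified target. The dilution-series values and function names are invented for the example.

```python
import math

AVOGADRO = 6.022e23
BP_MASS = 660  # g/mol per base pair, as in the copy-number formula


def fit_standard_curve(ng_values, cq_values):
    """Ordinary least-squares fit of Cq = slope * log10(ng) + intercept."""
    xs = [math.log10(v) for v in ng_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cq_to_ng(cq, slope, intercept):
    """Invert the standard curve to estimate template DNA (ng) from a Cq."""
    return 10 ** ((cq - intercept) / slope)


def ng_to_copies(ng, amplicon_bp):
    """Number of copies = ng * 6.022e23 / (amplicon length * 1e9 * 660)."""
    return ng * AVOGADRO / (amplicon_bp * 1e9 * BP_MASS)


# Hypothetical 10-fold dilution series (illustrative values only)
dilutions_ng = [1e-1, 1e-2, 1e-3, 1e-4]
cqs = [18.3, 21.6, 24.9, 28.2]
slope, intercept = fit_standard_curve(dilutions_ng, cqs)
estimated_ng = cq_to_ng(30.0, slope, intercept)
print(f"{ng_to_copies(estimated_ng, 500):.0f} copies")
```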
By selecting the highest qPCR Cq value obtained for a sample positive to B. vogeli or H. canis that was also positive by nanopore sequencing, the lowest number of 18S rRNA gene copies detected for either of these pathogens by the novel method was calculated, as previously done by the authors (Huggins et al., 2022).
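The selection step amounts to taking the maximum Cq among samples positive by both methods; samples positive by qPCR alone are excluded even if their Cq is higher. The sketch below uses a hypothetical data layout (sample ID mapped to a Cq, or None if qPCR-negative, plus a nanopore-positive flag) and invented values.

```python
from typing import Dict, Optional, Tuple

# sample_id -> (qPCR Cq, or None if qPCR-negative; nanopore-positive flag)
Results = Dict[str, Tuple[Optional[float], bool]]


def highest_concordant_cq(results: Results) -> Tuple[str, float]:
    """Return the sample with the highest Cq among samples positive by both
    qPCR and nanopore sequencing (i.e. the least template detected by both)."""
    concordant = {
        sample: cq
        for sample, (cq, nanopore_positive) in results.items()
        if cq is not None and nanopore_positive
    }
    if not concordant:
        raise ValueError("no samples are positive by both methods")
    sample = max(concordant, key=concordant.get)
    return sample, concordant[sample]


# Hypothetical H. canis results, not study data
demo = {
    "dog01": (24.1, True),
    "dog02": (33.8, True),
    "dog03": (36.2, False),  # qPCR-positive only: excluded
    "dog04": (None, False),
}
print(highest_concordant_cq(demo))  # ('dog02', 33.8)
```

The returned Cq would then be converted to a copy number with the standard curve and formula above.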
Kappa statistics were used to compare agreement of nanopore sequencing positivity for B. vogeli and H. canis against the results from the multiplex qPCR (Huggins, Massetti, et al., 2021) and Illumina-sequencing results (Huggins, Colella, et al., 2021).

The substantial loss of raw reads that occurred when using NanoCLUST was due to the pipeline parameters chosen: a stringent minimum read length removed short (<1000 bp) sequences arising both from sheared or truncated haemoparasite 18S rRNA gene amplicons and from co-amplification of non-target sequences, whilst a stringent maximum read length excluded sequences (>1900 bp) substantially longer than the full-length 18S rRNA gene target, such as chimeric molecules formed by PCR during library preparation.
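The read-length window described above can be sketched as a standalone filter. This is a hypothetical illustration only: in the study the filtering was performed within NanoCLUST using its own parameters, whereas the sketch assumes plain four-line FASTQ records and treats both boundaries as inclusive.

```python
import io

MIN_LEN = 1000  # shorter reads: sheared/truncated amplicons or off-target products
MAX_LEN = 1900  # longer reads: e.g. PCR chimeras


def parse_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip()
        yield header[1:].split()[0], sequence, quality


def length_filter(records, min_len=MIN_LEN, max_len=MAX_LEN):
    """Keep reads whose length lies within [min_len, max_len] (inclusive, assumed)."""
    return [rec for rec in records if min_len <= len(rec[1]) <= max_len]


# Hypothetical two-read FASTQ: a ~full-length amplicon and a truncated read
demo = (
    "@read1\n" + "A" * 1450 + "\n+\n" + "I" * 1450 + "\n"
    + "@read2\n" + "A" * 600 + "\n+\n" + "I" * 600 + "\n"
)
kept = length_filter(parse_fastq(io.StringIO(demo)))
print([read_id for read_id, _, _ in kept])  # ['read1']
```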
For the results generated by the nanopore sequencing method on Cambodian canine samples, identifying read count thresholds was critical to determine whether a sample was truly positive for a given apicomplexan pathogen; therefore, a previously developed method for calculating these thresholds (Huggins, Colella, et al., 2021) was used. In brief, reads of the uniquely identifiable A. fischeri positive control sequence that were found in samples other than the positive controls were used to inform the read cut-off threshold for a given library. Positive control sequences can appear in non-positive-control samples due to occasional barcode index misreading, sequencing error, chimeric reads or infrequent cross-contamination during NGS library preparation across a 96-well plate (Kim et al., 2017; Xu et al., 2018). Positive controls were used as a pure gBlock sequence at a concentration of 5.7 × 10⁻⁵ ng/μL, which produced a post-PCR DNA concentration substantially higher than that achieved by biological samples, making these sequences the greatest risk for index misreading or cross-contamination. Therefore, to apply the strictest possible read cut-off threshold for a given sequencing batch, the cut-off was taken to be the highest read count of the positive control (A. fischeri) sequence in any non-positive-control sample. If apicomplexan haemoparasite reads from a blood sample were lower than the batch's threshold, the sample was defined as negative for that pathogen and those reads were treated as false positives.
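The threshold rule can be sketched in a few lines. Following the wording above, a count lower than the threshold is negative, so a count equal to the threshold is treated as positive here; the function names and read counts are hypothetical.

```python
def batch_threshold(fischeri_reads, positive_controls):
    """Highest A. fischeri read count in any sample that is not a positive control."""
    return max(
        (n for sample, n in fischeri_reads.items() if sample not in positive_controls),
        default=0,
    )


def is_positive(pathogen_reads, threshold):
    """Reads lower than the threshold are called negative (false positives)."""
    return pathogen_reads >= threshold


# Hypothetical batch: A. fischeri read counts per barcode
fischeri = {"PC1": 250_000, "PC2": 310_000, "NTC1": 7, "dog01": 12, "dog02": 48}
threshold = batch_threshold(fischeri, positive_controls={"PC1", "PC2"})
print(threshold)                      # 48
print(is_positive(40, threshold))     # False
print(is_positive(1_200, threshold))  # True
```

Taking the maximum, rather than for example a mean, matches the intent of using the strictest cut-off observed in the batch.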
Read cut-off thresholds were low for both sequencing batches, at between 42 and 53 reads.
Following clustering and classification of the 100 canine blood samples, three apicomplexan haemoparasite species were detected: Babesia vogeli, Hepatozoon canis and Hepatozoon felis (Table 2). Of particular note is the identification of H. felis from a canine, which to the best of the authors' knowledge has not been reported before.
Other read classifications returned were from the A. fischeri positive control sequence, from the host (Canis lupus familiaris), and a few reads classified as belonging to 18S rRNA genes from plant or fungal species. Dog, plant and fungus sequences were omitted from the final dataset. Of the 100 Cambodian dogs tested by the nanopore method, 48% were infected: 40% with a single apicomplexan haemoparasite and 8% with coinfections (Table 2). In coinfected dogs, read counts were typically dominated by H. canis, with between 37% and 92% of reads belonging to this species for all but one of the coinfected dogs. In contrast, for 75% of coinfected dogs the proportion of B. vogeli reads was <11%, with only one dog having roughly equal proportions of B. vogeli and H. canis reads.
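The per-species read proportions reported for coinfected dogs can be computed after removing non-target classifications. The sketch below is illustrative: the exclusion set only lists the host and the positive-control construct (plant and fungal taxa would be added in practice), and the sample counts are hypothetical.

```python
# Taxa removed before computing proportions: host and positive-control construct;
# plant/fungal hits would be added here too (this set is a placeholder).
EXCLUDED = {"Canis lupus familiaris", "Aliivibrio fischeri"}


def apicomplexan_proportions(counts, excluded=EXCLUDED):
    """Share of retained reads assigned to each taxon in one sample."""
    kept = {taxon: n for taxon, n in counts.items() if taxon not in excluded}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in kept.items()}


# Hypothetical coinfected sample (read counts are illustrative)
sample = {"Hepatozoon canis": 900, "Babesia vogeli": 100, "Canis lupus familiaris": 5000}
print(apicomplexan_proportions(sample))  # {'Hepatozoon canis': 0.9, 'Babesia vogeli': 0.1}
```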

| Comparison of diagnostic parameters for nanopore sequencing against qPCR and Illumina sequencing
By comparing samples that were apicomplexan haemoparasite-positive by both qPCR and nanopore sequencing, the lowest 18S

Table 2 note: Classifications for nanopore sequencing were obtained using the NanoCLUST pipeline and blastn identification against the apicomplexan 18S rRNA gene database, whilst for Illumina sequencing classifications were conducted using blastn against the full NCBI database. Numbers in brackets show the number of dogs that were infected with a mono-infection.

(Dalrymple et al., 1992). More generically, between two and five copies of this gene are thought to exist across diverse apicomplexan parasite species; hence, using this range, the nanopore method can detect at least 299 to 749 H. canis organisms per 1 μL of template DNA (Garcia, 2007; Teal et al., 2012).
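Converting an 18S copy estimate into a range of organisms follows from the assumed two to five gene copies per genome: dividing by five gives the lower bound and dividing by two gives the upper bound, so the upper bound is always 2.5 times the lower one. A minimal sketch with a hypothetical copy count:

```python
def organisms_from_copies(copies, min_copies_per_genome=2, max_copies_per_genome=5):
    """Return (lower, upper) organism estimates for a given 18S gene copy count.

    Dividing by the largest per-genome copy number gives the lower bound and
    dividing by the smallest gives the upper bound."""
    return copies / max_copies_per_genome, copies / min_copies_per_genome


# Hypothetical 1500 18S copies per uL of template
print(organisms_from_copies(1500))  # (300.0, 750.0)
```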
Benchmarking the novel nanopore sequencing method against other validated molecular methods, that is, a conventional molecular method (qPCR) and an alternative NGS approach (Illumina-based sequencing), was crucial for assessing the new method's diagnostic parameters. Across the 100 canine blood samples assessed by all three methods, only B. vogeli and H. canis were detected by all techniques, and therefore kappa agreement statistics could be calculated to compare levels of agreement between pairwise sets of methods (Tables 3-5). The kappa agreement statistics between qPCR and nanopore sequencing (Table 3) showed high methodological concordance, with total agreement between these two methods of 100% for detection of H. canis (k = 1) and 99% for detection of B. vogeli (k = 0.98). Kappa agreement statistics between the two deep-sequencing methods, Illumina versus nanopore sequencing (Table 4), demonstrated less concordance, with 96% total agreement for detection of B. vogeli (k = 0.9) and 92% for H. canis (k = 0.81). Differences in results were driven by eight samples found positive to H. canis by Illumina sequencing but negative by nanopore sequencing, and four samples found positive to B. vogeli by nanopore sequencing that were negative by Illumina sequencing. Additionally, to provide greater clarity on the relative performance of all techniques assessed, the results of the multiplex qPCR were compared against those of Illumina deep-sequencing (Table 5). These results were also more discordant than those obtained by comparing the multiplex qPCR to nanopore sequencing: Illumina sequencing found eight samples positive to H. canis that were negative by qPCR, giving 92% total agreement (k = 0.81), whilst the multiplex qPCR found three samples positive to B. vogeli that were negative by Illumina sequencing, giving 97% total agreement (k = 0.93).
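Assuming the standard Cohen's kappa for two binary tests, the statistic compares observed ('total') agreement with the agreement expected by chance from each method's positivity rate. The minimal sketch below works from a 2x2 table; it omits the standard errors and confidence intervals reported in Tables 3-5, and the counts in the example are hypothetical, not the study's.

```python
def cohens_kappa(both_pos, only_a, only_b, both_neg):
    """Cohen's kappa for two binary tests summarised as a 2x2 table.

    both_pos: positive by both methods; only_a / only_b: positive by one method
    only; both_neg: negative by both methods."""
    n = both_pos + only_a + only_b + both_neg
    p_observed = (both_pos + both_neg) / n  # 'total agreement'
    p_a = (both_pos + only_a) / n  # positivity rate, method A
    p_b = (both_pos + only_b) / n  # positivity rate, method B
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical 2x2 table for 100 samples (not the study's counts)
print(round(cohens_kappa(20, 5, 10, 65), 3))  # 0.625
```

Because kappa corrects for chance agreement, two methods can share a high total agreement yet differ noticeably in kappa when positives are rare.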
In addition, nanopore sequencing was the only method to detect an H. felis single infection from one dog; this sample was unsurprisingly negative to H. canis by the highly specific qPCR and was only characterized as positive to Hepatozoon spp. by the Illumina deep-sequencing method.

| DISCUSSION
This study reports the development and optimisation of a novel nanopore sequencing-based method for the unbiased characterization of apicomplexan haemoparasites from mammalian blood, validated on field samples from canines. Such a method may demonstrate great utility as an explorative tool in contexts where there is limited-to-no data on the pathogens that might be encountered, which makes traditional diagnostics suboptimal for accurate haemoparasite community characterization. Through targeting of the apicomplexan 18S rRNA gene, the metabarcode of pathogens from this phylum could be elucidated, providing detailed information on haemoparasite single infections and coinfections with species-level taxonomic classification. Leveraging the capability of ONT's devices to sequence long reads permitted sequencing of amplicons over 1400 bp, which would not be achievable on short-read NGS platforms. This greater amplicon size permitted near-complete sequencing of the 18S rRNA gene, providing superior taxonomic identification of pathogens compared with short-read metabarcoding, which typically provides poorer taxonomic resolution for certain groups because it sequences just one or two 18S rRNA gene hypervariable regions (Ghafar et al., 2021; Huggins et al., 2019a).
In addition, the bioinformatic pipeline employed adequately compensated for the natural error rate of the R9.4.1 flow cells used, meaning that 18S rRNA gene consensus sequences formed were accurate and highly similar to reference data (Rodríguez-Pérez et al., 2021). When the nanopore method was benchmarked against other molecular assays, it was found to perform as well as a highly sensitive and specific multiplex qPCR (Huggins, Massetti, et al., 2021), whilst also detecting an apicomplexan haemoparasite species missed by this assay and only classified to the level of genus by Illumina-based deep sequencing.

TABLE 3 Multiplex qPCR versus nanopore sequencing agreement statistics for the two most common apicomplexan haemoparasites identified from 100 Cambodian dog blood samples.
Testing of previously characterized positive controls demonstrated the range of apicomplexan genera detectable by the nanopore assay, including those of common haemoparasites such as Babesia, Hepatozoon, Plasmodium and Theileria (Table 1), whilst in silico analyses of these primers suggest other important genera such as Cytauxzoon and Rangelia may also be detectable (Appendix S1).The large variety of different cell types these apicomplexan parasites infect, alongside the diverse body sites occupied, demonstrates a wide potential applicability of this newly developed method for research in different parasitological fields.
Positive controls from diverse mammalian hosts, including cats, cows, dogs, foxes and platypuses, could all be accurately analysed without significant cross-reaction with, or sequencing of, host DNA, highlighting a broad utility for this assay across the class Mammalia (Table 1). The sample types from which nanopore sequencing was possible included DNA extracted from several tissue types and cell culture, whilst accurate results were attainable from samples with total DNA concentrations as low as 0.23 ng/μL. Moreover, haemoparasite coinfections of up to three different pathogens could be accurately sequenced, even when species were closely related, for example, B. canis, B. rossi and B. vogeli (Carret et al., 1999; Penzhorn, 2020; Zahler et al., 1998). The apicomplexan haemoparasite 18S rRNA gene consensus sequences formed in all contexts were always over 99.1% identical to the relevant sequence from GenBank, demonstrating consistently good performance by this assay irrespective of sample type or source (Table 1).
Across the 100 Cambodian dog blood samples tested by the nanopore sequencing assay, the apicomplexan haemoparasites B. vogeli, H. canis and H. felis were identified, with a high overall infection prevalence of 48%. Of the dogs that were infected, 17% had B. vogeli and H. canis coinfections, whilst the remaining 83% had single infections (Table 2). The identification of Hepatozoon felis from a dog is significant as this has not been reported before; however, the implications of this finding for canine disease remain unclear. Within the literature there is one report of H. felis infecting a wild canid, Cerdocyon thous, in Brazil (de Sousa et al., 2017), a second report of an H. felis-like organism infecting hunting dogs in the US (Dear et al., 2018) and another relevant investigation which molecularly characterized H. felis in a tick found feeding on a stray dog in Turkey (Aktas et al., 2013). Moreover, there is very limited prior evidence of H. felis in Asia, with just a few reports of this pathogen within any host or vector from across this continent (Bhusri et al., 2017; Journal et al., 2022; Sakuma et al., 2011). Whilst the detection of H. felis could represent a contamination event, this is unlikely given the use of separate pre-PCR and post-PCR workspaces for processing of samples and controls, as well as the absence of H. felis sequences at any other barcode, including those used for negative controls.

TABLE 4 Illumina sequencing versus nanopore sequencing agreement statistics for the two most common apicomplexan haemoparasites identified from 100 Cambodian dog blood samples. Abbreviations: CI, confidence intervals; NEG, negative; POS, positive; SE, standard error.
The detection of unusual parasites in non-typical hosts is one of the primary advantages of employing NGS-based metabarcoding assays for pathogen surveillance, particularly across study sites with little-to-no prior epidemiological data.Conventional molecular diagnostics would likely miss such rare or unexpected findings as they rely on careful selection of assays to detect just one or a few pathogens that are expected to be found in the region under investigation (Huggins et al., 2023;Ondrejicka et al., 2014).In the case of coinfections, if assays like cPCR and Sanger sequencing are used to target conserved genes then often only the dominant pathogen will be detected, whilst rarer or novel pathogens may be obscured (Zepeda Mendoza et al., 2015).This is in stark contrast to metabarcoding methods which can holistically characterize the entire pathogen community simultaneously, identifying both common and rare sequences from a sample (Huggins et al., 2023;Zepeda Mendoza et al., 2015).
The relative diagnostic performance of the nanopore-based metabarcoding assay was ascertained by direct comparison to other molecular techniques. The almost perfect agreement (k ≥ 0.98) between the multiplex qPCR and nanopore method for haemoparasite detection (Table 3) is an important finding given that the qPCR employed can detect B. vogeli and H. canis at very low DNA concentrations (Huggins, Massetti, et al., 2021).
Furthermore, via the use of qPCR standard curves and Cq conversion into gene target copy number, at least 10 B. vogeli parasites and at least 299 to 749 H. canis parasites could be detected from 1 μL of template DNA or 200 μL of whole blood (Dalrymple et al., 1992; Garcia, 2007; Teal et al., 2012). Nonetheless, these figures do not represent actual limit of detection values for the assay, as they could only be calculated using the highest Cq values found within the dataset. Moreover, the indicators of analytical sensitivity identified herein for the nanopore assay are highly dependent on multiple factors, including the depth of sequencing achieved, the initial flow cell pore counts and pore death rates, as well as the amount of library loaded, making reliable limit of detection analyses impossible (Huggins et al., 2022; Player et al., 2020). At present, these factors all hamper the ability to use nanopore sequencing data quantitatively, although this issue may in part be resolvable through normalization methods that factor in the aforementioned variables. In contrast, it is well established that qPCR methods can accurately quantify amounts of starting DNA, and can therefore be used to track disease progression or treatment efficacy (Gaunt et al., 2010; Wang et al., 2018).
Kappa agreement statistics demonstrated some discordance between the metabarcoding results obtained by nanopore and Illumina sequencing (Table 4). This discordance was mirrored in the comparison of the qPCR with Illumina sequencing results (Table 5). Therefore, given the high concordance between the qPCR and nanopore data, erroneous identification of haemoparasite positivity may be occurring with the Illumina-sequencing method. The Illumina data were generated on the MiSeq platform, attaining sequencing depths of between six and nine million short reads depending on the sequencing batch (Caporaso et al., 2012; Huggins, Colella, et al., 2021).
In contrast, nanopore sequencing on ONT's MinION™ platform in the current study accrued 4.4 million long reads. The higher read depths and smaller amplicon size of the Illumina method (130 bp versus ~1450 bp on the nanopore platform) may increase the risk of cross-contamination during library preparation, as much smaller PCR products, in much smaller quantities, can be aerosolised and distributed across the 96-well plate used for multiplexing and indexing of samples (Aslanzadeh, 2004; Boers et al., 2019; Champlot et al., 2010; Huggins et al., 2019a; Kim et al., 2017). Very high H. canis read counts were observed across the Illumina dataset, with read cut-off thresholds greater than the threshold of 53 reads identified for nanopore sequencing, further highlighting the possibility of cross-contamination or barcode index misreading, potentially generating H. canis false positives within the Illumina data (Huggins, Colella, et al., 2021; Kim et al., 2017; Xu et al., 2018). An alternative explanation could be that the Illumina sequencing method is more sensitive, with a lower limit of detection for H. canis DNA than the other two molecular methods, although there are sparse data in the literature to support the idea that Illumina sequencing is more sensitive than qPCR (Anis et al., 2018; Thorburn et al., 2015).
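The role of a read cut-off can be made concrete with a short sketch that turns per-sample read counts into positivity calls. The cut-off of 53 reads is the value reported for the nanopore assay; treating it as inclusive (≥) is an assumption, and the sample names and counts are hypothetical. Any contaminating or misassigned reads that exceed the cut-off are called positive, which is how cross-contamination can translate into false positives.

```python
READ_CUTOFF = 53  # nanopore read cut-off reported in the text; inclusivity assumed


def call_positives(read_counts, cutoff=READ_CUTOFF):
    """Return {sample: sorted taxa whose read count meets the cut-off}.

    read_counts maps sample -> {taxon: read count} (a hypothetical layout).
    """
    calls = {}
    for sample, taxa in read_counts.items():
        calls[sample] = sorted(t for t, n in taxa.items() if n >= cutoff)
    return calls


if __name__ == "__main__":
    counts = {  # hypothetical samples and read counts
        "dog_01": {"Hepatozoon canis": 1200, "Babesia vogeli": 12},
        "dog_02": {"Hepatozoon canis": 53},
        "dog_03": {},
    }
    for sample, taxa in call_positives(counts).items():
        print(sample, taxa or "negative")
```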
All three samples that were  (Knight et al., 2018).
Whilst the nanopore sequencing assay demonstrated many strengths, this technique also has several inherent limitations. At present, library preparation still requires substantial laboratory expertise, whilst a high proficiency in bioinformatics is required to accurately analyse and interpret results (Huggins et al., 2019b; Malla et al., 2019; Zhu et al., 2020). Nanopore sequencing is also more expensive than many conventional molecular techniques; for example, using the methodology detailed here, which multiplexed 96 samples concurrently, the cost of analysis (i.e., of all necessary reagents and the flow cell) was approximately $10.5 AUD per sample, versus just $2 AUD using qPCR at the time of writing. This cost estimate does not consider the price of purchasing hardware nor the provision of any bioinformatic expertise that may be needed to analyse nanopore data. Excluding sequencing duration, which is highly variable, library preparation, inclusive of the two PCRs, took approximately 6.5 h for the 96 samples tested in the first batch, substantially longer than the ~2.5 h required to analyse the same number of samples by qPCR. Nonetheless, in many contexts these limitations may be offset by the method's ability to detect rare, emerging or novel apicomplexan haemoparasites, especially in samples from regions and hosts for which little prior pathogen data exist (Ghafar et al., 2021; Glidden et al., 2020; Huggins et al., 2022; Huggins, Colella, et al., 2021; Squarre et al., 2020). For example, the method developed here may be valuable within a government-funded biosecurity context, in which animals imported into a country may be infected with a highly diverse range of pathogens that would be difficult or impossible to test for comprehensively using traditional, species-specific diagnostics.
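The cost and time figures above can be put side by side with a little arithmetic. The sketch below uses only the per-sample costs and per-batch preparation times stated in the text; it assumes the per-sample cost holds for a full 96-sample run (the flow cell is a fixed per-run cost, so smaller runs would cost more per sample) and, like the text, excludes hardware, bioinformatic labour and sequencing run time.

```python
# Figures from the text: 96-sample batch, consumables cost per sample (AUD) and
# hands-on preparation time per batch, excluding sequencing run time.
BATCH_SIZE = 96
COST_PER_SAMPLE_AUD = {"nanopore": 10.5, "qpcr": 2.0}
PREP_HOURS_PER_BATCH = {"nanopore": 6.5, "qpcr": 2.5}


def batch_cost(method, n_samples=BATCH_SIZE):
    """Consumables cost; the per-sample figure strictly applies to a full 96-sample run."""
    return COST_PER_SAMPLE_AUD[method] * n_samples


def prep_minutes_per_sample(method, n_samples=BATCH_SIZE):
    """Preparation time per batch spread evenly over the samples in it."""
    return PREP_HOURS_PER_BATCH[method] * 60 / n_samples


if __name__ == "__main__":
    for m in ("nanopore", "qpcr"):
        print(f"{m}: ${batch_cost(m):.2f} AUD per batch, "
              f"{prep_minutes_per_sample(m):.2f} min preparation per sample")
    print(f"cost ratio: {batch_cost('nanopore') / batch_cost('qpcr'):.2f}x")
```

For a full batch this gives $1,008 versus $192 AUD (a 5.25-fold difference) and roughly 4.1 versus 1.6 minutes of preparation per sample.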
To conclude, the novel nanopore-based metabarcoding assay for the holistic characterization of apicomplexan haemoparasite communities was shown to perform as well as a highly sensitive and specific multiplex qPCR, whilst also detecting one pathogen missed by the latter method. The power of such unguided NGS-based methods to detect rare or novel pathogens is exemplified by the nanopore sequencing assay providing the first molecular detection of H. felis from a domestic dog. This assay builds on previous research that used short-read NGS technologies for haemoparasite metabarcoding by increasing the size of the barcoding amplicon that can be sequenced, thereby providing greater taxonomic resolution without the need for additional confirmatory tests (Barbosa et al., 2017; Chaudhry et al., 2019; Ghafar et al., 2021; Huggins et al., 2019a; Mohamed et al., 2021; Ortiz-Baez et al., 2020). The nanopore method could be made more cost-effective for epidemiological studies if employed within a two-part protocol, whereby an initial apicomplexan-generic cPCR is used and only samples that test positive are taken forward to a second, nanopore metabarcoding step, removing the costs associated with deep-sequencing negative samples. In addition, the small size of the MinION™ device may allow this assay to be adapted for field use, and the assay shows promise for the detection of haemoparasites from other organisms, including arthropod vectors or humans. The method presented here could be used alongside other MinION™-based assays for the detection of different blood-borne pathogens, such as bacteria, providing a suite of veterinary and human diagnostic tools with which to survey the entire 'pathobiome' of a host species within a short period of time (Bass et al., 2019; Huggins et al., 2022).
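The saving offered by the proposed two-part protocol can be estimated with a simple cost model. This is a sketch under stated assumptions: the screening cPCR cost and the number of screen-positive samples are hypothetical inputs (the text gives no cPCR cost), and the per-sample sequencing cost is assumed to stay constant, i.e. screen-positive samples are accumulated until a full multiplexed run can be filled.

```python
def sequence_all_cost(n_samples, seq_cost):
    """Metabarcode every sample directly."""
    return n_samples * seq_cost


def two_step_cost(n_samples, n_screen_positive, screen_cost, seq_cost):
    """Screen every sample by generic cPCR, then sequence only the positives."""
    if not 0 <= n_screen_positive <= n_samples:
        raise ValueError("positives must lie between 0 and n_samples")
    return n_samples * screen_cost + n_screen_positive * seq_cost


def break_even_positive_fraction(screen_cost, seq_cost):
    """Screening is cheaper while the positive fraction stays below this value,
    since n*screen + k*seq < n*seq  <=>  k/n < 1 - screen/seq."""
    return 1 - screen_cost / seq_cost


if __name__ == "__main__":
    SEQ_COST = 10.5      # AUD per sample, from the text
    SCREEN_COST = 2.0    # hypothetical cPCR cost per sample
    print(sequence_all_cost(96, SEQ_COST))
    print(two_step_cost(96, 30, SCREEN_COST, SEQ_COST))   # 30 positives: hypothetical
    print(f"{break_even_positive_fraction(SCREEN_COST, SEQ_COST):.2f}")
```

Under these assumptions the two-step approach pays off whenever fewer than about 81% of samples screen positive, so the benefit is largest in surveys where most animals are uninfected.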

18S rRNA gene sequences greater than 200 bp and smaller than 10,000 bp in length for species within the phylum Apicomplexa (txid5794) were downloaded. The search terms were ((((((18S ribosomal RNA[Title]) OR 18S rRNA[Title]) OR ribosomal RNA[Title]) OR SSU rRNA[Title]) OR SSU ribosomal RNA[Title]) AND txid5794[Organism]) AND 200:10000[Sequence Length]. The positive control sequence for the A. fischeri 16S rRNA gene (NCBI accession NR_029255.1) was also included in this database. The GitHub page detailing how this database was constructed is available at https://github.com/vetscience/Huggins_NanoCLUST, and the database is downloadable from https://melbourne.figshare.com/projects/Huggins_NanoCLUSTdb/160631. Optimal NanoCLUST parameters were found to be a minimum read length of 1000 bp, a maximum read length of 1900 bp, a minimum cluster size of 30, 100 reads for polishing, 100,000 reads for UMAP projection and an HDBSCAN cluster selection epsilon value of 0.5 (the pipeline's default). Additionally, the curated apicomplexan 18S rRNA gene database was used as the taxonomic classification database, with all other pipeline parameters kept at their default values. Read counts, consensus sequence lengths and classifications with a percent identity score higher than 90%, as generated by NanoCLUST, were taken as the final dataset generated by the nanopore sequencing assay, against which the other methods were compared. All nanopore-based NGS data produced in the present study are available from the NCBI BioProject database, BioProjectID: PRJNA923029; specifically, Sequence Read Archive (SRA) accessions SRX19834392 to SRX19834499.
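The two numeric filters described above, the read-length window applied before clustering and the >90% identity threshold applied to the final classifications, can be sketched as follows. This is not NanoCLUST code: the FASTQ reader is a minimal helper for well-formed 4-line records, the `Classification` record is a hypothetical stand-in for the pipeline's output table, and treating the length bounds as inclusive is an assumption.

```python
from dataclasses import dataclass

# Values reported for this study; inclusive length bounds are an assumption.
MIN_READ_LEN = 1000
MAX_READ_LEN = 1900
MIN_IDENTITY = 90.0  # classifications must score strictly higher than this


def read_fastq(lines):
    """Yield (read_id, sequence) from well-formed 4-line FASTQ records."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:].split()[0], lines[i + 1]


def length_filter(records, lo=MIN_READ_LEN, hi=MAX_READ_LEN):
    """Keep reads whose length falls within the [lo, hi] window."""
    return [(rid, seq) for rid, seq in records if lo <= len(seq) <= hi]


@dataclass
class Classification:  # hypothetical stand-in for one classified cluster
    taxon: str
    reads: int
    consensus_length: int
    identity: float  # percent identity to the best database hit


def final_dataset(classifications, min_identity=MIN_IDENTITY):
    """Retain only classifications above the identity threshold."""
    return [c for c in classifications if c.identity > min_identity]
```

In use, `length_filter(read_fastq(open(path)))` would give the reads eligible for clustering, and `final_dataset(...)` the classifications carried forward for comparison with the other methods.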
as muscle and nervous tissue, as well as cell culture, were also found to be suitable sample types from which accurate metabarcoding results could be obtained. Optimisation of the metabarcoding library preparation protocol included a clean-up step using a 0.6× ratio of NucleoMag beads after the second (barcoding) PCR to remove low-molecular-weight (<400 bp) PCR product. Removal of such small product was verified by analysis on a 4200 TapeStation System and by the absence of reads <400 bp in the relevant MinKNOW sequencing reports. After clean-up of amplicons from the second barcoding PCR, large differences in amplicon DNA concentration were observed, varying from as low as 0.1 ng/μL to >200 ng/μL. Such disparities meant that equimolar pooling of amplicons from different samples could not reasonably be achieved; therefore, just 2 μL of cleaned amplicon from each sample was pooled, followed by a DNA concentration step using a 2× NucleoMag bead ratio, eluted in 50 μL. Despite the lack of equimolar pooling, all barcodes, and therefore all samples, were represented in the final sequencing results, with a mean total sequencing depth of >62,000 reads required to obtain accurate results for the detection of apicomplexan haemoparasites.

3.2 | Apicomplexan haemoparasite metabarcoding on 100 dog blood DNA samples from Cambodia

Across the two sequencing runs used to identify the apicomplexan haemoparasite metabarcodes of 100 Cambodian canine blood samples and eight control samples (four negative and four positive), a total of 4,400,620 raw reads (139.02 GB total data, i.e., FAST5 and FASTQ) were obtained. These raw reads were processed by the NanoCLUST bioinformatic pipeline into 2,237,540 filtered and polished reads, i.e., 51% of the total raw reads were used by the pipeline for further analysis. The majority of data passed initial quality control filtering, that is, met a Q-score of ≥8, with a ratio of passed to failed bases of
7.76 GB:1.58 GB for the first 96-sample sequencing batch (sequenced for 39 h) and 1.11 GB:0.22 GB for the second sequencing batch (sequenced for 3.5 h). For the total raw data with a Q-score of ≥8, the mean read count ± standard error (S.E.) for the biological samples was 44,006 ± 9921, whilst for the filtered and polished data it was 22,375 ± 3359. For apicomplexan haemoparasite-positive samples, the number of different clusters, or consensus sequences, output by NanoCLUST ranged from one to 59, with an average of 10, whilst haemoparasite-negative samples always generated zero clusters. Higher read numbers were obtained for the positive control samples, with a raw read total of 739,425 and a mean ± S.E. of 184,856 ± 12,314, whilst after filtering the positive control samples totalled 298,913 reads, with a mean ± S.E. of 74,728 ± 4304. Negative control samples in the first sequencing batch had low raw read counts, with a total of 151 and a mean ± S.E. of 50 ± 7, whilst after data filtering and polishing zero reads remained for all negative controls in this batch. In the second sequencing batch, the negative control had a high read count of 10,960 for the raw data and 10,604 post-filtering; however, these sequences were all identified as fungal or plant DNA, which could have represented minor contamination or growth of microbes within the negative control water and/or the extraction and PCR reagents.
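Why equimolar pooling was impractical can be shown with the standard conversion from mass concentration to molar concentration for double-stranded DNA. The sketch assumes an average of 660 g/mol per base pair, an amplicon of ~1450 bp (barcode and adapter length ignored) and a hypothetical target of 50 fmol per sample; only the 0.1 and 200 ng/μL extremes come from the text.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def ng_per_ul_to_nM(conc_ng_per_ul, length_bp):
    """Molar concentration of a dsDNA amplicon; note 1 nM == 1 fmol/uL."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def volume_for_fmol(target_fmol, conc_ng_per_ul, length_bp):
    """uL of amplicon needed to contribute target_fmol to an equimolar pool."""
    return target_fmol / ng_per_ul_to_nM(conc_ng_per_ul, length_bp)


if __name__ == "__main__":
    LENGTH_BP = 1450     # approximate amplicon size; barcodes ignored
    TARGET_FMOL = 50.0   # hypothetical per-sample target
    for conc in (0.1, 200.0):  # extremes of the observed concentration range
        print(f"{conc:>6} ng/uL: {ng_per_ul_to_nM(conc, LENGTH_BP):8.3f} nM, "
              f"{volume_for_fmol(TARGET_FMOL, conc, LENGTH_BP):8.3f} uL for "
              f"{TARGET_FMOL:g} fmol")
```

Under these assumptions the most dilute sample would need about 478 μL and the most concentrated about 0.24 μL, neither of which is practical; pooling a fixed 2 μL instead carries the full 2,000-fold concentration difference into the pool, yet all barcodes were still represented in the sequencing results.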
All nanopore-based NGS data produced in the present study are available from the NCBI BioProject database, BioProjectID: PRJNA923029; specifically, Sequence Read Archive (SRA) accessions SRX19834392 to SRX19834499. Previously published Illumina data used in the methodological comparisons are available from BioProjectID: PRJNA601241, using BioSampleIDs: SAMN13870739 to SAMN18323752. Databases developed and used for taxonomic classification are downloadable from https://melbourne.figshare.com/projects/Huggins_NanoCLUSTdb/160631.

ETHICS STATEMENT
This study was approved by the Office of Research Integrity and Ethics, University of Melbourne, under ethics permit 1814620.

BENEFIT-SHARING STATEMENT
An ongoing research collaboration was generated, developed and sustained through Animal Mama Veterinary Hospital (Phnom Penh, Cambodia) and the Cambodian Mine Action Centre via training in diagnostic parasitology and ongoing projects investigating the health of landmine detection dogs within Cambodia and the use of chemopreventive tools to protect them. The results of this project, and of those continuing, are regularly shared with collaborators and the broader research community, including through frequent capacity-building workshops in Cambodia and the broader Southeast Asia region via Rebecca J. Traub's role within the Tropical Council for Companion Animal Parasites (TroCCAP) not-for-profit organization. Cambodian collaborators have also co-authored numerous publications arising from the primary aim of this research project, that is, to characterize the diversity of canine vector-borne pathogens in Cambodia. Additionally, Vito Colella provides ongoing supervision of DVM students at the Royal University of Agriculture, Phnom Penh, encompassing novel research programs within Cambodia and the dissemination of current epidemiological findings.

TABLE 1 Performance of the 18S rRNA gene-targeting nanopore sequencing protocol and NanoCLUST pipeline classification on apicomplexan haemoparasite positive controls.
Apicomplexan pathogen present | Sample type | DNA concentration (ng/μL) | NanoCLUST classification | NCBI accession no. | Length (bp) | Identity (%) | No. of reads
for FASTQ data quality control, transforms these data into normalized k-mer frequency vectors and then uses Uniform Manifold Approximation and Projection (UMAP) (McInnes et al., 2018) and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) (McInnes et al., 2017) to form clusters and assign reads to them. Next, consensus sequences are (Koren et al., 2017) of these clusters using Average Nucleotide Identities (ANI) with FastANI (Jain et al., 2018), followed by read correction in Canu (Koren et al., 2017) using both Racon (Vaser
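The first transformation described above, turning each read into a normalized k-mer frequency vector, can be sketched in a few lines. This is an illustration of the general technique rather than NanoCLUST's own implementation; the default k = 5 is an assumption of this sketch, not a value stated in the text. In the pipeline, vectors like these for many reads would then be projected with UMAP and clustered with HDBSCAN, which is not shown here.

```python
from itertools import product


def kmer_frequency_vector(seq, k=5):
    """Normalized k-mer frequency vector over the ACGT alphabet (4**k entries).

    K-mers containing N or other ambiguity codes are skipped; the vector sums
    to 1 unless no valid k-mer was found, in which case it is all zeros.
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    vec = kmer_frequency_vector("ACGTACGTTGCA" * 10)
    print(len(vec), round(sum(vec), 6))
```

Normalizing by the total number of k-mers makes reads of different lengths comparable, so that clustering reflects sequence composition rather than read length.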

Methodological validation and optimisation of nanopore sequencing for apicomplexan haemoparasite metabarcoding
Through testing of the metabarcoding protocol on apicomplexan-positive control samples confirmed via cPCR and Sanger sequencing, the nanopore sequencing method was shown to detect a diverse range of taxa within the genera Babesia, Hepatozoon, Plasmodium and Theileria, as well as Neospora caninum and T. gondii (Table 1). The consensus sequences formed by the NanoCLUST pipeline for these rossi) were all accurately identified, whilst covert coinfections within positive controls were also detected; for example, a positive control sample found positive only for B. vulpes via cPCR with Sanger sequencing was also found to be infected with H. canis (Table 1).

Illumina sequencing versus multiplex qPCR agreement statistics for the two most common apicomplexan haemoparasites identified from 100 Cambodian dog blood samples. Agreement level defined as poor if the coefficient (k) is ≤0.20, fair if 0.21 ≤ k ≤ 0.40, moderate if 0.41 ≤ k ≤ 0.60, substantial if 0.61 ≤ k ≤ 0.80, and high if k ≥ 0.81.

Positive control DNA tested was predominantly extracted from whole blood; however, DNA extracted from other tissues, such