Proteome of urticating setae of Ochrogaster lunifer, a processionary caterpillar of medical and veterinary importance, including primary structures of putative toxins

Ochrogaster lunifer (Lepidoptera: Notodontidae) is an Australian processionary caterpillar with detachable urticating setae that have a defensive function. These true setae induce inflammation when they contact human skin, and equine foetal loss syndrome if they are accidentally ingested by gravid horses. We used transcriptomics and proteomics to identify proteins and peptides present in and on urticating setae, which may include toxins that contribute to inflammation and/or foetal loss syndromes. This process identified 37 putative toxins, including multiple homologues of the honeybee venom peptide secapin, and proteins with similarity to odorant binding proteins, arylphorins, and the insect immune modulator Diedel. This work identifies candidate molecules that may contribute to the adverse effects of processionary caterpillar setae on human and animal health.

Setae are produced in large numbers by later-instar larvae. They are released if the caterpillars are handled or disturbed, and are dispersed widely into the environment [7]. The setae of O. lunifer are well known for causing inflammation and allergy when they come into contact with humans (Figure 1E) [6,8,9]. The release of allergens or toxins from the urticating setae is likely to be a key driver of this urticaria. Ingestion of O. lunifer true setae from caterpillars or their exuviae causes equine amnionitis and foetal loss (EAFL), a pathology of pregnant mares that can result in death of the developing foal (Figure 1F) [10,11]. The aetiology of EAFL is yet to be determined and may be associated primarily with the release of gut bacteria into non-gut tissues, including the uterus and developing embryo (Todhunter [11]; Todhunter, Muscatello, et al., 2013; [12]). However, setae toxins may also contribute to the aetiology of EAFL.
Some insights into the function of O. lunifer setae toxins come from studies on other members of the subfamily Thaumetopoeinae, many of which produce similar defensive setae [1,13]. The pine processionary caterpillar Thaumetopoea pityocampa (Denis & Schiffermüller, 1775) occurs in the Mediterranean region and produces setae infamous for causing allergic reactions and inflammation [2,14-17]. The setae of T. pityocampa release a soluble toxin that degranulates mast cells; this toxin is readily inactivated by heat and is therefore probably a protein or peptide [18]. One allergen, thaumetopoein, was isolated and characterised as a heterodimer of 13-kDa and 15-kDa subunits, although sequence data were not obtained [19]. Subsequently, two IgE-binding allergens were isolated from T. pityocampa extracts: Tha p 1, a 13-kDa odorant-binding family protein with two disulphide bridges, was isolated from whole-body extracts, whereas Tha p 2, an 11-kDa protein with five disulphide bridges, was isolated from setae extracts and lacks homology to characterised protein families [20,21]. Transcriptomics and proteomics have been used to examine the overall composition of T. pityocampa setae and their soluble extracts [21,22], providing the primary structures of numerous other putative toxins and probably also of many setae structural and housekeeping proteins. Setae and saline setae extracts obtained from the pine processionary caterpillar Thaumetopoea pinivora (Treitschke, 1834) stimulate the proliferation of blood lymphocytes independently of their chitin content, which has been suggested to result from the release of currently unknown molecules from the setae [23].
In contrast, nothing is known about what peptides and proteins occur on the setae of the Australian processionary caterpillar O. lunifer, despite the medical and veterinary importance of this species.
Obtaining the primary structures of setae toxins is a prerequisite to understanding their structure and function, and to determining whether and how these toxins contribute to the adverse effects of setae on humans and animals. In this study, we report the proteome of aqueous and organic solvent extracts of O. lunifer true setae, which may include allergens or toxins.

Statement of significance
The true setae of Ochrogaster lunifer are responsible for causing equine amnionitis and foetal loss (EAFL) in pregnant horses, and inflammation and allergy in humans. It is likely that toxins released from the setae are important mediators of these effects, but the peptides and proteins in the setae have not previously been investigated using transcriptomic and proteomic techniques.
In this study we identify candidate toxins present in or on O. lunifer setae, providing insights into their evolution and possible function. In addition, this study will facilitate characterisation of these molecules by providing their primary structures, and thereby the ability to produce them using chemical or recombinant methods. Such efforts may eventually culminate in therapeutics that prevent the adverse effects of O. lunifer setae, for example, specific inhibitors that counteract their toxic effects, or antibodies or vaccines that protect against their toxins.

In a biosafety cabinet, the true setae on three living larvae were removed from their setae fields using a vacuum apparatus with a fine tip. The setae were transferred to a 1.5 mL tube and weighed (6.6 mg). Based on the estimate by Perkins et al. [6] of 2-2.5 million setae per larva, this sample was estimated to contain approximately 6 million setae.
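The setae-count estimate is simple arithmetic. The sketch below reproduces it and also derives a mean mass per seta, a value that is not stated in the text. It assumes that the 6.6 mg sample consisted entirely of setae and that the Perkins et al. [6] range of 2-2.5 million setae per larva applies to each of the three larvae; the function name `setae_estimate` is illustrative only.

```python
# Back-of-envelope check of the setae count for the collected sample.
# Assumptions (not from the source): the whole 6.6 mg is setae, and the
# Perkins et al. [6] range of 2-2.5 million setae per larva applies to each larva.

LARVAE = 3
SETAE_PER_LARVA = (2.0e6, 2.5e6)  # low and high estimates per larva
SAMPLE_MASS_MG = 6.6


def setae_estimate(larvae, per_larva, mass_mg):
    """Return ((min_count, max_count), (min_ng_per_seta, max_ng_per_seta))."""
    min_count = larvae * per_larva[0]
    max_count = larvae * per_larva[1]
    mass_ng = mass_mg * 1e6  # 1 mg = 1e6 ng
    # More setae in the same mass means a lighter mean seta, and vice versa.
    return (min_count, max_count), (mass_ng / max_count, mass_ng / min_count)


counts, ng_per_seta = setae_estimate(LARVAE, SETAE_PER_LARVA, SAMPLE_MASS_MG)
print(f"setae in sample: {counts[0]:.1e} to {counts[1]:.1e}")
print(f"mean mass per seta: {ng_per_seta[0]:.2f} to {ng_per_seta[1]:.2f} ng")
```

The lower bound of the count range (6 million) matches the figure quoted above; the mean mass of roughly 1 ng per seta is a derived value under the stated assumptions, not a measurement.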

Transcriptomics
Three late-instar caterpillars were dissected in PBS, and the regions of cuticle underlying the defensive setae were transferred to RNAlater for storage at −80 °C. Total RNA (2.0 µg) was extracted using TRIzol according to the supplier's instructions and then submitted to the Institute for Molecular Bioscience Sequencing Facility at The University of Queensland.

Preparation of setae extracts
To extract proteins, 1 mL of ultrapure water was added to 5.1 mg of setae, and the sample was mildly agitated by vortexing and 10 min of sonication. The tube was then centrifuged to pellet the setae (14,000 × g, 5 min, room temperature), and the supernatant was transferred to a new tube as the aqueous protein extract. One millilitre of hydrophobic extraction reagent (50% acetonitrile with 0.05% trifluoroacetic acid) was then added to the setae, and the tube was vortexed for 2 min and centrifuged again under the same conditions; the supernatant was transferred to a new tube as the hydrophobic protein extract.

Proteomics
A volume of 200 µL of each extract was transferred to three new tubes, which were used to prepare three different samples for liquid chromatography-tandem mass spectrometry (LC-MS/MS): native; reduced and alkylated; and reduced, alkylated, and trypsinised. For reduced, alkylated and trypsinised samples, the extract was reduced and alkylated at 37 [...]

For matrix-assisted laser desorption/ionisation time-of-flight MS (MALDI-TOF MS), 0.4 µL of sample mixed with 0.5 µL of matrix (5 mg/mL α-cyano-4-hydroxycinnamic acid in 70% acetonitrile and 0.1% formic acid) was spotted, and mass spectra were acquired using a laser intensity of 3600-4300, a mass detection range of 1500-9000 Da, and a focus mass of 3500-4000 Da.

Bioinformatics
Protein annotation was performed using SignalP (Armenteros et al

Transcriptome of O. lunifer cuticle underlying fields of setae
We investigated the proteome released into aqueous and organic solvents by O. lunifer setae using transcriptomics and proteomics (Figure 1H). Because the details of O. lunifer urticating setae biosynthesis are unknown, it was uncertain whether the cuticle underlying the setae would yield RNA and, if so, whether that RNA would include mRNA encoding toxins contained in the setae. Nevertheless, we isolated total RNA (2.0 µg) from the cuticle underlying the setae, from which we produced a cDNA library.
The most abundant transcript in our sequencing dataset (123,628 transcripts per million, TPM) was found to encode a peptide with similarity to secapin from bee venom [32].Other abundant transcripts encoded two members of the odorant binding protein family similar to the previously described thaumetopoeine setae allergen Tha p 1 [20] and two additional members of the secapin family (Table 1).
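Transcript abundances here are given in transcripts per million (TPM). The sketch below shows the usual definition: each read count is divided by its transcript length, and the resulting rates are rescaled to sum to one million. The read counts and lengths are hypothetical, the text does not state which quantification tool was used, and production tools such as RSEM or Salmon use effective rather than raw transcript lengths. Under this definition, a single transcript at 123,628 TPM accounts for about 12.4% of all length-normalised transcript abundance in the library.

```python
def tpm(read_counts, lengths_bp):
    """Transcripts per million from raw read counts and transcript lengths.

    Each count is first divided by its transcript length (reads per base), and
    the resulting rates are rescaled so that they sum to 1,000,000.
    """
    if len(read_counts) != len(lengths_bp):
        raise ValueError("read_counts and lengths_bp must have the same length")
    rates = [count / length for count, length in zip(read_counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        raise ValueError("at least one transcript must have reads")
    return [rate / total * 1e6 for rate in rates]


# Hypothetical library of three transcripts (counts and lengths are made up).
values = tpm([900, 50, 50], [300, 1000, 500])
print([round(v) for v in values])

# Share of the library represented by the most abundant O. lunifer transcript.
print(f"{123_628 / 1e6:.1%}")
```

Because TPM values always sum to one million within a library, they express relative rather than absolute abundance; comparisons such as 13,183 versus 30 TPM are therefore within-library ratios.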

Proteomics of setae extracts
Extraction of 5.1 mg setae in 1 mL water followed by 1 mL organic solvent yielded extracts of 0. Of the identified sequences, 23 (62%), including pheromone/odorant binding proteins (P/OBPs), Diedel-like proteins, arylphorins, apolipoprotein III, and Kazal domain peptides (but not secapins), showed similarity (BLAST E < 0.005) to T. pityocampa proteins detected by Berardi et al. [22]. A detailed summary of each of the putative toxins is included in Data S1.
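Similarity in this paragraph is defined by a BLAST E-value threshold (E < 0.005). The sketch below shows one way to apply such a cutoff to BLAST+ tabular output (`-outfmt 6`, whose twelve default columns end with the E-value and bit score), keeping the best hit per query. The column labels, sequence identifiers, and example rows are hypothetical, and the authors' actual filtering pipeline is not described in the text.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6); labels are ours.
COLUMNS = ["query", "subject", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=0.005):
    """Return {query: (subject, evalue)} for the best hit below max_evalue."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(COLUMNS, row))
        evalue = float(record["evalue"])
        if evalue >= max_evalue:
            continue  # not significant at the chosen threshold
        current = best.get(record["query"])
        if current is None or evalue < current[1]:
            best[record["query"]] = (record["subject"], evalue)
    return best


# Hypothetical hits of two O. lunifer sequences against T. pityocampa proteins.
example = (
    "Ol_seqA\tTpit_prot1\t45.0\t120\t62\t2\t1\t120\t5\t124\t1e-20\t95.1\n"
    "Ol_seqA\tTpit_prot2\t30.0\t100\t68\t1\t1\t100\t1\t100\t0.001\t40.2\n"
    "Ol_seqB\tTpit_prot3\t25.0\t60\t45\t0\t1\t60\t1\t60\t0.2\t25.0\n"
)
print(best_hits(example))
```

Counting the queries that pass the threshold and dividing by the number of putative toxins gives the reported proportion: 23 of 37 is about 62%.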
No homologue of Tha p 2 was detected in the setae proteome. Searching the transcriptome and raw sequencing reads with the BLASTp algorithm using T. pityocampa Tha p 2 as a query also returned no hits.
MALDI-TOF MS is complementary to LC-MS/MS and ideal for visualising molecules with masses typical of peptides (1-10 kDa). By comparison with the LC-MS/MS datasets, we assigned the ion masses observed in MALDI-TOF MS spectra of setae extracts (Figure 3). In the reflectron-positive dataset (which is ideal for accurate masses in the range 1-5 kDa), all assignable peaks corresponded to secapin-like peptides, to their propeptides after cleavage, or to peptides originating from one other peptide precursor, U-TPTX22-Ol6. Linear-mode MALDI (which is less accurate but more sensitive for molecules with molecular mass > 5 kDa) revealed two other components of the protein extracts, with masses of 5846 and 11,697 Da. The lower mass was assigned as U-TPTX12-Ol23, which has similarity to a Kazal domain and a predicted mature average mass of 5850 Da after cleavage of a propeptide (C-terminal to an arginine, as in the secapins). The higher mass was assigned as U-TPTX2-Ol8, the confidently detected homologue of Tha p 1, which has a predicted mature average mass of 11,693 Da. Both proteins were confidently identified in trypsinised samples of setae extracts.
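The linear-mode assignments can be checked with simple arithmetic: in each case the observed and predicted average masses differ by 4 Da, i.e. several hundred parts per million. The sketch below computes these differences from the values given in the text. The matching tolerance used for assignment is not stated, so no pass/fail threshold is applied, and the labels are shorthand.

```python
def mass_error(observed_da, predicted_da):
    """Return (difference in Da, difference in parts per million)."""
    delta = observed_da - predicted_da
    return delta, delta / predicted_da * 1e6


# Observed (linear-mode MALDI) and predicted mature average masses from the text.
assignments = {
    "Ol23 (Kazal-like)": (5846.0, 5850.0),
    "Ol8 (Tha p 1-like)": (11697.0, 11693.0),
}

for label, (observed, predicted) in assignments.items():
    delta, ppm = mass_error(observed, predicted)
    print(f"{label}: {delta:+.0f} Da ({ppm:+.0f} ppm)")
```

Errors of a few hundred ppm are in line with the remark that linear mode is less accurate than reflectron mode. The assignments themselves rest on the LC-MS/MS identifications rather than on the intact masses alone.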

Analysis of obtained primary structures
The data collected in this study provide insights into the enzymatic processing of peptide precursors. In limacodid and megalopygid caterpillar venoms [33,34], enzymes such as dipeptidyl peptidase and lysine carboxypeptidase are often involved in the proteolytic maturation of peptides. In contrast, O. lunifer setae peptides are all matured by cleavage at arginine residues such that the arginine occurs in neither the released propeptide nor the mature peptide, suggesting the involvement of enzymes with different specificity (Figure 3).
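The processing pattern described above, in which the precursor is cut at an arginine that then appears in neither released fragment, can be stated precisely as a small function. This models only the observed outcome, not the (unidentified) processing enzymes. The precursor sequence is hypothetical, the processing sites are assumed to be known from the MS data rather than predicted, and the name `excise_arginines` is illustrative.

```python
def excise_arginines(sequence, sites=None):
    """Split a precursor at processing arginines, removing each arginine.

    `sequence` is the precursor after signal-peptide removal (one-letter code).
    `sites` are 0-based positions of processing arginines; if omitted, every R
    is treated as a processing site. Empty fragments are dropped.
    """
    if sites is None:
        sites = [i for i, residue in enumerate(sequence) if residue == "R"]
    fragments = []
    start = 0
    for pos in sorted(sites):
        if sequence[pos] != "R":
            raise ValueError(f"position {pos} is {sequence[pos]!r}, not R")
        fragments.append(sequence[start:pos])  # residues before the arginine
        start = pos + 1  # skip the arginine itself
    fragments.append(sequence[start:])
    return [fragment for fragment in fragments if fragment]


# Hypothetical propeptide + mature peptide joined by a single processing arginine.
precursor = "EAKSDQNRAIPPGCSKW"
print(excise_arginines(precursor))
```

Passing explicit `sites` matters for real precursors, where not every arginine is a processing site. Adjacent or terminal processing arginines yield empty fragments, which this sketch discards.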
Secapin was originally reported from honeybee venom [32], but our results show that members of this peptide family are also produced by Lepidoptera. To examine the phylogenetic breadth of the secapin family, we searched the National Center for Biotechnology Information (NCBI) non-redundant (nr) protein database for sequences from other insect orders using the BLASTp algorithm, with hymenopteran and lepidopteran secapin-family peptides as queries. We found similar peptide sequences in Coleoptera, Diptera, and Hemiptera, including many species lacking toxic defensive adaptations such as the venom of bees or the urticating setae of processionary caterpillars (Figure 4A). AlphaFold 2 predicts a short antiparallel β-sheet motif in secapins from the venom of the honeybee Apis mellifera as well as in those detected in setae extracts of O. lunifer (Figure 4B).
We compared the sequence of U-TPTX2-Ol8 (hereafter Ol8) with Tha p 1, other T. pityocampa setae proteins reported by Berardi et al.

DISCUSSION
We identified candidate toxins that might underlie the adverse effects of O. lunifer setae on humans and animals, especially the secapin-like peptides, the Tha p 1-like protein U-TPTX2-Ol8, and proteins similar to Diedel and arylphorins. Our results contrast with those of previous transcriptomic and proteomic studies of the pine processionary caterpillar T. pityocampa. These previous investigations detected 70 [21] and 353 [22] proteins, respectively, although many of these probably represent structural or housekeeping proteins, and only one of these studies [22] employed a T. pityocampa sequence database. The numerous structural and housekeeping proteins detected in previous studies of T. pityocampa setae probably reflect the comparatively harsh extraction methods used. Secapin family peptides were not detected in previous studies of T. pityocampa, though this could be due to differences in how sequence databases were prepared for searching against proteomic data, rather than a biological difference between the species. We also did not detect any homologue of the T. pityocampa setae allergen Tha p 2 by proteomics or transcriptomics, despite such a homologue having been reported after polymerase chain reaction amplification from genomic DNA of O. lunifer [35]. This suggests that the tissue analysed in this study did not express Tha p 2, although it is uncertain whether the encoding gene has been lost in the particular O. lunifer population we analysed, or whether it is expressed in a different tissue or at a different ontogenic stage. In this study, we focussed on peptide and protein molecules associated with setae, because many insect allergens and toxins affecting humans are protein-based and are well suited to the proteomic and transcriptomic techniques used here. However, non-protein small molecules may also act as toxins and allergens [36]. Future studies are required to investigate the role of non-proteinaceous toxins associated with O. lunifer setae.
An uncontrolled variable in this study is the ontogenic stage of the caterpillars analysed, especially their position in the moulting cycle.
Because the setae are newly produced at each moult, toxin production should occur before the setae separate from the cuticle. Tight temporal regulation of toxin synthesis during the production cycle may explain the absence of some expected components, such as Tha p 2, and the poor correspondence between transcript abundance and protein abundance.
Given that setae are cuticular structures and the cuticle is replaced at every moult, expression of setae-associated peptides and proteins probably depends strongly on the exact ontogenic stage. Further studies are required to elucidate how expression of setae proteins varies over time.
The secapin peptide family was first identified in honeybee venom [32,37]. Our finding that secapins occur across the four hyperdiverse holometabolan orders (Hymenoptera, Lepidoptera, Diptera, and Coleoptera) suggests that secapin has an ancestral, conserved role in normal physiology, but one that lends itself to adaptation as a toxic defence, a situation similar to that of, for example, the enzyme phospholipase A2 [38]. Members of the secapin family have since been reported to affect the metabolism of lipid messengers of the immune system or to act as protease inhibitors. One variant, secapin-2, isolated from Africanised honeybees, has been shown to cause oedema and mechanical allodynia when injected into mice [39]. Both effects were fully or partially blocked by administration of the 5-lipoxygenase inhibitor zileuton, but not by the cyclooxygenase inhibitor indomethacin. Zafirlukast, an antagonist of the cysteinyl leukotriene receptor (downstream of 5-lipoxygenase in the lipoxygenase pathway), also inhibited the effect of secapin-2. No cytolytic or mast-cell-degranulating activities were observed [39]. Similar results were obtained with paulistine, a venom peptide of the wasp Polybia paulista Ihering, 1896, which was found to cause oedema and allodynia by modulating the lipoxygenase and perhaps also the cyclooxygenase pathways [40]. If O. lunifer secapins modulate the metabolism of arachidonic acid-derived lipid messengers, as some of their hymenopteran homologues are reported to do, this would immediately suggest a mechanism by which they might exert proinflammatory effects on humans and animals.
Another secapin, AcSecapin-1 from the venom of the Asiatic honeybee Apis cerana Fabricius, 1793, is reported to inhibit serine proteases, including mammalian plasmin, trypsin, chymotrypsin, and elastases, and this confers antimicrobial and antifungal activities [41]. This secapin is reported to lack cytolytic or inflammatory activity, and it has therefore been suggested to have potential as an antimicrobial drug [42]. Secapin is also sometimes referred to as a neurotoxin (e.g., https://pfam.xfam.org/family/PF17521), but to the best of our knowledge there is no published evidence supporting this. Secapin is thought to be closely related to tertiapin, apamin, and mast-cell-degranulating peptide [43], which are bee venom peptides that modulate potassium channels and some aspects of inflammation. Further work is required to elucidate the bioactivity and structure of O. lunifer secapins.
Taken together, our data identify candidate molecules that may be responsible for the adverse effects of O. lunifer setae on humans and animals. Consequently, this study provides a foundation for further research on the biological activities of processionary caterpillar setae toxins, how they cause inflammation and equine foetal loss syndrome, and how these adverse effects may be ameliorated.

ACKNOWLEDGMENTS
The late-instar O. lunifer caterpillars, from a "tree-hugger" nest on Moreton Bay Ash (Corymbia tessellaris), were collected from The University of Queensland Gatton campus (−27.5498°, 152.3394°), Queensland, Australia, on 1/06/2021. A cytochrome oxidase subunit 1 DNA barcode sequence for these caterpillars, obtained from our assembled transcriptome, is reported (Figure S1) because O. lunifer may represent a species complex.

FIGURE 1 Ochrogaster lunifer defensive setae. (A) Processionary train of O. lunifer; photograph by Christopher Watson. (B) Single caterpillar. Panels B-D reproduced from Perkins et al. [6] Med. Vet. Entomol. 30:241. (C) Scanning electron micrograph (SEM) of defensive organs containing thousands of setae. (D) SEM of setae; the inset shows a single 89-µm-long seta. Panel D inset and panel G reproduced from Perkins et al. [7] J. Insect Sci. 19:6. (E) Urticaria on the foot of one of the authors following exposure to setae. (F) Foal aborted due to ingestion of setae by the mare, reproduced with permission from Cawdell-Smith et al. [10] Equine Vet. J. 44:282. (G) Ontogeny of setae development. Black patches indicate setae fields. Numbers on the head indicate approximate head widths in each instar. Some setae field distributions correspond to differing instars in different populations of O. lunifer.
TABLE 1 Highly expressed transcripts. The 15 most highly expressed transcripts in a transcriptome of the O. lunifer defensive organ cuticle. Contigs encoding products with secretion signal peptides are highlighted in grey.

FIGURE 2 Proteins detected in O. lunifer setae soluble extracts using proteotranscriptomics. PO, phenoloxidase; P/OBP, pheromone/odorant binding protein.

27 and 0.15 µg/µL, respectively, suggesting that 8.4% of the setae mass consists of soluble molecules that are easily separated from the setae. Comparison of LC-MS/MS spectra obtained from these extracts with transcriptome-derived data identified 41 peptide and protein precursors. The primary structures of these peptides and proteins were annotated using the BLAST and HMMER algorithms (Figure 2; primary structures, annotations, and other details are shown in Data S1). Four proteins with homology to known structural proteins of insect cuticle were omitted, and the remaining 37 were assigned as putative toxins. The most prominent of these were the secapin-like peptides, homologues of Tha p 1 in the odorant binding protein family, proteins with similarity to arylphorins, and homologues of the insect immune modulator Diedel. Other putative toxins include serpins, a Kazal domain peptide, a hydrolase, and six sequences without detectable homology to known sequences.
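The soluble fraction quoted above follows from the extract concentrations (0.27 and 0.15 µg/µL), assuming that the full 1 mL of each solvent was recovered as extract. With the concentrations as printed, the arithmetic gives about 8.2%; the small difference from the reported 8.4% is not explained by the values given in the text and may reflect rounding of the concentrations. The precursor counts can be checked in the same way. The function name is illustrative.

```python
# Assumptions: 1 mL of each extract recovered in full; concentrations as printed.
EXTRACT_UG_PER_UL = {"aqueous": 0.27, "organic": 0.15}
EXTRACT_VOLUME_UL = 1000.0   # 1 mL per solvent
SETAE_MASS_UG = 5100.0       # 5.1 mg of setae extracted


def soluble_fraction(conc_ug_per_ul, volume_ul, setae_mass_ug):
    """Return (total soluble mass in ug, fraction of setae mass)."""
    total_ug = sum(conc * volume_ul for conc in conc_ug_per_ul.values())
    return total_ug, total_ug / setae_mass_ug


total_ug, fraction = soluble_fraction(EXTRACT_UG_PER_UL, EXTRACT_VOLUME_UL, SETAE_MASS_UG)
print(f"{total_ug:.0f} ug soluble material = {fraction:.1%} of setae mass")

# 41 precursors identified, 4 cuticular structural proteins set aside.
identified, structural = 41, 4
print(f"putative toxins: {identified - structural}")
```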

FIGURE 3 MALDI-TOF MS spectra of O. lunifer setae extracts. (A) Reflectron-positive mode MS spectrum of aqueous setae extract. Suffixes on the labels of identified peptides (-I, -II) indicate that they represent the first or second peptide originating from the precursor shown in panel E. (B) Reflectron-positive mode MS spectrum of organic phase setae extract. (C) Linear mode MS spectrum of aqueous extract. (D) Linear mode MS spectrum of organic extract. (E) Precursor primary structures of some detected peptides. Predicted secretion signal peptides are shown in italics, detected peptides are shown in upper case, internal excised amino acids are underlined, C-terminal amidation tags are shown in bold, and asterisks represent stop codons.

FIGURE 4 Secapin family peptides. (A) Alignment of O. lunifer secapins with secapin family peptides from hymenopteran venoms and predicted peptides from other insects. #, amidation or amidation tag. Amel and Acer, honeybees Apis mellifera Linnaeus, 1758 and A. cerana Fabricius, 1793. Ilum, click beetle Ignelater luminosus (Illiger, 1807). Aaeg, mosquito Aedes aegypti. Tjap, hawk moth Theretra japonica (Boisduval, 1869). Aluc, mirid bug Apolygus lucorum (Meyer-Dür, 1843). Symbols underneath the alignment show similarity in ClustalW format. Cationic and anionic residues are shown in blue and red, respectively. Hydrophobic residues are shown in pink and proline residues in orange. (B) AlphaFold 2 predicted structures of selected secapins. The disulfide bond is shown in yellow.

FIGURE 5 Pheromone/odorant-binding protein family setae proteins. (A) Alignment of Ol8 detected by proteomics with T. pityocampa (Tpit) Tha p 1, sequences encoded by highly expressed O. lunifer transcripts (Ol_5442TPM and Ol_13183TPM), and venom toxins from the caterpillars Megalopyge opercularis (Mo) and Doratifera vulnerans (Dv). Formatting is as in Figure 4. (B) AlphaFold 2 predicted structures for Ol8 and Tha p 1.

FIGURE 6 Diedel-like setae toxins. (A) Alignment of O. lunifer Diedel-like putative toxins with Dv67a from venom of the limacodid caterpillar D. vulnerans and the immune modulator Diedel from Drosophila melanogaster (Dmel). Formatting is as in Figure 4. (B) Comparison of the AlphaFold 2 predicted structure of setae protein Ol10 with the crystal structure of the D. melanogaster immune effector Diedel.

[22], similar sequences encoded by high-abundance transcripts in our transcriptome, and P/OBP family toxins from venoms of the nettle caterpillar Doratifera vulnerans (Lewin, 1805) and the asp caterpillar Megalopyge opercularis (Smith, 1797) (Figure 5A). Ol8 has only 27.3% sequence identity to Tha p 1, but greater identity to another T. pityocampa setae protein (45.0%; cds.c245091_g1_i1|m.138485) and to the M. opercularis venom toxin U-MPTX7-Mo21 (28.0%). Thus, Ol8 is not an orthologue of Tha p 1. Searching our transcriptome using the BLASTp algorithm with Tha p 1 as a query revealed seven members of this family in the O. lunifer transcriptome that were retrieved with lower E values than Ol8. Interestingly, some of these are also much more highly expressed than Ol8 in our transcriptomic data. For example, the Tha p 1-like sequences shown in Figure 5A and encoded by transcripts shown in Table 1 have TPM values of 13,183 and 5442, compared with only 30 TPM for Ol8. The most highly expressed of these has 50.0% identity with T. pityocampa Tha p 1. However, none of these highly expressed versions were detected in setae extracts by proteomics. Nevertheless, AlphaFold 2 predicts very similar folds for Ol8 and Tha p 1, with an extra N-terminal helix for Ol8 (Figure 5B). The other odorant binding family protein detected in the O. lunifer setae extracts (U-TPTX11-Ol22) was much more distantly related than any of the proteins shown in Figure 5A and could not be aligned.
The Diedel-like proteins identified in O. lunifer setae extracts have greater sequence identity (up to 66.4%) to one of the proteins detected on T. pityocampa setae (cds.c243980_g12_i4|m.131713 [22]) than to the D. vulnerans venom protein U-LCTX9-Dv67 (29.0%) or Drosophila melanogaster Diedel (27.1%) (Figure 6A). All cysteine residues are conserved, and the overall fold is predicted by AlphaFold to be similar (Figure 6A,B). Compared with D. melanogaster Diedel, the O. lunifer setae proteins have a 20-residue insertion between the third and fourth cysteine residues, which is predicted to form an additional α-helix on the opposite side of the β-sheet from the large α-helix of the dipteran protein (Figure 6B).
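Several comparisons in this section are expressed as percentage sequence identity (e.g. 27.3% between Ol8 and Tha p 1). The text does not state how identity was computed, and conventions differ in the denominator (ungapped aligned columns, full alignment length including gaps, or length of the shorter sequence), so values are strictly comparable only when computed the same way. The sketch below implements two of these conventions for a pair of pre-aligned sequences; the sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, denominator="ungapped"):
    """Percentage identity of two aligned sequences of equal length ('-' = gap).

    denominator="ungapped": identical columns / columns with no gap in either.
    denominator="alignment": identical columns / all alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aligned_a, aligned_b))
    identical = sum(1 for a, b in columns if a == b and a != "-")
    if denominator == "ungapped":
        n = sum(1 for a, b in columns if a != "-" and b != "-")
    elif denominator == "alignment":
        n = len(columns)
    else:
        raise ValueError(f"unknown denominator: {denominator!r}")
    if n == 0:
        raise ValueError("no columns to compare")
    return 100.0 * identical / n


# Hypothetical aligned pair: 7 columns, 1 gap, 5 identities.
a = "AC-DEFG"
b = "ACWDQFG"
print(f"{percent_identity(a, b):.1f}")               # identity over 6 ungapped columns
print(f"{percent_identity(a, b, 'alignment'):.1f}")  # identity over all 7 columns
```

The same pair scores about 83% or 71% depending on the denominator, which is why identities from different tools or papers should be compared with care.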
The O. lunifer transcriptome was annotated by [...]

The authors would like to thank Angelika Christ and Tim Bruxner at the Institute for Molecular Bioscience Sequencing Facility for sequencing services, and Alun Jones at the Queensland Bioscience Precinct Mass Spectrometry Facility for assistance with proteomics. This work was funded by the Australian Research Council through Discovery Project DP200102867 to A.A.W., Discovery Project DP210102521 to M.P.Z., and Centre of Excellence grant CE200100012 to G.F.K. G.F.K. was supported by Principal Research Fellowship APP1136889 from the Australian National Health and Medical Research Council. Open access publishing facilitated by The University of Queensland, as part of the Wiley - The University of Queensland agreement via the Council of Australian University Librarians.