Predicting Blomia tropicalis allergens using a multiomics approach

Abstract Background The domestic mite Blomia tropicalis is a major source of allergens in tropical and subtropical regions. Despite its great medical importance, the allergome of this mite has not been sufficiently studied. Only 14 allergen groups have been identified in B. tropicalis thus far, even though early radioimmunoelectrophoresis techniques (27 uncharacterized allergen complexes) and comparative data based on 40 allergen groups officially recognized by the World Health Organization (WHO)/IUIS in domestic astigmatid mites suggest the presence of a large set of additional allergens. Methods Here, we employ a multiomics approach to assess the allergome of B. tropicalis using genomic and transcriptomic sequence data and perform highly sensitive protein abundance quantification. Findings Among the 14 known allergen groups, we confirmed 13 (one WHO/IUIS allergen, Blo t 19, was not found) and identified 16 potentially novel allergens based on sequence similarity. These data indicate that B. tropicalis shares 27 known/deduced allergen groups with pyroglyphid house dust mites (genus Dermatophagoides). Among these groups, five allergen‐encoding genes are highly expressed at the transcript level: Blo t 1, Blo t 5, Blo t 21 (known), Blo t 15, and Blo t 18 (predicted). However, at the protein level, a different set of most abundant allergens was found: Blo t 2, 10, 11, 20 and 21 (mite bodies) or Blo t 3, 4, 6 and predicted Blo t 13, 14 and 36 (mite feces). Interpretation We report the use of an integrated omics method to identify and predict an array of mite allergens and advanced, label‐free proteomics to determine allergen protein abundance. Our research identifies a large set of novel putative allergens and shows that the expression levels of allergen‐encoding genes may not be strictly correlated with the actual allergenic protein abundance in mite bodies.


| INTRODUCTION
House dust mites and domestic mites cause IgE-mediated type I allergic conditions in an estimated 30% of the world's population. 1 In sensitized individuals, allergen exposure induces the cross-linking of high-affinity FcεRI receptor-bound IgE on effector cells, leading to the release of mediators that cause allergic symptoms. 2 In severe cases, mite allergies manifest as chronic diseases, such as allergic rhinitis, conjunctivitis, and bronchial asthma. 3 Currently, 40 allergens of astigmatid mites are recognized by the Allergen Nomenclature Subcommittee of the World Health Organization (WHO) and the International Union of Immunological Societies (IUIS) 4; however, allergens have not been studied to the same extent across all domestic astigmatid mite species.
6][7][8][9] Within its typical range, this mite species can be found in up to 96% of dust samples, with a 40% prevalence among all mite species. 7 Two B. tropicalis allergens are considered major allergens, rBlo t 5 and rBlo t 21, which show IgE reactivity to 93% and 89% of sera from adult asthma patients, respectively. 10 A recent study showed high IgE-binding frequencies for rBlo t 21 (43%) and rBlo t 2 (40%), whereas the other tested allergens were only recognized in the samples of a few patients (rBlo t 10, 6%; rBlo t 12, 3%). 11 Overall, 27 antigen-antibody precipitin complexes have been identified in B. tropicalis by crossed radioimmunoelectrophoresis. 12 However, out of 40 allergen groups known in domestic mites, only 14 allergen groups have been described in B. tropicalis thus far. 4,13 Furthermore, 7 have been predicted based on their high similarity to identified allergens of other astigmatid mites. 13,14 These data collectively indicate that B. tropicalis is likely to have a substantial number of unrecognized allergen groups.
One of the major challenges in molecular allergy research is the accurate prediction of the allergenic potential of proteins. 15 Accurate diagnosis and targeted therapies for patients are critical for identifying allergenic mite species. 14 Diagnosis is focused on single allergens of mites to determine the exact IgE sensitization pattern of a patient and distinguish between sensitization and cross-sensitization. 14 An additional challenge in research on mite allergenicity is the lack of screening tools for quantifying all allergenic proteins in mite bodies and feces. Mite fecal pellets (FP) are arguably the most important allergy-inducing component, as they contain powerful digestive proteins. 16 The feces accumulate over time in human-related environments and, because of their minute size, can easily become airborne, thus increasing the risk of contact with the skin and, when inhaled, with the lung epithelium. 16 Although only a small portion of the entire mite allergome may be present in mite feces, 17 the quantification of these allergens is important. Mite feces may have distinct properties that affect sensitization and allergy induction differently from the mites themselves. Using a proteomic approach, allergen abundance and diversity have been accurately estimated in several mite species, such as Dermatophagoides farinae, Dermatophagoides pteronyssinus, and Tyrophagus putrescentiae [18][19][20]; however, data on B. tropicalis are lacking. Earlier studies employing bottom-up gel-based proteomics relied on narrow subsets of allergen sequences available at those times. 21
In this study, we generated both a high-quality de novo whole genome (long and short reads) and transcriptome of B. tropicalis and annotated them. Putative allergens were identified based on high sequence identities and structural similarities to known allergens 4 and allergen-related proteins available in public databases. We also estimated the expression levels of known allergens/putative allergens in mite bodies and feces. An advanced proteome analysis using an Orbitrap Tribrid mass spectrometer and deduced allergen sequences was then performed to quantify allergenic protein richness and abundance in mite bodies and feces.

| Sample of mites and feces
Laboratory cultures of B. tropicalis were obtained from the University of Cartagena, Colombia in 2008. Since then, the culture has been maintained under the rearing conditions of the Crop Research Institute, Prague, Czechia. 22 The wheat-derived rearing diet powder contained flakes, wheat germ and Mauripan-dried yeast extract (AB Mauri, Balakong, Malaysia) (10:10:1 w/w). Mites were mass reared in tissue culture flasks on a house dust mite diet until a large population had developed (approximately 4 weeks) and the diet was almost consumed. The detailed rearing and harvesting methodology is described elsewhere. 23 Live mites were collected from the plugs and surfaces of the flasks with a brush and placed in sterile microfuge tubes. "Pure" feces were obtained from the flasks after the mites and remnants of their bodies or eggs were removed and collected. 19 The mite and fecal samples were weighed using a microbalance (Mettler-Toledo) to obtain samples with fresh weights in the range of 30-40 mg. For transcriptomics, mites were surface cleaned by placing them in 100% ethanol, followed by vortexing for 5 s and centrifugation at 13,000 × g for 1 min. The supernatant was replaced with a 1:10 bleach solution containing 5% sodium hypochlorite, and the samples were then mixed by vortexing for 5 s and centrifuged at 13,000 × g for 2 min. The last step was washing twice with ddH2O to remove residual bleach. Surface sterilization was performed on ice.
The provided rearing diet had the same weight as the feces. The rearing diet and feces were not cleaned and were stored directly in an ultracold freezer (−80°C). We collected five biological replicates of mites for RNA sequencing and 1 sample of mites for genomic sequencing. Each sample contained approximately 1000 mites.
For proteomic analyses, four sample types, each with three biological replicates, were prepared: (i) 1000 individually collected adult mites; (ii) pools of mites of different ages, including eggs; (iii) water extracts of feces; and (iv) detergent-buffer extracts of the remaining pellet of the water extract.

| Sample processing
Homogenization of mites was performed on ice. All samples were homogenized for 30 s in a glass tissue grinder (Kavalier glass, Prague, Czechia) in 500 μL of lysis buffer from the kits. DNA was extracted from homogenates after overnight incubation with 20 μL of proteinase K at 56°C using the QIAamp DNA Micro Kit (Qiagen, Hilden, Germany, cat. No. 56304) following the manufacturer's protocol for tissue samples. The concentration of the extracted DNA samples was quantified using the Qubit ® dsDNA HS Assay Kit (Life Technologies), and the quality of the DNA was determined using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific). The average size of gDNA was determined on an E-Gel SizeSelect 2% agarose gel (Invitrogen) with a 1 kb ladder. The samples were sheared using a Covaris G-tube (Covaris Inc.). The average size of the sheared DNA was determined using an Agilent 2100 Bioanalyzer (Agilent Technologies).
RNA isolation was performed using a NucleoSpin RNA kit (catalog no. 740984.50; Macherey-Nagel, Duren, Germany) according to the manufacturer's instructions, with the following modifications: homogenized samples were centrifuged at 2000 × g for 3 s, and DNA was degraded by DNase I at 37°C according to the manufacturer's protocol (Riboclear plus, catalog no. 313-50; GeneAll, Lisbon, Portugal). RNA quality was evaluated using a NanoDrop (NanoDrop One; Thermo Scientific, Waltham, MA, USA) and an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA), and RNA samples together with DNA samples were shipped on dry ice to the MrDNA laboratory (Shallowater, TX, USA) for downstream processing and sequencing.
For protein analyses, mite body samples (i and ii) were further homogenized in Potter-Elvehjem homogenizers with a Teflon pestle after the addition of 100 mM (1 mL/1000 mites) triethylammonium bicarbonate buffer (TEAB; Cat No. 17902, Fluka, Sigma-Aldrich, St. Louis, MO, USA) containing 2% w/v sodium deoxycholate (SDC; Cat No. 30970, BioXtra, Sigma-Aldrich). [20] Briefly, 10 mL of ice-cold (4°C) Nanopure water (Thermo, Waltham, MA, USA) was added to the feces-coated flasks. The flasks were placed in a box with ice, laid on their sides and mixed on a ProBlot Rocker (Labnet International, Woodbridge, NJ, USA) for 10 min at 100 rpm, and the extract was transferred to 50 mL sterile flasks. Extraction of the second side of the feces-coated flask was performed with 10 mL of sterile water, and the extract was then added to the 50 mL flask. The third extraction in the flask was performed with 10 mL of water, which was thoroughly mixed within the flask. In total, 30 mL of the extract obtained from a flask was centrifuged at 10,000 × g for 15 min at 4°C (Thermo). The supernatant was divided into 3 equal portions, and portions of 1 mL were collected for total protein quantification. The contents were lyophilized in a PowerDry LL3000 lyophilizer (Thermo Fisher Scientific). The pellets of feces (iv) used for proteomic analysis were obtained by processing the pellet after water extraction. Three milliliters of 100 mM TEAB with 2% SDC was added to the pellet that remained from approximately 20 mL of the supernatant, and the contents of the centrifuge tube were thoroughly mixed by vortexing plus 10 min of mixing on a rocker on ice. The procedure was repeated 3 times. The resulting sample was centrifuged at 10,000 × g for 15 min at 4°C (Thermo Fisher Scientific), and the supernatant was collected for further analysis. All samples were stored at −80°C until use.

| Sequencing and assemblage
For Illumina DNA sequencing, the libraries were prepared using a Nextera DNA Flex library preparation kit (Illumina) following the manufacturer's user guide. Fifty nanograms of DNA was used to prepare the libraries. The samples simultaneously underwent fragmentation and addition of adapter sequences. These adapters were utilized during limited-cycle PCR in which unique indices were added to the samples. The libraries were then pooled in equimolar ratios of 0.6 nM and subjected to paired-end sequencing for 500 cycles using a NovaSeq 6000 system (Illumina).
For PacBio sequencing, 250 ng of the sheared DNA was employed as the input for library preparation using the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences). During library preparation, each sample underwent DNA damage repair and end repair as well as barcode adapter ligation. To reduce the number of smaller DNA fragments (<3 kb), an additional bead purification step was performed on the SMRTbell libraries using 40% diluted AMPure PB Beads. Each library was then sequenced using a 10-h movie time on a PacBio Sequel system (Pacific Biosciences). The SMRT Link Circular Consensus Sequencing workflow (SMRT Link v.9.0.0, CCS) was used to combine multiple subreads from the same molecule to generate a highly accurate consensus sequence. The sample metadata and the whole genome sequence were deposited in GenBank as part of project PRJNA625856.
The concentration of total RNA was determined using the Qubit ® RNA Assay Kit (Life Technologies). Poly-A selection and library preparation were performed by using KAPA mRNA HyperPrep Kits (Roche) following the manufacturer's instructions. A total of 1000 ng RNA was used to prepare the library. Following library preparation, the final concentration of the library was measured using the Qubit ® dsDNA HS Assay Kit (Life Technologies), and the average library size was determined using an Agilent 2100 Bioanalyzer (Agilent Technologies). The library was then diluted to 0.6 nM and subjected to paired-end sequencing for 500 cycles using the NovaSeq 6000 system (Illumina). The samples are deposited in GenBank as project PRJNA599071.
Illumina reads were trimmed in Trim Galore, 24 quality-checked in FastQC, 25 and assembled together with the PacBio reads in hybridSPAdes v 3.14 26,27 for DNA-based reads or in rnaSPAdes for RNA-based reads. 26 Transcriptome sequences were annotated by Prokka, 28 and predicted proteins were identified by KEGG analysis in GhostKOALA. 29 Reads were mapped onto the assembled genome or transcriptome using Bowtie2 30,31 and Minimap2 32 for long sequences. Then, the assembled genomes were improved by Pilon (Walker et al., 2014). The transcriptome is presented in Table S1. The GFF file from the genome and transcriptome was prepared by Exonerate. 33 Expression analysis was performed in CLC Workbench 22 (Qiagen, Venlo, Netherlands) using the total number of mapped reads as the expression value.
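As an illustration of how mapped-read counts can serve as expression values and be log2 transformed for plotting, a minimal Python sketch follows. The gene names and counts are hypothetical, and the pseudocount of 1 (used so that genes with zero reads stay defined) is an assumption, not a setting reported for the CLC analysis.

```python
import math

def log2_expression(mapped_reads, pseudocount=1):
    """Return gene -> log2(mapped read count + pseudocount).

    The pseudocount is an assumption of this sketch; it keeps genes with
    zero mapped reads defined after the transformation.
    """
    return {gene: math.log2(count + pseudocount)
            for gene, count in mapped_reads.items()}

def top_expressed(mapped_reads, n=3):
    """Return the n genes with the most mapped reads, highest first."""
    return sorted(mapped_reads, key=mapped_reads.get, reverse=True)[:n]

# Hypothetical read counts, for illustration only.
toy_counts = {"gene_a": 1023, "gene_b": 0, "gene_c": 255, "gene_d": 7}
print(log2_expression(toy_counts))
print(top_expressed(toy_counts, n=2))
```

Ranking on raw counts and plotting on the log2 scale give the same ordering, because the transformation is monotonic.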

| Protein analyses
Samples for label-free mass spectrometry were processed and further analyzed using a nanoLC-MS/MS system employing an Orbitrap Fusion Tribrid mass spectrometer (Thermo Fisher Scientific) as previously described. 34,35 Mass spectrometry data were evaluated in MaxQuant version 2.2.0.0 using label-free quantification algorithms 36,37 and the Andromeda search engine. 38 The key criteria used in data analyses were a false discovery rate of 0.01 for proteins and peptides, a minimum peptide length of 7 amino acids, carbamidomethyl as a fixed modification, and variable modifications of N-terminal protein acetylation and methionine oxidation. These data were searched using a custom database with our transcriptome assemblies of B. tropicalis.

| Allergen identification
Sequences of allergens were downloaded from the WHO/IUIS database 4 and subjected to protein BLAST searches against the GenBank database using BLASTp. 40 The most similar search results for B. tropicalis and other mites were added to extend the mite WHO/IUIS allergen database. Then, the predicted proteins from B. tropicalis were screened against the allergen database using HMMER 41 on the Galaxy server. Predicted sequences (Supplementary material Table S2) with the highest scores were selected for analyses and compared to the proteome identification (see above).
In the next step, we performed structural alignment on the T-Coffee server 42,43 for each allergen group. For homologous sequence clusters, phylogenetic trees were inferred with PhyML. 44 The trees were visualized and edited in FigTree v.1.4.4, 45 and alignments were visualized in ESPript. 46 Sequence identities were calculated in EMBOSS Water using the Smith-Waterman algorithm. 47 The protein structure, including the signal peptide, was predicted based on comparison to reference proteins in the EMBL database. 48 Trypsin identification was based on BLASTp, and we searched for homologous PDB structures. Then, JGLJHAHI_02083, JGLJHAHI_03207 and JGLJHAHI_04704 were aligned to the structures retrieved by the BLASTp searches using ClustalX 2.0. 49 The protein alignment was then used as an input for MODELLER 10.2 software 50,51 for 3D structure prediction. The validity of the resulting models was checked using PROCHECK software (G-score), 52 and the model with the best valid score was chosen for further investigation. This model was optimized by running 10 steepest descent energy minimization steps followed by 10 conjugate gradient, 20 steepest descent, 10 conjugate gradient and finally 10 steepest descent steps (GROMOS 96 force field). 53 The validity of the final model was again checked by PROCHECK.
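To make the sequence identity calculation concrete, the following minimal Python sketch performs a Smith-Waterman local alignment and reports percent identity over the aligned region. It is a simplified illustration rather than a reimplementation of EMBOSS Water: the match, mismatch and linear gap scores are assumptions chosen for brevity (EMBOSS Water uses a substitution matrix and affine gap penalties), and the peptide strings are hypothetical.

```python
def smith_waterman_identity(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Return (best local alignment score, percent identity of that alignment).

    Identity = identical aligned positions / alignment length (gaps included).
    Scoring values are illustrative assumptions, not EMBOSS defaults.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        # Substitution score for aligning seq_a[i-1] with seq_b[j-1].
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    # Fill the dynamic programming matrix; scores never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    identical = aligned = 0
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            identical += seq_a[i - 1] == seq_b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in seq_b
        else:
            j -= 1  # gap in seq_a
        aligned += 1
    identity = 100.0 * identical / aligned if aligned else 0.0
    return best, identity

# Hypothetical peptide fragments, for illustration only.
print(smith_waterman_identity("ACDEFG", "ACDQFG"))
```

Because the alignment is local, flanking residues that do not match do not lower the reported identity; only the best-scoring aligned segment is counted.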

| Data analyses
Expression and protein heatmaps were created in R-4.2.2 software 54 using the ComplexHeatmap package. 55,56 The protein profiles of feces and mite bodies were compared using analysis of similarities (ANOSIM). Two distance measures were used: the Jaccard index (presence/absence of proteins) and the Bray-Curtis index (protein hit intensity). To identify allergens/predicted allergens responsible for the differences between the profiles, we used similarity percentages (SIMPER). The calculations for all of these analyses were performed in PAST 4. 57
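The ANOSIM comparison can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the PAST 4 code: the R statistic is computed as (mean between-group rank − mean within-group rank) / (M/2), where M is the number of sample pairs, and the P value comes from a label permutation test. The intensity vectors, function names, permutation count and random seed are all hypothetical.

```python
import random
from itertools import combinations

def jaccard(a, b):
    """Jaccard distance on presence/absence (a value > 0 means present)."""
    pa = {i for i, v in enumerate(a) if v > 0}
    pb = {i for i, v in enumerate(b) if v > 0}
    union = pa | pb
    return 1.0 - len(pa & pb) / len(union) if union else 0.0

def bray_curtis(a, b):
    """Bray-Curtis distance on abundances (e.g. protein intensities)."""
    total = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total else 0.0

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def anosim_r(samples, groups, metric):
    """ANOSIM R from ranked pairwise distances (needs >= 2 samples per group)."""
    pairs = list(combinations(range(len(samples)), 2))
    ranks = average_ranks([metric(samples[i], samples[j]) for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_w = sum(within) / len(within)
    mean_b = sum(between) / len(between)
    return (mean_b - mean_w) / (len(pairs) / 2)

def anosim(samples, groups, metric, permutations=999, seed=0):
    """Return (observed R, permutation P value) for the given grouping."""
    rng = random.Random(seed)
    observed = anosim_r(samples, groups, metric)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(samples, shuffled, metric) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical protein intensities: three body and three feces replicates.
toy = [[10, 5, 0, 0], [12, 4, 0, 0], [9, 6, 1, 0],
       [0, 0, 8, 7], [0, 1, 9, 6], [0, 0, 7, 8]]
labels = ["body"] * 3 + ["feces"] * 3
print(anosim(toy, labels, bray_curtis))
print(anosim(toy, labels, jaccard))
```

Passing `jaccard` tests whether the sets of detected proteins differ, whereas `bray_curtis` also weighs how abundant each protein is, which mirrors the two comparisons described above.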

| Genome, transcriptome and proteome of B. tropicalis
The Blomia tropicalis predicted genome (GenBank SUB13543831) included 1601 contigs with a total length of 61 Mb (N50 = 223 kb). According to BUSCO prediction based on the Arachnida set, the completeness was estimated to be 93.2%, with 2.6% duplicated genes. The annotated transcriptome had 7171 contigs, with a total length of 31 Mb (N50 = 5996 bp) and 18,164 predicted genes (Supplementary material Table S1). Its completeness was estimated at 95%, with 9.1% duplicated genes based on BUSCO with the Arachnida dataset. In contrast, a previously generated and annotated transcriptome included fewer predicted genes (16,590-14,899). 58,59 The proteome analyses identified 3714 protein hits in four protein sample types: individually collected adult mites; pools of mites of different ages, including eggs; water extracts of feces; and detergent-buffer extracts of the remaining pellets of the water extracts.
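For readers less familiar with the N50 statistic used for these assemblies, a minimal Python sketch follows. N50 is the contig length L such that contigs of length L or longer together contain at least half of the total assembly length. The contig lengths below are hypothetical and are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    assembly length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Hypothetical contig lengths (bp), for illustration only.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

A higher N50 at the same total length indicates that the assembly is made of fewer, longer contigs.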
However, we identified the AAQ24542 protein in the proteome 58,59 (Table 1; Supplementary material Figures S5-S6). Isoform AAQ24542 was not detected in transcripts. PHMMER analyses showed that several allergen groups shared the same functional domains (Table 1). The allergens Blo t 3 and Blo t 6 both had a trypsin domain. The predicted 3D structures confirmed the trypsin-like core structure of JGLJHAHI_02083, JGLJHAHI_03207 and JGLJHAHI_04704 (Figure 2). Apart from the shared trypsin-like core structure, the sequences were classified into different groups. Blo t 3 JGLJHAHI_02083 exhibited similarity to factor IX of the blood clotting cascade. This protein was identified only in the body, not in excrement. Blo t 3 JGLJHAHI_03207 was identified in excrement and was highly similar to trypsinogen. The protein structure indicated that trypsinogen is activated by the removal of the protein tail (Figure 2C). The final protein, Blo t 6 JGLJHAHI_04704, exhibited similarity to kallikrein-like serine protease. Both Blo t 5 and Blo t 21 shared the Blo t 5 domain, as confirmed by a 2D protein model (Figure 3A,B). The differences were caused by the protein tail of n Blo t 21 (Figure 3B).
Note: Confirmed allergens from the WHO/IUIS allergen database and the closest GenBank matches (annotated or unannotated) are given. Full analysis (allergen and nonallergen proteins) and alignments are given in Supplementary results and Supplementary figures 1-50, respectively.
a Based on proteome analysis, AAQ24548 and KAI2795631 did not correspond to JGLJHAHI_08405.

| Expression of allergen-encoding genes and protein abundance
Our expression analysis identified six highly expressed allergen-encoding genes, Blo t 1, 5, 14, 15, 18, and 21 (Figure 4). However, a proteomic analysis identified only some of them as the most abundant, Blo t 14, 15, 18 (Figure 5). The proteomes of mite bodies and feces differed significantly in protein presence/absence (ANOSIM Jaccard, R = 0.509, P = 0.003) and abundance (ANOSIM Bray-Curtis, R = 0.677, P = 0.002). The differences ), but only two of these isoforms were detected in feces at higher abundances than in whole mites (e.g., trypsinogen: JGLJHAHI_03207 and AAQ24542). Another isoform, Blo t 3 (factor IX of the blood clotting cascade: JGLJHAHI_02083), was nearly absent in feces. Blo t 6 (kallikrein-like serine protease) presented a single isoform that was more abundant in feces than in mite whole bodies. Blo t 13 (lipocalin) showed 3 isoforms (Figure S19). One isoform (JGLKHAHI_14091) predominated in feces, while isoform JGLKHAHI_00046 had almost the same abundance in whole mite bodies and feces. Isoform JGLKHAHI_00101 was detected at low abundance in whole mite samples but not in feces.

| DISCUSSION
Our multiomic analyses confirmed 13 known groups of B. tropicalis allergens (Table 1) and suggested that one previously identified WHO/IUIS allergen (Blo t 19) belonged to a nematode and not to B. tropicalis or any other mite. This last allergen may have been observed because of contamination by a nematode inhabiting the mite culture medium. 34 Based on the structural homology and sequence identity of known WHO/IUIS allergens from the pyroglyphid house dust mite Dermatophagoides sp. and the mold mite T. crossed radioimmunoelectrophoresis. 12 Many of these allergens/putative allergens (14) were also found in recently published genomic/transcriptomic 59 and proteomic analyses. 21
Our study showed that Blo t 3, 8, 13, 33 and 36a (peptides with C2 domain) occur in multiple isoforms; however, we did not confirm isoforms for Blo t 2. Blo t 2 allergen protein was abundant in mite bodies as well as FP (Figure 4). We found only a single isoform, which contrasts with the findings of Reginald et al., 64 who reported nine Blo t 2 isoforms with 1-3 amino acid substitutions. 64 As these isoforms were obtained via RT-PCR/cloning and were not verified using direct DNA sequencing, 64 it is unknown whether these substitutions represent the natural variation of the protein, polymerase errors, or a mix of the two. However, the study by Reginald et al., 64 which used a high-fidelity polymerase, revealed that 8% of the examined Blo t 2 sequences showed the Blo t 2.0102 sequence (all with the same two nucleotide substitutions). This cannot be a copy error and is consistent with the more extensive data for Dermatophagoides mites showing an evolutionary pattern of variation that differs in different regions and could be very important for the frequency of responses and for immunotherapy. A possible reason for not finding alleles is the use of a closed culture of mites.
A previous proteomic study revealed 3 isoforms with different protein profiles in B. tropicalis extracts. 21 Lipocalin Blo t 13 is a minor allergen 13,78 with 3 isoforms (our data). Some of these isoforms presented higher abundances in mite feces (Figure 4), as noted previously for T. putrescentiae. 19 The study showed the existence of two however, the biological functions and epitope binding of the isoforms should be resolved in the future.
Blo t 14 allergen (vitellogenin) 21 showed higher levels in both whole mite bodies and feces. In both Dermatophagoides mites, the allergen has been described as an apolipophorin-like group 14 allergen, with predicted hydrophobicity and lipid-binding activity, inducing high levels of IgE responses and T-cell stimulation. 79,80 This is an exoskeleton-bound protein with much higher abundance in females than in males of D. farinae. 34 In contrast to "exoskeleton-bound" proteins, vitellogenins are yolk protein precursors that are produced at extraovariole sites and taken up by the growing oocytes 81; the presence of vitellogenin in feces is thus surprising, and the protein structure needs further attention. However, it has been suggested that it can bind several microbial ligands, such as LPS, lipoteichoic acid or β-glucan, due to the vitellogenin domain (as reviewed by Jacquet 82).
The chitin-binding peritrophin-A domain is the same domain identified exclusively in group 12 in B. tropicalis 83 and in group 23 (D. farinae 84,85 and D. pteronyssinus 86) and group 37. 4 We did not find any evidence of the presence of groups 23 and 37 in B. tropicalis based on our analyses. Similarly, as in Dermatophagoides mites, 87 no cross-reactivity was observed. 87 The situation in B. tropicalis is not known and needs further analysis.
Putative Blo t 20 allergen protein (arginine kinase 88 or ATP: guanido phosphotransferase (our PHMMER analysis)) is relatively abundant in mite bodies but has low or zero abundance in feces. Der p 20 presented 40% IgE reactivity in nonhospitalized patients, 89 and Der f 20 showed 7% IgE reactivity to patient sera, 90,91 indicating that in pyroglyphid mites, this allergen has moderate to low medical importance. However, arginine kinase is a major allergen in crustaceans that may cause asthma and fatal anaphylaxis in fishermen and processing plant workers. 92 The allergenic properties of putative Blo t 20 are unknown.
Protease allergens (Blo t 3 trypsin and Blo t 6 chymotrypsin) were the predominant proteins in mite feces. Similarly, group 3 and 6 proteins were highly abundant in the feces of the mold mite T. putrescentiae, [18][19][20] but in D. pteronyssinus, only chymotrypsin was detected. 17,18 Previous observations have indicated that in culture extracts of B. tropicalis, no trypsin or chymotrypsin enzymatic activity can be detected 93 or that these enzymes occur at low abundance. 21 However, another study based on a design using inhibitors and specific substrates for enzymes identified trypsin, elastase, chymotrypsin, kallikrein, C3/C5 convertase, and mast cell protease. 94 Our results showed that the two isoforms of Blo t  94 These results can be reconciled if enzyme inactivation occurs in the mite bodies 95 or if the expression of these enzymes is affected by the presence of the intracellular bacterial symbionts Cardinium and Wolbachia. 35
Blo t 5 and 21 have been reported as the major allergens of B. tropicalis. 10,14 Our analyses showed that both proteins contained a signal peptide and shared a Blo t 5 domain. The majority (>75%) of sensitized patients were cosensitized to both Blo t 5 and Blo t 21, which was expected based on their structural similarity. 70 Both of these allergens were found in the midgut and inside FP (Gao et al. 70), as independently suggested by our gene expression and proteomic analyses.
FIGURE 5 Abundance of allergenic proteins in the bodies and feces of Blomia tropicalis (A, adults; M, mixed stages; F, feces; FP, fecal pellets not dissolved in water). Data were log2 transformed. In the allergen group names, blue indicates that an allergen was identified with more isoforms, red indicates that an allergen was not present in fecal pellets (FP), and purple indicates allergens not present in feces (F).
Epidemiological studies provide further evidence of an empirical link between abundance and allergenicity. 96 The suggestion is that the allergens are more stable and more highly expressed than other proteins. 97 For example, the stability of Der p 1 is slightly less than the mean for DP proteins, but it is exceptionally highly transcribed. In contrast, Der p 23 is not as highly expressed but is extremely stable. 97 Based on our analyses, the comparison of allergens in mite bodies and feces enables the identification of stable allergens in terms of their resistance to degradation by mite digestive enzymes.
The allergens that are present in feces are produced via the digestive tract. 98,99 This means that these proteins are not degraded by digestive enzymes. 95 This is supported by studies showing low allergen degradation (Der f 1) after exposure in the environment. 100
In conclusion, we show that the medically important mite B. tropicalis has a large set of putative allergens (n = 15), as predicted by omics analyses (genome, transcriptome, proteome) and the use of carefully verified comparative data on known allergens. Although the clinical importance of these putative allergens should be verified in the future, a similar bioinformatics approach was successful in allergen discovery in the mold mite T. putrescentiae. 101 It is therefore expected that a large portion of our predicted putative allergens will have medical significance. Our study also provides accurate quantification of immunogenic proteins in both mite whole bodies and FP, the two important mite-related components with distinct allergenic profiles. For this task, we used a shotgun proteomics approach with an advanced Tribrid Orbitrap Fusion spectrometer and contrasted our results with gene expression data. We showed that the expression levels of allergen-encoding genes may not be strictly correlated with actual allergenic protein abundance. [104][105][106] Moreover, label-free or other quantitative approaches enable the estimation of mite allergen abundance in proteome profiles. 34,35
In addition, we included in the search database a GenBank genomic sequence of B. tropicalis (NCBI: 16792 seq; 06 Feb 2023) and sequences of possible associated contaminants and microorganisms: Aspergillus (UniProtKB: 5439 seq, 07 Feb 2023), Staphylococcus kloosii (UniProtKB: 4243 seq, 07 Feb 2023), Staphylococcus rev (UniProtKB: 13547 seq, 07 Feb 2023), Staphylococcus xylosus (UniProtKB: 12,883, 07 Feb 2023), and yeast (UniProtKB: 21667 seq, 07 Feb 2023). The data were processed in Perseus version 2.0.7.0. 39

isoforms of Blo t 1 with low similarity (35%) in B. tropicalis. We found evidence of two domains inside this protein: (i) a cathepsin propeptide inhibitor domain and (ii) Peptidase_C1. The differences were caused by the absence/presence of cathepsin propeptide inhibitors;

FIGURE 1 Venn diagrams of allergens only from the World Health Organization (WHO)/IUIS database (A) and WHO/IUIS allergens plus predicted allergens (B) in four species of mites: Dermatophagoides farinae, Dermatophagoides pteronyssinus, Blomia tropicalis, and Tyrophagus putrescentiae. The total numbers and percentages of allergens are shown. Legend: Blo.tro, Blomia tropicalis; Der.far, Dermatophagoides farinae; Der.pte, Dermatophagoides pteronyssinus; Tyr.put, Tyrophagus putrescentiae.

FIGURE 2 3D models of identified serine proteases: factor IX of the blood clotting cascade (JGLJHAHI_02083) (A and B); trypsinogen (JGLJHAHI_03207) (C and D); kallikrein-like serine protease (JGLJHAHI_04704) (E and F). Panels (A, C, and E) show the backbones. Panels (B, D, and F) show the surface. Positively charged surfaces are indicated in blue, and negatively charged surfaces are indicated in red. The active site is indicated by an arrow.

HUBERT ET AL.

Blo t 15 and Blo t 18 showed similar structures (i.e., signal peptide and glycosyl hydrolase family 18) (Blo t 15 position 33-386 and Blo t 18 position 29-355). In addition to the glycosyl hydrolase family 18 domain, Dermatophagoides mites exhibit a chitin-binding domain. This was not found in Blomia. The previous observation showed that polyclonal anti-Der p 15 and Der p 18 mouse serum bound only one specific allergen, and

FIGURE 3 Predicted 2D model of allergen Blo t 5 (A) and Blo t 21 (B). Arrows point to the protein tail.
TABLE 1 Known and predicted allergen groups in Blomia tropicalis based on structural identity analyses (PHMMER) of transcriptomic data.
teins (Table 1, Supplementary material Table S2; Supplementary Figures S1-S50), among which 13 proteins were confirmed WHO/IUIS allergens (groups 1-8, 10-13, 21), 17 proteins represented 16
SIMPER analysis) were due to the low abundance/absence of several
In B. tropicalis, group 3 allergen (trypsin) had 3 isoforms (Figures S5 and S6
The comparison of sequence identity was based on PERMANOVA (Bray-Curtis, 1000 permutations), and the factors were species of mites or Blomia tropicalis versus other species. The similarity value levels are indicated by color.
FIGURE 4 Expression of known and predicted allergen-encoding genes of Blomia tropicalis. Gene expression values were log2 transformed and visualized in a heatmap. The red box indicates allergens with the highest expression. In the allergen group names, blue indicates allergens with identified or predicted isoforms.
3 differ in their protein structure and body localization: Blo t 3 JGLJHAHI_02083, which shows similarity to factor IX of the blood clotting cascade, is body localized, whereas trypsinogen JGLJHAHI_03207 is associated with feces. The last Blo t 3 (AAQ24542) isoform was obtained by the amplification of isolated RNA and confirmed by DNA sequencing. 66 Its absence from the transcriptome and presence in the B. tropicalis proteome in this and previous studies 21 indicated posttranslational modification. Surprisingly, Blo t 6 JGLJHAHI_04704 exhibited similarity to kallikrein-like serine protease, confirming kallikrein enzymatic activity of B. tropicalis extract.