To cite this article: Pfiffner P, Stadler BM, Rasi C, Scala E, Mari A. Cross-reactions vs co-sensitization evaluated by in silico motifs and in vitro IgE microarray testing. Allergy 2012; 67: 210–216.
Background and objective: Using an in silico allergen clustering method, we have recently shown that allergen extracts are highly cross-reactive. Here we used serological data from a multi-array IgE test based on recombinant or highly purified natural allergens to evaluate whether co-reactions are true cross-reactions or co-sensitizations by allergens with the same motifs.
Methods: The serum database consisted of 3142 samples, each tested against 103 highly purified natural or recombinant allergens. Cross-reactivity was predicted by an iterative motif-finding algorithm through sequence motifs identified in 2708 known allergens.
Results: Allergen proteins containing the same motifs cross-reacted as predicted. However, proteins with identical motifs revealed a hierarchy in the degree of cross-reaction: the more frequently an allergen was positive in the allergic population, the less frequently it was cross-reacting, and vice versa. Co-sensitization was analyzed by splitting the dataset into patient groups that were most likely sensitized through geographical occurrence of allergens. Interestingly, most co-reactions were cross-reactions rather than co-sensitizations.
Conclusions: The observed hierarchy of cross-reactivity may play an important role for the future management of allergic diseases.
We have proposed a method to predict cross-reactivity from protein sequence (1). The method uses sequence motifs to identify evolutionarily conserved structural domains, which possibly present highly similar surface patches. Hence, if two proteins contain the same motif, they are predicted to cross-react. We previously applied this method to assess specific IgE determinations based on allergen extracts with regard to cross-reactivity. Interestingly, we found a linear correlation between the number of positive extracts and the number of recognized motifs in a 3 : 1 ratio (2). These findings suggested that cross-reactivity may pose a greater problem than previously assumed. However, we were unable to conclude that these cross-reactions can reliably be predicted through allergen motifs because our study was based on crude allergen extracts, which contain many proteins and often several motifs.
With the advent of allergen microarray systems using highly purified natural or recombinant proteins, it became possible to determine specific IgE in a serum directed against many different allergens (3, 4). These data allowed, for the first time, a direct comparison between predicted cross-reactivity and observed co-reactivity at the level of a single allergen protein. Our results show that the predicted cross-reactivity coincides with wet laboratory data and is not due to co-sensitization with other allergens.
Materials and methods
In our study we used specific IgE data of 3142 serum samples analyzed with the Immuno Solid-phase Allergen Chip (ISAC®; Phadia Multiplexing Diagnostics GmbH, PMD, Vienna, Austria). The samples were collected at two sites: 2500 samples were received from patients predominantly living in Italy and 642 samples from patients predominantly living in Austria. No selection criteria other than having at least one positive reaction on the allergen chip were applied, and no clinical or demographic information on the donors was included. The two centers used different ISAC chip generations differing in two proteins only. Each chip contained 103 different, highly purified natural or recombinant allergen proteins immobilized in triplicate onto glass slides. These systems quantitate specific IgE content in ISAC Standardized Units (ISU), a semi-quantitative unit normalized against calibration serum. We collected 311 058 specific IgE values of 105 different allergen proteins. The cut-off value to test positive for a protein was set to 0.1 ISU in order to filter out possible noise. Normalization was achieved by scaling the IgE values between zero and 100. The average of the five highest values per protein and collection site was assumed to be the highest value and set to 100.
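The normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scripts used in the study: the record layout, the function names, the use of `>=` for the 0.1 ISU cut-off, and the clipping of values above the reference to 100 are all assumptions.

```python
from collections import defaultdict

CUTOFF_ISU = 0.1  # positivity cut-off stated in the text


def is_positive(isu):
    """Assumption: a value equal to the cut-off counts as positive."""
    return isu >= CUTOFF_ISU


def normalize(records, top_n=5):
    """Scale ISU values to 0..100 per (site, protein).

    records: list of (site, protein, serum_id, isu) tuples (hypothetical layout).
    The mean of the top_n highest values per site and protein is treated as
    the maximum and mapped to 100.
    """
    by_group = defaultdict(list)
    for site, protein, _serum, isu in records:
        by_group[(site, protein)].append(isu)

    # Reference value = mean of the top_n highest ISU values in each group.
    reference = {}
    for key, values in by_group.items():
        top = sorted(values, reverse=True)[:top_n]
        reference[key] = sum(top) / len(top)

    normalized = []
    for site, protein, serum, isu in records:
        ref = reference[(site, protein)]
        # Assumption: values above the reference are clipped to 100.
        scaled = 0.0 if ref == 0 else min(100.0, 100.0 * isu / ref)
        normalized.append((site, protein, serum, scaled))
    return normalized
```

Scaling per collection site is what makes values from the two centers comparable despite possible laboratory or batch differences.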
The terms co-reaction, cross-reaction and co-sensitization were used as follows: if we observed two proteins positive in the same serum, the term co-reaction was used. This expresses a neutral standpoint without implying a mechanism. If co-reacting proteins contained the same motif, that is, our approach predicted these proteins to cross-react, the term cross-reaction was used. Co-sensitization was assumed for co-reacting proteins that occur in the same organism.
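These three definitions translate directly into a small decision rule. The sketch below is illustrative only; the function name and argument layout are hypothetical, and organisms are represented by plain strings.

```python
def classify_pair(positive_a, positive_b, motifs_a, motifs_b, organism_a, organism_b):
    """Return the set of terms that apply to two proteins in one serum.

    positive_a / positive_b: whether each protein tested positive in the serum.
    motifs_a / motifs_b: iterables of motif identifiers found in each protein.
    organism_a / organism_b: source organism of each protein.
    """
    labels = set()
    if not (positive_a and positive_b):
        return labels  # no co-reaction, so no further term applies
    labels.add("co-reaction")  # neutral term: both positive in the same serum
    if set(motifs_a) & set(motifs_b):
        labels.add("cross-reaction")  # shared motif, so predicted to cross-react
    if organism_a == organism_b:
        labels.add("co-sensitization")  # both proteins occur in the same organism
    return labels


# Bet v 1 (motif 4) and Bet v 2 (motif 1) both occur in birch pollen.
print(classify_pair(True, True, {4}, {1}, "birch", "birch"))
# Bet v 1 and Dau c 1 (carrot) both carry motif 4.
print(classify_pair(True, True, {4}, {4}, "birch", "carrot"))
```

Note that the labels are not mutually exclusive: two proteins from the same organism that also share a motif would receive all three.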
To connect our three data types – ISAC data, allergen sequences and allergen motifs – a MySQL database was created (MySQL 5.0, obtained from http://www.mysql.com/). The allergen sequences to be used in the motif-discovery process were downloaded from Allergome (http://www.allergome.org as of December 1, 2010) and imported into the database. MEME 4.4.0 (5), cd-hit 3.1.1 (6) and pftools 2.3.4 (7) were used for the iterative allergen-motif discovery. Data were handled using scripts written in Perl 5.10.0 (http://www.perl.org/) and PHP 5.3+ (http://www.php.net/) languages. Statistical evaluations were performed with R 2.11.0 (8).
We used our published iterative motif discovery approach to identify allergen motifs (1). Three modifications were applied to our original approach: first, the 2708 protein sequences were preclustered by cd-hit into 1218 clusters with at least 90% sequence identity. Only the longest (‘representative’) sequence per cluster was subsequently submitted to the motif discovery process; second, MEME was allowed to select protein motifs with a variable length of 35–70 amino acid residues – in our earlier studies we used a fixed length of 50 amino acids; and third, each protein was allowed to contain more than one motif. This was achieved by scanning all proteins against all found motifs instead of assigning only the motif identified during the discovery process.
We had previously demonstrated that the majority of the available allergen sequences can be grouped according to allergen motifs (1). Since then, the number of allergen sequences has more than tripled. Here we have used the same iterative approach to determine the number of motifs. Fig. 1 shows that the number of motifs continued to increase over time, but the motif number seemed to plateau earlier despite the continuous increase of allergen sequences in the Allergome database. After July 2010, we modified motif detection parameters because motif quality and number were occasionally variable, as described later. Our modification explains an increase of four to five motifs (data not shown). Additionally, new motifs were found, which resulted in an increase of 13 motifs compared to the previous calculation. In this study, motifs were identified in the sequences available from Allergome as of December 2010. Within the 2708 allergen protein sequences, we identified 115 allergen motifs; 311 sequences did not contain a motif. Identified motifs are also displayed in each allergen monograph on Allergome.
Initially, we chose an arbitrary stretch of 50 amino acids for reasons discussed earlier (1). Here we used as a measure of motif quality the P-value describing the difference in co-reactivity, as determined by the commercial array system, between protein pairs carrying the same motif and pairs not carrying the same motif. The P-value data are shown in Fig. 4. The reason why we modified our previous standard motif length of 50 amino acids is illustrated in Fig. 3. Fig. 2 shows the results using different length approaches. A stretch of only 30–35 amino acid residues resulted in a low probability of co-reaction, as did stretches above 100 amino acids. Thus we confirmed the previously defined motif length, but 40 or 60 amino acids produced motifs of slightly higher quality. Motif lengths above 70 amino acids resulted in lower motif coverage; thus fewer allergens would match a motif, and the risk of generating multi-domain motifs increased. Therefore we modified motif-detection parameters and used a variable length between 35 and 70 amino acids, as represented by a horizontal line in the figure. The median of the variable length motifs used throughout our studies was 68.5 residues.
As shown in Fig. 3A, some motifs identified in earlier datasets showed a variable correlation coefficient, with one group of correlating (0.73–0.96) and a second group of noncorrelating (−0.04–0.14) protein pairs. Here, the correlation coefficient between two proteins was used to describe the correlation between the two proteins’ ISU values, that is, if one protein had a high ISU level in any given serum, the other also had a high ISU level, and vice versa. Each circle represents one protein pair, and all possible pairs are shown. Using newer datasets, the proteins originally matching motif 2, as shown in Fig. 3A, were separated into two motifs, now containing only highly correlating pairs (0.89–0.96), shown in Fig. 3B. This illustrates that more protein sequences do not necessarily lead to more motifs, but may yield motifs of higher quality.
Next we assessed the frequency of co-reaction between proteins defined by the same motif. The protein pairs were considered for evaluation only if both proteins were positive in at least 10 different patient sera, which was the case for 6272 pairs. Fig. 4A shows that among the proteins on the chip, 218 pairs each contained the same motif. Protein pairs with a common motif tested positive in the same serum significantly more often (median: 63.6%) than proteins without a common motif (median: 15.7%, P < 4.9153e-70, Mann–Whitney U-test). Data distribution was visualized using a box-plot: a gray box was drawn for results between the first and third quartile (0.25–0.75), thus representing 50% of all values, and an individual circle was drawn for all outliers (>1.5 interquartile ranges distance from the median).
We further analyzed whether specific IgE levels of two proteins also correlated if the proteins shared a motif. Fig. 4B shows that indeed the correlation coefficients of proteins with a common motif were significantly higher (median: 0.61) than the correlation of ISU levels for proteins without a common motif (median: 0.13, P < 5.0666e-74, Mann–Whitney U-test). Thus higher specific IgE levels also produced a stronger cross-reacting signal.
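The pairwise comparison can be illustrated with a minimal pure-Python sketch. Several details are assumptions, because the text does not spell them out: the co-reaction percentage is taken here as the share of sera positive for both proteins among sera positive for at least one of them, and the 10-sera filter is applied to each protein separately. The function names are hypothetical. Only the U statistic is computed; a P-value would normally come from a statistics package such as R, which the study used.

```python
def co_reaction_percent(pos_a, pos_b, min_positive=10):
    """Percentage of co-reacting sera for one protein pair.

    pos_a, pos_b: sets of serum IDs positive for protein A and protein B.
    Returns None if either protein has fewer than min_positive positive sera.
    Assumed denominator: sera positive for at least one of the two proteins.
    """
    if len(pos_a) < min_positive or len(pos_b) < min_positive:
        return None
    either = pos_a | pos_b
    both = pos_a & pos_b
    return 100.0 * len(both) / len(either)


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x versus sample y.

    Counts the pairs (xi, yj) with xi > yj; ties count as one half.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Toy values, not study data: pairs sharing a motif versus pairs that do not.
same_motif = [63.6, 70.0, 55.0]
no_motif = [15.7, 20.0, 10.0]
print(mann_whitney_u(same_motif, no_motif))  # U equals len(x) * len(y) when every x exceeds every y
```

The quadratic loop is adequate at this scale (218 pairs with a shared motif against roughly 6000 without).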
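As an illustration, the correlation between the ISU values of two proteins across sera could be computed as below. The text does not state which correlation coefficient was used; Pearson's r is assumed here, and the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences of ISU values.

    xs[i] and ys[i] are the values of protein A and protein B in serum i.
    Returns NaN if one of the sequences is constant.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)
```

A value near 1 means that a serum with a high ISU level for one protein tends to have a high level for the other, which is the pattern reported for proteins sharing a motif.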
The analysis so far has shown that proteins characterized by the same motif co-react at a statistically highly significant level. The crucial question, however, is how often such co-reactions, presumably cross-reactions, would occur within a given motif group. Fig. 5 addresses this question and depicts the results for a selection of motifs where we found three or more allergen proteins sharing the motif. The selection is further based on a minimum of 100 positive sera for at least one of the allergens reacting on the chip. Interestingly, we observed a hierarchy among the proteins within one motif group, namely each motif group has one representative that produces the maximum of positive reactions and cross-reacting proteins with decreasing frequencies of positivity. The gray shade shows the same value as depicted percent-wise, meaning the least frequent cross-reacting allergen within a motif group showed the most stringent cross-reaction toward allergens in the same group. Figures S1 and S2 provide a breakdown of Fig. 5 by serum origin.
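One way to tabulate such a hierarchy is sketched below. The layout is an assumption rather than the procedure behind Fig. 5: for each allergen in a motif group, the sketch counts its positive sera and reports the percentage of those sera that are also positive for each other member of the group. The function name and the toy data are hypothetical.

```python
def motif_group_hierarchy(positive):
    """Rank the allergens of one motif group by number of positive sera.

    positive: dict mapping allergen name -> set of serum IDs that tested positive.
    Returns a list of (allergen, n_positive, {other: percent}) tuples sorted by
    n_positive in descending order, where percent is the share of the
    allergen's positive sera that are also positive for `other`.
    """
    rows = []
    for allergen, sera in positive.items():
        shared = {}
        for other, other_sera in positive.items():
            if other == allergen:
                continue
            shared[other] = 100.0 * len(sera & other_sera) / len(sera) if sera else 0.0
        rows.append((allergen, len(sera), shared))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Toy data, not study data: a frequent and an infrequent member of one group.
toy = {"Bet v 1": set(range(1, 11)), "Dau c 1": {1, 2}}
for name, n_pos, shared in motif_group_hierarchy(toy):
    print(name, n_pos, shared)
```

In this toy example the infrequent allergen co-reacts with the frequent one in all of its positive sera, while the reverse percentage is low, which mirrors the direction of the hierarchy described above.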
To address the question whether a certain degree of co-reaction might be due to co-sensitization with allergenic organisms containing several motifs, we split the dataset into groups according to the origin of the patient sera (Table 1). Patient sera from the dataset ‘Austria,’ based on the serum collection farther north, were presumably from patients who had encountered birch pollen. These sera recognized Bet v 1 more frequently (22.3%) than sera contained in the dataset ‘Italy’ (9.1%). The surroundings of Rome can be considered as mostly free from birch trees and birch pollen (9, 10). Unexpectedly, patients from Italy seemed to have higher anti-Bet v 1 IgE levels than patients in the dataset ‘Austria’, despite the assumption that they were less frequently exposed to birch tree pollen. As there may be differences between laboratories performing the assay or between batches of the microarray chips delivered to different centers, we normalized the IgE values to exclude a laboratory bias. Still, patient sera from Italy showed higher normalized specific IgE values than those from Austria. Thus, cross-reactions may deliver a higher diagnostic IgE level than the putative sensitizing allergen.
| Dataset | Group | Description | Number of positive sera (%) | IgE level: mean ISU | IgE level: mean, normalized | IgE level: median ISU | IgE level: median, normalized |
|---|---|---|---|---|---|---|---|
| ‘Italy’ (total 2500 sera) | A | Positive against Bet v 1 (motif 4) | 228 (9.1) | 11.45 | 18.01 | 6.05 | 9.52 |
| ‘Italy’ | B | From A positive against other proteins of motif 4 | 215 (94.3) | 12.12 | 19.06 | 6.78 | 10.66 |
| ‘Italy’ | C | From A positive against Bet v 2 (motif 1) | 50 (21.9) | 15.8 | 24.85 | 9.74 | 15.32 |
| ‘Italy’ | D | From A positive against Bet v 4 (motif 15) | 14 (6.1) | 8.32 | 13.09 | 1.83 | 2.88 |
| ‘Italy’ | E | From A not in B, C or D | 11 (4.8) | 0.31 | 0.49 | 0.22 | 0.35 |
| ‘Austria’ (total 642 sera) | A | Positive against Bet v 1 (motif 4) | 143 (22.3) | 6.66 | 14.62 | 2.26 | 4.96 |
| ‘Austria’ | B | From A positive against other proteins of motif 4 | 117 (81.8) | 7.97 | 17.50 | 3.26 | 7.16 |
| ‘Austria’ | C | From A positive against Bet v 2 (motif 1) | 16 (11.2) | 18.55 | 40.73 | 10.56 | 23.18 |
| ‘Austria’ | D | From A positive against Bet v 4 (motif 15) | 9 (6.3) | 20.58 | 45.18 | 17.01 | 37.35 |
| ‘Austria’ | E | From A not in B, C or D | 25 (17.5) | 0.76 | 1.67 | 0.49 | 1.08 |
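The subgroups A to E in Table 1 follow from simple set operations on the sera that tested positive for each protein. The sketch below shows the logic only; the function name, the input layout and the toy data are hypothetical, and percentages for B to E are taken relative to group A, as the table rows suggest.

```python
def table1_groups(positive, total_sera, other_motif4):
    """Count the sera in groups A-E for one dataset.

    positive: dict mapping allergen name -> set of serum IDs that tested positive.
    total_sera: number of sera in the dataset.
    other_motif4: names of motif-4 proteins other than Bet v 1.
    Returns a dict mapping group letter -> (count, percent).
    """
    a = positive["Bet v 1"]
    # B: sera from A that are positive for at least one other motif-4 protein.
    b = {s for s in a if any(s in positive[p] for p in other_motif4)}
    c = a & positive["Bet v 2"]  # same organism, different motif (motif 1)
    d = a & positive["Bet v 4"]  # same organism, different motif (motif 15)
    e = a - (b | c | d)          # positive for Bet v 1 only

    def pct(part, whole_size):
        return round(100.0 * len(part) / whole_size, 1) if whole_size else 0.0

    return {
        "A": (len(a), pct(a, total_sera)),
        "B": (len(b), pct(b, len(a))),
        "C": (len(c), pct(c, len(a))),
        "D": (len(d), pct(d, len(a))),
        "E": (len(e), pct(e, len(a))),
    }


# Toy data, not study data.
toy = {"Bet v 1": {1, 2, 3, 4}, "Dau c 1": {1, 2, 9}, "Bet v 2": {2, 5}, "Bet v 4": set()}
print(table1_groups(toy, total_sera=10, other_motif4=["Dau c 1"]))
```

Groups B, C and D may overlap, which is why their percentages in Table 1 do not sum to 100.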
Complementary to these findings, patients from Austria were more often (49.8%) positive against the German cockroach allergen Bla g 4 than patients from the Italian dataset (3.2%). Bla g 4 is the only protein matched by motif 35.
Table 1 depicts another argument that co-sensitization is less important for in vitro diagnostics than cross-reaction. Namely, the two datasets show that allergens such as Bet v 2 (motif 1) produce cross-reactions and almost no co-sensitizations, as predicted.
Several bioinformatics approaches exist to determine allergenicity (11–19). Our initial algorithm was based on the assumption that similar sequences will lead to similar surface structures that may be recognized by a given antibody (1). Since then, we and others have been able to show that such an approach is a valid tool for allergenicity prediction (20, 21). Nevertheless, the validation of the computational approach with wet laboratory data was hampered for a long time, as most specific IgE determinations were based on allergen extracts (2). When comparing allergenicity prediction with wet laboratory data obtained by diagnostic procedures using allergen extracts, we could confirm the predicted motifs but were left with the question whether some of the observed cross-reactions were true immunological cross-reactions at the protein level or only reactions occurring with nonrelated proteins derived from the same organism within the allergen extract (2).
The situation changed with the availability of new in vitro diagnostic procedures such as allergen microchips (22). Now it became possible to extensively compare cross-reactions at the allergen protein level. Before doing so, we re-evaluated our algorithm for defining motifs on a large dataset obtained by such new microchips. As we show here, the number of motifs continuously increased in parallel with the number of new allergen sequences that were added to the Allergome database during the first years. Recently, a plateau of allergen motifs seemed to be reached. Our initial motif length of 50 amino acids was computationally verified; however, we now had diagnostic data at hand. Therefore we re-evaluated our initial approach and we could demonstrate that by using a variable length, a higher motif quality can be achieved, as shown in Fig. 2. As a consequence, the number of motifs rose to a higher level. Future determinations will show whether this slightly higher number of motifs actually represents a plateau again, meaning that most of the potential allergens are currently known because no further common structures in the form of motifs can be detected.
Using the dataset based on the allergen chip, we show here that cross-reactions were as frequent as previously reported when cross-reactions were determined by allergen extracts. However, there are still some cross-reactions within the group of allergens that have no common motif. On the one hand, this may be because we do not yet have sufficient allergen sequences to construct a motif for every known allergen. On the other hand, some observed co-reactions may not be true cross-reactions but may reflect increased IgE levels because a patient was sensitized by an organism containing several allergens. In other words, our new dataset allowed us to address the question of co-sensitization.
Fortunately, the dataset could be divided into two groups of sera depending on the laboratories that performed the microchip test. One set had been collected by the last author from the greater region of Rome and was termed ‘Italy’. The other set was provided by the microchip producer and contains mostly sera from within Austria. As expected, the number of positive sera against Bet v 1 (22.3%) is higher in the dataset ‘Austria’ than in the dataset ‘Italy’ (9.1%). This remains true if both datasets are normalized to avoid manufacturer or operator biases. Interestingly, the great number of patients in the vicinity of Rome positive for Bet v 1 cannot be attributed to co-sensitization, as birch pollen is not commonly found in this region. To further test the question of co-sensitization, we compared the reaction of patients positive against Bet v 1 to other allergens normally present in birch pollen, namely Bet v 2 and Bet v 4. Indeed, Bet v 2 (motif 1) and Bet v 4 (motif 15) were far less often positive in these patients than other motif-4-containing allergens, again excluding co-sensitization.
Most interestingly, our motif approach in conjunction with wet laboratory evaluation produced a hierarchy of allergens within motif groups. The greater the number of allergens within one group, the more clearly a hierarchy was apparent. At present it can only be speculated what such a hierarchy might mean for future specific immunotherapy. Many of the 2500 patients from the Italian dataset recognized Bet v 1 probably without being sensitized by this allergen. On the other hand, the allergen that was least frequently positive within a motif group seemed to be most stringently cross-reacting with other allergens higher up in the hierarchy. For example, in the motif 4 group we found 371 sera positive against Bet v 1, but only 27 against Dau c 1 (carrot). Thus testing positive against Dau c 1 happens infrequently; however, for a patient who does test positive, the chance of reacting against each other allergen in the motif group is higher than 63%. This raises the question whether the most frequently cross-reacting allergen or an allergen that most stringently cross-reacts should be used for specific immunotherapy.
An interesting observation is our finding that often, the least frequently cross-reacting allergen within one motif group is a food allergen. This favors speculations that oral tolerance might play a role in the development of cross-reactions.
In summary, our highly significant correlations of cross-reactions are a strong argument that the common protein structure of an allergen motif rather than the actual allergen should be considered for future IgE diagnostics or therapy.
We are grateful to VBC Genomics, Austria, for their dataset. We thank Michael Stadler for discussion and review of this manuscript. This work was supported by grant no 8803.1 from the Commission of Technology and Innovation, Switzerland, and by grants from the Italian Ministry of Health, Current research funding 2008–2009.
Pascal Pfiffner has performed all experiments except IgE data gathering, created the figures and tables and has written the manuscript. Beda Stadler was involved in the study design, experiment setup and result discussions. He has corrected and edited the manuscript. Chiara Rasi was responsible for the allergen sequence data and the integration of chip data with Allergome sequence data. Enrico Scala collected most patient serum data of the Italian dataset. Adriano Mari was involved in the study design and result discussions. He was responsible for the Allergome data integration and has written parts of the manuscript.
Conflicts of interest
The authors declare no conflicts of interest.