We investigated the accuracy of tests used to diagnose food allergy.
Skin prick tests (SPT), specific-IgE (sIgE), component-resolved diagnosis and the atopy patch test (APT) were compared with the reference standard of double-blind placebo-controlled food challenge. Seven databases were searched and international experts were contacted. Two reviewers independently identified studies, extracted data, and used QUADAS-2 to assess risk of bias. Where possible, meta-analysis was undertaken.
Twenty-four studies (2831 participants) were included. For cows’ milk allergy, the pooled sensitivities were 53% (95% CI 33–72), 88% (95% CI 76–94), and 87% (95% CI 75–94), and specificities were 88% (95% CI 76–95), 68% (95% CI 56–77), and 48% (95% CI 36–59) for APT, SPT, and sIgE, respectively. For egg, pooled sensitivities were 92% (95% CI 80–97) and 93% (95% CI 82–98), and specificities were 58% (95% CI 49–67) and 49% (95% CI 40–58) for SPT and sIgE, respectively. For wheat, pooled sensitivities were 73% (95% CI 56–85) and 83% (95% CI 69–92), and specificities were 73% (95% CI 48–89) and 43% (95% CI 20–69) for SPT and sIgE, respectively. For soy, pooled sensitivities were 55% (95% CI 33–75) and 83% (95% CI 64–93), and specificities were 68% (95% CI 52–80) and 38% (95% CI 24–54) for SPT and sIgE, respectively. For peanut, pooled sensitivities were 95% (95% CI 88–98) and 96% (95% CI 92–98), and specificities were 61% (95% CI 47–74) and 59% (95% CI 45–72) for SPT and sIgE, respectively.
The evidence base is limited and weak and is therefore difficult to interpret. Overall, SPT and sIgE appear sensitive although not specific for diagnosing IgE-mediated food allergy.
‘Food allergy’ refers to the subgroup of food hypersensitivity reactions in which immunologic mechanisms have been implicated, whether IgE-mediated and/or non-IgE-mediated. The first and most important step in the diagnosis of food allergy is a full dietary history, and this should be supplemented with a clinical examination. The double-blind, placebo-controlled food challenge (DBPCFC) is usually considered the ‘gold standard’ diagnostic test. DBPCFC is, however, time-consuming, resource-intensive and may induce anaphylaxis; hence, there is a need to find safer and cheaper alternatives.
The most common additional tests are the skin prick test (SPT), serum food-specific-IgE (specific-IgE) and, to a lesser extent, component-specific-IgE and atopy patch testing (APT). Specific-IgE and SPT indicate the presence of IgE sensitization to a specific food. Sensitization is, however, not always associated with a clinical reaction to that food. Non-IgE-mediated immunological reactions to food result from the activation of other immunologic pathways (e.g. T-cell mediated) and manifestations include atopic eczema/dermatitis, food protein-induced enterocolitis, or proctocolitis. APT may be positive in some of these non-IgE-mediated conditions.
The literature on diagnosis of food allergy currently lacks clear consensus regarding the accuracy and safety of different diagnostic approaches. The European Academy of Allergy and Clinical Immunology (EAACI) is developing the EAACI Guideline for Food Allergy and Anaphylaxis, and this systematic review is one of seven interlinked evidence syntheses that were undertaken to provide a state-of-the-art synopsis of the current evidence base, which will be used to inform the formulation of clinical recommendations. This systematic review assessed the diagnostic accuracy of tests aimed at supporting the clinical diagnosis of food allergy.
A protocol for the systematic review was developed prospectively and registered with the International Prospective Register of Systematic Reviews (PROSPERO) at http://www.crd.york.ac.uk/prospero/, registration number CRD42013003707.
Articles were retrieved using a highly sensitive search strategy implemented in the following databases: Cochrane Library including Cochrane Database of Systematic Reviews, Database of Reviews of Effectiveness (DARE), CENTRAL (Trials), Methods Studies, Health Technology Assessments (HTA), Economic Evaluations Database (EED), MEDLINE (OVID), Embase (OVID), CINAHL (Ebscohost), ISI Web of Science (Thomson Web of Knowledge), TRIP Database (web www.tripdatabase.com), and Clinicaltrials.gov (NIH web).
The search strategies were supplemented by contacting an international panel of experts for potential studies. There were no language restrictions, and where possible, non-English language papers were translated.
Prospective or retrospective, cross-sectional or case–control studies that evaluated APT, SPT, specific-IgEs, and component-specific-IgE in children or adults presenting with suspected food allergy caused by cow's milk, hen's egg, wheat, soy, peanut, tree nut, fish, or shellfish were included. The reference standard was DBPCFC used in at least 50% of the participants (Fig. 1). Studies in which participants were selected based on having a positive food allergy test result (index test or reference standard) or for which no 2 × 2 data could be extracted were excluded.
Two reviewers (SSP and KSW) independently checked titles and abstracts identified by the search, followed by review of the full text for assessment of eligibility. Both reviewers also extracted data using a customized form and assessed risk of bias using the QUADAS-2 tool. Any discrepancies were resolved by consensus and, where necessary, a senior reviewer (AS) was consulted. We collected study characteristics and recorded the number of true positives, true negatives, false positives, and false negatives to construct a 2 × 2 table for each study. Where 2 × 2 data were not reported, we derived them where possible from reported summary statistics such as sensitivity, specificity, and/or likelihood ratios.
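The 2 × 2 bookkeeping described above can be sketched as follows. This is a minimal illustration rather than the extraction procedure actually used in the review: the class `TwoByTwo`, the helper `from_summary`, and the example counts are hypothetical, and back-calculated counts are only as exact as the rounded figures they are derived from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index-test results cross-classified against the reference standard."""
    tp: int  # index test positive, food challenge positive
    fp: int  # index test positive, food challenge negative
    fn: int  # index test negative, food challenge positive
    tn: int  # index test negative, food challenge negative

    @property
    def sensitivity(self) -> float:
        # Proportion of challenge-positive participants with a positive index test.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of challenge-negative participants with a negative index test.
        return self.tn / (self.tn + self.fp)


def from_summary(sensitivity: float, specificity: float,
                 n_cases: int, n_non_cases: int) -> TwoByTwo:
    """Back-calculate approximate counts from reported summary statistics.

    Sensitivity and specificity are proportions in the range 0-1.
    """
    tp = round(sensitivity * n_cases)
    tn = round(specificity * n_non_cases)
    return TwoByTwo(tp=tp, fp=n_non_cases - tn, fn=n_cases - tp, tn=tn)


# Hypothetical study: 50 challenge-positive and 50 challenge-negative participants.
table = from_summary(sensitivity=0.90, specificity=0.60, n_cases=50, n_non_cases=50)
print(table)
```

Recomputing `table.sensitivity` and `table.specificity` from the derived counts and comparing them with the reported values is a simple consistency check on the back-calculation.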
For each test, diagnostic accuracy was assessed according to target food. Preliminary exploratory analyses were conducted for each test by plotting pairs of sensitivity and specificity from each study on forest plots and in receiver operating characteristic (ROC) space. Hierarchical summary ROC models [12, 13] were used to summarize the accuracy of each test and to compare the accuracy of two or more tests. Where studies used a common or similar cutoff, we used parameter estimates from the models to compute summary sensitivities and specificities with 95% confidence regions. Analyses were performed in Review Manager 5.2 (The Nordic Cochrane Centre, The Cochrane Collaboration, 2012), and SAS software (version 9.2; SAS Institute, Cary, NC, USA).
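For orientation, a hierarchical summary ROC model is commonly written in the form below (the Rutter and Gatsonis formulation). The text does not state the exact parameterization that was fitted, so the symbols here are assumptions for illustration only.

```latex
\operatorname{logit}(\pi_{ij}) = \left(\theta_i + \alpha_i X_{ij}\right)\exp\!\left(-\beta X_{ij}\right),
\qquad
\theta_i \sim N\!\left(\Theta, \sigma_\theta^{2}\right),
\quad
\alpha_i \sim N\!\left(\Lambda, \sigma_\alpha^{2}\right)
```

Here $\pi_{ij}$ is the probability of a positive index test in study $i$ and disease group $j$, with $X_{ij} = -\tfrac{1}{2}$ for participants without the allergy and $X_{ij} = +\tfrac{1}{2}$ for those with it. The study-level parameter $\theta_i$ represents the positivity threshold, $\alpha_i$ the accuracy (log diagnostic odds ratio), and $\beta$ the shape (asymmetry) of the summary curve.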
We identified 6260 studies (excluding duplicates) and 312 were eligible for full-text review. Twenty-four studies (33 references; [7, 45]) with a total of 2831 participants were included in the quantitative analyses. Figure S1 in the online supplement shows the PRISMA flowchart for the study screening and selection process.
Table 1 summarizes the characteristics and methodological quality of the 24 included studies. Of the 24 studies, 17 were conducted in Europe. Twenty-two studies were cohort studies and two were case–control studies. The majority (n = 21) included infants or children under 18 years of age. At the study entry, all participants in six studies had atopic eczema/dermatitis. Eight studies reported data on more than one target food. Most studies were judged to be at high or unclear risk of bias in all domains except flow and timing. Applicability concerns were judged as high mainly in the index test domain because in 18 studies, there was prior testing with SPT and/or specific-IgE when a diagnosis of food allergy was suspected. Further details are available in the online supplement.
Table 2 shows summary results for each target food where meta-analysis was possible.
|Test (cut-off)||Studies||Participants||Cases||Sensitivity % (95% CI)||Specificity % (95% CI)||Positive likelihood ratio (95% CI)||Negative likelihood ratio (95% CI)|
|Cow's milk: five prospective cohorts [8, 15, 16, 20, 44], two retrospective cohorts [37, 42], one retrospective case–control study|
|APT||3||495||254||52.8 (32.6, 72.1)||88.1 (75.5, 94.7)||4.43 (2.61, 7.51)||0.54 (0.37, 0.77)|
|SPT (≥3 mm)||5||587||284||87.9 (75.6, 94.4)||67.5 (56.0, 77.2)||2.70 (2.09, 3.50)||0.18 (0.10, 0.34)|
|Specific-IgE (mixed cutoffs)||6||831||390||87.3 (75.2, 93.9)||47.7 (36.4, 59.2)||1.67 (1.44, 1.93)||0.27 (0.16, 0.45)|
|Ratio||1.0 (0.93, 1.06), P = 0.9||0.71 (0.60, 0.83), P < 0.01|
|Hen's egg: three prospective cohorts [4, 8, 20], one retrospective cohort, one prospective case–control study, one retrospective case–control study|
|SPT (mixed cutoffs)||5||448||287||92.4 (79.9, 97.4)||58.1 (49.1, 66.6)||2.30 (1.77, 2.74)||0.13 (0.05, 0.36)|
|Specific-IgE (mixed cutoffs)||5||572||346||93.4 (82.1, 97.8)||49.2 (40.2, 58.1)||1.84 (1.52, 2.21)||0.13 (0.05, 0.38)|
|Ratio||1.01 (0.70, 0.96), P = 0.7||0.85 (0.68, 1.05), P = 0.1|
|Wheat: three prospective cohorts [8, 20, 41], two retrospective cohorts [36, 41], one retrospective case–control study|
|SPT (≥3 mm)||5||350||114||72.6 (55.7, 84.8)||73.3 (47.9, 89.1)||2.72 (1.32, 5.60)||0.37 (0.23, 0.60)|
|Specific-IgE (mixed cut-offs)||4||408||102||83.2 (69.0, 91.7)||42.7 (19.8, 69.1)||1.45 (0.95, 2.22)||0.39 (0.20, 0.77)|
|Ratio||1.15 (0.97, 1.36), P = 0.1||0.58 (0.40, 0.85), P < 0.01|
|Soy: two prospective cohorts [8, 20], one retrospective cohort, one retrospective case–control study|
|SPT (≥3 mm)||4||366||94||55.0 (33.2, 75.0)||68.0 (52.4, 80.3)||1.71 (1.29, 2.27)||0.66 (0.47, 0.94)|
|Specific-IgE (mixed cut-offs)||3||404||74||82.9 (63.8, 93.0)||38.0 (24.2, 54.0)||1.34 (1.13, 1.58)||0.45 (0.24, 0.83)|
|Ratio||1.51 (1.10, 2.07), P = 0.01||0.56 (0.43, 0.72), P < 0.01|
|Peanut: five prospective cohorts [19, 20, 23, 29, 44], one retrospective cohort, one retrospective case–control study|
|SPT (≥3 mm)||5||499||245||94.7 (87.9, 97.8)||61.0 (46.6, 73.6)||2.43 (1.69, 3.48)||0.09 (0.04, 0.21)|
|Specific-IgE (mixed cut-offs)||5||817||452||96.3 (91.6, 98.4)||59.3 (45.4, 72.0)||2.37 (1.69, 3.32)||0.06 (0.03, 0.15)|
|Ratio||1.02 (0.97, 1.06), P = 0.5||0.97 (0.84, 1.12), P = 0.7|
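As a reading aid for Table 2, the sketch below shows how likelihood ratios relate to a sensitivity and specificity pair, and how the 'Ratio' rows can be reproduced. The function names are hypothetical. Treating the 'Ratio' rows as specific-IgE divided by SPT is an assumption: it matches the tabulated point estimates but is not stated in the table. The tabulated likelihood ratios come from the fitted models, so recomputed values may differ slightly in the last digit.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for sensitivity and specificity given as proportions (0-1)."""
    lr_pos = sensitivity / (1.0 - specificity)   # true-positive rate / false-positive rate
    lr_neg = (1.0 - sensitivity) / specificity   # false-negative rate / true-negative rate
    return lr_pos, lr_neg


def relative_accuracy(test_value: float, comparator_value: float) -> float:
    """Ratio of two sensitivities (or two specificities)."""
    return test_value / comparator_value


# Cow's milk, SPT (>=3 mm): summary sensitivity 87.9%, specificity 67.5% (Table 2).
lr_pos, lr_neg = likelihood_ratios(0.879, 0.675)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")

# Soy: specific-IgE relative to SPT (assumed direction of the 'Ratio' row).
rel_sens = relative_accuracy(0.829, 0.550)
rel_spec = relative_accuracy(0.380, 0.680)
print(f"relative sensitivity = {rel_sens:.2f}, relative specificity = {rel_spec:.2f}")
```

A relative sensitivity above 1 with a relative specificity below 1, as for soy, means that the first test finds more true cases at the cost of more false positives.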
Figure S2 shows the pairs of sensitivity and specificity from each study, including the cutoffs used, for APT (three studies), SPT (six studies), and specific-IgE (six studies). The summary sensitivity and specificity of APT were 53% (95% CI 33–72) and 88% (76–95%). For SPT and specific-IgE, the summary sensitivities were 88% (76–94%) and 87% (75–94%), and specificities were 68% (56–77%) and 48% (36–59%), respectively. Although there was some between-study heterogeneity, the summary estimates suggest that specific-IgE detects on average the same number of cases per 100 people with cow's milk allergy as SPT, but gives on average 20 additional false-positive diagnoses for every 100 people without the allergy (P < 0.01).
Figure S3 shows the pairs of sensitivity and specificity from each study for APT (one study), SPT (five studies), and specific-IgE (five studies) at the different cutoffs reported. The sensitivity and specificity of APT in the single study were 41% (32–50%) and 88% (77–95%). For SPT and specific-IgE, the summary sensitivities were 92% (80–97%) and 93% (82–98%), and specificities were 58% (49–67%) and 49% (40–58%), respectively. No significant differences in sensitivity and/or specificity were observed when SPT was compared to specific-IgE (Table 2).
Figure S4 shows the pairs of sensitivity and specificity from each study for APT (one study), SPT (five studies), and specific-IgE (five studies) at the different cutoffs reported. The sensitivity of APT was 26% (16–40%) and specificity was 89% (82–94%) in the single study. For SPT and specific-IgE, the summary sensitivities were 73% (56–85%) and 83% (69–92%), and specificities were 73% (48–89%) and 43% (20–69%), respectively. There was a significant difference in specificity (P < 0.01) with SPT having a higher specificity than specific-IgE (Table 2). The results suggest that specific-IgE detects on average 11 more cases of every 100 people with wheat allergy than SPT, but gives on average 31 additional false-positive diagnoses for every 100 people without the allergy.
Figure S5 shows the sensitivities and specificities for studies that evaluated APT (one study), SPT (four studies), and specific-IgE (three studies) at the different cutoffs reported. The single study of APT reported a sensitivity of 24% (12–41%) and specificity of 86% (79–91%). For SPT and specific-IgE, the summary sensitivities were 55% (33–75%) and 83% (64–93%), and specificities were 68% (52–80%) and 38% (24–54%), respectively. Significant differences in sensitivity and specificity were observed with specific-IgE having a higher sensitivity than SPT (P = 0.01) but lower specificity (P < 0.01; Table 2). The summary estimates suggest that specific-IgE detects on average 28 more cases of every 100 people with soy allergy than SPT, but gives on average 30 additional false-positive diagnoses for every 100 people without the allergy.
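The 'per 100 people' statements for wheat and soy follow from differences between the summary estimates in Table 2. The sketch below reproduces that arithmetic; the function name is hypothetical, and it assumes simple rounding of the difference in percentages, which matches the quoted figures.

```python
from typing import Tuple


def extra_per_100(sens_test: float, spec_test: float,
                  sens_comparator: float, spec_comparator: float) -> Tuple[int, int]:
    """Compare a test with a comparator using percentages (0-100).

    Returns (additional cases detected per 100 people with the allergy,
             additional false positives per 100 people without the allergy).
    """
    extra_cases = round(sens_test - sens_comparator)
    # False-positive rate is 100 - specificity, so the difference in
    # false positives equals comparator specificity minus test specificity.
    extra_false_positives = round(spec_comparator - spec_test)
    return extra_cases, extra_false_positives


# Specific-IgE versus SPT, summary estimates from Table 2.
wheat = extra_per_100(sens_test=83.2, spec_test=42.7, sens_comparator=72.6, spec_comparator=73.3)
soy = extra_per_100(sens_test=82.9, spec_test=38.0, sens_comparator=55.0, spec_comparator=68.0)
print("wheat:", wheat)
print("soy:", soy)
```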
The individual study estimates of sensitivity and specificity of SPT (five studies) and specific-IgE (six studies) are shown in Figure S6 for the different cutoffs reported. The summary sensitivities of SPT and specific-IgE were very similar (Table 2) – 95% (88–98%) and 96% (92–98%), respectively – with no significant difference between them (P = 0.5). Similarly, there was no significant difference (P = 0.7) between the specificities of SPT (61% [47–74%]) and specific-IgE (59% [45–72%]).
Hazelnut was assessed in three prospective cohorts [26, 30, 44]. At the ≥3 mm cutoff, one study reported SPT sensitivities of 88% and 90% and specificities of 28% and 6% for hazelnut allergy using natural and commercial extracts, respectively (Figure S7). For specific-IgEs, sensitivities were 75–99% and specificities were 17–77%, depending on the cutoff.
One prospective cohort and one retrospective cohort showed sensitivities of 91% and 100%, but the same specificity of 57% for SPT at a cutoff of ≥3 mm (Figure S8). For specific-IgEs, sensitivities were 67–94%, and specificities were 65–88% at different cutoffs.
Shrimp allergy was evaluated in two prospective cohorts ([27, 36]; Figure S9). For SPT, sensitivities were 100% for both studies and the specificities were 32% and 50%. For specific-IgE, one study gave a sensitivity of 100% (80–100%) and specificity of 45% (23–68%) at a cutoff of >0.35 kU/l.
Single studies evaluated the accuracy of component-specific-IgEs for hen's egg, peanut, tree nuts, and shellfish.
One study including 68 children evaluated the accuracy of component-specific-IgEs (Gal d1, 2, 3, 5) in boiled and raw eggs. The study reported cutoffs varying from 0 to 0.41 kUA/l (ImmunoCAP, ISAC). The sensitivity estimates were 20–84% and specificities 84–100%.
Another study including 43 children evaluated the accuracy of component-specific-IgEs (Ara h2) in peanut allergy. The study reported a threshold of 16% for basophil allergen CD-sens (derived from the basophil allergen concentration). The sensitivity was 100% and the specificity was 77%.
One study including 26 children evaluated the accuracy of component-specific-IgEs (Cor a1, 2, 8; rCor a1, Pru p3, Bet v1) in hazelnut allergy. The study reported a cutoff of 0.35 kU/l (CAP FEIA system). The sensitivities were 25–100% and specificities 22–94%. An additional case–control study reported the percentage of people positive for rCor a1, 8 and rBet v1, 2 (component-specific-IgEs), but did not report enough information to calculate sensitivity or specificity.
One study including 37 adults evaluated the accuracy of component-specific-IgE (rPen a 1) for shrimp allergy. The estimated sensitivity was 100%, and the specificity was 80%.
Due to the limited number of studies available for each meta-analysis, we were unable to use meta-regression to explore potential sources of heterogeneity in test performance as planned.
We included 24 studies that evaluated the accuracy of APT, SPT, specific-IgEs, and component-specific-IgEs at different cutoffs.
Our systematic review suggests that SPT and specific-IgE have good sensitivity, but poor specificity with wide variation in estimates for each of the food allergies investigated. The limited evidence available for APT suggested poor sensitivity, but good specificity. The strength of evidence on the relative accuracy of SPT and specific-IgE was weak; we relied on indirect comparisons of the two tests which may be prone to bias due to differences in population characteristics and study design. Very few studies have compared the tests head-to-head in the same population, and direct or indirect comparisons of accuracy between the other tests were not possible.
Our inclusion criteria were similar to those used in a recent RAND report. The main differences, however, were that we limited the inclusion criteria to studies in which at least 50% of participants received a DBPCFC to minimize verification bias, and we did not exclude studies based on language of publication. We also contacted senior researchers in the field to locate additional studies for inclusion in the review.
The strengths of this review include the use of internationally recommended methods for study identification, methodological quality assessment, and meta-analysis. The main limitation was the poor reporting of primary accuracy studies. In particular, inclusion and exclusion criteria were not clearly defined, and there was a lack of information on test cutoffs and details of how the tests were applied. Regarding population, the index tests evaluated in the included studies were previously used to select participants in 75% of the studies included in the quantitative analyses. A third of the studies were performed in a specific population; in eight studies, all participants had atopic dermatitis, and in three, all participants had asthma. These population issues limit the generalizability of our findings. Furthermore, protocols for the index tests are likely to differ between countries, thus limiting applicability. Lastly, although DBPCFC is generally accepted as the reference standard for diagnosing food allergy, it is not widely used, and this accounted for the exclusion of 30% of the potentially relevant studies.
This review has identified relevant evidence for different tests available for a range of foods most commonly implicated in suspected food allergy and highlighted both the volume and strength of evidence available to guide clinical decision-making.
Direct comparisons are difficult because of the limited body of evidence in which these tests have been compared in the same population. That said, overall, this body of work indicates that SPT and specific-IgE (and probably also component-specific-IgE) offer high sensitivity in relation to a range of allergens implicated in immediate IgE-mediated food allergy. There was, however, greater variation in the specificity of these tests, with specific-IgE tending to a higher rate of false positives.
Local decisions about which tests to employ and the order in which these are undertaken need to be guided by the above considerations, the comparability of the populations being cared for to those enrolled in studies (i.e. mainly high-risk populations being seen in specialist care settings), and the relative availability, safety, and costs of tests.
Most of the evidence in this review was derived from small studies with a high or unclear risk of bias. Future studies should be prospective, with consecutive recruitment and adequate sample sizes, and should be representative of the population in which the tests will be used in practice. Head-to-head comparisons of specific-IgE, SPT, and component tests are needed to determine the relative accuracy of the tests. Test accuracy is only one aspect of the assessment of a test, and the balance between benefit and harm should also be assessed, ideally within a randomized controlled trial [51, 52].
Skin prick tests and specific-IgEs are sensitive, but not specific for diagnosis of food allergy, although test performance may differ between foods. However, the findings should be viewed with caution due to the limited evidence base and the paucity of good quality studies.
We would like to acknowledge the support of the EAACI and the EAACI Food Allergy and Anaphylaxis Guidelines Group in developing this systematic review. We would also like to thank the EAACI Executive Committee for their helpful comments and suggestions.
AS, AM, SSP, and GR conceived this review. It was undertaken by KS-W and YT, with the support of SSP. KS-W, YT, AS, and GR drafted the article, and all authors including TW, KH-S, SH, LP, RvR, and BV-B critically commented on drafts of the article.