Frequency of severe reactions following penicillin drug provocation tests: A Bayesian meta‐analysis

Abstract
Background: Patients with a penicillin allergy label tend to have worse clinical outcomes and increased healthcare use. Drug provocation tests (DPTs) are the gold standard in the diagnostic workup of penicillin allergy, but safety concerns may hinder their performance. We aimed to assess the frequency of severe reactions following a DPT in patients with reported allergy to penicillins or other β-lactams.
Methods: We performed a systematic review, searching MEDLINE, Scopus, and Web of Science. We included primary studies assessing participants with a penicillin allergy label who underwent a DPT. We performed a Bayesian meta-analysis to estimate the pooled frequency of severe reactions to penicillin DPTs. Sources of heterogeneity were explored by subgroup and metaregression analyses.
Results: We included 112 primary studies, with a total of 26,595 participants. The pooled frequency of severe reactions was estimated at 0.06% (95% credible interval [95% CrI] = 0.01%–0.13%; I² = 57.9%). Most severe reactions (80/93; 86.0%) consisted of anaphylaxis. Compared to studies where the index reaction was immediate, we observed a lower frequency of severe reactions for studies assessing non-immediate index reactions (OR = 0.05; 95% CrI = 0–0.31). Patients reporting anaphylaxis as their index reaction were found to be at increased risk of developing severe reactions (OR = 13.5; 95% CrI = 7.7–21.5; I² = 0.3%). Performance of direct DPTs in low-risk patients or testing with the suspected culprit drug were not associated with a clinically relevant increased risk of severe reactions.
Conclusions: In patients with a penicillin allergy label, severe reactions resulting from DPTs are rare. Therefore, except for patients with potentially life-threatening index reactions or patients with positive skin tests (who were mostly not assessed in this analysis), the safety of DPTs supports their performance in the diagnostic assessment of penicillin allergy.


| BACKGROUND
β-Lactam antibiotics constitute the preferred treatment for many infections, but they are not typically prescribed to patients who report a history of allergic reactions to this drug class. 1 In fact, penicillins are the drug class to which patients most frequently report being allergic: between 5% and 10% of individuals from the general population report having a penicillin allergy, and this frequency can reach up to 16% in hospitalized patients. 2-7 However, only a small fraction of these individuals (estimated at 2%-10% in the United States and 18%-30% in Europe) have a true allergy to β-lactams. 2,7,8 Patients mislabeled as having a penicillin allergy more frequently receive antibiotics with a broader spectrum, often with lower efficacy and increased side effects, leading to poorer clinical outcomes, longer hospitalizations, higher risk of drug-resistant and healthcare-associated infections, and increased healthcare costs. 1,2,9-11 As a result, evaluating and delabeling patients with penicillin allergy has both clinical and economic advantages. 12
The diagnostic workup of a suspected penicillin allergy comprises a sequence of steps, typically including a complete clinical history, followed by skin tests and potentially in vitro tests (e.g., specific IgE quantification). Ultimately, if negative results are obtained with those tests, a drug provocation test (DPT; i.e., "drug challenge"), consisting of the controlled administration of a drug under strict clinical supervision, is considered to establish or rule out the diagnosis of penicillin allergy. 7,13-16 In patients whose clinical history is poorly compatible with a true penicillin allergy, some experts advocate the performance of direct DPTs (i.e., DPTs without preceding in vivo or in vitro testing). 1 On the contrary, in patients with a history of potentially life-threatening index reactions (e.g., Stevens-Johnson syndrome [SJS]/toxic epidermal necrolysis [TEN], severe anaphylaxis, or some severe specific organ manifestations), DPTs are contraindicated. 16
While DPTs are the gold standard in the diagnosis of penicillin allergy, the possibility of precipitating severe hypersensitivity reactions may prompt safety concerns. 16 However, the frequency of such severe reactions has not been systematically evaluated. Therefore, in this systematic review and meta-analysis, we aimed to quantify the frequency of severe hypersensitivity reactions following a DPT in patients reporting a penicillin (or β-lactam) allergy, as well as to explore the impact of different patient and methodological characteristics on the frequency of such severe reactions.

This systematic review with meta-analysis follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and the recommendations of the Cochrane Handbook for Systematic Reviews. 17,18

| Eligibility criteria
We included original studies reporting the frequency of severe reactions subsequent to DPTs in patients reporting a penicillin or β-lactam allergy. Severe reactions were defined as episodes of anaphylaxis, shock, SJS/TEN, acute generalized exanthematous pustulosis, drug reaction with eosinophilia and systemic symptoms, acute interstitial nephritis, hemolytic anemia, serum sickness, drug fever, or other reactions described by the authors as severe and/or (if no additional information was provided) whose treatment required more than antihistamines or corticosteroids (e.g., epinephrine) to subside. Other positive reactions to DPTs were not considered severe, and were therefore not taken into account.
We excluded studies deliberately performing DPTs with drugs from another antibiotic class, assessing allergy to cephalosporins exclusively or patients with specific diseases or occupations (e.g., only patients with cancer), or adopting a case-control approach (as data from those studies do not permit calculation of the risk of severe reactions).

| Information sources and search methods
We searched three electronic bibliographic databases (MEDLINE, Web of Science, and Scopus) through June 2019. Search queries are detailed in Table A1. References of included studies and of other relevant studies were further reviewed. No restrictions on publication language or date were applied.

| Study selection and data collection process
After duplicate removal, each study was independently assessed by two reviewers (researchers B.S.P. and A.C.F.), first by title and abstract screening, and then by full-text reading. Data were independently extracted by two reviewers using a predefined online form purposely built for this study (a pilot version was built to assess the first 15 studies, and subsequently modified accordingly). For each study, we retrieved information on (i) the year of publication; (ii) country; (iii) participants' age group; (iv) setting (i.e., outpatients, inpatients or other); (v) timing of the index reaction (immediate reactions were defined as those occurring during the first hour after exposure to the culprit drug, and the remainder were classified as nonimmediate reactions 14,15); (vi) culprit drug class (i.e., whether studies included participants reporting an allergy to any β-lactam or specifically to penicillins); (vii) whether penicillin re-exposure occurred as part of a diagnostic workup or for therapeutic reasons; (viii) whether single-dose, graded or prolonged (>24 h) DPTs were performed; (ix) the route of drug administration; (x) whether DPTs were preceded by skin/in vitro tests or directly performed; (xi) the drugs tested; and (xii) the period during which patients were followed for adverse reactions.
In addition, for each primary study, we retrieved information on the number of participants undergoing a DPT, as well as on the number and type of subsequent severe reactions. Whenever provided, we separately retrieved these data for patients who reported immediate index reactions and for patients reporting anaphylaxis as their index reaction (we were not able to perform separate analyses for other types of index reaction, as the necessary information was not consistently provided in primary studies). Specific data regarding DPTs to penicillins were always preferred over data regarding DPTs to overall β-lactam antibiotics. Disagreements between reviewers in study selection or data extraction were resolved by consensus.
Full texts were carefully examined so as not to include the same results/patients more than once. Authors were contacted whenever full texts were not available (or, in two cases, were available only in a language in which the review authors were not fluent; two responses were received) or to provide relevant missing information.

| Quality assessment
The quality of primary studies was independently assessed by two researchers using an adaptation of a tool developed for prevalence studies. 19 Of the 11 items described, we used six items that were adequate for the aim of this study, namely: (i) if the study's target population was representative of the national population in relation to relevant variables; (ii) if the sample frame was representative of the target population; (iii) if some form of random or consecutive selection was used to select the sample; (iv) if the likelihood of nonresponse bias was minimal (defined as less than 25% follow-up losses and/or participants with negative skin/in vitro tests not undergoing DPT); (v) if an acceptable/sufficiently complete definition of "severe reaction" was used in the study (or if allergic reactions were described in detail); and (vi) if the same methods of assessment and data collection were used for all subjects.

| Quantitative synthesis of results
To quantitatively synthesize the frequency of severe reactions subsequent to DPTs, we performed Bayesian meta-analyses following a random-effects model based on a binomial likelihood (as described by Welton et al. 20). We opted for this approach because of the large number of studies in which no severe reactions were observed. In fact, one of the advantages of a Bayesian meta-analysis based on a binomial likelihood is its use of exact methods, which deal more adequately with proportions equal to zero (by contrast, a frequentist approach would require a continuity correction, at least for the proportions equal to zero). 20
Bayesian methods provide estimates of the posterior probability distributions of the parameters of interest, based on prior probability distributions and on the observed data. In this study, based on the frequencies of severe reactions reported in primary studies, we obtained, through meta-analytic weighting methods, a probability distribution of the frequency of severe reactions. In addition, we obtained probability distributions for the odds ratio (OR) assessing the association between reporting anaphylaxis as the index reaction and the occurrence of severe reactions following a DPT. From these posterior distributions, we collected the mean values and respective 95% credible intervals (95% CrI; the range of values within which, with 95% probability, the true frequency of severe reactions lies). 20
We assessed heterogeneity (defined as the existence of differences beyond those that would be expected from random sampling alone) by computing estimates of the I² statistic. An I² > 50% was considered indicative of substantial heterogeneity. Sources of heterogeneity were explored by means of metaregression and subgroup analyses (i.e., a specific type of sensitivity analysis, consisting of separate meta-analyses restricted to specific categories of retrieved variables). Exponentials of the metaregression coefficients were interpreted as ORs.
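To illustrate how a binomial likelihood handles zero-event studies without any continuity correction, the following is a minimal sketch in Python. It is deliberately simplified and is not the model used in this review: it assumes a single common proportion across studies (no random effects), a flat prior on that proportion, and a grid approximation instead of MCMC sampling. The data in `toy_studies` and the function names are hypothetical.

```python
import math

# Hypothetical toy data, NOT taken from the review:
# (number of severe reactions, number of patients tested) per study.
toy_studies = [(0, 120), (0, 85), (1, 400), (0, 60), (2, 950)]


def grid_posterior(studies, grid_size=20001):
    """Grid posterior for one common proportion p under a flat prior on p.

    Simplifying assumption: no between-study random effects.
    """
    events = sum(r for r, _ in studies)
    non_events = sum(n - r for r, n in studies)
    # Midpoints of equal-width bins, so p never equals exactly 0 or 1.
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    # Binomial log-likelihood; zero-event studies contribute only through
    # the (1 - p) term, with no correction added to their counts.
    log_post = [events * math.log(p) + non_events * math.log(1.0 - p) for p in grid]
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def summarize(grid, probs, level=0.95):
    """Posterior mean and equal-tailed credible interval from grid weights."""
    mean = sum(p * w for p, w in zip(grid, probs))
    tail = (1.0 - level) / 2.0
    lower = upper = None
    cumulative = 0.0
    for p, w in zip(grid, probs):
        cumulative += w
        if lower is None and cumulative >= tail:
            lower = p
        if cumulative >= 1.0 - tail:
            upper = p
            break
    return mean, lower, upper


grid, probs = grid_posterior(toy_studies)
mean, lower, upper = summarize(grid, probs)
print(f"posterior mean = {mean:.4%}, 95% CrI = {lower:.4%} to {upper:.4%}")
```

Under these assumptions the posterior is a Beta distribution, so the grid mean can be checked against the closed form (1 + events) / (2 + patients). A random-effects version would add a study-level effect on the logit scale, which is what makes sampling-based software such as JAGS convenient.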
For both the effect size measure and the τ parameter, we used uninformative prior distributions (dnorm(0, 0.00001) and dgamma(0.00001, 0.00001), respectively). For each analysis, we ran at least 40,000 iterations with a burn-in of 15,000 iterations. Meta-analysis was performed using the rjags package in R (version 3.5.0).
Sixteen studies found to be eligible were not included in this systematic review, since they evaluated patients partially or fully assessed in another study and were not restricted to any particular characteristic that would render them eligible for inclusion in subgroup analyses.
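The prior settings quoted in the quantitative synthesis, dnorm(0, 0.00001) and dgamma(0.00001, 0.00001), can be made more tangible with a short calculation. The sketch below assumes these are written in JAGS notation (as the use of rjags suggests), where the second argument of dnorm is a precision (1/variance) and dgamma takes a shape and a rate. Whether the gamma prior was placed on the between-study precision or on another form of τ is not stated in the text; the helper names are hypothetical.

```python
import math


def precision_to_sd(precision):
    """Convert a JAGS-style precision (1 / variance) to a standard deviation."""
    return 1.0 / math.sqrt(precision)


def gamma_mean_variance(shape, rate):
    """Mean and variance of a gamma distribution in shape/rate form."""
    return shape / rate, shape / rate ** 2


# Prior for the effect size: normal with mean 0 and precision 0.00001.
effect_prior_sd = precision_to_sd(0.00001)

# Prior for the tau parameter: gamma with shape 0.00001 and rate 0.00001.
tau_prior_mean, tau_prior_var = gamma_mean_variance(0.00001, 0.00001)

print(f"effect prior SD = {effect_prior_sd:.1f}")
print(f"tau prior mean = {tau_prior_mean:.1f}, variance = {tau_prior_var:.0f}")
```

A standard deviation of roughly 316 is extremely wide for an effect that is presumably expressed on the logit scale, which is why such a prior is described as uninformative.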

| Study characteristics
A summary of the included studies is presented in Table A2. Included studies were published between 1965 and 2019.

| Risk of bias of individual studies
A risk of bias graph is presented in Figure 2, and the complete analysis of the risk of bias of individual studies may be found in Table A4. Most studies had a high or unclear risk of bias in terms of sample

TABLE 2 Results of metaregression and subgroup analyses for the frequency of severe reactions following penicillin drug challenges

The region was also identified as a variable potentially explaining heterogeneity. In fact, in European studies, we observed a higher frequency of severe hypersensitivity reactions and lower heterogeneity when compared to their North American counterparts, and a more evident increased risk of such reactions among adults. These differences may point to regional differences in the type of assessed

Finally, information on index reactions was inconsistently reported across primary studies: except for anaphylaxis, we were not able to assess the risk of severe reactions associated with each type of index reaction.

The main strength of this study is its meta-analytical approach for the quantitative synthesis of rare events. The main advantage of Bayesian meta-analysis based on a binomial likelihood concerns its use of exact methods, allowing for dealing more adequately with zero-cells (in this case corresponding to the majority of included primary studies, in which no severe reactions to DPTs were observed). By contrast, classical frequentist meta-analytical methods would possibly result in an overestimation of the true frequency of such reactions. 133 In addition, for the Bayesian meta-analysis, we used noninformative priors, whose effect was further diluted by including a large number of primary studies, further decreasing the risk of priors dominating the results. 134 Another methodologic strength concerns the performance of metaregression and subgroup analyses, aiming to identify patient or clinical characteristics associated with differences in the outcomes. Finally, we performed a comprehensive search, encompassing three different electronic bibliographic databases and not using exclusion criteria based on the date or language of publication.
In conclusion, and from a clinical point of view, this study suggests that overall, severe reactions are rare, occurring at an average of one reaction for each 1700 patients undergoing a DPT.
In addition, the included primary studies did not report any fatal reactions; indeed, in a comprehensive search of the literature beyond the eligibility criteria of this systematic review, we found only one death described after a DPT to a penicillin, although this case was potentially attributable to resensitization to clavulanate. 135 However, our results also point that the risk of a severe reaction may also identify those low-risk patients who may undergo a direct DPT, in whom our results did not identify an increased risk of severe reactions. 136,137 Testing the suspected culprit drug was not associated with a clearly increased risk of severe reactions, and therefore should be encouraged. 138
To the best of our knowledge, this is the first systematic review with meta-analysis assessing the frequency of severe reactions following DPTs. Future primary studies may allow for a more thorough exploration of this issue, by providing more details on their methodology (particularly regarding eligibility criteria and DPT procedures) or even anonymized individual participant-level data (Table 4). The results of this study support, from a safety point of view, the performance of DPTs during the diagnostic workup of penicillin allergy, particularly if a detailed allergy history has been obtained, evidence-based recommendations are followed, and there is appropriate supervision by an allergy specialist. This is particularly important, since delabeling patients reporting a penicillin allergy has been recommended as an antibiotic stewardship tool, to contribute to a more adequate prescription of antibiotics, minimizing patients' risks and improving clinical and economic outcomes.
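The figure of one severe reaction per 1700 patients follows from inverting the pooled frequency reported in the abstract (0.06%; 95% CrI 0.01%–0.13%). As a small worked check, the sketch below performs that conversion; the function name is hypothetical, and applying the same inversion to the bounds of the credible interval is an illustrative extension rather than a result stated in the text.

```python
def patients_per_reaction(frequency_percent):
    """Number of tested patients per one expected severe reaction."""
    return 100.0 / frequency_percent


point = patients_per_reaction(0.06)   # pooled frequency
fewest = patients_per_reaction(0.13)  # upper bound of the 95% CrI
most = patients_per_reaction(0.01)    # lower bound of the 95% CrI

print(f"about 1 in {round(point, -2):.0f} patients "
      f"(from 1 in {fewest:.0f} to 1 in {most:.0f})")
```

Rounding the point estimate to the nearest hundred gives the value of 1700 quoted in the conclusion.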