Allergen immunotherapy for respiratory allergy: Quality appraisal of observational comparative effectiveness studies using the REal Life Evidence AssessmeNt Tool. An EAACI methodology committee analysis

Abstract
Background Observational comparative effectiveness studies in allergen immunotherapy (AIT) are an important source of evidence for research questions that are challenging to address with randomized controlled trials (RCTs), such as the long‐term benefits of AIT, its effects on asthma prevention, and the onset of new allergen sensitizations. However, observational studies are prone to several sources of bias, which limit their reliability. The REal Life Evidence AssessmeNt Tool (RELEVANT) was recently developed to assist in the quality appraisal of observational comparative research and to enable identification of useful nonrandomized studies to be considered within guideline development.
Objective To systematically appraise the quality of published observational comparative AIT studies using RELEVANT.
Methods Observational studies comparing AIT to pharmacotherapy for respiratory allergies, with reduction of symptoms and/or medication use as outcome measures, were retrieved by computerized bibliographic searches. According to RELEVANT, a failure to meet any one of the primary items (background, design, measures, analysis, results, discussion/interpretation, and conflict of interest) represents a critical flaw that significantly undermines the validity of the study results.
Results The 14 studies identified supported the benefit of AIT in real life, which persists after treatment discontinuation. However, none of them met all 7 primary RELEVANT criteria. The main defects were reported in the design (28.6% of studies), measures and analysis (64.3% of studies), and results (78.6% of studies) items, due to selection bias and a lack of methods for adjusting controls. Half of the studies did not report on conflict of interest.
Conclusion There is a need for more robust observational research in AIT. RELEVANT appears to be an easy‐to‐use and sensitive tool for quality appraisal in AIT studies.


| INTRODUCTION
Allergen immunotherapy (AIT), administered by both the subcutaneous (SCIT) and sublingual (SLIT) routes, effectively treats allergic rhino-conjunctivitis and asthma. [1][2][3][4][5][6] AIT's benefit in reducing both the symptoms and the use of rescue medications lasts beyond the duration of the treatment, and AIT is therefore thought to be disease modifying. [7][8][9][10] Current guidelines recommend at least 3 years of therapy to obtain a sustained clinical benefit after AIT completion. 7 This recommendation is mainly based on evidence from observational studies, since data from SLIT randomized controlled trials (RCTs) are limited. 8,11 However, evidence from observational studies is often ranked below that from RCTs in traditional evidence hierarchies, as such studies are prone to several sources of bias and most do not account for these. 12,13 Therefore, it is often difficult to assess whether AIT observational studies are of sufficient quality to be considered within the context of clinical guidelines, despite their importance in providing fundamental complementary information, such as treatment persistence, adherence, and long-term benefit, that cannot be obtained, or is very challenging to obtain, from traditional RCT designs.
Recently, the Respiratory Effectiveness Group (REG) and European Academy of Allergy and Clinical Immunology (EAACI) joint Task Force developed the REal Life EVidence AssessmeNt Tool (RELEVANT) to assist in the quality appraisal of observational comparative effectiveness research. 14,15 The tool, like the Cochrane Collaboration's Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I), 13 is designed to identify evidence that is robust enough (i.e., at low risk of bias) to inform clinical practice and to warrant consideration by guideline bodies.
Although RELEVANT was developed and has been validated for studies of asthma, it could, theoretically, also be applicable to general quality appraisal of observational comparative studies across other medical specialties.
The principal aim of this study was to systematically review observational studies on the effectiveness of AIT in treating respiratory allergy as compared to standard therapy. The studies were appraised using RELEVANT in order to identify evidence of sufficient quality to be integrated with the findings from RCTs, and thus to provide a more complete picture on which to base clinical recommendations.

| METHODS
This study is an EAACI position paper related to ROC, which commissioned the analysis, and its activities.

| Data sources and searches
The primary sources of the reviewed studies were Medline, the Web of Science, and LILACS (inception to April 30, 2020) using a specific search strategy with the following medical subject head-

| Study selection
We required that studies: (i) were prospective or retrospective observational studies comparing subjects treated with AIT to subjects treated with standard pharmacotherapy who did not receive AIT; (ii) included monosensitized or polysensitized patients with allergic rhinitis/rhino-conjunctivitis/rhino-sinusitis and/or asthma with positive allergen-specific skin prick tests and/or elevated serum allergen-specific IgE; and (iii) reported symptoms and/or medication use assessed by any measurement tool (e.g., symptom score, medication score, visual analogue score) as an outcome measure of the treatment effect. Studies were excluded if they did not meet these criteria for study design, population, intervention, or outcomes of interest.

| Data extraction and risk of bias assessment
Two reviewers (DDB, GP) independently extracted the study data. The accuracy of data extraction was confirmed by a third reviewer (EH). Disagreements were resolved by consensus adjudication. We used the REal Life EVidence AssessmeNt Tool (RELEVANT) to evaluate the quality standards in the selected observational comparative effectiveness studies. 14,15

| RELEVANT-based quality assessment
The analysis started from the assessment of study quality based on the seven primary items (11 sub-items) which are critical for enabling   were also analyzed in some studies. 20,24 One study assessed the inhaled corticosteroid-sparing effect of AIT in patients with asthma (considered as asthma medication score) as the measure of efficacy. 29 There was no failure in the fulfilment of this item (Figure 1).

2. Design. This item was fulfilled in nine studies. 16,17,[22][23][24][26][27][28][29] The "Population was defined" in all studies (sub-item 2.1). Regarding the comparison groups (sub-item 2.2), five studies did not clarify the criterion used to allocate patients to each group. [18][19][20][21]25 The other studies declared that the allocation was related to the patient's or the parent's choice or was based on clinical reasons, according to guidelines (patients with a severe disease, with chronic exposure to allergens, such as animal dander, or who were willing to reduce drug use).
No study used a historical control.
3. Measures. One study failed to clearly define which AIT vaccine or vaccines were used in the treatment group (exposure: sub-item 3.1). 29 In particular, the authors did not report which route of AIT they used (SCIT or SLIT), the AIT allergen extract (e.g., pollens or house dust mites), the vaccine formulation (e.g., native-conjugated or allergoid), or the manufacturer. 29 Eight studies did not report which was the primary outcome In all studies, except one, 21 groups were compared at baseline (sub-item 4.2). However, the number and importance of characteristics considered for the comparison varied greatly across the studies. studies. 19,25,29 In particular, the importance of possible confounders was not considered.
6. Discussion/interpretation. Generally, the results of all the observational studies analyzed confirmed findings from previous RCTs that AIT is effective in pragmatic ("real-life") settings (sub-item 6.1). However, additional information relative to RCTs was making causal inferences from observational studies, as long as the consistency is not produced by a pervasive systematic confounder, such as a selection bias, or by a set of systematic biases that together produce a consistent bias in the same direction across studies.
Unfortunately, none of the 14 AIT comparative effectiveness studies included in this analysis met all 11 RELEVANT primary sub-items, and they are therefore deemed of insufficient quality to robustly inform guideline development.
As expected, selection bias was the most important limitation to internal validity of the studies, hindering the ability to make valid causal inferences for AIT effectiveness (Figure 2; domains 2 and 4).
About 80% of the studies insufficiently recognized and controlled pre-existing characteristics of the groups being compared, which could potentially lead to distinct prognoses (domains 2, 4 and 5).
Most studies did not report sufficient details on baseline population characteristics. Only a couple of studies tried to control for this problem, making groups more comparable by matching on other baseline characteristics, such as patients' sensitization status (mono- or poly-sensitization), allergy duration, comorbidities, disease severity at baseline, and persistent or seasonal disease. Adjusting controls by these characteristics might have mitigated the role of confounders. For example, patients with more severe presenting symptoms may be more likely to be selected for an intervention (i.e., confounding by indication), so it is important to match patients and controls for disease severity. Thus, when the choice was based on the patient's preference, we assume that patients who refused AIT were comparable to those who accepted it in terms of disease severity. However, when the choice of AIT was based on recommendations from the guidelines (more severe disease, side effects with standard therapy, chronic allergen exposure, e.g. animal dander, or willingness to reduce drug use by the patients), it is more likely that there are greater differences between patients and controls. 7,37,38 The attention paid to selection bias in RELEVANT is highlighted by the impact of confounders on three different primary items: Design (primary item #2), Analysis (primary item #4) and Results (primary item #5). This caused some uncertainty when we rated the studies, because neglecting to account for confounders duplicates the negative rating. For example, if confounders are not taken into account in the study design (matching treatment and controls) or adjusted for by statistical analysis (sub-item 4.1), they will consequently not be clearly presented in the results section (item 5) either, thus leading to a negative rating for two or three items involving confounders.
However, considering that a failure even in a single item is considered a fatal flaw by RELEVANT, no difference in the final judgment may arise due to a doubtful interpretation of this part of the tool.
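The point above can be illustrated with a minimal sketch of the "any failed primary item is a critical flaw" rule. The item names follow the seven primary items listed in the abstract; the function names, the example ratings, and the choice to treat an unrated item as not met are assumptions made for illustration only and are not part of RELEVANT itself.

```python
# Illustrative sketch only: not an official RELEVANT implementation.
# The seven primary items as named in this paper.
PRIMARY_ITEMS = (
    "background",
    "design",
    "measures",
    "analysis",
    "results",
    "discussion_interpretation",
    "conflict_of_interest",
)


def failed_items(ratings):
    """Return the primary items rated as not met.

    Assumption: an item with no rating is treated as not met.
    """
    return [item for item in PRIMARY_ITEMS if not ratings.get(item, False)]


def has_critical_flaw(ratings):
    """A study is critically flawed if at least one primary item failed."""
    return len(failed_items(ratings)) > 0


# Hypothetical study that ignored confounders, rated two ways:
# "strict" counts the omission under design, analysis and results;
# "lenient" counts it under analysis only.
strict = dict.fromkeys(PRIMARY_ITEMS, True)
strict.update(design=False, analysis=False, results=False)

lenient = dict.fromkeys(PRIMARY_ITEMS, True)
lenient.update(analysis=False)

print(failed_items(strict), has_critical_flaw(strict))
print(failed_items(lenient), has_critical_flaw(lenient))
```

Under either rating the hypothetical study is flagged as critically flawed, so the number of items under which the same confounding problem is counted does not alter the overall verdict.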
Attrition bias may also have affected the results. This was not accounted for in most studies. In particular, it was completely ignored by the authors of the largest study, who did not report the rate of, or reasons for, patients lost to follow-up, 28 which could have changed the characteristics of the groups irrespective of the exposure or intervention. Methods to address missing data were also absent.
Another defect observed in nine studies was the lack of definition of the exposure (sub-item #3.1), 29 or the primary outcome (sub-item #3.2). Furthermore, some studies used newly created tools to assess the outcome, or ad hoc modifications of an existing measurement instrument, tool, or scale, without any supporting evidence of validity and reliability. This generated unreliable conclusions and interpretation problems regarding the extent of the difference in treatment effectiveness across different outcomes.
Regarding conflict of interest (COI), despite a general consensus favoring disclosure, a disclosure statement was present in only seven studies, four of which declared no COI. This was independent of publication year.
Although disclosure only reveals the possibility of bias, without any guidance to resolve it, 39

| Strengths and limitations
This is the first time that a tool specifically designed for the appraisal of observational asthma research has been used in AIT research.
Notably, the proportion of failed RELEVANT items reported in our analysis is comparable with that observed in asthma studies, 15 which showed no failure in item #1 (Background) and only 5% of studies with a failure in item #6 (Discussion and Interpretation).
This consistency may suggest that the tool is suitable for use in fields other than asthma, being sensitive to the main limitations of real-life studies.
This analysis has some limitations. Regarding the specific search strategy, we encountered some difficulties since the outcome was not clear in the title and in the abstract of the retrieved articles.
Furthermore, the definition of AIT changed over the years, and we used different search terms to retrieve as many studies as possible.
Despite the possibility that some studies were overlooked, considering the general results, we are confident that they are not likely to have substantially changed the findings of this analysis.
Some uncertainty in the interpretation of specific RELEVANT items emerged during the review process. This was probably due to the absence of a user guide, such as those available for other tools (e.g., GRADE or ROBINS-I), which would have made the tool simpler to use and reduced potential inter-rater variability. 12,13 Finally, it appears that RELEVANT assessments are influenced by the quality of reporting of research as much as by the inherent quality of the study itself, as acknowledged by the EAACI-REG Task Force members. This may result in an underestimation of the quality of the study analyzed. Therefore, a comparison with different established tools such as ROBINS-I, which separates risk of bias from methodological quality and reporting quality, and GRADE, which systematically evaluates the quality of an entire body of evidence, is necessary in order to inform how best to determine the strength of recommendations on AIT. 12,13
In conclusion, this analysis based on RELEVANT allowed us to identify the main defects of the comparative effectiveness research on AIT available to date. Based on the results of this analysis, we found a general lack of high-quality real-life effectiveness observational research.
As a consequence, we recommend that future studies pay close attention to methods for adjusting for confounders, clearly define primary outcomes, population and comparison groups, and state potential COI. This will help in providing reliable information that can hardly be obtained from RCTs, such as the duration of benefit after AIT discontinuation, treatment persistence, and adherence. In light of this, establishing AIT registries, with the aim of collecting data in a cohesive way using standardized protocols, will provide an essential source of real-world evidence (RWE) to promote evidence-based research and quality improvement in study design and clinical decision-making. 40