The GRADE approach to grading the quality of evidence and strength of recommendations provides a comprehensive and transparent framework for developing clinical recommendations about the use of diagnostic tests or diagnostic strategies. Although grading recommendations about tests shares the logic of grading recommendations for treatment, it presents unique challenges. Guideline panels and clinicians should be alert to these challenges when using evidence about test accuracy as the basis for clinical decisions. In the GRADE system, valid diagnostic accuracy studies can provide high-quality evidence of test accuracy. However, such studies often provide only low-quality evidence for recommendations about diagnostic testing, because test accuracy is at best a surrogate for patient-important outcomes. Inferring from accuracy data that using a test improves patient-important outcomes requires one of the following: the availability of an effective treatment, an improvement in patients’ wellbeing through prognostic information, or, when an ominous diagnosis is excluded, a reduction in anxiety and the opportunity to search earlier for an alternative diagnosis for which beneficial treatment may be available. Assessing the directness of evidence supporting the use of a diagnostic test requires judgments about the relationship between test results and patient-important consequences. Well-designed and well-conducted studies of allergy tests, in parallel with efforts to critically evaluate allergy treatments, will encourage improved guideline development for allergic diseases.
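As a rough illustration of why accuracy is only a surrogate, the step from sensitivity and specificity to patient-important consequences can be sketched by counting how many patients out of 1000 tested would fall into each result category. The function name `per_1000` and the input values (sensitivity 0.90, specificity 0.80, prevalence 0.20) are hypothetical and chosen only for the arithmetic; they do not come from any study of allergy tests.

```python
def per_1000(sensitivity, specificity, prevalence, n=1000):
    """Split n tested patients into true/false positives and negatives.

    Illustrative helper (hypothetical name). All inputs are proportions
    between 0 and 1; counts are rounded to whole patients.
    """
    diseased = prevalence * n          # patients who truly have the condition
    healthy = n - diseased             # patients who do not
    tp = sensitivity * diseased        # correctly identified as having it
    fn = diseased - tp                 # missed cases
    tn = specificity * healthy         # correctly reassured
    fp = healthy - tn                  # wrongly labelled as having it
    return {"TP": round(tp), "FN": round(fn), "TN": round(tn), "FP": round(fp)}


# Hypothetical test characteristics, for illustration only.
counts = per_1000(sensitivity=0.90, specificity=0.80, prevalence=0.20)
print(counts)
```

Each count still has to be linked to a consequence before a recommendation can be graded: true positives benefit only if effective treatment exists, false positives may receive unnecessary treatment or avoidance advice, and false negatives may have diagnosis delayed. Those links are the judgments about directness that accuracy data alone cannot supply.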