Laryngoscopy is the principal tool for the clinical assessment of vocal fold paralysis (VFP). Yet no consistent, unified vocabulary exists for describing laryngoscopic findings, which compromises the evaluation and comparison of cases, outcomes, and treatments. The goal of this investigation was to evaluate laryngoscopic findings in VFP for inter- and intra-rater consistency.
Prospective survey-based study.
Half-minute excerpts from stroboscopic exams of 22 patients with VFP were mailed to 22 fellowship-trained laryngologists. Each reviewer received the exams in randomized order, with three random repeats included to determine intra-rater reliability. Twelve laryngoscopic criteria were assessed and recorded on preprinted sheets. Eleven criteria were binary (yes/no); glottic insufficiency was rated on a four-point scale (none/mild/moderate/severe). Raters were blinded to clinical history, to each other's ratings, and to their own previous ratings. Inter-rater agreement was calculated using Fleiss' kappa.
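For readers unfamiliar with the statistic, the following is a minimal sketch of how Fleiss' kappa can be computed for a single criterion, together with a simple percent-agreement measure of the kind that could be applied to repeated exams. It is not the authors' analysis code: the function names `fleiss_kappa` and `percent_agreement` and the toy ratings are hypothetical and are not study data. The abstract also does not state how intra-rater reliability was computed, so percent agreement on the repeats is an assumption here.

```python
import math


def fleiss_kappa(counts):
    """Fleiss' kappa for one criterion.

    counts[i][j] is the number of raters who placed exam i in category j.
    Every exam must be rated by the same number of raters (at least two).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each exam needs the same number (>= 2) of ratings")

    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # Share of all ratings that fall in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement per exam, then averaged over exams.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if math.isclose(p_e, 1.0):
        return float("nan")  # undefined when every rating is in one category
    return (p_bar - p_e) / (1 - p_e)


def percent_agreement(first, repeat):
    """Percentage of repeated ratings that match the rater's first rating."""
    if len(first) != len(repeat) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(first, repeat))
    return 100.0 * matches / len(first)


# Hypothetical toy data: 4 exams, 5 raters, one binary criterion (yes, no).
toy_counts = [[5, 0], [4, 1], [1, 4], [0, 5]]
print(round(fleiss_kappa(toy_counts), 2))
print(percent_agreement(["yes", "no", "yes"], ["yes", "no", "no"]))
```

In a design like the one described, each binary criterion would give a count table with one row per exam and two columns, while the four-point glottic insufficiency scale would give four columns. Note that Fleiss' kappa treats categories as unordered, so it does not give partial credit for adjacent ratings such as mild versus moderate.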
Twenty reviewers (91%) replied. Intra-rater reliability by reviewer ranged from 66% to 100% and by laryngoscopic criterion from 77% to 100%. Of the laryngoscopic criteria used, glottic insufficiency (κ = 0.55), vocal fold bowing (κ = 0.49), and salivary pooling (κ = 0.45) showed moderate agreement between reviewers. Arytenoid stability (κ = 0.1), arytenoid position (κ = 0.12), and vocal fold height mismatch (κ = 0.12) showed poor agreement. The remainder showed slight to fair agreement.
Inter-rater agreement on commonly used laryngoscopic criteria is generally fair to poor. Glottic insufficiency, vocal fold bowing, and salivary pooling demonstrated the most agreement among responding laryngologists. These findings suggest a need for a standardized descriptive scheme for laryngoscopic findings in VFP. Laryngoscope, 2010