Natural Language Processing to identify pneumonia from radiology reports

Authors


  • This work was presented as an oral presentation at the HMO Research Network Annual Conference in Boston, Massachusetts on 24 March 2011, where it received an Early Career Investigator Award.

Correspondence to: S. Dublin, Group Health Research Institute, 1730 Minor Avenue, Suite 1600, Seattle, WA 98101-1448, USA. E-mail: dublin.s@ghc.org

ABSTRACT

Purpose

This study aimed to develop Natural Language Processing (NLP) approaches to supplement manual outcome validation, specifically to validate pneumonia cases from chest radiograph reports.

Methods

We trained one NLP system, ONYX, using radiograph reports from children and adults that had previously been manually reviewed. We then assessed its validity on a test set of 5000 reports. Because we aimed to substantially decrease manual review rather than replace it entirely, we classified each report as (1) consistent with pneumonia, (2) inconsistent with pneumonia, or (3) requiring manual review because of complex features. We developed processes tailored either to optimize accuracy or to minimize manual review. Using logistic regression, we jointly modeled the sensitivity and specificity of ONYX in relation to patient age, comorbidity, and care setting. We estimated positive and negative predictive values (PPV and NPV) assuming the pneumonia prevalence observed in the source data.
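The last step above, estimating PPV and NPV from sensitivity, specificity, and an assumed prevalence, follows from Bayes' rule. A minimal sketch in Python is given here for illustration; the function name `predictive_values` is hypothetical, and the prevalence of 0.30 in the usage line is an assumed value, since the abstract does not report the prevalence that was used.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence.

    All inputs are proportions in [0, 1]. This is the standard Bayes' rule
    calculation, not necessarily the exact procedure used in the study.
    """
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be proportions between 0 and 1")

    # Expected proportion of the population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    # Guard against an empty row (no positive or no negative calls at all).
    pos_calls = true_pos + false_pos
    neg_calls = true_neg + false_neg
    ppv = true_pos / pos_calls if pos_calls > 0 else float("nan")
    npv = true_neg / neg_calls if neg_calls > 0 else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Assumed prevalence of 0.30, chosen only for illustration.
    ppv, npv = predictive_values(sensitivity=0.92, specificity=0.87, prevalence=0.30)
    print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Because PPV and NPV depend on prevalence, the same sensitivity and specificity yield different predictive values in populations where pneumonia is more or less common.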

Results

Tailored for accuracy, ONYX identified 25% of reports as requiring manual review (34% of true pneumonias and 18% of non-pneumonias). For the remainder, ONYX's sensitivity was 92% (95% CI 90–93%), specificity 87% (86–88%), PPV 74% (72–76%), and NPV 96% (96–97%). Tailored to minimize manual review, ONYX classified 12% as needing manual review. For the remainder, ONYX had sensitivity 75% (72–77%), specificity 95% (94–96%), PPV 86% (83–88%), and NPV 91% (90–91%).
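The results above are reported in two parts: the fraction of reports routed to manual review, and accuracy "for the remainder" that the system classified on its own. The following Python sketch shows one way such figures could be computed from paired NLP labels and reference-standard labels. The label constants, the function name `triage_metrics`, and the toy records are all hypothetical and are not taken from the study.

```python
# Hypothetical label names for the three-way classification.
PNEUMONIA = "pneumonia"
NO_PNEUMONIA = "no_pneumonia"
MANUAL = "manual_review"


def triage_metrics(records):
    """Summarize a three-way classifier against a reference standard.

    records: iterable of (nlp_label, has_pneumonia) pairs, where nlp_label
    is one of the three constants above and has_pneumonia is a bool.
    Sensitivity and specificity are computed only over reports that were
    not routed to manual review.
    """
    records = list(records)
    if not records:
        raise ValueError("no records supplied")
    for label, _ in records:
        if label not in (PNEUMONIA, NO_PNEUMONIA, MANUAL):
            raise ValueError(f"unknown label: {label!r}")

    # Reports the system classified without requesting manual review.
    auto = [(label, truth) for label, truth in records if label != MANUAL]

    tp = sum(1 for label, truth in auto if label == PNEUMONIA and truth)
    fn = sum(1 for label, truth in auto if label == NO_PNEUMONIA and truth)
    tn = sum(1 for label, truth in auto if label == NO_PNEUMONIA and not truth)
    fp = sum(1 for label, truth in auto if label == PNEUMONIA and not truth)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "manual_review_fraction": (len(records) - len(auto)) / len(records),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = (
        [(PNEUMONIA, True)] * 3
        + [(NO_PNEUMONIA, True)]
        + [(NO_PNEUMONIA, False)] * 4
        + [(PNEUMONIA, False)]
        + [(MANUAL, True)]
    )
    print(triage_metrics(toy))
```

Excluding manually reviewed reports from the denominators means the accuracy figures describe only the automated portion of the workflow, which is why they should be read alongside the manual review fraction.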

Conclusions

For pneumonia validation, ONYX can replace almost 90% of manual review while maintaining low to moderate misclassification rates. It can be tailored for different outcomes and study needs and thus warrants exploration in other settings. Copyright © 2013 John Wiley & Sons, Ltd.
