A systematic review of validated methods for identifying pulmonary fibrosis and interstitial lung disease using administrative and claims data


G. Schneider, Epidemiology and Database Analytics, United BioSource Corporation, 430 Bedford St., Suite 300, Lexington, MA, 02420 USA. E-mail: gary.schneider@unitedbiosource.com



The Food and Drug Administration's Mini-Sentinel pilot program initially aimed to conduct active surveillance to refine safety signals that emerge for marketed medical products. A key facet of this surveillance is to develop and understand the validity of algorithms for identifying health outcomes of interest (HOIs) from administrative and claims data. This paper summarizes the process and findings of the algorithm review of pulmonary fibrosis and interstitial lung disease.


PubMed and Iowa Drug Information Service Web searches were conducted to identify citations applicable to the pulmonary fibrosis/interstitial lung disease HOI. Level 1 abstract reviews and Level 2 full-text reviews were conducted to find articles using administrative and claims data to identify pulmonary fibrosis and interstitial lung disease, including validation estimates of the coding algorithms.


Our search revealed a deficiency of literature focusing on pulmonary fibrosis and interstitial lung disease algorithms and validation estimates. Only five studies provided codes; none provided validation estimates. Because interstitial lung disease includes a broad spectrum of diseases, including pulmonary fibrosis, the scope of these studies varied, as did the corresponding diagnostic codes used.


Research needs to be conducted on designing validation studies to test pulmonary fibrosis and interstitial lung disease algorithms and estimating their predictive power, sensitivity, and specificity. Copyright © 2012 John Wiley & Sons, Ltd.


Mini-Sentinel is the Food and Drug Administration's (FDA) pilot program that aims to conduct active surveillance using administrative and health care claims data. The initial goal is to refine safety signals that emerge for marketed medical products. Essential components of this exercise are (i) to identify algorithms, suited to administrative and claims data, that are used to detect various health outcomes of interest (HOIs) and (ii) to identify the performance characteristics of these algorithms as measured within the studies in which they were used. In this article, we describe the algorithm review process and findings for 1 of the 20 HOIs selected for review by the Mini-Sentinel Protocol Core: pulmonary fibrosis (PF) and interstitial lung disease (ILD).

Interstitial lung diseases, also called diffuse parenchymal lung diseases, are a diverse group of pulmonary disorders with similar clinical, radiographic, physiologic, and/or pathologic features.[1] As delineated by a joint consensus statement from the American Thoracic Society and European Respiratory Society, ILD can be grouped into four main categories: disorders of known causes (e.g., collagen vascular disease or lung disorders associated with drug, occupational, or environmental exposure), idiopathic interstitial pneumonias (idiopathic pulmonary fibrosis [IPF] and others, e.g., nonspecific interstitial pneumonia, desquamative interstitial pneumonia, respiratory bronchiolitis-associated ILD, acute interstitial pneumonia, cryptogenic organizing pneumonia, and lymphocytic interstitial pneumonia), granulomatous lung disorders (e.g., sarcoidosis), and other forms of ILD of unknown cause (e.g., lymphangioleiomyomatosis [LAM], pulmonary Langerhans cell histiocytosis/histiocytosis X [HX], and eosinophilic pneumonia).[2]

The incidence of ILD is approximately 30 per 100 000; about half of these cases are classified as PF.[1] Treatment and prognosis of ILD/PF depend on the specific diagnosis and its histopathologic features. Corticosteroid treatment can be beneficial for some patients, including those with nonspecific interstitial pneumonia, cryptogenic organizing pneumonia, and desquamative interstitial pneumonia.[1, 2] However, for other disorders such as IPF and LAM, corticosteroid treatment is generally ineffective; in such disorders, mortality is high and lung transplantation may be the only viable treatment option.[1]


The general search strategy originated from prior work by the Observational Medical Outcomes Partnership and its contractors, and was modified slightly by Mini-Sentinel investigators for the 20 HOIs selected for review.

Details of the methods for these systematic reviews can be found in the accompanying manuscript by Carnahan and Moores (on page 82 in this issue). In brief, the base PubMed search was combined with the following terms to represent the HOI: “Lung Diseases, Interstitial,” “Pulmonary Fibrosis,” “idiopathic pulmonary fibrosis,” and “interstitial” AND (“lung” OR “pulmonary”).

To identify other relevant articles that were not found in the PubMed search, the Iowa Drug Information Service Web (IDIS/Web) was also searched using a similar search strategy. Both the PubMed and IDIS/Web searches were conducted on 10 May 2010. The details of these searches can be found in the full report on the Mini-Sentinel website: http://mini-sentinel.org/foundational_activities/related_projects/default.aspx.

The search results from different databases were compiled and duplicate results eliminated by University of Iowa investigators, who used a citation manager program. The results were then output and provided to organizations contracted to conduct the literature reviews. Mini-Sentinel collaborators were also asked to help identify relevant validation studies.

The abstract of each citation identified was reviewed by two investigators. When either investigator selected an article for full-text review, the full text was reviewed by both investigators. Agreement on whether to review the full text or include the article in the evidence table was calculated using the Cohen's kappa statistic.
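As an illustration of the agreement statistic named above, the following is a minimal sketch of Cohen's kappa for two reviewers. The function name `cohens_kappa` and the four example decisions are hypothetical and are not taken from the review's data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each labeled the same items."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Hypothetical abstract-screening decisions, for illustration only.
reviewer_1 = ["accept", "accept", "reject", "reject"]
reviewer_2 = ["accept", "reject", "reject", "reject"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance.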

A single investigator abstracted each study for the final evidence table. The data included in the table were confirmed by a second investigator for accuracy. A clinician or topic expert was consulted to review the results of the evidence table and discuss how they compared with diagnostic methods currently used in clinical practice. This included whether certain diagnostic codes used in clinical practice were missing from the algorithms, and the appropriateness of the validation definitions compared with diagnostic criteria currently used in clinical practice.


The PubMed and IDIS/Web searches identified 203 and 7 citations, respectively. The total number of unique citations from the combined searches was 210. Mini-Sentinel collaborators provided no additional published or unpublished reports of validation studies.

Of the 210 abstracts reviewed, we accepted only 20 for full-text review. The inclusion criteria were straightforward: (i) examination of the HOI, (ii) use of an administrative and claims database, and (iii) study conducted in the United States or Canada. These criteria enabled perfect agreement between the two reviewers on acceptance/rejection status, although the reasons for rejection varied.

Exclusion criteria for full-text review consisted of the following: (i) poorly described HOI identification algorithm and (ii) no validation of the outcome definition or reporting of validity estimates. Both reviewers agreed that none of the 20 full-text articles reported validation of the outcome definition or validity estimates, so all 20 met the second exclusion criterion; therefore, the five studies that fulfilled all other criteria were reviewed.


We identified five studies that provided codes for PF or ILD. Suissa et al. used ICD-9 codes 515, 516.3, 516.8, and 516.9 to identify spontaneous reports of ILD among patients with rheumatoid arthritis.[3] Raghu et al. used ICD-9 code 516.3 as the basis of their broad and narrow IPF definitions.[4] Ehrlich et al. used ICD-9 codes 515 and 516.3 to identify PF in patients with and without diabetes.[5] Pinheiro et al. used the ICD-10 code J84.1 to identify occupational risks for IPF.[6] Finally, Gan et al. used ICD-9 code 501 and ICD-10 code J61 to identify asbestosis, a form of ILD caused by asbestos exposure.[7] Information on the study populations, outcomes, and algorithms used in each of these studies is presented in Table 1.

Table 1. Pulmonary fibrosis and interstitial lung disease coding algorithms
| Citation | Study population and time period | Description of outcome studied | Algorithm |
|---|---|---|---|
| Ehrlich et al. 2010[5] | Kaiser Permanente Medical Care program in Northern California. The study cohort (n = 121 886) was drawn from a population of 1 811 228 members aged <18 years as of 1 January 1996. | Incidence of asthma, chronic obstructive pulmonary disease, pulmonary fibrosis, pneumonia, and lung cancer in patients with and without a diagnosis of diabetes. | Pulmonary fibrosis, ICD-9 515 (chronic postinflammatory) and 516.3 (idiopathic). |
| Gan et al. 2009[7] | British Columbia Linked Health Database data, individuals ≥15 years of age. The study cohort included 1170 new asbestosis cases (1121 men, 49 women) identified using workers' compensation records (n = 271), hospitalization records (n = 562), and outpatient records (n = 582) from 1992 to 2004. | Population-based surveillance of asbestosis using multiple health data sources. | ICD-9 code 501 (asbestosis) and ICD-10 code J61 (pneumoconiosis due to asbestos and other mineral fibers) to identify asbestosis cases. |
| Pinheiro et al. 2008[6] | United States National Institute for Occupational Safety and Health mortality surveillance system for respiratory diseases of occupational interest; multiple cause-of-death data compiled by the National Center for Health Statistics for US residents aged 15 years and older, from 1999 to 2003. | Idiopathic pulmonary fibrosis mortality rate and occupational risks. | The term “IPF” refers here to the group of diseases classified under ICD-10 code J84.1, comprising “Other interstitial pulmonary diseases with fibrosis, including fibrosing alveolitis (cryptogenic), Hamman–Rich syndrome, and idiopathic pulmonary fibrosis.” Cases were defined as those decedents whose death certificates mentioned ICD-10 code J84.1 (i.e., IPF) as the underlying or contributing cause of death and did not mention any other type or cause of interstitial lung disease. |
| Raghu et al. 2006[4] | Unspecified data source. Data were obtained from the health care claims processing system of a large US health plan (1996–2000) that consisted of claims for service facilities (e.g., hospitals), health care professionals (e.g., physicians), and retail pharmacies and provided services through health maintenance organizations, preferred provider organizations, Medicare Risk, and indemnity products to approximately three million persons residing in 20 states. The study sample consisted of all persons 18 years or older who were eligible for comprehensive health benefits for at least 1 day in calendar year (CY) 2000. | Annual incidence and prevalence of idiopathic pulmonary fibrosis in the United States. | Algorithms with “broad” and “narrow” case definitions of IPF. IPF (“broad case definition”) was identified as (i) one or more medical encounters with a diagnosis code for IPF (ICD-9-CM 516.3) and (ii) no medical encounters with a diagnosis code for any other type of ILD on or after the date of their last medical encounter with a diagnosis of IPF. IPF (“narrow case definition”) was identified as (i) satisfaction of the broad case definition and (ii) one or more medical encounters with a procedure code for surgical lung biopsy (ICD-9-CM 33.28, 34.21; CPT-4 32095, 32100–32160, 32602), transbronchial lung biopsy (ICD-9-CM 33.27; CPT-4 31628, 31629), or computed tomography of the thorax (ICD-9-CM 87.41; CPT-4 71250, 71260, 71270) on or before the date of their last medical encounter with a diagnosis of IPF. |
| Suissa et al. 2006[3] | PharMetrics Patient-Centric Database, 1 September 1998–31 December 2003. Subjects were 18 years of age or older at cohort entry. | Risk of ILD in patients with rheumatoid arthritis treated with leflunomide. | Cases of probable drug-related ILD were identified from inpatient encounters as all subjects who were hospitalized with a first-time primary diagnosis of postinflammatory lung fibrosis (ICD-9 code 515), idiopathic fibrosing alveolitis (code 516.3), or other/unspecified alveolar pneumonopathies (codes 516.8 and 516.9). |


Interstitial lung diseases are a diverse group of pulmonary disorders classified together because of similar clinical, physiologic, or pathologic features.[1] The diversity of the diseases classified under ILD is well illustrated by Raghu et al.,[4] who used a total of 36 ICD-9 codes in their IPF algorithm: 1 to define IPF (ICD-9-CM 516.3) and 35 additional codes that are consistent with other ILDs and were used as exclusion criteria.

Despite this vast collection of ICD-9-CM codes consistent with ILD, we found that papers that studied ILD used only five ICD-9-CM codes, specifically 515, 516.3, 516.8, 516.9, and 501 (note that Raghu et al.[4] studied IPF, a subtype of ILD). ICD-9-CM codes 516.8 and 516.9 were used to broadly define ILD. ICD-9-CM codes 515 and 516.3 were used to define PF, whereas IPF, a subtype of PF, was defined by ICD-9-CM code 516.3. ICD-9-CM code 501 was used to identify asbestosis, which can be considered a subgroup of ILD. The limited breadth of codes used for ILD identification in the reviewed literature, despite the recognition of 36 ILD-consistent ICD-9-CM codes, suggests that the general literature search used by Mini-Sentinel was inadequate for this HOI.
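The count of five codes can be checked by tabulating the ICD-9 case-identification codes reported for each study in Table 1. The sketch below is illustrative only; the variable names are our own, the 35 exclusion codes of Raghu et al. are omitted because they are not listed here, and Pinheiro et al. is omitted because that study used an ICD-10 code only.

```python
from collections import Counter

# ICD-9 / ICD-9-CM codes used to identify cases, as reported in Table 1.
icd9_codes_by_study = {
    "Suissa et al. 2006": {"515", "516.3", "516.8", "516.9"},
    "Raghu et al. 2006": {"516.3"},
    "Ehrlich et al. 2010": {"515", "516.3"},
    "Gan et al. 2009": {"501"},
}

# Distinct codes across all studies.
distinct_codes = sorted(set().union(*icd9_codes_by_study.values()))

# Number of studies in which each code appears.
studies_per_code = Counter(
    code for codes in icd9_codes_by_study.values() for code in codes
)

print(distinct_codes)
print(studies_per_code.most_common(1))
```

In this tabulation, code 516.3 is the only code shared by more than two studies.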

Regardless of the diagnostic codes used for PF/ILD algorithms, fine tuning of algorithms may be possible via relevant procedural codes, such as those for lung biopsies, and relevant imaging techniques (e.g., high-resolution computed tomography). Such strategies were applied by Raghu et al.,[4] who narrowed their IPF algorithm by requiring cases to have one or more medical encounters with a procedure code for surgical lung biopsy, transbronchial lung biopsy, or computed tomography of the thorax. These procedural codes added specificity to the algorithm but reduced the number of IPF patients identified by approximately threefold.[4] There are, however, limiting factors that must be considered when using procedural codes from administrative and claims data. Most notably, information on results and diagnosis confirmation is generally not available.
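The broad and narrow case definitions of Raghu et al., as summarized in Table 1, might be sketched as follows. This is a simplified illustration, not the authors' implementation: the `Encounter` structure and function names are hypothetical, the set of "other ILD" diagnosis codes must be supplied by the caller because those 35 codes are not listed in this review, and the CPT-4 range 32100–32160 is treated as a simple numeric range.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Set

IPF_CODE = "516.3"  # ICD-9-CM diagnosis code for IPF

# Procedure codes listed in Table 1 (ICD-9-CM procedure codes and CPT-4 codes).
QUALIFYING_PROCEDURES = {
    "33.28", "34.21", "32095", "32602",   # surgical lung biopsy
    "33.27", "31628", "31629",            # transbronchial lung biopsy
    "87.41", "71250", "71260", "71270",   # computed tomography of the thorax
}


@dataclass(frozen=True)
class Encounter:
    """Hypothetical claims record: one medical encounter."""
    service_date: date
    diagnoses: frozenset = frozenset()
    procedures: frozenset = frozenset()


def is_qualifying_procedure(code: str) -> bool:
    if code in QUALIFYING_PROCEDURES:
        return True
    # Assumption: CPT-4 32100-32160 is an inclusive numeric range.
    return len(code) == 5 and code.isdigit() and 32100 <= int(code) <= 32160


def last_ipf_date(encounters: Iterable[Encounter]) -> Optional[date]:
    dates = [e.service_date for e in encounters if IPF_CODE in e.diagnoses]
    return max(dates) if dates else None


def meets_broad(encounters, other_ild_codes: Set[str]) -> bool:
    encounters = list(encounters)
    last_ipf = last_ipf_date(encounters)
    if last_ipf is None:
        return False
    # No other ILD diagnosis on or after the last IPF diagnosis date.
    return not any(
        e.service_date >= last_ipf and e.diagnoses & other_ild_codes
        for e in encounters
    )


def meets_narrow(encounters, other_ild_codes: Set[str]) -> bool:
    encounters = list(encounters)
    if not meets_broad(encounters, other_ild_codes):
        return False
    last_ipf = last_ipf_date(encounters)
    # A qualifying procedure on or before the last IPF diagnosis date.
    return any(
        e.service_date <= last_ipf
        and any(is_qualifying_procedure(p) for p in e.procedures)
        for e in encounters
    )
```

Because the narrow definition adds a requirement to the broad one, every narrow case is also a broad case, which is consistent with the smaller number of patients identified under the narrow definition.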

Algorithm development may be hindered further by potential differential coding resulting from the specialization of health care providers and the settings of health care. For example, we speculate that the codes chosen may be associated with the perceived certainty of the diagnosis, which may vary between specialists and primary care providers. Likewise, coding may differ between inpatient and outpatient settings. All of these factors may result in different diagnostic codes being captured in administrative and claims health care databases and may influence algorithm development.


There are almost certainly definitional problems pertaining to each of the codes used within the identified PF/ILD algorithms. However, because none of these algorithms provided validation, the extent of these problems cannot be known. We suspect that ICD-9-CM codes 515 and 516.3 are most likely too narrow to identify all PF/ILD cases. By contrast, the ICD-9-CM codes 516.8 and 516.9 have more-extensive definitions and are perhaps used when there is uncertainty about a specific PF/ILD diagnosis. There are also many diagnostic codes consistent with ILD that were not incorporated into the reviewed studies, suggesting that even these more-extensive diagnostic codes may be inadequate to capture all ILD cases. Therefore, using the codes identified in this literature review for case identification, even in combination with procedural codes, may not provide the desired levels of sensitivity and specificity. The scarcity of literature providing validated or non-validated algorithms for PF/ILD suggests the need for additional research focused on identifying and designing validation studies that test PF/ILD algorithms and estimate their predictive power, sensitivity, and specificity.
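To make the requested estimates concrete, the sketch below computes sensitivity, specificity, and positive predictive value from a two-by-two comparison of an algorithm against a reference standard such as medical record review. We assume here that "predictive power" refers to positive predictive value; the function name and the counts are hypothetical and do not come from any study in this review.

```python
import math


def validation_metrics(tp, fp, fn, tn):
    """Validity estimates for a case-identification algorithm.

    tp: algorithm-positive, confirmed cases
    fp: algorithm-positive, not confirmed
    fn: algorithm-negative, true cases
    tn: algorithm-negative, true non-cases
    """
    def ratio(numerator, denominator):
        # Undefined when the denominator is zero.
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # true cases the algorithm finds
        "specificity": ratio(tn, tn + fp),  # non-cases correctly left out
        "ppv": ratio(tp, tp + fp),          # flagged patients who are cases
    }


# Hypothetical validation sample, for illustration only.
metrics = validation_metrics(tp=80, fp=20, fn=10, tn=890)
```

Note that sensitivity and specificity require a reference standard applied to algorithm-negative patients as well, whereas positive predictive value can be estimated from a review of algorithm-positive patients alone.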


The authors declare no conflict of interest. This is not product-specific or privately funded research. The views expressed in this document do not necessarily reflect the official policies of the Department of Health and Human Services, nor does mention of trade names, commercial practices, or organizations imply endorsement by the US government.


  • There is limited literature focusing on pulmonary fibrosis and interstitial lung disease that provides administrative and claims data-based coding algorithms and validation estimates.
  • The broad spectrum of diseases under the umbrella of interstitial lung disease may complicate algorithm development for common subtypes, such as pulmonary fibrosis.
  • Additional research is needed regarding the use of administrative and claims data-based coding algorithms to identify pulmonary fibrosis and interstitial lung disease.


This work was supported by the Food and Drug Administration (FDA) through Department of Health and Human Services (HHS) Contract Number HHSF223200910006I.