Performance of lung ultrasound in the diagnosis of pediatric pneumonia in Mozambique and Pakistan

Abstract

Introduction: Improved pneumonia diagnostics are needed in low-resource settings (LRS); lung ultrasound (LUS) is a promising diagnostic technology for pneumonia. The objective was to compare LUS versus chest radiograph (CXR), and among LUS interpreters, to compare expert versus limited training with respect to interrater reliability.

Methods: We conducted a prospective, observational study among children with World Health Organization (WHO) Integrated Management of Childhood Illness (IMCI) chest-indrawing pneumonia at two district hospitals in Mozambique and Pakistan, and assessed LUS and CXR examinations. The primary endpoint was interrater reliability between LUS and CXR interpreters for pneumonia diagnosis among children with WHO IMCI chest-indrawing pneumonia.

Results: Interrater reliability was excellent for expert LUS interpreters, but poor to moderate for expert CXR interpreters and onsite LUS interpreters with limited training.

Conclusions: Among children with WHO IMCI chest-indrawing pneumonia, expert interpreters may achieve substantially higher interrater reliability for LUS compared to CXR, and LUS showed potential as a preferred reference standard. For point-of-care LUS to be successfully implemented for the diagnosis and management of pneumonia in LRS, the clinical environment and amount of appropriate user training will need to be understood and addressed.

KEYWORDS: chest ultrasound, childhood pneumonia, interrater reliability, low-resource settings

| INTRODUCTION
Each year, approximately 920,000 children die before their fifth birthdays due to pneumonia. 1 Greater access to appropriate and effective pneumonia diagnostics, particularly in low-resource settings (LRS), is critical to addressing child mortality. In LRS, pneumonia is identified using the World Health Organization (WHO) Integrated Management of Childhood Illness (IMCI) guidelines, which depend on assessing variable and subjective clinical signs, specifically respiratory rate and chest indrawing. 2 It is not clear how effective WHO IMCI guidelines are in identifying pneumonia, 3 and because the guidelines prioritize diagnostic sensitivity over specificity, there is concern regarding antimicrobial overuse and resistance. 4

Diagnostic alternatives to WHO IMCI also have challenges. 5 Clinical diagnosis that does not use WHO IMCI guidelines lacks standardization. If available, chest radiographs (CXR) can be expensive, difficult to obtain, and time-consuming, and they expose the child to ionizing radiation. [5][6][7] Microbiology (e.g., blood, lung/pleural aspiration, and/or bronchoalveolar lavage culture) is invasive, slow, and detects a limited proportion of cases. 5 Biomarkers such as C-reactive protein can correlate with bacterial infection but do not have a set threshold, nor do they indicate a specific etiology. 5 Given these limitations, and given that diagnostic tests used for pediatric pneumonia have not been sufficiently validated despite their routine use, there is no satisfactory, safe, and effective reference standard for the accurate and reliable diagnosis of pediatric pneumonia. 8

Lung ultrasound (LUS) is a promising technology that can dynamically visualize the lungs with potentially high diagnostic accuracy for pneumonia. 6 Advantages of LUS, relative to CXR, include its lower cost, portability, ease of use, and absence of ionizing radiation. 6,7,9 We conducted a pilot study in Mozambique and Pakistan to investigate the use of point-of-care LUS as a tool for the diagnosis of pediatric pneumonia in LRS among children with WHO IMCI chest-indrawing pneumonia.

| Study design, setting, and participants
The methods of this study have been described previously. 10

where there were more than three interpreters, three interpretations were randomly selected, and if the first two interpretations were discordant, the third would act as a tiebreaker.

Table 2. Numbers of LUS and CXR pneumonia determinations as classified by expert LUS and expert CXR interpreters are presented by country in Table A1 and graphically in Figure 3.

Onsite LUS interpreters in Mozambique diagnosed pneumonia no more than about half as often as expert LUS interpreters (7.2% and 2.1% for onsite interpreters A and B vs. 15.5% for final expert LUS interpretation). Onsite LUS interpreters in Pakistan diagnosed pneumonia about 1.4 times more frequently than expert LUS interpreters (62.6% and 63.4% for onsite interpreters C and D vs. 45.5% for final expert LUS interpretation; Table A3).

As shown in Table 3b, the interrater reliability observed between expert CXR interpreters for whom more than 10 paired interpretations were available varied widely, ranging from very poor to moderate (κ from −0.036 to 0.619).
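For readers less familiar with the κ statistic reported above, the following is a minimal sketch of how a kappa coefficient can be computed for one pair of interpreters. It assumes Cohen's kappa for two raters who each give a binary pneumonia determination (1 = pneumonia, 0 = no pneumonia) for the same set of scans. The function name `cohens_kappa` and the example data are hypothetical and are not taken from the study; the study's exact estimator is not stated in this section.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items.

    Assumption: each list holds one categorical label per scan,
    in the same scan order for both raters.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must score the same non-empty set of scans.")

    n = len(rater_a)

    # Observed agreement: fraction of scans given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")

    return (observed - expected) / (1.0 - expected)


# Hypothetical determinations for four scans (not study data).
interpreter_1 = [1, 1, 0, 0]
interpreter_2 = [1, 0, 0, 0]
print(cohens_kappa(interpreter_1, interpreter_2))  # 0.5
```

Because kappa subtracts the agreement expected by chance, it is 1 for perfect agreement, 0 when agreement is no better than chance, and negative when agreement is worse than chance, which is how a value such as −0.036 can arise.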

| Statistical analysis
When restricted to the same subsets of scans as used by each pair of CXR interpreters, the kappa estimates for the two expert LUS interpreters were substantially higher (all >0.80 and most >0.90) than the corresponding kappa estimates for the expert CXR interpreters.

As this pilot demonstrated, with poor-to-moderate interrater reliability even among trained expert CXR interpreters, CXR itself is an imperfect reference standard, which limited our ability to accurately assess LUS performance. Compelling evidence indicates that LUS may have greater sensitivity or specificity when compared with CXR, a diagnostic not readily available in LRS. 6,22,26,28,34,35 Initially, we considered analyzing the data using CXR as the reference standard (Table A5). However, CXR is a poor reference standard, and diagnosing pediatric pneumonia when there is no proven accurate and reliable gold standard is problematic. 8