Performance of an algorithm based on WHO recommendations for the diagnosis of smear-negative pulmonary tuberculosis in patients without HIV infection


Corresponding Author Alonso Soto, Hospital Nacional Hipólito Unanue, Luis Galvani 249, Lima 33, Santiago de Surco, Lima, Peru. Tel.: +51 1 3627777; E-mail:


Objective  To evaluate the performance of an algorithm based on WHO recommendations for diagnosis of smear-negative pulmonary tuberculosis in HIV-negative patients.

Methods  We recruited HIV-negative patients with clinical suspicion of tuberculosis who had had three negative sputum smears in Lima, Peru. All included subjects underwent a complete anamnesis, physical examination and chest X-ray, and had a sputum specimen cultured in Ogawa, Middlebrook 7H9 media and MGIT®. We applied an algorithm based on WHO recommendations to classify patients as having tuberculosis or not. The diagnostic performance of the algorithm was evaluated by comparing its results against the reference standard of a positive culture for M. tuberculosis in any of the media used.

Results  A total of 264 of the 285 patients included (92.6%) completed evaluation and follow-up. Of these, 70 (26.5%) had a positive culture for M. tuberculosis. Clinical response to a broad-spectrum course of antibiotics was good in 32 of these 70 patients (45.7%; 95% CI 34.0–57.4%). Overall, the algorithm attained a sensitivity of 22.9% (95% CI 13.1–32.7%) and a specificity of 95.4% (95% CI 92.4–98.3%) compared to culture results. The positive likelihood ratio was 4.93 and the negative likelihood ratio was 0.81.

Conclusions  The sensitivity and negative likelihood ratio of the algorithm are poor. The algorithm should be re-evaluated, and possibly adapted to local circumstances, before further use. The clinical response to an antibiotic trial is the most important component to reassess. We also suggest considering performing chest X-ray earlier in the diagnostic work-up.

Performance d’un algorithme basé sur les recommandations de l’OMS pour le diagnostic de la tuberculose pulmonaire à frottis négatif chez les patients non infectés par le VIH

Objectif:  Evaluer la performance d’un algorithme basé sur les recommandations de l’OMS pour le diagnostic de la TB pulmonaire à frottis négatif chez les patients VIH-négatifs.

Méthodes:  Nous avons recruté des patients VIH-négatifs avec une suspicion clinique de TB et avec 3 frottis négatifs, à Lima au Pérou. Tous les sujets inclus ont subi une anamnèse complète, un examen physique et une radiographie du thorax et ont eu un échantillon d’expectoration cultivé sur milieux Ogawa, Middlebrook 7H9 et MGIT®. Nous avons appliqué un algorithme basé sur les recommandations de l’OMS pour classer les patients comme ayant la TB ou non. Les performances diagnostiques de l’algorithme ont été évaluées en comparant les résultats avec la norme de référence d’une culture positive pour M. tuberculosis sur un des milieux utilisés.

Résultats:  264 des 285 patients inclus (92,6%) ont complété l’évaluation et le suivi. De ceux-ci, 70 (26,5%) avaient une culture positive pour M. tuberculosis. La réponse clinique à un régime d’antibiotiques à large spectre a été bonne chez 32 de ces 70 patients (45,7; IC95%: 34,0–57,4). Dans l’ensemble, l’algorithme a atteint une sensibilité de 22,9% (IC95%: 13,1-32,7) et une spécificité de 95,4% (IC95%: 92,4-98,3) comparativement aux résultats de la culture. Le rapport de prédiction positive était de 4,93 et le rapport de prédiction négative 0,81.

Conclusions:  La sensibilité et le rapport de prédiction négative de l’algorithme sont faibles. Ils devraient être réévalués et éventuellement adaptés aux circonstances locales avant une nouvelle utilisation. La réponse clinique à un essai aux antibiotiques est l’élément le plus important à réévaluer. Nous suggérons aussi d’envisager l’utilisation de la radiographie du thorax plus tôt dans le bilan diagnostique.

Desempeño de un algoritmo basado en las recomendaciones de la OMS para el diagnóstico de la tuberculosis pulmonar con baciloscopia negativa en pacientes sin infección por VIH

Objetivo:  Evaluar el desempeño de un algoritmo basado en las recomendaciones de la OMS para el diagnóstico de la tuberculosis pulmonar con baciloscopia negativa en pacientes VIH negativos.

Métodos:  Hemos reclutado pacientes VIH negativos con sospecha clínica de tuberculosis que tenían 3 esputos negativos en Lima, Perú. Todos los sujetos incluidos fueron sometidos a una anamnesis completa, examen físico y placa de tórax, así como un cultivo de esputo en Ogawa, medio Middlebrook 7H9 y MGIT®. Aplicamos el algoritmo basado en las recomendaciones de la OMS para clasificar los pacientes como con o sin TB. El desempeño del algoritmo en cuanto al diagnóstico se evaluó comparando sus resultados frente a los del estándar de referencia de un cultivo positivo de M. tuberculosis en cualquiera de los medios utilizados.

Resultados:  264 de los 285 pacientes incluidos (92.6%) completaron la evaluación y el seguimiento. De estos, 70 (26.5%) tenían un cultivo positivo para M. tuberculosis. La respuesta clínica al tratamiento con antibióticos de amplio espectro era buena en 32 de estos 70 pacientes (45.7; 95%IC 34.0-57.4%). En general, el algoritmo alcanzó una sensibilidad de 22.9% (95% IC 13.1- 32.7%) y una especificidad del 95.4% (95% IC 92.4-98.3%) comparado con los resultados del cultivo. La razón de verosimilitud positiva era 4.93 y la razón de verosimilitud negativa era 0.81.

Conclusiones:  La sensibilidad y la razón de verosimilitud negativa del algoritmo son pobres. Debería reevaluarse, y a ser posible adaptarse a las circunstancias locales antes de continuar siendo utilizado. La respuesta clínica de un ensayo de antibióticos es el componente más importante a ser reevaluado. También sugerimos considerar el realizar una placa de tórax más temprano en el diagnóstico.


Tuberculosis is a major public health challenge, above all in poorer countries, where its control has been difficult to achieve (World Health Organization 2007; Lonnroth & Raviglione 2008). Smear-negative pulmonary tuberculosis (SNPT) accounts for 30–50% of cases of pulmonary tuberculosis (Colebunders & Bastian 2000; Siddiqi et al. 2003) and causes significant morbidity and mortality, particularly in HIV-prevalent settings (Cohen et al. 2008). It also remains one of the most important diagnostic challenges in this field, because conventional microbiological methods lack sensitivity (Foulds & O’Brien 1998; World Health Organization 2007) and culture, the currently accepted reference standard for its diagnosis (Colebunders & Bastian 2000), requires several weeks before results become available.

The lack of rapid culture systems and molecular diagnostic techniques in most settings with high tuberculosis prevalence has renewed attention to clinical diagnostic tools. Clinical prediction rules (Laupacis et al. 1997) have been developed for SNPT, in the form of ‘scores’ and algorithms (Kanaya et al. 2001; Kudjawu et al. 2006; Mello et al. 2006; Soto et al. 2008b). They are built from clinical symptoms and signs that are easily obtained during anamnesis and physical examination, or from chest X-ray readings. Findings are either assigned ‘points’ or incorporated as a step of a decision tree, in order to yield a probability of having the disease or a dichotomous result (positive/negative for tuberculosis). This result is meant to form the basis of, or at least to assist, decision making in the diagnostic management of patients with suspected tuberculosis.

For over a decade, in successive publications, the World Health Organization (1997, 2003) has recommended an algorithmic approach for the diagnosis and treatment of patients with clinical suspicion of smear-negative pulmonary tuberculosis (SNPT) in poorer settings. More recently, a parallel algorithm to be used in people living with HIV/AIDS was developed alongside the existing one for HIV-negative patients in poor settings (Getahun et al. 2007; World Health Organization 2007). However, prospective studies that formally evaluate the proposed algorithms in different settings are lacking, and the uncertainty about their field performance needs to be resolved. The aim of this study was to evaluate an algorithm based on the WHO recommendations for the diagnosis of smear-negative pulmonary tuberculosis in HIV-negative patients in Lima, Peru.



Peru is a Latin American country with a high incidence of tuberculosis and a low HIV prevalence. We performed our study in two public hospitals in Lima, Cayetano Heredia and Hipólito Unanue. Both are university-affiliated, and Hipólito Unanue is a reference centre for tuberculosis in Peru. The incidence of tuberculosis in the 12 districts covered by these hospitals ranges from 154 to 376 cases per 100 000 inhabitants (Estrategia sanitaria nacional de prevencion y control de la tuberculosis 2008). The estimated HIV prevalence in tuberculosis patients in Lima is 1.2% (Bernabe-Ortiz 2008), but it can be higher in reference centres like ours, and a previous study in emergency wards found an HIV prevalence of 14.7% in our patients with tuberculosis (Solari et al. 2008).

Sample size and power

To detect a difference of 15% with a desired diagnostic test performance of 90%, either in terms of sensitivity or specificity, and given an expected prevalence of tuberculosis of 20%, an α error of 0.05 and a power of 80%, a sample size of 205 patients was required.

Recruitment of patients and procedures

The study was conducted from September 2005 to April 2008. It was part of a wider study designed to validate a clinical algorithm for the diagnosis of smear-negative tuberculosis adapted to the local situation (unpublished). We recruited patients older than 18 years with clinical suspicion of smear-negative tuberculosis who presented to the outpatient clinics of the participating hospitals or were admitted to their wards. Clinical suspicion of SNPT was defined as cough for more than 2 weeks, as recommended by Peruvian guidelines for diagnosis of tuberculosis (Estrategia sanitaria nacional de prevencion y control de la tuberculosis 2006) and corroborated by a recent study in Lima (Otero et al. in press), plus at least one of the following: dyspnoea, thoracic pain, fever, night sweating or weight loss (of any duration), together with 3 negative sputum smears with Ziehl-Neelsen stain (the two routine smears recommended by the Peruvian national TB program and one concentrated sputum specimen performed for the purpose of the study). All patients underwent a questionnaire-based complete anamnesis, physical examination and a chest X-ray. A sputum specimen was cultured for M. tuberculosis in Ogawa, Middlebrook 7H9 media and mycobacteria growth indicator tube (MGIT) (Becton Dickinson), following procedures described earlier (Soto et al. 2008a). Identification of M. tuberculosis was based on standard methodology (niacin production, nitrate reduction and catalase enzyme production) (Kent & Kubica 1985). Counselling and voluntary ELISA testing for HIV infection were offered. Patients refusing the test or testing positive for HIV were excluded.
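The inclusion criteria described above can be written as a single predicate. The following is a minimal sketch, not the study's actual screening tool: the `Suspect` record, its field names and the function name are hypothetical, and "older than 18 years" and "more than 2 weeks" are read as strict inequalities.

```python
from dataclasses import dataclass


@dataclass
class Suspect:
    """Hypothetical screening record; field names are illustrative only."""
    age_years: int
    cough_weeks: float
    negative_smears: int          # Ziehl-Neelsen smears read as negative
    hiv_negative_elisa: bool      # False if the test was refused or positive
    dyspnoea: bool = False
    thoracic_pain: bool = False
    fever: bool = False
    night_sweats: bool = False
    weight_loss: bool = False


def meets_inclusion_criteria(s: Suspect) -> bool:
    """Return True if the suspect satisfies the criteria stated in the text."""
    # At least one additional symptom, of any duration, is required.
    additional_symptom = any(
        (s.dyspnoea, s.thoracic_pain, s.fever, s.night_sweats, s.weight_loss)
    )
    return (
        s.age_years > 18
        and s.cough_weeks > 2
        and additional_symptom
        and s.negative_smears >= 3
        and s.hiv_negative_elisa
    )
```

For example, `meets_inclusion_criteria(Suspect(35, 4, 3, True, fever=True))` is true, whereas the same patient without any additional symptom, or without a negative HIV test, would not be included.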

To each of the subjects included, we applied the algorithm proposed by the WHO for the diagnosis of SNPT in HIV-negative patients (World Health Organization 2007), with one modification detailed below (Figure 1). The evaluation and follow-up of all patients was performed by one research fellow (a licensed general practitioner) per centre. The first step of the algorithm consisted of a broad-spectrum antibiotic trial. We used doxycycline 100 mg bid for 10 days. Patients who clinically improved, i.e. who reported a meaningful reduction or resolution of constitutional and respiratory symptoms and had resolution of signs at clinical examination, were considered not to have tuberculosis. For those showing no improvement, we did not repeat the sputum smear examination proposed in the original WHO algorithm. This step was omitted after discussion of the protocol with the clinicians who were to become involved in the study and with the institutional review boards, because it was felt that it could further delay diagnosis, possibly increase the loss to follow-up, and was at odds with routine clinical practice. The last step in the algorithm relied on the interpretation of the chest X-ray and on clinical judgment informed by medical history, signs and symptoms. To reduce variability in clinicians’ judgment, we used a locally derived clinical prediction rule described previously (Soto et al. 2008b) to assess whether patients had a high probability of tuberculosis and to guide the clinician’s decisions. The clinical prediction rule included information on age, the presence of clinical symptoms (hemoptysis, weight loss, expectoration) and radiographic signs (apical infiltrate or miliary pattern).

Figure 1.

Algorithm based on WHO recommendations for diagnosis of smear-negative pulmonary tuberculosis in HIV-negative patients evaluated in the study. *The original algorithm proposed by WHO (reference 2) includes repeat sputum smear examinations (discontinuous lines) before chest X-ray and physician’s judgment.
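The decision flow of the modified algorithm, as evaluated in this study, can be sketched as a small function. This is an illustrative reading of the text, not the study's software. The names are hypothetical, and the final step (chest X-ray plus clinical judgment guided by the prediction rule) is passed in as a single boolean because the scoring of that rule is not given here. Patients lost to follow-up after the antibiotic trial are returned as indeterminate, as in the study's analysis.

```python
from enum import Enum
from typing import Optional


class AlgorithmResult(Enum):
    TUBERCULOSIS = "tuberculosis"
    NOT_TUBERCULOSIS = "not tuberculosis"
    INDETERMINATE = "indeterminate"


def classify(improved_after_antibiotics: Optional[bool],
             high_probability_of_tb: bool = False) -> AlgorithmResult:
    """Classify one smear-negative suspect.

    improved_after_antibiotics: True/False after the antibiotic trial,
        or None if the patient was lost to follow-up.
    high_probability_of_tb: outcome of chest X-ray reading and clinical
        judgment (hypothetical boolean summary of the prediction rule).
    """
    if improved_after_antibiotics is None:
        return AlgorithmResult.INDETERMINATE
    # Step 1: clinical improvement is taken to rule out tuberculosis.
    if improved_after_antibiotics:
        return AlgorithmResult.NOT_TUBERCULOSIS
    # The repeat sputum smears of the original WHO algorithm are omitted.
    # Final step: chest X-ray and clinical judgment.
    if high_probability_of_tb:
        return AlgorithmResult.TUBERCULOSIS
    return AlgorithmResult.NOT_TUBERCULOSIS
```

Note that the chest X-ray and clinical judgment are only consulted for patients who did not improve, so a patient who improves is classified as not having tuberculosis whatever the X-ray shows.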


The protocol was approved by the Ethics Committees of the participating hospitals and of Cayetano Heredia University. Written informed consent was obtained from all study patients. TB patients received antituberculous treatment according to current guidelines of the Peruvian national tuberculosis program (World Health Organization 2003).

Statistical analysis

All data were entered in a Microsoft Access database, and checked on paper against database records. All statistical analyses were done with Stata version 9.2 (StataCorp, College Station, TX, USA). The reference standard for diagnosis of smear-negative pulmonary tuberculosis was a positive culture for M. tuberculosis in any of the culture media. To evaluate the diagnostic performance of the algorithm, we compared its outcome with the result of this reference standard and calculated sensitivity, specificity, positive and negative predictive values as well as likelihood ratios (Dujardin et al. 1994). Differences in proportions were evaluated using the Chi-squared test and we constructed 95% confidence intervals (CI) around proportions.
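For reference, the measures named above have the following standard definitions, where TP, FP, FN and TN denote true positives, false positives, false negatives and true negatives of the algorithm against culture. The interval formula is an assumption: the method used for the confidence intervals is not stated, but the symmetric intervals reported in this article are consistent with the normal (Wald) approximation.

```latex
\begin{align*}
\mathrm{Se} &= \frac{TP}{TP+FN}, &
\mathrm{Sp} &= \frac{TN}{TN+FP}, \\
\mathrm{PPV} &= \frac{TP}{TP+FP}, &
\mathrm{NPV} &= \frac{TN}{TN+FN}, \\
\mathrm{LR}^{+} &= \frac{\mathrm{Se}}{1-\mathrm{Sp}}, &
\mathrm{LR}^{-} &= \frac{1-\mathrm{Se}}{\mathrm{Sp}}.
\end{align*}
% Assumed 95% CI for a proportion \hat{p} estimated from n patients:
\[
\hat{p} \pm 1.96\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
\]
```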


Two hundred and eighty-five patients were included in the study (Figure 2). Fifty-six SNPT suspects (19.6%) were recruited in the hospital wards and 229 were outpatients, of whom 46 (20.1%) had been referred from primary health care centres and 183 directly consulted the hospitals. The median age was 35 years (interquartile range 23 years) and 52.6% were male. The median duration of symptoms before inclusion was 4 weeks (interquartile range 10 weeks) and 33.3% had a previous history of pulmonary tuberculosis. The most common clinical findings were weight loss (54.8%), hemoptysis (37.5%), and an abnormal physical chest examination (60.7%), defined as the presence of rhonchi, wheezing, crackles or dullness. The sensitivity and specificity of the latter were 62.7% (95% CI 51.7–73.6%) and 40% (95% CI 33.4–46.6%), respectively.

Figure 2.

Culture results and patient classification according to the algorithm based on WHO recommendations for diagnosis of smear-negative pulmonary tuberculosis in HIV-negative patients.

Seventy-five (26.3%) patients had a positive culture for M. tuberculosis in one or more of the 3 culture media. For 26 (34.6%), culture was positive in MGIT only (all of them were recultured on solid media and identified as M. tuberculosis). MGIT results were missing for 43 suspects (15.1%) because of stockouts during 3 consecutive months. Twenty-one patients (7.4%) were lost to follow-up after the antibiotic trial was prescribed (five had a positive culture for M. tuberculosis) and were classified as ‘indeterminate’ as far as the algorithm result is concerned. Thus, conclusive algorithm results were available for 264 patients (92.6%). The overall prevalence of tuberculosis in this group (70/264 or 26.5%; 95% CI 21.2–31.8) did not differ from that of the patients with incomplete follow-up (5/21 or 23.8%; 95% CI 5.6–42.0).

Figure 2 shows the distribution of subjects according to the successive steps of the algorithm. The number and percentage of positive cultures is indicated for each of the final classification groups. Twenty-five patients (9.5%) were classified as having SNPT by the algorithm. Of these, 16 (64.0%; 95% CI 45.2–82.8) indeed had a positive culture for M. tuberculosis. Conversely, 54 of the 239 patients (22.6%; 95% CI 17.3–27.9) classified as not having tuberculosis also had a positive culture. Furthermore, 32 of the 70 patients with a positive culture (45.7%; 95% CI 34.0–57.4) had shown clinical improvement with antibiotic therapy. Such improvement was also observed in 124 of the 194 patients with negative cultures (63.9%; 95% CI 57.2–70.6). The sensitivity of the antibiotic trial itself was 54.3% (95% CI 42.6–66.0%); the specificity, 63.9% (95% CI 57.2–70.7%). Remarkably, 91 of the 108 patients who did not improve after the antibiotic trial had abnormal chest X-rays and 67 of the 108 (62.0%; 95% CI 52.9–71.2) had apical or miliary infiltrates in chest X-rays. Of these 67 patients, 29 (43.3%; 95% CI 31.4–55.1) had a positive culture.
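The counts in this paragraph can be cross-checked against each other. The sketch below starts only from figures reported in the text and derives the rest by subtraction. The function and key names are hypothetical, and "no improvement" is treated as the positive result of the antibiotic trial, which is the reading under which the reported 54.3% and 63.9% are obtained.

```python
def patient_flow():
    """Reconstruct the flow through the algorithm from the reported counts."""
    completed, culture_pos = 264, 70           # reported
    improved_pos, improved_neg = 32, 124       # reported
    classified_tb, classified_tb_pos = 25, 16  # reported

    culture_neg = completed - culture_pos                    # 194
    improved = improved_pos + improved_neg                   # 156
    not_improved = completed - improved                      # 108
    not_improved_pos = culture_pos - improved_pos            # 38
    # Non-improvers not classified as tuberculosis at the final step.
    not_tb_final_step = not_improved - classified_tb         # 83 (derived)
    missed_final_step = not_improved_pos - classified_tb_pos # 22
    return {
        "not_improved": not_improved,
        "classified_not_tb": improved + not_tb_final_step,   # 239
        "missed_by_algorithm": improved_pos + missed_final_step,  # 54
        "missed_at_final_step": missed_final_step,
        # Antibiotic trial as a test: "no improvement" counts as positive.
        "trial_sensitivity": not_improved_pos / culture_pos,
        "trial_specificity": improved_neg / culture_neg,
    }


flow = patient_flow()
print(f"Trial sensitivity: {flow['trial_sensitivity']:.1%}")
print(f"Trial specificity: {flow['trial_specificity']:.1%}")
```

The derived totals agree with the reported ones: 108 patients did not improve, 239 were classified as not having tuberculosis, and 54 of those were culture-positive.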

In terms of overall diagnostic performance, the algorithm attained a sensitivity (Se) of 22.9% (95% CI 13.1–32.7) and a specificity (Sp) of 95.4% (95% CI 92.4–98.3). The positive predictive value was 64% (95% CI 45.2–82.8%) and the negative predictive value 77.4% (95% CI 72.1–81.3%). The positive likelihood ratio was 4.93 and the negative likelihood ratio 0.81. We repeated the analysis omitting all patients included during the 3-month period when MGIT was out of stock and obtained similar results (26% sensitivity and 96.4% specificity).
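These figures follow from the 2×2 table implied by the counts above: 16 true positives, 9 false positives (25 classified as tuberculosis minus 16 culture-positive), 54 false negatives and 185 true negatives. The sketch below recomputes them. The function names are illustrative, and the normal-approximation (Wald) interval is an assumption, since the interval method is not stated.

```python
import math


def wald_ci(successes, total, z=1.96):
    """Normal-approximation 95% CI for a proportion (assumed method)."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width


def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures of a dichotomous test against a reference standard."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {
        "sensitivity": se,
        "specificity": sp,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": se / (1 - sp),
        "lr_neg": (1 - se) / sp,
    }


# Counts taken from the Results: algorithm classification versus culture.
metrics = diagnostic_metrics(tp=16, fp=9, fn=54, tn=185)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

low, high = wald_ci(185, 194)  # interval around the specificity
print(f"specificity 95% CI: {low:.1%} to {high:.1%}")
```

With these counts the sensitivity is 16/70 (22.9%), the specificity 185/194 (95.4%), the positive likelihood ratio about 4.93 and the negative likelihood ratio about 0.81.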


The performance of the algorithm based on the WHO recommendations for diagnosis of smear-negative pulmonary tuberculosis in HIV-negative subjects in Lima is far from optimal. According to suggested standards (Dujardin et al. 1994; Van den Ende et al. 2007), it attained a high specificity and a moderate positive likelihood ratio, but its very low sensitivity and non-discriminating negative likelihood ratio preclude its use, at least in our setting. Poor reliability of the antibiotic trial seems to be at the root of the problem.

However, some limitations of our study have to be accounted for. We have not exactly replicated the WHO algorithm, since sputum smears were not repeated after the antibiotic trial. But the incremental diagnostic yield of further sputum smears after the second one is very low (Katamba et al. 2007; Mase et al. 2007). Although these results cannot be directly extrapolated to a repeat series of sputum smears after 2 weeks (whose incremental diagnostic yield has, to our knowledge, not yet been specifically evaluated), in practice such repeated smears would not substantially improve the algorithm’s overall performance. Even in a scenario in which the 22 SNPT patients who did not respond to the antibiotic trial and were clinically not considered to have tuberculosis had all had at least one positive repeated sputum smear (which is highly improbable), the algorithm’s sensitivity would still only increase to 54%. Moreover, additional smears can be associated with patients’ loss to follow-up (Foulds & O’Brien 1998). On the other hand, we standardized the clinical and radiological evaluation by using a locally derived clinical prediction rule, which should have improved the reliability of the clinical evaluation component of the algorithm, just as an expert panel does (Matthys et al. 2009). In summary, we consider that our modification to the WHO algorithm has not significantly affected its performance.
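The best-case figure quoted above can be reproduced from the reported counts: the 16 culture-positive patients whom the algorithm classified as having tuberculosis, plus the 22 culture-positive non-responders it missed, out of 70 culture-positive patients in total. The subscript label is ours, for illustration only.

```latex
\[
\mathrm{Se}_{\text{best case}} = \frac{16 + 22}{70} = \frac{38}{70} \approx 54.3\%
\]
```

The remaining 32 culture-positive patients improved on antibiotics and would never have reached the repeat smears, which is why sensitivity cannot rise above this value.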

Another limitation of our study, one that is inherent to the WHO algorithm, is the mode of assessment of clinical improvement, which relied on patients’ self-reporting and on standard clinical assessment. This could indeed affect reproducibility, but misclassification, if any, would be random and would not affect the validity of our findings.

The important issue to be addressed is that our results question the utility of a key step of the standard diagnostic approach for smear-negative tuberculosis suspects in poorer settings. A trial of antibiotics is deemed crucial for the assessment of these patients, and clinical improvement has been considered to rule out the diagnosis of tuberculosis. This assumption is possibly based on observations from earlier studies conducted mainly in Africa (Wilkinson et al. 1997, 2000; Bah et al. 2002; Kudjawu et al. 2006). However, some studies indicated that between 7 and 30% of patients who did respond to an antibiotic trial were eventually diagnosed with tuberculosis (Wilkinson et al. 1997; Somi et al. 1999; O’Brien & Talbot 2003; Siddiqi et al. 2006). In our setting, 45.7% of patients with culture-positive tuberculosis would have been missed because of their clinical response to the broad-spectrum antibiotic course.

Some differences with the studies above could explain this. In the first place, we used liquid culture media in addition to the conventional solid ones as the reference standard for diagnosis of tuberculosis, which improves sensitivity without loss of specificity (Somoskovi & Magyar 1999; Soto et al. 2008a). For that reason, liquid media are increasingly relied on, both in clinical practice and in research settings (Somoskovi & Magyar 1999; Apers et al. 2003; World Health Organization 2007; Soto et al. 2008a). Their use could account for a higher diagnostic yield, including in subjects responding to an antibiotic trial. Secondly, Bah et al. (2002) performed culture only in patients showing no clinical improvement after the antibiotic trial. Such conditional further testing leads to biased results (Goetghebeur et al. 2000). Additionally, we used doxycycline for the antibiotic trial instead of amoxicillin. In line with the results reported by Wilkinson et al. (2000), who used amoxicillin and erythromycin, this should have improved the performance of the trial therapy in culture-negative patients, since we thus covered more non-mycobacterial pathogens, including ‘atypical’ agents such as Mycoplasma pneumoniae and Chlamydia pneumoniae, which are not covered by amoxicillin and can produce long episodes of cough (Vincent et al. 2000; Kim et al. 2006). For all of the above reasons, we feel that we provide strong evidence that the discriminating power of the antibiotic trial as a diagnostic tool for SNPT is poor and that, in particular, a favourable response is likely even in patients with pulmonary tuberculosis.

The second problem is the standardisation of chest radiography interpretation and clinical assessment (World Health Organization 2007). To improve diagnostic validity, more accurate laboratory procedures should become widely accessible in poor settings (Foulds & O’Brien 1998; Siddiqi et al. 2003). Sputum concentration techniques could be introduced (Foulds & O’Brien 1998; Bruchfeld et al. 2000; Apers et al. 2003) as well as liquid culture media (World Health Organization 2007). The latter, as well as new molecular methods, have not been widely implemented yet due to their costs and technical requirements, but they could prove to be cost-effective once adapted to the context of poor settings.

In the meantime, the WHO algorithm for diagnosis of SNPT in HIV-negative subjects ought to be re-evaluated. The use of an antibiotic trial is a first, critical step to reassess. We also consider that chest X-ray could be performed earlier in the algorithm, before repeating sputum microscopy, to reduce treatment delay and loss of patients. Incidentally, both these suggested changes would lead to an algorithm more in line with the one recently proposed by the WHO for SNPT diagnosis in HIV-positive patients (Getahun et al. 2007; World Health Organization 2007).


We are grateful to Dr Alberto Mantilla for facilitating the recruitment of patients and to Dr. Marie-Laurence Lambert for participating in the design of the study. The study was funded by the Damian Foundation.