• Open Access

Performance of rapid influenza H1N1 diagnostic tests: a meta-analysis

Authors

  • Haitao Chu,

    1. Division of Biostatistics, School of Public Health, The University of Minnesota at Twin Cities, Minneapolis, MN, USA.
  • Eric T. Lofgren,

    1. Department of Epidemiology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
  • M. Elizabeth Halloran,

    1. Department of Biostatistics, The University of Washington at Seattle, Seattle, WA, USA.
    2. Center for Statistical and Quantitative Infectious Disease, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
  • Pei F. Kuan,

    1. Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
    2. Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
  • Michael Hudgens,

    1. Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
  • Stephen R. Cole

    1. Department of Epidemiology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

Haitao Chu. Division of Biostatistics, School of Public Health, The University of Minnesota at Twin Cities, Minneapolis, MN 55455, USA. E-mail: chux0051@umn.edu

Abstract

Please cite this paper as: Chu et al. (2011) Performance of rapid influenza H1N1 diagnostic tests: a meta-analysis. Influenza and Other Respiratory Viruses DOI: 10.1111/j.1750-2659.2011.00284.x.

Background  Following the outbreaks of 2009 pandemic H1N1 infection, rapid influenza diagnostic tests have been used to detect H1N1 infection. However, when this manuscript was drafted, no meta-analysis had been undertaken to assess their diagnostic accuracy.

Methods  The literature was systematically searched to identify studies that reported the performance of rapid tests. Random effects meta-analyses were conducted to summarize the overall performance.

Results  Seventeen studies were selected with 1879 cases and 3477 non-cases. The overall sensitivity and specificity estimates of the rapid tests were 0·51 (95%CI: 0·41, 0·60) and 0·98 (95%CI: 0·94, 0·99). Studies reported heterogeneous sensitivity estimates, ranging from 0·11 to 0·88. If the prevalence was 30%, the overall positive and negative predictive values were 0·94 (95%CI: 0·85, 0·98) and 0·82 (95%CI: 0·79, 0·85). The overall specificities from different manufacturers were comparable, while there were some differences for the overall sensitivity estimates. BinaxNOW had a lower overall sensitivity of 0·39 (95%CI: 0·24, 0·57) compared with all the others (P-value <0·001), whereas QuickVue had a higher overall sensitivity of 0·57 (95%CI: 0·50, 0·63) compared with all the others (P-value = 0·005).

Conclusions  Rapid tests have high specificity but low sensitivity and thus limited usefulness.

Introduction

Real-time reverse-transcriptase polymerase chain reaction (rRT-PCR) is the most accurate method for detecting influenza A (H1N1) virus infection in respiratory specimens. However, the facilities and expertise for performing rRT-PCR are not uniformly available, and the results from rRT-PCR are generally not immediately accessible, which poses challenges in establishing a diagnosis, especially in patients presenting late in their clinical course.1 Rapid influenza diagnostic tests (henceforth, rapid tests) that detect influenza viral antigens produce quick results that can be used to screen patients with suspected influenza. Although some new rapid tests were developed as the 2009 H1N1 pandemic progressed, the rapid tests used in the majority of studies were already in use and had not been developed specifically to detect H1N1. In particular, at the beginning of the pandemic, their performance for the detection of 2009 pandemic H1N1 was not known. The lack of specific rapid and accurate diagnostics for H1N1 has been a major concern for monitoring and controlling outbreaks of 2009 pandemic influenza A (H1N1) infection. When they were developed, rapid influenza diagnostic tests were introduced as a promising novel approach to detect this virus. Several commercial antigen assays, although not specifically designed for diagnosing 2009 pandemic H1N1, were quickly introduced to the market. However, rapid test performance has been less than optimal.1 Compared to rRT-PCR, several previous studies reported consistently high specificity but inconsistent estimates of sensitivity using rapid tests to detect 2009 H1N1 virus infection in upper respiratory specimens.2,3 When this manuscript was drafted, no meta-analysis of the diagnostic accuracy of rapid tests for diagnosing 2009 H1N1 had been reported, although Babin et al.4 have since published one.
Here, we use a comprehensive search strategy and meta-analytic methods to determine the accuracy of existing rapid tests for diagnosing 2009 H1N1 virus infection.

Methods

Findings are reported according to the Quality of Reporting of Meta-Analysis (QUOROM) statement5 and the Standards for Reporting of Diagnostic Accuracy (STARD) statement.6

Search strategy

The literature was systematically searched using predetermined inclusion criteria. Studies were included if they reported the sensitivity and/or specificity of an influenza rapid test for detecting 2009 pandemic influenza (H1N1) infection, or contained sufficient information to calculate the sensitivity and specificity, based on diagnosis of clinical specimens using rRT-PCR as the gold standard reference test. No language restrictions were applied. Studies eligible for inclusion were identified by searching the databases MEDLINE (NLOM, Bethesda, MD, USA) and EMBASE (Elsevier, Amsterdam, the Netherlands) using the PUBMED and OVID interfaces, respectively. Publication dates were restricted to between 1/1/2009 and 1/15/2010, inclusive. Search terms for each database included the following: “influenza diagnostic,” “influenza rapid test,” “rapid test H1N1,” and “influenza rapid”. Subsequently, the title and abstract of each potential study were screened to determine potential eligibility, which was then confirmed by a review of the full text. References from eligible studies were also examined for additional potential studies, and papers referencing eligible studies were identified using Google Scholar and considered for inclusion.

Data synthesis and meta-analysis

Data synthesis was performed according to guidelines on systematic reviews of diagnostic accuracy studies.7,8 Bivariate logit-normal random effects meta-analyses were conducted to summarize the overall sensitivity and specificity of rapid tests.9–14 Compared to fixed effects models, random effects models typically provide conservative estimates with wider confidence intervals because they assume that the meta-analysis includes only a sample of all possible studies. In addition, random effects models appropriately account for differences in study sample sizes and for both within-study variability (random error) and between-study variability (heterogeneity).15,16 In general, the bivariate approach offers some advantages over separate univariate random effects meta-analyses by accounting for the correlation between sensitivity and specificity.17–19 This correlation will exist if the different studies use different test thresholds and thus operate at different points along the underlying receiver operating characteristic (ROC) curve for the test. However, one study reported, based on extensive simulations, that the differences between univariate and bivariate random effects models for summarizing pooled sensitivity and specificity are trivial.20 Thus, we utilized univariate logit-normal random effects meta-analyses to generate forest plots (i.e., graphical displays designed to illustrate the relative strength of results from multiple quantitative studies addressing the same question) with overall and rapid test-specific pooled estimates for both sensitivity and specificity.
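
To make the idea of pooling on the logit scale concrete, the following is a minimal sketch that uses the DerSimonian-Laird method-of-moments estimator with inverse-variance weights. This is a simpler stand-in chosen for illustration and is not the logit-normal model fitted in the paper; the function name `pool_logit_dl`, the 0·5 continuity correction, and the study counts are all hypothetical.

```python
import math


def expit(x):
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_dl(events, totals):
    """Pool proportions on the logit scale (DerSimonian-Laird random effects).

    events[i] / totals[i] is the observed proportion in study i, for example
    true positives / all rRT-PCR-confirmed cases when pooling sensitivity.
    """
    y, v = [], []
    for e, n in zip(events, totals):
        a, b = e, n - e
        if a == 0 or b == 0:          # assumed continuity correction for zero cells
            a, b = a + 0.5, b + 0.5
        y.append(math.log(a / b))     # logit of the study proportion
        v.append(1.0 / a + 1.0 / b)   # approximate within-study variance

    # Fixed-effect (inverse-variance) mean, needed for Cochran's Q.
    w = [1.0 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # Method-of-moments estimate of the between-study variance tau^2.
    k = len(y)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to every within-study variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": expit(mu),
        "ci": (expit(mu - 1.96 * se), expit(mu + 1.96 * se)),
        "q": q,
        "tau2": tau2,
    }


# Hypothetical counts for four studies (not taken from Table 1).
result = pool_logit_dl(events=[20, 45, 12, 60], totals=[50, 60, 40, 100])
print(result)
```

Because the weights are 1/(v + tau²), a large between-study variance pulls the weights toward equality and widens the interval, which is the conservative behaviour described above.
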
Parameters used to summarize diagnostic accuracy include the following: sensitivity and specificity directly estimated from the univariate and/or bivariate random effects models; positive and negative likelihood ratio, positive and negative predictive values, and the diagnostic odds ratio (DOR) derived from parameter estimates from the bivariate random effects models accounting for potential correlation between sensitivity and specificity estimates. In addition to reporting pooled sensitivity and specificity, which are often regarded as intrinsic properties of a diagnostic test, we also report other metrics because they are clinically more meaningful in some settings. Sensitivity is estimated by the proportion of positive tests among those with the disease of interest, whereas specificity is estimated by the proportion of negative tests among those without the disease. The positive (or negative) likelihood ratio is estimated by the ratio of the proportion of positive (or negative) tests in the diseased versus non-diseased subjects. The positive (or negative) predictive value is estimated by the proportion of subjects with a positive (or negative) test who have (or do not have) the disease. The DOR, commonly considered a global measure of test performance, is estimated by the ratio of the odds of a positive test result in diseased subjects to the odds of a positive test result in non-diseased subjects.
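
The definitions above can be sketched directly from a single study's 2×2 counts. This is an illustrative sketch only: the class and function names and the counts are hypothetical, and the predictive values computed this way reflect the prevalence in the study sample rather than in any target population.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Rapid test result cross-classified against the reference test."""
    tp: int  # rapid test positive, reference positive
    fn: int  # rapid test negative, reference positive
    fp: int  # rapid test positive, reference negative
    tn: int  # rapid test negative, reference negative


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return the accuracy measures defined in the text.

    Raises ZeroDivisionError when a required cell is zero (for example fp == 0);
    handling of zero cells is deliberately left out of this sketch.
    """
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": sens / (1 - spec),            # P(T+ | D+) / P(T+ | D-)
        "lr_neg": (1 - sens) / spec,            # P(T- | D+) / P(T- | D-)
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "dor": (t.tp * t.tn) / (t.fp * t.fn),   # odds of T+ in D+ over odds of T+ in D-
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = TwoByTwo(tp=50, fn=50, fp=5, tn=95)
measures = accuracy_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Note that the DOR equals the positive likelihood ratio divided by the negative likelihood ratio, so a test with high specificity can have a large DOR even when its sensitivity is modest.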

The Begg and Mazumdar adjusted rank correlation test21 and the Egger et al.22 regression asymmetry test were used to assess publication bias for both sensitivity and specificity. Cochran’s Q-test was used to detect heterogeneity.23 Location (US versus non-US) and rapid test manufacturer were included as covariates to examine whether they contributed to heterogeneity. Tests for small-study effects were employed only when at least four studies were available. The univariate logit-normal random effects meta-analyses were implemented using the meta package24,25 in R version 2·12·1 (http://cran.r-project.org/), and the bivariate random effects models were fitted using the NLMIXED procedure in SAS version 9·2 (SAS Institute, Cary, NC, USA). The summary ROC curve was plotted based on the regression line of sensitivity on the false-positive rate (1 − Sp) on the logit scale using the estimates from the bivariate random effects models12 rather than the line proposed by Rutter and Gatsonis.26,27

Results

We identified 2054 citations from MEDLINE and 775 citations from EMBASE, with overlap from the initial search. After screening titles and abstracts, 85 articles were eligible for full-text review. Of these, 68 articles were excluded, and 17 (11) articles on the sensitivity (specificity) of rapid influenza H1N1 diagnostic tests were included, as presented in Table 1. Three studies contributed results for multiple rapid tests,28–30 producing a total of 22 sensitivity estimates and 12 specificity estimates. Specifically, six (three) studies reported sensitivity (specificity) estimates of BinaxNOW Influenza A & B2,28–32; seven (four) studies reported sensitivity (specificity) estimates of QuickVue Influenza A + B28,30,33–37; four (two) studies reported sensitivity (specificity) estimates of BD Directigen EZ Flu A + B test28,30,38,39; two (one) studies reported sensitivity (specificity) estimates of Espline Influenza A & B29,40; and one study reported sensitivity and specificity estimates of Xpect Flu A & B.41 The seven (four) studies reporting sensitivity (specificity) of BD Directigen, Espline, and Xpect were grouped together because of the small numbers of studies for these tests. One study reported sensitivity and specificity estimates of either the BinaxNOW Influenza A & B test or the 3M Rapid Detection Flu A + B test,42 and one study reported a sensitivity estimate of either the QuickVue Influenza A + B or the SD Bioline Influenza Antigen test. These two studies were excluded from the analyses of pooled sensitivities and specificities of the QuickVue Influenza A + B test and the BinaxNOW Influenza A & B test because we could not calculate the number of false positives, true negatives, false negatives, and true positives for either test. However, we included them in the analyses of pooled overall sensitivity and specificity of rapid tests.

Table 1.   Study details of articles that reported (or with enough information to back calculate) the number of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN) of rapid test for diagnosis of 2009 pandemic H1N1 on clinical specimens

| ID | First author | Month/year | Rapid tests | Population | Specimen type | Prospective | TP | FN | FP | TN |
|----|--------------|------------|-------------|------------|---------------|-------------|----|----|----|----|
| 1 | Balish | 08/2009 | BinaxNOW Influenza A & B | USA | Nasopharyngeal samples | Yes | 18 | 27 | | |
| | | | QuickVue A + B | | | | 31 | 14 | | |
| | | | Directigen EZ Flu A + B | | | | 22 | 23 | | |
| 2 | Blyth | 11/2009 | QuickVue A + B | Australia | Samples from nose and throat | Yes | 5 | 12 | | |
| 3 | Brouqui | 10/2009 | Directigen EZ Flu A + B | France | Source unspecified | Yes | 19 | 12 | 2 | 270 |
| 4 | Cheng | 02/2010 | Espline Influenza A & B | Hong Kong | Nasopharyngeal samples | Yes | 37 | 23 | | |
| 5 | Drexler | 10/2009 | BinaxNOW Influenza A & B | Germany | Samples from nose and throat | No | 16 | 128 | | |
| 6 | Faix | 08/2009 | QuickVue A + B | CA, USA | Source unspecified | Yes | 20 | 19 | 2 | 100 |
| 7 | Fuenzalida | 12/2009 | BinaxNOW Influenza A & B | Spain | Nasopharyngeal samples | Yes | 137 | 90 | 18 | 267 |
| 8 | Ginocchio | 06/2009 | BinaxNOW Influenza A & B or 3M Rapid Detection Flu A + B | NY, USA | Nasopharyngeal samples | Yes | 26 | 97 | 9 | 1699 |
| 9 | Karre | 11/2009 | Directigen EZ Flu A + B | CO, USA | Nasopharyngeal samples | Yes | 39 | 41 | 5 | 140 |
| 10 | Kok | 01/2010 | QuickVue A + B | Australia | Samples from nose and throat | Yes | 93 | 81 | 0 | 326 |
| 11 | Leveque | 01/2010 | BinaxNOW Influenza A & B | France | Nasal samples only | No | 9 | 16 | 0 | 5 |
| | | | Espline Influenza A & B | | | | 16 | 9 | 0 | 5 |
| 12 | Likitnukul | 11/2009 | QuickVue A + B or SD Bioline Influenza Antigen | Thailand | Nasal samples only | Yes | 376 | 53 | | |
| 13 | Sabetta | 11/2009 | Xpect Flu A & B | CT, USA | Nasopharyngeal samples | Yes | 23 | 26 | 2 | 12 |
| 14 | Sandora | 03/2010 | BinaxNOW Influenza A & B | MA, USA | Nasopharyngeal samples | Yes | 124 | 84 | 1 | 332 |
| 15 | Suntarattiwong | 04/2010 | QuickVue A + B | Thailand | Samples from nose and throat | Yes | 89 | 53 | 2 | 234 |
| 16 | Vasoo* | 10/2009 | BinaxNOW Influenza A & B | IL, USA | Nasopharyngeal samples | No | 23 | 37 | | |
| | | | QuickVue A + B | | | | 32 | 28 | | |
| | | | Directigen EZ Flu A + B | | | | 28 | 32 | | |
| 17 | Watcharananan | 01/2010 | QuickVue A + B | Thailand | Nasopharyngeal samples | Yes | 16 | 10 | 10 | 41 |

*Vasoo et al. incorrectly reported specificity for rapid tests on all confirmed positive specimens.

The average sample size of the included seventeen studies was 315 (range 17–1831), with a total of 1879 cases and 3477 non-cases confirmed by rRT-PCR. The majority (82% = 14 of 17) of the studies were prospective. The overall sensitivity and specificity estimates were 0·51 (95% CI: 0·41, 0·60; range 0·11–0·88) and 0·99 (95% CI: 0·94, 0·99; range 0·80–1·00) from the univariate random effects models. Figures 1 and 2 show the diagnostic accuracy measures from all the studies, stratified by the rapid test manufacturer using the bivariate random effects models. Based on the Q statistics, both the sensitivity and specificity showed highly significant between-study heterogeneity in the summary results (P-value <0·001).

Figure 1.

 Forest plot of sensitivity estimates and 95% confidence intervals (CI). Point estimates of sensitivity from each study are shown as solid squares. Solid lines represent the 95% CIs. Squares are proportional to weights based on the random effects model. The pooled estimate and 95% CI are denoted by the diamond at the bottom. Se, sensitivity; TP, true positives; FN, false negatives.

Figure 2.

 Forest plot of specificity estimates and 95% confidence intervals (CI). Point estimates of specificity from each study are shown as solid squares. Solid lines represent the 95% CIs. Squares are proportional to weights based on the random effects model. The pooled estimate and 95% CI are denoted by the diamond at the bottom. Sp, specificity; TN, true negatives; FP, false positives.

Specificity appeared to be more consistent than sensitivity from different manufacturers. The overall specificities from different manufacturers were comparable as seen in Figure 2. However, there were some differences for the overall sensitivity estimates from different manufacturers. BinaxNOW had a lower overall sensitivity (0·39 with 95%CI: 0·24, 0·57) compared with all the others (P-value <0·001), whereas QuickVue had a higher overall sensitivity (0·57 with 95%CI: 0·50, 0·63) compared with all the others (P-value = 0·005) from the bivariate random effects model.

Begg’s adjusted rank correlation test (P-value = 0·40 and 0·53) showed no evidence of publication bias for either sensitivity or specificity, whereas Egger’s regression asymmetry test (P-value = 0·07 and 0·06) suggested that some publication bias may exist for both. Because we had a total of 22 sensitivity estimates but only 12 specificity estimates, we did not consider the modified Begg and Mazumdar adjusted rank correlation test or the modified Egger et al. regression asymmetry test for detecting publication bias on the log DOR scale; simulations have shown these to perform slightly better when equal numbers of sensitivity and specificity estimates are available.43

Based on the bivariate logit-normal random effects models, the correlation between sensitivity and specificity was only 0·32 (95%CI: −0·64, 0·89) on the logit scale, suggesting no evidence of strong correlation. The overall positive likelihood ratio was 34·5 (95%CI: 12·7, 93·6), and the overall negative likelihood ratio was 0·48 (95%CI: 0·39, 0·60). The DOR was 71·6 (95%CI: 26·3, 194·6). Study location (US versus non-US) was not associated with sensitivity or specificity (P-value = 0·41 and 0·86, respectively). Specimen type (nasopharyngeal samples versus all others) was not associated with sensitivity (P-value = 0·95) but was associated with specificity (P-value = 0·03). Nasopharyngeal samples had a specificity of 0·97 (95%CI: 0·90, 0·99), and the other samples had a specificity of 1·00 (95%CI: 0·98, 1·00).

Figure 3A presents the 95% confidence region of the summary point, the 95% prediction region and the summary receiver operating characteristic curve.44 The area under the curve was 0·68 (95%CI: 0·20, 0·92). Figure 3B shows the estimated positive and negative predictive values with their point-wise 95% confidence intervals based on the overall estimates of sensitivity and specificity. For example, when the prevalence was 30%, the estimated overall positive and negative predictive values were 0·94 (95%CI: 0·85, 0·98) and 0·82 (95%CI: 0·79, 0·85), suggesting limited usefulness.
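
The dependence of the predictive values on prevalence follows from Bayes' theorem and can be sketched as below. The sensitivity (0·51) and positive likelihood ratio (34·5) are the reported overall estimates; the specificity of about 0·985 is an assumption obtained here by back-calculating 1 − Se/LR+, because the rounded specificity alone does not reproduce the reported predictive values. The function name and the chosen prevalence grid are illustrative.

```python
def predictive_values(sens, spec, prev):
    """Positive and negative predictive values at a given prevalence."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv


SENS = 0.51                  # reported overall sensitivity
LR_POS = 34.5                # reported overall positive likelihood ratio
SPEC = 1 - SENS / LR_POS     # implied specificity (assumption), roughly 0.985

for prev in (0.05, 0.10, 0.30, 0.50):
    ppv, npv = predictive_values(SENS, SPEC, prev)
    print(f"prevalence {prev:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

Under these assumptions, the 30% prevalence row agrees with the predictive values quoted above after rounding to two decimals; the confidence intervals are not reproduced because they depend on the fitted bivariate model.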

Figure 3.

 The summary receiver operating characteristic plot (panel A) and the overall PPV and NPV plot (panel B) based on the bivariate random effects model. In panel A, each open circle represents a study in the meta-analysis with both sensitivity and specificity estimates of a rapid test; dotted lines represent those studies with only a sensitivity estimate of a rapid test; the solid circle represents the overall summary point; blue colored solid and dashed contour curves denote the boundaries of the 95% confidence region of the summary point and the 95% prediction region, respectively; black solid lines represent the summary receiver operating characteristic curve. In panel B, solid and dashed lines denote the estimate and 95% confidence interval; PPV, positive predictive value; NPV, negative predictive value.

Discussion

An extensive literature search identified 17 articles that reported rapid test results from clinical specimens. Meta-analysis results showed that the specificity estimates for existing commercial rapid tests are high and relatively consistent, ranging from 0·80 to 1·00. However, the sensitivity is low and highly variable, ranging from 0·11 to 0·88. A lack of sensitivity is of particular concern in the present setting. Rapid tests are useful as a screening device to the extent that they identify possible cases. Therefore, high sensitivity is essential.

Rapid tests with improved performance are needed. Alternatively, testing strategies that employ multiple rapid tests may improve sensitivity. For example, use of two different rapid tests on sequential biologic samples of the same individual may provide partially independent information. If an individual is defined as positive when at least one of the rapid tests is positive, the upper bound on improved sensitivity is the complement of the probability that both tests yield false-negative results. Using the overall sensitivity estimates from QuickVue and other manufacturers, this would yield a possibly acceptable sensitivity of 0·80 = 1−(1−0·57) × (1−0·53) if the tests work independently. However, this strategy would double the cost of testing and would also require the collection of a second sample, delaying time to results.
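
The arithmetic behind the two-test strategy can be checked with a short sketch. It assumes, as the text does, that the two tests err independently given true infection status, which is the best case; the function name is hypothetical.

```python
def either_positive_sensitivity(sens_a, sens_b):
    """Sensitivity of the rule 'positive if at least one test is positive'.

    Assumes the two tests are conditionally independent among infected
    individuals, so both miss a case with probability (1 - a) * (1 - b).
    """
    return 1 - (1 - sens_a) * (1 - sens_b)


# Overall sensitivity estimates quoted in the text: QuickVue 0.57, other tests 0.53.
combined = either_positive_sensitivity(0.57, 0.53)
print(f"combined sensitivity: {combined:.4f}")
```

If the two tests tend to miss the same patients, for example those with low viral load, the realised gain would be smaller than this upper bound.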

In conclusion, real-time reverse-transcriptase polymerase chain reaction remains the most accurate method for detecting 2009 pandemic influenza A (H1N1) virus infection. Because rRT-PCR results are not immediately accessible, and a laboratory with the necessary equipment and required skill level to avoid common technical errors that may occur with rRT-PCR may not be available, rapid procedures with adequate diagnostic test characteristics are needed, and existing rapid tests are inadequate. Alternative solutions to address poor test sensitivity are needed.

Acknowledgements

The authors would like to thank Drs. Dennis Faix, Thomas Sandora, and Alex McAdam for their contribution of additional data, as well as Drs. Hugo Lopez-Gatell, Guido Schwarzer, and Loic Desquilbet for expert advice.

Funding statement

Dr. Haitao Chu was supported in part by the U.S. Department of Health and Human Services Agency for Healthcare Research and Quality Grant R03HS020666 and P01CA142538 from the U.S. National Cancer Institute.

Financial disclosures

None reported.

Conflicts of interests

None reported.
