Examining Heterogeneity in the Diagnostic Accuracy of Culture and PCR for Salmonella spp. in Swine: A Systematic Review/Meta-Regression Approach

Authors

  • W. Wilkins,

    1.  Department of Large Animal Clinical Sciences, Western College of Veterinary Medicine, University of Saskatchewan, SK, Canada
  • A. Rajić,

    1.  Policy Advice and Effectiveness Program, Laboratory for Foodborne Zoonoses, Public Health Agency of Canada, Guelph, ON, Canada
    2.  Department of Population Medicine, Ontario Veterinary College, University of Guelph, ON, Canada
  • S. Parker,

    1.  Department of Large Animal Clinical Sciences, Western College of Veterinary Medicine, University of Saskatchewan, SK, Canada
  • L. Waddell,

    1.  Policy Advice and Effectiveness Program, Laboratory for Foodborne Zoonoses, Public Health Agency of Canada, Guelph, ON, Canada
    2.  Department of Population Medicine, Ontario Veterinary College, University of Guelph, ON, Canada
  • J. Sanchez,

    1.  Centre for Veterinary Epidemiological Research, Atlantic Veterinary College, Charlottetown, PEI, Canada
    2.  Canadian Food Inspection Agency, Charlottetown, PEI, Canada
  • J. Sargeant,

    1.  Department of Population Medicine, Ontario Veterinary College, University of Guelph, ON, Canada
    2.  Centre for Public Health and Zoonoses, Ontario Veterinary College, University of Guelph, ON, Canada
  • C. Waldner

    1.  Department of Large Animal Clinical Sciences, Western College of Veterinary Medicine, University of Saskatchewan, SK, Canada

Wendy Wilkins. Department of Large Animal Clinical Sciences, Western College of Veterinary Medicine, University of Saskatchewan, 52 Campus Drive, Saskatoon, Saskatchewan S7N 5B4, Canada. E-mail: wendy.wilkins@usask.ca

Summary

The accuracy of bacterial culture and PCR for Salmonella in swine was examined through systematic review of existing primary research in this field. A replicable search was conducted in 10 electronic databases. All steps of the review were conducted by two reviewers: to identify relevant publications, to assess their methodological soundness and reporting, and to extract raw data or reported test accuracy estimates. Meta-analyses and meta-regression were performed: to evaluate pooled estimates of test sensitivity (Se) and specificity (Sp), to identify variables explaining the variation in reported test estimates, and to evaluate the association between these variables and reported test Se and Sp. Twenty-nine studies were included in the review. Unique test evaluations reported in these 29 studies were categorized according to the type of test comparison: culture versus culture (n = 134 test evaluations) and PCR versus culture (n = 21). We identified significant heterogeneity among evaluations for each test category. For culture, more heterogeneity was caused by differences in individual test protocols (52%) than overall differences between studies (16%). Enrichment temperature, study population, agar and enrichment type were significantly associated with variation in culture Se. Furthermore, interaction between enrichment temperature and enrichment type was detected. For PCR, most of the heterogeneity was caused by overall differences between studies (65–70%); sample type and study size were associated with variation in reported PCR Se and Sp. The overall methodological soundness and/or reporting of primary studies included in this review were poor, with variable use of reference standards, and consistent lack of the use or reporting of blinding, randomization and subject (sample) selection criteria. 
Consequently, the food safety and veterinary public health research community should formally consider ways of standardizing the conduct and reporting of this type of research.

Impacts

  • Significant heterogeneity (P < 0.001) in reported culture and PCR sensitivity and PCR specificity was detected, primarily because of differences in both index and reference test protocols used in individual studies.
  • The use of varied reference standards makes direct comparison between studies difficult; researchers should identify a valid and reliable reference standard that can be applied universally to evaluate the accuracy of new diagnostic test protocols.
  • Significant improvement in the design and reporting of diagnostic accuracy studies for Salmonella in pigs is necessary.

Introduction

The evaluation of laboratory tests for Salmonella in pigs or pork has generated a vast amount of primary research over the past 15 years, with conflicting findings and recommendations. In a diagnostic accuracy study, the test under evaluation, otherwise known as the index test, is applied to a set of subjects or samples. Ideally, the results are compared with a ‘gold standard’, a test with perfect sensitivity (Se) and specificity (Sp), which is applied to the same set of samples. In reality, a gold standard test is difficult to identify, and this is particularly true for the tests used to detect Salmonella in pigs. Instead, index tests are compared with existing imperfect tests, often referred to as a ‘reference standard’. For this reason, the Se and Sp of the index test are reported as the ‘apparent’ Se and Sp. Measures of agreement between tests, such as Cohen’s kappa statistic and the correlation between tests, are also frequently reported.
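To make these terms concrete, the sketch below computes apparent Se, apparent Sp and Cohen's kappa from a two-by-two cross-classification of an index test against a reference standard. It is a minimal illustration assuming the conventional cell labels a to d (defined in the docstring); the class name `TwoByTwo` is hypothetical and the counts in the usage line are invented for illustration, not taken from any included study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Index test (rows) cross-classified against a reference standard (columns).

    a: index +, reference +    b: index +, reference -
    c: index -, reference +    d: index -, reference -
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def apparent_se(self) -> float:
        # Share of reference-positive samples that the index test also calls positive.
        return self.a / (self.a + self.c)

    def apparent_sp(self) -> float:
        # Share of reference-negative samples that the index test also calls negative.
        return self.d / (self.b + self.d)

    def kappa(self) -> float:
        # Cohen's kappa: observed agreement corrected for chance agreement,
        # where chance agreement is derived from the row and column margins.
        n = self.n
        p_observed = (self.a + self.d) / n
        p_both_pos = ((self.a + self.b) / n) * ((self.a + self.c) / n)
        p_both_neg = ((self.c + self.d) / n) * ((self.b + self.d) / n)
        p_chance = p_both_pos + p_both_neg
        return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical counts, for illustration only.
table = TwoByTwo(a=45, b=5, c=5, d=45)
print(table.apparent_se(), table.apparent_sp(), round(table.kappa(), 3))
```

With these hypothetical counts the apparent Se and Sp are both 0.9 and kappa is 0.8; "apparent" here simply means that the reference standard is treated as if it were the truth.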

Most Salmonella infections in pigs are sub-clinical, which presents challenges in the interpretation of the Salmonella status of a pig or herd. Although bacterial culture remains the customary test for detecting current Salmonella infection and/or shedding in pigs, there has been increased interest in rapid tests such as polymerase chain reaction (PCR) tests for detecting Salmonella in pigs both on farm and at slaughter. Bacterial culture is frequently the reference standard against which other tests or different culture protocols are compared; although this type of test is highly specific, its Se is highly variable and the test is prone to false-negative results (Funk, 2003). Possible reasons for this variability include between-study differences in test protocols, reference standards, sample matrices and sample populations. A transparent and replicable systematic review (SR) and meta-analysis (MA) of studies reporting the accuracy of culture or PCR could identify factors contributing to heterogeneity in estimates of test accuracy, as well as gaps in existing research and future research needs.

The objective of the study was to identify, critically appraise and summarize scientific literature reporting the accuracy of bacterial culture and PCR for Salmonella in pigs under field conditions, using SR and MA methodology. Meta-regression (MR) was used to explore the study and test characteristics that could explain the variation in reported Se and Sp among studies and to quantify the association between these variables and those estimates. In this study, the term ‘diagnostic test’ is used in its general context and does not differentiate between screening (detection in asymptomatic subjects) and diagnostic (confirmation in symptomatic subjects) tests (Greiner and Gardner, 2000). This review was part of a larger review examining the accuracy of a variety of diagnostic tests for detecting Salmonella in pigs.

Review Approach

Literature search

Keyword search combinations were developed using a guide for conducting SRs on diagnostic questions (Deville et al., 2002) and can be obtained from the primary author on request. Literature searches were performed in ten electronic databases accessed through the University of Saskatchewan Library Server: Agricola, CAB Abstracts, MEDLINE (PubMed interface), BIOSIS Previews, Web of Science, Food Science and Technology Abstracts (FSTA), CISTI, Scopus, Dissertations and Theses (ProQuest) and Theses Canada Portal. The initial search and two updated searches were conducted in June 2006, June 2007 and November 2009, respectively, and were limited to publications from January 1980 to November 2009. No language or study design restrictions were imposed at this stage. Conference proceedings from the International Symposium on the Epidemiology and Control of Salmonella and Other Foodborne Pathogens in Pork (2001–2009) were scanned by two independent reviewers to identify potentially relevant abstracts. Web pages from the Inventory of Canadian Agri-Food Research (http://www.icar-irac.ca/client/qt_e.aspx) and the National Pork Board (Pork Checkoff) (http://www.pork.org/PorkScience) were searched for reports on current or otherwise unpublished research. Reference lists of five review articles identified in the initial search (Gebreyes, 2003; Harris, 2003; Capita et al., 2004; Malorny and Hoorfar, 2005) were checked for articles potentially missed through electronic searches. Theses and proceedings abstracts were checked manually, and where duplicates were identified, peer-reviewed journal publications were given preference. The final reference list was then sent to six topic experts to identify any potentially missing references.

All citations were entered into a bibliographic-management software program (EndNote v7.0; Thomson Scientific, Carlsbad, CA, USA), and duplicates were removed automatically and manually by the primary investigator (WW). The final reference list was uploaded into a web-based SR data-management program (SRS 3; TrialStat Corporation, Ottawa, ON, Canada).

Relevance screening and exclusion criteria

The relevance screening form was developed a priori and pre-tested on 20 abstracts by all five reviewers (kappa agreement >0.8). Relevance screening was conducted according to the objectives of the larger review; therefore, to be relevant, an abstract had to report: (i) the evaluation of bacterial culture, PCR or serology for Salmonella in pig faeces, blood, tissue, carcass swabs or pork ‘meat juice’, or the simultaneous application of these tests to the same sample population; and (ii) primary research (including conference proceedings and theses) published in English, as resources were not available for translation. The following studies were excluded at this stage: challenge trials evaluating a single test in a population of artificially infected pigs; studies evaluating tests in artificially contaminated samples; studies that only examined differences between sample types or sizes; and studies that reported pooled test results from different sample/tissue types (e.g. faecal results pooled with tissue results).

Primary quality assessment

Quality assessment (QA) questions were developed a priori using a guide developed specifically for QA of diagnostic test research (Whiting et al., 2003), with some modifications to reflect the differences between agri-food public health and human health research (Sargeant et al., 2006), and were pre-tested on the full texts of 20 relevant articles. The primary QA form had four questions that a study had to meet to be included in the review: (i) estimates of the accuracy of bacterial culture (as compared with other culture) or PCR (as compared with culture), or raw data for post hoc calculation of test accuracy, had to be reported; (ii) each test protocol had to be reported in sufficient detail to allow appropriate categorization of the test; (iii) samples had to be stored appropriately (refrigerated or held on ice for a maximum of 6 days) after collection; and (iv) the time period between tests had to be short enough to be reasonably sure that the sample’s Salmonella status did not change between the two tests. Specific responses resulting in inclusion or exclusion are outlined in Table 1.

Table 1.   Questions included in the quality assessment of studies (1980–2009) evaluating diagnostic tests for detecting Salmonella infection in pigs

Question | Response | No. of references | Consequence
Primary quality assessment (inclusion/exclusion criteria)
Q1. Were the estimates of test accuracy (culture or PCR, as compared with culture) reported, or is a sufficient amount of raw data presented for post hoc analysis? | Yes | 47/156 | Included
 | No | 109/156 | Excluded
Q2. Was the test protocol(s) described in sufficient detail to permit replication of the test? | Yes | 29/47ᵃ | Included
 | No | 18/47ᵃ | Excluded
Q3. If two or more tests are being used/compared, is the time period between tests short enough to be reasonably sure that the subject’s Salmonella status did not change between the two tests? OR For challenge trials: was the time from challenge administration to measurement of outcome sufficient to have the outcome of interest? | Yes | 27/29ᵇ | Included
 | No | 0/29ᵇ | Excluded
 | NRᶜ | 2/29ᵇ | Included
Q4. Were samples stored appropriately AND processed/tested within a reasonable period of time after collection? | Yes | 17/29ᵇ | Included
 | No | 0/29ᵇ | Excluded
 | NR | 12/29ᵇ | Included
Secondary quality assessmentᵈ
Q1. Were criteria for selecting the sample population clearly described? | Yes | 4/29 |
 | No/NR | 25/29 |
Q2. Were the same test(s) applied to ALL samples or a random selection of the samples? | Yes | 29/29 |
 | No/NR | 0/29 |
Q3. Were the same tests applied to all subjects and all samples, regardless of the result of one or more tests? | Yes | 29/29 |
 | No/NR | 0/29 |
Q4. Were the tests independent of each other (one test did not form part of another test being used)? | Yes | 12/29 |
 | No/NR | 17/29 |
Q5. Were test results interpreted without knowledge of the results of the other test(s) (was blinding reported)? | Yes | 0/29 |
 | No/NR | 29/29 |
Q6. Were uninterpretable/intermediate test results reported? | Yes | 29/29 |
 | No/NR | 0/29 |
Q7. Were withdrawals or losses (subjects AND samples) from the study, if any, explained? | Yes | 29/29 |
 | No/NR | 0/29 |
Q8. Was the study population randomly selected? | Yes | 4/29 |
 | No/NR | 25/29 |

ᵃ This question was applicable only to references not excluded by Q1 (n = 47).
ᵇ This question was applicable only to references not excluded by Q2 (n = 29).
ᶜ NR, not reported.
ᵈ Applies only to references not excluded during primary QA (n = 29).

Secondary QA and data extraction

The eight additional QA questions were used to assess the study design soundness and reporting (secondary QA); none of these were used for exclusion purposes (Table 1). This step was implemented in conjunction with data extraction. Data extraction for each study included general study information (e.g. publication type), tests used and details of the protocols and the population tested, details regarding specific test comparisons, such as type of sample tested and raw data and/or reported estimates of test accuracy. Wherever possible, raw data were captured in a two-by-two contingency table format to facilitate post hoc analysis. The following measures of test accuracy were extracted: Se, Sp, correlation, kappa statistic or per cent agreement, along with associated P-values, if available. The data were entered into a spreadsheet by the primary investigator. A separate record was created for each unique test evaluation. A test evaluation was considered unique if different test protocols (within the same test category) were used or if results were available for different populations or time periods. Each record was then verified by a second reviewer and all disagreements were resolved by consensus.

Data analysis

Meta-analysis

Test Se and Sp, as the most recognizable measures of diagnostic test accuracy, were the focus of our analytical approach. For the purposes of comparison, tests applied within each study were designated as the index test (test being evaluated) or as the reference test (test assumed to be the ‘gold standard’). Se and Sp were calculated where raw data were available; otherwise, reported estimates were used. The apparent Se of culture (as compared with culture) was calculated using all positive results from both the index test and the reference test or tests; culture Sp was assumed to be 100%. For PCR, the apparent Se and Sp were calculated relative to the results of the reference standard.
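Because culture Sp is assumed to be 100%, every sample that is positive on any culture protocol can be treated as truly positive, so the denominator for culture Se is the union of all positives. The sketch below illustrates that reading of the rule under this assumption; the function name and the sample identifiers are hypothetical and the example does not reproduce any included study.

```python
from __future__ import annotations


def apparent_culture_se(index_positive: set[str], reference_positives: list[set[str]]) -> float:
    """Apparent Se of one culture protocol relative to all culture protocols applied.

    With culture Sp assumed to be 100%, any sample that grows Salmonella on any
    protocol (index or reference) is counted as a true positive.
    """
    all_positive = set(index_positive)
    for positives in reference_positives:
        all_positive |= positives  # union of positives across reference protocols
    if not all_positive:
        raise ValueError("no culture-positive samples: Se is undefined")
    return len(index_positive) / len(all_positive)


# Hypothetical sample IDs, for illustration only.
index = {"s1", "s2", "s3"}
references = [{"s2", "s3", "s4"}]
print(apparent_culture_se(index, references))
```

With a single reference protocol this is equivalent to (a + b)/(a + b + c) in two-by-two notation, where a is positive on both protocols, b on the index only and c on the reference only; the set form extends naturally to several reference protocols.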

All estimates were first logit-transformed, and the standard errors of the logit estimates were then computed as follows:

[Equation (1)]

and

[Equation (2)]

where n is the sample size and p is test Se or Sp. All analyses were performed using Stata/SE v9.2 (StataCorp LP, College Station, TX, USA). For both culture and PCR, the META command was used to generate summary (pooled) estimates of the logit of Se and the logit of Sp and to evaluate heterogeneity between studies, using both fixed-effects (Mantel-Haenszel) and random-effects (DerSimonian and Laird) analysis. Heterogeneity of the logit Se and logit Sp was considered significant if the associated Q-statistic was significant. Because the statistical power of the heterogeneity test is typically low, a more liberal criterion of P < 0.10 was used rather than the standard P < 0.05 (Song et al., 2001). Subgroup analysis was performed for faecal and non-faecal sample matrices and also for single-branch (one selective enrichment, one agar) and multi-branch (>1 selective enrichment and/or >1 agar, in parallel) reference tests. For PCR only, the METANDI command was used to fit the hierarchical summary receiver operating characteristic (HSROC) model and to evaluate the correlation between the logit Se and logit Sp; outliers and influential observations were identified as previously described (Harbord and Whiting, 2008).
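The pooling and heterogeneity steps above can be sketched in a few lines. This is not the authors' Stata code: it assumes the common delta-method standard error of a logit proportion, sqrt(1/[n p (1 − p)]), as a stand-in for Equations (1) and (2); it uses an inverse-variance fixed effect rather than Mantel-Haenszel; and it computes the Cochran's Q tail probability exactly for integer degrees of freedom so that only the Python standard library is needed. All function names and the (Se, n) pairs are hypothetical.

```python
from __future__ import annotations

import math


def logit(p: float) -> float:
    if not 0 < p < 1:
        raise ValueError("logit undefined at 0 or 1; a continuity correction would be needed")
    return math.log(p / (1 - p))


def logit_se(p: float, n: int) -> float:
    # Assumed delta-method standard error of logit(p) for a proportion from n samples.
    return math.sqrt(1 / (n * p * (1 - p)))


def chi2_sf(x: float, df: int) -> float:
    """Exact upper-tail probability of a chi-square statistic with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus terms (x/2)^(i - 1/2) * exp(-x/2) / Gamma(i + 1/2).
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp((i - 0.5) * math.log(half) - half - math.lgamma(i + 0.5))
    return total


def dersimonian_laird(y: list[float], se: list[float]) -> dict[str, float]:
    """Inverse-variance fixed effect, Cochran's Q and DerSimonian-Laird random effect."""
    w = [1 / s**2 for s in se]
    w_sum = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = w_sum - sum(wi**2 for wi in w) / w_sum
    tau2 = max(0.0, (q - df) / c)  # method-of-moments between-evaluation variance
    w_re = [1 / (s**2 + tau2) for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return {"fixed": fixed, "random": pooled, "se_random": math.sqrt(1 / sum(w_re)),
            "Q": q, "df": df, "p_Q": chi2_sf(q, df), "tau2": tau2}


def inv_logit(v: float) -> float:
    return 1 / (1 + math.exp(-v))


# Hypothetical (Se, n) pairs from three test evaluations.
data = [(0.90, 100), (0.50, 50), (0.75, 200)]
y = [logit(p) for p, _ in data]
s = [logit_se(p, n) for p, n in data]
result = dersimonian_laird(y, s)
heterogeneous = result["p_Q"] < 0.10  # the liberal cut-off used in the review
print(round(inv_logit(result["random"]), 3), result["p_Q"], heterogeneous)
```

The sketch needs at least two estimates (with one, the DerSimonian-Laird scaling term is zero). When `heterogeneous` is true, the review's approach was to withhold the pooled estimate and turn to meta-regression instead.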

Meta-regression

When significant heterogeneity was detected, pooled estimates were not reported and potential sources of heterogeneity were examined via MR. The test and study characteristic variables that were selected for the MR are shown in Tables 2 and 3. The variables were selected for biological relevance, completeness of records (no missing data) and non-uniform responses. Random-effects MR models for culture Se and PCR Se and Sp were evaluated to identify sources of heterogeneity in reported Se and Sp among test evaluations and to evaluate the association between evaluation- and study-level predictors and those estimates (outcomes). These models were specified according to:

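$$\operatorname{logit}(p_{ij}) = \beta_0 + \sum_{k} \beta_k x_{kij} + v_j + \mu_i + \varepsilon_{ij} \tag{3}$$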

where $\beta_0$ represents the intercept (the overall mean if no other predictor was included in the model), $\beta_k$ represents the coefficient for the kth predictor, $v_j$ represents the effect of study j, $\mu_i$ represents the effect of evaluation i, and $\varepsilon_{ij}$ represents the sampling error for evaluation i within study j. The variances of $\mu_i$ and $v_j$ ($\sigma^2_{\mu}$ and $\sigma^2_{v}$) represent the variation between test evaluations and between studies, respectively, and were estimated using a restricted maximum-likelihood (REML) algorithm. The sampling variance of the test evaluations, $\sigma^2$, was determined from the within-evaluation variation and sample size (Hox and de Leeuw, 2003). The proportions of the total variance attributable to variation between test evaluations ($\rho_{\mu}$) and between studies ($\rho_{v}$) were computed as:

$$\rho_{\mu} = \frac{\sigma^2_{\mu}}{\sigma^2_{\mu} + \sigma^2_{v} + \sigma^2} \tag{4}$$

and

$$\rho_{v} = \frac{\sigma^2_{v}}{\sigma^2_{\mu} + \sigma^2_{v} + \sigma^2} \tag{5}$$

where $\sigma^2_{\mu}$ and $\sigma^2_{v}$ represent the variation between evaluations and between studies, respectively, and were estimated from the null model, and $\sigma^2$ is the sampling variance. If $\sigma^2_{\mu}$ or $\sigma^2_{v}$ was found to be small (≤1% of total variance), then $\mu_i$ or $v_j$, respectively, was dropped from the regression model. Where $\rho_{\mu}$ and $\rho_{v}$ exceeded 25%, test and study characteristic variables were examined to determine how much of this variance was accounted for by each predictor (Hox and de Leeuw, 2003). The change in variance was computed as:

[Equation (6)]
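The variance bookkeeping described above reduces to simple arithmetic once the variance components have been estimated. The sketch below computes the two variance shares (Eqs. 4 and 5), applies the ≤1% rule for dropping a random effect, and, as an assumption, computes the change in variance (Eq. 6) as the usual proportional reduction of a variance component relative to the null model. Function names and the variance values are hypothetical; fitting the REML model itself is not shown.

```python
from __future__ import annotations


def variance_shares(var_eval: float, var_study: float, var_sampling: float) -> tuple[float, float]:
    """Shares of the total variance due to evaluations (rho_mu) and to studies (rho_v)."""
    total = var_eval + var_study + var_sampling
    return var_eval / total, var_study / total


def keep_random_effect(component: float, total: float, threshold: float = 0.01) -> bool:
    # A random effect contributing <= 1% of the total variance is dropped from the model.
    return component / total > threshold


def proportion_explained(var_null: float, var_with_predictor: float) -> float:
    # Assumed form of the 'change in variance': proportional reduction of a variance
    # component when a predictor is added to the null model.
    return (var_null - var_with_predictor) / var_null


# Hypothetical variance components on the logit scale (not estimates from this review).
rho_mu, rho_v = variance_shares(var_eval=0.52, var_study=0.16, var_sampling=0.32)
print(rho_mu, rho_v, keep_random_effect(0.005, 1.0), proportion_explained(0.16, 0.04))
```

Here a predictor that lowers a hypothetical between-study variance from 0.16 to 0.04 would be said to explain 75% of that component.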
Table 2.   Test and study characteristic variables examined in a meta-regression analysis of the diagnostic sensitivity of culture (as compared with culture) of Salmonella in pigs

Variable | Description | Categories
Test and study variables
Pre-enrichment type (index) | Type of pre-enrichment used in index test protocol | BPW; TT; other; none
Enrichment type (index) | Type of enrichment used in index test protocol | RV; TT; MSRV; SE; other; two or more enrichments in parallel
Agar type (index) | Type of agar used in index test protocol | BG; XLD; XLT4; other; two or more agars in parallel
Pre-enrichment type (reference) | Type of pre-enrichment used in reference test protocol | BPW; TT; other; none
Enrichment type (reference) | Type of enrichment used in reference test protocol | RV; TT; MSRV; SE; other; two or more enrichments used in parallel
Agar type (reference) | Type of agar used in reference test protocol | BG; XLD; XLT4; other; two or more agars used in parallel
Enrichment incubation temperature (index) | Temperature of incubation for enrichment step | 37°C; 42 ± 1°C
Sample type | What type of sample was cultured? | Faeces; lymph tissue
Study population | Where were pigs sampled? | On-farm; at slaughter
Sampling level | What was the unit sampled? | Individual pig; pen floor
Reference type | Publication type | Journal article; conference proceeding
N | No. individuals or pens sampled |
Study quality variables
Inclusion criteria | Paper describes inclusion criteria | Yes; no
Independent tests | Index and reference tests were independent of each other (one test did not form part of the other) | Yes; no
Appropriate storage and timely processing | Paper describes samples stored appropriately and processed/tested within a reasonable period of time after collection | Yes; not reported
Random selection | Paper describes random selection of sampling unit | Yes; no/not reported

BPW, buffered peptone water; TT, tetrathionate broth; RV, Rappaport-Vassiliadis broth; MSRV, modified semisolid Rappaport-Vassiliadis agar; SE, selenite broth; BG, brilliant green agar; XLD, xylose lysine deoxycholate agar; XLT4, xylose lysine Tergitol 4 agar.
Table 3.   Test and study characteristic variables examined in a meta-regression analysis of the diagnostic accuracy of PCR (as compared with culture) for detection of Salmonella in pigs

Variable | Description | Categories
Test and study variables
PCR type | General description of PCR class | PCR; real-time PCR
Study size | No. samples tested | ≤120; ≥200
Study population | Where were pigs sampled? | On-farm; at slaughter
Sample type | What type of sample was cultured? | Faeces; lymph tissue; carcass swab
Study quality variablesᵃ
Appropriate storage and timely processing | Paper describes samples stored appropriately and processed/tested within a reasonable period of time after collection | Yes; not reported

ᵃ Other quality variables were not examined because of uniform responses.

The unconditional associations between each predictor variable and test Se or Sp were first evaluated in univariable regression models. All variables with an unconditional P-value of less than 0.20 were evaluated for inclusion in a multivariable model using a manual forward-stepwise process. Variables with P ≤ 0.05 were considered statistically significant. Biologically plausible first-order interaction terms were examined where more than one significant predictor was identified in the final main-effects model, and statistically significant interaction terms were retained in the final model.
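The univariable screening step can be illustrated with a deliberately simplified model. The review fitted random-effects multilevel models by REML; the sketch below instead uses a fixed-effect, inverse-variance comparison of logit estimates for a single 0/1 predictor (equivalent to the slope of a weighted regression with known variances), which ignores between-evaluation and between-study variance and will therefore usually give smaller P-values. Predictor names, logit values and standard errors are hypothetical.

```python
from __future__ import annotations

import math


def binary_predictor(y: list[float], se: list[float], x: list[int]) -> tuple[float, float]:
    """Fixed-effect inverse-variance comparison of logit estimates for x == 1 versus x == 0.

    Returns (coefficient, two-sided P-value); the coefficient is the difference between
    the inverse-variance weighted group means. Both groups must be non-empty.
    """
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # group -> [sum of weights, weighted sum of y]
    for yi, si, xi in zip(y, se, x):
        w = 1 / si**2
        sums[xi][0] += w
        sums[xi][1] += w * yi
    (w0, wy0), (w1, wy1) = sums[0], sums[1]
    coef = wy1 / w1 - wy0 / w0
    z = coef / math.sqrt(1 / w0 + 1 / w1)
    return coef, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value


def screen(y: list[float], se: list[float], predictors: dict[str, list[int]],
           threshold: float = 0.20) -> list[str]:
    # Keep predictors whose unconditional P-value is below the screening threshold.
    return [name for name, x in predictors.items()
            if binary_predictor(y, se, x)[1] < threshold]


# Hypothetical logit Se estimates and standard errors for four test evaluations.
y = [0.0, 0.1, 1.0, 1.1]
se = [0.2, 0.2, 0.2, 0.2]
predictors = {"study_population": [0, 0, 1, 1], "storage_reported": [0, 1, 0, 1]}
print(screen(y, se, predictors))
```

Only predictors passing this screen would go forward to the manual forward-stepwise multivariable model, where the stricter P ≤ 0.05 criterion applies.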

Results

Identification and description of relevant studies

Twenty-nine studies were included in this review; 27 were journal articles and two were conference proceedings. Their primary purposes were test evaluation (n = 27), prevalence assessment (n = 1) and pathogenesis (n = 1). Two, three and 24 studies were published in the 1980s, the 1990s and from 2000 to 2009, respectively. The results of each SR step are outlined in Fig. 1. No additional relevant references were identified by the six consulted experts, the website searches or the reference lists of the five most recent review articles.

Figure 1. Steps in conducting the systematic review.

Nineteen studies (publications) reported culture versus culture test results, and ten reported PCR versus culture results. The number of test comparisons (unique test evaluations) ranged from two to 21 per study. For culture, Se and kappa agreement were reported for 43% and 22% of the test evaluations, respectively, and raw data were extractable for 58% of the test evaluations. For PCR, Se and Sp were reported for 24% and kappa agreement for 32% of the test evaluations, and raw data were available for all of them. Additional characteristics of the included references are outlined in Tables 4 and 5. A list with the full citation of each included reference can be obtained from the author on request.

Table 4.   Characteristics of 19 references reporting sensitivity (Se) and kappa statistics for bacterial culture (as compared with culture) used to detect Salmonella in pigs, identified in a systematic review and examined in meta-regression analysis (Se and κ columns give pairwise results from ≥1 sub-studies)

Author, year | Country | Ref. test | n | Sample type | Se (%) | κᵃ
Bager and Petersen, 1991, J | Denmark | Multi | 373 | Faeces, i, f | 30C; 51C; 88C; 3C; 48C; 84C |
Botteldoorn et al., 2003, J | Belgium | Multi | 345 | Faeces, i, a | 74R; 89R |
 | | Multi | 57 | Faeces, i, a | 46R |
Casey et al., 2004, J | Ireland | Single | 15 | Faeces, i, a | 60C; 100C | na
Champagne et al., 2005, J | Canada | Multi | 310 | Faeces, i, a | 52R; 84R; 66R; 79R; 86R | 0.71R; 0.63R; 0.70R; 0.63R; 0.71R; 0.63R; 0.84R; 0.77R; 0.73R; 0.85R; 0.70R; 0.63R; 0.84R; 0.77R; 0.73R; 0.85R
 | | Single | 310 | Faeces, i, a | 73C; 60C; 65C; 59C; 90C; 96C; 81C; 89C; 72C; 90C; 98C; 97C; 98C; 84C; 95C; 92C |
Cherrington and Huis, 1993, J | The Netherlands | Single | 100 | Faeces, i, f | 90C; 90C; 87C; 100C; 93C; 90C; 82C; 96C; 72C; 50C; 71C; 100C; 67C; 67C; 61C; 95C; 100C; 31C; 29C; 53C; 40C | 0.87C; 0.49C; 0.61C; 0.40C; 0.87C; 0.47C; 0.48C; 0.32C; 0.49C; 0.47C; 0.76C; 0.66C; 0.61C; 0.48C; 0.76C; 0.52C; 0.40C; 0.32C; 0.66C; 0.52C
Davies et al., 2000, J | USA | Single | 136 | Faeces, i, f | 37C; 84C | 0.30C; 0.30C
Erdman and Harris, 2003, J | USA | Single | 51 | Faeces, i, f | 100R; 91C | 0.94R; 0.94R
 | | Single | 118 | Faeces, p, f | 100R; 78C | 0.86R; 0.86R
Harvey et al., 2001, J | USA | Single | 557 | LN, i, a | 74C; 64C | 0.39C; 0.39C
 | | Single | 644 | Faeces, i, a | 82C; 40C | 0.25C; 0.25C
Hoorfar and Baggesen, 1998, J | Denmark | Multi | 100 | Faeces, p, f | 77R; 72R; 45R; 71R |
 | | Single | 100 | Faeces, p, f | 84R; 87R; 85R; 78R; 51; 78 | 0.47C; 0.20C; 0.50C; 0.47C; 0.20C; 0.50C
Hoorfar and Visby Mortensen, 2000, J | Denmark | Multi | 183 | Faeces, p, f | 86R; 80R |
 | | Multi | 190 | Faeces, p, f | 57R; 64R; 32R |
 | | Multi | 100 | Faeces, p, f | 72R; 77R |
 | | Single | 183 | Faeces, p, f | | 0.76C
 | | Single | 190 | Faeces, p, f | | 0.45C
Jensen et al., 2003, J | Denmark | Single | 396 | Faeces, p, f | 86C; 79C | 0.77C; 0.77C
Korsak et al., 2004, J | Belgium | tagsᵇ | 78 | Faeces, i, a | 50R; 65R | 0.45R; 0.53R
Love and Rostagno, 2008, J | USA | Multi | 46 | Faeces, i, f | 6R; 91R; 48R; 0R; 80R |
Michael et al., 2003, J | Brazil | Multi | 126 | Faeces, i, f | 36R; 71R; 32R; 71R; 92R; 86R; 81R; 86R; 40R; 43R; 17R; 29R |
Nollet et al., 2001, P | Belgium | Multi | 75 | Faeces, i, a | 59R; 88R; 98R; 97R; 70R; 99R; 98R; 96R; 100R; 93R | 0.49R; 0.61R; 0.94R; 0.90R
 | | | | LN, i, a | 89R; 98R; 98R; 98R; 86R; 98R; 98R; 89R; 99R; 89R | 0.91R; 0.88R; 0.91R; 0.91R
Osumi et al., 2003, J | Japan | Multi | 348 | Faeces, i, f | 98C; 47C | 0.97C; 0.49C
Rostagno et al., 2005, J | USA | Multi | 100 | Faeces, p, a | 82R; 94R; 95R; 78R | na
Vassiliadis et al., 1981, J | Greece | Multi | 100 | Faeces, i, a | 38C; 49C; 56C; 67C; 95C | 0.43C; 0.54C; 0.61C; 0.71C; 0.96C
Vassiliadis et al., 1987, J | Greece | Multi | 117 | Faeces, i, a | 83C; 72C; 49C | 0.89C; 0.74C; 0.51C

ᵃ Kappa statistic (note: empty cells in the table indicate that neither the estimate nor raw data were reported for that observation).
ᵇ Analysis performed using tags software.
Single, single or serial pre-enrichment/enrichment steps; Multi, multiple and parallel pre-enrichment/enrichment steps; i, individual level; h, herd level; p, pooled or pen level; f, samples collected on-farm; a, samples collected at abattoir; J, journal article; P, proceedings; C, post hoc calculated value; R, reported value; na, cannot be calculated because of zero values in the 2 × 2 table; LN, lymph nodes.
Table 5.   Characteristics of 10 references reporting sensitivity (Se), specificity (Sp) or kappa statistic for PCR (as compared with culture) used to detect Salmonella in pigs, identified in a systematic review and examined in meta-regression analysis

Author, year | Country | n | Sample type | Se (%) | Sp (%) | κᵃ
Bohaychuk et al., 2007, J | Canada | 287 | Faeces, i, f | 99R | 97R | 0.96R
 | | 249 | Carcass swab, i, a | 100R | 98R | 0.95R
 | | 337 | Faeces, i, a | 100R | 82R | 0.94R
Feder et al., 2001, J | USA | 92 | Faeces, i, f | 77R | 96R | 0.71R
 | | 34 | Faeces, i, f | 55R | 39R | 0.06R
Lofstrum et al., 2009, J | Denmark | 120 | Carcass swab, i, a | 95R | 99R | 0.97R
Mainar-Jaime et al., 2008, J | Canada | 203 | Faeces, i, a | 93R | 88R | 0.48C
Oliveira et al., 2006, J | Brazil | 90 | Faeces | 77C | 98C | 0.78C
 | | 48 | LN, i, L | 94C | 90C | 0.83C
 | | 45 | Faeces, i, L | 71C | 86C | 0.57C
 | | 12 | Tonsil, i, L | 100C | 100C | 1C
 | | 12 | Ileum, i, L | 100C | 100C | 1C
 | | 12 | Faeces, i, L | 100C | 100C | 1C
 | | 108 | Faeces, i, L | 83C | 92C | 0.75C
 | | 54 | Faeces, i, L | 88C | 86C | 0.71C
Sibley et al., 2003, J | Canada | 67 | Faeces, i, f | 95R | 93R | 0.94R
Uyttendaele et al., 2003, J | Belgium | 11 | Carcass swab, i, a | 100C | 78C | 0.56C
Wilkins et al., 2009, P | Canada | 293 | Faeces, i, f | 95C | 98C | 0.92C
Wu et al., 2003, J | Taiwan | 230 | Carcass swab, i, a | 100C | 97C | 0.72R
Yeh et al., 2002a, J | Taiwan | 50 | Carcass swab, i, a | 100C | 100C | 1C

ᵃ Kappa statistic.
i, individual level; f, samples collected on-farm; a, samples collected at abattoir; L, laboratory-based study; J, journal article; P, proceedings; C, post hoc calculated value; R, reported value; LN, lymph nodes.
Yeh et al., 2002a, JTaiwan50iCarcass swab, i, a100C100C1C

Methodological soundness of included studies

Both primary and secondary QA items and the number of studies meeting each item are outlined in Table 1. References were excluded because test accuracy estimates were inadequately reported or raw data were insufficient for post hoc analysis (94 references), or because the test protocol was not adequately described (18 references). Of the 44 remaining references, 15 reported the evaluation of tests other than culture or PCR and were not examined in the current review.

Se of culture (index) as compared with culture (reference)

One hundred thirty-four evaluations of the Se of a bacterial culture protocol (Sec), as compared with another culture protocol, were extracted from 19 references and are summarized in Table 6. The number of samples tested (n) ranged from 15 to 644 (mean 171, median 100). Confidence intervals (CIs) or standard errors were reported for only five test evaluations. Culture protocols varied considerably in type and weight of sample tested, enrichment and culture media, incubation temperatures and reference standard (Tables 2 and 4). MA of all test evaluations combined, and separately of protocols using faecal samples or lymph nodes and of single- or multi-branched reference standards, showed significant heterogeneity (P < 0.001). For this reason, pooled estimates of Sec are not reported; instead, MR was applied to explore sources of heterogeneity.

Table 6.   Sensitivity and specificity of culture and PCR examined in a systematic review of the diagnostic accuracy of these tests for Salmonella in pigs

Index | Reference | Subset | Se nᵃ | Se mean (%) | Se SD (%) | Se min (%) | Se max (%) | Sp nᵃ | Sp mean (%) | Sp SD (%) | Sp min (%) | Sp max (%)
Culture | Culture | Overall | 134 | 74 | 23 | 0 | 100 | naᵇ | | | |
 | | Faeces | 122 | 72 | 23 | 0 | 100 | na | | | |
 | | Lymph tissue | 12 | 90 | 11 | 64 | 99 | na | | | |
PCR | Culture | Overall | 21 | 91 | 12 | 55 | 100 | 21 | 91 | 14 | 39 | 100
 | | Faeces | 13 | 87 | 14 | 55 | 100 | 13 | 89 | 16 | 39 | 99
 | | Carcass swabs | 5 | 98 | 2 | 95 | 100 | | 94 | 9 | 78 | 100
 | | Lymph tissue | 3 | 98 | 3 | 94 | 100 | | 100 | 0 | 100 | 100

ᵃ No. evaluations (test comparisons) in each category.
ᵇ Culture specificity assumed to be 100%; therefore, parameters not reported.

For Sec, the proportions of total variance attributable to variation between test evaluations (ρμ) and between studies (ρv) were 52% and 16%, respectively (Eqs. 4 and 5). The type of enrichment used in the index and reference protocols accounted for the largest share of the between-evaluation variance, explaining 29% and 22% of ρμ, respectively. Agar type (reference), reference type, agar type (index) and enrichment type (index) each explained between 46% and 89% of the between-study variance (ρv). Random selection of sampling units was the only methodological soundness and/or reporting variable examined that was significantly associated with apparent Sec (P < 0.05); it accounted for 8% of ρμ and 74% of ρv.

Enrichment temperature, study population, agar type (index) and enrichment type (index and reference) were associated with Sec in the final model (Table 7). There was a significant interaction between enrichment temperature and enrichment type (index). This model explained 52% of the between-evaluation variance and 100% of the between-study variance in Sec.

Table 7.   Final multi-variable regression model: coefficients, P-values and impact of predictors associated with the sensitivity of culture (as compared with other culture) used to detect Salmonella in pigs

| Variable | n | Coefficient | P-value | Overall P-value | Sensitivity (%)ᵃ |
|---|---|---|---|---|---|
| Enrichment incubation (°C) | | | | | |
| 37 | 34 | Reference | | 0.02 | 54.9 |
| 42 ± 1 | 100 | 0.71 | | | 69.1 |
| Study population | | | | | |
| On-farm | 70 | Reference | | <0.01 | 54.9 |
| Slaughter | 64 | 0.70 | | | 69.0 |
| Agar type (index) | | | | | |
| BG | 52 | Reference | | <0.001 | 54.9 |
| XLD | 25 | 0.49 | 0.32 | | 64.4 |
| XLT4 | 18 | −0.10 | 0.70 | | 50.0 |
| Other | 5 | −1.37 | <0.01 | | 21.9 |
| Multiple | 34 | 0.32 | 0.22 | | 60.4 |
| Enrichment type (index) | | | | | |
| RV | 44 | Reference | | <0.001 | 54.9 |
| TT | 25 | −0.88 | 0.07 | | 31.4 |
| MSRV | 25 | 0.90 | 0.20 | | 73.2 |
| SE | 9 | 0.01 | 0.99 | | 52.7 |
| Other | 13 | 0.46 | 0.24 | | 63.6 |
| Multiple | 18 | 1.24 | <0.01 | | 79.3 |
| Enrichment type (reference) | | | | | |
| RV | 21 | Reference | | <0.001 | 54.9 |
| TT | 7 | 0.07 | 0.88 | | 54.1 |
| MSRV | 17 | 0.75 | 0.03 | | 70.1 |
| SE | 5 | 1.92 | <0.01 | | 88.3 |
| Other | 7 | 0.57 | 0.15 | | 66.2 |
| Multiple | 77 | −0.13 | 0.64 | | 49.2 |
| Interaction: Enr (°C) × Enr (index) | | | | | |
| Enr (°C) × RV | 60 | Reference | | 0.005 | |
| Enr (°C) × TT | 21 | 1.21 | 0.03 | | 60.5ᵇ |
| Enr (°C) × MSRV | 23 | −0.85 | 0.26 | | 53.8ᵇ |
| Enr (°C) × SE | 7 | −1.86 | 0.03 | | 19.1ᵇ |
| Enr (°C) × Other | 5 | 0.03 | 0.68 | | 57.4ᵇ |
| Enr (°C) × Multiple | 18 | naᶜ | | | |
| Intercept | | 0.10 | 0.75 | | 54.9 |

ᵃ Value obtained by adding the variable coefficient to the intercept coefficient; the sum x is then back-transformed by Se = 1/[1 + exp(−x)].
ᵇ Includes coefficients for the interaction term as well as the related main effects.
ᶜ Multiple enrichments were incubated at both 37°C and 42°C; therefore, the temperature–enrichment interaction cannot be assessed.

Se and Sp of PCR (index) as compared with culture (reference)

Twenty-one evaluations of the Se or Sp of a PCR protocol (Sep, Spp), as compared with a bacterial culture protocol, were extracted from 10 references (Table 5). The number of samples tested (n) ranged from 11 to 337 (mean 126, median 90). The number of test evaluations extracted ranged from one to eight per reference; these are summarized in Table 6. PCRs were described either as ‘PCR’ (14 test evaluations) or ‘real-time PCR’ (seven test evaluations). Two test evaluations used commercial PCR kits (iQ-Check Assay; Bio-Rad Laboratories, Hercules, CA, USA; BAX system, Qualicon Ltd., Warwick, UK), whereas the other 19 used assays developed in-house. Variations in the reference culture protocols were similar to those described above. Using the METANDI command, the correlation between logit Sep and logit Spp was estimated to be positive (0.48, SE 0.24). The HSROC curve and the prediction region, including the summary point and its confidence region, are shown in Fig. 2. One influential observation was identified; dropping it considerably narrowed the prediction region of the HSROC curve and reduced the correlation between logit Sep and logit Spp to 0.21, but did not change the summary estimates of Sep and Spp. The summary estimates from the bivariate model also did not differ from those obtained by pooling sensitivities and specificities separately with the META command. For this reason, and because the bivariate model does not permit multi-level analysis (level 1 = study; level 2 = individual test evaluations within study), further multivariable analyses to explore sources of heterogeneity were conducted for Sep and Spp separately.
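The bivariate model (mean logit Se μ_A, mean logit Sp μ_B, their between-study variances and covariance) and the Rutter–Gatsonis HSROC model (accuracy Λ, threshold Θ, shape β and the variances of accuracy and threshold) are algebraically equivalent parameterisations, which is why METANDI can report both. The sketch below implements that standard conversion as an illustration; the input values are hypothetical (chosen so that the correlation is 0.48) and are not the estimates from this review, and `bivariate_to_hsroc` is an illustrative name.

```python
import math
from typing import NamedTuple


class HSROC(NamedTuple):
    lam: float        # Lambda: mean accuracy
    theta: float      # Theta: mean positivity threshold
    beta: float       # shape (asymmetry) parameter
    var_alpha: float  # between-study variance of accuracy
    var_theta: float  # between-study variance of threshold


def bivariate_to_hsroc(mu_a: float, mu_b: float, var_a: float,
                       var_b: float, cov_ab: float) -> HSROC:
    """Convert bivariate parameters (A = logit Se, B = logit Sp) to HSROC parameters."""
    sd_a, sd_b = math.sqrt(var_a), math.sqrt(var_b)
    r = math.sqrt(sd_b / sd_a)  # equals exp(beta / 2)
    return HSROC(
        lam=r * mu_a + mu_b / r,
        theta=0.5 * (r * mu_a - mu_b / r),
        beta=math.log(sd_b / sd_a),
        var_alpha=2.0 * (sd_a * sd_b + cov_ab),
        var_theta=0.5 * (sd_a * sd_b - cov_ab),
    )


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


# Hypothetical inputs: summary Se = Sp = 0.91, correlation 0.48.
var_a, var_b = 1.2, 0.8
h = bivariate_to_hsroc(logit(0.91), logit(0.91), var_a, var_b,
                       0.48 * math.sqrt(var_a * var_b))
print(h)
```

In this parameterisation β depends only on the ratio of the two standard deviations, so a positive correlation does not change the direction of the summary curve; it shows up instead as a larger variance of accuracy relative to the variance of threshold.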

Figure 2.

 Plot of fitted HSROC model of the sensitivity (Se) and specificity (Sp) of PCR, as compared with culture, used to detect Salmonella in pigs; (a) 95% prediction region with 21 test evaluations extracted from 10 studies; (b) 95% prediction region with one influential observation (i) removed.

Significant heterogeneity was identified for overall Sep and Spp (P < 0.001). Dropping the influential observation changed neither the Q statistic nor its P-value, so the observation was retained for analysis. For Sep, the proportions of total variance attributable to variance between test evaluations and between studies were 9% and 65%, respectively. Study size and study population each accounted for 58% of the between-study variance. Sample type accounted for 42% of the between-evaluation variance and 11% of the between-study variance. For Spp, the proportion of total variance attributable to between-study variance was estimated to be zero, so the study-level random effect (v_j) was dropped from the regression model; variance between test evaluations accounted for 70% of the total. Study size, sample type and PCR type explained 14%, 20% and 23% of the between-evaluation variance in Spp, respectively. Sub-group analysis by sample type (faeces, carcass swabs or lymph tissue) revealed significant heterogeneity (P < 0.001) in all sub-groups except carcass swabs. The pooled Sep for the five test evaluations using carcass swabs was 98% (CI95 96–100), and the pooled Spp was 92% (CI95 92–99). Sample type and study size were significantly associated with Sep in the final multivariable model (Table 8); none of the variables examined was significantly associated with Spp.
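Pooled estimates such as the carcass-swab Sep are typically obtained by random-effects pooling on the logit scale. The sketch below shows one common version, the DerSimonian–Laird estimator, as an independent illustration rather than a reproduction of the META command used in the review; the counts are hypothetical, adding 0.5 to every evaluation is an assumption that keeps 100% estimates finite, and the function name is illustrative.

```python
import math


def _expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_dersimonian_laird(positives: list[int], totals: list[int], cc: float = 0.5):
    """DerSimonian-Laird random-effects pooled proportion on the logit scale.

    Returns (pooled, lower95, upper95, tau2). The continuity correction `cc`
    is added to every evaluation (an assumption, not the only convention).
    """
    logits, variances = [], []
    for x, n in zip(positives, totals):
        a, b = x + cc, n - x + cc
        logits.append(math.log(a / b))
        variances.append(1.0 / a + 1.0 / b)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Method-of-moments between-evaluation variance, truncated at zero.
    tau2 = max(0.0, (q - (len(logits) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return _expit(pooled), _expit(pooled - 1.96 * se), _expit(pooled + 1.96 * se), tau2


# Hypothetical PCR-positive / culture-positive counts for five carcass-swab evaluations.
est, lo, hi, tau2 = pool_dersimonian_laird([20, 48, 30, 15, 60], [20, 50, 31, 15, 62])
print(f"pooled Se = {est:.1%} (95% CI {lo:.1%} to {hi:.1%}), tau^2 = {tau2:.3f}")
```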

Table 8.   Final multi-variable regression model: coefficients, P-values and impact of predictors associated with the sensitivity and specificity of PCR (as compared with culture), used to detect Salmonella in pigs

| Variable | n | Coefficient | P-value | Overall P-value | Se/Sp (%)ᵃ |
|---|---|---|---|---|---|
| Sensitivity | | | | | |
| Sample type | | | | 0.016 | |
| Faeces | 13 | Reference | | | 76.0 |
| Carcass swab | 5 | 2.15 | <0.01 | | 96.4 |
| Lymph tissue | 3 | 1.65 | 0.03 | | 94.3 |
| Study size | | | | | |
| ≤108 | 14 | Reference | | | 76.0 |
| ≥230 | 7 | 2.39 | <0.01 | | 97.2 |
| Intercept | 21 | 1.15 | <0.01 | | 76.0 |
| Specificity | | | | | |
| Intercept | 21 | 2.85 | <0.01 | | 94.6 |

ᵃ Value obtained by adding the variable coefficient to the intercept coefficient; the sum x is then back-transformed by Se (or Sp) = 1/[1 + exp(−x)].
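The back-transformation in footnote a can be checked directly against the Table 8 coefficients. In this minimal sketch, `back_transform` is an illustrative name; it sums the logit-scale coefficients it is given and applies the inverse logit.

```python
import math


def back_transform(*coefficients: float) -> float:
    """Percentage Se or Sp from summed logit-scale coefficients (Table 8, footnote a)."""
    return 100.0 / (1.0 + math.exp(-sum(coefficients)))


SE_INTERCEPT = 1.15  # reference levels: faeces, study size <= 108
SP_INTERCEPT = 2.85

print(round(back_transform(SE_INTERCEPT), 1))        # 76.0
print(round(back_transform(SE_INTERCEPT, 2.15), 1))  # carcass swab: 96.4
print(round(back_transform(SE_INTERCEPT, 1.65), 1))  # lymph tissue: 94.3
print(round(back_transform(SE_INTERCEPT, 2.39), 1))  # study size >= 230: 97.2
print(round(back_transform(SP_INTERCEPT), 1))        # 94.5 (Table 8 gives 94.6)
```

The specificity intercept back-transforms to 94.5% rather than the tabulated 94.6%; the coefficients are published to two decimals, so small discrepancies of this size are expected.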

Discussion

The literature search strategy used in this SR was comprehensive, encompassing many electronic databases and sources of ‘grey literature’; however, not all of this literature could be retrieved, and some relevant data from conference proceedings may have been missed. Because proceedings tended to be of insufficient quality to pass through to data extraction, this omission is unlikely to have had much impact on our findings.

Several sources of the observed variability in reported estimates of the diagnostic Se of culture for Salmonella in pigs were identified. A larger proportion of the overall variance in Sec was attributable to variance between test evaluations (52%) than between studies (16%), indicating that differences between individual culture protocols are the more important source of the observed heterogeneity. For example, according to our MR results, samples collected at slaughter, enriched in multiple media at 42 ± 1°C, plated on multiple agars and compared against a reference culture protocol using only Rappaport-Vassiliadis (RV) medium for enrichment would have an apparent Sec of 95%. In contrast, the same samples enriched only in RV medium at 37°C, plated only onto brilliant-green agar and compared against a reference culture protocol using RV for enrichment would have an apparent Sec of just 69%.
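The two scenarios above can be reproduced from the Table 7 coefficients by summing the relevant terms on the logit scale and back-transforming. In this minimal sketch, any category not listed (37°C, on-farm, BG agar, RV enrichment in the index and reference protocols) is at its reference level and contributes zero; the temperature × enrichment interaction is omitted because it is the reference level for RV and is not estimable for multiple enrichments (Table 7, footnote c). The dictionary keys and the function name are illustrative labels.

```python
import math

# Logit-scale coefficients copied from Table 7; key names are illustrative labels.
INTERCEPT = 0.10
COEFFICIENTS = {
    "enrichment_42C": 0.71,             # enrichment incubation 42 +/- 1 C (vs 37 C)
    "slaughter": 0.70,                  # study population (vs on-farm)
    "agar_index_multiple": 0.32,        # multiple agars (vs BG)
    "enrichment_index_multiple": 1.24,  # multiple enrichments (vs RV)
}


def predicted_se(*terms: str) -> float:
    """Apparent culture Se (%): intercept plus the listed coefficients, back-transformed.

    Any category not listed is at its reference level and contributes zero.
    """
    logit_se = INTERCEPT + sum(COEFFICIENTS[t] for t in terms)
    return 100.0 / (1.0 + math.exp(-logit_se))


best = predicted_se("slaughter", "enrichment_42C",
                    "agar_index_multiple", "enrichment_index_multiple")
basic = predicted_se("slaughter")  # RV at 37 C, BG agar, RV reference
print(f"{best:.1f} {basic:.1f}")  # 95.6 69.0
```

The first value is slightly above the 95% quoted in the text; the published coefficients are rounded to two decimals, so agreement to within about half a percentage point is as close as this check can get.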

When Vassiliadis et al. (1976) first described the use of RV medium as a selective enrichment for Salmonella, the recommended incubation temperature was 43°C; however, more than a third (17/44) of the protocols using RV medium used an incubation temperature of 37°C. Interestingly, incubation of tetrathionate (TT) enrichment broth (index) at 37°C was marginally associated with decreased Sec compared with enrichment in RV; however, this association was reversed when the incubation temperature was 42°C. It has been suggested that 37°C is the optimal temperature for enrichment, but that for highly contaminated samples incubation at 40–43°C will inhibit competing organisms (Waltman, 2000). Our results suggest that higher incubation temperatures should be used with most enrichment media for the highly contaminated samples typically examined for Salmonella in pigs. The exception is SE medium, for which decreased Se at higher incubation temperatures has also been reported (Waltman, 2000), consistent with the current analysis.

More intensive sampling, or the use of multiple enrichment broths or plating media, will naturally increase Salmonella detection (Davies et al., 2000). Studies evaluating protocols with multiple culture media, or testing in parallel, would therefore be expected to report higher Se than studies that do not use these approaches. This has important implications for the use of reference standards. The relative Se of a given culture protocol can differ considerably depending on whether the reference standard consists of a single pre-enrichment/enrichment pathway (one step or steps in series) or of multiple pre-enrichment/enrichment steps run in parallel. In the former case, the reference standard will likely detect fewer positive samples and the relative Se of the index test will be higher; in the latter case, the reference will likely detect more positive samples and the relative Se of the index test will be lower. However, more intensive diagnostic efforts increase both labour and material costs, and researchers must weigh these costs against the expected gain in Sec. It would therefore be misguided to make a blanket recommendation that all investigators use a multiple-enrichment, multiple-agar protocol.
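Why parallel branches raise detection can be seen from the usual formula for parallel interpretation: a sample is called positive if any branch is positive, so the combined Se is one minus the probability that every branch misses it. This minimal sketch assumes the branches are conditionally independent given infection status, which is optimistic when all branches start from the same sample; the per-branch sensitivities are hypothetical and the function name is illustrative.

```python
def parallel_sensitivity(sensitivities: list[float]) -> float:
    """Se of a protocol read as positive if ANY branch is positive.

    Assumes branches are conditionally independent given infection status
    (an optimistic assumption when all branches share one sample).
    """
    p_all_miss = 1.0
    for se in sensitivities:
        p_all_miss *= 1.0 - se
    return 1.0 - p_all_miss


# Hypothetical per-branch sensitivities: adding branches raises combined Se.
for branches in ([0.70], [0.70, 0.60], [0.70, 0.60, 0.50]):
    print(branches, round(parallel_sensitivity(branches), 3))
```

Each additional branch adds cost but yields a diminishing gain in Se, which is the trade-off described above.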

Culture of faecal material collected at the abattoir was associated with increased Sec compared with similar samples collected on-farm. Although little has been published on quantitative differences in the number of colony-forming units per Salmonella-positive pig sampled at slaughter versus on-farm, research has demonstrated that the prevalence of Salmonella-positive animals can increase dramatically during transport and lairage (Hurd et al., 2002, 2004; Larsen et al., 2004). Pigs likely shed greater numbers of Salmonella organisms when prevalence is higher, and a greater number of viable organisms in the sample material would logically increase the probability of obtaining a positive culture result.
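The last step of this argument can be made concrete with a simple sampling model. If viable cells are assumed to be randomly (Poisson) distributed in the faecal material, the chance that a test portion contains at least one cell is 1 − exp(−concentration × mass). This is a hypothetical illustration only: it ignores losses during enrichment and competition from other flora, and the concentrations and 10 g portion are invented.

```python
import math


def p_sample_contains_organism(cfu_per_gram: float, grams: float) -> float:
    """Probability that a test portion holds at least one viable cell.

    Assumes cells are randomly (Poisson) distributed in the material; ignores
    losses during enrichment and competition from other flora.
    """
    return 1.0 - math.exp(-cfu_per_gram * grams)


# Hypothetical shedding levels for a 10 g faecal sample.
for cfu in (0.01, 0.1, 1.0):
    print(cfu, round(p_sample_contains_organism(cfu, 10.0), 3))
```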

The use of a wide array of culture protocols that are compared against an equally wide array of reference protocols makes comparison of results from different studies difficult. This problem is highlighted in the current analysis, in which the apparent Sec varied with the enrichment media used in the reference protocol. Recently, the International Standard for isolating Salmonella from foods and feedstuffs (ISO 6579) was amended to include the detection of Salmonella in faeces. This amendment outlines the use of a single enrichment medium (modified semi-solid RV) in combination with xylose lysine deoxycholate (XLD) agar and ‘any other solid-selective medium complementary to XLD’ (Anonymous, 2007). Although widespread adoption of ISO 6579 would be an improvement over the current situation, with only a single enrichment medium this protocol may not be sufficiently sensitive to serve as a reference standard for test validation studies. Ideally, such a reference standard would incorporate multiple selective enrichments, and possibly multiple agars, to achieve the highest possible Se.

The inconsistent use of reference standards among studies in this review makes pooling or comparing these studies problematic. The underlying difficulty is that there is no true ‘gold standard’ test for Salmonella in pigs, an issue that extends to many pathogens of interest to food safety or animal health. This problem could be partly addressed by the consistent use of a single (albeit imperfect) reference test, which would allow diagnostic accuracy to be compared across studies. In our review, however, the use and definition of the reference standard for each test varied considerably across studies. Researchers usually coped with the lack of a gold standard in one of two ways: by selecting a single imperfect reference standard, which varied among studies, or by creating a reference standard consisting of all positive test results in their study (equivalent to testing in parallel). The second approach, although valid if false-positive results are assumed not to occur, makes it more complex to categorize test comparisons according to the reference test used. In addition, incorporating index test results into the reference standard can result in overestimation of the test’s accuracy (incorporation bias) (Greiner and Gardner, 2000).
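The size of incorporation bias can be sketched under two simplifying assumptions: no test produces false positives, and tests are conditionally independent given true status. If the reference is "positive on the index test or on any other test", every index-positive sample is also reference-positive, so the apparent Se of the index test is Se_index / Se_reference, which always exceeds its true Se. The sensitivities below are hypothetical and the function name is illustrative.

```python
def apparent_se_with_incorporation(se_index: float, se_other: list[float]) -> float:
    """Apparent Se of an index test whose result is part of a parallel reference.

    Hypothetical model: perfect specificity for all tests and conditional
    independence given true status. Every index-positive sample is also
    reference-positive, so apparent Se = Se_index / Se_reference.
    """
    p_reference_miss = 1.0 - se_index
    for se in se_other:
        p_reference_miss *= 1.0 - se
    se_reference = 1.0 - p_reference_miss
    return se_index / se_reference


# Hypothetical: index Se 0.70, combined in parallel with one other test (Se 0.60).
print(round(apparent_se_with_incorporation(0.70, [0.60]), 3))  # 0.795 (true Se is 0.70)
```

In the extreme case where the reference consists of the index test alone, the apparent Se is 100% regardless of the true value.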

The HSROC curve graphically illustrates the trade-off between Se and Sp. In Fig. 2, we can see that for PCR this trade-off is most pronounced when Sep and Spp are greater than 90%.
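On the logit scale the Rutter–Gatsonis summary curve is linear: logit(Se) = Λ·exp(−β/2) − exp(−β)·logit(Sp), where Λ is overall accuracy and β the shape parameter. The sketch below traces such a curve; Λ and β are hypothetical (a symmetric curve through Se = Sp = 0.91) and are not the fitted values behind Fig. 2.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def hsroc_se(sp: float, lam: float, beta: float) -> float:
    """Expected Se on a Rutter-Gatsonis HSROC summary curve at specificity `sp`."""
    return expit(lam * math.exp(-beta / 2.0) - math.exp(-beta) * logit(sp))


# Hypothetical parameters: symmetric curve (beta = 0) through Se = Sp = 0.91.
LAM, BETA = 2.0 * logit(0.91), 0.0
for sp in (0.80, 0.90, 0.95, 0.99):
    print(f"Sp {sp:.2f} -> Se {hsroc_se(sp, LAM, BETA):.3f}")
```

Because the curve is straight on the logit scale, in this hypothetical example each further gain in Sp above about 90% costs progressively more Se in absolute terms, which is one way to see why the trade-off looks steepest in the upper range.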

A negative correlation between Se and Sp is expected because of the trade-off between these measures as the test threshold varies (Moses et al., 1993; Deeks, 2001). Interestingly, we observed a positive correlation between logit Sep and logit Spp. This suggests that PCR assays conducted in a manner that maximizes one parameter also tend to have a higher value of the other, whereas assays conducted, perhaps, with less rigour have both lower Se and lower Sp. This interpretation is further supported by the reduction in the correlation between logit Sep and logit Spp, from 0.48 to 0.21, when the influential observation (which had both low Se and low Sp) was dropped from the dataset.
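How a single evaluation with both low Se and low Sp can drive a positive correlation is easy to illustrate with a leave-one-out check. The sketch below uses an unweighted Pearson correlation of observed logits, which is only a descriptive analogue of the between-study correlation estimated by the bivariate model; all data are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def pearson(x: list[float], y: list[float]) -> float:
    """Unweighted Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def leave_one_out(se: list[float], sp: list[float]) -> list[float]:
    """Correlation of logit Se and logit Sp with each evaluation dropped in turn."""
    x, y = [logit(p) for p in se], [logit(p) for p in sp]
    return [pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]


# Hypothetical evaluations; the last one has both low Se and low Sp.
SE = [0.95, 0.90, 0.97, 0.85, 0.92, 0.55]
SP = [0.90, 0.95, 0.97, 0.92, 0.88, 0.40]
r_all = pearson([logit(p) for p in SE], [logit(p) for p in SP])
r_loo = leave_one_out(SE, SP)
most_influential = max(range(len(SE)), key=lambda i: abs(r_all - r_loo[i]))
print(round(r_all, 2), round(r_loo[most_influential], 2), most_influential)
```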

PCR of enriched samples has been advocated as a fast, reliable test for detecting Salmonella in pigs, but the Se of PCR for detecting Salmonella without enrichment remains poor (Feder et al., 2001; Sibley et al., 2003; Mainar-Jaime et al., 2008). All studies used pre-enrichment and/or enrichment procedures; in general, Sep was reported to be good to excellent and varied among sample matrices. Previous research has shown that human faecal samples contain inhibitory substances which may interfere with PCR assays (Wilde et al., 1990; Monteiro et al., 1997), and these substances are likely present in pig faeces as well. Although some commercial PCR kits, such as the QIAGEN stool kit (QIAGEN Inc., Germantown, MD, USA), may contain reagents designed to block these inhibitors, in general it appears that Sep is higher for non-faecal sample matrices such as carcass swabs. PCR is faster and more cost-efficient than bacteriological culture (Monteiro et al., 1997; Malorny and Hoorfar, 2005), and a large number of samples can be tested at once (Meer and Park, 1995). Bacterial culture of PCR-positive samples is still necessary when isolates are required for serotype confirmation and antimicrobial-resistance testing. Identification of presumptive Salmonella colonies is the most time-, labour- and cost-intensive part of Salmonella culture; therefore, the use of broth-enriched PCR as a screening tool for pig faeces may improve time and cost effectiveness, particularly when prevalence is low (Singer et al., 2006).
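The workload argument can be sketched with expected-value arithmetic: only PCR-positive samples go on to culture, and the expected PCR-positive fraction is prevalence × Se + (1 − prevalence) × (1 − Sp). The sketch below uses the overall PCR means in Table 6 (Se = Sp = 91%) purely as round illustrative inputs; the prevalences, the culture Se of 0.80 and the function names are hypothetical, and the serial-testing formulas assume conditional independence of PCR and culture.

```python
def cultures_needed(n_samples: int, prevalence: float, se_pcr: float, sp_pcr: float) -> float:
    """Expected number of PCR-positive samples sent on for confirmatory culture."""
    p_pcr_positive = prevalence * se_pcr + (1.0 - prevalence) * (1.0 - sp_pcr)
    return n_samples * p_pcr_positive


def serial_accuracy(se_pcr: float, sp_pcr: float,
                    se_culture: float, sp_culture: float) -> tuple[float, float]:
    """Se and Sp of 'PCR screen, then culture PCR positives' (both must be positive).

    Assumes the two tests are conditionally independent given true status.
    """
    se = se_pcr * se_culture
    sp = 1.0 - (1.0 - sp_pcr) * (1.0 - sp_culture)
    return se, sp


# Illustrative inputs: 1000 samples, PCR Se = Sp = 0.91.
for prevalence in (0.05, 0.30):
    print(prevalence, round(cultures_needed(1000, prevalence, 0.91, 0.91)))

# Hypothetical culture Se of 0.80 with perfect culture Sp.
print(serial_accuracy(0.91, 0.91, 0.80, 1.0))
```

At low prevalence most samples screen negative and are never cultured, which is where the time and cost savings are greatest; the price is that the overall Se of the two-step scheme is lower than that of either test alone.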

We observed that the overall methodological soundness and/or reporting of primary research included in the review were limited. Although positive responses were required for only two criteria (sufficient details of the test protocol and sufficient data reported) for a study to be included in the review, the two other primary quality items, related to the time between tests and to the handling/storage of samples, were often not addressed. Negative responses to these latter two questions would have resulted in exclusion; however, no study received a negative response, so none was excluded on these criteria. Had failure to address these criteria been used for exclusion, half of the studies used in this review would have been excluded. Blinding, consistent reference standards, randomization and explanation of selection criteria are four other important quality criteria, and none of the studies in this review met all four. Lack of blinding can result in an exaggeration of measures of diagnostic accuracy (Knottnerus and Muris, 2003). Inaccuracies in the reference test will lead to over- or underestimation of the accuracy of the index test (Deeks, 2001). Lack of random selection may result in under- or over-representation of samples or subjects with the outcome of interest; test Se and Sp may differ under these conditions from those obtained when the same test is applied in the target population. If selection criteria are not described, it is not possible to determine whether subjects are an unbiased representation of the reference population (Greiner and Gardner, 2000). Estimates of Se and Sp may not generalize if the spectrum of tested subjects differs from that which will receive the test in practice (Whiting et al., 2003).

A recent SR evaluating rapid tests for bacterial intestinal pathogens in food and faeces also reported limited overall quality of included studies (Abubakar et al., 2007). Other researchers have concluded that the conduct of SRs and MA for the evaluation of diagnostic tests has been hampered by the poor quality of reporting of diagnostic studies (Irwig et al., 1994; Bossuyt et al., 2003; Khan, 2005). Whether the deficiencies observed in this review reflect poor study design or poor reporting is unknown and should be explored further. Efforts have been made in recent years to encourage standardized reporting of primary research and SRs, via projects such as the STARD initiative (Standards for Reporting Diagnostic Accuracy), the PRISMA statement (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), and the development of the QUADAS tool (Quality Assessment of Diagnostic Accuracy Studies) (Bossuyt et al., 2003; Whiting et al., 2003; Moher et al., 2009). Even within the length restrictions imposed by journals, the items listed in these tools need to be addressed. Although these tools were developed for the human health field, researchers in veterinary and agri-food public health should also be encouraged to adopt similar guidelines. The guidelines provided by these tools should also be considered during the design and conduct of studies of diagnostic accuracy.

Conclusion

The results of this SR demonstrate that there is considerable variability in the accuracy of bacterial culture and PCR protocols used for detecting Salmonella organisms or DNA in pigs. Individual study design and test protocol characteristics varied considerably among studies. We found that enrichment temperature, study population, agar and enrichment type can explain significant variability in culture Se, whereas sample type and study size were associated with variability in the Se and Sp of PCR assays. The use of varied reference standards makes direct comparison between studies tenuous; the absence of a single adequate reference standard is the most critical problem for the evaluation of diagnostic tests for Salmonella in pigs. The overall quality of existing primary research evaluating the accuracy of diagnostic tests for Salmonella in pigs is limited, owing to shortcomings in study design and reporting. Future studies in this area should follow guidelines such as the STARD checklist and QUADAS tool when designing and implementing studies and reporting their results.
