Single and Combined Diagnostic Value of Clinical Features and Laboratory Tests in Acute Appendicitis

Authors

  • Wytze Laméris,

    1. From the Department of Surgery (WL, AvR, MAB), the Department of Radiology (WL, AvR, JS), and the Department of Clinical Epidemiology, Biostatistics and Bioinformatics (PMMB), Academic Medical Center, University of Amsterdam, Amsterdam; the Department of Surgery, St. Antonius Hospital (PMNYHG), Nieuwegein; the Department of Surgery, Gelre Hospitals (WHB), Apeldoorn; and the Department of Surgery, Onze Lieve Vrouwe Gasthuis (SCD), Amsterdam, The Netherlands.
  • Adrienne Van Randen MD,

  • Peter M.N.Y.H. Go MD, PhD,

  • Wim H. Bouma MD, PhD,

  • Sandra C. Donkervoort MD,

  • Patrick M.M. Bossuyt MD, PhD,

  • Jaap Stoker MD, PhD,

  • Marja A. Boermeester MD, PhD


  • The Dutch Organization for Health Research and Development, Health Care Efficiency Research program, funded the study (ZonMw, Grant 945-04-308).

Address for correspondence and reprints: Marja A. Boermeester, MD, PhD; e-mail: m.a.boermeester@amc.uva.nl.

Abstract

Objectives:  The objective was to evaluate the diagnostic accuracy of clinical features and laboratory test results in detecting acute appendicitis.

Methods:  Clinical features and laboratory test results were prospectively recorded in a consecutive series of 1,101 patients presenting with abdominal pain at the emergency department (ED) in six hospitals. Likelihood ratios (LRs) and the areas under the receiver operating characteristic curve (AUC) were calculated for the individual features. Variants of clinical presentation, based on different combinations of clinical features, were investigated and the accuracies of combinations of clinical features were evaluated.

Results:  The discriminative power (AUC) of the individual features in patients with suspected appendicitis ranged from 0.50 to 0.65. For five of the 23 predictor sets, the accuracy for appendicitis was more than 85%; all five applied to male patients. The relative frequency of these predictor sets ranged from 2% to 13% of patients with suspected appendicitis. The combination of migration of pain to the right lower quadrant (RLQ) and direct tenderness in the RLQ was present in only 28% (120/422) of clinically suspected patients, of whom only 85 (71%) had appendicitis. A “classical” presentation (combination of migration of pain to the RLQ, tenderness in the RLQ, and rigidity) occurred in only 6% (25/422) of patients with suspected appendicitis and yielded an accuracy of 100% in males but only 46% in females.

Conclusions:  The discriminative power (AUC) of individual clinical features and laboratory test results for appendicitis was weak in patients with suspected appendicitis. Combinations of clinical features and laboratory tests with high diagnostic accuracy are relatively infrequent in patients with suspected appendicitis.

The clinical diagnosis of appendicitis in patients presenting with acute abdominal pain remains challenging. Despite widespread use of imaging, the medical history, physical examination, and laboratory tests contribute to the initial differentiation between acute appendicitis and other disorders causing acute abdominal pain. Physicians therefore want to know the diagnostic value of findings from the medical history, physical examination, and initial laboratory tests. Well-designed studies that prospectively investigated the complete set of clinical and laboratory features of acute appendicitis in a large consecutive series of patients with abdominal pain are scarce.

A meta-analysis by Andersson1 reported a weak discriminative power for most clinical features and laboratory test results in suspected appendicitis. Signs of peritoneal irritation and elevated inflammatory laboratory markers were found to be the strongest individual factors. The meta-analysis should be interpreted with caution because of the moderate quality of the included studies and the large heterogeneity of the pooled accuracy estimates.

Combined laboratory tests are reported to yield high diagnostic accuracy for appendicitis. For example, whereas the pooled likelihood ratio (LR) in the meta-analysis1 was 1.97 for C-reactive protein (CRP) levels above 12 mg/L and 2.47 for white blood cell (WBC) counts above 10 × 10⁹/L, the LR for the combination of elevated CRP and raised WBC count was reported to be between 8 and 23 in two studies.2,3
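The gap between the pooled single-test LRs and the reported combined LRs can be made concrete with a little arithmetic. If CRP and WBC count carried independent information about appendicitis (conditional independence given disease status, an assumption made here for illustration only, since both are inflammatory markers and are likely correlated), the LR for both being elevated would simply be the product of the two single-test LRs. The Python sketch below computes that product from the pooled values quoted above; the function name `combined_lr_independent` is hypothetical.

```python
def combined_lr_independent(*lrs: float) -> float:
    """Multiply likelihood ratios.

    Valid only if the tests are conditionally independent given disease
    status; this is an illustrative assumption, not a finding of the study.
    """
    result = 1.0
    for lr in lrs:
        result *= lr
    return result


# Pooled LR+ values quoted in the text (meta-analysis, reference 1)
LR_CRP_ABOVE_12 = 1.97   # CRP > 12 mg/L
LR_WBC_ABOVE_10 = 2.47   # WBC count > 10 x 10^9/L

naive = combined_lr_independent(LR_CRP_ABOVE_12, LR_WBC_ABOVE_10)
print(f"Naive combined LR+: {naive:.2f}")  # prints "Naive combined LR+: 4.87"
```

The product is about 4.9, so, assuming comparable cutoffs, combined LRs of 8 to 23 are well above what this simple independence model would give from the pooled single-test values.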

Combining laboratory findings with clinical features, such as migration of pain or rigidity, may lead to even better accuracy for appendicitis. Unfortunately, most previous studies have investigated only one or a few clinical features. The value of highly accurate combinations of features will also depend on the relative frequency of these combinations: highly accurate combinations that are seldom seen are likely to be of limited use in clinical practice.

We prospectively collected data on a complete set of clinical features and laboratory test results in a consecutive series of patients with abdominal pain, seen in the EDs of six hospitals. We examined the diagnostic value of these results and findings in patients with suspected appendicitis, both in isolation and in combination, and calculated their relative frequencies.

Methods

Study Design

This study is a preplanned secondary analysis of data that were collected in a multicenter prospective diagnostic accuracy study of the added value of imaging after clinical assessment in patients presenting to the emergency department (ED) with acute abdominal pain. This study was approved by the institutional review boards of the participating hospitals. Eligible patients were asked for written informed consent.

Study Setting and Population

Six hospitals in the Netherlands participated in the enrollment of patients, including two university hospitals (AMC Amsterdam, UMC Utrecht) and four large teaching hospitals (Sint Antonius Hospital, Nieuwegein; Gelre Hospital, Apeldoorn; Tergooi Hospital, Hilversum; and Onze Lieve Vrouwe Gasthuis, Amsterdam). Patients discharged from the ED without imaging were excluded. Patients in hemorrhagic shock due to gastrointestinal bleeding or ruptured aortic aneurysm were excluded, as were pregnant women.

Eligible patients were adults (>18 years) presenting to the ED with nontraumatic abdominal pain of more than 2 hours and less than 5 days' duration. These patients were either self-referred or referred by their general practitioner. Patients recently discharged after admission for reasons other than abdominal pain, who had developed abdominal pain as a new complaint, were also eligible, as were postoperative patients who had been pain-free before ED presentation.

Study Protocol

Data were recorded prospectively by the residents on a Web-based digital case record form. Completeness of data was monitored daily by the study coordinators; in case of incomplete data, residents were contacted and asked to complete the missing fields. Automatic time registration of data entry allowed monitoring of prospective real-time data recording. Patients were evaluated clinically, and in all patients a full blood count and CRP measurement were obtained. Thereafter, all findings from the medical history, physical examination, and laboratory tests were recorded in the digital case record form. The initial clinical diagnosis at the ED for each patient, based on clinical assessment and laboratory test results, was recorded before imaging.

A full imaging protocol was then performed for all patients, consisting of upright chest and supine abdominal plain radiographs, an abdominal ultrasound (US), and a computed tomography scan (CT) with intravenous contrast. Details on the diagnostic protocol have been published elsewhere.4

All included patients were followed for at least 6 months. Data on clinical, laboratory, and surgical findings; pathology results (available for all excised appendixes); imaging reports; and treatment outcome were collected. Follow-up data were retrieved from in-hospital digital patient information systems, and general practitioners were contacted to verify patient outcome.

A final diagnosis was assigned to every included patient by an expert panel. The expert panel was formed by two gastrointestinal surgeons and an abdominal radiologist with long-term clinical experience from each hospital. Panel members are listed in Appendix A. Each case was first evaluated by each panel member individually, with data presented in a standardized digital format, including data on clinical presentation, imaging results, patient management, and all available information collected during follow-up. Disagreement on final diagnosis was resolved during consensus meetings.

Data Analysis

Clinical Features and Laboratory Tests.  Prior to data analysis, we searched the medical literature for clinical features associated with acute appendicitis.1,5–7 Based on these findings, 20 of the variables collected in our study were selected for analysis. For each patient, we compared the presence or absence of these clinical features with the final diagnosis of acute appendicitis and calculated sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and LRs. We also calculated these statistics for the four possible combinations of CRP and WBC count. Based on the individual AUC values, we identified strong clinical features associated with appendicitis.
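The per-feature statistics described above can be computed from a 2 × 2 table that cross-classifies each feature (present or absent) against the final diagnosis. The Python sketch below is one way to do this; the function and field names are hypothetical, and the counts are invented for illustration rather than taken from the study. It assumes that, for a binary feature, the AUC is the trapezoidal area under the single-threshold ROC curve, which equals the mean of sensitivity and specificity. This reproduces, for example, the AUC of 0.65 reported for male sex in Table 2 ((0.59 + 0.71) / 2). Continuous variables such as age or body temperature require a full ROC analysis instead.

```python
from dataclasses import dataclass


@dataclass
class BinaryFeatureAccuracy:
    sensitivity: float
    specificity: float
    lr_pos: float
    lr_neg: float
    auc: float


def accuracy_from_2x2(tp: int, fp: int, fn: int, tn: int) -> BinaryFeatureAccuracy:
    """Accuracy of a present/absent feature against the reference standard.

    tp: feature present, appendicitis      fp: feature present, no appendicitis
    fn: feature absent, appendicitis       tn: feature absent, no appendicitis
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)   # undefined when specificity is 1
    lr_neg = (1 - sens) / spec   # undefined when specificity is 0
    # A binary feature gives one interior ROC point; the trapezoidal area
    # through (0, 0), (1 - spec, sens), (1, 1) equals (sens + spec) / 2.
    auc = (sens + spec) / 2
    return BinaryFeatureAccuracy(sens, spec, lr_pos, lr_neg, auc)


# Hypothetical counts, for illustration only (not study data)
result = accuracy_from_2x2(tp=60, fp=30, fn=40, tn=70)
print(result)
```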

Patient Profiles in Patients With Suspected Appendicitis.  We calculated the relative frequencies of combinations of classic appendicitis features in patients with suspected appendicitis. Patients were categorized into patient profiles based on their age, sex, clinical features, and laboratory test results. We calculated the relative frequency of the “classical” presentation of appendicitis (a patient presenting with a history of pain migration to the right lower quadrant [RLQ], direct tenderness in the RLQ, and rigidity), as well as the relative frequencies of presentations combining migration of pain to the RLQ and direct tenderness in the RLQ, with or without elevated laboratory test results, in all suspected patients, in those younger than 30 years, and in those older than 50 years. The diagnostic accuracy of all patient profiles was evaluated, and an effort was made to identify profiles with high diagnostic certainty. The accuracy of a patient profile was calculated as the proportion of patients with appendicitis within the profile. The chi-square test was applied to assess the statistical significance of differences in proportions.
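Profile accuracy and relative frequency, as defined above, are two simple proportions. The short Python sketch below (the helper name `profile_summary` is hypothetical) computes both for the two male profiles in Table 4 that combine direct RLQ tenderness and rigidity, without and with migration of pain. It shows how adding a predictor can raise accuracy while shrinking the share of suspected patients who fit the profile.

```python
def profile_summary(n_profile: int, n_appendicitis: int, n_suspected: int) -> tuple[float, float]:
    """Return (relative frequency %, observed proportion of appendicitis %) for one profile."""
    relative_frequency = 100 * n_profile / n_suspected   # share of all suspected patients
    accuracy = 100 * n_appendicitis / n_profile          # appendicitis within the profile
    return relative_frequency, accuracy


N_SUSPECTED = 422  # patients with clinically suspected appendicitis

# Counts from Table 4, profile 2a (male, direct RLQ tenderness and rigidity)
rows = [
    ("Male, RLQ tenderness + rigidity", 53, 47),
    ("  + migration of pain to the RLQ", 14, 14),
]
for label, n, n_app in rows:
    freq, acc = profile_summary(n, n_app, N_SUSPECTED)
    print(f"{label}: {freq:.1f}% of suspected patients, {acc:.0f}% appendicitis")
```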

Results

Between March 2005 and November 2006, a total of 1,101 eligible patients were registered. During a 6-month period, the percentage of approached patients who consented to participate in the study was measured in the initiating medical center; 98% of approached patients gave consent. A previous appendectomy had been performed in 79 patients; because appendicitis was thereby excluded in these patients (stump appendicitis did not occur), they were not included in the current analysis. Eighty patients had to be excluded because of missing data. No significant differences were observed between the excluded cases and the other patients in terms of age, sex, time of patient presentation, or type of patient presentation. The ages of the remaining 942 patients (55% female) ranged between 19 and 94 years, with a mean of 47 years (SD 17.3 years). Patient demographics are presented in Table 1. The majority of patients (76%) had been referred to the ED by a general practitioner; 17% were self-referrals, and 7% were referred by other medical specialties. A total of 118 doctors enrolled patients in this study; 74% of patients were evaluated by surgical residents and the other 26% by emergency medicine residents. Clinical experience of the residents ranged from 2 months to 8.7 years (mean ± SD = 25 ± 18.9 months).

Table 1. 
Demographics of 942 Patients Presenting With Acute Abdominal Pain

Patient demographics (n = 942) | No. (%)
Age, years (mean ± SD) | 47 ± 17.3
Females | 515 (55)
Duration of complaints, days (mean ± SD) | 1.7 ± 1.3
Race |
 White | 882 (94)
 African | 53 (6)
 Other | 7 (<1)
Patients with previous surgery | 295 (31)
Type of surgery* |
 Gynecologic | 135 (14)
 Gastrointestinal | 127 (13)
 Urologic | 20 (2)
 Vascular | 22 (2)
 Other | 39 (4)
Comorbidity |
 Cardiovascular disorder | 108 (11)
 Previous malignancy | 58 (6)
 COPD | 48 (5)
 HPB disorder | 20 (2)
 Previous sexually transmitted disease | 18 (2)
Suspected of having acute appendicitis† | 422 (45)
Admitted to the hospital | 612 (65)
Discharged from the ED | 330 (35)

  1. COPD = chronic obstructive pulmonary disease; HPB = hepatobiliary–pancreatic.

  2. *Certain patients underwent more than one type of surgical procedure.

  3. †Based on index clinical evaluation at the ED.

A flow diagram of the study is presented in Figure 1. In 422 of 942 patients (45%), acute appendicitis was assigned as the most likely clinical diagnosis after clinical evaluation at the ED. Acute appendicitis was the final diagnosis, as assigned by the expert panel, in 284 patients (prevalence 30%). The 422 patients with suspected appendicitis included 251 of these 284 patients (88%). Compared with the 251 correctly identified patients, the 33 patients with appendicitis who were not among the suspected patients were older (mean 49 vs. 40 years, p = 0.001), had a longer duration of complaints (mean 2.3 vs. 1.5 days, p < 0.001), less often had migration of pain to the RLQ (6% vs. 37%, p < 0.001) or tenderness only in the RLQ (33% vs. 65%, p = 0.001), and had a lower WBC count (mean 12.2 vs. 13.9 × 10⁹/L, p = 0.026). In 271 of the 284 patients (95%) with a final diagnosis of appendicitis, the appendix was surgically removed and pathologic examination showed acute appendicitis. Of the remaining 13 patients with a final diagnosis of acute appendicitis, 12 were treated conservatively, and one patient underwent US-guided drainage of an appendiceal abscess. In the conservatively treated patients, the expert panel assigned the diagnosis of appendicitis based on clinical presentation, US results, CT results, and clinical course. The mean ± SD time from the start of complaints to presentation at the ED of the patients with appendicitis was 1.6 ± 1.1 days.

Figure 1. Flow diagram of the study. CRF = case record form.

Clinical Features

The accuracy of the 20 preselected features potentially associated with appendicitis in patients with suspected appendicitis is summarized in Table 2. Their individual discriminative power was low, with AUC values ranging between 0.50 and 0.65. Male sex (LR+ = 2.0) and WBC count (LR+ = 2.1 for a WBC count between 15 × 10⁹/L and 20 × 10⁹/L) were the strongest predictors. Among patients with suspected appendicitis, tenderness in the RLQ did not discriminate between those with and without appendicitis.

Table 2. 
The Diagnostic Value of Clinical Features in 422 Patients With Suspected Appendicitis

Feature | No. | % | Sensitivity (%) | Specificity (%) | LR+ | LR– | AUC | 95% CI
Patient characteristics
 Male | 196 | 46 | 59 | 71 | 2.0 | 0.6 | 0.65 | 0.60–0.70
 Age | 422 | | | | | | 0.50 | 0.44–0.55
History
 History of RLQ pain | 364 | 86 | 86 | 13 | 1.0 | 1.0 | 0.50 | 0.44–0.55
 Pain migration to RLQ | 132 | 31 | 37 | 77 | 1.7 | 0.8 | 0.57 | 0.52–0.63
 Pain on movement | 265 | 63 | 63 | 38 | 1.0 | 1.0 | 0.51 | 0.45–0.56
 Anorexia | 206 | 49 | 53 | 58 | 1.3 | 0.8 | 0.56 | 0.50–0.61
 Duration of complaints | 422 | | | | | | 0.54 | 0.48–0.59
 Nausea | 236 | 56 | 61 | 52 | 1.3 | 0.7 | 0.57 | 0.51–0.63
 Progressive pain | 256 | 61 | 60 | 38 | 1.0 | 1.1 | 0.51 | 0.46–0.57
 History of fever | 119 | 28 | 29 | 73 | 1.1 | 1.0 | 0.51 | 0.45–0.57
 Vomiting | 114 | 27 | 34 | 83 | 2.0 | 0.8 | 0.58 | 0.53–0.64
 Diarrhea | 69 | 16 | 17 | 84 | 1.1 | 1.0 | 0.51 | 0.45–0.56
Physical examination
 Tenderness RLQ | 375 | 89 | 88 | 11 | 1.0 | 1.1 | 0.50 | 0.45–0.56
 Tenderness RLQ only* | 260 | 62 | 65 | 43 | 1.1 | 0.8 | 0.54 | 0.48–0.59
 Rebound tenderness | 265 | 63 | 63 | 37 | 1.0 | 1.0 | 0.50 | 0.44–0.55
 Rigidity | 104 | 25 | 30 | 84 | 1.9 | 0.8 | 0.57 | 0.51–0.62
 Abdominal tenderness | 410 | 97 | 97 | 2 | 1.0 | 1.4 | 0.50 | 0.45–0.56
 Rectal tenderness† | 29 | 7 | 5 | 90 | 0.6 | 1.1 | 0.52 | 0.46–0.58
 Body temperature | 422 | | | | | | 0.56 | 0.50–0.61
Laboratory examination
 CRP (mg/L)‡ | 316 | 75 | 79 | 31 | 1.1 | 0.7 | 0.55 | 0.49–0.60
  0–10 | 98 | 23 | | | 0.2
  10–20 | 36 | 9 | | | 1.1
  20–50 | 98 | 23 | | | 1.3
  50–100 | 93 | 22 | | | 1.0
 WBC count (×10⁹/L)§ | 301 | 71 | 81 | 43 | 1.4 | 0.4 | 0.62 | 0.57–0.68
  0–10 | 115 | 27
  10–15 | 179 | 42 | | | 1.1
  15–20 | 99 | 23 | | | 2.1

  1. AUC = area under the receiver operating characteristic curve; CRP = C-reactive protein; LR+ = positive likelihood ratio; LR– = negative likelihood ratio; RLQ = right lower quadrant; WBC = white blood cell.

  2. *No other quadrants were involved.

  3. †Rectal examination was performed in 403 patients.

  4. ‡Sensitivity, specificity, and likelihood ratios for CRP were calculated at a cutoff of >12 mg/L.

  5. §Sensitivity, specificity, and likelihood ratios for WBC count were calculated at a cutoff of >10 × 10⁹/L.
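For features reported in categories, such as the CRP and WBC strata in Table 2, a likelihood ratio can be given for each category: the proportion of patients with appendicitis who fall in that category divided by the proportion of patients without appendicitis who fall in it. The paper does not spell out how its stratum LRs were computed, so the following Python sketch assumes this standard definition; the helper name `stratum_likelihood_ratios` and the counts are hypothetical.

```python
def stratum_likelihood_ratios(
    with_disease: dict[str, int], without_disease: dict[str, int]
) -> dict[str, float]:
    """LR per result category: P(category | appendicitis) / P(category | no appendicitis).

    Raises ZeroDivisionError if a category contains no patients without appendicitis.
    """
    n_disease = sum(with_disease.values())
    n_no_disease = sum(without_disease.values())
    return {
        category: (with_disease[category] / n_disease)
        / (without_disease[category] / n_no_disease)
        for category in with_disease
    }


# Hypothetical counts per WBC category (x 10^9/L), for illustration only
appendicitis = {"0-10": 20, "10-15": 50, ">15": 30}
no_appendicitis = {"0-10": 50, "10-15": 40, ">15": 10}

for category, lr in stratum_likelihood_ratios(appendicitis, no_appendicitis).items():
    print(f"WBC {category}: LR = {lr:.2f}")
```

Unlike a single LR+ and LR– at one cutoff, stratum LRs keep the information that very high values are more suggestive than moderately raised ones.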

The LRs for the four combinations of CRP and WBC count are presented in Table 3. Of the individual laboratory parameters, CRP had less discriminative power for appendicitis than the WBC count: the AUC was 0.55 for CRP and 0.62 for WBC count. The combination of a CRP < 12 mg/L and a WBC count < 10 × 10⁹/L was associated with a low probability of acute appendicitis, yielding an LR– of 0.09. A CRP > 12 mg/L combined with a WBC count > 10 × 10⁹/L yielded an LR+ of 1.4. The diagnostic accuracy of the clinical features in all patients with acute abdominal pain is presented in Data Supplement S1 (available as supporting information in the online version of this paper).
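Likelihood ratios translate into post-test probabilities through Bayes' theorem in odds form: post-test odds = pretest odds × LR. As an illustration (our own arithmetic, not an analysis reported in the study), the Python sketch below takes the proportion of appendicitis among the suspected patients (251 of 422, about 59%) as the pretest probability and applies the LRs from Table 3 for the two concordant CRP/WBC combinations. The function name `post_test_probability` is hypothetical.

```python
def post_test_probability(pretest_prob: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pretest odds x LR."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)


# Pretest probability: 251 of the 422 suspected patients had appendicitis.
PRETEST = 251 / 422

# LRs from Table 3 for the two concordant CRP/WBC combinations
for label, lr in [("CRP >12 mg/L and WBC >10 x 10^9/L", 1.4),
                  ("CRP <12 mg/L and WBC <10 x 10^9/L", 0.09)]:
    print(f"{label}: {post_test_probability(PRETEST, lr):.0%}")
```

With these inputs, two elevated markers raise the probability of appendicitis only from about 59% to about 67%, whereas two normal markers lower it to about 12%, in line with the weak rule-in and stronger rule-out value described above.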

Table 3. 
Diagnostic Value of the Combined Laboratory Test Results for Acute Appendicitis in Patients With Suspected Appendicitis

Combined laboratory tests | Frequency of combination (% of patients) | LR | 95% CI
CRP >12 mg/L and WBC count >10 × 10⁹/L | 56 | 1.4 | 1.2–1.7
CRP <12 mg/L and WBC count >10 × 10⁹/L | 17 | 1.5 | 0.9–2.3
CRP >12 mg/L and WBC count <10 × 10⁹/L | 19 | 0.65 | 0.4–1.0
CRP <12 mg/L and WBC count <10 × 10⁹/L | 8 | 0.09 | 0.03–0.3

  1. CRP = C-reactive protein; LR = likelihood ratio; WBC = white blood cell.

Patient Profiles in Patients With Suspected Appendicitis

In Table 4, the relative frequencies of the patient profiles and the observed proportions of acute appendicitis within each profile are presented. The classical presentation of appendicitis with a history of pain migration to the RLQ, direct tenderness in the RLQ, and rigidity was present in 6% (25/422) of patients with suspected appendicitis (14 males [3%] and 11 females [3%]); all of the male patients but only 46% of the female patients had appendicitis. The combination of migration of pain to the RLQ and direct tenderness in the RLQ was present in only 120 of the 422 patients (28%) with suspected appendicitis, of whom only 85 (71%) had appendicitis. Overall, the relative frequency of combinations of strong clinical features in suspected patients was low, and there was large variability in the clinical presentation of patients with acute appendicitis. For five of the 23 patient profiles, the proportion of appendicitis was more than 85%; all five were profiles of male patients. The relative frequencies of these patient profiles ranged from 2% to 13%. The clinical presentation of direct tenderness in the RLQ and rigidity was present in 53 clinically suspected male patients (13%), with 89% (47/53) accuracy, and in 40 clinically suspected female patients (9%), with 50% (20/40) accuracy. This difference in profile accuracy between males and females was statistically significant (p < 0.001).
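The Methods state that differences in proportions were tested with the chi-square test. The Python sketch below applies a Pearson chi-square test to the male-versus-female comparison above (47 of 53 vs. 20 of 40). It assumes no continuity correction, which the paper does not specify, and uses the identity that for one degree of freedom the upper-tail probability is erfc(√(χ²/2)). The helper name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]], df = 1."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: males, females with direct RLQ tenderness and rigidity (Table 4, profiles 2a and 3a)
# Columns: appendicitis, no appendicitis
chi2, p = chi_square_2x2(47, 53 - 47, 20, 40 - 20)
print(f"chi2 = {chi2:.2f}, p = {p:.1e}")
```

This gives χ² ≈ 16.9 and p well below 0.001, consistent with the reported p < 0.001; with Yates' continuity correction, χ² is about 15.1 and p still stays below 0.001.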

Table 4. 
The Relative Frequency of Patient Profiles in Suspected Appendicitis with the Observed Proportion of Appendicitis Within Each Profile

Profiles in patients with suspected acute appendicitis | No. patients (% of 422) | No. with appendicitis | Observed proportion (%)
1. All patients with direct tenderness in the RLQ | 375 (89) | 222 | 59
 + migration of pain to the RLQ | 120 (28) | 85 | 71
 + CRP >12 mg/L and WBC count >10 × 10⁹/L | 67 (16) | 52 | 78
2a. Male, direct tenderness in the RLQ and rigidity | 53 (13) | 47 | 89
 + migration of pain to the RLQ | 14 (3) | 14 | 100
2b. Male, direct tenderness in the RLQ and rigidity | 53 (13) | 47 | 89
 + CRP <12 mg/L and WBC count <10 × 10⁹/L | 2 (0.5) | 1 | 50
3a. Female, direct tenderness in the RLQ and rigidity | 40 (9) | 20 | 50
 + migration of pain to the RLQ | 11 (3) | 5 | 46
3b. Female, direct tenderness in the RLQ and rigidity | 40 (9) | 20 | 50
 + CRP <12 mg/L and WBC count <10 × 10⁹/L | 3 (1) | 0 | 0
4. Male, <30 years, direct tenderness in the RLQ | 54 (13) | 41 | 75
 + migration of pain to the RLQ | 19 (5) | 15 | 79
 + CRP >12 mg/L and WBC count >10 × 10⁹/L | 8 (2) | 6 | 75
5. Female, <30 years, direct tenderness in the RLQ | 67 (16) | 30 | 45
 + migration of pain to the RLQ | 23 (5) | 9 | 39
 + CRP >12 mg/L and WBC count >10 × 10⁹/L | 10 (2) | 6 | 60
6. Male, >50 years, direct tenderness in the RLQ | 44 (10) | 31 | 71
 + migration of pain to the RLQ | 12 (3) | 11 | 92
 + CRP >12 mg/L and WBC count >10 × 10⁹/L | 9 (2) | 8 | 89
7. Female, >50 years, direct tenderness in the RLQ | 45 (11) | 21 | 47
 + migration of pain to the RLQ | 7 (2) | 4 | 57
 + CRP >12 mg/L and WBC count >10 × 10⁹/L | 4 (1) | 3 | 75

  1. CRP = C-reactive protein; RLQ = right lower quadrant; WBC = white blood cell.

Discussion

In this study, isolated clinical features had weak diagnostic value for acute appendicitis, and there was substantial variability in the clinical presentation of patients suspected of having acute appendicitis. The vast majority of patients did not have a classical presentation of acute appendicitis. The simultaneous occurrence of the strongest clinical features in suspected patients was rare, and several combinations of clinical features had a different diagnostic accuracy in males than in females. Strongly diagnostic patient profiles had a low relative frequency among the suspected patients.

Direct comparison of our accuracy results for the individual features with those described in the meta-analysis of the diagnostic value of clinical features and laboratory tests1 is difficult. Unlike the present study, which investigated a complete set of features in a consecutive series of patients, most studies included in the meta-analysis investigated only one or a few features. The summary accuracy statistics in the meta-analysis were therefore calculated from a different set of studies for each feature. Heterogeneity between studies was large, as reflected by a prevalence of appendicitis in individual studies ranging from 27% to 61%. Our finding that the strongest clinical features for appendicitis were tenderness in the RLQ, rigidity, migration of pain to the RLQ, and elevated inflammatory laboratory parameters is in line with the findings of the meta-analysis.1 However, combinations of CRP and WBC count did not yield the high accuracy reported by previous studies (LR+ = 1.4 vs. 8 and 23).2,3 There is no immediate explanation for this large difference, but the high values reported in the two previous studies seem overoptimistic, considering that the reported LRs of US and CT are only 4.5 and 9.3, respectively.8

Higher negative appendectomy rates are usually reported in females than in males.9–11 The current study confirms that clinically differentiating appendicitis from other conditions is especially difficult in females, probably because symptoms of gynecologic conditions can mimic those of acute appendicitis. Combinations of clinical features that displayed a high accuracy in male patients in our study were less accurate in female patients. This difference in accuracy of combined clinical features between males and females is important to consider in clinical practice.

Even when combinations of features yield high diagnostic accuracy for acute appendicitis, their clinical usefulness may be limited when patients infrequently present with such combinations. Adding clinical predictors to a patient profile increased the accuracy of the profile but lowered the proportion of patients with that profile. The use of clinical scoring systems for the diagnosis of appendicitis may be limited for the same reason, because high clinical scores are obtained only in patients in whom multiple clinical predictors are present.

Because the relative frequencies of a classical presentation and of other strongly diagnostic combinations of clinical features are low, diagnostic imaging will be warranted in a large proportion of patients with suspected appendicitis. Three recent meta-analyses8,12,13 showed excellent accuracy results for US and CT, with CT performing significantly better than US. Preoperative CT has been shown in multiple studies to improve patient outcome in appendicitis, for example, by decreasing negative appendectomy rates.10,14–17 However, in one study a steep increase in the use of CT from 40% to 70% of suspected patients did not decrease the negative appendectomy rate of 12% any further.17 A similar limitation was described by Raman et al.,10 who observed no effect on the negative appendectomy rate after CT use increased from 60% to 93%. Therefore, selecting patients for imaging with clinical decision tools is an option worth investigating.

We included a large series of consecutive ED patients in a multicenter study and recorded a complete set of clinical and laboratory data prospectively. Many physicians participated in the inclusion of patients, making the results more applicable to ED patients in general. Patients in our study were preselected before reaching the hospital, as the majority of study patients presenting to the ED were referred by a general practitioner. This referral pattern may differ from patient flow in other countries, so that their populations, and consequently the pretest probabilities of disease, may differ from ours.

Limitations

The discriminative power of clinical and laboratory features, especially the ability to rule out appendicitis in the absence of certain features, may be underestimated. Patients who were discharged from the ED without imaging ordered by the treating physicians were not included. These patients probably had only mild complaints and few clinical features that suggested a serious disorder. The clinical data of these true-negative patients were not collected and not included in our analysis.

Physicians were aware of the radiologic investigations that would follow after inclusion. This could have lowered their threshold for inclusion as a way of coping with diagnostic uncertainty. As a result, many patients with only a low clinical suspicion could have been labeled as patients with suspected appendicitis, which would lead to a low pretest probability of appendicitis among the suspected patients and a low relative frequency of typical clinical presentations of appendicitis in our study. This view is contradicted by the fact that in the 24 previous studies reporting on the diagnostic value of clinical features in patients with suspected appendicitis described in the meta-analysis,1 the mean proportion of appendicitis was only 40% (range = 27%–61%), which is lower than the proportion of acute appendicitis among our clinically suspected patients (59%). In three recent meta-analyses of US and CT in patients with clinically suspected appendicitis, the mean proportions of appendicitis were 45%, 48%, and 50%.8,12,13 The low relative frequency of typical patient presentations found in the present study is therefore most likely not the result of a low threshold for inclusion.

Inviting consecutive patients to participate in a study with a full diagnostic protocol may lead to selection bias, because patients with more serious conditions may be more willing to give consent. In the current study, we measured a refusal rate of only 2% among approached patients in the initiating hospital. Based on our discussions with the other participating hospitals, we have no reason to suspect that their refusal rate differed from ours. Patients overall were willing to undergo additional diagnostic imaging. Therefore, we see no grounds to suspect selection bias arising from the requirement that participants undergo three forms of imaging.

Clinical findings and laboratory test results were available to the expert panel that determined the final diagnosis. Including the index test in the definition of disease causes incorporation bias, as the association between the index test and the reference standard will be artificially inflated. In this study, however, the clinical findings and laboratory test results were just two of the data items available to the panel, which also had access to imaging results, treatment, results of treatment, and follow-up. Therefore, the effect of incorporation bias on the diagnostic accuracy of clinical features and laboratory tests is most likely limited.

No results can be reported for several variables reported by some to have diagnostic value for appendicitis, such as the psoas sign, pain before vomiting, and no previous episode of similar pain.7 These variables were not selected as potentially diagnostic variables prior to the study and were therefore not collected.

The results of a diagnostic accuracy study are often difficult to extrapolate. An artificial research environment and interobserver variability in test interpretation often affect the reproducibility of reported results, including those of clinical evaluation. To make our research environment less artificial, we attempted to mirror daily practice as closely as possible and therefore performed the study in a multicenter setting that included university and teaching hospitals. The reproducibility of medical history and physical examination in patients with abdominal pain is itself known to be limited. In one study, agreement between emergency physicians and residents on the presence of abdominal masses was reported to be excellent, but agreement was only moderate for abdominal tenderness, guarding, and distention.18 In another study, surgeons showed only fair to moderate agreement for most clinical findings in patients with acute abdominal pain.19 It is unclear to what extent this interobserver variability limits the generalizability of our results. As in daily practice, on-call residents at the ED performed the clinical evaluations in our study; as a result, 118 different residents participated. Because our results reflect a multicenter experience and the data were obtained by a large number of observers, they are more likely to apply to other ED settings.
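Interobserver agreement of the kind cited above is commonly summarized with Cohen's kappa, which corrects the observed agreement between two raters for the agreement expected by chance. The sketch below assumes, without having verified it, that the cited studies used kappa. The qualitative labels follow the Landis and Koch scale, which is only one of several conventions; the word "excellent" in the text may come from a different scale. The ratings are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters classifying the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of patients on whom the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa: float) -> str:
    """Qualitative label on the Landis and Koch (1977) scale (an assumption)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical example: two residents judging abdominal tenderness in 10 patients.
resident_1 = ["yes"] * 4 + ["no"] * 6
resident_2 = ["yes", "yes", "yes", "no", "yes"] + ["no"] * 5
k = cohens_kappa(resident_1, resident_2)
print(f"kappa = {k:.2f} ({landis_koch_label(k)})")  # kappa = 0.58 (moderate)
```

Here the residents agree on 8 of 10 patients (80%), but because chance alone would produce 52% agreement, kappa is only about 0.58. This is why a raw percentage agreement can overstate how reproducible a physical examination finding is.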

A limitation of studies that investigate medical history, physical examination, and laboratory tests is that the results are often based on a single assessment of the patient, whereas signs and symptoms may progress over time. The results of our study are based on the initial evaluation at the ED, which revealed a low relative frequency of the classical presentation of appendicitis. The proportion of patients with typical symptoms of appendicitis may increase after a period of observation. A study of repeated clinical and laboratory examinations reported an increase in the diagnostic value of body temperature and laboratory test results after observation.20 However, that study still reported a negative appendectomy rate of 25% among patients who underwent surgery for suspected appendicitis after observation. Moreover, decisions on further investigation or admission of patients with abdominal pain are usually made in the ED, and repeated assessment during clinical observation may be less efficient than a single assessment followed by (selective) imaging. Therefore, our study focused on the initial clinical assessment of patients in the ED.

Conclusions

The discriminative power of clinical features and laboratory test results was weak in patients presenting with suspected appendicitis. Combinations of clinical features and laboratory tests with high diagnostic accuracy are relatively infrequent in patients with suspected appendicitis.

Appendix A

OPTIMA Trial Expert Panel Members

Academic Medical Centre, Amsterdam

O.R.C. Busch, Department of Surgery

T.M. van Gulik, Department of Surgery

O.D. Henneman, Department of Radiology, Bronovo Hospital, Den Haag

Tergooi Hospitals, Hilversum

A.A.W. van Geloven, Department of Surgery

J.W. Juttmann, Department of Surgery

E.M. van Keulen, Department of Radiology

Onze Lieve Vrouwe Gasthuis, Amsterdam

S.C. Donkervoort, Department of Surgery

J. Peringa, Department of Radiology

M.P. Simons, Department of Surgery

Sint Antonius Hospital, Nieuwegein

H.W. van Es, Department of Radiology

P.M.N.Y.H. Go, Department of Surgery

M.J. Wiezer, Department of Surgery

Gelre Hospitals, Apeldoorn

W.H. Bouma, Department of Surgery

E.J. Hesselink, Department of Surgery

W. ten Hove, Department of Radiology
