Comparison of Pediatric Emergency Physicians’ and Surgeons’ Evaluation and Diagnosis of Appendicitis


  • Anupam B. Kharbanda MD,
  • Steven J. Fishman MD,
  • Richard G. Bachur MD

From the Division of Emergency Medicine, Morgan Stanley Children’s Hospital of New York, Columbia University (ABK), New York, NY; and the Department of Surgery (SJF) and the Division of Emergency Medicine (RGB), Children’s Hospital Boston, Harvard Medical School, Boston, MA.

  • Presented in part at the annual meeting of the Pediatric Academic Societies, Washington, DC, May 17, 2005.

Address for correspondence and reprints: Anupam Kharbanda, MD; e-mail:


Objectives:  To compare the interexaminer reliability and ability to predict appendicitis between pediatric emergency physicians (EPs) and senior surgical residents.

Methods:  The authors conducted a prospective cohort study of children 3 to 18 years of age with signs and symptoms suspicious for appendicitis. Patients were initially examined by a pediatric EP attending and then by a consulting senior surgical resident. Physicians reported the presence or absence of specific historical and physical exam findings and predicted the likelihood that the patient had appendicitis. Interexaminer reliability of historical and physical exam findings was compared (kappa statistic). Distributions and median probabilities of appendicitis were calculated for pediatric EP and surgeon predictions.

Results:  The authors evaluated 350 patients with acute abdominal pain. Historical questions revealed slight to very good agreement (kappa statistic range 0.33–0.82) between physician types, whereas physical examination findings exhibited poor to fair agreement (range 0.14–0.48). Physicians predicted similar median probabilities of appendicitis for patients who were ultimately diagnosed with appendicitis (75% vs. 70%; p = 0.73) and patients without appendicitis (25% vs. 30%; p = 0.59). For a subset of patients given a ≥ 90% predicted probability of appendicitis, pediatric EPs and senior surgical residents had similar accuracy (80% vs. 79%; p = 0.92). Similarly, among patients with ≤ 10% predicted probability, pediatric EPs were correct in 95% and senior surgical residents correct in 94% of patients (p = 0.63).

Conclusions:  Pediatric EPs and senior surgical residents elicit historical findings from patients with suspected appendicitis with a greater degree of similarity than physical examination findings, which exhibit a wide degree of variability. Pediatric EPs and senior surgical residents do not differ in their ability to clinically predict appendicitis. These findings may be helpful in developing institutional management protocols.

Appendicitis is the most common nontraumatic surgical emergency in children.1 Although the diagnosis of appendicitis in children is often challenging, many cases can be identified by history and physical examination alone.2–5 The sensitivity and specificity of historical and physical findings in predicting appendicitis have been reported previously.2,6 However, these reports should be interpreted with caution, because there may be wide variation in eliciting common physical findings among physicians of different levels of training and specialty.7,8 This is especially relevant in academic medical centers, because multiple physicians are involved in the care of children with possible appendicitis. Variability in the physical examination can lead to unnecessary testing and delayed operative care.

Although experience is likely to significantly impact the reliability of history and physical examination findings, some have also argued that certain specialists (e.g., surgeons) are more skilled in performing abdominal examinations.9 This has led several institutions to create practice guidelines mandating early surgical involvement for any child being evaluated for appendicitis. The benefit of these guidelines is that if a surgeon, with high likelihood, predicts that a patient has appendicitis, the patient could proceed directly to the operating room, avoiding the delay, expense, and possible risks inherent in imaging studies.10–13 Although it may be ideal to involve surgeons early in the care of patients with suspected appendicitis, surgeons, especially those with pediatric expertise, may not be immediately available in every setting. In addition, little research is currently available to support whether surgeons or emergency physicians (EPs) are better able to clinically diagnose appendicitis. Thus, we sought to determine whether the physical examination findings and overall clinical impression of children with acute abdominal pain suspicious for appendicitis differed between senior general surgery residents and pediatric EPs.

Methods
Study Design

This was a prospective cohort study. The study was approved by the Committee on Clinical Investigation at Children’s Hospital, Boston. Informed consent was obtained from all participating pediatric EPs and surgical residents. Informed consent was also obtained from all parents, and assent was obtained from children over age 7 years. This study was compliant with the Health Insurance Portability and Accountability Act of 1996.

Study Setting and Population

This study was conducted at an urban, tertiary-care, pediatric emergency department (ED), with approximately 52,000 visits per year. From April 2003 to July 2004, we prospectively enrolled children between 3 and 18 years of age who underwent surgical consultation for possible appendicitis. All children and adolescents who presented to the ED with acute abdominal pain were initially evaluated by a pediatric EP attending. In agreement with an institutional clinical practice guideline, pediatric EP attendings obtained surgical consultation for all patients with clinical presentations suspicious for appendicitis. Senior surgical residents and pediatric EP attendings were instructed to independently complete data collection forms before any diagnostic imaging or operative care. The surgical evaluation was conducted by a fourth-year general surgery resident from one of three Harvard Medical School surgical residencies rotating on the pediatric surgical service. Surgical consultation generally occurred within 1 hour of the request for consultation, per the hospital’s clinical practice guideline. Per the guideline, surgical residents were required to examine patients prior to any ultrasound (US) or computed tomography (CT) studies. Patients were excluded if they were pregnant, had undergone prior abdominal surgery, suffered from chronic medical conditions (e.g., cystic fibrosis, inflammatory bowel disease, sickle cell anemia), or had radiologic studies (CT or US) of the abdomen within the previous 2 weeks. Patients who had laboratory studies or plain radiographs prior to their ED evaluation were included in the study population, and these results were made equally available to both the pediatric EP and surgical consultant.

Study Protocol

Pediatric EPs and senior surgical residents were oriented to the data collection forms prior to involvement in the study. Pediatric EPs and senior surgical residents were informed that the goal of the study was to identify significant predictors of appendicitis.

The standardized data collection forms consisted of 24 demographic, historical, and physical examination variables. Historical elements were history of fever, nausea, anorexia, emesis, diarrhea, number of hours of pain, migration of pain, history of focal right lower quadrant pain, pain onset, pain quality, and ability to walk. Physical examination variables were location of abdominal tenderness and point of maximal tenderness; presence of tenderness with percussion, cough, or hopping; rebound tenderness; guarding; rectal tenderness; bowel sounds; costovertebral angle tenderness; and psoas, obturator, or Rovsing’s sign. For patients undergoing a pelvic examination, presence of adnexal pain or cervical motion tenderness was recorded. Physicians were also asked to predict the likelihood or probability that the patient had appendicitis on a continuous scale, 0% to 100%. Each physician obtained historical and physical examination information independently and was instructed to not compare findings. If a physician was unable to complete any part of the survey, he/she was instructed to mark “unable to assess.” Completed forms were placed in a locked drop box.


Our main outcome was the presence or absence of appendicitis. Final diagnosis was determined by pathology for patients who had an appendectomy. A perforated appendix was determined by the attending surgeon’s written postoperative diagnosis. For patients who did not have surgery, the outcome was confirmed by a follow-up phone call 2 to 4 weeks following the ED visit. If the family could not be reached, the patient’s pediatrician was contacted to determine the final diagnosis. In cases where phone follow-up or pediatrician follow-up could not be obtained, the institution’s electronic medical record system was reviewed for a subsequent visit to the hospital greater than one month from the index encounter to determine if the patient underwent an appendectomy.

Data Analysis

Statistical analysis was performed using Statistical Package for the Social Sciences (Version 12.0, SPSS Inc., Chicago, IL). For each historical and physical examination finding, 2 × 2 tables were constructed comparing the presence or absence of the finding as recorded by the pediatric EP and by the surgeon. If the physician had marked “unable to assess” on the data collection form, the data element was coded as missing. The kappa statistic was calculated for each comparison. A kappa of 0.01–0.20 was considered poor agreement, 0.21–0.40 slight, 0.41–0.60 fair, 0.61–0.80 good, 0.81–0.92 very good, and greater than 0.92 excellent clinical agreement.14
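As an illustration of the agreement calculation described above, the following is a minimal sketch of Cohen's kappa for a single 2 × 2 table, together with the agreement categories listed in the text. The function names and the cell counts are hypothetical and are not taken from the study data; the handling of values that fall between or below the listed ranges is also an assumption.

```python
def cohen_kappa(both_yes, ep_only, surgeon_only, both_no):
    """Cohen's kappa for two raters from the four cells of a 2 x 2 table."""
    n = both_yes + ep_only + surgeon_only + both_no
    if n == 0:
        raise ValueError("table is empty")
    # Observed agreement: both raters said "present" or both said "absent".
    observed = (both_yes + both_no) / n
    # Marginal proportion of "present" for each rater.
    ep_yes = (both_yes + ep_only) / n
    surgeon_yes = (both_yes + surgeon_only) / n
    # Agreement expected by chance from the marginals alone.
    expected = ep_yes * surgeon_yes + (1 - ep_yes) * (1 - surgeon_yes)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Map kappa onto the categories in the text (boundary handling assumed)."""
    if kappa > 0.92:
        return "excellent"
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "fair"
    if kappa > 0.20:
        return "slight"
    if kappa >= 0.01:
        return "poor"
    return "below listed range"  # assumption: the text defines no label here


# Hypothetical counts for one finding (not study data):
# both present = 50, EP only = 10, surgeon only = 10, both absent = 30.
example_kappa = cohen_kappa(50, 10, 10, 30)
example_label = agreement_label(example_kappa)
print(round(example_kappa, 2), example_label)
```

Findings marked “unable to assess” by either physician would simply be left out of the four counts, which matches coding them as missing.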

The physician-predicted likelihood, or probability, of appendicitis was analyzed for patients with and without appendicitis. Probability ranges, distributions, and median probabilities were calculated for pediatric EPs and senior surgical residents. Because the distributions were skewed, and the pediatric EP and senior surgical resident predictions were determined independently, we compared the probabilities (of appendicitis) by physician type using the Mann-Whitney U-test.

Additionally, we calculated the difference between the probabilities by pediatric EP and surgeon for individual patients. This “probability difference” was calculated as the pediatric EP probability of appendicitis minus the surgeon’s estimate of the probability of appendicitis. The distribution, range, and median probability difference were examined to determine the variability of the probability for individual patients. The Wilcoxon signed-rank test was used to evaluate whether predictions differed significantly for individual patients. We also utilized linear regression to examine the correlation between senior surgical resident and pediatric EP predictions.

In a final analytic procedure, pediatric EP and surgeon predictions were compared to patients’ final diagnoses. Multiple cutoffs for physician clinical predictions of appendicitis (from 50% to 90%) were examined. For example, at a cutoff of 90%, physicians were classified as correctly predicting appendicitis if patients with likelihoods of 90% or greater ultimately had pathology-confirmed appendicitis. A likelihood cutoff of 10% was also evaluated for ruling out appendicitis. The accuracy of pediatric EP and surgeon predictions at various cutoffs was then compared using chi-square analysis and by constructing receiver operating characteristic (ROC) curves.
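The per-patient “probability difference” defined above (pediatric EP prediction minus surgeon prediction) can be sketched as follows. This is only an illustration with hypothetical predictions; the function names are invented here, the quartile method is an assumption (the source does not say how the IQR was computed), and the Wilcoxon signed-rank test itself is not implemented.

```python
from statistics import median, quantiles


def probability_differences(ep_predictions, surgeon_predictions):
    """Per-patient difference: pediatric EP prediction minus surgeon prediction."""
    if len(ep_predictions) != len(surgeon_predictions):
        raise ValueError("each patient needs one prediction from each physician")
    return [ep - s for ep, s in zip(ep_predictions, surgeon_predictions)]


def summarize(differences):
    """Median, interquartile range, and range of the paired differences."""
    # Assumption: "inclusive" quartiles; other quartile conventions exist.
    q1, _, q3 = quantiles(differences, n=4, method="inclusive")
    return {
        "median": median(differences),
        "iqr": (q1, q3),
        "range": (min(differences), max(differences)),
    }


# Hypothetical predicted probabilities (%) for seven patients, not study data.
ep = [90, 50, 20, 75, 10, 60, 30]
surgeon = [80, 70, 20, 60, 25, 60, 40]

diffs = probability_differences(ep, surgeon)
summary = summarize(diffs)
print(diffs)
print(summary)
```

A median difference near zero, as in this toy example, would indicate no systematic over- or underestimation by either physician type, while the range shows how far individual paired predictions can still diverge.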

Results
Study Population

Over the 15-month study period, 350 patients with acute abdominal pain underwent evaluation by both a pediatric EP attending and a senior surgical resident. The median age of enrolled patients was 11.4 years (interquartile range [IQR] 8.4–14.7 years). A total of 191 patients (55%) were male. A total of 134 patients (38%) had pathology-confirmed appendicitis, 23 of whom (17%) had a perforated appendix. After ED evaluation, 163 patients (47%) were discharged home, 139 (40%) went to the operating room, and 48 (13%) were admitted for observation. Telephone follow-up was completed on 342 patients and electronic health record review on 8 patients for a total follow-up rate of 100%.

We searched the ED, pathology, and radiology databases to identify patients who were eligible for enrollment but were missed. Our capture rate for the time period studied was 92%. Although we did not collect data on time between pediatric EP consult and surgical evaluation in this study, quality improvement data from our ED reveal that in May and June of 2004 the median time for surgical consultation initiation (phone call to time resident arrived in the ED) was 19 minutes (range 1–35 minutes) for day shifts and 20 minutes (range 1–155 minutes) on nights and weekends. Seventy percent of all surgical consultations are started within 30 minutes of the consult request.

Over the course of the study period, 48 pediatric EPs and 29 fourth-year surgical residents participated in this study. The pediatric EPs who participated in the study were all board-eligible/certified in pediatrics and board-eligible/certified in pediatric emergency medicine.

Comparison of Historical and Physical Examination Findings

The agreement between pediatric EPs and senior surgical residents for each of the historical and physical examination findings is presented in Table 1. Overall agreement ranged from 61% (presence of guarding) to 91% (presence of emesis). The kappa statistics revealed poor to fair agreement between pediatric EPs and senior surgical residents for physical examination findings (range 0.14–0.48), and slight to very good agreement for historical questions (range 0.33–0.82).

Table 1.   Comparison of Agreement among Senior Surgical Residents and Pediatric EPs

                                   Overall Kappa         Kappa, Appendicitis     Kappa, No Appendicitis     Overall
                                                         Cases                   Cases                      Agreement (%)
History
 Duration of pain                  0.53 (0.43, 0.63)*    0.59 (0.44, 0.79)       0.48 (0.34, 0.61)          77
 Nausea                            0.54 (0.45, 0.64)     0.46 (0.26, 0.65)       0.54 (0.42, 0.65)          79
 Emesis                            0.82 (0.76, 0.89)     0.76 (0.64, 0.88)       0.84 (0.77, 0.92)          91
 Anorexia                          0.39 (0.29, 0.49)     0.35 (0.16, 0.54)       0.36 (0.24, 0.49)          71
 Diarrhea                          0.69 (0.59, 0.79)     0.59 (0.42, 0.77)       0.74 (0.63, 0.86)          90
 History of migration              0.43 (0.33, 0.53)     0.46 (0.30, 0.61)       0.35 (0.20, 0.49)          74
 History of focal RLQ pain         0.52 (0.42, 0.62)     0.33 (0.14, 0.53)       0.56 (0.45, 0.67)          79
 Able to walk without discomfort   0.33 (0.23, 0.44)     0.26 (0.06, 0.45)       0.28 (0.14, 0.42)          68
Physical examination
 Maximal pain in RLQ               0.48 (0.38, 0.58)     0.34 (0.12, 0.55)       0.47 (0.36, 0.59)          78
 Guarding                          0.23 (0.13, 0.33)     0.16 (0.01, 0.33)       0.21 (0.07, 0.33)          61
 Rebound                           0.28 (0.17, 0.40)     0.17 (0.01, 0.35)       0.22 (0.05, 0.38)          70
 Pain with percussion              0.32 (0.22, 0.42)     0.14 (0.02, 0.30)       0.29 (0.16, 0.43)          65
 Obturator sign                    0.28 (0.13, 0.43)     0.36 (0.15, 0.57)       0.12 (–0.07, 0.32)         79
 Rovsing’s sign                    0.41 (0.30, 0.53)     0.35 (0.18, 0.52)       0.42 (0.25, 0.58)          78
 Psoas sign                        0.38 (0.25, 0.51)     0.43 (0.24, 0.61)       0.26 (0.08, 0.45)          78
 Character of bowel sounds         0.14 (0.02, 0.29)     0.09 (–0.15, 0.33)      0.12 (–0.08, 0.31)         68

EPs = emergency physicians; RLQ = right lower quadrant; CIs = confidence intervals.
*95% CIs.

Prediction of Appendicitis

Pediatric EPs and senior surgical residents were independently asked to predict the probability (likelihood) that a patient had appendicitis. Figure 1 illustrates the range of probabilities predicted by pediatric EPs and senior surgical residents, and the distribution of patients who were ultimately diagnosed with appendicitis. Among patients with proven appendicitis, predictions ranged from 5% to 100% by both pediatric EPs and senior surgical residents, highlighting the difficulty of making the correct diagnosis. In addition, pediatric EPs and senior surgical residents predicted similar median probabilities for patients with pathology-proven appendicitis (75% vs. 70%; p = 0.73; Table 2). These results are presented graphically in Figure 2. For patients who did not have appendicitis, the median probabilities predicted by the two physician types were lower and not statistically different (25% vs. 30%; p = 0.59). The predicted likelihood of appendicitis was also compared between physician types via linear regression. This analysis revealed a correlation coefficient of 0.61 (p < 0.001), indicating a highly significant correlation between the predictions of the two physician types. Last, ROC curves graphically reveal the similar accuracy of pediatric EPs and senior surgical residents over the range of possible predicted probabilities (Figure 3).
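For readers unfamiliar with ROC analysis, the area under an ROC curve can be computed directly from predicted probabilities and final diagnoses: it equals the proportion of (appendicitis, no appendicitis) patient pairs in which the patient with appendicitis received the higher prediction, counting ties as one half. The sketch below illustrates this with hypothetical data. The source presents the ROC curves only graphically and reports no area values, so nothing here reproduces a study result, and the function name is invented for illustration.

```python
def roc_auc(predictions, has_appendicitis):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [p for p, y in zip(predictions, has_appendicitis) if y]
    negatives = [p for p, y in zip(predictions, has_appendicitis) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one patient with and one without disease")
    score = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                score += 1.0   # diseased patient ranked higher: concordant pair
            elif pos == neg:
                score += 0.5   # tied predictions count as half
    return score / (len(positives) * len(negatives))


# Hypothetical predicted probabilities (%) and final diagnoses, not study data.
predicted = [90, 75, 50, 50, 30, 10]
diagnosis = [True, True, False, True, False, False]

auc = roc_auc(predicted, diagnosis)
print(round(auc, 3))
```

Computing this area separately for each physician type on the same patients gives one number per curve, which is a common way to summarize a comparison like the one shown in Figure 3.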

Figure 1.   Predicted probability by physician type. EP = emergency physician.

Table 2.   Comparison of the Predictions of Appendicitis between Senior Surgical Residents and Pediatric EPs

                                        Pediatric EP          Surgeons              p-Value
Patients with appendicitis
 Median probability of appendicitis     75% (IQR 50, 90)      70% (IQR 50, 90)      0.73
 Median probability difference*         0 (IQR –20, 10)
Patients without appendicitis
 Median probability of appendicitis     25% (IQR 10, 50)      30% (IQR 10, 60)      0.59
 Median probability difference*         0 (IQR –20, 15)

EP = emergency physician; IQR = interquartile range.
*Probability difference was calculated for individual patients as difference in prediction for pediatric EP minus prediction by senior surgical resident.

Figure 2.   (A) Surgical resident predicted probability for appendicitis. (B) Pediatric EP predicted probability for appendicitis. EP = emergency physician.

Figure 3.   ROC analysis of pediatric EP and senior surgical resident–predicted probabilities. ROC = receiver operating characteristic; EP = emergency physician.

While the median probability of appendicitis was similar between physician groups, we also sought to determine whether pediatric EPs and senior surgical residents made similar predictions for individual patients. Therefore, for each patient, we calculated the difference between pediatric EP and surgeon predictions. For patients with and without appendicitis, the median probability differences were zero, indicating that senior surgical residents and pediatric EPs predicted essentially the same likelihood of appendicitis for individual patients (Table 2). In addition, when evaluating paired predictions, there was no significant difference between the pediatric EP and surgeon probabilities (p = 0.32 among patients with appendicitis and p = 0.45 among patients without appendicitis).

Extreme Prediction Accuracy

We suspected that physicians might have improved accuracy among those patients for whom they offered an extreme prediction for the presence or absence of appendicitis. To test this hypothesis, we stratified patients by the pediatric EP or surgeon predicted probability for appendicitis. When patients were stratified in this manner, pediatric EPs and senior surgical residents once again performed similarly (see Table 3). For the extreme predictions, pediatric EPs identified 55 patients as having ≥ 90% chance of appendicitis and they were correct in 44 cases (80%), whereas senior surgical residents identified 53 patients and were correct in a similar number (42, 79%; p = 0.92). When giving an extreme negative prediction (≤10%) for appendicitis, both physician groups had higher accuracy. Pediatric EPs identified 66 patients as low risk for appendicitis and were correct in 63 (95%), and senior surgical residents identified 78 patients as low risk and were correct in 73 (94%; p = 0.63). Pediatric EP and surgeon accuracy was not statistically different for other prediction cutoffs, as illustrated in Table 3.

Table 3.   Stratified Predictions of Appendicitis by Senior Surgical Residents and Pediatric EPs

Prediction of Appendicitis    Number of Patients Correctly Identified      p-Value
(Probability, %)              Pediatric EP (% Correct)    Surgeons
≥90                           44/55 (80)                  42/53 (79)       0.92
≥80                           60/81 (74)                  61/87 (70)       0.45
≥70                           80/114 (70)                 85/124 (69)      0.79
≥60                           91/138 (66)                 95/149 (64)      0.70
≥50                           107/178 (60)                114/196 (58)     0.70
≤10                           63/66 (95)                  73/78 (94)       0.63

EP = emergency physician.
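To make the accuracy comparison in Table 3 concrete, the sketch below applies a Pearson chi-square test to two rows of the table (the ≥ 90% and ≤ 10% cutoffs). It assumes an uncorrected test with one degree of freedom that treats the two physician groups as independent samples. The source does not state which chi-square variant was used, so this should not be read as the authors' exact procedure, and the function name is hypothetical.

```python
import math


def chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Uncorrected Pearson chi-square comparing two proportions (1 df)."""
    table = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [correct_a + correct_b, n - correct_a - correct_b]
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("chi-square is undefined when a margin is zero")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from Table 3: correct / identified for each physician type.
_, p_high = chi_square_2x2(44, 55, 42, 53)   # predictions >= 90%
_, p_low = chi_square_2x2(63, 66, 73, 78)    # predictions <= 10%
print(round(p_high, 2), round(p_low, 2))
```

Under these assumptions the two rows shown come out close to the reported p-values of 0.92 and 0.63. That agreement does not establish which test variant the authors ran, and other rows of the table may not match this simple calculation.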

Discussion
Medical practice in academic settings has evolved to involve multiple caregivers of different levels of training. This is especially true in EDs. Two previously published studies have found moderate to poor agreement in physical examination findings between physicians of different levels of training.7,8 Interobserver agreement and accuracy must be considered when evaluating best practices for patient care as well as in the development of clinical pathways. In addition, the reliability among health care providers of historical and physical examination findings must be measured when determining the usefulness of these predictors in clinical decision rules.

Few prior authors have examined the interrater agreement of physical examination findings in patients with abdominal pain. Pines et al.8 recently published a prospective study of 122 adults with abdominal pain in which the interrater reliability of the physical examination was compared between residents and attending EPs. The authors reported that the interrater agreement for physical examination findings was highly variable, with kappas ranging from 0.27 to 0.82. Only fair agreement was found for the presence of rebound pain and guarding (kappas of 0.49 and 0.42, respectively).8 Similar to our study, agreement for right lower quadrant (RLQ) pain in the study by Pines et al. was fair (kappa of 0.40; 95% confidence interval [CI] = 0.20 to 0.59).

One previous study, by Yen et al.,7 examined interobserver agreement for physical examination in pediatric patients with abdominal pain. In this study of 68 children examined by a pediatric resident, attending physician, or surgeon, the physical examination was not reliable. Only rebound pain had fair agreement with a kappa of 0.54 (95% CI = 0.008 to 1.07). These results are similar to ours in that the majority of physical examination findings showed poor to fair agreement between senior surgical residents and pediatric EPs. The only physical examination elements to reach fair agreement in our study were the presence of RLQ pain and the presence of Rovsing’s sign. The degree of variability in the physical examination reported by Pines et al. and Yen et al. and supported in this study is concerning given that, in teaching hospitals, attending physicians often rely on residents’ examinations. The fact that physical examination findings were so variable may indicate that patients with abdominal pain have changing examinations over time. This would support the role of multiple examinations performed over time to gain a true understanding of the patients’ physical examination findings and disease progression.

In addition to comparing physical examination findings, our study is, to the best of our knowledge, the first to compare the interrater reliability of historical findings between physicians. In contrast to the wide range of agreement for physical examination findings, we found that pediatric EP attendings and senior surgical residents exhibited slight to very good agreement for the majority of measured historical variables. The highest degree of agreement was seen for the presence of emesis and diarrhea (kappas of 0.82 and 0.69, respectively). This may be because historical items are elicited from the patient and families, whereas physical examination findings are in part determined by skill and time of examination.

Last, our study was unique in that we directly compared senior surgical residents and pediatric EPs in their ability to predict pediatric appendicitis. We report that pediatric EPs and surgical residents are equivalent in their ability to clinically predict appendicitis. For individual patients, pediatric EPs and senior surgical residents predicted essentially identical probabilities of appendicitis without any trends to suggest over- or underestimation by either group (median difference 0). In addition, pediatric EPs and senior surgical residents are able to identify, with high accuracy, a subset of patients at very low risk for appendicitis. These findings suggest that even though the physical examination findings between the groups differed, the interpretation and synthesis of the historical and physical examination findings result in a similar overall clinical impression. It appears that overall clinical impression relies on multiple parameters, such as physician training, experience, and clinical acumen, items that we did not explicitly address in our study design. Furthermore, it may be that physicians were able to rely on subtle physical examination findings that allowed them to gain insight into the patient’s probable diagnosis. Finally, these results also indicate that clinical pathways that recommend early involvement by surgeons are unlikely to significantly alter patient management in terms of the need for diagnostic imaging, observation, or discharge from the hospital.

Limitations
We compared pediatric EP attendings to senior surgical residents in their fourth year of training. Clearly, the results may have differed if we had compared pediatric EP attendings to surgical staff physicians. However, we felt that the comparison to senior residents was more realistic, as these are the surgeons (if not those more junior) performing the initial physical examination in the majority of academic medical institutions.

The surgeons evaluating the children in this study were consultants and understood that the pediatric EP attending already had a suspicion for appendicitis, thus introducing an element of bias into the study. We did not control what information was relayed to the surgical consultant over the phone or in person and thus cannot exclude the possibility that the agreement found for historical or physical exam findings was influenced by this communication. We were unable to assess whether the patients’ examination truly did change between examinations by the physicians, nor did we survey the families to determine whether repeated examinations affected their responses to questions. Although the surgical resident always examined the patient after the pediatric EP attending, we were not able to calculate the time between examinations, because this was not tracked as part of the study.

The study was conducted at a single institution and the results are likely to vary in other clinical settings. Also, the clinical practice of multiple different pediatric EPs of varying degrees of experience was surveyed along with surgical residents who were at different points in their fourth year of surgical training, with variable but limited specific experience in the assessment of pediatric patients. Because we did not control for these factors, we cannot exclude that these differences in experience are factors in our reported findings. Finally, because the skills of individual physicians likely varied, and individual physicians often evaluated many patients, there is a possibility of a cluster effect. Fortunately, there were a large number of physicians (48 pediatric EPs and 29 surgical residents), which would limit any effect on the analysis.

Conclusions
Pediatric EPs and senior surgical residents elicit historical findings from patients with suspected appendicitis with a greater degree of similarity than physical examination findings, which exhibit a wide degree of variability. Pediatric EPs and surgical residents do not differ significantly in their ability to predict appendicitis in the pediatric ED setting. For patients felt to have equivocal findings for appendicitis by experienced pediatric EPs, delaying imaging studies until after surgical evaluation may not be necessary, especially in institutions where pediatric surgical evaluation may not be readily available. When pediatric EPs and surgeons agree that a patient has a high likelihood of appendicitis, urgent operative care, rather than additional time-consuming, expensive, and possibly harmful imaging studies, should be considered.

The authors thank the pediatric emergency attendings, pediatric emergency fellows, general surgeons, and radiologists at Children’s Hospital Boston for their assistance with this study.