Ultrasonography for diagnosis of acute appendicitis

  • Protocol
  • Diagnostic

Authors


Abstract

This is the protocol for a review and there is no abstract. The objectives are as follows:

Our primary objective is to provide readers with a summary of the diagnostic accuracy of US for appendicitis in patients who present with clinically suspected acute appendicitis.

Our secondary objectives are to explore the diagnostic accuracy of US for appendicitis in male and female patients, in the paediatric (under 16 years of age) and pregnant patient subgroups and in patients with intermediate clinical suspicion of acute appendicitis or indeterminate diagnostic scores.

Background

Target condition being diagnosed

Appendicitis results from inflammation of the vermiform appendix and is the most common abdominal condition requiring emergency surgery. Typically appendicitis presents with a 24 hour history of vague central abdominal pain that migrates to the right iliac fossa. This is often accompanied by anorexia, nausea and constipation. A tachycardia and pyrexia are common. Abdominal examination may reveal localised tenderness in the right iliac fossa with guarding, rigidity and percussion or rebound tenderness. Often the site of maximum tenderness is located at McBurney’s point, which lies two-thirds along a line from the umbilicus to the anterior superior iliac spine (Grover 2011). Pain may be exacerbated by movement, and by asking the patient to cough which will often localise the pain to the right iliac fossa.

The incidence of appendicitis is approximately 1 per 1,000 per year (Hall 2010) with a male to female ratio of 1.1:1 and an overall lifetime risk of 8.6% for males and 6.7% for females (Addiss 1990). It is the most common abdominal emergency and accounts for more than 40,000 hospital admissions in England every year (approximately 1 per 1500 population) (Lewis 2011). Appendicitis is a progressive inflammatory process which may result in perforation, abscess formation, generalised peritonitis, bowel obstruction and, rarely, death, with a mortality rate of 0.08%, rising to 0.5% in the event of a perforated appendix (Blomqvist 2001). The incidence of perforation rises with the duration of symptoms (Bickell 2006); prompt diagnosis and treatment are therefore essential for reducing the morbidity and mortality associated with advanced inflammation.

The treatment of choice for appendicitis is appropriate resuscitation followed by expedient appendicectomy. Traditionally appendicectomy has been performed by laparotomy via a right iliac fossa incision; however, in recent decades appendicectomy has been performed laparoscopically via multiple incisions (Swank 2011), with advances in technology also spurring an interest in single incision laparoscopic surgery (SILS) (Feinberg 2011) and natural orifice transluminal endoscopic surgery (NOTES) (Roberts 2011) in more recent years. All patients with appendicitis should receive broad spectrum perioperative antibiotics as this decreases the incidence of postoperative wound infections and abscess formation (Andersen 2005). Antibiotics have also been proposed as the primary treatment for uncomplicated appendicitis (Varadhan 2010); however, appendicectomy remains the gold standard.

In the Cochrane Library, there are five published intervention reviews relating to appendicitis: antibiotics versus placebo for prevention of postoperative infection after appendicectomy (Andersen 2005), laparoscopic versus open surgery for suspected appendicitis (Sauerland 2010), laparoscopy for the management of acute lower abdominal pain in women of childbearing age (Gaitan 2011), single incision versus conventional multi-incision appendicectomy for suspected appendicitis (Rehman 2011) and appendicectomy versus antibiotic treatment for acute appendicitis (Wilms 2011). In addition, two protocols have been published: one intervention review protocol on appendix stump closure during laparoscopic appendectomy (Sauerland 2010), and one diagnostic test accuracy review protocol of computed tomography for appendicitis in adults (Rud 2012).

Index test(s)

Ultrasound (US) has been an important tool in the diagnosis of appendicitis since the 1980s. The pulse-echo principle, in which sound waves are transmitted through a medium and reflected back, forms the basic principle of conventional US (Case 1998). US machines consist of a transducer that contains piezoelectric crystals that convert electrical energy into ultrasonic sound waves, above the upper limit of human hearing of 20 kHz or 20,000 cycles per second. The transducer then simultaneously detects the echo sound waves, reversing the conversion from ultrasonic sound waves into an electrical signal (Case 1998). This echo signal, having reflected off any structures lying in its path, is then analysed and displayed as a cross-sectional tomographic image. Hyperechoic structures appear brighter and hypoechoic structures are darker, with the brightness of each pixel corresponding to the amplitude of the echo (Hangiandreou 2003). The echo strength is determined by the differential acoustic impedance between adjacent structures. US images can be made clearer by increasing the size of the transducer crystals, altering the US frequency, changing the angle of the transducer or focusing the sound beam (Hangiandreou 2003). Modern US technology has enabled multiple areas of beam focusing on the same image (Smith 2004).

Advances in US technology and the graded compression technique have improved the visualisation of the appendix (Birnbaum 2000). A standard contemporary US performed in suspected appendicitis will first examine the right hypochondrium and pelvis (typically using a 3-5 MHz transducer) in order to exclude alternative pathology relating to the liver, gallbladder, pancreas, kidney or pelvic organs and assess for the presence of free peritoneal fluid. Graded compression and colour Doppler sonography are then performed in the right iliac fossa. The ascending colon is identified and followed proximally, with the iliac vessels identified by the Doppler sonogram (Gaitini 2008). The graded compression technique involves applying steady, gradual pressure to the right iliac fossa, with emphasis over the site of maximal tenderness, in order to collapse the normal small bowel by dispelling bowel gas, and to differentiate between an incompressible inflamed appendix and compressible and displaceable normal small bowel (Puylaert 1986). A normal appendix appears as a blind-ended, aperistaltic tubular structure with a wall thickness of 2mm or less that originates from the base of the caecum (Yabunaka 2007). An incompressible, blind-ended, fluid-filled, tubular structure with hyperemic walls and a thickness of greater than 6mm is often used as a criterion for sonographic appendicitis (Prystowsky 2005). Other positive findings suggesting appendicitis are the presence of a faecolith, hyperechoic periappendicular fat, peritoneal fluid or a collection (Gaitini 2008). To improve visualisation of a retrocaecal appendix, lumbar manual compression can also be performed (Lee 2002). A combined transabdominal and transvaginal approach has been suggested to improve the diagnostic accuracy of ultrasound in females with suspected appendicitis (Bondi 2012). Reports from US performed for suspected appendicitis often conclude as being positive, negative or inconclusive for appendicitis.

US is operator dependent and, as a result, the reported sensitivity (76% to 90%) and specificity (83% to 100%) vary (Keyzer 2005; Parks 2011). With increasing technical expertise and experience the diagnostic accuracy of US is improving, especially in high volume centres.

The main advantages of US over other modalities are that it is noninvasive, quick to perform and relatively cheap (Wan 2009), can be used to identify alternative causes of abdominal pain, and has real-time capability, mobility and a lack of ionizing radiation, making US safer for patients and operators (Cogbill 2011). With concerns over childhood and foetal exposure to ionizing radiation, US has been particularly useful in the diagnosis of appendicitis in children (Doria 2009) and in pregnant women, with a variable sensitivity ranging from 66% to 100% and a specificity of 95% to 96% reported (Patel 2007). Disadvantages of US include operator dependency and inconclusive results in the event of the appendix not being visible (Parks 2011). A false negative US result may lead to a delayed diagnosis, increased risk of perforation, increased sepsis-related morbidity and, rarely, mortality. A false positive US result leads to unnecessary surgery and the risk of unnecessary surgical complications.

Clinical pathway

Prior test(s)

The traditional approach to diagnosing appendicitis is based on careful history taking and physical examination, with assessment of the intensity and sequence of symptoms, clinical signs and basic laboratory tests. However, the classical sequence of vague abdominal pain followed by vomiting with migration of the pain to the right iliac fossa may only be present in as few as 6% of patients with suspected appendicitis (Lameris 2009) and the presence or absence of any particular individual symptom or sign cannot be relied upon to diagnose or exclude appendicitis. Each single element of the history and of clinical and laboratory examination is of weak discriminatory and predictive capacity. However, higher discriminatory and predictive power can be achieved by using the variables in combination. Repeated patient evaluations over a period of close observation also increase the discriminatory power of clinical assessment (Andersson 2004). Although good clinical acumen remains the mainstay of the correct diagnosis of appendicitis, clinical presentation is often equivocal and diagnostic errors are common (Wagner 1996). Presentation may be influenced by the anatomical position of the appendix, for example displacement of the appendix by the gravid uterus often results in atypical presentations of appendicitis in pregnancy (Ito 2011), and an accurate history may be difficult to obtain in young children or the elderly with delirium. There are also many conditions that can cause right iliac fossa pain and may therefore mimic appendicitis, sometimes making the early diagnosis of acute appendicitis a challenge. Therefore, initially, in order to reduce diagnostic uncertainty, clinicians utilise blood tests, such as white cell count and C-reactive protein, and diagnostic scoring systems.

Scoring systems have been developed to aid diagnosis by estimating the probability of appendicitis occurring in the individual patient. The best known is the Alvarado scale (Alvarado 1986), although alternatives have been proposed (Enochsson 2004; Andersson 2008; Chong 2010). The Alvarado scoring system comprises eight weighted clinical indicators - three symptoms, three signs and two laboratory findings: migratory pain, anorexia, nausea and/or vomiting, right lower quadrant tenderness, rebound tenderness, pyrexia, leucocytosis (>10 x 10^9/L) and a neutrophilic shift to the left (>75%). Patients with an intermediate score require serial reassessment of physical findings and often complementary diagnostic imaging. The diagnostic accuracy of the Alvarado score has been reported as 90.9% for a score of 7-10 and 100% for a score of 0-4 (Jang 2008).
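As an illustration of how the eight indicators combine into a single score, the following minimal Python sketch uses the conventional Alvarado weights (two points each for right lower quadrant tenderness and leucocytosis, one point for every other item, giving a maximum of 10). The weights are not stated in this protocol and are an assumption taken from the commonly cited form of the score; the field names and the 5-6 "intermediate" band are likewise illustrative assumptions, inferred from the 0-4 and 7-10 bands quoted above.

```python
# Conventional Alvarado weights (assumed here; not stated in the protocol).
# The eight items sum to a maximum score of 10.
ALVARADO_WEIGHTS = {
    "migratory_pain": 1,
    "anorexia": 1,
    "nausea_or_vomiting": 1,
    "rlq_tenderness": 2,
    "rebound_tenderness": 1,
    "pyrexia": 1,
    "leucocytosis": 2,
    "left_shift": 1,
}


def alvarado_score(findings):
    """Sum the weights of all indicators recorded as present (True)."""
    unknown = set(findings) - set(ALVARADO_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown indicators: {sorted(unknown)}")
    return sum(w for name, w in ALVARADO_WEIGHTS.items() if findings.get(name, False))


def alvarado_band(score):
    """Map a score to a band; the 5-6 'intermediate' band is an assumption."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 4:
        return "low"
    if score <= 6:
        return "intermediate"
    return "high"


patient = {"migratory_pain": True, "rlq_tenderness": True, "leucocytosis": True}
score = alvarado_score(patient)
print(score, alvarado_band(score))
```

A patient in the intermediate band is the kind of case in which complementary imaging such as US is typically considered.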

If the diagnosis of acute appendicitis appears highly likely, especially in male patients where alternative diagnoses are less common, it is currently accepted practice to proceed directly to exploratory surgery, using either an open or a laparoscopic approach. However, with increasing evidence demonstrating that preoperative imaging improves diagnostic accuracy in appendicitis, there has been an increase in the use of radiological investigations to confirm the diagnosis prior to invasive surgical intervention.

Role of index test(s)

Following clinical assessment, US is often used as the primary imaging modality, especially in the paediatric and obstetric setting.

Alternative test(s)

If the US is negative or equivocal, a computed tomography (CT) scan is often the next imaging modality of choice. CT has been used in the diagnosis of appendicitis for around twenty years, with modern multi-slice CT scanners now capable of acquiring an image in a few seconds with the ability to reformat axial images into coronal and sagittal cross-sectional images that facilitate identification of the appendix (Paulson 2005). Attempts to improve the accuracy of CT with enhancement by intravenous (IV), oral or rectal contrast agents are controversial. The use of IV contrast may cause allergic reactions, using enteric contrast is time consuming and some argue that enhancement may not be necessary (Neville 2009). Although CT has been shown to have a high diagnostic accuracy, with a sensitivity of 90%-100% and specificity of 90%-100% (Parks 2011), the major limitation of CT is the use of ionising radiation. The estimated lifetime risk of cancer due to the radiation exposure resulting from a CT scan is between 0.02% and 0.14%, with higher estimates the younger the patient at the time of the CT scan (Brenner 2007). It is for this reason that CT should be reserved for those cases of possible appendicitis with equivocal presentations (Hernanz-Schulman 2010). Diagnostic laparoscopy may often be used instead of CT, especially in young female patients, in order to prevent unnecessary radiation exposure with the added benefit of offering definitive treatment in the same sitting as the investigation. Although diagnostic laparoscopy enables direct visualisation of the appendix and other intra-abdominal viscera, a recent study estimates the positive predictive value of laparoscopy to be 93%, with a negative predictive value of 71% (Hussain 2009). Laparoscopy is also associated with increased morbidity and expense (Golash 2005).

Magnetic resonance imaging (MRI) has been shown to be an accurate test for diagnosing appendicitis, with a prospective study in 138 patients reporting a 100% sensitivity and 98% specificity (Cobben 2009). Although MRI has had a limited role in the evaluation of the acute abdomen due to low availability, high cost and long study duration, MRI is an attractive option in diagnosing appendicitis in pregnancy following an inconclusive ultrasound scan, because of its high accuracy and because it does not involve the use of ionising radiation (Basaran 2009). Due to the above limitations, however, MRI is not currently routinely used in clinical practice for the diagnosis of acute appendicitis.

Rationale

It has been estimated that a third of patients with appendicitis have atypical presentations, with one study suggesting as few as 6% present with the classical sequence of vague abdominal pain followed by vomiting with migration of the pain to the right iliac fossa (Lameris 2009). A delay in diagnosis can result in increased perforation rates and increased morbidity and mortality; surgeons have therefore been more inclined to operate when the diagnosis is probable rather than to wait until it is certain. An early decision to perform surgery may reduce potential morbidity and mortality, but can also lead to the unnecessary removal of a normal appendix. In the Western world, the life-time risk of acute appendicitis is 6.7% for females and 8.6% for males (Addiss 1990), yet the life-time chance of an appendicectomy is higher, with some reports of a negative appendicectomy rate as high as 20% (Parks 2011). In acute appendicitis, the diagnostic accuracy based on clinical examination alone is 80% (Old 2005). Recent studies advocate the use of medical imaging to reduce the rate of negative appendicectomies. US may be especially useful where there are equivocal clinical signs or an indeterminate diagnostic score. In these situations US may improve diagnostic accuracy by reducing the number of false positive diagnoses and therefore prevent unnecessary surgery. There are, however, no clear guidelines regarding the optimal use of US in the diagnosis of acute appendicitis. Practice varies across the world and reported sensitivity (76%-90%) and specificity (83%-100%) vary (Keyzer 2005; Parks 2011). Although CT has been shown to have a high sensitivity and specificity, a prospective trial revealed similar diagnostic performance of US and CT (Keyzer 2005). When compared to CT, US is inexpensive, more readily available and does not expose the patient to ionising radiation, making US preferable in paediatric and pregnant patients.

A contemporary review of the diagnostic accuracy of US for acute appendicitis is therefore required. Such a review should incorporate formal assessment of methodological quality, explore clinically relevant sources of heterogeneity and implement appropriate statistical methods, thereby providing the reader with a quantitative summary of the available evidence in this field. We believe that a systematic assessment comparing the diagnostic performances of US and CT merits a review of its own and we plan to carry out this comparison in a separate Cochrane review in the future.

Objectives

Our primary objective is to provide readers with a summary of the diagnostic accuracy of US for appendicitis in patients who present with clinically suspected acute appendicitis.

Secondary objectives

Our secondary objectives are to explore the diagnostic accuracy of US for appendicitis in male and female patients, in the paediatric (under 16 years of age) and pregnant patient subgroups and in patients with intermediate clinical suspicion of acute appendicitis or indeterminate diagnostic scores.

Methods

Criteria for considering studies for this review

Types of studies

Prospective cohort studies that compare the results of US to the results of a reference standard test for appendicitis will be included. In case of duplicate publications we will include the study report with the highest number of participants. Only studies in which true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) are reported or can be extracted will be included.

Participants

Adult and paediatric patients of all ages with suspected appendicitis will be included. There will be no restrictions regarding the degree of suspicion of appendicitis. If necessary we will contact the authors of the studies for the results in the adult, paediatric and pregnancy subgroups.

Index tests

Ultrasound scan (US) of the abdomen and pelvis performed for suspected appendicitis.

Target conditions

The target condition is acute appendicitis.

Reference standards

Studies that use at least one of the following two reference standards will be included:

1. Histological examination of the removed appendix as well as clinical follow-up of participants that did not have surgery.

2. Intraoperative assessment of the appendix by the surgeon as inflamed or normal, as well as clinical follow-up of participants that did not have surgery or underwent exploratory surgery without appendicectomy. Intraoperative assessment by laparotomy and laparoscopy is considered equal.

The reliability of intraoperative assessment of the appendix is, however, controversial. Some estimates of positive predictive values for macroscopic assessment of appendicitis at laparoscopy have been reported as 96-99% and 97-100% (Moberg 1998; Pedersen 2001); however, studies where all patients underwent appendicectomy have concluded that macroscopic examination of the appendix intra-operatively is unreliable, both in open (Grunewald 1993) and laparoscopic surgery (Hussain 2009). Because the validity of macroscopic assessment is controversial, we will aim to explore the potential effect of the two reference standards in the statistical analyses.

Studies where suspected appendicitis is treated non-operatively will be excluded as there will be no reference standard to compare.

Search methods for identification of studies

Electronic searches

We will search Medline (Ovid), Embase (Ovid) and the Cochrane Library by using an electronic search strategy that combines medical subject heading (MeSH) descriptors, indexing terms and text words to capture the index test and the target disease. We will also search the Science Citation Index Expanded and Biosis Citation Index for studies that cite the included studies. The searches will not be limited to particular types of study design and will have no language or publication date restrictions. Our search strategy was developed in collaboration with a medical information specialist. The current versions of our search strategies are presented in Appendix 1.

Searching other resources

The reference lists of existing systematic reviews of US for appendicitis will also be screened for relevant studies.

Data collection and analysis

Selection of studies

Two reviewers will independently apply the selection criteria to the titles and abstracts of the study reports identified by the search strategy. If the decision to exclude a study cannot be made on the basis of the title and the abstract, the entire study report will be retrieved for assessment. The final decision on inclusion will be based on the entire study report. Disagreements between reviewers will be solved by discussion, or if necessary by a third reviewer. No language restrictions will be applied.

Data extraction and management

Two reviewers will independently extract information from selected studies according to a data collection form. Disagreements will be solved by discussion. If disagreement persists a third reviewer will resolve the matter. Reviewers will digitalize the collected data by using the Review Manager software. If insufficient information is reported then we will contact the relevant authors. Key characteristics of the selected studies will be summarized in tables. The following are considered key study characteristics: selection criteria, recruitment procedure, clinical setting, age and gender distribution, use of graded compression, method for diagnosing appendicitis, prevalence of appendicitis and duration of follow-up.

Assessment of methodological quality

We will use the QUADAS-2 checklist to assess methodological quality for risk of bias and concern regarding applicability (see Appendix 2). At least two reviewers will independently apply the QUADAS-2 tool. Disagreements will be solved by discussion, or if needed, by a third reviewer. The outcome of the methodological quality assessment will be presented in a table summarising the number of studies with low, high or unclear risk of bias for each of the four domains. A similar table will be presented for concerns regarding applicability.

Statistical analysis and data synthesis

For each reported criterion for US diagnosis of appendicitis we will extract the absolute counts of true positives (TP), i.e. US positive for appendicitis confirmed by either one of the reference standards; false positives (FP), i.e. US positive for appendicitis without the diagnosis of appendicitis by either one of the reference standards; false negatives (FN), i.e. a negative US with the diagnosis of appendicitis confirmed by either one of the reference standards; and true negatives (TN), i.e. a negative US without the diagnosis of appendicitis on clinical follow-up. If these counts are unavailable we will contact the authors. These counts will be used to construct a two-by-two table for the index test. Extracted data will be entered into RevMan® to produce forest plots of sensitivity and specificity, with corresponding 95% confidence intervals, and we will plot the pair of estimates from each study on a single ROC scatterplot.
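To make the two-by-two table concrete, the sketch below shows how per-study sensitivity and specificity, with 95% confidence intervals, could be derived from the four extracted counts. It is an illustration only: the function names are hypothetical, the counts in the example are invented for demonstration, and the Wilson score interval is one common choice of interval that is assumed here rather than taken from the protocol or from RevMan.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_from_counts(tp, fp, fn, tn):
    """Sensitivity and specificity, with intervals, from a two-by-two table."""
    return {
        "sensitivity": tp / (tp + fn),          # diseased correctly US positive
        "specificity": tn / (tn + fp),          # non-diseased correctly US negative
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Invented counts for a single hypothetical study.
result = accuracy_from_counts(tp=80, fp=10, fn=20, tn=90)
print(result)
```

Each study contributes one such pair of estimates, which is the point plotted in ROC space.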

For the meta-analysis, as we anticipate little variation between studies in US criteria for appendicitis, we plan on using the bivariate random-effects method to obtain summary estimates of sensitivity and specificity with 95% confidence and prediction regions (Reitsma 2005). If our anticipation proves wrong we will use the hierarchical summary receiver operating characteristic (HSROC) model to generate a SROC curve (Rutter 2001). STATA or SAS software will be used for the analyses.
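The bivariate approach works on the logit scale, modelling each study's logit-transformed sensitivity and specificity jointly. The sketch below shows only the per-study input to such a model: the logit estimates and their approximate within-study variances. It does not fit the random-effects model itself, which would be done in STATA or SAS as stated above. The function name, the 0.5 continuity correction applied when a cell is zero, and the example counts are illustrative assumptions.

```python
import math


def logit_pair(tp, fp, fn, tn, correction=0.5):
    """Logit sensitivity/specificity and approximate within-study variances.

    If any cell is zero, `correction` is added to every cell so that the
    logits and variances are defined (a common but assumed convention).
    """
    if min(tp, fp, fn, tn) == 0:
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    return {
        "logit_sens": math.log(tp / fn),   # log odds of a positive US in diseased
        "logit_spec": math.log(tn / fp),   # log odds of a negative US in non-diseased
        "var_logit_sens": 1 / tp + 1 / fn,
        "var_logit_spec": 1 / tn + 1 / fp,
    }


def inverse_logit(x):
    """Back-transform a logit to a proportion."""
    return 1 / (1 + math.exp(-x))


study = logit_pair(tp=80, fp=10, fn=20, tn=90)   # invented counts
print(study, inverse_logit(study["logit_sens"]))
```

Summary estimates from the fitted model are back-transformed with the inverse logit to give pooled sensitivity and specificity.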

Investigations of heterogeneity

In order to investigate the potential sources of heterogeneity we will assess the effect of various factors on the diagnostic accuracy of US, including graded compression versus no graded compression, transabdominal approach versus combined transabdominal and transvaginal approach, lumbar manual compression versus no lumbar manual compression, the use of linear or curved probe, the frequency of the probe, the type of scanning (conventional gray scale, pulsed, color, or power Doppler), assessment by experienced operator versus other, appropriate blinding of surgeon (intraoperative assessment) or pathologist (histopathological assessment) or clinician (involved in patient follow-up) to the US scan report versus interpretation of reference standard without blinding to US scan report and the reference standard as being macroscopic intraoperative assessment versus microscopic pathological assessment. We will also assess the effect of methodological quality as a potential source of heterogeneity as per the QUADAS-2 tool.

If the number of studies per stratum is large enough then we will perform stratified analyses of the sub-groups; if not, we will include covariate groups as additional independent variables in the bivariate random-effects model. Assuming homogeneity in US criteria for appendicitis between studies, we will use the bivariate random-effects approach in the analyses of heterogeneity, and heterogeneity will be visually assessed with the hierarchical SROC approach.

Sensitivity analyses

We will present sensitivity analysis stratified by methodological quality as per the QUADAS-2 tool. However we anticipate identifying the other most relevant factors to be subjected to sensitivity analyses in the process of reviewing the identified studies. We propose to specify the criteria for the sensitivity analyses in the review rather than to pre-define them at the protocol stage.

Assessment of reporting bias

No formal assessment of reporting bias will be undertaken.

Results

Acknowledgements

None

Appendices

Appendix 1. Medline, Embase, Cochrane Library, Science Citation Index and BIOSIS Citation Index search strategies

MEDLINE (Ovid)

1. Appendectomy/

2. Appendicitis/

3. Appendix/

4. (appendec* or appendic* or appendix).tw.

5. 1 or 2 or 3 or 4

6. exp Ultrasonography/

7. (ultrasound* or ultrasonograph* or echotomograph* or ultrasonic or echograph* or sonograph*).tw.

8. 6 or 7

9. 5 and 8

 

EMBASE (Ovid)

1. appendectomy/

2. appendicitis/

3. acute appendicitis/

4. appendix/

5. (appendec* or appendic* or appendix).tw.

6. 1 or 2 or 3 or 4 or 5

7. ultrasound/

8. echography/

9. (ultrasound* or ultrasonograph* or echotomograph* or ultrasonic or echograph* or sonograph*).tw.

10. 7 or 8 or 9

11. 6 and 10

THE COCHRANE LIBRARY

#1 MeSH descriptor Appendectomy explode all trees          

#2 MeSH descriptor Appendicitis explode all trees  

#3 MeSH descriptor Appendix explode all trees       

#4 (appendec* or appendic* or appendix):ti,ab,kw   

#5 (#1 OR #2 OR #3 OR #4)            

#6 MeSH descriptor Ultrasonography explode all trees        

#7 (ultrasound* or ultrasonograph* or echotomograph* or ultrasonic or echograph* or sonograph*):ti,ab,kw       

#8 (#6 OR #7)            

#9 (#5 AND #8)         

 

Science Citation Index

#1 Topic=(appendec* or appendic* or appendix)

#2 Topic=(ultrasound* or ultrasonograph* or echotomograph* or ultrasonic or echograph* or sonograph*)

#3 (#2 AND #1)

           

BIOSIS Citation Index

#3 (#2 AND #1)

#2 Topic=(ultrasound* or ultrasonograph* or echotomograph* or ultrasonic or echograph* or sonograph*)

#1 Topic=(appendec* or appendic* or appendix)

 

Appendix 2. Modified QUADAS 2

ULTRASONOGRAPHY FOR ACUTE APPENDICITIS - QUADAS-2

Risk of bias and applicability judgements

QUADAS-2 is structured so that 4 key domains are each rated in terms of the risk of bias and the concern regarding applicability to the research question (as defined above). Each key domain has a set of signalling questions to help reach the judgments regarding bias and applicability.

 

DOMAIN 1: PATIENT SELECTION

A. Risk of bias

1. Was a consecutive or random sample of patients enrolled?

 Answer ‘yes’ if one of the following conditions is met:

a) It is explicitly stated in the study report that enrolment was consecutive (or random)

b) It is stated that all eligible study participants were included, with patients enrolled during all hours of the day during the study period (ie. not just in “office hours”)

Answer ‘no’ if neither of the conditions is met

Answer ‘unclear’ if insufficient information is available to answer ‘yes’ or ‘no.’

 

2. Did the study avoid inappropriate exclusions?                                                                                             

Was a case-control design avoided?                                                                                                                   

Note: studies with a case-control design are excluded from this review

Could the selection of patients have introduced bias?                                                                 

Risk of bias is ‘low’ if signalling question 1 or 2 is answered ‘yes’, risk of bias is assessed as ‘high’ if signalling question 1 or 2 is answered ‘no,’ and risk of bias is assessed as ‘unclear’ if question 1 or 2 is answered ‘unclear.’

B. Concerns regarding applicability

Is there concern that the included patients do not match the review question?            

Concern is assessed as ‘low’ when the study population represents an unselected sample of adults with suspected appendicitis. Exclusion for critically ill (eg. septicaemic) patients and mentally incapacitated patients is considered appropriate. If inappropriate exclusions account for less than 5% of the number of included patients, this will be considered negligible. Concern is assessed as ‘high’ when the study population does not represent an unselected sample of adults with suspected appendicitis. Concern is assessed as ‘unclear’ when insufficient information is available.

DOMAIN 2: INDEX TEST

A. Risk of Bias

1. Were the index test results interpreted without knowledge of the results of the reference standard?                                                                                                       

For practical reasons the US must take place before a decision is made to perform a surgical exploration +/- appendicectomy or opt for clinical follow-up. However in a particular study ultrasound image analysis may be performed subsequent to the operation. Such analysis may be biased if the radiologist is aware of the operative findings. Therefore, answer ‘yes’ if one of the following conditions is met: 

a) the US analyses were performed before the patient had surgery

b) the US analyses were postponed / re-evaluations, where the radiologist was blinded to the patient’s outcome (ie. surgery or clinical follow-up and intraoperative findings if relevant)

2. Was a dichotomised ultrasound result recorded?  (ie. negative or positive for acute appendicitis)                                                 

3. Were inconclusive or uninterpretable ultrasound results recorded (eg. appendix not visualised)?

Classify as yes if at least one of the following three conditions is met:

a) the number of uninterpretable or intermediate results related to the US scan is clearly reported.

b) it is explicitly stated that there were no uninterpretable or inconclusive results (as referred to above).

c) it can be deduced from the study report that no uninterpretable or intermediate results as referred to above occurred.

Classify as no if none of the conditions stated above are met.

Classify as unclear if insufficient information is available to classify this item as yes or no.

If a threshold for the index test was used, was it predetermined prior to the study?

Could the conduct or interpretation of the index test have introduced bias?                        

Risk of bias assessed as ‘low’ if signalling questions 1-3 are answered ‘yes.’

Risk assessed as ‘high’ if any one of the signalling questions 1-3 is answered ‘no.’

Risk assessed as ‘unclear’ if insufficient information is reported to answer signalling questions 1-3.
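The rule above for combining signalling questions 1-3 into a single judgement can be sketched as a small function. This is an illustrative sketch only and not part of the protocol: the function name `overall_risk` and the encoding of answers as the strings 'yes', 'no' and 'unclear' are assumptions made for the example.

```python
from typing import Iterable


def overall_risk(answers: Iterable[str]) -> str:
    """Combine signalling-question answers into 'low', 'high' or 'unclear'.

    Hypothetical helper: each answer is expected to be 'yes', 'no' or 'unclear'.
    """
    normalised = [a.strip().lower() for a in answers]
    # Any single 'no' makes the risk of bias 'high'.
    if any(a == "no" for a in normalised):
        return "high"
    # Every question answered 'yes' makes the risk of bias 'low'.
    if normalised and all(a == "yes" for a in normalised):
        return "low"
    # Otherwise at least one question could not be answered.
    return "unclear"


if __name__ == "__main__":
    print(overall_risk(["yes", "yes", "yes"]))
    print(overall_risk(["yes", "no", "unclear"]))
    print(overall_risk(["yes", "unclear", "yes"]))
```

In this sketch a ‘no’ is checked before an ‘unclear’, which follows the wording that any one ‘no’ answer gives a ‘high’ rating.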

 

B. Concerns regarding applicability

 

Is there concern that the index test, its conduct, or interpretation differ from the review question?

                                                                                                                                                        

Two issues influence our assessment concerning applicability in relation to the index test:

1) Is the index test described in sufficient detail to permit its replication?

Answer ‘yes’ when the following are reported:

a) use of a transabdominal and/or transvaginal approach

b) use of graded compression and/or lumbar manual compression

c) region included in the scan (entire abdomen and pelvis v right iliac fossa only)

d) probe type used – linear or curved

e) type of scanning (conventional grayscale, pulsed, colour or power Doppler)

Answer ‘no’ if one or more of the details listed above are not described.

Answer ‘unclear’ if insufficient information is available to answer ‘yes’ or ‘no.’

 

2) Was the analysis of the US images performed by the radiologist on-call?

Answer ‘yes’ if the analysis is based on the initial assessment of the US images by the radiologist on-call

Answer ‘no’ if the analysis is based on retrospective reassessment of the US images by a senior radiologist / consensus panel, for example

Answer ‘unclear’ if insufficient information is available to answer ‘yes’ or ‘no.’

 

Concern regarding applicability in relation to the index test will be assessed as ‘low’ if questions 1 and 2 are answered ‘yes.’ Concern will be assessed as ‘high’ if questions 1 or 2 are answered ‘no.’ Concern will be assessed as ‘unclear’ if insufficient information is reported to answer questions 1 or 2.

DOMAIN 3: REFERENCE STANDARD

A. Risk of Bias

1. Is the reference standard likely to correctly classify the target condition?                                           

Classify as ‘yes’ if the following three conditions are met:

a) the diagnosis of appendicitis is based on the finding of an inflamed appendix during laparoscopy or laparotomy (swollen or infused appendix in presence of supportive features (peritoneal exudates ± inflammatory adhesions ± turbid peritoneal fluid)).

b)  the diagnosis of appendicitis is based on the histological examination of the removed appendix.

c) the diagnosis of absence of appendicitis is based on the judgement of the surgeon or on clinical follow-up (duration of follow-up at least 30 days). A clinical examination or a phone call from a health professional with standardised questions to confirm recovery will qualify as clinical follow-up.

Classify as ‘no’ if the diagnosis of appendicitis (or its absence) is not based on the conditions stated above.

Classify as ‘unclear’ if insufficient information is available to classify this item as yes or no.

 

2. Were the reference standard results interpreted without knowledge of the results of the index test?

 

Answer ‘yes’ if the following three conditions are met:

a) If the diagnosis is made based on intra-operative visualisation only, then the surgeons are blinded to the US result

b) the pathologist examining the removed specimen is blinded to the US result

c) researchers responsible for clinical follow-up are blinded to the US result

Answer ‘no’ if one of the conditions stated above is not met.

Answer ‘unclear’ if insufficient information is available to answer ‘yes’ or ‘no.’                                      

Could the reference standard, its conduct, or its interpretation have introduced bias? 

Risk of bias related to the reference standard will be assessed as ‘low’ when signalling questions 1 and 2 are answered ‘yes.’

Risk will be assessed as ‘high’ when signalling question 1 or 2 is answered ‘no.’

Risk will be assessed as ‘unclear’ when insufficient information is reported to answer signalling question 1 or 2.

 

B. Concerns Regarding Applicability

 

Is there concern that the target condition as defined by the reference standard does not match the review question?

            

DOMAIN 4: FLOW AND TIMING

A. Risk of Bias

1) Did all patients receive the reference standard?                                                                                         

Answer ‘yes’ if at least 95% of included patients had surgery with histological assessment of the removed appendix, macroscopic intraoperative assessment or clinical follow-up.

Answer ‘no’ if less than 95% of included patients had surgery with histological assessment of the removed appendix, macroscopic intraoperative assessment or clinical follow-up.

Answer ‘unclear’ if insufficient information is available to answer ‘yes’ or ‘no.’
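The 95% rule for question 1 reduces to a simple proportion check. The sketch below is illustrative and not part of the protocol: the function name and parameter names are hypothetical, and treating missing counts as ‘unclear’ is an assumption about how insufficient information would be encoded.

```python
from typing import Optional


def all_received_reference_standard(
    n_included: Optional[int],
    n_with_reference: Optional[int],
    threshold_percent: int = 95,
) -> str:
    """Answer signalling question 1 as 'yes', 'no' or 'unclear'.

    n_with_reference counts patients who had surgery with histology,
    macroscopic intraoperative assessment, or clinical follow-up.
    """
    # Missing or unusable counts are treated as insufficient information.
    if n_included is None or n_with_reference is None or n_included <= 0:
        return "unclear"
    # Integer arithmetic avoids floating-point rounding at exactly 95%.
    if n_with_reference * 100 >= threshold_percent * n_included:
        return "yes"
    return "no"


if __name__ == "__main__":
    print(all_received_reference_standard(200, 190))
    print(all_received_reference_standard(200, 189))
    print(all_received_reference_standard(None, 189))
```

For example, a study with 200 included patients would need at least 190 of them to have received a reference standard for the answer to be ‘yes’.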

 

2) Did all the patients receive the same (appropriate) reference standard?                                                                        

Answer ‘yes’ if one of the following conditions is met:

a) At least 90% of included patients had surgery with histological assessment of the removed appendix, or macroscopic intraoperative assessment with clinical follow-up where the appendix was not removed

b) At least 90% of patients with negative ultrasound had clinical follow-up

Answer ‘no’ if neither of the conditions is met.

Answer ‘unclear’ if insufficient information is available to answer ‘yes’ or ‘no.’

 

3) Did all patients with an US positive for acute appendicitis undergo surgery?                                        

Answer ‘yes’ if all patients with an US positive for acute appendicitis underwent surgery.

Answer ‘no’ if some patients with an US positive for acute appendicitis underwent clinical follow-up.

Answer ‘unclear’ if insufficient information is available to answer ‘yes’ or ‘no.’

 

4) Did all patients with a negative US undergo clinical follow-up?                                                               

Answer ‘yes’ if all patients with a negative US underwent clinical follow-up.

Answer ‘no’ if some patients with a negative US underwent surgery.

Answer ‘unclear’ if insufficient information is available to answer ‘yes’ or ‘no.’

 

5) Was the choice of the reference standard independent of the result of the index test?                    

Answer ‘yes’ if the surgeon deciding on surgery or clinical follow-up was blinded to the US result.

Answer ‘no’ if the surgeon deciding on surgery or clinical follow-up was aware of the US result.

Answer ‘unclear’ if insufficient information is available to answer ‘yes’ or ‘no.’

 

6) Were all patients included in the analysis?                                                                                                    

Answer ‘yes’ if the analysis encompasses all included patients. Also answer ‘yes’ if 5% or less are excluded from the analysis because of no reference standard (see question 1).

Answer ‘no’ if the above requirement is not met.

Answer ‘unclear’ if insufficient information is available to answer ‘yes’ or ‘no.’

 

7) Was there an appropriate time interval between the index test and the reference standard?          

An appropriate time interval between US and surgery is unclear. Depending on the results of the US +/- clinical findings, a patient will either undergo exploratory surgery +/- appendicectomy or undergo a period of observation, which may then be followed by exploratory surgery +/- appendicectomy if the clinical picture deteriorates, or by a period of clinical follow-up if it is felt that the patient does not have appendicitis. Although arbitrary, it is reasonable to suggest that if a patient undergoes clinical follow-up then this should take place within 7-31 days from discharge, thereby avoiding instances where appendicitis is overlooked because the follow-up period is too short, or where new cases of appendicitis are mistaken for the index case because the follow-up interval is too long. A time interval is not relevant where an US is positive for appendicitis, as expedient surgery is indicated in this instance. For patients who undergo surgery following a negative US, it appears reasonable that surgery is carried out within 72 hours of the US in order to reflect clinical practice.
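The two time windows described above (clinical follow-up 7-31 days from discharge, and surgery within 72 hours of a negative US) can be expressed as simple checks. This is an illustrative sketch rather than part of the protocol: the function names are hypothetical, and mapping the windows onto ‘yes’, ‘no’ and ‘unclear’ answers, with a missing interval treated as ‘unclear’, is an assumption. The case of a positive US is left out because the time interval is not considered relevant there.

```python
from typing import Optional


def follow_up_interval_ok(days_from_discharge: Optional[float]) -> str:
    """Check that clinical follow-up took place 7-31 days from discharge."""
    if days_from_discharge is None:
        return "unclear"  # interval not reported
    return "yes" if 7 <= days_from_discharge <= 31 else "no"


def surgery_after_negative_us_ok(hours_from_us: Optional[float]) -> str:
    """Check that surgery after a negative US took place within 72 hours."""
    if hours_from_us is None:
        return "unclear"  # interval not reported
    return "yes" if 0 <= hours_from_us <= 72 else "no"


if __name__ == "__main__":
    print(follow_up_interval_ok(14))
    print(follow_up_interval_ok(3))
    print(surgery_after_negative_us_ok(48))
    print(surgery_after_negative_us_ok(96))
```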

Could the flow and timing of the study have introduced bias?                                                

Risk of bias related to flow and timing is assessed as ‘low’ when signalling questions 1, 2, 6 and 7 are answered ‘yes.’ Risk is assessed as ‘high’ when signalling questions 1, 2, 6 and 7 are answered ‘no.’

Risk will be assessed as ‘unclear’ when insufficient information is reported to answer signalling questions 1 and 2.

 

Contributions of authors

Rick Nelson and Jonathan Wild collaborated in the conception of the study purpose and in the study design. Jonathan Wild wrote drafts of the protocol in collaboration with Bo Rud, Nicole Abdul and Judith Ritchie. Sally Freers provided statistical guidance. The final protocol was approved by all authors.

Declarations of interest

The authors have no conflicts of interest.

Ancillary