Low-back pain (LBP) is a common cause of disability and one of the main reasons for healthcare expenditure around the world, especially in high-income countries. While up to 70% of people will experience at least one episode of LBP in their lifetime (Koes 2006), no specific pathology can be identified in up to 85% of patients (Deyo 1992). The difficulty in providing a definitive diagnosis has given rise to the term "non-specific LBP", which is generally considered to be benign and can be managed in a primary care setting (Koes 2010). However, a small proportion of patients present with LBP as the initial manifestation of a more serious pathology, such as spinal malignancy, vertebral fracture, infection, or cauda equina syndrome. The prevalence of these serious spinal pathologies has been estimated to be between 1% and 5% of all primary care patients with LBP (Deyo 1992; Henschke 2009).
The identification of serious pathologies is one of the primary purposes of the clinical assessment of patients with LBP and clinical guidelines recommend awareness of "red flags" as the ideal method to accomplish this purpose (Koes 2010). "Red flags" are features from the patient's clinical history and physical examination which are thought to be associated with a higher risk of serious pathology. The presence of a "red flag" should alert clinicians to the need for further examination and in most cases, specific management (Waddell 2004). As most clinical guidelines explicitly recommend against the use of routine diagnostic imaging for patients with LBP, it is important to determine whether "red flags" can be used to aid a clinician's judgment when screening for spinal malignancy.
Target condition being diagnosed
In this review we focus on red flags for spinal malignancies. Spinal malignancies are, after vertebral fracture, the most common serious pathologies affecting the spine and are estimated to be present in around 1% of primary care patients presenting with LBP (Deyo 1992; Henschke 2009). However, given the prevalent nature of LBP, the number of patients presenting to primary care with spinal malignancy is substantial and there exists a need for effective diagnostic strategies.
The spine is much more frequently affected by metastatic disease than it is the site of primary tumours. Approximately 10% of all malignancies have symptomatic spine involvement as the initial manifestation of the disease, including multiple myeloma, non-Hodgkin's lymphoma, and carcinoma of the lung, breast, and prostate (Sciubba 2006). Early detection and treatment of spinal malignancies are important to prevent further spread of metastatic disease and the development of complications such as vertebral fracture and spinal cord compression (Loblaw 2005). The consequences of a late or missed diagnosis of spinal malignancy necessitate the use of accurate screening tools, specifically for patients presenting with LBP. Ideally, clinicians should be able to identify the small number of patients with a higher likelihood of spinal malignancy at an early stage without subjecting a large proportion of their patients with LBP to unnecessary diagnostic testing.
Clearly, the prevalence of spinal malignancy is insufficient to warrant imaging studies or laboratory tests on all patients. As a first step in identifying spinal malignancy, clinical practice guidelines generally recommend assessing for the following "red flags": a previous history of cancer, unexplained weight loss, or age greater than 50 years (Deyo 1992). However, there are few empirical data on the accuracy of these features and most clinical features considered to be "red flags" for malignancy are derived from one study (Deyo 1988). The inclusion of these features in the guidelines has often been poorly justified by reference to previous guidelines (van Tulder 2004) and unpublished data (Bigos 1994). Despite their inclusion in the guidelines, the usefulness of screening for "red flags" for malignancy in patients with LBP continues to be debated (Underwood 2009) and there remains very little information on their diagnostic accuracy and how best to use them in clinical practice.
In 2007, we published a systematic review of six studies that evaluated a total of 22 clinical features used to screen patients with LBP for malignancy (Henschke 2007). The review found that four clinical features (used in isolation) were useful to raise the probability of malignancy: a previous history of cancer (positive likelihood ratio (LR+) = 23.7), elevated erythrocyte sedimentation rate (ESR) (LR+ = 18.0), reduced haematocrit (LR+ = 18.2), and overall clinician judgment (LR+ = 12.1) (Henschke 2007). The review also noted that the available studies were generally of poor quality, according to the criteria of the QUality Assessment of Diagnostic Accuracy Studies (QUADAS) checklist, and very few studies were carried out in the primary care setting, where "red flags" could potentially be of most benefit. That review also counted results from laboratory tests and clinician judgment as "red flags" for malignancy. Laboratory tests and overall clinician judgment are subject to referral filter and incorporation biases, because they are only performed when indicated by, or they themselves incorporate, features from the clinical history or physical examination.
In the absence of accurate information about the diagnostic accuracy of "red flags", clinicians are left with the prospect of routine diagnostic imaging of all patients with LBP to exclude spinal malignancy. Diagnostic imaging of spinal malignancy can include plain radiography, nuclear scintigraphy (or bone scanning), computed tomography (CT), and magnetic resonance imaging (MRI) (Jarvik 2002; Joines 2001; Sciubba 2006).
Due to their availability and low cost, plain radiographs have usually served as an initial screening test for spinal malignancy by revealing lytic or sclerotic areas of bone, pathologic compression fractures, deformity, and paraspinal masses. The majority of spinal metastatic lesions are osteolytic, but up to 50% of the bone must be eroded before there is a noticeable change on plain radiographs (Sciubba 2006). Nuclear scintigraphy or bone scanning is sensitive for identifying increased metabolic activity throughout the entire skeletal system, and detects cancer at an earlier stage than plain radiography. However, the poor image resolution and low specificity of both plain radiographs and nuclear scintigraphy require correlation with CT or MRI to exclude benign processes (Sciubba 2006).
Magnetic resonance imaging is considered the gold standard imaging modality for assessing spinal metastatic disease. It has a reported sensitivity of between 83% and 93% and specificity between 90% and 97% (when compared to autopsy or surgery) for detecting spinal malignancy (Joines 2001). Such high sensitivity is due to the fact that MRI gives superior resolution of soft-tissue structures. Moreover, MRI provides clarity at the bone-soft tissue interface, yielding accurate anatomic detail of bony compression or invasion of neural and paraspinal structures. The MRI protocol should include T1-weighted images (which highlight fat deposition), T2-weighted images (which highlight liquid), and contrast-enhanced studies, with axial, sagittal, and coronal reconstructions (Joines 2001; Sciubba 2006).
In light of recently published, pertinent primary diagnostic studies (Henschke 2009) and evolving guidance for the most appropriate methods to systematically review studies of diagnostic test accuracy (Deeks 2009), we decided to update our previous systematic review using the methods recommended by the Cochrane Diagnostic Test Accuracy (DTA) Working Group. The protocol for this review was largely based upon the first DTA review published within the Cochrane Back Review Group (CBRG) (van der Windt 2010). In order to assess the diagnostic accuracy of "red flags" to identify the most common serious spinal pathologies presenting as LBP, this review will be performed concurrently with another Cochrane review on the diagnostic test accuracy of "red flags" for vertebral fracture (Henschke 2010).
The objective of this systematic review is to assess the diagnostic performance of clinical characteristics ("red flags") identified by taking a clinical history and conducting a physical examination to screen for spinal malignancy in patients presenting with LBP, as assessed by diagnostic imaging. This information may assist clinicians to make decisions about appropriate management in patients with LBP.
Investigation of sources of heterogeneity
The secondary objective of this review is to assess the influence of sources of heterogeneity on the diagnostic accuracy of "red flags" for spinal malignancy. We aim to examine the influence of the healthcare setting (e.g. primary or secondary care), the study design (e.g. consecutive series or case-control), and aspects of study quality as reflected in the assessment of the items of the QUADAS checklist.
Criteria for considering studies for this review
Types of studies
Primary diagnostic studies were considered if they compared the results of taking a history and completing a physical examination for the identification of spinal malignancy in patients with LBP, with those of a reference standard. The main focus of the review was on studies using a cross-sectional or prospective design which present sufficient data to allow calculation of estimates of diagnostic accuracy (such as sensitivity and specificity), which are reported in full publications. Case-control studies were also considered if insufficient primary diagnostic studies were identified. If studies were reported in abstracts or conference proceedings, we retrieved the full publications where possible. Studies published in all languages were included in this review. Where necessary, appropriate translation of potentially eligible articles was sought.
Studies were included if they evaluated adult patients who presented to primary or secondary care settings for treatment of LBP or for lumbar spine examination. Longitudinal studies in which more than 10% of recruited patients had already been diagnosed with spinal malignancy as the likely cause of their LBP were excluded. This proportion was chosen based on a consensus among the review team, in an attempt to minimise referral bias.
Studies evaluating any aspects of the history taking or physical examination of patients with LBP were eligible for inclusion. This included demographic characteristics (e.g. age, gender), the clinical history (e.g. pain intensity or a previous history of cancer), and results of the physical examination (e.g. tenderness/pain on palpation, lumbar range of motion, or muscle strength). Studies were included if the diagnostic accuracy of the individual "red flags" was evaluated in isolation, or as part of a combination. Studies in which only a "clinical diagnosis" or "global clinician judgment" (without specifying which diagnostic tools were used) was compared with a reference standard were excluded from this review. An undefined clinical judgment represents an individual clinician's diagnostic ability, rather than providing useful data on clearly defined patient characteristics.
All studies that reported results of the history taking or physical examination in detecting spinal malignancy in patients who presented for management of LBP were included. Where possible, we described separate results for primary tumours and secondary metastases.
Studies were included if "red flags" were compared with diagnostic imaging procedures such as plain radiographs, computed tomography (CT), magnetic resonance imaging (MRI), and bone scans to confirm the presence of cancer or malignancy in the spine. Long-term (> six months) follow-up of patients after the initial consultation was also considered an appropriate reference standard, if suspected cases of malignancy were confirmed by medical records or specialist review.
Search methods for identification of studies
The search strategy was developed in collaboration with a medical information specialist. Relevant computerised databases were searched for eligible diagnostic studies from the earliest year possible until 1 April 2012, including MEDLINE (PubMed), OLDMEDLINE (PubMed), EMBASE (embase.com), and CINAHL (Ebsco). The search strategy for MEDLINE is presented in Appendix 1 and was adapted for EMBASE (Appendix 2) and CINAHL (Appendix 3). A previous systematic review on the diagnostic performance of "red flags" for spinal malignancy was used as a point of reference (Henschke 2007). All publications included in that review are indexed in MEDLINE, so the current search strategy was refined until all publications from the previous review were identified by the search. The strategy uses several combinations of searches related to the patient population, history taking, physical examination, and the target condition.
Searching other resources
The reference lists of all included publications were checked and all included studies were subjected to a forward citation search using Science Citation Index. A further electronic search was composed to identify relevant (systematic) reviews in MEDLINE and Medion (www.mediondatabase.nl), from which reference lists were checked. In addition, we contacted experts in the field of LBP research to identify diagnostic studies missed by the search strategy.
Data collection and analysis
Selection of studies
The selection criteria and the QUADAS checklist were first piloted on selected diagnostic studies to ensure consistency among the review team. Two review authors (NH and RO) then independently applied the selection criteria to all citations (titles and abstracts) identified by the search strategy described above. Consensus meetings were organised to discuss any disagreement regarding selection. Final selection was based on a review of full publications, which were retrieved for all studies that either met the selection criteria, or for which there was uncertainty regarding selection. The other review authors were consulted in cases of persisting disagreement.
Data extraction and management
A data extraction form was specifically designed to collect details from included studies. For each study, the characteristics of participants, index tests, reference standards, and study methods were recorded and presented in tables.
Characteristics of participants (and studies) included details on the setting (location, type of clinic); inclusion and exclusion criteria; enrolment procedures (consecutive or non-consecutive); number of participants (including number eligible for the study, number enrolled in the study, number receiving the index test and reference standard, number for whom results are reported in the two-by-two table); reasons for withdrawal; patient demographics (age, gender); and duration and history of LBP.
Test characteristics included the type of index test; methods of execution; experience and expertise of the assessors; type of reference standard; and where relevant, cut-off points for diagnosing malignancy.
Aspects of study methods were reflected in the quality assessment criteria (Appendix 4).
Data for diagnostic two-by-two tables (true positive, false positive, true negative, and false negative numbers) were extracted from the publications or reconstructed using information from other relevant parameters (sensitivity, specificity, or predictive values). Two review authors (NH and RO) independently extracted the data to ensure adequate reliability of collected data. Where a review author was also an author of one of the primary diagnostic studies, they were not involved in the data extraction or quality rating of this study.
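The reconstruction step described above can be sketched as follows. This is a minimal illustration, not the extraction procedure actually used by the review authors: the function name and the example numbers are hypothetical, and it assumes a publication reports sensitivity, specificity, and the number of patients with and without the target condition.

```python
def reconstruct_two_by_two(sensitivity, specificity, n_diseased, n_non_diseased):
    """Rebuild the four cell counts of a diagnostic two-by-two table.

    Assumes the reported sensitivity and specificity were computed on
    n_diseased and n_non_diseased patients respectively (hypothetical inputs).
    Counts are rounded because published proportions are themselves rounded.
    """
    tp = round(sensitivity * n_diseased)        # true positives
    fn = n_diseased - tp                        # false negatives
    tn = round(specificity * n_non_diseased)    # true negatives
    fp = n_non_diseased - tn                    # false positives
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


if __name__ == "__main__":
    # Hypothetical study: 8 malignancies among 1008 patients.
    print(reconstruct_two_by_two(0.75, 0.70, 8, 1000))
```

Because rounding can hide small discrepancies, a reconstructed table is best cross-checked against any other reported parameter, such as a predictive value.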
Assessment of methodological quality
The methodological quality of each study was assessed by two review authors (NH and RO) using the QUADAS checklist (Whiting 2003). The Cochrane Diagnostic Test Accuracy Working Group recommends assessment of 11 QUADAS items that refer to internal validity (e.g. blind assessment of index and reference test, or avoidance of verification bias) (Appendix 4; Deeks 2009).
The review authors classified each item as "yes" (adequately addressed); "no" (inadequately addressed); or "unclear" (inadequate detail presented to allow a judgment to be made). Guidelines for the assessment of each item were made available to the review authors (Appendix 4). Disagreements were resolved by discussion and if necessary, by consulting a third review author (CGM).
The 11 items of the QUADAS checklist were considered individually for each study, without the application of weights or the use of a summary score to select studies with certain levels of quality in the analysis. Where possible, the influence of negative or unclear classification of important items was explored as a potential source of heterogeneity. The following items were considered for these analyses as they have been shown to affect diagnostic performance in previous research (van der Windt 2010): item one (spectrum variation / selective sample), item two (adequate reference standard), item four (verification bias), item five (same reference standard), items seven and eight (blinded interpretation of index test and reference standard), and item 11 (explanation of withdrawals).
Statistical analysis and data synthesis
Indices of diagnostic performance were extracted or derived from data presented in each primary study for each "red flag" or combination of "red flags". Diagnostic 2x2 tables were generated, from which sensitivities and specificities for each index test with 95% confidence intervals (95% CI) were calculated and presented in forest plots. Positive and negative likelihood ratios with 95% CIs were also calculated for each index test.
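As an illustration of how these indices follow from a 2x2 table, the sketch below computes sensitivity, specificity, and both likelihood ratios with 95% intervals. The interval methods are assumptions: the review does not state which were used, so a Wilson score interval is used here for proportions and a log-scale interval for likelihood ratios. All function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wilson_interval(successes, total, z=Z_95):
    """Wilson score interval for a proportion (assumed method, not the review's)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def proportion_with_interval(successes, total):
    """Return (estimate, lower, upper), or None when the denominator is zero."""
    if total == 0:
        return None  # e.g. sensitivity in a study with no cases of malignancy
    return (successes / total, *wilson_interval(successes, total))


def likelihood_ratio_interval(num_events, num_total, den_events, den_total, z=Z_95):
    """Ratio of two proportions with a log-scale interval.

    Returns None when an empty cell makes the ratio or its standard
    error undefined.
    """
    if 0 in (num_events, den_events):
        return None
    ratio = (num_events / num_total) / (den_events / den_total)
    se = math.sqrt(1 / num_events - 1 / num_total + 1 / den_events - 1 / den_total)
    return (ratio, ratio * math.exp(-z * se), ratio * math.exp(z * se))


def accuracy_summary(tp, fp, fn, tn):
    """Sensitivity, specificity, LR+ and LR- from the four cell counts."""
    diseased, healthy = tp + fn, fp + tn
    return {
        "sensitivity": proportion_with_interval(tp, diseased),
        "specificity": proportion_with_interval(tn, healthy),
        # LR+ = sensitivity / (1 - specificity)
        "lr_positive": likelihood_ratio_interval(tp, diseased, fp, healthy),
        # LR- = (1 - sensitivity) / specificity
        "lr_negative": likelihood_ratio_interval(fn, diseased, tn, healthy),
    }


if __name__ == "__main__":
    # Hypothetical 2x2 table, not taken from any included study.
    for name, value in accuracy_summary(tp=6, fp=300, fn=2, tn=700).items():
        print(name, value)
```

With very few cases of malignancy per study, any interval method will yield wide intervals for sensitivity, and an exact binomial interval may be preferred over the Wilson interval assumed here.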
Pooling of sensitivity and specificity results was intended if studies showed sufficient clinical homogeneity (e.g. same index test, similar definition of malignancy). However, due to the limited number of eligible studies as well as heterogeneity in the design and setting within those studies evaluating the same index test, pooling of diagnostic accuracy data was not performed. A descriptive analysis of the results, including the prevalence of spinal malignancy in the study populations along with measures of diagnostic performance is presented.
Investigations of heterogeneity
The potential influence of the healthcare setting, the study design, and aspects of study quality from the QUADAS checklist on estimates of diagnostic accuracy can only be investigated if a sufficiently large number of studies report on the same index test and provide adequate information on the factor of interest. This was not the case in the current review, as the number of studies investigating each test was too small to allow investigation of sources of heterogeneity.
Results of the search
The electronic search of the MEDLINE, CINAHL and EMBASE databases resulted in 2082 unique titles. After screening of titles and abstracts, full text copies of 66 articles were retrieved. Apart from the systematic review used as a point of reference for this search (Henschke 2007), which included six primary studies, we were unable to identify any other systematic reviews on this topic. After reviewing the full text of the 66 selected articles, both review authors (NH, RO) agreed on the inclusion of eight studies (Figure 1). Only two case-control studies were identified, which were excluded because of poor methodology (Characteristics of excluded studies).
Figure 1. Flow diagram of search strategy
The reference lists of these eight studies were checked and forward citation searching was performed, but this did not result in any further eligible studies. Details on the design, setting, population, reference standard and definition of the target condition are provided in the Characteristics of included studies table. Of the eight included studies, six were performed in a primary care setting (Deyo 1986; Deyo 1988; Donner-Banzhoff 2006; Frazier 1989; Henschke 2009; Khoo 2003), one was performed in an accident and emergency department (Reinus 1998), and one was performed in a secondary care setting (Jacobson 1997). Six studies used a prospective design (Deyo 1986; Deyo 1988; Donner-Banzhoff 2006; Henschke 2009; Khoo 2003; Reinus 1998) and two studies collected information from medical records (Frazier 1989; Jacobson 1997). Five of the included studies were on a cohort of patients presenting with LBP (Deyo 1986; Deyo 1988; Donner-Banzhoff 2006; Frazier 1989; Henschke 2009), while three studies evaluated the diagnostic yield of imaging tests of the lumbar spine (Jacobson 1997; Khoo 2003; Reinus 1998).
The six studies conducted in primary care had a total sample size of 6622 patients, and the observed prevalence of spinal malignancy (21 cases) in the primary care studies ranged from 0% (Henschke 2009) to 0.66% (Deyo 1988). The primary diagnostic study by Henschke 2009 did not identify any cases of malignancy in 1172 consecutive cases of LBP, so sensitivity of the index tests could not be estimated for this study. In the accident and emergency setting (n = 482), the prevalence was reported as 1.45% (Reinus 1998) and in secondary care (n = 257) the prevalence was 7% (Jacobson 1997).
The reference standards used in the included studies were either diagnostic imaging (Deyo 1986; Khoo 2003; Reinus 1998; Jacobson 1997), long-term follow-up (Donner-Banzhoff 2006; Henschke 2009), or a combination of both (Deyo 1988; Frazier 1989). All studies evaluated individual tests from the clinical history or physical examination. No studies provided data on a combination of tests to screen for spinal malignancy.
Methodological quality of included studies
The results of the methodological quality assessment are shown in Figure 2. Most of the included studies were performed on a representative spectrum of patients (87.5%), avoided incorporation of the index tests in the reference standard (62.5%), and performed the index test in a blinded manner (62.5%). Only one study (Henschke 2009) provided adequate reporting of uninterpretable test results and explained withdrawals from the study. There was poor reporting of the time delay between the index tests and reference standard and whether the reference standard was blinded. Overall, three of the eight included studies (Donner-Banzhoff 2006; Henschke 2009; Reinus 1998) fulfilled six or more of the 11 methodological quality items.
Figure 2. Methodological quality summary: review authors' judgements about each methodological quality item for each included study.
The heterogeneity between the studies identified by the review meant statistical pooling of diagnostic accuracy data was not warranted. A descriptive analysis was therefore performed using the extracted data (2x2 tables), with sensitivity and specificity reported for all index tests. In total, data from 20 index tests (including two cut-offs for age) from the clinical history and physical examination were extracted. Of these, only seven were evaluated by more than one study and only two were evaluated by more than two studies.
Only one study (Deyo 1988) discussed the diagnostic accuracy of a combination of index tests. This study reported in the discussion section that a combination of age greater than 50 years, history of cancer, unexplained weight loss, or failure to improve with conservative therapy had a sensitivity of 100% for detecting malignancy. No further data on this combination of tests were provided.
From seven of the included studies, 15 index tests derived from the clinical history were evaluated. Six of these tests were evaluated by more than one study. The most common index test was older age, with a cut-off at greater than 50 years being evaluated by five studies (Deyo 1986; Deyo 1988; Frazier 1989; Henschke 2009; Jacobson 1997). Within the four primary care studies (Deyo 1986; Deyo 1988; Frazier 1989; Henschke 2009), the specificity (95% CI) of this test ranged from 0.66 (0.63 to 0.69) to 0.74 (0.70 to 0.78), the sensitivity ranged from 0.50 (0.01 to 0.99) to 0.77 (0.46 to 0.95), and the positive likelihood ratio (LR+) ranged from 1.92 to 2.65 (Figure 3). Of the remaining index tests from the clinical history, a previous history of cancer (three studies), no improvement in pain after one month (two studies), and unexplained weight loss (two studies) appeared to have high specificity across studies. Having an insidious onset of pain (two studies) or trying bed rest with no relief (two studies) had more inconsistent specificity across studies.
Figure 3. Forest plot of sensitivity and specificities for: Age > 50 and Neurological symptoms.
In the primary care setting, the post-test probability following a positive red flag from the clinical history remained below 1% in most cases (Summary of findings). Unexplained weight loss (post-test probability 1.2%) and a previous history of cancer (post-test probability 4.6%) were the only exceptions. In the accident and emergency setting, a previous history of cancer had a LR+ of 31.67 (Reinus 1998).
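The relationship between prevalence (pre-test probability), likelihood ratio, and post-test probability can be sketched as below. This is the standard odds form of Bayes' theorem; the function name and the example inputs are hypothetical and are not values reported by the included studies.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability.

    Steps: probability -> odds, multiply by the likelihood ratio,
    then odds -> probability.
    """
    if not 0 <= pre_test_probability < 1:
        raise ValueError("pre-test probability must be in [0, 1)")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


if __name__ == "__main__":
    # Hypothetical inputs: 1% prevalence and a modest LR+ of 2.5.
    print(f"{post_test_probability(0.01, 2.5):.4f}")
```

At a pre-test probability well below 1%, a positive finding needs a large LR+ to lift the post-test probability above a few percent, which is consistent with the pattern described above.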
Three included studies evaluated aspects of the physical examination (Deyo 1988; Henschke 2009; Khoo 2003). Of the five index tests, only neurological symptoms (two studies) were evaluated by more than one study. The other four index tests were altered sensation from the trunk down, fever (temp > 100
Summary of findings
Summary of main results
This review aimed to summarise evidence for the accuracy of "red flags" to screen for malignancy in patients with low-back pain (LBP). An important finding is the low prevalence reported in the included studies, with less than 1% of patients presenting to primary care with LBP being diagnosed with spinal malignancy. The results show that diagnostic performance of most "red flags" (clinical history and physical examination tests) is poor, especially when used in isolation. The exception was a previous history of cancer which had a sufficiently high positive likelihood ratio (LR+) to meaningfully increase the probability of malignancy. Only seven out of the 20 "red flags" were evaluated by more than one study. This means that there is insufficient evidence to support or refute the clinical usefulness of most "red flags" to screen for spinal malignancy in patients with LBP. There were very limited possibilities to study the influences of sources of heterogeneity in this review. Apart from the small number of studies per index test, studies did not always provide sufficient information about important study characteristics.
Factors affecting interpretation
Population and setting
The primary care setting plays a vital role in early detection of serious disease and it is there that reliable and accurate diagnostic information is needed. Most of the included studies were carried out in a primary care setting using a prospective design, evaluating "red flags" only once, at the initial consultation. However, persons presenting for a second, third, or subsequent consultation because of pain that is not resolving may not have been evaluated by the included studies. Spinal malignancy can develop in patients with established LBP and thus cannot be disregarded irrespective of the duration of LBP. Three included studies were also performed on a cohort of patients referred for diagnostic imaging of the lumbar spine, rather than on a consecutive series of patients presenting with LBP. This will likely overestimate the diagnostic accuracy results of the "red flags", as patients with LBP who are not referred for imaging will be automatically excluded.
The most common reference standard used was long-term (six to 12 months) and complete follow-up of patients. It is assumed in these cases that any spinal malignancy would manifest over time and be identified without the need for all patients to undergo diagnostic imaging. However, the use of follow-up may result in missed cases of serious disease if the follow-up consists of reviewing medical records or tumour registries (Deyo 1988), as patients may seek care elsewhere. There is also a possibility that spinal malignancy could develop subsequent to the initial consultation for non-specific LBP. Despite considering studies from all settings, only two studies were identified from the accident and emergency or secondary care setting. While MRI is generally considered the "gold standard" for diagnosing spinal malignancy, no studies utilised this form of imaging as the reference standard for all patients.
Using "red flags" to screen for serious pathologies in patients with LBP would ideally involve identifying features which, when present, raise the index of suspicion of having the disease to a level that would suggest further diagnostic work-up. Of the four red flags endorsed in the recent American Pain Society guideline (Chou 2007) to indicate a higher likelihood of malignancy (unexplained weight loss, age > 50, failure to improve after one month, previous history of cancer), only a previous history of cancer increased the post-test probability of malignancy beyond 2%. The other three red flags, used in isolation, have modest LR+ values and, in the case of older age and failure to improve after one month, substantial false positive rates, which argues against their recommended use in clinical practice. Some red flags (e.g. thoracic pain, severe pain, insidious onset) have both LR+ and LR- that are close to 1, suggesting that these red flags are of no value in either increasing or decreasing the likelihood of malignancy. The large number of patients with false positive "red flag" symptoms is of concern, as the presence of a "red flag" will not help the clinician in deciding whether any further investigation or treatment is needed.
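To make the false positive burden concrete, the sketch below converts prevalence, sensitivity, and specificity into expected counts for a cohort of patients. The function name is hypothetical, and the example inputs are illustrative round numbers chosen only to resemble a low-prevalence setting with a moderately specific test; they are not results from this review.

```python
def expected_counts(prevalence, sensitivity, specificity, cohort=1000):
    """Expected 2x2 cell counts (as fractional patients) for a screened cohort."""
    diseased = prevalence * cohort
    healthy = cohort - diseased
    true_positives = sensitivity * diseased
    false_negatives = diseased - true_positives
    false_positives = (1 - specificity) * healthy
    true_negatives = healthy - false_positives
    flagged = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "true_negatives": true_negatives,
        # Share of flagged patients who actually have the condition.
        "positive_predictive_value": true_positives / flagged if flagged else None,
    }


if __name__ == "__main__":
    # Hypothetical: 1% prevalence, sensitivity 0.75, specificity 0.70.
    for name, value in expected_counts(0.01, 0.75, 0.70).items():
        print(name, value)
```

Under these assumed inputs almost all flagged patients are false positives, which illustrates why a positive "red flag" with modest specificity does little to guide decisions about further investigation.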
In the primary care setting, screening to exclude patients who do not have malignancy is often more appropriate than identifying the few cases of malignancy. While some red flags have been endorsed because they have a very low LR- and so help to reduce the likelihood of malignancy, it needs to be borne in mind that the prevalence of malignancy in primary care patients with LBP is very low. The starting position is that malignancy is unlikely and with a negative test result malignancy becomes highly unlikely. A negative response to these tests would only change clinical management for clinicians who would order a diagnostic work-up when the probability of malignancy is around 1%.
The low prevalence of spinal malignancy in patients with LBP makes it difficult to develop screening tools which are both easy to apply and accurate. Clinical guidelines usually suggest individual "red flags" and leave their interpretation up to the clinician (Koes 2010). A more effective screening tool could be recommended if data were available on how to use these "red flags" in combination with each other. When a number of positive "red flags" are used in combination, the LR+ would most likely be increased. Combined use is also a more accurate reflection of what takes place in clinical practice. Additionally, as the spine is more frequently the site of metastatic disease than primary tumours, "red flags" may become more useful where the target population is not all patients seeking care for LBP but those with LBP and (for example) a history of cancer. As an example, an insidious onset of LBP in a patient aged over 50 years, with no prior history of LBP but a history of cancer, may indicate a higher likelihood of malignancy. Ideally, an effective series of "red flag" questions for spinal malignancy would highlight pertinent characteristics from the patient’s history and physical examination, identify all patients who require further assessment, and allow the clinician to forgo invasive and potentially harmful tests in the remainder.
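One simple way to see why combinations could raise the LR+ is to multiply individual likelihood ratios, as sketched below. This rests on a strong assumption that the "red flags" are conditionally independent given disease status, which the review does not claim and which is unlikely to hold exactly (age and a history of cancer, for example, are plausibly correlated). The function names and numbers are hypothetical, and the result should be read as an optimistic upper-end illustration rather than an estimate.

```python
import math


def combined_likelihood_ratio(likelihood_ratios):
    """Product of individual LRs; valid only under conditional independence."""
    return math.prod(likelihood_ratios)


def probability_after_findings(pre_test_probability, likelihood_ratios):
    """Post-test probability after several findings, using the naive product."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * combined_likelihood_ratio(likelihood_ratios)
    return post_test_odds / (1 + post_test_odds)


if __name__ == "__main__":
    # Hypothetical: two positive findings with LR+ of 2.0 and 3.0 at 1% prevalence.
    print(combined_likelihood_ratio([2.0, 3.0]))
    print(f"{probability_after_findings(0.01, [2.0, 3.0]):.4f}")
```

Only primary studies that report the accuracy of the combination itself can show how far correlation between findings erodes this naive product.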
Strengths and weaknesses of the review
Despite employing a sensitive electronic search strategy, very few eligible studies were available. Poor reporting in the original publications affected the assessment of methodological quality (risk of bias) and was one of the main reasons for scoring "unclear" on some QUADAS items. Most studies were not specifically designed as diagnostic accuracy studies and so provided little information on important aspects of study design. The introduction and implementation of the STARD guidelines may improve reporting of diagnostic studies in the future (Bossuyt 2003; Smidt 2006). Assessment of quality in the current review was facilitated by defining clear guidelines for review authors on how to score individual items (Appendix 4).
Applicability of findings to the review question
Clinical practice guidelines for the management of LBP typically recommend that at the initial assessment, the need for further diagnostic work-up for those suspected of having an underlying serious disorder (e.g. fracture, spinal malignancy) should be guided by the presence of a number of "red flags" (Koes 2010). The objective of this review was to provide researchers and clinicians with a clearer definition of which "red flags", and in what combination, are useful to screen for spinal malignancy, and to identify in which situations it is appropriate to use them in the management of LBP. However, the strength of our recommendations is limited by the small number of studies identified on this topic. Equally important is the fact that most studies only presented the diagnostic value of individual "red flags". Our review shows that when used in isolation, most tests (with the exception of a previous history of cancer) have poor diagnostic performance. It is arguable that in clinical practice the combination of several elements of diagnostic information contributes to estimating the likelihood of serious pathology such as malignancy.
Implications for practice
Commonly suggested "red flags" for malignancy in clinical practice guidelines are: age > 50 years, no improvement in symptoms after one month, insidious onset, a previous history of cancer, no relief with bed rest, unexplained weight loss, fever, thoracic pain, or being systemically unwell (Koes 2010). These "red flags" are usually elicited through the initial assessment (history taking and physical examination), to decide which patients should be referred for imaging or specialist consultation. The limited evidence available suggests that only one "red flag" when used in isolation, a previous history of cancer, meaningfully increases the likelihood of cancer. "Red flags" such as insidious onset, age > 50, and failure to improve after one month have high false-positive rates. Uncritical use of these "red flags" as a trigger to order further investigations will therefore lead to unnecessary investigations that are themselves harmful, both through unnecessary radiation and through the consequences of false-positive results from those investigations. While the lack of evidence to support or refute the use of "red flags" is recognised, a more pragmatic solution is to consider the possibility of spinal malignancy (in light of its low prevalence in primary care) when a combination of recommended "red flags" is found to be positive.
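The effect of a high false-positive rate at low prevalence can be made concrete with a short calculation. The sketch below is illustrative only: the function name and the sensitivity and specificity values are hypothetical and are not estimates from the included studies; only the prevalence of around 1% comes from the literature cited in this review.

```python
def screening_outcomes(prevalence, sensitivity, specificity, n=1000):
    """Expected screening results in a cohort of n patients.

    All inputs are proportions between 0 and 1.
    """
    with_disease = prevalence * n
    without_disease = n - with_disease
    true_positives = sensitivity * with_disease
    false_positives = (1.0 - specificity) * without_disease
    flagged = true_positives + false_positives
    # Positive predictive value: share of flagged patients who have the disease
    ppv = true_positives / flagged if flagged > 0 else float("nan")
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "ppv": ppv,
    }


# Hypothetical test characteristics at 1% prevalence
result = screening_outcomes(prevalence=0.01, sensitivity=0.80, specificity=0.70)
print(result)
```

With these assumed values, false positives outnumber true positives many times over, so most patients sent for further investigation on the basis of such a "red flag" alone would not have a malignancy.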
Implications for research
There is a need for good quality diagnostic studies of clinical tests in patients with LBP. For the identification of serious spinal pathologies, these studies should evaluate the performance of combinations of "red flags" in order to derive a diagnostic algorithm based on patient history and physical examination. The performance of such diagnostic models can then be tested against appropriate reference standards in a consecutive series of patients with LBP. Appropriate standards for reporting of primary diagnostic studies should be followed and clear definitions should be given for positive results of both the index tests and the reference standard. Due to the low prevalence of malignancy in primary care patients with LBP, further studies will need to be very large in order to have sufficient statistical power to produce precise estimates of the sensitivity and specificity of "red flags". Potentially, the quality of the evidence around diagnostic tests for such a rare condition could be improved through the use of well-designed case-control studies or mathematical modelling to identify appropriate diagnostic strategies.
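A rough sense of how large such studies would need to be can be obtained from a standard normal-approximation sample size calculation for a proportion. The sketch below is an assumption-laden illustration, not a method used in this review: the function name, the anticipated sensitivity, and the desired confidence interval half-width are all hypothetical, and a real study would need a formal sample size calculation.

```python
import math


def cohort_size_for_sensitivity(expected_sensitivity, half_width,
                                prevalence, z=1.96):
    """Approximate cohort size needed to estimate sensitivity.

    Uses n_cases = z^2 * Se * (1 - Se) / d^2 (normal approximation),
    then scales by prevalence to get the number of patients to recruit.
    z = 1.96 corresponds to a 95% confidence interval.
    """
    n_cases = math.ceil(
        z ** 2 * expected_sensitivity * (1.0 - expected_sensitivity)
        / half_width ** 2
    )
    # Only a fraction `prevalence` of recruited patients will be cases
    n_total = math.ceil(n_cases / prevalence)
    return n_cases, n_total


# Hypothetical targets: sensitivity 0.80 estimated to within +/- 0.10
cases, total = cohort_size_for_sensitivity(0.80, 0.10, prevalence=0.01)
print(f"cases needed: {cases}, patients to recruit: {total}")
```

Under these assumed targets, a few dozen confirmed cases are needed, which at a prevalence of around 1% translates into a consecutive cohort of several thousand patients. This is the reason a case-control design, which samples cases directly, may be more feasible for such a rare condition.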
We would like to thank Danielle van der Windt for her assistance in the development of the protocol.
Presented below are all the data for all of the tests entered into the review.
Appendix 1. MEDLINE search strategy
1. Index test: clinical red flags
"Medical History Taking"[mesh] OR history[tw] OR "red flag"[tw] OR "red flags" OR Physical examination[mesh] OR "physical examination"[tw] OR "function test"[tw] OR "physical test"[tw] OR ((clinical[tw] OR clinically[tw]) AND (diagnosis[tw] OR sign[tw] OR signs[tw] OR significance[tw] OR symptom*[tw] OR parameter*[tw] OR assessment[tw] OR finding*[tw] OR evaluat*[tw] OR indication*[tw] OR examination*[tw]) OR (ra[sh] OR ri[sh]))
2. Population: low-back pain and anatomical location
(back pain[mesh] OR sciatica[mesh] OR "back ache"[tw] OR backache[tw] OR "back pain"[tw] OR dorsalgia[tw] OR lumbago[tw] OR sciatica[tw] OR Pain[mesh] OR pain[tw] OR ache*[tw] OR aching[tw] OR complaint*[tw] OR dysfunction*[tw] OR disabil*[tw] OR neuralgia[tw]) AND (Back[mesh] OR spine[mesh] OR back[ti] OR lowback[tw] OR lumbar[tw] OR lumba*[tw] OR lumbo*[tw] OR sciatic*[tw] OR ischia*[tw] OR sacroilia*[tw] OR spine[tw] OR spinal[tw] OR radicular[tw] OR "nerve root"[tw] OR "nerve roots"[tw] OR disk[tw] OR disc[tw] OR disks[tw] OR discs[tw] OR vertebra*[tw] OR intervertebra*[tw] OR Sacroiliac-joint[mesh] OR Lumbar vertebrae[mesh])
3. Target condition: spinal malignancy
cancer*[tw] OR tumor*[tw] OR tumour*[tw] OR carcinoma*[tw] OR sarcoma*[tw] OR neoplasm*[tw] OR Neoplasms[mesh] OR adenocarcinoma*[tw] OR metastasis*[tw] OR polyp*[tw] OR Cancer Screening[mesh] OR malignan*[tw]
4. Exclusion criteria: children, case reports, animal studies
(exp Child [mesh] OR exp Infant [mesh]) NOT ((exp Child [mesh] OR exp Infant [mesh]) AND (exp Adult [mesh] OR Adolescent [mesh])) OR (Animals [mesh] NOT (Animals [mesh] AND Humans [mesh])) OR "case report"[ti]
1 AND 2 AND 3 NOT 4
Appendix 2. EMBASE search strategy
1. Index test: clinical red flags
'medical history taking'/exp OR 'history'/de OR history OR 'red flag' OR 'red flags' OR 'physical examination'/exp OR 'physical examination' OR 'function test'/de OR 'function test' OR 'physical test' OR (clinical OR clinically AND ('diagnosis'/de OR sign OR signs OR significance OR symptom$ OR parameter$ OR assessment OR finding$ OR evaluat$ OR indication$ OR examination$)) OR 'radiography'/exp OR 'radionuclide'/exp AND [humans]/lim
2. Population: low-back pain and anatomical location
back AND 'pain'/exp OR 'back pain' OR 'low back' AND 'pain'/exp OR 'low back pain' OR 'sciatica'/exp OR sciatica OR backache OR coccyx OR coccydynia OR dorsalgia OR 'lumbar pain' OR spondylosis OR lumbago AND [humans]/lim
3. Target condition: spinal malignancy
'cancer$' OR 'tumor$' OR 'tumour$' OR 'carcinoma$' OR 'sarcoma$' OR 'neoplasm$' OR 'neoplasms'/exp OR 'adenocarcinoma$' OR 'metastasis$' OR 'polyp$' OR 'cancer screening'/exp OR 'malignan$'
4. Exclusion criteria: children, case reports, animal studies
'case report' AND [humans]/lim
1 AND 2 AND 3 NOT 4
Appendix 3. CINAHL search strategy
1. Index test: clinical red flags
MH "Patient History Taking" or TX history or TX "red flag" or MM "Physical examination" or TX "physical examination" or TX "physical test" or TX clinical* or MH "Diagnostic Tests, Routine" and (TX diagnosis or TX sign or TX signs or TX significance or TX symptom* or TX parameter* or TX assessment or TX finding* or TX evaluat* or TX indication* or TX examination*)
2. Population: low-back pain and anatomical location
MH "Back Pain" or MH "Low back pain" or TX "back pain" or TX "low back pain" or MM Sciatica or TX sciatica or TX Backache or TX Coccyx or TX Coccydynia or TX Dorsalgia or TX lumbar pain or TX spondylosis or TX lumbago
3. Target condition: malignancy
MH "Neoplams" or MH "Cancer screening" or TX cancer* or TX tumor* or TX tumour* or TX tumour* or TX carcinoma* or TX sarcoma* or TX adenocarcinoma* or TX metastasis* or TX polyp* or TX malignan*
1 and 2 and 3
Appendix 4. Guide to scoring QUADAS Quality Assessment items
Contributions of authors
All review authors contributed to discussions regarding the design of the current study. Nicholas Henschke wrote the first draft of the protocol with help from the other review authors. All review authors read and approved the final manuscript.
Declarations of interest
No conflicts of interest are declared.
Sources of support
- Vrije Universiteit, EMGO+ Institute for Health and Care Research, Netherlands.
- The George Institute for Global Health, Australia.
- National Health & Medical Research Council, Australia.
- Dutch Health Insurance Board, Netherlands.
Differences between protocol and review
Due to the limited number of index tests evaluated in the primary studies and the heterogeneity in study setting, meta-analyses were not performed.
Medical Subject Headings (MeSH)
MeSH check words
Humans; Middle Aged