Dilemmas in the interpretation of diagnostic accuracy studies on presurgical workup for epilepsy surgery

Authors


Address correspondence to Jane Burch, CRD, Alcuin B, University of York, Heslington, York YO10 5DD, U.K. E-mail: jane.burch@york.ac.uk

Summary

We conducted a systematic review to determine which noninvasive technologies should be used in the workup for epilepsy surgery to identify structural or functional abnormalities to help locate the site of seizure onset. The review focused on patients where there was insufficient confidence, in either the decision to go to surgery or the site at which surgery should be conducted, after the initial clinical examination. The majority of the studies identified were single-gate diagnostic accuracy studies; none were randomized controlled trials, and only one reported the effect of the test results on the decision making process. It became apparent that the data derived from diagnostic accuracy studies could not be used to answer the review question. This article focuses on the methods used to extract data from the diagnostic accuracy studies, the difficulties interpreting the resulting data, why such studies are not an appropriate study design in this setting, and how the evidence base can be improved.

Patients being considered for resective epilepsy surgery will undergo a detailed clinical assessment. This includes taking a detailed history and gaining an accurate description of seizure semiology, a clinical examination, psychometric testing, routine interictal and ictal electroencephalography (EEG) with surface electrodes, and routine magnetic resonance imaging (MRI; T1, T2 with or without fluid-attenuated inversion recovery [FLAIR]). The Wada test, which assesses laterality of memory and language, may also be conducted. These initial stages of the presurgical workup are critical, and can often provide sufficient information on lateralization and localization to identify the epileptogenic zone and allow surgery to proceed (Unnwongse et al., 2010). However, in some cases, an epileptogenic zone may not have been identified, or confidence in the localization may be lacking. It is in these patients that further noninvasive testing is required, the assessment of which is the focus of this article.

In 2006, a Health Technology Assessment (HTA) report on neuroimaging technologies used to identify the seizure focus in patients with refractory epilepsy being considered for surgery (Whiting et al., 2006) identified a lack of effectiveness data, such that the links between the test result, management decisions, and clinical outcomes could not be determined. In 2010, we were commissioned by the National Institute for Health Research (NIHR) HTA Programme to identify new research and update this review. In the context of our systematic review of the diagnostic accuracy and clinical utility of diagnostic technologies used in the workup for epilepsy surgery, we explored the nature of the evidence base for these imaging technologies and the suitability of the study designs employed.

Of the many diagnostic frameworks for the phased evaluation of diagnostic tests (Lijmer et al., 2009), the most comprehensive and well known is that suggested by Fryback & Thornbury (1991). Each level of the framework addresses a different clinical question, with the requirements from the evidence base varying across these levels. In the early stages of test evaluation (level 2), the comparison of the new test (with or without comparator tests) against the currently available best practice can be used to determine diagnostic accuracy, but only where a suitable reference standard is available that is ethical to conduct in all participants regardless of the result of the index test. Subsequent stages evaluate the clinical utility of tests (Fig. 1; levels 3, 4, and 5) and require data on downstream effects of the test results on the decision making process, management strategies, clinical outcomes, and quality of life (Trikalinos et al., 2008).

Figure 1.


The six-level framework for the phased evaluation of diagnostic tests suggested by Fryback and Thornbury (1991).

The International League Against Epilepsy (ILAE) has established a commission on diagnostic methods that aims to provide evidence-based guides to clinical practice for currently used selected diagnostic methods, and an electronic educational resource to train and educate clinicians in the appropriate use of selected diagnostic methods (Gaillard et al., 2011b), in order to improve the quality of data from imaging and neurophysiologic studies. This article focuses on the methodologic limitations, identified during a systematic review, of using diagnostic accuracy studies to evaluate the diagnostic technologies used in the workup for epilepsy surgery; it discusses why this study design is inappropriate in this indication, and suggests potential alternative study designs, indicating where these meet the ILAE essential criteria. The full methods and synthesis of the clinical and cost-effectiveness evidence are reported elsewhere (Burch et al., 2012).

Reference Standards

There is no gold standard for the identification of the epileptogenic zone in patients with epilepsy. All the reference standards currently available have limitations. Continuous video-EEG using surface electrodes may fail to localize an epileptogenic zone, or may localize it inaccurately (Whiting et al., 2006), and is not considered sufficiently accurate to be a gold standard test (Alving & Beniczky, 2009). Intracranial electrodes have a field of view of only millimeters and must be placed over the site of the epileptogenic zone to localize it accurately (Whiting et al., 2006; Duncan, 2009; Stefan, 2011); sampling bias is a limitation (Carrette et al., 2010), and a relatively high percentage of invasive EEGs are nonlocalizing (Knowlton et al., 2008). In addition, there is a potential for complications (Burneo et al., 2006), limited ability to repeat the test (Stefan, 2011), and its conduct in selected cases can lead to biased estimates of test performance if used as a reference standard (Hunink & Krestin, 2002). Outcome following surgery can be affected by variables unrelated to the accuracy of localization (Gaillard et al., 2011a; Stefan, 2011), including complications, incomplete resection of the epileptogenic zone, and insufficient follow-up to determine the persistence of the surgical outcome, such that seizure outcome cannot reliably verify accuracy even in well-designed studies (Stefan, 2011).

In current clinical practice, the location of an epileptogenic zone is derived using the consensus of a combination of tests. There is no agreement as to which tests should constitute this combination, and the combination of tests used varies considerably across institutions and studies. Concerns with the use of a composite of tests include the subjective interpretation of test results, observer variation, and the potential that the accuracy of individual observers is being measured rather than the overall accuracy of the diagnostic pathway (Warner et al., 2010; Sadatsafavi et al., 2011). Continued testing could also destabilize a correct localization, with additional information casting doubt on it or even indicating an alternative localization (Warner et al., 2010). The use of an expert panel to make a consensus decision helps address these concerns (Reitsma et al., 2009). However, the expertise of the panel, the way patient information is presented, and how a final classification is obtained are all factors that can affect the reliability of a consensus-based localization by a multidisciplinary team (Rutjes et al., 2007). An objective method that examines the strength of statistical relationships among variables is the latent class model, which relates the observed patterns of test results to latent categories of patients with and without the target condition (Reitsma et al., 2009). However, the target condition is not defined in a clinical way, so there can be a lack of clarity about what the results stand for in practice (Reitsma et al., 2009).

Finally, the major limitation that afflicts all currently available reference standards for the workup for epilepsy surgery is the inability to verify whether the index test was accurate, and the decision appropriate, in patients who do not undergo surgery (Gaillard et al., 2011a).

Using an imperfect reference standard leads directly to bias in accuracy estimates, the direction and magnitude of which will depend on the frequency and correlation of the errors (Reitsma et al., 2009). For conditions where only an imperfect reference standard test is available, clinical follow-up is often used as a supplement to that test, improving the accuracy of the reference standard against which the index test is compared. This primarily allows index test results that would have been wrongly classified as false positive to subsequently be reclassified as true positive. We attempted to improve the currently available imperfect reference standard (consensus decision from a combination of tests) by restricting inclusion of studies to those that reported a clinical consequence of the results of the comparator tests, namely the decision to go to surgery and/or the outcome following surgery.
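The distortion introduced by an imperfect reference standard can be illustrated with a small calculation. The sketch below uses hypothetical values and assumes that the errors of the index test and the reference standard are conditionally independent; correlated errors, as noted above, would change the direction and magnitude of the bias.

```python
# Sketch of the bias introduced by an imperfect reference standard.
# All values are hypothetical; assumes the errors of the index test and the
# reference standard are conditionally independent given true disease status.

def observed_accuracy(prev, se_index, sp_index, se_ref, sp_ref):
    """Apparent sensitivity and specificity of an index test when judged
    against an imperfect reference standard."""
    # P(index positive AND reference positive), across all patients
    both_pos = (prev * se_index * se_ref
                + (1 - prev) * (1 - sp_index) * (1 - sp_ref))
    ref_pos = prev * se_ref + (1 - prev) * (1 - sp_ref)
    # P(index negative AND reference negative), across all patients
    both_neg = (prev * (1 - se_index) * (1 - se_ref)
                + (1 - prev) * sp_index * sp_ref)
    ref_neg = 1 - ref_pos
    return both_pos / ref_pos, both_neg / ref_neg

# True index test accuracy: 90% sensitive, 90% specific; the reference
# standard is only 80% sensitive and 80% specific.
obs_se, obs_sp = observed_accuracy(prev=0.5, se_index=0.9, sp_index=0.9,
                                   se_ref=0.8, sp_ref=0.8)
print(round(obs_se, 2), round(obs_sp, 2))  # both appear as 0.74, not 0.90
```

With these illustrative inputs, a genuinely 90% accurate index test appears only 74% accurate, purely because the standard against which it is judged is itself wrong in a proportion of patients.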

Scope of the Systematic Review

The review was broad, investigating the diagnostic accuracy and clinical effectiveness of noninvasive technologies (and where possible, combinations and sequences of these technologies) in patients with refractory epilepsy who were being considered for surgery. Our review evaluated those noninvasive technologies that may be conducted subsequent to the initial clinical examination and investigations when there is thought to be insufficient confidence in either the decision to go to surgery, or the site at which surgery should be conducted.

On the basis of previous clinical and methodologic reviews (Fryback & Thornbury, 1991; Moons et al., 1999; Moons & Grobbee, 2002; Moons et al., 2004; Uijl et al., 2005; Whiting et al., 2006), it was clear that studies that encompassed all levels of the diagnostic framework needed to be considered to answer these questions and inform decision making in the National Health Service (NHS) concerning the relative value of these tests.

The systematic review was conducted in accordance with the Centre for Reviews and Dissemination's (CRD's) guidance for undertaking reviews in health care (Centre for Reviews and Dissemination, 2009) and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (PRISMA Group; Moher et al., 2009). Eighteen sources were searched for studies that evaluated high-definition EEG (HD-EEG), volumetric magnetic resonance imaging (MRI), functional MRI, magnetic resonance spectroscopy (MRS), single photon emission computerized tomography (SPECT), subtraction ictal SPECT coregistered with MRI (SISCOM), positron emission tomography (PET), magnetoencephalography (MEG), and MSI (MRI and MEG coregistered). Studies evaluating only history and neurologic examination, routine EEG, and routine MRI were not included in the review. To be included, studies had to be conducted in patients with refractory focal epilepsy being considered for surgery, and had to report a clinical outcome such as the effect of the test result on patient management and/or surgical outcome. Studies in patients with tumors, vascular malformations, or epilepsy as a result of trauma were excluded. Eligible study designs were: randomized controlled trials and cohort studies that compared two or more diagnostic strategies; studies reporting the effect of test results on management strategy; prospective single-gate diagnostic accuracy studies that reported the decision to go to surgery and/or the outcome following surgery; and outcome prediction studies reporting the results of a multivariate regression analysis in which an index test(s) of interest was an independent variable.

Of the 18 studies that met the inclusion criteria, 13 were single-gate diagnostic accuracy studies; none were randomized controlled trials, and only one reported the effect of the test results on the decision making process (Uijl et al., 2007).

Utilizing and Interpreting Data from the Diagnostic Accuracy Studies

The use of diagnostic accuracy studies poses a number of challenges for epilepsy surgery, where the question is one of localization rather than diagnosis. We classified the index tests based on whether they were concordant, nonlocalizing, partially concordant, or discordant with either the final consensus localization for the decision to go to surgery, or the site of surgery for outcome following surgery (Table 1):

Table 1.   Contingency table into which the data were extracted from the diagnostic accuracy studies
                        Decision for surgery or      Decision against surgery or
                        good outcome following       poor outcome following
                        surgery                      surgery
 Concordant                        A                              B
 Nonlocalizing                     C                              D
 Partially concordant              E                              F
 Discordant                        G                              H
  • Concordant: Epileptogenic zone identified by the index test was the same as the consensus localization/site of surgery.
  • Nonlocalizing: An epileptogenic zone was identified, or surgery undertaken, but the index test was nonlocalizing.
  • Partially concordant: The epileptogenic zone identified by the index test overlapped, but was not identical to, the consensus localization/site of surgery.
  • Discordant: The epileptogenic zone identified by the index test was different from the consensus localization/site of surgery.

To produce estimates of diagnostic accuracy, a 2 × 2 table is needed. This would involve combining different categories of patients. Combining the concordant and partially concordant categories may be valid, but the nonlocalizing and discordant categories are distinct populations for which the consequences of the test results would be very different. A nonlocalizing test would likely result in a patient receiving medical management, but a discordant test could lead to surgery at an incorrect location, which could leave the patient with continued seizures and/or transient or long-term complications. In addition, nonlocalizing tests can be categorized as true or false negatives; discordant tests identified an epileptogenic zone and are therefore positive, and combining these with negative test results is questionable.

Likelihood ratios can be calculated for these 2 × 4 data. However, they indicate the likelihood of an outcome (a decision to go to surgery or a good outcome following surgery) given a combination of test results; they are not associated with the results of one particular index test, as is usually the case. They may be of value if those for the concordant categories are compared with those of the nonlocalizing and discordant categories. If the likelihood ratios for the concordant and discordant categories are the same, the index test is of little or no value; the likelihood of a decision for surgery, or a good surgical outcome, is the same regardless of the result of the index test. In our review, all the likelihood ratios were close to unity, and no conclusions could be drawn (Burch et al., 2012).
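With hypothetical cell counts, the calculation of category-specific likelihood ratios from such a 2 × 4 table can be sketched as follows. Each ratio compares the proportion of a test category among patients with the outcome (a decision for surgery, or a good surgical outcome) with the proportion among those without; ratios at or near unity, as with the illustrative counts here and the data in our review, indicate that the index test result carries essentially no information.

```python
# Category-specific likelihood ratios from a 2 x 4 table (hypothetical counts).
# Each pair is (outcome present, outcome absent), e.g. (decision for surgery,
# decision against surgery), corresponding to the cell pairs in Table 1.
table = {
    "concordant":           (20, 10),  # cells A, B
    "nonlocalizing":        (8, 4),    # cells C, D
    "partially concordant": (6, 3),    # cells E, F
    "discordant":           (6, 3),    # cells G, H
}

total_pos = sum(pos for pos, neg in table.values())
total_neg = sum(neg for pos, neg in table.values())

# LR for a category = P(category | outcome) / P(category | no outcome)
lrs = {cat: (pos / total_pos) / (neg / total_neg)
       for cat, (pos, neg) in table.items()}
print(lrs)  # every category comes out at 1.0: the test is uninformative
```

Because the concordant and discordant categories yield the same ratio, the outcome is equally likely whatever the index test shows, which is precisely the situation described above.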

Given the disadvantages of collapsing the data into 2 × 2 tables, the uninformative likelihood ratios, and the fact that diagnostic accuracy studies constituted the majority of the evidence base, we attempted to gain some clinical value from these data by determining the proportion of test results that contributed to the decision to go to surgery or not, and the proportion of patients in whom the index test identified an epileptogenic zone correctly (hit), incorrectly (error), or missed a resectable epileptogenic zone (miss) in those who underwent surgery.

Decision to go to surgery

The test results in each cell of the 2 × 4 table (Table 1) were interpreted as follows:

Cells A and B: Concordance between the consensus localization and index test means the decision to go to surgery or not may have been the same whether the index test was used alone or in combination with other tests. It could be argued that there was no value in conducting the additional index test. However, the index test may have increased confidence in the site of the epileptogenic zone and the subsequent decision to go to surgery or not, and may have negated the need for invasive EEG.

Cells C and D: Given that an epileptogenic zone was identified, the nonlocalizing index test provided no additional information regarding the potential site of excision and may have cast doubt on the results of prior tests. Where the decision was not to go to surgery, we cannot tell whether it was due to increased uncertainty about the seizure location or other reasons.

Cells E and F: The partial concordance of the index test with the consensus localization means that there may have been some added value in the performance of the test in terms of increased confidence in the epileptogenic zone. Alternatively, additional potential sites for the epileptogenic zone indicated by the index test may decrease confidence in the potential benefit of surgery. The confidence the surgical team places on the results will determine whether the index test seems to be confirmatory of the epileptogenic zone or not.

Cell G: Despite the discordance between tests, surgery was performed. If the site of excision was based on the results of the consensus localization, then the index test provided no added value, and may have made the decision to undertake surgery more difficult. However, if the excision was at the site identified by the index test, there was not only value to undertaking the additional test, but a perceived greater value in the results of the index test than the other test(s).

Cell H: The index test was discordant with the comparator test(s) and there was a decision not to go to surgery. We cannot tell whether this was due to increased uncertainty of the seizure location or other reasons.

It is clear that the results generated by the diagnostic accuracy studies do not allow the additional value of the index test to be determined, either for any individual study or case within a study.

Outcome following surgery

The test results in each cell of the 2 × 4 table (Table 1) were interpreted as follows:

Cell A: The index test identified the epileptogenic zone at the site of surgery and surgical outcome was good, indicating the index test was correctly localizing (hit). The index test may have been sufficient to confirm the epileptogenic zone, avoiding the use of invasive EEG. It is possible that resection was more extensive than that identified by the tests, and that the epileptogenic zone was contained within the additional resected tissue.

Cell B: The index test identified the epileptogenic zone at which surgery was conducted, but surgical outcome was poor. These tests could be classified as errors; however, the poor outcome could be due to complications (intraoperative or postoperative, such as gliosis, infection, or bleeding), or an incomplete resection at the correct epileptogenic zone; these test results are therefore unclassifiable.

Cell C: The index test was nonlocalizing, but an epileptogenic zone was identified and surgical outcome was good; the index test missed a resectable epileptogenic zone.

Cell D: The index test was nonlocalizing, but an epileptogenic zone was resected and the surgical outcome was poor. Therefore, either: surgery was conducted at the correct location, but there were complications; surgery was conducted at the wrong location; only part of the epileptogenic zone was excised; or there was no resectable epileptogenic zone. Consequently, it is unclear whether the index test was a miss or “correctly nonlocalizing”; the test results are unclassifiable.

Cell E: Partial concordance with the site of surgery means that the index test identified an epileptogenic zone that was more or less extensive than the area excised. Given the good outcome following surgery, these tests could be deemed “partial hits.” However, it is unclear whether a good outcome would have been achieved if the area for excision was informed solely by the index test; these tests are unclassifiable.

Cell F: The index test identified an epileptogenic zone that was more or less extensive than the area excised. Not only is it unclear whether a poor outcome would have occurred if the area excised was informed solely by the index test, but it could be that the localization was correct and complications or an incomplete resection resulted in the patients continuing to have seizures; these tests are unclassifiable.

Cell G: The index test was discordant with the site of surgery; therefore, the good outcome following surgery indicates that the index test result was an error.

Cell H: The site of surgery and the index test were discordant, and outcome following surgery was poor. It is unclear whether surgery was conducted at the wrong location, or at the correct location with complications or incomplete resection resulting in continued seizures. If surgery was complication-free, it is still unknown whether the patient would have had a good surgical outcome if the site indicated by the index test had been excised. These tests are unclassifiable. If verification of the index test localization was sought by conducting a second surgery at the alternative epileptogenic zone, the patient may undergo a second inappropriate and unnecessary surgical procedure, or consequences of the first surgery may mean a poor surgical outcome even if it was the correct location, making the index test appear inaccurate even if it was not.
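The cell-by-cell reasoning above can be summarized in a short sketch (counts are hypothetical): only C (misses) and G (errors) are unequivocal, A counts as a hit subject to the caveat noted for Cell A, and the remaining cells are unclassifiable. The sketch computes the proportion of unclassifiable test results.

```python
# Classification of the 2 x 4 cells for outcome following surgery, per the
# interpretations above; the counts are hypothetical placeholders.
classification = {
    "A": "hit",             # subject to the caveat about extended resections
    "B": "unclassifiable",
    "C": "miss",
    "D": "unclassifiable",
    "E": "unclassifiable",
    "F": "unclassifiable",
    "G": "error",
    "H": "unclassifiable",
}
counts = {"A": 18, "B": 6, "C": 4, "D": 2, "E": 5, "F": 3, "G": 2, "H": 4}

total = sum(counts.values())
unclassifiable = sum(n for cell, n in counts.items()
                     if classification[cell] == "unclassifiable")
print(f"{unclassifiable / total:.1%} of test results unclassifiable")
```

Even with these modest illustrative counts, nearly half of the test results cannot be classified, which mirrors the problem encountered in the studies included in our review.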

Using the 13 diagnostic accuracy studies identified in our systematic review, we established that it was not possible to determine whether an index test was of value in the decision making process; neither were the studies informative for determining the diagnostic accuracy of these technologies in this indication (Burch et al., 2012). Data on patients who were not offered surgery were lacking in the majority of studies.

For the outcome following surgery, the number of test results that could not be classified ranged from 0% to 52.9%, with an average across data sets of 23.5%. The only cells that provided unequivocal results were C (misses) and G (errors); there is some doubt whether the test results in cell A are hits, as the good outcome following surgery could have been achieved through the inadvertent resection of the epileptogenic zone by removal of tissue adjacent to the site identified by the tests. Given the high proportion of index test results that could not be classified as hits, misses, or errors, any conclusions drawn from these data could be inappropriate and misleading. Generalizations applied to all the test results in any one cell are also inappropriate: some concordant, partially concordant, or discordant tests could be crucial in the decision making process whereas others are not, and some nonlocalizing test results could cast sufficient doubt on the site of an epileptogenic zone to alter the management plan for a patient whereas others would not.

How Can the Evidence Base Be Improved?

The diagnostic accuracy study design clearly has major limitations when attempting to determine the accuracy of the location of the epileptogenic zone. As the presence of disease has already been established, the role of the test(s) differs from the standard diagnostic evaluation. The desire to calculate estimates of test accuracy such as sensitivity and specificity, and the resulting perception of the need for diagnostic accuracy studies, is not necessarily appropriate where the object of the tests is to determine something other than the presence, absence, or stage of disease. Where the data are categorical, as in our review, dichotomizing them is difficult, or even impossible. Patients cannot move between categories, and therefore there is no continuum on which to define a cutoff point to dichotomize the data in a clinically meaningful way.

For a test evaluation to inform which test(s) should be conducted in clinical practice, the effect of the test results on the decision making process, management strategies, and subsequent clinical outcomes and quality of life need to be established in all patients, not just those who undergo surgery. Currently, where evidence is available at level 4 or 5 of the diagnostic framework, it tends to be subject to selection bias (Gaillard et al., 2011a). A review of diagnostic testing study guidelines conducted by the ILAE commission on diagnostic methods also highlighted a lack of high quality studies in the broad area of epilepsy imaging, and limitations of currently available guidelines (Gaillard et al., 2011a). Following this review, Gaillard et al. (2011a) identified 15 criteria that were considered essential elements of a quality imaging or neurophysiologic study. There are a range of alternative study designs that can be considered, but each has limitations; these are discussed in subsequent sections.

Nonrandomized Studies

Nonrandomized direct comparisons, in which patients selected according to the clinical problem are given both the index test and the same reference standard test(s), may be informative if the necessary information is gathered, including: the preliminary consensus decision without the index test; the influence of index test results on the consensus decision; clinical outcomes in all patients regardless of the management strategy chosen; compliance with tests and surgery; quality of life; and complication and reoperation rates (Moons et al., 1999).

The only study that met the inclusion criteria that considered the effect of test results on the decision making process, the reporting of which was one of the essential criteria identified by the ILAE commission on diagnostic methods (Gaillard et al., 2011a), was a retrospective study of all patients referred to the Dutch Collaborative Epilepsy Surgery Program (DCESP) between 1996 and 2002 (Uijl et al., 2007). Patients had a clinical history taken, routine EEG and MRI, and prolonged video surface EEG. A multidisciplinary team (neurosurgeon, neurologist, neurophysiologist, neuropsychologist, and radiologist) decided whether further tests (PET, SPECT, functional MRI, MEG, or invasive EEG) were required; documentation from these meetings was used as the basis for the study. A comparison of the decision made by the team before and after the fluorodeoxyglucose (FDG)–PET scan was used to determine whether FDG-PET altered the original management decision based upon the clinical history, routine EEG and MRI, and video EEG.

The study was retrospective, a large proportion of patients did not undergo FDG-PET (359 of 469 patients), and additional tests were conducted in some patients that could have influenced the decisions made by the multidisciplinary team. This study, therefore, fails to meet a number of the criteria considered essential by the ILAE Commission on Diagnostic Methods (Gaillard et al., 2011a). In addition, the study was restricted to patients with temporal lobe epilepsy (TLE). If such a study were conducted that encompassed the heterogeneous population seen in clinical practice and all the available noninvasive technologies, it would become labor intensive, difficult to organize, and potentially unfeasible, and maintaining the blinded status of the multidisciplinary team may be problematic. Studies could evaluate all tests separately in patients with TLE and non-TLE, as these are distinct patient populations with different challenges for diagnosis and management; however, this could still be a challenge. Such studies would need to make an assessment of population homogeneity/heterogeneity (Gaillard et al., 2011a).

Studies using such consensus-based diagnosis by a multidisciplinary team must consider a number of factors that affect reliability, namely, the number of experts and the mix of expertise, the way patient information is presented, and how a final classification is obtained (Rutjes et al., 2007). Ideally, the data should be collected prospectively, the test should be applied uniformly to all patients, and there should be sufficient follow-up to establish the persistence of seizure outcome (at least 1 year); in those who underwent surgery, the incidence of seizure freedom, and the degree of seizure reduction in those who did not become seizure-free, should be reported, as recommended by the ILAE commission on diagnostic methods (Gaillard et al., 2011a). In addition, there remains the problem of how to verify the accuracy/appropriateness of the decision not to go to surgery, and the accuracy of discordant tests where the outcome following surgery was poor; clinical outcomes for all patients regardless of the final management strategy, seizure frequency, eligibility for reoperation, and results after any reoperation undertaken would need to be reported.

Randomized Controlled Trials

Although randomized controlled trials might be considered an ideal standard (Fig. 1: levels 4 and 5), in reality the difficulties of conducting these trials are well known and they may be considered impractical (Hunink & Krestin, 2002; National Institute for Clinical Excellence, 2007; Pearson et al., 2008; Trikalinos et al., 2008; Gaillard et al., 2011a; Stefan, 2011). Despite these concerns, a randomized controlled trial has been performed that randomized patients with mesial temporal lobe epilepsy to a diagnostic regimen with or without ictal SPECT, demonstrating the feasibility of such trials (this trial did not meet the inclusion criteria for our review, as approximately 43% of participants had precipitating trauma) (Velasco et al., 2011). Pragmatic randomized controlled trials are increasingly popular for complex therapeutic interventions, particularly where there is variability in standard practice and difficulty blinding patients and carers. Such a design could address some of the concerns raised regarding the use of trials in this indication and provide results generalizable to clinical practice. If a randomized controlled trial were considered appropriate, there have been a number of suggestions as to when and whom to randomize: prior to diagnostic testing (Hunink & Krestin, 2002); patients with discordant test results to subsequent management strategies (Hunink & Krestin, 2002); or patients with positive results on the new test and negative results on the old test to subsequent management strategies (Lord et al., 2006). Randomization posttest to different treatment strategies would be unacceptable (Bossuyt et al., 2000). In addition, a clearly defined control population would be required (Gaillard et al., 2011a).

National and International Databases

A potential starting point for informative studies of imaging technologies could be routine care databases or electronic patient records (Oostenbrink et al., 2003). Routinely documented data can quantify the additional value of a test and cover the entire diagnostic and therapeutic process, allowing stepwise analysis of tests in the sequence in which they occurred and an estimate of the added value of tests over previous findings (Oostenbrink et al., 2003). Data collection could be optimized through the use of a national registry, an organized system that collects uniform data (clinical and other) to evaluate specified outcomes for a defined population (Gliklich & Dreyer, 2010).

The advantages of a patient registry include the collection of “real world” data for the full heterogeneous population rather than the restrictive populations often recruited into trials, making the data more generalizable to clinical practice than those of other study designs; avoidance of publication bias; large sample sizes; collection of data on all the noninvasive technologies available; the ability to observe trends in clinical practice; the potential to identify interventions and populations that would benefit from further investigation; and the provision of data suitable for future decision analytic modeling. The main disadvantages are the difficulty of ensuring good quality data, and the fact that diagnostic technologies and surgical techniques continue to develop, making older entries unrepresentative of current practice. The quality of the database will depend upon its structure, the training of users, the completeness of the data, external validation, and confidentiality (Gliklich & Dreyer, 2010). Detailed guidelines for the planning, design, and use of patient registries have been published by the Agency for Healthcare Research and Quality (AHRQ) (Gliklich & Dreyer, 2010).

There are examples of such databases and registries in epilepsy that demonstrate the potential success of this approach: the UK Epilepsy and Pregnancy Register (2011); the European and International Registry of Antiepileptic Drugs in Pregnancy (EURAP, 2011); Epilepsiae, the European database on epilepsy (Ihle et al., 2010; Epilepsiae, 2011); and EpiBase (Sorensen et al., 1999). In addition, studies using national registry data to evaluate epilepsy surgery have been published from the United States (Labar, 2004), The Netherlands (Zijlmans et al., 2008), Turkey (Baykan et al., 2005), China (Xu & Xu, 2010), Finland (Sillanpaa et al., 2011), and Sweden (de Flon et al., 2010), illustrating the feasibility of developing epilepsy registries and using the resulting data.

Decision Analytical Modeling

Decision modeling (Fig. 1: level 6) will be an important analytical method in future research in this area, allowing estimates based on study data, such as those obtained from the alternatives described above, to be combined with expert opinion in the synthesis. Modeling can simulate the effect of various diagnostic strategies on patient outcomes, including identifying which combinations of tests are optimal (Hunink & Krestin, 2002; Pearson et al., 2008; Trikalinos et al., 2008). In addition, modeling can be used to investigate evidentiary assumptions about test performance and clinician decision making, and to examine all benefits, harms, and costs together to provide cost-effectiveness and cost-utility information to decision makers (Hunink & Krestin, 2002; Pearson et al., 2008). The major limitations of decision modeling are a lack of sufficiently reliable data relating to key components of the model (Trikalinos et al., 2008), and the fact that the reliability of the available data can depend on the clinical context in which the test is conducted (sole diagnostic modality, triage, further workup, confirmatory test) (Bossuyt et al., 2006; Trikalinos et al., 2008). It may not be appropriate to compare data from a test used as the sole diagnostic tool with alternative diagnostic strategies deriving data from a confirmatory test after a positive or inconclusive prior test (Trikalinos et al., 2008). As well as data on the influence of test results on consensus diagnosis and the resultant management decision, clinical outcomes in patients who do not undergo surgery, compliance with tests and surgery, quality of life, and complication and reoperation rates are important.

A de novo decision analytic model was developed during the systematic review, based on the study by Uijl et al. (2007). This demonstrates the feasibility of assessing the cost-effectiveness of these technologies when suitable data are available; the model could be extended to accommodate such data as they become available (Burch et al., 2012). The model could be adapted to evaluate additional alternative treatment strategies consisting of either single tests or combinations/sequences of tests, that is, evaluating more comparators and expanding the existing tree "vertically." How this is done will affect the type of data required. In addition, clinical practice may suggest that further tests should be undertaken when no definite decision to undertake surgery has been reached, that is, strategies may be more complex, involving a larger number of tests used sequentially. Data acquisition to inform such a "horizontal" extension of the decision tree would probably be complex, but the model is flexible enough to accommodate it. It is essential to the evaluation of imaging technologies in the workup for epilepsy surgery that the effect of the use of these technologies on clinical decision making, and on further treatment decisions, is considered; the role of decision modeling is central to this.
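To illustrate the kind of calculation such a decision tree performs, the following is a minimal sketch of a two-strategy comparison. All probabilities, costs, and utilities are hypothetical placeholders chosen for illustration only; they are not values from the review, from Burch et al. (2012), or from Uijl et al. (2007), and a real model would include many more branches (nondiagnostic results, complications, reoperation) and probabilistic sensitivity analysis.

```python
# Toy decision-analytic comparison of two diagnostic strategies for
# presurgical workup. Every number below is a hypothetical placeholder.

def expected_outcomes(p_surgery, p_seizure_free, cost_workup, cost_surgery,
                      utility_seizure_free, utility_not_free):
    """Return (expected cost, expected QALYs) for one diagnostic strategy.

    p_surgery        -- probability the workup leads to resective surgery
    p_seizure_free   -- probability of seizure freedom given surgery
    """
    # Patients not offered surgery are assumed to remain on medical
    # management with utility 'utility_not_free' in this toy model.
    cost = cost_workup + p_surgery * cost_surgery
    qalys = (p_surgery * (p_seizure_free * utility_seizure_free
                          + (1 - p_seizure_free) * utility_not_free)
             + (1 - p_surgery) * utility_not_free)
    return cost, qalys

# Strategy A: standard workup; Strategy B: workup plus an extra imaging test
cost_a, qaly_a = expected_outcomes(0.50, 0.60, 2_000, 20_000, 0.85, 0.60)
cost_b, qaly_b = expected_outcomes(0.65, 0.70, 4_500, 20_000, 0.85, 0.60)

# Incremental cost-effectiveness ratio: extra cost per QALY gained
icer = (cost_b - cost_a) / (qaly_b - qaly_a)
print(f"Strategy A: cost {cost_a:.0f}, QALYs {qaly_a:.3f}")
print(f"Strategy B: cost {cost_b:.0f}, QALYs {qaly_b:.3f}")
print(f"ICER: {icer:.0f} per QALY gained")
```

Extending the tree "vertically" corresponds to calling `expected_outcomes` for additional comparator strategies; a "horizontal" extension would replace the single surgery/no-surgery branch with further subtrees of sequential tests.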

Conclusions

It is clear that the use of data from diagnostic accuracy studies has serious limitations when trying to establish which diagnostic modalities should be used to identify the epileptogenic zone in patients with refractory epilepsy being considered for surgery, where doubt about surgery remains after the initial clinical examination, surface EEG, and MRI. The diagnostic accuracy studies currently being conducted cannot inform either the diagnostic accuracy or the clinical utility of the tests being evaluated, and are therefore an inappropriate study design for addressing this decision problem. Alternative methods of data collection that concentrate on the effect of these tests on the decision making process, therapeutic management decisions, and patient outcomes are required; a potential starting point for informative studies of imaging technologies could be routine care databases or electronic patient records. Despite focusing on a specific clinical area, we believe this applies to all areas of diagnostic research, and supports the work being conducted by the ILAE Commission on Diagnostic Methods (Gaillard et al., 2011a,b).

Acknowledgments

We would like to thank: Dr. Robert Phillips, Centre for Reviews and Dissemination, York, for his advice during the data extraction process and interpretation of the diagnostic accuracy data; Professor Stephen Palmer for his comments on a draft of the manuscript; and Jonathan Minton for his assistance during the project.

Funding

This report was funded by the NIHR HTA Programme (project number HTA 09/106/01) and will be published in full in Health Technology Assessment; see the HTA website for further details of this project (http://www.hta.ac.uk). The views and opinions expressed therein are those of the authors and do not necessarily reflect those of the HTA Programme, NIHR, NHS, or the Department of Health. Any errors are the responsibility of the authors.

Disclosure

None of the authors has any conflict of interest to disclose. We confirm that we have read the Journal’s position on issues involved in ethical publication and affirm that this report is consistent with those guidelines.