Guideline reviews of diagnostic literature—structural imaging, functional imaging, and neurophysiologic studies in epilepsy—seem to raise particular problems leading to “low” evidence ratings. Sample sizes are small, and randomization and blinding are uncommon. Most rating criteria for investigations are designed to assess procedures against fairly narrow “diagnostic” criteria, rather than the more fluid localization and prognostic questions important for intractable epilepsy [AAN; Center for Evidence-based Medicine (CEBM); http://www.cebm.net/index.aspx?o=1157, accessed July 27, 2009]. Given the limitations in the available data, and disagreement about the process, it is challenging to develop guidelines, based on data of satisfactory quality, for a number of important questions that would help guide clinical practice:
Limitations of current imaging guideline criteria
The classification of evidence for diagnostic and outcome studies of technology derives primarily from therapeutic trials (see Table 1), which outline clear study populations, control populations, intervention, measures, and outcomes. Technology does not readily lend itself to classification in this model format. Devices are usually evaluated in terms of their accuracy, reliability, therapeutic potential, and cost effectiveness. In epilepsy studies, devices and techniques usually are directed at diagnosis and prognosis for seizure control. There are several aspects of epilepsy that make application of evidence classification schema problematic.
The course of epilepsy is irregular, with remissions and exacerbations. It may take as long as 10 years after seizure onset for patients to develop persistent “intractable epilepsy” (Spooner et al., 2006; Berg, 2009). Imaging modalities used early in prospective studies may be obsolete by the time the data are analyzed, and thus are irrelevant to current practice.
For surgical planning, identifying—or confirming—the area responsible for seizures and, therefore, for surgical resection is considered to be paramount, based on the data showing that patients with focal findings on imaging or neurophysiology do better than those with normal studies (e.g., McIntosh et al., 2004). These data themselves, however, generally would receive low ratings in the AAN scheme (due to lack of blinding and randomization, among other issues), perhaps doing slightly better in the GRADE classification. To complicate matters, patients may have a restricted zone of epileptogenicity within a structural lesion, a wider zone beyond it, multiple lesions, or a more broadly defined “epileptogenic network,” which is not evident on imaging studies.
Imaging studies are predicated on the assumption that a visualized abnormality is linked to cause, pathology, seizure focus, and outcome. MRI evidence of hippocampal sclerosis is usually taken to have pathophysiologic significance. However, this presumption is based on the observation that such MRI findings have been rare in the large number of normal volunteers scanned for neuropsychologic studies. Some investigators suggest that hippocampal sclerosis is not always associated with intractable epilepsy (Stephen et al., 2001; Kobayashi et al., 2002). Moreover, hippocampal sclerosis in the setting of refractory epilepsy may have different significance than when found in new-onset seizure populations (Spooner et al., 2006) or asymptomatic people. The lesion has been shown to progress over time (Theodore et al., 1999; Mathern et al., 2002) and may be a consequence as well as a cause of seizures.
Not all MRI abnormalities—including hippocampal sclerosis, cavernomas, gliomas, and malformations—cause seizures and not all seizures originate from identified structural cerebral abnormalities. It is necessary to establish with clinical and neurophysiologic data whether a given lesion is likely to be responsible for the seizures. Nevertheless, the consensus that identifying clear [hippocampal sclerosis, malformation of cortical development (MCD), tumor; not gliosis or encephalomalacia] imaging abnormalities is associated with good surgical outcome would make it very difficult to perform a prospective study (see also the large scale retrospective ILAE 2004 pediatric surgery outcome data [Harvey et al., 2008]).
Both diagnostic and prognostic classification schemes are based on some variety of a “final common criterion,” often referred to, with unintended irony, as a “gold” standard. The criterion itself may be elusive or flawed; in some instances there is no standard. The standard for identification of a seizure focus may be based on video–scalp ictal EEG, intracranial ictal EEG, pathology, or postoperative seizure freedom. For diagnostic purposes the standard usually means the seizure focus, initially defined electrophysiologically, with supporting evidence from imaging and sometimes pathology. This approach of course runs the risk of creating circular arguments, although new imaging approaches can be evaluated in comparison to “established” ones.
Linking imaging standards to pathology can be difficult as well: changes may be subtle, or missed, due to limited tissue availability and quality for review or insufficient expertise. Moreover, the relation between underlying pathology and clinical seizures is inexact. Pathologic classification schemes are subject to debate and reconsideration; changing pathologic classification schemes, like changing MR technology, can make comparison of new and old data difficult (Palmini et al., 2004; Blümcke et al., 2011).
Many factors may affect clinical outcome. For surgical studies the ideal measure—seizure freedom—is problematic. Surgical outcome depends on the surgeon, the approach, and functional/anatomic constraints. A success rate of <100% may not mean that imaging was incorrect. Sometimes the abnormality or the focus cannot be entirely removed for technical reasons (e.g., vascular), pathologic reasons (e.g., gliomas), or functional reasons (e.g., overlap with eloquent cortex). A reduction in seizures may suggest that the imaging data were correct, but the resection was incomplete. A further difficulty is the variability in the time at which postoperative outcome is assessed. Postoperative seizure frequency fluctuates, as may patient compliance with postoperative AED treatment. A patient could be seizure-free for several years, experience one or more seizures, and then have another extended remission or a longer relapse. These confounds will affect sensitivity and specificity measures by underestimating or overestimating the value of diagnostic and prognostic testing.
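The distortion described above can be illustrated with a small numerical sketch (all figures are hypothetical, chosen only to show the arithmetic): when the reference standard is postoperative seizure freedom, patients who relapse after late follow-up shift columns in the 2 × 2 table, so the identical imaging results yield different accuracy estimates depending on when outcome is assessed.

```python
def sensitivity(tp, fn):
    """True-positive rate: tp / (tp + fn)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True-negative rate: tn / (tn + fp)."""
    return tn / (tn + fp)

# Hypothetical 2x2 table: "test" = focal imaging finding,
# "reference" = seizure freedom at 1-year follow-up.
# 32 imaging-positive & seizure-free (TP), 8 imaging-negative & seizure-free (FN),
# 14 imaging-negative & not seizure-free (TN), 6 imaging-positive & not seizure-free (FP).
print(sensitivity(32, 8), specificity(14, 6))   # 0.8 0.7

# If 4 of the 32 "true positives" relapse by year 5, they become false
# positives against the later reference standard: the same scans now
# score lower on both measures.
print(sensitivity(28, 8), specificity(14, 10))
```

The point is not the particular numbers but that neither estimate is “the” accuracy of the imaging test; both depend on an unstable outcome standard.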
For language and memory lateralization, the intracarotid amobarbital test (IAT) is often considered a “gold” standard. Yet there are clear flaws: the IAT carries measurable risk, allows only limited time for cognitive assessment of the variables of interest, has poor validation for memory, may yield inaccurate results, and can fail for technical and vascular reasons. Electrocortical stimulation (ECS) is considered the “gold” standard for functional localization, but it too allows only limited time for assessment, and sampling can be performed only at the sites of implanted electrodes. Postoperative cognitive assessment could be considered a standard, but no study will randomize patients to removal of areas where language or memory are thought to reside on the basis of an imaging procedure—one can only examine unintended adverse surgical consequences.
Common shortcomings of current imaging literature
Although there are flaws in current guideline criteria, the current imaging literature commonly lacks the study designs necessary to provide meaningful contributions to clinical practice. A limitation that plagues epilepsy surgery investigations is the size of study populations, especially for new or limited-availability technology, and in pediatrics. Initial reports on imaging and physiology studies are usually small (15–30 patients), with follow-up studies rarely exceeding 100 patients, and smaller still when ionizing radiation is involved. With these limited numbers it is often impossible to generalize findings, because of the heterogeneity of patient populations and limited statistical power. Imaging technology also changes rapidly: annual upgrades and major equipment changes every 5 years are commonplace. Even at the most active epilepsy treatment sites it takes several years to accrue homogeneous patient populations, with a minimum of 12 months of postoperative follow-up, that have sufficient power to support meaningful conclusions. Meanwhile, new positron emission tomography (PET) ligands or MR sequences may have been introduced.
Only a minority of epilepsy imaging studies have control populations. Exceptions include some adult PET studies, functional MRI (fMRI) language studies, diffusion tensor imaging (DTI), and structural MRI–voxel-based morphometry (VBM)–based approaches to data analysis [primarily structural, DTI, magnetization transfer, fluid-attenuated inversion recovery (FLAIR)] (Cook et al., 1992; Rugg-Gunn et al., 2001; Gaillard et al., 2002; Rugg-Gunn et al., 2003; Salmenpera et al., 2007; Focke et al., 2008a,b, 2009). Ionizing radiation used for PET and single-photon emission computed tomography (SPECT) precludes obtaining normal data in children (Chugani et al., 1987; Gaillard et al., 2002). Even when controls are available, the set may not be large enough to ensure that data accurately reflect population age-related norms; the control population must be appropriately powered for experimental comparisons [e.g., MRI-VBM methods require 30 or more subjects (Focke et al., 2009)]. Defining control populations for imaging studies in epilepsy populations with respect to outcome is also problematic. In therapeutic trials one can more readily randomize patient populations, and then move to open-label or cross-over designs (see below). The usual approach is to choose a more or less homogeneous sample of subjects with an epilepsy syndrome of interest (usually temporal lobe epilepsy, TLE) and perform an imaging study in order to compare clinical characteristics and surgical outcome between patients with positive and negative imaging findings.
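The power problem can be made concrete with a rough normal-approximation sketch (the effect size and group sizes here are hypothetical, chosen for illustration): for a medium standardized difference (d = 0.5), even 30 subjects per group, the minimum cited above for VBM, yields power below 50% for a two-sided two-sample comparison at alpha = 0.05.

```python
from statistics import NormalDist

def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test
    (normal approximation; effect_size is Cohen's d)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Noncentrality parameter for equal group sizes
    ncp = effect_size * (n_per_group / 2) ** 0.5
    return z.cdf(ncp - z_crit)

# Medium effect (d = 0.5): 15 vs. 30 controls per group
print(round(two_sample_power(0.5, 15), 2))   # 0.28
print(round(two_sample_power(0.5, 30), 2))   # 0.49
```

Under these assumptions a typical initial cohort of 15 per group has barely one chance in four of detecting a real group difference, which is one reason small single-site imaging series so often fail to replicate.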
Therapeutic trials are facilitated by an infrastructure for multisite trials and strict government criteria for approval [e.g., the European Medicines Agency (EMEA) and Food and Drug Administration (FDA)]. There is no mechanism for conducting comparable multisite imaging studies that would be the equivalent of “pivotal” medication trials. Other impediments to multisite technology studies include expense, limited availability, and limited expertise. Perceived technical differences in machines and sequences are viewed as impediments to such studies, although these differences are smaller than the variability introduced by patient heterogeneity.
Diagnostic data are, with rare exception, used in the decision-making process [for the exception, see Theodore (1992), where fluorodeoxyglucose–PET (18FDG-PET) data were obtained but not provided for surgical planning and intervention]. Sometimes imaging data identify an abnormality that leads to intracranial EEG and subsequent resection in a patient previously considered not to be a surgical candidate (Salmenpera et al., 2007; Focke et al., 2009). It is difficult, in these circumstances, to test the data independently without compromising good clinical practice in the use of accepted techniques. For example, it would be difficult to do such studies with MRI, SPECT, language fMRI, or MEG; one should be able to do so with new MRI sequences (diffusion/perfusion).
Recent alternative study designs advocate presenting novel imaging or neurophysiologic data after a case-conference decision has been made using standard clinical and imaging material, in order to assess how reconsideration with the new information alters decision making (Medina et al., 2005; Knowlton et al., 2008a,b). Here one does not know what would have happened to the patients who do not undergo the procedure, or how that affects ultimate outcome; it is not clear how this can be avoided without compromising good clinical practice. The practice also introduces a selection bias: TLE patients with normal MRI may be less likely to have surgery, and the effect is greater for extratemporal lobe epilepsy. Studies often do not evaluate how novel imaging changes practice.
There is a general failure to collect data prospectively. Ideally all the imaging analysis should be done before surgery, unless results of analysis may bias study conduct (e.g., preoperative fMRI to predict postoperative memory outcome). Imaging and physiologic data, inherently objective, lend themselves to independent review; but retrospective analysis may introduce several sources of bias. Many studies do not interpret data or assess outcomes blindly. Data need to be analyzed by a person blinded to patient identity and without a vested interest in the outcome. Most centers do not have special expertise in all imaging modalities, thereby complicating multimodal comparisons.
Another major limitation is the continuing and rapid evolution of technology. Although there are no class I studies of 1.5 T MRI, imaging has moved to 3 T, and 7 T studies are commencing. New MRI sequences and changes in scanner hardware and software are introduced every few years, but their application and proper place in epilepsy evaluation are not well established. In short, the technology does not stand still long enough to enable adequately powered studies with adequate follow-up to be carried out.
There is also an issue of sensitivity and specificity. Subtle focal cortical malformations are considered to be the likely cause of many cases of nonlesional focal epilepsy. With higher-resolution scanners and sequences it will be difficult to be certain that increasingly subtle findings are clinically relevant unless adequate numbers of healthy controls are studied. Last, there is the issue of how to pay for assessing the efficacy of new technology; this is most problematic for new PET ligands, and less of an issue for new MRI sequences, which can be added to a clinical series. Studies of new techniques do not test whether a given technology is equivalent to established ones and, more importantly, do not test when a test may be redundant (e.g., FDG-PET when MRI and video-EEG are concordant, or IAT when fMRI language laterality is clear).
For an ethical clinical trial there must be equipoise between the two arms of the study in terms of patient benefit. This may not be possible with many imaging studies. It would not now be considered ethical to withhold fMRI language lateralization results from a surgical team to determine whether the study could predict postoperative dysphasia. This would, however, be feasible at this time for fMRI studies of memory, not yet generally accepted. Here, there is reasonable equipoise as to whether and how the data should influence surgical decision making.