Previous validation studies assessing the use of administrative data to identify patients with epilepsy have used targeted sampling or have used a reference standard of patients in the neurologist, hospital, or emergency room setting. Therefore, the validity of using administrative data to identify patients with epilepsy in the general population has not been previously assessed. The purpose of this study was to determine the validity of using administrative data to identify patients with epilepsy in the general population.
A retrospective chart abstraction study was performed using primary care physician records from 83 physicians distributed throughout Ontario and contributing data to the Electronic Medical Record Administrative data Linked Database (EMRALD). A random sample of 7,500 adult patients, from a possible 73,013 eligible, was manually chart abstracted to identify patients who had ever had epilepsy. These patients were used as a reference standard to test a variety of administrative data algorithms.
An algorithm of three physician billing codes (separated by at least 30 days) in 2 years or one hospitalization had a sensitivity of 73.7% (95% confidence interval [CI] 64.8–82.5%), specificity of 99.8% (95% CI 99.6–99.9%), positive predictive value (PPV) of 79.5% (95% CI 71.1–88.0%), and negative predictive value (NPV) of 99.7% (95% CI 99.5–99.8%) for identifying patients who had ever had epilepsy.
The results of our study showed that administrative data can identify patients who have ever had epilepsy with reasonable accuracy, allowing for a “lifetime” population prevalence determination of epilepsy in Ontario and in the rest of Canada, where similar administrative databases exist. This will facilitate future studies on population-level patterns and outcomes of care for patients living with epilepsy.
Epilepsy is a chronic condition characterized by recurring seizures, and it is estimated to affect 50–65 million people worldwide.[1, 2] It affects people of all ages, races, and socioeconomic status. Accounting for 0.5% of the global burden of disease, epilepsy not only affects mortality, but also quality of life, as one of the more common chronic neurologic conditions. Currently in Canada, the only national prevalence estimates of epilepsy come from self-report data, with provincial estimates ranging from 5.2 to 5.6 per 1,000. However, these self-reported rates, obtained from only a sample of the population, are likely to reflect active epilepsy and may be subject to the typical inaccuracies of lay reporting: individuals in remission or well controlled may under-report the presence of the disease, whereas individuals experiencing single seizures or pseudo-seizures may over-report it. In contrast, administrative data cover the entire population and are available in every Canadian province, and they thus can serve as a resource for national population-level estimates. This allows not only for the identification of patients for surveillance, but also for description of the occurrence, trends, and distribution of patients in the Canadian population over time, as has been done for chronic conditions such as diabetes and hypertension.[4, 5]
Five administrative data International Classification of Diseases (ICD) coding validation studies for epilepsy have been performed, with three conducted in other countries,[6-8] and two in the Canadian setting.[9, 10] However, due to the low prevalence of epilepsy in the population, these studies were performed using either a targeted sample of seizure or epilepsy-like patients[6-9] or randomly sampled but confined to neurologist clinics. Therefore, the ability of administrative data to identify patients with epilepsy in the general population is currently unknown.
Although Canada has the rich resource of the availability of administrative data across the country in all provinces and territories, there has yet to be national reporting on the incidence and prevalence of epilepsy in Canada using administrative data. We therefore looked to bridge this gap by assessing the validity of administrative data algorithms to identify patients with a “lifetime” prevalence of epilepsy using a random sample of patients from a comprehensive primary care setting.
Setting and context
Data from the Electronic Medical Record Administrative data Linked Database (EMRALD) held at the Institute for Clinical Evaluative Sciences (ICES) were used as the reference standard to assess the performance of administrative data capture for the presence of epilepsy. EMRALD consists of data from family physicians in Ontario using the electronic medical record (EMR) software PS Suite® (EMR Ottawa, Ontario, Canada). All the clinically relevant information contained in the family physician patient chart is collected into EMRALD. This includes all clinical encounters that occur within the physician office, the cumulative patient profile (CPP) containing information on the current and past medical history, family history, risk factors, allergies and immunizations, all laboratory test results, prescriptions ordered or entered into the chart by the family physician, specialist consultation letters, discharge summaries, and diagnostic tests. Participating physicians contribute to EMRALD on a voluntary basis and data are collected on a semi-annual basis. This study received ethics approval from Sunnybrook Research Ethics Board.
Developing the reference standard cohort
At the time of the study there were 83 physicians who contributed data to EMRALD who had been using their EMR for at least 2 years and whose data met quality and completeness standards. Patients in EMRALD are similar to those in the Ontario population in terms of age and sex, with a slight over-representation of young adult women and slight under-representation of young adult men, which is typical of the types of patients that go to see physicians. Based on the availability of the administrative data to which the EMR data were compared, only physician visits and prescriptions dated prior to March 31, 2011, were included in the overall scoring for the presence or absence of epilepsy. To limit our reference standard to patients with a long enough EMR record to be reasonably populated, only patients who had at least one visit before April 1, 2010, were included. A random sample was taken of just over 10% (7,500 patients) of the 73,013 adult patients (age 20 years or older as of December 31, 2010) who had a valid health card number and date of birth, were rostered to one of our participating EMRALD physicians, and had at least one visit before April 1, 2010. The data were extracted between June and November 2011. The true prevalence of epilepsy in Canada is currently unknown; however, a study of self-reported prevalence conducted in the United States found a lifetime prevalence of epilepsy or seizure disorder of 1.3% among 43,020 respondents aged 18 years and older. The question “Have you ever been told by a doctor that you have a seizure disorder or epilepsy?” was asked of the respondents in a random-digit dialing cross-sectional survey by telephone to assess this lifetime prevalence. Assuming this estimate is correct, our cohort had a power of >90% to detect a difference of 0.45% from the expected prevalence, with an alpha of 0.05.
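The power statement above can be illustrated with a short calculation. The sketch below assumes a two-sided one-sample test of a proportion using the normal approximation, and it reads "a difference of 0.45%" as 0.45 percentage points; the paper does not state which method was used, so the function name, the approximation, and that reading are all assumptions, and the result need not match the reported figure exactly.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def power_one_sample_proportion(p0: float, p1: float, n: int,
                                z_crit: float = 1.959964) -> float:
    """Two-sided power to distinguish a true proportion p1 from a
    hypothesized proportion p0 (normal approximation; z_crit = 1.96
    corresponds to alpha = 0.05)."""
    se0 = math.sqrt(p0 * (1.0 - p0) / n)  # standard error under the null
    se1 = math.sqrt(p1 * (1.0 - p1) / n)  # standard error under the alternative
    upper = (p0 + z_crit * se0 - p1) / se1
    lower = (p0 - z_crit * se0 - p1) / se1
    return (1.0 - normal_cdf(upper)) + normal_cdf(lower)


# Values taken from the text: expected lifetime prevalence 1.3%, n = 7,500,
# detectable difference 0.45 percentage points (assumed reading).
EXPECTED_PREVALENCE = 0.013
SAMPLE_SIZE = 7500
DIFFERENCE = 0.0045

power_higher = power_one_sample_proportion(
    EXPECTED_PREVALENCE, EXPECTED_PREVALENCE + DIFFERENCE, SAMPLE_SIZE)
power_lower = power_one_sample_proportion(
    EXPECTED_PREVALENCE, EXPECTED_PREVALENCE - DIFFERENCE, SAMPLE_SIZE)

print(f"power if true prevalence is higher: {power_higher:.3f}")
print(f"power if true prevalence is lower:  {power_lower:.3f}")
```

Because the standard error of a proportion depends on the proportion itself, the power differs depending on whether the true prevalence lies above or below the expected value.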
Using the criteria provided in an abstraction manual (see Fig. S1) created with input from family physicians, neurologists, and an epileptologist, six chart abstractors were instructed to read the entire EMR patient record and classify the CPP and each free text entry as the patient: (1) “definitely” having epilepsy; (2) “possibly” having epilepsy; (3) epilepsy was “ruled out”; or (4) there was no mention of epilepsy. Prescriptions were scored separately through automated text-matching of the prescription field maintaining the temporal pattern of the prescriptions. Because some antiepileptic medications can be used for multiple indications, no medication was deemed to indicate a classification of “definite” epilepsy diagnosis, but prescriptions for clobazam, ethosuximide, felbamate, lacosamide, levetiracetam, methsuximide, phenobarbital, phenytoin, rufinamide, and vigabatrin classified a patient as “possibly” having epilepsy. However, if a patient had mention of seizures in the free text and evidence of being treated with one of the above antiepileptic medications, that entry was scored as “definite.”
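The prescription scoring rule can be sketched as a simple text match. This is a minimal illustration that assumes prescriptions appear as free text containing generic drug names; the function names are hypothetical, brand names and misspellings are not handled, and the actual text-matching procedure used in the study is not described in enough detail to reproduce.

```python
import re

# Generic drug names listed in the text as indicating "possible" epilepsy.
POSSIBLE_EPILEPSY_DRUGS = (
    "clobazam", "ethosuximide", "felbamate", "lacosamide", "levetiracetam",
    "methsuximide", "phenobarbital", "phenytoin", "rufinamide", "vigabatrin",
)

# Case-insensitive whole-word match on any listed drug name.
_DRUG_PATTERN = re.compile(
    r"\b(" + "|".join(POSSIBLE_EPILEPSY_DRUGS) + r")\b", re.IGNORECASE)


def score_prescription(prescription_text: str) -> str:
    """Score a prescription alone: a listed drug is at most 'possible'."""
    return "possible" if _DRUG_PATTERN.search(prescription_text) else "none"


def score_entry(free_text_mentions_seizures: bool, prescription_text: str) -> str:
    """Combine the prescription score with a (manually abstracted) flag for
    seizures in the free text: both together upgrade the entry to 'definite'."""
    drug_score = score_prescription(prescription_text)
    if drug_score == "possible" and free_text_mentions_seizures:
        return "definite"
    return drug_score
```

For example, `score_prescription("Levetiracetam 500 mg BID")` yields `"possible"`, and the same prescription alongside a free-text mention of seizures yields `"definite"`.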
Scoring of the chart abstraction to identify patients with epilepsy took into account the temporal sequence of information in the patient chart and the prescriptions based on the following principles: any “definite” epilepsy classification without any subsequent “epilepsy was ruled out” classification was classified as “definite.” If a patient had an entry that was categorized as having “epilepsy ruled out,” then only subsequent entries classified as “definite” or “possible” would change the categorization of the patient to a “definite” or “possible.” Patients for whom there were entries categorized as possibly having epilepsy remained “possible” provided there were no entries classified as “definite” or having “epilepsy ruled out.” A 10% sample of charts was abstracted twice by the same abstractor and a second time by a different abstractor to assess for inter-rater and intra-rater reliability. Kappa scores for inter-rater and intra-rater reliability exceeded 0.80, indicating good agreement for all six chart abstractors.
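The temporal scoring principles above can be expressed compactly. The sketch below assumes each patient's chart entries and prescriptions have already been scored and sorted chronologically into one list of labels; the label strings and the function name are hypothetical, and the sketch is one reading of the stated principles rather than the study's actual scoring program.

```python
from typing import Sequence

VALID_LABELS = {"definite", "possible", "ruled_out", "none"}


def classify_patient(entries: Sequence[str]) -> str:
    """Return the patient-level classification from chronologically ordered
    entry-level labels.

    A 'ruled_out' entry overrides everything before it; only entries after
    the most recent 'ruled_out' can make the patient 'definite' or 'possible'.
    """
    for label in entries:
        if label not in VALID_LABELS:
            raise ValueError(f"unknown label: {label!r}")

    # Index of the most recent 'ruled_out' entry, or -1 if there is none.
    last_ruled_out = -1
    for index, label in enumerate(entries):
        if label == "ruled_out":
            last_ruled_out = index

    later_entries = entries[last_ruled_out + 1:]
    if "definite" in later_entries:
        return "definite"
    if "possible" in later_entries:
        return "possible"
    return "ruled_out" if last_ruled_out >= 0 else "none"
```

Under this reading, a chart scored `["definite", "ruled_out"]` ends as `"ruled_out"`, whereas `["ruled_out", "possible", "definite"]` ends as `"definite"`.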
After the chart abstraction was complete, each patient who had any kind of categorization for epilepsy was reviewed by a family physician (KT) in consultation with an epileptologist (NJ), and any errors in classification were corrected. Patients with a final classification of “definitely” having epilepsy were used as the reference standard to evaluate the accuracy of various administrative data algorithms. Epilepsy diagnoses were further classified based on the type of seizure (generalized, focal, or mixed), age of onset (neonate/infant, child, adolescent, or adult) and etiology (genetic, structural-metabolic, or unknown) as per the most recent recommendations by the International League Against Epilepsy (ILAE). Documentation of an epilepsy diagnosis was also classified as reported by the family physician, neurologist, other specialist, or electroencephalography (EEG) documentation. Due to the interchangeable nature of diagnosis terminology, the use of epilepsy versus seizure disorder terminology was also noted.
Administrative data sources available
The data used for the administrative data algorithms were the Canadian Institute for Health Information hospital discharge abstracts (CIHI DAD), the National Ambulatory Care Reporting System (NACRS), and the Same Day Surgery Database (SDS). These databases contain detailed diagnostic and procedural information for all hospital admissions and emergency department visits in the province and have been extensively validated. We identified epilepsy using the International Classification of Diseases, Tenth Revision (ICD-10) codes G40.x after the year 2002 and 345.x (excluding 345.2 petit mal status and 345.3 grand mal status) ICD-9 codes prior to 2002 (see Fig. S2). The Ontario Health Insurance Plan (OHIP) physician billing database was also used. This database includes a visit code accompanied by a diagnostic code. OHIP ICD8 (Ontario modified version) diagnostic code 345 for epilepsy was included, and for algorithms that required the OHIP billing code(s) to be submitted by a neurologist, the Institute for Clinical Evaluative Sciences Physician Database (IPDB) was used to identify the specialty of the billing physician.
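The code selection described above can be sketched as a small predicate. The coding system is passed in explicitly rather than inferred from the record date, and the handling of codes stored with or without a decimal point is an assumption about the data format; the function name and the system labels are hypothetical.

```python
def is_epilepsy_code(code: str, system: str) -> bool:
    """Return True if a diagnostic code counts as epilepsy under the
    code sets named in the text.

    system: 'ICD-10' (G40.x), 'ICD-9' (345.x except 345.2 and 345.3),
            or 'OHIP' (three-digit diagnostic code 345).
    """
    # Assumption: codes may be stored as 'G40.9' or 'G409'; drop the dot.
    normalized = code.strip().upper().replace(".", "")

    if system == "ICD-10":
        return normalized.startswith("G40")
    if system == "ICD-9":
        if not normalized.startswith("345"):
            return False
        # Exclude 345.2 (petit mal status) and 345.3 (grand mal status).
        return not normalized.startswith(("3452", "3453"))
    if system == "OHIP":
        return normalized == "345"
    raise ValueError(f"unknown coding system: {system!r}")
```

For instance, `is_epilepsy_code("G40.9", "ICD-10")` is true, while `is_epilepsy_code("345.3", "ICD-9")` is false because status epilepticus codes are excluded.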
Testing the accuracy of administrative data algorithms
Patients with a final classification of “definite” for epilepsy from the chart abstraction were used as the reference standard cohort to evaluate the accuracy of the various administrative data algorithms. The administrative data algorithms that were tested varied according to the administrative data sources used, the number of codes necessary to make the case definition, the length of time of the case definition, and the physician billing specialty. To avoid including patients with a single seizure event, which often results in multiple physician billing claims surrounding the event, we also tested a “30-day rule,” whereby only physician claims separated by at least 30 days were counted as a physician claim for the algorithms requiring more than one physician claim. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), Youden Index, and kappa were calculated for each algorithm, in addition to their 95% confidence intervals (95% CIs). Once an optimal algorithm was identified, the false-positive and false-negative cases were reviewed to identify reasons for misclassification. All measures were calculated using the binomial approximation method with SAS version 9.2 (SAS Institute, Cary, NC, U.S.A.).
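A case definition of the form "one hospitalization or N physician claims, separated by at least 30 days, within a fixed window" can be sketched as follows. This is one plausible reading of the rule: a claim is counted only if it falls at least 30 days after the previously counted claim, and the window is measured from the first counted claim. The function name, the parameter names, and the use of 730 days for "2 years" are assumptions, not the study's SAS implementation.

```python
from datetime import date
from typing import Sequence


def meets_case_definition(hospitalizations: Sequence[date],
                          physician_claims: Sequence[date],
                          n_claims: int = 3,
                          window_days: int = 730,
                          min_gap_days: int = 30) -> bool:
    """True if the patient has any epilepsy hospitalization, or at least
    n_claims physician claims spaced >= min_gap_days apart within
    window_days of the first counted claim."""
    if len(hospitalizations) > 0:
        return True

    claims = sorted(set(physician_claims))
    for start_index, start in enumerate(claims):
        counted = 1          # the starting claim itself
        last_counted = start
        for claim in claims[start_index + 1:]:
            if (claim - start).days > window_days:
                break        # outside the window that opens at `start`
            if (claim - last_counted).days >= min_gap_days:
                counted += 1
                last_counted = claim
        if counted >= n_claims:
            return True
    return False
```

Taking the earliest eligible claim at each step keeps the sequence as short as possible, so if any qualifying set of claims exists within a window, this greedy scan finds one. Setting `min_gap_days=0` gives the variants without the 30-day rule.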
Applying the algorithms to the whole of the Ontario population
Last, we applied the optimal algorithms to the entire population of Ontario and looked at prevalence and incidence changes over time from 2000 to 2010. We used a 5-year run-in period whereby patients could start to be recognized as having epilepsy from 1995, but the annual incidence and prevalence reporting started in 2000 to allow sufficient time for cases to accumulate and to avoid erroneously including prevalent cases as incident cases in the early years of reporting. Patients were considered incident in the year in which they fulfilled all criteria of the algorithm rule. Annual prevalence was calculated by taking the cases that were prevalent in the prior year, adding the new “incident” cases in the same year, and removing patients who died or moved out of province. Annual rates were age-standardized to the 1991 Statistics Canada age distribution and calculated among those persons alive and registered with health care coverage within the province of Ontario in the given year.
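Age standardization of the annual rates can be illustrated with the direct method, in which age-specific rates are weighted by a standard population. The sketch below assumes direct standardization (the text does not name the method), and the age groups and counts are invented for illustration; they are not the 1991 Statistics Canada figures or study data.

```python
from typing import Dict


def direct_age_standardized_rate(cases: Dict[str, int],
                                 population: Dict[str, int],
                                 standard_population: Dict[str, int]) -> float:
    """Weight each age-specific crude rate by that age group's share of the
    standard population and sum the results."""
    standard_total = sum(standard_population.values())
    rate = 0.0
    for age_group, standard_count in standard_population.items():
        crude_rate = cases[age_group] / population[age_group]
        rate += crude_rate * (standard_count / standard_total)
    return rate


# Illustrative (made-up) inputs for three adult age groups.
example_cases = {"20-44": 50, "45-64": 60, "65+": 40}
example_population = {"20-44": 10_000, "45-64": 10_000, "65+": 10_000}
example_standard = {"20-44": 500, "45-64": 300, "65+": 200}

example_rate = direct_age_standardized_rate(
    example_cases, example_population, example_standard)
print(f"age-standardized rate per 1,000: {1000 * example_rate:.2f}")
```

Standardizing each year to the same reference distribution keeps changes in the population's age structure from being mistaken for changes in epilepsy prevalence or incidence.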
Among the 83 physicians included in this study, the average years in practice was 15.6 years, with an average length of time on the EMR of 5.6 years; 51.8% were female and 22.9% were practicing in a rural area. This is compared to all Ontario family physicians with an average years in practice of 16.9 years, 40.3% female, and 7.7% practicing in a rural area. The average age of patients in the total EMRALD sample of 73,013 adults was 49.6 years, and 57.0% were female. This can be compared to all Ontario adult residents with an average age of 49.2 years and 53.5% female.
Overall, the lifetime prevalence of epilepsy in our cohort population was 1.27%. Of the 95 patients with epilepsy, the average age was 48.3 years, 44.0% were female, and most (74; 78.0%) had a prescription for an antiepileptic medication recorded in the EMR. The majority of the patients did not have seizure type(s) or epilepsy etiology recorded, but more than half had age of onset noted. Of the small proportion of patients that did have epilepsy type recorded, the majority were generalized, with focal type being the least common. Age of onset for epilepsy showed a bimodal distribution, with peak onset in the child and adult years. Just under half of the patients had neurologist documentation of epilepsy. Epilepsy and seizure disorder were used interchangeably and at similar frequency, and in <10% neither term was used but there was clear documentation of recurrent seizures and being on antiepileptic medication (see Table 1).
Table 1. Characteristics of “definite” epilepsy cohort of 95 patients
Age of onset
Family physician only
Seizure disorder only
Compared with the reference standard, hospitalization claims alone were insufficient to identify epilepsy patients. Accepting either a single physician billing or a hospitalization increased the sensitivity but dramatically dropped the PPV. Increasing the number of physician claims required resulted in a decreasing sensitivity but improving PPV. There was little improvement from extending the time window for the multiple claims from 2 to 3 years, and requiring the claims to be by a neurologist dropped the sensitivity by more than it increased the PPV. Requiring the multiple claims to be separated by >30 days resulted in a gain in PPV that was larger than the accompanying decrease in sensitivity (see Table 2). Of the various algorithms that we tested, the optimal algorithm was one hospitalization or three physician billings (separated by at least 30 days) within a 2-year period, which provided sufficient sensitivity and a higher PPV, and had an estimated prevalence closest to our cohort prevalence.
Table 2. Validation of administrative data algorithms to identify adult patients who were definite for epilepsy using documentation in the primary care EMR as a reference standard
TP, true positive; TN, true negative; FN, false negative; FP, false positive; PPV, positive predictive value; NPV, negative predictive value; H, Canadian Institute for Health Information Discharge Abstract Database; P, Ontario Health Insurance Plan Physician Claims Database.
Reference standard: EMR chart – adult (≥20 years old) patients who had “definite” epilepsy (n = 95); total adult patients = 7,500; EMR epilepsy cohort prevalence = 1.3%. Bold text indicates the most optimal algorithm.
H or P
P by neurologist
H or 2 P in 1 year
H or 2 P (>30 day separation) in 1 year
H or 2 P in 2 years
H or 2 P (>30 day separation) in 2 years
At least 1 P by neurologist
Both P by neurologist
H or 2 P in 3 years
H or 2 P (>30 day separation) in 3 years
H or 3 P in 1 year
H or 3 P (>30 day separation) in 1 year
H or 3 P in 2 years
H or 3 P (>30 day separation) in 2 years
At least 1 P by neurologist
At least 2 P by neurologist
All 3 P by neurologist
H or 3 P in 3 years
H or 3 P (>30 day separation) in 3 years
H or 4 P (>30 day separation) in 2 years
H or 4 P (>30 day separation) in 3 years
We assessed the false positives and false negatives of our optimal algorithm. Of the 25 false negatives (in the EMR but not in the administrative data), 12 (48%) had documentation of epilepsy as a child that may have predated the availability of administrative data, and 13 (52%) did not have sufficient administrative data codes to meet our case definition. Of the 18 false positives (in the administrative data but not in the EMR), 6 (33%) were classified as “possible” epilepsy by our abstraction criteria, 4 (22%) had documentation of epilepsy being ruled out (seizures due to other causes), and 8 (44%) had no mention of epilepsy in the chart. However, for these 8 patients the administrative data claims occurred prior to the start of the EMR record; thus it is unclear whether those cases represented true epilepsy cases or were due to incomplete EMR records.
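The accuracy measures for the optimal algorithm can be recomputed from the counts given in the text. In the sketch below, the true-positive and true-negative counts are derived rather than quoted (TP = 95 − 25 = 70; TN = 7,500 − 95 − 18 = 7,387), and the confidence intervals use the normal (Wald) approximation as an assumed reading of "binomial approximation method"; the function names are hypothetical.

```python
import math
from typing import Dict, Tuple


def wald_ci(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """95% confidence interval for a proportion (normal approximation)."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def diagnostic_accuracy(tp: int, fp: int, fn: int,
                        tn: int) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """Sensitivity, specificity, PPV, and NPV, each with its Wald CI."""
    measures = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (num / den, wald_ci(num / den, den))
            for name, (num, den) in measures.items()}


# Counts derived from the text: 95 reference-standard cases, 25 false
# negatives, 18 false positives, 7,500 patients in total.
TP, FN, FP = 95 - 25, 25, 18
TN = 7500 - 95 - FP

metrics = diagnostic_accuracy(TP, FP, FN, TN)
youden_index = metrics["sensitivity"][0] + metrics["specificity"][0] - 1.0

for name, (estimate, (low, high)) in metrics.items():
    print(f"{name}: {100 * estimate:.1f}% (95% CI {100 * low:.1f}-{100 * high:.1f}%)")
print(f"Youden index: {youden_index:.3f}")
```

The point estimates and the sensitivity and PPV intervals produced this way agree with the values reported for the optimal algorithm.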
We found the prevalence and incidence of epilepsy in 2010 for our ideal algorithm of three physician billing codes (separated by at least 30 days) in 2 years or one hospitalization to be 0.70 per 100 and 0.51 per 1,000, respectively. The prevalence and incidence estimates were more impacted by the number of claims and having a 30-day interval between claims rather than the time frame to obtain the claims (see Figs. 1 and 2). Regardless of the algorithm used we found a gradual increase in the cumulative prevalence of epilepsy over time. Although we used a 5-year “run-in” period, epilepsy may require a longer run-in, as incidence still declined in the first 5 years before stabilizing (see Fig. 2).
We found that the ideal algorithm to identify patients with epilepsy was one hospitalization or three physician billings (separated by at least 30 days) within a 2-year period. Although two physician billings had a higher sensitivity, we opted for requiring the third billing with a higher PPV to give a more conservative estimate of prevalence and incidence and to decrease the number of false positives. Separating the additional codes by a minimum of 30 days also led to a higher PPV, as the separation most likely does not include single seizure cases with multiple billing codes for epilepsy surrounding a single event.
Although the results of our study showed lower sensitivity (73.7%) and PPV (79.5%) of the best performing administrative data algorithm compared to other studies,[6-10] our study was performed on a random sample of patients in a primary care setting and therefore is likely to better reflect the performance of administrative data algorithms in the general population. A recent Canadian study found a sensitivity of 89%, specificity of 92%, and PPV and NPV of 89% and 92%, respectively, for the most accurate algorithm of one hospitalization or two physician claims in 2 years; however, this study used patients from a neurologists' practice as the reference standard, which would have a much higher prevalence of epilepsy (44%) than the general population and therefore would not as accurately reflect the performance of administrative data algorithms in the general population. Other studies have used targeted sampling of a specific subgroup of patients who are likely to have epilepsy using epilepsy-specific codes[6-9] as a reference standard, and thus these other studies would also have artificially inflated PPVs due to the higher prevalence of epilepsy in a targeted sample. Another study looked at identifying epilepsy patients within a managed care organization in the United States using a computer algorithm with multiple inputs including age, ethnicity, physician billing, diagnostic testing, antiepileptic medication prescriptions and blood level monitoring, and presence of comorbid conditions. Although their best algorithm had a sensitivity of 83.1% and PPV of 85.3%, which was slightly better than what we found, not all of these inputs are available in Canadian administrative data, and thus an algorithm requiring all of them could not be applied across the province or the country.
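The dependence of PPV on prevalence follows directly from Bayes' theorem and can be shown numerically. The sketch below holds sensitivity and specificity fixed at the values reported for the neurologist-based study (89% and 92%) and varies only the prevalence; applying those values at a general-population prevalence of 1.3% is a hypothetical exercise, since sensitivity and specificity may themselves differ between settings.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """Positive predictive value implied by sensitivity, specificity, and
    disease prevalence (Bayes' theorem)."""
    true_positive_fraction = sensitivity * prevalence
    false_positive_fraction = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_fraction / (true_positive_fraction + false_positive_fraction)


# Prevalence in a neurology clinic sample (44%, from the text) versus the
# approximate lifetime prevalence in the general population (1.3%).
ppv_clinic = ppv_from_prevalence(0.89, 0.92, 0.44)
ppv_general = ppv_from_prevalence(0.89, 0.92, 0.013)

print(f"PPV at 44% prevalence:  {100 * ppv_clinic:.1f}%")
print(f"PPV at 1.3% prevalence: {100 * ppv_general:.1f}%")
```

With the same sensitivity and specificity, the PPV falls sharply as prevalence drops, which is why a PPV estimated in a high-prevalence clinic sample cannot be carried over to the general population.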
To date, our study is likely the most generalizable due to its random sampling approach, the primary care population setting of the reference standard, and use of administrative data that is available across the country.
The prevalence estimates of epilepsy for adults using the optimal administrative data algorithm applied across the province were in keeping with previously performed meta-analyses and systematic reviews combining international studies on the prevalence of epilepsy from a variety of information sources to determine the active prevalence of epilepsy.[17-20] Similarly, the annual incidence was in keeping with a previous meta-analysis looking at the average annual incidence for people in North America/Europe.[18, 21] In addition, our lifetime prevalence in the EMRALD cohort was similar to previously reported lifetime prevalence in the United States.[12, 22, 23] These concordant findings all add to the face validity of using administrative data to determine population prevalence and incidence of epilepsy.
We found that the term seizure disorder and epilepsy were used with similar frequency, and thus including both terms in surveys or other studies seeking to identify patients with this condition is recommended. Unfortunately, despite the detailed clinical information available in the EMR, a large proportion of epilepsies still could not be classified by etiology (76% unknown) or epilepsy type (61% unknown) according to the new ILAE classification, suggesting that the classification may not be as useful in primary care or for population-based studies. Of interest, however, 44% of our patients had a neurologist-confirmed diagnosis of epilepsy, in keeping with a recent U.S. study. In that study, it is estimated that 33% of patients with epilepsy and 52.3% with active epilepsy have seen a neurologist within the past 12 months. These findings of the characteristics of our EMR cohort used to validate the administrative data algorithms all support the suitability of using primary care EMR records as a reference standard for this type of validation work.
However, we must acknowledge the limitations of primary care EMR records in doing this validation work. Admittedly, a detailed description of the type and course of epilepsy was missing in a large proportion of the patients identified in the primary care EMR. Therefore, more detailed analysis looking at these factors is likely not possible in primary care EMRs, and this detailed information may be better identified in patient registries with more detailed structured data entry specific to these aspects of epilepsy. In contrast, our finding of 78% of patients having a prescription for an antiepileptic drug may render primary care EMR records useful in determining trends or patterns of drug utilization for treating epilepsy. Furthermore, an advantage of the EMR record is the availability of prescription records for patients of all ages, not just the elderly as in the administrative databases available in most provinces in Canada. This may be particularly important as epilepsy onset often occurs in infancy or childhood. Although the EMR record has an area for recording historical health information, this area is dependent on the health care provider to populate, and the information provided varies from provider to provider. The length of the EMR record depends on when the physician commences using the EMR and when the patient comes in to see the physician after they have started using the EMR. Some physicians scan in historical information such as consultation letters, and some physicians may simply record the historical information in the cumulative patient profile with a simple phrase. As well, in Ontario there is no province-wide program for automated electronic communication between specialists or hospitals and family physician EMRs, so it is also possible that current detailed information is missing from primary care EMR charts.
This potentially missing information may have affected the completeness of our reference standard, as 8 of the 18 false positives using our optimal algorithm had administrative data records that predated the EMR record; it is plausible that several or all of these patients truly had epilepsy that was not recorded in the EMR. If they truly had epilepsy and were therefore not false positives, then the specificity and PPV that we found may underestimate the performance of administrative data in identifying patients with epilepsy in the general population. Last, we acknowledge that there are some differences between EMRALD participating physicians and all Ontario physicians; however, the prevalence of disease conditions should be representative of the patients that seek medical care, as there is no reason to believe that epilepsy patients would select their physicians based on age, gender, years in practice, or whether they are using an EMR.
Through our validation results we were able to identify patients with high specificity and reasonable sensitivity and PPV. This will allow for national disease surveillance and cohort-specific analyses using population-based administrative healthcare data, to investigate national epilepsy prevalence, incidence, and health care utilization trends.
This study is part of the National Population Health Study of Neurological Conditions. We wish to acknowledge the membership of Neurological Health Charities Canada and the Public Health Agency of Canada for their contribution to the success of this initiative. Funding for the study was provided by the Public Health Agency of Canada. The opinions expressed in this publication are those of the authors/researchers, and do not necessarily reflect the official views of the Public Health Agency of Canada.
This study was supported by the ICES, which is funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC). The opinions, results, and conclusions reported in this paper are those of the authors and are independent from the funding sources. No endorsement by ICES or the Ontario MOHLTC is intended or should be inferred.
Dr. Karen Tu is supported by a Canadian Institutes of Health Research (CIHR) Fellowship Award in Primary Care.
Dr. Nathalie Jetté holds a Population Health salary award from Alberta Innovates Health Solutions (AI-HS) and a Canada Research Chair Tier 2 in Neurological Health Services Research.
Dr. Noah Ivers is supported by a CIHR Fellowship Award in Clinical Research and by a Fellowship Award from the Department of Family and Community Medicine, University of Toronto.
None of the authors has any conflict of interest to disclose. We confirm that we have read the Journal's position on issues involved in ethical publication and affirm that this report is consistent with those guidelines.
Karen Tu is a Senior Scientist at ICES and an Associate Professor and Family Physician at the University of Toronto.