• Epilepsy;
  • Prevalence;
  • Incidence;
  • Mortality;
  • Managed care organizations



Summary: Purpose. The purpose of the present study was to apply computer algorithms to an administrative data set to identify the prevalence of epilepsy, incidence of epilepsy, and epilepsy-related mortality of patients in a managed care organization (MCO).

Methods. The study population consisted of members enrolled in Lovelace Health Plan, a component of Lovelace Health Systems, a statewide MCO headquartered in Albuquerque, New Mexico. Patient records were obtained from July 1996 to June 2001. Four logistic regression models with high sensitivity and specificity were applied to 1-, 3-, and 5-year time frames in which members were continuously enrolled in the MCO. Incidence was defined for patients who did not have an epilepsy-associated code in the 18 months before the first diagnosis entry. Mortality estimates in the population also were assessed by using a matched control group and linkage to a statewide death registry.

Results. The data yielded estimated prevalence rates of 7–10 per 1,000, depending on age, sex, ethnicity, and time interval. Annualized incidence was 47 per 100,000 for members continuously enrolled for 3 years and 71 per 100,000 for members continuously enrolled for 5 years. Crude mortality rates were 2–2.5 times higher for epilepsy patients identified with the algorithms than for the matched controls. Conditional logistic regression indicated that the odds of death for epilepsy patients as compared with controls ranged from 1.24 to 2.06.

Conclusions. Accurate estimation of prevalence, incidence, and mortality rates for epilepsy is an essential component of disease management in MCOs. The algorithms in this project can be used to monitor trends in prevalence, incidence, and mortality to inform decisions critical to meeting the health care needs and improving the quality of life of patients with epilepsy.

Authorities estimate that 2.5 million people are being treated for epilepsy in the United States (1). Each year, 150,000 people are newly diagnosed with epilepsy, and many more cases remain undetected (2). Most studies conducted in developed countries indicate wide variability in the prevalence of epilepsy, ranging from four to 10 cases per 1,000 persons (3–5). The actual rate may be significantly higher, however, because only half of patients with epilepsy are diagnosed within the first 6 months of the disorder (6). Although most patients can control their seizures with proper diagnosis and treatment, it is estimated that 20–25% of patients with seizures do not respond well to treatment (1) and do not achieve proper seizure control with current drug therapies (5).

Ascertaining newly diagnosed cases of epilepsy is particularly challenging, as illustrated by the wide range of incidence estimates for the general population, from 50 to 100 per 100,000 people per year (3,4,7). A recent meta-analysis of 40 studies also found wide variability in incidence estimates and in the quality of results, depending on the method, geographic area, demographics, definitions, and classifications of epilepsy and epileptic seizures used (8). Most of these studies were conducted outside the United States. Although incidence varies among domestic and foreign studies, a consistent pattern occurs in relation to age: the onset of epilepsy is most frequent in the earliest years of life, decreases in adolescence, remains relatively stable in the middle years, and then increases among those aged 60 years and older (1,4,7,9). Epilepsy patients of all ages have a standardized mortality ratio of 2 to 3; that is, their mortality is 2 to 3 times that of the general population (3,5). Excess deaths may be caused by cerebral diseases associated with seizures, fatal injuries during seizures, suicide, and sudden unexpected death in epilepsy (5,10–14). The risk of sudden unexplained death is roughly 20 times higher among epilepsy patients than in the general population (4).

In managed care organizations (MCOs), disease-management and other programs to promote adherence to best-practice guidelines have grown dramatically in recent years because they promise to decrease costs, promote uniform practice patterns, and significantly improve health care outcomes. A crucial early step in any disease-management program is to accurately identify patients with the target disease so that administrators can estimate incidence and prevalence, develop and focus interventions, and assess the impact of quality-improvement programs. Health care administrators in MCOs could benefit from an accurate method of assessing the distribution of epilepsy patients so that they can allocate resources to meet the health care needs of this patient population. Clinical diagnosis of epilepsy can be complex, because no specific biologic markers exist and recurring seizures are associated with a wide range of disease conditions. Despite the uncertainties of using administrative data to determine diagnostic rates, disease-management studies of epilepsy can provide essential information for primary care physicians in MCOs, who must work collaboratively with specialists to diagnose, classify, and treat patients with epilepsy (2).

Record-linkage studies that use rigorous methods and analytic approaches to examine the incidence and prevalence of chronic diseases in MCOs are limited. For epilepsy, accurate case identification from MCO electronic claims data is difficult for several reasons, such as the use of antiepileptic drugs (AEDs) for conditions other than epilepsy. One recent study, however, used logistic regression to develop and validate an effective method for ascertaining newly diagnosed breast cancer patients in an MCO (15). Although this study provides a useful model for future record-linkage studies in MCOs, researchers have emphasized that epilepsy studies must specifically take into account the classification of seizures, risk factors, and geographic, age, sex, and ethnic differences to provide reliable information about prevalence and incidence (8). The purpose of this study was to develop a sensitive and specific algorithm to determine epilepsy prevalence, incidence, and mortality in an MCO population.



Study design

Development of the algorithms for this study proceeded through exploratory, confirmatory, and descriptive phases (Fig. 1) and used electronic administrative databases for patients who were continuously enrolled in Lovelace Health Systems (LHS), an MCO serving New Mexico (16). In the exploratory phase, a data set of potential epilepsy cases was constructed from administrative data systems for all health plan members continuously enrolled in the MCO for at least 1 year within the study period of July 1, 1996, through June 30, 1998. Epilepsy status was determined by medical record review for a sample of 617 cases. The best algorithm for detecting epilepsy cases was developed by examining combinations of diagnoses, diagnostic procedures, and medication use. This algorithm was then applied to a new set of data from the same MCO covering the period July 1, 1998, through June 30, 2000. A sample stratified by ethnicity and age was drawn from the epilepsy cases and noncases identified by the preliminary algorithm, and medical record review was completed for 644 cases to determine the algorithm's accuracy. Data from both phases were then combined to refine the logistic regression models and to obtain more stable parameter estimates. The best model used diagnoses and AEDs as predictors, had a positive predictive value of 84% (sensitivity, 82%; specificity, 94%), and correctly classified 90% of the cases.
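The accuracy figures reported for the validation sample follow from a 2 × 2 comparison of algorithm output with medical record review. The Python sketch below spells out those standard definitions. The class and function names are hypothetical, the counts are invented for illustration, and the sketch ignores the sampling weights that a stratified validation sample would normally require.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Algorithm result versus medical record review (hypothetical layout)."""
    true_pos: int   # algorithm positive, chart review confirms epilepsy
    false_pos: int  # algorithm positive, chart review finds no epilepsy
    false_neg: int  # algorithm negative, chart review confirms epilepsy
    true_neg: int   # algorithm negative, chart review finds no epilepsy


def validation_metrics(c: ConfusionCounts) -> dict:
    """Sensitivity, specificity, positive predictive value, and accuracy."""
    total = c.true_pos + c.false_pos + c.false_neg + c.true_neg
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        "correctly_classified": (c.true_pos + c.true_neg) / total,
    }


# Invented counts for illustration only; these are not the study's data.
example = ConfusionCounts(true_pos=80, false_pos=20, false_neg=20, true_neg=280)
print(validation_metrics(example))
```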


Figure 1. The three phases of the study.


Use of management information system (MIS) data

Data for all phases of this project were obtained from multiple administrative databases throughout LHS. The institutional review board at LHS reviewed and approved the study. Claims, pharmacy, and inpatient and outpatient utilization data were collected and then linked to membership information to give a comprehensive view of each health plan member's care. Only data for patients whose primary care physicians belonged to the LHS Staff Model during the study period were used. LHS uses IDX 8.4 (IDX, Burlington, VT, U.S.A.), a comprehensive MIS for tracking health care utilization, for all managed care applications. These applications include scheduling, membership, Provider Network claims, inpatient and outpatient billing within the Staff Model, and member services.

One problem with most MCOs is that members' ethnicity is not recorded at enrollment and thus is not available in the MIS data. To assign ethnicity to LHS members, we used the GUESS (Generally Useful Ethnicity Search System) computer program. Developed in the mid-1960s, this program assigns ethnic origins to the surnames of people living in the United States (17). Although surnames are grouped into a variety of ethnic classes, the only reliable classifications for New Mexico and the surrounding region are Hispanic and non-Hispanic. The GUESS software is useful in this setting because the LHS population is predominantly Hispanic and non-Hispanic white. Although the software was developed 40 years ago, three validation studies have been conducted for the New Mexico population of Hispanics and non-Hispanic whites. The initial study, conducted in the late 1980s, showed the software to be about 90% accurate in designating Hispanics in New Mexico (18). A later validation against the Surveillance, Epidemiology, and End Results tumor registry, which assigns ethnicity based on self-report, in female health plan members with breast cancer showed accurate assignment of ethnicity for 95.3% of non-Hispanic women and 83.8% of Hispanic women (19). The lower accuracy for Hispanic women may reflect surname changes through intermarriage. The most recent study used self-reported race/ethnicity in Medicaid data as the standard for validating the GUESS software (20). Sensitivity for women was 89.4% and specificity was 81.67%. Despite the concern that name changes through intermarriage affect mainly women, a very similar pattern was obtained for men: sensitivity was 89.97% and specificity was 79.23%.

Study sample

We defined an epilepsy case for this study from MIS data by extracting information indicating, or possibly indicating, epilepsy for three groups: (a) enrollees with one or more specific epilepsy diagnoses made by a neurologist or a primary care physician; (b) enrollees without a specific epilepsy diagnosis but with one or more other seizure-related diagnosis codes; and (c) enrollees meeting neither criterion (a) nor criterion (b) but having more than a 30-day prescription for AEDs or undergoing related medical procedures (e.g., blood screening for AEDs or EEG).
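The three screening groups can be expressed as an ordered rule in which each member falls into the first group whose criterion is met. This is a hypothetical sketch rather than the study's code: it assumes per-member summaries have already been extracted from the MIS, and it reads "more than a 30-day prescription" as a total AED supply exceeding 30 days, which is our interpretation.

```python
from enum import Enum
from typing import Optional


class ScreeningGroup(Enum):
    A = "specific epilepsy diagnosis"
    B = "other seizure-related diagnosis only"
    C = "AED supply over 30 days or related procedure only"


def screening_group(n_epilepsy_dx: int, n_seizure_dx: int,
                    aed_days_supply: int, had_related_procedure: bool
                    ) -> Optional[ScreeningGroup]:
    """Assign a member to group (a), (b), or (c), or return None.

    Hypothetical inputs, one summary per member:
      n_epilepsy_dx          specific epilepsy diagnoses (neurologist or PCP)
      n_seizure_dx           other seizure-related diagnosis codes
      aed_days_supply        total days of AEDs dispensed (our reading of "30-day")
      had_related_procedure  e.g., AED blood level screening or EEG
    """
    if n_epilepsy_dx >= 1:
        return ScreeningGroup.A
    if n_seizure_dx >= 1:
        return ScreeningGroup.B
    if aed_days_supply > 30 or had_related_procedure:
        return ScreeningGroup.C
    return None  # no indication of epilepsy in the MIS data


print(screening_group(0, 0, 45, False))
```

Because the checks run in order, a member with both an epilepsy diagnosis and AED use is counted only in group (a), matching the "not meeting either of the criteria" wording for group (c).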

Three data sets were created, each covering a different interval: 1 year, 3 years, and 5 years. The 1-year data set contained the data for members continuously enrolled from July 1, 2000, through June 30, 2001 (n = 113,939). The 3-year data set included members continuously enrolled from July 1, 1998, through June 30, 2001 (n = 79,121), and the 5-year data set included members continuously enrolled from July 1, 1996, through June 30, 2001 (n = 40,956). Each of these data sets contained four data tables: (a) all members meeting the specific enrollment criteria (to be used as the denominator in prevalence and incidence calculations), (b) members having an epilepsy-related code (i.e., specific epilepsy and seizure-related diagnoses), (c) total utilization for the members with an epilepsy-related code, and (d) total pharmacy utilization for members with an epilepsy-related code.
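Building the 1-, 3-, and 5-year data sets requires a test for continuous enrollment over each window. The sketch below assumes that membership is stored as inclusive (start, end) date spans per member and that any uncovered day breaks continuity; the span format and function names are assumptions, while the window dates come from the text.

```python
from datetime import date, timedelta

# Continuous-enrollment windows for the three data sets (dates from the text).
WINDOWS = {
    "1-year": (date(2000, 7, 1), date(2001, 6, 30)),
    "3-year": (date(1998, 7, 1), date(2001, 6, 30)),
    "5-year": (date(1996, 7, 1), date(2001, 6, 30)),
}


def continuously_enrolled(spans, window_start, window_end):
    """True if inclusive (start, end) spans cover every day of the window."""
    one_day = timedelta(days=1)
    covered_through = window_start - one_day  # last day known to be covered
    for start, end in sorted(spans):
        if end <= covered_through:
            continue  # span adds no new coverage
        if start > covered_through + one_day:
            return False  # at least one uncovered day inside the window
        covered_through = end
        if covered_through >= window_end:
            return True
    return False


def members_in_data_set(enrollment, label):
    """enrollment: {member_id: [(start, end), ...]}; ids enrolled throughout."""
    start, end = WINDOWS[label]
    return {member_id for member_id, spans in enrollment.items()
            if continuously_enrolled(spans, start, end)}


# Hypothetical members: m2 joined in 1999, so only the 1-year set includes it.
enrollment = {
    "m1": [(date(1996, 1, 1), date(2001, 12, 31))],
    "m2": [(date(1999, 1, 1), date(2001, 12, 31))],
}
print(members_in_data_set(enrollment, "5-year"))
```

Because the windows all end on June 30, 2001, the 5-year set is a subset of the 3-year set, which is a subset of the 1-year set, consistent with the shrinking sample sizes reported above.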

Data analysis


We developed multiple versions of the algorithm (i.e., different logistic regression models) for predicting prevalence because different MCOs across the country will have varying amounts of administrative data on which to apply these algorithms. Four different models were built, each adding another utilization component to the previous model and thus having different sensitivity and specificity values. Model 1 included the age of the member and the number of epilepsy-related diagnosis codes. Model 2 added AEDs that may be used by members with epilepsy. Model 3 added EEG and vagus nerve stimulator Current Procedural Terminology (CPT) codes. Finally, model 4 dropped the vagus nerve stimulator codes from model 3 and added depression and anxiety disorders as comorbid conditions likely to be treated by some of the drugs indicated for epilepsy. Specific parameters from the logistic regression models are presented in Table 1. For each of the models, prevalence estimates were calculated for each of the three different data sets (1-year, 3-year, and 5-year) to account for the varying degrees of patient “history” available with longer continuous enrollment.

Table 1. Estimated logistic regression parameters (B) and standard errors (SE) for alternative models containing epilepsy predictors

| Predictor | Model 1 B (SE) | Model 2 B (SE) | Model 3 B (SE) | Model 4 B (SE) |
|---|---|---|---|---|
| Intercept | −2.3846 (0.1203)ᵃ | −2.8929 (0.1574)ᵃ | −2.7535 (0.1586)ᵃ | −2.5298 (0.1628)ᵃ |
| Age category (yr) | | | | |
| 0–19 | −0.2798 (0.1436) | 0.0583 (0.1730) | 0.0457 (0.1771) | −0.059 (0.1826) |
| 65+ | −0.0653 (0.1314) | −0.2450 (0.1696) | −0.2489 (0.1738) | −0.2209 (0.1769) |
| Diagnosis codesᵇ | | | | |
| DxA (345.xx) | 0.4599 (0.0544)ᵃ | 0.4620 (0.0668)ᵃ | 0.6593 (0.0884)ᵃ | 0.685 (0.09)ᵃ |
| DxB (333.2) | 0.4752 (0.0418)ᵃ | 0.2215 (0.0475)ᵃ | 0.2463 (0.0498)ᵃ | 0.24 (0.048)ᵃ |
| DxC (780.2) | −0.4656 (0.4435) | 0.0750 (0.3945) | 0.1681 (0.3729) | 0.133 (0.3666) |
| Drug categories | | | | |
| Drug cat 01: Carbamazepine (blood monitoring CPT 80156) | | 0.1076 (0.0181)ᵃ | 0.1069 (0.0191)ᵃ | 0.108 (0.02)ᵃ |
| Drug cat 02: Clonazepam | | −0.1025 (0.0417)ᶜ | −0.101 (0.0418)ᶜ | −0.074 (0.0435) |
| Drug cat 06: Gabapentin | | 0.0178 (0.0265) | 0.0314 (0.0255) | 0.064 (0.0261)ᶜ |
| Drug cat 07: Lamotrigine | | 0.0638 (0.0767) | 0.0354 (0.0775) | 0.025 (0.0771) |
| Drug cat 09: Phenobarbital (blood monitoring CPT 80184) | | 1.8845 (0.5970)ᵈ | 1.7033 (0.575)ᵈ | 1.757 (0.546)ᵈ |
| Drug cat 10: Phenytoin (blood monitoring CPT 80185, 80186) | | 0.2050 (0.0289)ᵃ | 0.2217 (0.0318)ᵃ | 0.224 (0.0312)ᵃ |
| Drug cat 11: Primidone (blood monitoring CPT 80188) | | 0.1770 (0.0588)ᵈ | 0.1851 (0.0625)ᵈ | 0.237 (0.0659)ᵈ |
| Drug cat 12: Valproic acid (blood monitoring CPT 80164) | | 0.0709 (0.0174)ᵃ | 0.0664 (0.0175)ᵈ | 0.084 (0.0192)ᵃ |
| Drug cat 15: Topiramate (blood monitoring CPT 80201) | | 0.4310 (0.2941) | 0.3744 (0.3056) | 0.38 (0.3019) |
| Drug cat 18: Ethosuximide (blood monitoring CPT 80168) | | 0.1695 (0.0743)ᶜ | 0.1563 (0.0763)ᶜ | 0.202 (0.0851)ᶜ |
| Procedure codes | | | | |
| CPT 01: EEG codes (95812–95958) | | | −0.772 (0.17858)ᵃ | −0.749 (0.175)ᵃ |
| Comorbid conditions | | | | |
| Depression (311) | | | | −0.93 (0.3211)ᵈ |
| Anxiety (300) | | | | −1.356 (0.4632)ᵈ |

These are logistic regression models. To obtain the predicted probability of epilepsy, the log odds must be transformed with the following formula: predicted probability = exp(log odds) / [1 + exp(log odds)].
ᵃ p < 0.0001.
ᵇ DxA refers to ICD-9-CM codes for epilepsy; DxB refers to codes for seizures; DxC refers to codes for episodic phenomena that might represent seizures or epilepsy.
ᶜ p < 0.05.
ᵈ p < 0.001.
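To illustrate how a Table 1 model is applied, the sketch below scores a member with model 1, converts the log odds to a predicted probability with the formula in the table note, and applies the study's cutoff of 0.28. The coefficients are transcribed from Table 1, but the predictor coding is our assumption because the table does not state it: we treat DxA, DxB, and DxC as counts of codes in each category and age 20–64 as the reference category. The coding should be checked against the development report (16) before the scores are relied on.

```python
import math

# Model 1 coefficients (B) as transcribed from Table 1.
MODEL1 = {
    "intercept": -2.3846,
    "age_0_19": -0.2798,
    "age_65_plus": -0.0653,
    "dx_a": 0.4599,   # ICD-9-CM codes for epilepsy
    "dx_b": 0.4752,   # codes for seizures
    "dx_c": -0.4656,  # episodic phenomena that might represent seizures
}
CUTOFF = 0.28  # predicted probability at or above which a member is a case


def model1_log_odds(age_years, n_dx_a, n_dx_b, n_dx_c):
    """Linear predictor. ASSUMPTION: Dx terms are code counts; 20-64 is the reference age."""
    b = MODEL1
    log_odds = (b["intercept"] + b["dx_a"] * n_dx_a
                + b["dx_b"] * n_dx_b + b["dx_c"] * n_dx_c)
    if age_years <= 19:
        log_odds += b["age_0_19"]
    elif age_years >= 65:
        log_odds += b["age_65_plus"]
    return log_odds


def predicted_probability(log_odds):
    """Table 1 note: exp(log odds) / [1 + exp(log odds)]."""
    return math.exp(log_odds) / (1.0 + math.exp(log_odds))


def is_model1_case(age_years, n_dx_a, n_dx_b, n_dx_c):
    """Apply the 0.28 cutoff to the model 1 predicted probability."""
    p = predicted_probability(model1_log_odds(age_years, n_dx_a, n_dx_b, n_dx_c))
    return p >= CUTOFF


print(is_model1_case(40, 4, 0, 0))
```

Under these assumptions, a 40-year-old member with three DxA codes scores about 0.27 and falls just below the cutoff, whereas four DxA codes give about 0.37.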

The numerator for prevalence was determined from each logistic regression model by using a cutoff value of 0.28; that is, any member with a predicted probability of 0.28 or higher was classified as an epilepsy patient. To determine the best cutoff point, we generated sensitivity and specificity values for a range of cutoff points. The trade-off in this type of analysis is between false-positive and false-negative results. At a cutoff of 0, every member would be classified as positive: the model would have 100% sensitivity but 0% specificity, producing many false-positive cases. At a cutoff of 1.00, every member would be classified as negative: the model would have 100% specificity but 0% sensitivity, producing many false-negative cases. The cutoff of 0.28 gave the best combination of sensitivity and specificity for the models and therefore the highest number of correctly classified cases. Sensitivity and specificity were 76.9% and 92.5% for model 1 (88.3% correctly classified); 81.8% and 94.0% for model 2 (90.6% correctly classified); 81.8% and 94.0% for model 3 (90.6% correctly classified); and 84.0% and 94.0% for model 4 (91.2% correctly classified). The denominator for the prevalence calculations was the number of MCO members in the respective 1-, 3-, or 5-year data set.
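The cutoff search can be sketched as a scan over candidate cutoffs on a chart-reviewed sample, recording sensitivity, specificity, and the proportion correctly classified at each. This hypothetical sketch selects the cutoff that maximizes correct classification, which is one reading of the criterion described above; other criteria, such as Youden's index (sensitivity + specificity − 1), can select a different cutoff. It also shows the prevalence calculation per 1,000 members. Function names and data are illustrative, and the sample must contain both epilepsy and non-epilepsy members.

```python
def metrics_at_cutoff(probs, labels, cutoff):
    """Classify members with predicted probability >= cutoff as epilepsy cases."""
    tp = fp = fn = tn = 0
    for p, is_epilepsy in zip(probs, labels):
        predicted = p >= cutoff
        if predicted and is_epilepsy:
            tp += 1
        elif predicted:
            fp += 1
        elif is_epilepsy:
            fn += 1
        else:
            tn += 1
    return {
        "cutoff": cutoff,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "correctly_classified": (tp + tn) / len(labels),
    }


def best_cutoff(probs, labels, step=0.01):
    """Scan cutoffs 0, step, ..., 1; return the first with the most correct classifications."""
    n_steps = round(1 / step)
    scanned = [metrics_at_cutoff(probs, labels, i * step) for i in range(n_steps + 1)]
    return max(scanned, key=lambda r: r["correctly_classified"])


def prevalence_per_1000(n_classified_positive, n_members):
    """Numerator: model-positive members; denominator: all members in the data set."""
    return 1000 * n_classified_positive / n_members


# Hypothetical predicted probabilities with chart-review labels.
probs = [0.10, 0.20, 0.30, 0.60, 0.90, 0.25]
labels = [False, False, True, True, True, False]
print(best_cutoff(probs, labels))
```

The endpoints behave as the text describes: a cutoff of 0 yields 100% sensitivity and 0% specificity, and a cutoff of 1 yields the reverse.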


Incidence rates were calculated by applying model 3 to the 3- and 5-year data sets. An incident case was defined as a member classified as positive by the model who had no epilepsy-associated codes recorded within the first 18 months of the data set. Annualized incidence rates were obtained by dividing the crude incidence rates by the number of months remaining after this 18-month look-back period (i.e., 18 months for the 3-year data set and 42 months for the 5-year data set) and multiplying by 12.
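The incidence rule and the annualization step can be written out as follows. This is a sketch under stated assumptions: months are counted from the start of the data set (0 to 35 for the 3-year set, 0 to 59 for the 5-year set), the denominator is all continuously enrolled members in the data set, and the function names and counts are hypothetical.

```python
LOOKBACK_MONTHS = 18  # initial window that must contain no epilepsy-associated code


def is_incident_case(model_positive, code_months, lookback=LOOKBACK_MONTHS):
    """True if model-positive with no epilepsy-associated code in months 0..lookback-1.

    code_months: months (counted from the data set start) with such a code.
    """
    return model_positive and all(m >= lookback for m in code_months)


def annualized_incidence_per_100k(n_incident, n_members, data_set_months,
                                  lookback=LOOKBACK_MONTHS):
    """Crude incidence divided by follow-up months, times 12, per 100,000."""
    followup_months = data_set_months - lookback  # 18 (3-year) or 42 (5-year)
    crude_rate = n_incident / n_members
    return crude_rate / followup_months * 12 * 100_000


print(annualized_incidence_per_100k(10, 10_000, 36))
```

For example, 10 hypothetical incident cases among 10,000 members of a 3-year data set give a crude rate of 100 per 100,000 over 18 months of follow-up, which annualizes to about 66.7 per 100,000.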


Mortality rates were determined for the positive epilepsy cases in the 1-, 3-, and 5-year data sets under all four models. For each positively identified case (in each of the 12 model-by-data-set groups of positive cases), we selected two matched controls who had no epilepsy-related diagnoses. Controls were matched on age category (0–19, 20–64, or 65+ years), sex (male or female), and ethnicity (Hispanic or non-Hispanic) and were selected from the remaining members of the corresponding data set (1-, 3-, or 5-year) who were not identified as positive cases. Because cases overlapped from one model to another and from one data set to another, a single health plan member could serve as a control for more than one case, depending on the model and data set from which the case was selected.
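Selecting two matched controls per case can be sketched as random sampling within strata defined by the three matching factors. The field names, fixed seed, and sampling rules (no repeated control within a case, reuse allowed across cases) are assumptions for illustration; the text states only that a member could serve as a control for more than one case across models and data sets.

```python
import random
from collections import defaultdict


def match_key(member):
    """Matching factors used in the study: age category, sex, and ethnicity."""
    return (member["age_cat"], member["sex"], member["ethnicity"])


def select_matched_controls(cases, eligible_pool, n_controls=2, seed=2001):
    """Map each case id to n_controls control ids drawn from the same stratum.

    eligible_pool must already exclude positive cases and anyone with an
    epilepsy-related diagnosis. Controls are drawn without replacement within
    a case, but one member may be reused for different cases (assumption).
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    strata = defaultdict(list)
    for member in eligible_pool:
        strata[match_key(member)].append(member["id"])

    matches = {}
    for case in cases:
        candidates = strata.get(match_key(case), [])
        if len(candidates) < n_controls:
            raise ValueError(f"not enough eligible controls for case {case['id']}")
        matches[case["id"]] = rng.sample(candidates, n_controls)
    return matches


# Hypothetical members; ids 0-9 share one stratum, id 99 is in another.
pool = [{"id": i, "age_cat": "20-64", "sex": "F", "ethnicity": "Hispanic"}
        for i in range(10)]
pool.append({"id": 99, "age_cat": "65+", "sex": "M", "ethnicity": "Non-Hispanic"})
cases = [{"id": "c1", "age_cat": "20-64", "sex": "F", "ethnicity": "Hispanic"},
         {"id": "c2", "age_cat": "20-64", "sex": "F", "ethnicity": "Hispanic"}]
print(select_matched_controls(cases, pool))
```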

Mortality data were obtained from Vital Records of the New Mexico Department of Health (NMDOH). The positively identified cases and their matched controls were linked to the NMDOH records. The information returned for analysis was the underlying cause of death. Currently, underlying cause is the only information that the NMDOH Vital Records maintains in its mortality data.




Minor demographic differences were found across the 1-year, 3-year, and 5-year data sets. The percentage of Hispanics increased from 38% in the 1-year continuously enrolled population to 41% in the 5-year continuously enrolled population. For the subset of members with an epilepsy code, the percentage of Hispanics increased from 34% to 38%. The percentages of male and female subjects in both the total population and the epilepsy-related subset remained constant across the three data sets, with female subjects accounting for 52% of the total population and 58% of the members with an epilepsy code.

Among members in the total population, the percentage of continuously enrolled members aged 0–19 years dropped from 27% in the 1-year data set to 23% in the 5-year data set. The percentage of members in the middle age category, 20–64 years, remained consistent across the 1-, 3-, and 5-year data sets, and the percentage aged 65+ years increased from the 1-year to the 5-year data set (8% to 13%, respectively). In the subset of members with an epilepsy-related code, the proportion aged 0–19 years decreased from 15% to 11% across the three data sets, and the proportion aged 20–64 years decreased by 10 percentage points, from 65% to 55%. The proportion of epilepsy patients aged 65+ years increased across the three data sets from 20% to 34%. Overall, the 5-year data set had a larger percentage of older and Hispanic members.

Prevalence and incidence

Prevalence rates varied as more information was added to the models and as the time period covered by the data set was increased (1-, 3-, and 5-year continuous enrollment). A description of the results for each model follows.

Model 1 included age and diagnosis codes as predictors. The highest prevalence was seen in men aged 65+ years across all three data sets (Table 2). Among the elderly, non-Hispanic women had higher rates (11–12 per 1,000) than Hispanic women (eight per 1,000). In contrast, elderly Hispanic men tended to have higher rates than elderly non-Hispanic men (15–18 versus 12–16 per 1,000, respectively). Among people aged 0–19 years, males appeared to have slightly higher rates than females regardless of ethnicity (eight to 10 versus six to seven per 1,000, depending on the data set). Rates for people aged 20–64 years were relatively consistent across sex and ethnicity in each data set, at seven to eight per 1,000.

Table 2. Prevalence of epilepsy (per 1,000) among members continuously enrolled in a managed care organization, using model 1ᵃ

| Ethnicity | Age category (yr) | Sex | 1 yrᵇ | 3 yrᶜ | 5 yrᵈ |
|---|---|---|---|---|---|
| | 0–19 | M | 8 | 10 | 10 |
| Hispanic | 65+ | M | 15 | 16 | 18 |
| Non-Hispanic | 65+ | F | 11 | 12 | 11 |
| Non-Hispanic | 65+ | M | 13 | 12 | 16 |

ᵃ Model 1 included age of the health plan member and number of epilepsy-related diagnosis codes.
ᵇ Members continuously enrolled from July 1, 2000, to June 30, 2001.
ᶜ Members continuously enrolled from July 1, 1998, to June 30, 2001.
ᵈ Members continuously enrolled from July 1, 1996, to June 30, 2001.

Model 2 included age, diagnosis codes, and AED categories as predictors. In general, the patterns resembled those of model 1, although adding AEDs to the model increased the overall prevalence rates (Table 3). The highest prevalence was again found among people aged 65+ years across all three data sets. In this age group, non-Hispanics had higher rates than Hispanics regardless of sex, and men had higher rates than women; non-Hispanic men aged 65+ years had the highest rates, ranging from 21 to 25 per 1,000. Among people aged 0–19 years, males generally had higher rates than females, but the sex difference was larger among Hispanics than among non-Hispanics: Hispanic females in this age group had a rate of four per 1,000, whereas Hispanic males had rates of seven to nine per 1,000. Rates for non-Hispanic males and females aged 0–19 years were much more consistent, ranging from five to seven per 1,000. Rates for people aged 20–64 years varied more than in model 1 but were relatively consistent across sex and ethnicity, ranging from six to 10 per 1,000.

Table 3. Prevalence of epilepsy (per 1,000) among members continuously enrolled in a managed care organization, using model 2ᵃ

| Ethnicity | Age category (yr) | Sex | 1 yrᵇ | 3 yrᶜ | 5 yrᵈ |
|---|---|---|---|---|---|
| Hispanic | 65+ | F | 13 | 13 | 14 |
| Hispanic | 65+ | M | 16 | 17 | 18 |
| Non-Hispanic | 65+ | F | 16 | 17 | 15 |
| Non-Hispanic | 65+ | M | 22 | 21 | 25 |

ᵃ Model 2 included age of the health plan member, number of epilepsy-related diagnosis codes, and antiepileptic drugs.
ᵇ Members continuously enrolled from July 1, 2000, to June 30, 2001.
ᶜ Members continuously enrolled from July 1, 1998, to June 30, 2001.
ᵈ Members continuously enrolled from July 1, 1996, to June 30, 2001.

Model 3 included age, diagnosis codes, drug categories, and CPT procedure codes. Adding procedure codes did not materially change the prevalence rates, and the results for specific age, sex, and ethnicity categories were similar to those obtained with model 2.

Model 4 included age, diagnosis codes, drug categories, EEG procedure codes, and psychiatric comorbid conditions (anxiety and depression) as predictors. The results were similar to those from models 2 and 3, except for an increase in rates among patients aged 65+ years. The rates for female subjects in this age category ranged from 13 to 15 per 1,000 for Hispanics and 17 to 20 per 1,000 for non-Hispanics, and the rates among male subjects ranged from 18 to 21 per 1,000 for Hispanics and from 21 to 24 per 1,000 for non-Hispanics.

The estimated annualized incidence rates were 47 per 100,000 for the 3-year data set and 71 per 100,000 for the 5-year data set.


Crude mortality rates for the epilepsy cases identified by each model and their matched controls in the 1-, 3-, and 5-year data sets are presented in Table 4. Crude mortality rates were 2–2.5 times higher for cases than for controls across all four models in the 1- and 3-year data sets. As expected, mortality rates were higher for both cases and controls in the 5-year data set, where they were approximately 1.5–2 times higher for the epilepsy cases than for the matched controls. Odds ratios for death were estimated with conditional logistic regression (Table 5). The odds of death for epilepsy cases compared with matched controls ranged from 1.24 to 2.06 and were higher in the 1- and 3-year data sets than in the 5-year data set.
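Conditional logistic regression conditions on each matched set (one epilepsy case and its two controls), so the matching factors drop out of the likelihood. In this design the matched "case" is an epilepsy patient and death is the outcome, so epilepsy status is the single predictor within each set. The minimal sketch below computes the exact conditional maximum likelihood estimate by enumerating possible sets of deaths, which is feasible for sets of three. The Wald confidence interval is our assumption (the text does not say how the intervals in Table 5 were computed), the data are hypothetical, and production analyses would normally use a statistical package (e.g., survival::clogit in R).

```python
import math
from itertools import combinations


def stratum_moments(exposures, outcomes, beta):
    """Observed statistic, conditional mean, and variance for one matched set.

    T = number of exposed (epilepsy) members among those who died. Given d
    deaths in the set, each subset of size d is weighted by exp(beta * T).
    """
    d = sum(outcomes)
    t_obs = sum(x for x, y in zip(exposures, outcomes) if y)
    totals = [sum(exposures[i] for i in s)
              for s in combinations(range(len(exposures)), d)]
    weights = [math.exp(beta * t) for t in totals]
    denom = sum(weights)
    mean = sum(w * t for w, t in zip(weights, totals)) / denom
    var = sum(w * t * t for w, t in zip(weights, totals)) / denom - mean ** 2
    return t_obs, mean, var


def conditional_or(matched_sets, lo=-10.0, hi=10.0, tol=1e-10):
    """Return (OR, lower, upper): conditional MLE with a Wald 95% CI (assumed)."""
    def score(beta):
        return sum(t - m for t, m, _ in
                   (stratum_moments(x, y, beta) for x, y in matched_sets))

    # The conditional log-likelihood is concave, so the score decreases in beta.
    if score(lo) <= 0 or score(hi) >= 0:
        raise ValueError("MLE not bracketed (e.g., no informative matched sets)")
    while hi - lo > tol:  # bisection on the score equation
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    beta = (lo + hi) / 2
    info = sum(v for _, _, v in
               (stratum_moments(x, y, beta) for x, y in matched_sets))
    se = 1 / math.sqrt(info)
    return math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)


# Hypothetical 1:2 matched sets: exposures = (case, control, control),
# outcomes = 1 if the member died during follow-up.
sets = ([((1, 0, 0), (1, 0, 0))] * 3      # epilepsy case died
        + [((1, 0, 0), (0, 1, 0))] * 3    # one control died
        + [((1, 0, 0), (0, 0, 0))] * 10)  # no deaths (uninformative)
print(conditional_or(sets))
```

For 1:2 sets with a single death each, the estimate reduces to 2a/b, where a is the number of sets in which the case died and b the number in which a control died; the hypothetical data above therefore give an odds ratio of 2.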

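Table 4. Mortality rates for epilepsy patients and controls

| Model | 1 yr: Case | 1 yr: Matched control | 3 yr: Case | 3 yr: Matched control | 5 yr: Case | 5 yr: Matched control |
|---|---|---|---|---|---|---|
| Model 1 | 3.33 | 1.20 | 3.41 | 1.24 | 4.56 | 2.28 |
| Model 2 | 3.43 | 1.37 | 3.64 | 1.17 | 5.12 | 3.32 |
| Model 3 | 3.13 | 1.45 | 3.35 | 1.31 | 4.57 | 3.36 |
| Model 4 | 3.82 | 1.46 | 3.24 | 1.33 | 5.82 | 3.17 |

Data are presented as percentages.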
Table 5. Odds ratios for mortality among epilepsy cases and matchedᵃ controls

| Model | 1 yrᵇ OR | 1 yrᵇ 95% CI | 3 yrᶜ OR | 3 yrᶜ 95% CI | 5 yrᵈ OR | 5 yrᵈ 95% CI |
|---|---|---|---|---|---|---|
| Model 1 | 1.99 | 1.35–2.93 | 2.06 | 1.32–3.23 | 1.74 | 1.02–2.98 |
| Model 2 | 1.72 | 1.18–2.50 | 1.85 | 1.22–2.80 | 1.38 | 0.85–2.22 |
| Model 3 | 1.60 | 1.08–2.39 | 1.70 | 1.11–2.62 | 1.24 | 0.74–2.09 |
| Model 4 | 1.71 | 1.20–2.45 | 1.67 | 1.07–2.60 | 1.48 | 0.93–2.36 |

OR, odds ratio; CI, confidence interval.
ᵃ Matched on age category, sex, and ethnicity.
ᵇ Members continuously enrolled from July 1, 2000, to June 30, 2001.
ᶜ Members continuously enrolled from July 1, 1998, to June 30, 2001.
ᵈ Members continuously enrolled from July 1, 1996, to June 30, 2001.



The purpose of the final phase of this project was to apply sensitive and specific computer algorithms to estimate the prevalence and incidence of epilepsy in an MCO population. The algorithms were developed during the initial phases of the project (16) for use by other MCOs to define their epilepsy patient populations. This study yielded overall prevalence estimates of seven to 10 per 1,000, with higher estimates of 12 to 18 per 1,000 among patients aged 65+ years, consistent with the literature. These findings are similar to the results of other studies showing that approximately 1.5–2% of the elderly population is affected by epilepsy (3,7). This study also supports a consistent finding in the epidemiology literature that males are more likely than females to be affected by epilepsy (7). Although limited previous research has indicated that epilepsy is more common among Hispanics than non-Hispanics (6), this study generally found the opposite pattern, suggesting heterogeneity among populations classified as Hispanic.

A defined look-back period was used to identify new epilepsy cases, and incidence estimates were provided for members continuously enrolled for 3 and 5 years. The annualized incidence estimate for the 36-month cohort (47 per 100,000) was just below the range reported in the literature for 1-year incidence (50–100 per 100,000) (2). The annualized incidence rate for members enrolled for 60 months (71 per 100,000) fell within that range. As in other studies, we found that determining incidence in an MCO is challenging (8). Even when case definitions are well established, new cases are difficult to identify because MCO membership is in constant flux, with new members enrolling and others leaving the plan. Identifying new cases is also difficult because many patients who are new to the MCO are not new epilepsy cases; preexisting diagnoses are often missed when patients enter an MCO.

Limited research is currently available on the prevalence and incidence of epilepsy among patients receiving services in MCOs. Most studies have been conducted in broader community populations and have produced variable incidence and prevalence estimates, primarily because of diagnostic problems, differing case definitions, and methodologic variations (9). Similar issues arise in studies that rely on administrative data to quantify other medical conditions. One example is a study that developed and evaluated a method for ascertaining newly diagnosed breast cancer cases by using multiple sources and Medicare claims data (15). The current study parallels the breast cancer study by applying similar methodologic and analytic strategies to develop algorithms that quantify the prevalence and incidence of epilepsy in an MCO population. The different models allow an organization to use diagnosis codes alone or in combination with the other two types of administrative data available to a typical MCO or medical group (pharmacy codes and procedure codes), with correspondingly different sensitivity and specificity. An organization or researcher relying on a single data source (e.g., diagnosis codes) must be aware that the model's sensitivity and specificity will differ from those of a model using all three types of data; models built from fewer data sources are less sensitive. As for demographic information, age and sex are available for essentially any MCO population and can therefore always be included in the model.

This study also examined mortality among epilepsy patients in an MCO population. Two issues make epilepsy-related mortality rates difficult to calculate. The first is the constantly changing membership of the MCO; the effect of this turnover on mortality rates is unknown. The second concerns the definition of an epilepsy-related death. Such deaths may be caused by injuries or other related illnesses, and because the underlying cause of death is typically attributed to the most recent event rather than to epilepsy, epilepsy-related mortality may be underreported. For this study, we determined mortality by linking a list of prevalent cases and controls from the utilization database with state death-certificate records to obtain the underlying cause and date of death. The mortality results indicated a higher likelihood of death for the identified epilepsy cases than for the matched controls, consistent with the increased risk of mortality associated with epilepsy reported in the literature (3).

Most MCOs, including the MCO whose data were used in this study, do not routinely collect information on the race/ethnicity of their enrollees. Surrogates for ethnicity are therefore useful for making ethnic comparisons or examining ethnicity as a confounder. The GUESS software is primarily useful in the Southwest, where the main ethnic groups are Hispanic, non-Hispanic white, and Native American. It would not be useful for studying populations with a substantial proportion of groups whose ethnicity cannot be inferred from surname (e.g., African Americans). The epilepsy identification algorithm and associated models that are the central focus of this article, however, do not require race or ethnicity as variables. Thus MCOs, other health care providers, and researchers can use the algorithm effectively without direct or indirect information on member ethnicity.

The methods developed in this study may be useful in future research on epilepsy in other health care organizations and MCOs. The study used several models (algorithms) to estimate prevalence and incidence across different time frames within an MCO. This approach should give MCOs the flexibility to select the model best suited to their data systems, depending on the quality and types of claims and utilization data their organization captures. The development of the algorithms suggested that the mere presence of diagnostic codes for epilepsy or seizures (ICD-9-CM codes 345.xx and 780.3x) is insufficient for identifying cases of epilepsy in health care records, because these codes alone lack both high sensitivity and high positive predictive value (16). However, if multiple occurrences of such codes in a single patient record are considered, and if indicators of AED prescription are also included, sensitivity and positive predictive value improve considerably. Accordingly, model 2 showed a distinct improvement over model 1, whereas adding CPT procedure codes (primarily for EEG) or ICD-9-CM codes for psychiatric comorbidity, as in models 3 and 4, yielded only small additional improvements in the sensitivity and predictive value of case detection. Overall, the ability to select a specific model should allow other MCOs to define their existing patient populations accurately and thereby assist in allocating health care expenditures.

In the first two phases of this study, the sensitivity and specificity measured for models 1 and 2 indicated that they could correctly classify 88% and 91% of cases, respectively (16), suggesting that these models should perform well for estimating epilepsy prevalence and incidence. In application, as demonstrated in the third phase of this study, the models yielded credible prevalence estimates that are generally consistent with the findings of comparable studies of epilepsy prevalence. The estimates varied only modestly across study population members with 1, 3, and 5 years of continuous enrollment. Thus the models appear to meet the expectation that they can provide valid estimates of epilepsy prevalence. Applying our method to incidence, however, yielded less consistent estimates: although they were within the range of incidence rates found in other studies, estimates derived from members with 3 and 5 years of continuous enrollment differed substantially. Additional study of methods for applying these models may be needed to obtain more reliable estimates of epilepsy incidence.

In conclusion, these methods show promise for broader use in studying epilepsy occurrence in MCOs and other defined populations that have linked administrative data covering inpatient and outpatient services. A primary advantage of these methods is their comparatively low cost, because they rely on existing data. Furthermore, they may be useful for related research (e.g., studies of secular trends in epilepsy occurrence and of health care service delivery). Finally, they may assist in identifying and sampling cohorts of people with epilepsy for follow-up studies. Epidemiologic research and surveillance are important to assess the public health burden of epilepsy, to provide accurate information to assist in policy development, and to ensure necessary services for those with epilepsy. The methods described are useful tools for these purposes.



Acknowledgment:  Work on this project was supported by contract 2001-Q-000163 from the Centers for Disease Control and Prevention.

