Research Using Emergency Department–related Data Sets: Current Status and Future Directions
Dr. Hirshon was supported by National Heart, Lung, and Blood Institute Grant 5K08HL073849 and Dr. Andersen was supported by National Institute of General Medical Science Grant T32GM075767.
This work is the output from a consensus workshop conducted during the May 2009 Academic Emergency Medicine consensus conference in New Orleans, LA: “Public Health In the ED: Surveillance, Screening, and Intervention.”
Consensus conference session participants included Srikar Adhikari, Sabina Braithwaite, Kerry Broderick, Chris Buresh, John Finnell, William G. Fernandez, Toni Gross, Michael Handrigan, Nancy Holson, Jeffrey Hom, Yu-Hsiang Hsieh, Chris Kahn, Nancy Kerr, Patrick Chow-In Ko, Damon Kuehl, Rika Maeshiro, Priya Mammen, Mary Pat McKay, Douglas McLachlan, Lawrence Melniker, Ward Myers, Toby Nagurney, David J. Osban, Michael Radeos, Jennifer Setlik, Matthew Scholer, Ashley Sullivan, Carolyn Synovitz, and Adrian Tyndall.
Address for correspondence and reprints: Jon Mark Hirshon, MD, MPH; e-mail: email@example.com.
The 2009 Academic Emergency Medicine consensus conference focused on “Public Health in the ED: Surveillance, Screening and Intervention.” One conference breakout session discussed the significant research value of health-related data sets. This article represents the proceedings from that session, primarily focusing on emergency department (ED)-related data sets and includes examples of the use of a data set based on ED visits for research purposes. It discusses types of ED-related data sets available, highlights barriers to research use of ED-related data sets, and notes limitations of these data sets. The paper highlights future directions and challenges to using these important sources of data for research, including identification of five main needs related to enhancing the use of ED-related data sets. These are 1) electronic linkage of initial and follow-up ED visits and linkage of information about ED visits to other outcomes, including costs of care, while maintaining deidentification of the data; 2) timely data access with minimal barriers; 3) complete data collection for clinically relevant and/or historical data elements, such as the external cause-of-injury code; 4) easy access to data that can be parsed into smaller jurisdictions (such as states) for policy and/or research purposes, while maintaining confidentiality; and 5) linkages between health survey data and health claims data. ED-related data sets contain much data collected directly from health care facilities, individual patient records, and multiple other sources that have significant potential impact for studying and improving the health of individuals and the population.
As highlighted by the Institute of Medicine, understanding and strengthening the relationship between public health and medicine is vitally important to better address the health needs of the public.1,2 Evaluating, measuring, and monitoring the health of populations are primary objectives of public health research, in comparison to the traditional biomedical model that focuses on the evaluation and treatment of diseases in individuals.3,4 Emergency departments (EDs) play a vital role in our health care system, in part due to their key location at the interface between the populace and health care. They are well positioned to collect data concerning important aspects of public health, medical, and social problems. In many ways, EDs can be viewed as windows on the general health of their communities, allowing for better understanding of the health needs of the public and reasons for health care resource utilization.5,6 Constantly improving information technology has dramatically increased the availability of health-related information. The data collected from EDs and related systems are vital to the understanding that emergency physicians, researchers, and other health care professionals have concerning patterns of health resource utilization, the magnitude of various public health problems, and the identification of new and emerging health threats. Focused research activities using these data will help to guide more thoughtful and efficacious public health and medical interventions.
The 2009 Academic Emergency Medicine consensus conference focused on “Public Health in the ED: Surveillance, Screening and Intervention.” One session of this conference discussed the significant value, from a research perspective, of health-related data sets. This article represents the conference proceedings from that session, primarily focusing on ED-related data sets, and includes an example of the use of a data set, based on ED visits, for research purposes.
What are health- and ED-related data sets?
Research using data sets that provide information related to health or the utilization of health services, such as ED visits, is well known to epidemiologists and other public health professionals. This universe of health-related data sets is generally used by researchers to answer scientific and public health questions concerning populations. Health-related data sets are the main tool of many epidemiologists and health services researchers, and the breadth of available data sets is vast and growing. Health statistics and information on many health-related data sets are available at http://www.fedstats.gov. The recent National Report Card on the State of Emergency Medicine developed by the American College of Emergency Physicians used a broad range of health-related data sets to develop the 116 objective measures used in the analysis of the current emergency care environment.7 While there are many health-related data sets that are relevant for public health research and may have implications for emergency medicine, there are a smaller number of data sets that are focused on EDs or include specific questions relevant to the ED, which is the primary focus of this paper.
Health-related data sets that include information on ED use generally include traditional health services encounter data that are collected during the visit. Data sets generated from ED visits are frequently collected for administrative or billing purposes, such as administrative Medicare data collected by the Centers for Medicare and Medicaid Services. Alternatively, data on use of the ED may be collected specifically for research and planning purposes, such as the National Hospital Ambulatory Medical Care Survey (NHAMCS; http://www.cdc.gov/nchs/ahcd.htm). In addition, some general purpose health surveys, such as the National Health Interview Survey (http://www.cdc.gov/nchs/nhis.htm), collect some information on the ED use of survey respondents.
Many of the large-scale ED-related data sets discussed in this article are sponsored by federal and state governments. However, some private organizations have data useful for emergency care research. For instance, the American College of Surgeons collects trauma registry data from participating trauma centers and produces the National Trauma Data Bank (NTDB; http://www.facs.org/trauma/ntdb/index.html). Recently, the American College of Surgeons and the Centers for Disease Control and Prevention (CDC) supported the development of the National Sample Program (http://www.facs.org/trauma/ntdb/nsp.html), which is a national probability sample of 100 Level 1 and 2 trauma centers from the NTDB.
Syndromic surveillance systems are another example of data sets collected from ED visits. Syndromic surveillance systems have been developed to provide rapid electronic collection of health data. Most states have syndromic surveillance systems that utilize ED data.8–10 Syndromic surveillance systems typically consist of data collected on chief complaints of patients, combined with information on other acute health care activities such as medication purchases in a specific population, to offer real-time or near-real-time detection of outbreaks of infectious or bioterrorism-related diseases.8
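The chief-complaint monitoring described above can be illustrated with a minimal Python sketch. It is not the method of any particular state system: the keyword lists, the function names (`classify`, `daily_counts`, `is_alert`), and the mean-plus-two-standard-deviations alert rule are all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical keyword lists (assumption); operational systems use
# validated syndrome classifiers rather than simple substring matching.
SYNDROME_KEYWORDS = {
    "respiratory": ("cough", "shortness of breath", "sore throat"),
    "gastrointestinal": ("vomiting", "diarrhea", "nausea"),
}


def classify(chief_complaint):
    """Return the first syndrome whose keyword appears in the complaint, else None."""
    text = chief_complaint.lower()
    for syndrome, keywords in SYNDROME_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return syndrome
    return None


def daily_counts(visits, syndrome):
    """Count visits per day for one syndrome; visits is an iterable of (date, complaint)."""
    counts = Counter()
    for day, complaint in visits:
        if classify(complaint) == syndrome:
            counts[day] += 1
    return counts


def is_alert(baseline, today, z=2.0):
    """Flag today's count if it exceeds the baseline mean plus z sample standard deviations."""
    return today > mean(baseline) + z * stdev(baseline)


# Illustrative (made-up) counts for the previous seven days.
baseline_counts = [4, 5, 6, 5, 4, 6, 5]
print(is_alert(baseline_counts, 9))  # True
print(is_alert(baseline_counts, 6))  # False
```

A fixed threshold on a short baseline is the simplest possible rule; it ignores day-of-week and seasonal effects, which a real system would need to model.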
Types of data sets
ED-related data sets can be categorized by their data collection methods. The primary collection methodologies are 1) complete enumeration, 2) population-based and provider-based sample surveys, 3) non–population-based registries, and 4) linked data. Examples of ED-related data sets based on different collection methodologies are listed in Table 1. Each data set has strengths and limitations, in part based on data collection methodology. The discussion of these details is beyond the scope of this paper.
Table 1. Types of ED-related Research Data Sources

| Data set | Data source | Population covered | ED-related information |
| --- | --- | --- | --- |
| Complete enumeration | | | |
| Medicare claims and enrollment data | Medicare claims (bills) for persons with fee-for-service coverage | Medicare eligible 65+ as well as those with end-stage renal disease and some disabled | Data on ED visits can be found in the inpatient file (if they result in an admission), outpatient file, or, occasionally, in the (physician) carrier file |
| National Vital Statistics System, Mortality Data | State death registration | All deaths in U.S. | Place of death indicates if the death occurred in the ED or outpatient department |
| HCUP State Emergency Department Databases | ED encounter medical records | All ED encounters in a state | Diagnoses and procedures; charges and expected source of payment |
| HCUP State Inpatient Databases | Inpatient discharge medical records | All inpatients discharged in a state | Admission through the ED |
| Population-based and provider-based sample surveys | | | |
| National Health Interview Survey | Personal interviews | Persons in civilian households selected from a multistage probability design | Use of medical services including number of ED visits in the past year; demographic characteristics; health insurance status |
| Medical Expenditure Panel Survey (MEPS) | Five rounds of personal interviews conducted in two years that take place over a two and a half year period | Nationally representative subsample of households that participated in the prior year’s National Health Interview Survey | Use of medical services including the ED; costs of care and source of payments; health insurance coverage, income, and employment |
| National Electronic Injury Surveillance System–All Injury Program | ED encounter medical records | All injury-related ED encounters from a national probability sample of hospitals in the U.S. | Cause of injury seen in ED; circumstances of the injury; information on consumer products involved |
| National Hospital Ambulatory Medical Care Survey | Encounter forms completed by physicians or office staff | Sample of visits to EDs and outpatient visits from a national probability sample of hospitals in the U.S. | Characteristics of patient visits; characteristics of treatment; prescribing patterns |
| Drug Abuse Warning Network, ED component | Case report form completed after review of ED medical records | All drug-related visits from a national probability sample of hospital EDs | Information about drug-related ED visits including visits related to illegal drugs of abuse, prescription and over-the-counter medications, dietary supplements, nonpharmaceutical inhalants, and alcohol use |
| Non–population-based registries | | | |
| National Trauma Data Bank | Encounter data from trauma centers | Convenience sample based upon all visits from 600 U.S. trauma centers submitting data | Characteristics of patients entering trauma care; Abbreviated Injury Scale for trauma patients |
| Linked data | | | |
| National Health Interview Survey (NHIS) Linked Mortality Files | NHIS survey participants matched with the National Death Index | NHIS survey participants who are eligible for linkage | Data on deaths in the ED combined with detailed demographic characteristics; data on ED visits in the previous year and follow up on deaths |
| NHIS/Medicare Enrollment and Claims Data | NHIS survey participants matched with Medicare data | NHIS survey participants who provide information necessary for linkage | Data on ED visits paid for by Medicare can be linked to NHIS patient characteristics |
Importance of data sets
Health-related data offer researchers a rich opportunity to conduct studies in epidemiology, defined as the study of “how disease is distributed in populations and the factors that influence or determine this distribution.”11 Data sets generated from ED visits can be used to study the distribution of diseases and injuries treated in the ED. The data have been used to study health threats requiring immediate attention such as injuries from car crashes and to conduct surveillance on emerging public health threats such as influenza. ED data have also been used to investigate disparities in the utilization and provision of services among subpopulations. For instance, ED data have been used to investigate ethnic disparities in the initial management of trauma patients.12 Additionally, ED data can be linked to data from other sources such as inpatient data to answer important health and health care utilization questions.
Because EDs function as a critical societal health care and social safety net, data sets created from ED clinical and administrative records may hold great promise for understanding treatment patterns for the underserved, including those without insurance and other special subpopulations within the United States. An example of this is recent research based on data from NHAMCS that indicates that African Americans received fewer prescriptions for pain when seen in EDs compared to whites.13 As another example, studies using trauma registry data indicate that trauma patients are not a cross-section of the general or ED population, but are much more likely to have psychiatric conditions, including substance abuse and dependency disorders.14 Using this information, researchers and practitioners can provide more effective and efficient interventions by tailoring their programs for the patients in trauma care settings.
Health-related data sets play an important role in public health surveillance. This is especially true for ED-related data sets, because EDs are where the first victims of infectious disease outbreaks or other medical emergencies (e.g., natural disasters or biologic or chemical attacks) are most likely to present. Utilizing the capabilities of informatics systems to rapidly populate databases to alert front-line medical responders and public health authorities could be crucial in situations requiring immediate action to limit the spread of disease and decrease morbidity and mortality. With the current interest and concern related to potential pandemic influenza, such as novel H1N1, EDs are well positioned to provide near-real-time surveillance of influenza cases. Data collected from multiple data sources, such as ED visits and medication usage patterns from pharmacies, can be used to monitor the health of the population in emergency situations.
Research potential for data sets
Just as the types of ED-related data sets available are multiple and varied, so is their research potential. Research using all types of ED-related data sets can serve both inductive and deductive pursuits through hypothesis generation and hypothesis testing, respectively. The latter use is especially powerful for clinicians or health care workers with research proclivity who notice interesting trends or cases in their clinical practice that may be indicative of a greater public health problem. In particular, data sets related to ED visits can be used to research injury or disease patterns, to identify risk factors in populations, or to tailor interventions for specific health problems. There are numerous additional research applications including, as examples, analysis of costs, quality of care, treatment disparities, and trends in ED utilization or treatment of diseases, injuries, or illnesses.
Many studies have used data to identify populations presenting with particular illnesses or injuries to develop targeted interventions.15–19 A recent study of ED visits for chronic obstructive pulmonary disease exacerbations used the ED component of the NHAMCS data set and reported that patients in the southern United States were less likely to receive systemic corticosteroids.17 Using NHAMCS data, another study found that sports and recreational activities were the leading external causes of pediatric injury among children visiting EDs.20
As noted above, syndromic surveillance systems allow for the timely analysis of ED chief complaint data, frequently on a daily basis, and are currently used for bioterrorism and infectious threat detection. Although these data collection mechanisms may be useful in nonsyndromic research, the application has not yet been fully explored.21
Costs of analysis of data sets for research
The costs of using ED-related data sets vary depending on the source of the data and whether the data are already compiled and collated in a usable format. Establishing health-related data sets based on ED visits can be expensive, but is extremely important considering the approximately 119.2 million ED visits in the United States in 2006.22 Once established, the cost for ED-related data sets varies depending on the data set utilized. Financial resources required for research based on ED data sets could include the charges associated with obtaining the data, statistical programs for data analysis, and the personnel to analyze the data. Many of the data sets are free (e.g., NHAMCS), while others require a nominal fee (e.g., Healthcare Cost & Utilization Project [HCUP]). However, some may be expensive, such as ED-related data sets purchased from private sector vendors. Funds may be required to analyze confidential data at a research data center.
Without the use of existing ED-related data sets, many investigators would be limited to research pursuits based on primary data collection. Even for investigators who would be able to get funding, the time frame between protocol development and the publication of results can be exceedingly long. Existing health-related data sets can provide preliminary data for grant development or for initial exploration of health-related questions at a significantly reduced cost relative to the cost of primary data collection.
Barriers to use of data sets
There are numerous reasons for not using currently available ED-related data sets. These include, but are not limited to, concerns regarding the completeness, accuracy, and timeliness of the data, along with the ability to generalize (if the data set is a convenience sample). As noted above, many data sets require statistical analysis packages to analyze the data, which may be a financial and educational barrier for some individuals. However, online data analysis tools are increasingly available, reducing the need for sophisticated data analysis software. The Web-based Injury Statistics Query and Reporting System (http://webappa.cdc.gov/sasweb/ncipc/mortrate.html) is an online statistical analysis tool to analyze fatal injury data from the National Vital Statistics System and nonfatal injury data from the National Electronic Injury Surveillance System–All Injury Program. Additionally, HCUPnet (http://hcupnet.ahrq.gov/) provides a free online statistical analysis tool for queries of all hospital stays, including mental health hospitalizations, and, for some states, of all ED visits.
Institutional review board (IRB) approval is required for any research project involving human subjects, according to the Code of Federal Regulations (Part 46, Protection of Human Subjects, 46.101).23 When health-related data sets contain publicly available deidentified data, IRB approval is generally less problematic. However, as the requested data become more specific and the potential for identification of an individual increases, the scrutiny by an IRB will also increase.
Current research with an ED-related data set example: National Hospital Ambulatory Medical Care Survey
NHAMCS, conducted annually by the CDC’s National Center for Health Statistics (NCHS), is an example of a nationally representative health services data set. Data are collected on hospital ED and outpatient department visits, as well as hospital and ED characteristics. Yearly reports are published for both types of ambulatory care facilities.22 NHAMCS uses a complex multistage sampling strategy, beginning with 112 primary sampling units (PSUs), which generally correspond to counties or other similar jurisdictions. Within each PSU, a sample of nonfederal short-stay general hospitals is stratified into four classes based on whether the hospital has an ED only, an outpatient department only, both, or neither. Finally, for the ED component, a sample of visits is selected from each emergency service area over a random 4-week reporting period. Each hospital, ED, and patient visit is weighted according to its inverse probability of having been selected. The somewhat complex weighting process includes an adjustment for nonresponse and is important for the results to be nationally representative. Hospital-, ED-, or patient visit–level research can be conducted using NHAMCS. An example of a hospital-level study is a national report on bioterrorism response planning.24
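The inverse-probability weighting described above can be sketched in a few lines of Python. This is an illustration of the general technique only, not the NCHS weighting procedure: the selection probabilities, the record layout, and the function names (`visit_weight`, `weighted_total`, `weighted_proportion`) are hypothetical, and public use files already supply a precomputed weight for each visit.

```python
def visit_weight(p_psu, p_hospital, p_visit, nonresponse_adj=1.0):
    """Inverse of the overall selection probability, scaled by a nonresponse adjustment.

    The three stage probabilities are hypothetical inputs (assumption).
    """
    return nonresponse_adj / (p_psu * p_hospital * p_visit)


def weighted_total(records, flag):
    """Estimated national number of visits for which `flag` is true."""
    return sum(r["weight"] for r in records if r[flag])


def weighted_proportion(records, flag):
    """Estimated national proportion of visits for which `flag` is true."""
    total_weight = sum(r["weight"] for r in records)
    return weighted_total(records, flag) / total_weight


# Three made-up sampled visits: each one represents `weight` visits nationally.
records = [
    {"weight": 1000, "blood_culture": True},
    {"weight": 3000, "blood_culture": False},
    {"weight": 1000, "blood_culture": True},
]
print(weighted_total(records, "blood_culture"))       # 2000
print(weighted_proportion(records, "blood_culture"))  # 0.4 (unweighted would be 2/3)
```

The sketch gives point estimates only. Standard errors from a multistage sample require design-aware variance estimation, which is not shown here.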
Data are made available to researchers through public use files released yearly through the NCHS Research Data Center (http://www.cdc.gov/nchs/r&d/rdc.htm). The survey instrument with variables collected is available online (http://www.cdc.gov/nchs/data/ahcd/nhamcs100ed_2009.pdf). To provide oversight, the survey undergoes yearly review and approval at NCHS through its ethics review board, which serves as the IRB for these surveys. Data files undergo rigorous disclosure review and information is not available that could identify either the hospital or an individual patient.
An analysis using NHAMCS provides an example of the research potential of this data set.25 The authors were interested in studying the epidemiology of blood culture use in EDs in light of published guidelines, associated factors, and time trends. Because a check box for blood cultures on the patient record form was implemented in 2001, the time period chosen was 2001 through 2004. The frequencies of blood cultures over all ED visits and over visit subsets of more relevant interest were generated. Through statistical analyses, the authors were able to show an increasing national trend in blood culture use and that a large proportion of those cultures were for patients without clinical indicators of bacteremia and for patients who were not admitted to the hospital.
Limitations of data sets
Health- and ED-related data sets may be limited by several factors, examples of which include variable data quality, timeliness, and representativeness. Each data set has its own limitations based on a number of factors, such as the method by which the data are collected, the completeness of the data, and the structure and format of the data set. The limitations of the data must be considered prior to analyzing the data, because the limitations will impact both the accuracy and the reliability of the results.
Some ED data sets rely on administrative data using either medical records or claims data. Data from these sources are frequently limited to basic demographic, diagnosis, and health outcome information at the time of discharge. The quality of the data, even within an institution, can vary depending on operational pressures, and the data may have missing and unknown values. For instance, the quality of information on race and ethnicity may be poor, as it may be collected by visual inspection rather than by asking the patient, or may not be recorded in the chart. Classification errors are particularly problematic if the misclassification is differential. For instance, if race is missing for younger ages and not for older persons, analysis based on race will be more accurate for older persons than for younger.
Lack of consistency in coding and classification of diseases and injury may also be a limitation of some health-related data sets. In some instances, this is a reflection of the quality of data in the administrative record, but may also be related to the screening and diagnostic technologies available at a particular facility.
For injury research, external cause-of-injury codes (E-codes), which are used to classify injury incidents by intent (e.g., unintentional, homicide/assault, suicide/self-harm, or undetermined) and mechanism (e.g., motor vehicle, fall, struck by/against, firearm, or poisoning), are incomplete in the majority of state inpatient and ED databases.26 The E-codes are critical to understanding the causes and patterns of injury, as well as for evaluating prevention programs. A recent CDC report recommended strategies to improve data collection, coding, quality assurance practices, analysis, reporting, and dissemination of E-coded data, which are currently being implemented.26
Timeliness is frequently a concern for ED-related data sets regardless of data source. For example, many governmental data sets based on surveys or administrative data are made publicly available several years after the data are collected. The time to collect and process the data, particularly for nationally representative samples, can hinder the ability of researchers to see emerging trends. The cost of creating and maintaining ED-related data sets is frequently quite high. Fortunately, the initial costs should be incurred only once and may be outweighed by increases in patient safety and improved outcomes.
Specification of a denominator for calculating rates may be a problem with ED-related data sets, particularly if they are health care provider–based. Clearly defining catchment areas may be difficult because usage patterns vary and because the population using a particular facility or group of facilities may differ from the population at large. In addition, ED-related data sets are commonly based on visits, as opposed to individual patients, which may be of concern because a single patient may make multiple visits. Because the data are deidentified, it is often impossible to identify the first visit for a condition. Population-based and household surveys, as well as Medicare data, do not have this limitation. Additionally, some data sets are convenience samples, which will pose a threat to external validity, especially when attempting to extrapolate the data nationally. This includes the National Trauma Data Bank, which is currently the largest trauma registry data set available. However, while these injury data would produce inaccurate national estimates, they may be valuable for planning, implementing, and evaluating the effectiveness of injury prevention programs.26
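The difference between visit-based and patient-based counts can be made concrete with a short Python sketch. It assumes a data set in which each visit carries a stable patient identifier, which, as noted above, deidentified visit files often lack. The identifiers, dates, catchment population, and function names are all made up for illustration.

```python
from collections import defaultdict


def visits_per_patient(visits):
    """Count visits per patient; visits is an iterable of (patient_id, iso_date)."""
    counts = defaultdict(int)
    for patient_id, _ in visits:
        counts[patient_id] += 1
    return dict(counts)


def first_visits(visits):
    """Earliest visit date per patient (ISO date strings sort chronologically)."""
    first = {}
    for patient_id, date in visits:
        if patient_id not in first or date < first[patient_id]:
            first[patient_id] = date
    return first


def rate_per_1000(numerator, population):
    """Rate per 1,000 persons for a given denominator population."""
    return 1000 * numerator / population


# Hypothetical visits: patient "A" accounts for three of the four visits.
visits = [
    ("A", "2009-01-03"),
    ("B", "2009-01-05"),
    ("A", "2009-02-10"),
    ("A", "2009-01-01"),
]
catchment_population = 2000  # assumed denominator; often hard to define in practice

per_patient = visits_per_patient(visits)
print(rate_per_1000(len(visits), catchment_population))       # 2.0 visits per 1,000
print(rate_per_1000(len(per_patient), catchment_population))  # 1.0 patients per 1,000
```

The same numerator data yield a visit rate twice the patient rate here, which is why the unit of analysis needs to be stated whenever rates are reported.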
Future directions and challenges
The rapid growth of the electronic medical record opens numerous opportunities for the collection of data from health care providers. First, it may allow for electronic capture of many variables currently manually extracted. This may improve the quality and accuracy of data collection while decreasing the cost. Additionally, the speed of data extraction should improve. However, as the data become more location- and time-specific, there may be additional challenges with maintaining confidentiality.
The move to Internet-based statistical analysis tools, like Web-based Injury Statistics Query & Reporting System (WISQARS) and HCUPnet, may increase the ease of analyzing these databases and therefore increase their use for research. Finally, as electronic data collection evolves, there may be opportunities to link multiple ED visits electronically to see patterns over time, while maintaining data confidentiality.
As an outgrowth of the consensus conference, five main needs related to enhancing the use of ED-related data sets were identified. These were:
1. For applicable data sets, electronic linkage of initial and follow-up ED visits and linkage of information about ED visits to other outcomes, including costs of care, while maintaining deidentification of the data.
2. Timely data access with minimal barriers.
3. Complete data collection for clinically relevant and/or historical data elements, such as the external cause-of-injury codes.
4. Easy access to data that can be parsed into smaller jurisdictions (such as states) for policy and/or research purposes, while maintaining confidentiality.
5. Linkages between health survey data and health claims data.
Research questions related to the above identified needs include:
1. How can the risk of disclosure for individuals included in ED-related data sets be reduced, especially considering that linked data increases the quantity and specificity of information available at the individual level?
2. How will the data be managed and who will be the data stewards once the data are linked, considering that the owners of the data that are to be linked are often different entities, sometimes even from public and private sources?
3. How can the appropriate balance between increased timeliness of data and data quality be determined, since processing raw data and running error checking can be time-consuming, but leads to better data quality?
4. How can the recording of pertinent data elements, such as mechanism of injury, be improved in the clinical record, considering the current pressures that health care providers face?
5. How can methods be developed to parse national data into smaller areas, such as states, that would allow for stable estimates, while maintaining confidentiality?
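One common family of approaches to the first and last questions is to suppress small cells and to pool several years of data before releasing area-level counts. The Python sketch below illustrates that idea only; it is not a method proposed at the conference. The threshold of 10, the state labels, the counts, and the function names are all assumptions.

```python
def pool(counts_by_year):
    """Sum counts across years for each area to obtain larger, more stable cells."""
    pooled = {}
    for year_counts in counts_by_year:
        for area, n in year_counts.items():
            pooled[area] = pooled.get(area, 0) + n
    return pooled


def suppress_small_cells(counts, threshold=10):
    """Replace counts below the threshold with None so they are not released.

    The threshold is an assumed value; data stewards set their own rules.
    """
    return {area: (n if n >= threshold else None) for area, n in counts.items()}


# Made-up counts of a rare ED visit type in two hypothetical states.
year_1 = {"State A": 120, "State B": 4}
year_2 = {"State A": 130, "State B": 7}

print(suppress_small_cells(year_1))                  # State B is suppressed
print(suppress_small_cells(pool([year_1, year_2])))  # State B (11) can be released
```

Pooling trades timeliness for stability and confidentiality, and this sketch does not address complementary suppression, where a hidden cell could be recovered by subtraction from published totals.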
ED-related data sets contain much data collected directly from health care facilities, individual patient records, and multiple other sources that have significant potential impact for studying and improving the health of individuals and the population. These data sources have been used for multiple epidemiologic, health services utilization, and other research studies and have significant potential for use by emergency medicine researchers in the future.