Pharmacoepidemiology and Drug Safety

The U.S. Food and Drug Administration's Mini-Sentinel program: status and direction


R. Platt, Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA.


The Mini-Sentinel is a pilot program that is developing methods, tools, resources, policies, and procedures to facilitate the use of routinely collected electronic healthcare data to perform active surveillance of the safety of marketed medical products, including drugs, biologics, and medical devices. The U.S. Food and Drug Administration (FDA) initiated the program in 2009 as part of its Sentinel Initiative, in response to a Congressional mandate in the FDA Amendments Act of 2007.

After two years, Mini-Sentinel includes 31 academic and private organizations. It has developed policies, procedures, and technical specifications for developing and operating a secure distributed data system composed of separate data sets that conform to a common data model covering enrollment, demographics, encounters, diagnoses, procedures, and ambulatory dispensing of prescription drugs. The distributed data sets currently include administrative and claims data from 2000 to 2011 for over 300 million person-years, 2.4 billion encounters, 38 million inpatient hospitalizations, and 2.9 billion dispensings. Selected laboratory results and vital signs data recorded after 2005 are also available. There is an active data quality assessment and characterization program, and eligibility for medical care and pharmacy benefits is known. Systematic reviews of the literature have assessed the ability of administrative data to identify health outcomes of interest, and procedures have been developed and tested to obtain, abstract, and adjudicate full-text medical records to validate coded diagnoses. Mini-Sentinel has also created a taxonomy of study designs and analytical approaches for many commonly occurring situations, and it is developing new statistical and epidemiologic methods to address certain gaps in analytic capabilities.

Assessments are performed by distributing computer programs that are executed locally by each data partner. The system is in active use by FDA, with the majority of assessments performed using customizable, reusable queries (programs). Prospective and retrospective assessments that use customized protocols are conducted as well. To date, several hundred unique programs have been distributed and executed.

Current activities include active surveillance of several drugs and vaccines, expansion of the population, enhancement of the common data model to include additional types of data from electronic health records and registries, development of new methodologic capabilities, and assessment of methods to identify and validate additional health outcomes of interest. Copyright © 2012 John Wiley & Sons, Ltd.


Mini-Sentinel is a collaboration between the U.S. Food and Drug Administration (FDA), 31 academic and private organizations, and hundreds of scientists to develop the capability to use routinely collected electronic healthcare data to perform active surveillance of the safety of marketed medical products, including drugs, biologics, and medical devices. FDA initiated the program in 2009 as part of its Sentinel Initiative. The Initiative is a response to a Congressional mandate in the FDA Amendments Act of 2007 to perform active surveillance of the safety of approved drugs through use of routinely collected electronic health information resulting from the care of at least 100 million people.[1, 2] The Mini-Sentinel is a pilot program charged with developing the framework, data resources, analytic capabilities, policies, and procedures to satisfy this mandate. In this article, we provide an overview of the Mini-Sentinel's status and direction. Additional information is available in a series of articles describing specific activities[3] and on the Mini-Sentinel website.


The FDA's vision is creation of a system that can use routinely collected electronic health information to support active surveillance of approved medical products, including drugs, biologics, and medical devices, in near real time.[4] Such a system will augment, but not replace, other means of surveillance, including examination of spontaneously reported adverse events. Achieving this vision requires development of a methodologic framework to guide safety surveillance assessments, and creation of the ability to rapidly define cohorts of individuals exposed to medical products of interest, to capture specific health outcomes, and to perform a core set of assessments using customizable computer programs. FDA is committed to achieving this vision through the use of distributed data methods, that is, without creating a centralized data repository.

Mini-Sentinel's mission is to create a “laboratory” that develops and evaluates policies and procedures, organizational structures, and scientific methods that might later be used in a fully operational Sentinel System.[5] Mini-Sentinel activities will thus offer the FDA the opportunity to evaluate safety issues in existing automated health care data systems and to learn more about barriers and challenges to real-time active surveillance using electronic healthcare data.

The initial focus of Mini-Sentinel is on signal refinement, which is the assessment of predefined exposure-outcome pairs to determine whether there is evidence of an association. As shown in Figure 1, signal refinement is the second of three steps that begin with signal generation. The exposure-outcome pairs assessed during signal refinement may be identified through signal generation activities using automated data, from the product's clinical development program, through prior knowledge about the product in question or similar products, via spontaneously reported adverse events, or from other sources. Mini-Sentinel is also working on signal generation methods, although this is not a major focus at present.

Figure 1.

Stages of postmarket active medical product safety surveillance. The Mini-Sentinel's principal focus is on signal refinement

Mini-Sentinel's signal refinement activities will ordinarily comprise either rapid one-time assessment of the accumulated experience of a product, or prospective repeated (sequential) monitoring of data as it accumulates. In either case, the emphasis of signal refinement is on speed and the use, as much as possible, of standardized methodologic approaches and tools. Signal evaluation, the third step in active surveillance, continues the work of signal refinement, focusing on assessing whether an association is likely to be causal, and addressing questions such as dose-response, duration-response, and inter-individual variability in risk. There is some overlap between the activities of signal refinement and signal evaluation, with the latter typically depending more heavily on customized, in-depth, study-specific protocols. Signal evaluation is not currently a focus of the Mini-Sentinel's activities.

Another Mini-Sentinel activity is rapid assessment of the impact of FDA's regulatory activities. The goal of such assessment is to evaluate the impact of new regulation, such as a new boxed warning, on both prescribing and health outcomes.

Mini-Sentinel's current activities thus include these domains: (i) developing a consortium of data partners and other content experts, (ii) developing policies and procedures, (iii) creating a distributed data system with access to electronic healthcare data and full-text medical records, (iv) developing secure communications capabilities, (v) evaluating extant methods in safety science and developing new epidemiological and statistical methods as needed, (vi) evaluating FDA-identified medical product-adverse event pairs of concern, and (vii) assessing the impact of selected FDA regulatory actions.


Mini-Sentinel has developed policies to govern its work.[6] A foundational policy classifies the work of the Mini-Sentinel as public health practice, not research, from the perspective of both the Common Rule that governs research involving human subjects and the Health Insurance Portability and Accountability Act (HIPAA). This classification is the result of determinations by the Department of Health and Human Services' Office for Human Research Protections, with regard to interpretation of the Common Rule, and by FDA, with regard to HIPAA. As a matter of policy, Mini-Sentinel minimizes the transfer of protected health information and proprietary data. The use of a distributed data system plays a central role in implementation of this policy. An independent panel of experts in patient privacy assessed the Mini-Sentinel's policies regarding the use of healthcare information.[7]

Additional policies govern the data partners' participation.[8] Key provisions include their status as full partners in the development and implementation of scientific protocols and in interpretation of results, their ability to choose whether or not they participate in specific activities, and their right to use for other purposes their own data that they have transformed into the Mini-Sentinel's common data model format. Mini-Sentinel policies also commit FDA and the investigators to making publicly available the program's policies, tools, methods, protocols, computer programs, and scientific findings. They also address the handling of non-public and confidential information, and conflict of interest.


The Mini-Sentinel's principal data resource is a distributed data system composed of information held by each data partner. Each data partner retains physical and operational control over its own data. This organizational structure has several advantages. It satisfies FDA's requirement that the Mini-Sentinel not establish a centralized data repository, which might raise public concern about potential misuse of confidential medical data. The distributed design avoids the need to create, secure, maintain, and manage access to a complex central data warehouse. It also avoids data partners' concerns about sharing both individuals' confidential information and their own proprietary data. Additionally, it ensures that local content experts maintain a close relationship with the data. This relationship is important because data partners have the best understanding of their data and its uses; their input has been critical to valid use and interpretation of the data, even after its transformation into a common format. Differences in the delivery of care and in coding practices between health plans, and within health plans over time, are typically undocumented and difficult to infer from data inspection alone. This information is typically only available to individuals with detailed knowledge of a health plan's or practice's operations.

The distributed data system requires each data partner to transform its data into a common data model, using a standard format and pre-specified definitions. This transformation in advance of use confers two major operational advantages. First, it allows extensive quality assurance evaluation to assess the completeness of the data and to identify and remediate many data quality problems before the data are used to address medical product safety questions. Second, the common data model allows assessments to be performed with computer programs that are distributed and then executed without site-specific modification. The use of distributed programs makes highly efficient use of programmer effort and eliminates the potential for protocols to be implemented differently in different systems.

The common data model comprises separate tables, each of which contains a specific type of data. This structure is intended to allow the model to evolve to accommodate FDA's needs and the availability of additional data types.[9] The model currently focuses on administrative and claims data. The data areas it encompasses include enrollment, demographics, outpatient pharmacy dispensing, utilization (encounters, diagnoses, procedures), and mortality (death and cause of death). The model also incorporates clinical data, including vital signs, smoking status, and results of ten priority laboratory tests recorded since 2005.
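As an illustration of this table-per-data-type structure, a relational rendering of two such tables might look like the sketch below. The table and column names here are invented for illustration; the actual Mini-Sentinel common data model defines its own names and formats.

```python
import sqlite3

# Hypothetical rendering of two common-data-model tables; names and
# formats are illustrative, not the Mini-Sentinel specification.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enrollment (
    person_id TEXT NOT NULL,   -- site-specific pseudonymous identifier
    enr_start TEXT NOT NULL,   -- ISO dates delimiting eligible person-time
    enr_end   TEXT NOT NULL,
    drug_cov  TEXT NOT NULL    -- 'Y'/'N': pharmacy benefit eligibility
);
CREATE TABLE dispensing (
    person_id   TEXT NOT NULL,
    rx_date     TEXT NOT NULL, -- dispensing date
    drug_code   TEXT NOT NULL, -- coded identity of the product dispensed
    days_supply INTEGER
);
""")

# Each data partner loads its own data into this shared structure, so a
# single distributed program can run unmodified at every site.
conn.execute("INSERT INTO enrollment VALUES ('p1', '2009-01-01', '2011-06-30', 'Y')")
conn.execute("INSERT INTO dispensing VALUES ('p1', '2010-03-15', 'D001', 30)")

n = conn.execute("SELECT COUNT(*) FROM dispensing WHERE days_supply >= 30").fetchone()[0]
print(n)  # 1
```

Because every site exposes identical table and column names, quality-assurance checks and assessment programs can be written once and distributed to all partners.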

As of July 2011, the distributed dataset contained quality-checked data held by 17 partner organizations. The data covered nearly 100 million individuals (individuals who belonged to more than one participating health plan during the past several years are counted in each plan) for whom there is well-defined eligible person-time during which medically attended events are known. There were over 300 million person-years of observation time, 2.4 billion unique encounters including 38 million acute inpatient stays, and 2.9 billion dispensings of prescription drugs. The dataset is refreshed periodically. The development, content, and use of the distributed dataset are described in more detail by Curtis et al.[10] Special considerations for assessment of the safety of vaccines, such as linkage to state immunization registries, are described separately.[11] The Mini-Sentinel's vaccine-related activities are collectively named the Post-licensure Rapid Immunization Safety Monitoring (PRISM) system. PRISM was initiated as a separate single-purpose program to evaluate the safety of the H1N1 influenza vaccine; it was then incorporated into the Mini-Sentinel to continue surveillance of influenza and other vaccines.

Data queries (programs) are distributed and returned via a secure portal, as shown in Figure 2. Mini-Sentinel uses three types of queries. It uses a menu-driven query generator for simple questions, such as determining the number of exposures to specific products or the number and age/sex distribution of individuals with a diagnosis or procedure of interest.[12] These queries run against pre-compiled summary tables, thus avoiding the computational overhead involved in analyzing the full distributed dataset. The data partners can also be confident that the queries do not request sensitive information as the tables do not contain personally identifiable information.

Figure 2.

Querying the Mini-Sentinel distributed database. Each query involves six steps: 1) A query (program) is created and then posted by an authorized user on the secure portal. 2) Data partners are notified and retrieve the query. 3) Data partners review the query and execute it against their local data. 4) Data partners review the results, which are typically counts, e.g., number of exposed individuals, amount of exposed person-time, number of individuals with outcomes of interest. 5) Data partners submit their results using the secure portal. 6) The results are reviewed and then combined with other data partners' results
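The workflow in Figure 2 can be sketched in miniature: each site executes the same query against its local data and returns only aggregate counts, which the coordinating center then combines. This is a simplified illustration with invented data and function names, not the actual portal software.

```python
# Simplified sketch of a distributed query: sites return only aggregate
# counts, never row-level data, and the coordinating center sums them.
# All records and names here are invented for illustration.

def run_query_locally(site_records, drug_code):
    """Step 3: a site executes the distributed query on its own data."""
    exposed = [r for r in site_records if r["drug"] == drug_code]
    # Steps 4-5: the site reviews and returns aggregate results only.
    return {"n_exposed": len(exposed),
            "n_outcomes": sum(1 for r in exposed if r["outcome"])}

site_a = [{"drug": "X", "outcome": False}, {"drug": "X", "outcome": True}]
site_b = [{"drug": "X", "outcome": False}, {"drug": "Y", "outcome": True}]

# Step 6: combine the per-site summaries at the coordinating center.
results = [run_query_locally(s, "X") for s in (site_a, site_b)]
combined = {k: sum(r[k] for r in results) for k in results[0]}
print(combined)  # {'n_exposed': 3, 'n_outcomes': 1}
```

The key property, reflected in Mini-Sentinel policy, is that only summary counts cross site boundaries; individual-level data never leave a data partner.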

For more complex types of recurring queries, Mini-Sentinel uses customizable, reusable (modular) programs.[9] These programs execute against the data partners' full datasets. An example is a program that identifies cohorts of new users of specific products and determines the number of dispensings, the amount of exposed person-time, and the number of outcomes of interest observed during exposed time. These reusable programs allow users to specify parameters such as inclusion and exclusion criteria and the new-user and outcome definitions. They carry several operational advantages: they are extensively vetted to ensure that they perform the desired assessments, and they execute efficiently in the data partners' diverse computing environments. Because the program logic is pre-approved by the data partners, the output generated requires minimal evaluation before release. These programs produce counts, and in some cases rates, for specified age, sex, and calendar-time strata, but do not currently adjust for confounding factors. The third type of query involves custom programs that perform assessments beyond the scope of existing modular programs. These are typically used to support prospective surveillance protocols, which may have unique needs. Mini-Sentinel attempts to capture the novel programming performed for these studies and make it available through a program library or by incorporating it into a new modular program.
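A core piece of such a modular program is the parameterized new-user definition, commonly operationalized as a dispensing preceded by a washout window with no use of the same product. The sketch below illustrates that idea only; the parameter name and washout logic are assumptions, not the actual Mini-Sentinel module.

```python
from datetime import date

def new_use_dates(dispensing_dates, washout_days=183):
    """Return the dates on which a person qualifies as a 'new user':
    a dispensing with no dispensing of the same product within the
    preceding washout window. Illustrative logic only; the washout
    length is a caller-specified parameter, as in a modular program."""
    starts = []
    prev = None
    for d in sorted(dispensing_dates):
        if prev is None or (d - prev).days > washout_days:
            starts.append(d)  # start of a new treatment episode
        prev = d
    return starts

rx = [date(2009, 1, 10), date(2009, 2, 9),  # 30-day gap: same episode
      date(2010, 6, 1)]                     # >183-day gap: new episode
print(new_use_dates(rx))  # two new-use dates: 2009-01-10 and 2010-06-01
```

In a real assessment the washout requirement would also be checked against continuous enrollment, so that apparent non-use is not simply absent data.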


Mini-Sentinel investigators have developed a taxonomy of study designs to guide the development of active surveillance protocols and also of new modular programs.[13] This taxonomy considers various combinations of exposure attributes (e.g., acute, chronic), outcome attributes (e.g., rare, common), and characteristics of the exposure-outcome relationship, with the intent of expediting the choice of study design aspects for a wide range of exposures and outcomes. The taxonomy continues to evolve to include considerations of analytic strategy and conditions specific to assessment of adverse reactions to vaccines.

Substantial effort has also been devoted to clarifying the applicability of semi-automated methods for control of confounding in cohort designs, such as the high-dimensional propensity score,[14] and to providing guidance regarding the strengths, limitations, and practicability of case-only methods.[15] Mini-Sentinel investigators also tested a multivariable adjusted self-controlled case series and conducted statistical simulation studies on aspects of semi-automated covariate identification and selection strategies.[16, 17]

Because a substantial portion of the Mini-Sentinel portfolio will involve prospective repeated (sequential) assessment of accumulating data for specific exposures and outcomes, Mini-Sentinel investigators have begun to explore the challenges associated with applying sequential designs in observational safety surveillance settings.[18] To date, sequential testing methods have primarily been used in randomized clinical trials. Although their application in observational contexts like Mini-Sentinel is promising, several issues that are generally not of concern (or are of much smaller magnitude) in trials complicate matters. These include (i) lack of experimental control, which can yield confounding, unpredictable new-user accrual rates and composition over time, missing data, and misclassification; (ii) heterogeneous sites contributing data in a distributed environment that prevents individual-level data pooling and thus constrains analytic options; and (iii) the rarity of many of the safety outcomes evaluated, which introduces instability and may require small-sample testing strategies. In addition, the scientific and regulatory aims for postmarket safety, which inherently affect key sequential design decisions such as the frequency of interim testing, differ from those of premarket trials and require additional consideration.

Recognizing the need for better ability to choose among different approaches to sequential assessment in observational safety surveillance settings, Cook and colleagues performed simulations to compare the performance of four methods, each of which uses a different confounder adjustment strategy: the Lan-DeMets group sequential error-spending approach, a group sequential likelihood ratio test, the conditional sequential sampling procedure, and a group sequential generalized estimating equations approach.[19] The simulations evaluated type I error rate, power, and time to signal detection under varying assumptions about outcome prevalence, exposure, and confounder complexity.
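The first of these approaches, error spending, partitions the overall type I error across interim looks according to a spending function. As a worked illustration (the choice of the widely used O'Brien-Fleming-type spending function is an assumption here, not taken from the cited simulations), the cumulative alpha spent by information fraction t is 2(1 − Φ(z(α/2)/√t)):

```python
import math

def phi(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def obf_alpha_spent(t):
    """Cumulative two-sided type I error spent by information fraction t
    (0 < t <= 1) under an O'Brien-Fleming-type Lan-DeMets spending
    function with overall alpha = 0.05. Illustrative choice of spending
    function; other functions allocate alpha differently."""
    z = 1.959964  # two-sided critical value for alpha = 0.05
    return 2.0 * (1.0 - phi(z / math.sqrt(t)))

for t in (0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f}  cumulative alpha spent = {obf_alpha_spent(t):.5f}")
# Spending is heavily back-loaded: almost no alpha is spent at early
# looks, and the full 0.05 is reached only at the final analysis.
```

Back-loaded spending is attractive for safety surveillance because early looks, based on sparse data, are held to stringent thresholds, while the final analysis retains nearly the full alpha.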


Mini-Sentinel investigators have devoted considerable effort to understanding the state of knowledge regarding the use of administrative data to identify the health outcomes of greatest interest as endpoints for safety assessments of medical products, and the validity of current methods to identify those outcomes. In collaboration with FDA, investigators identified the 20 highest-priority outcomes from a candidate list of 140 outcomes for which there had been no recent review. Investigators then performed systematic reviews of these 20 conditions, drawing on protocols that had been developed by the Observational Medical Outcomes Partnership.[20] The methods for conducting these reviews have been summarized by Carnahan and Moores, along with lessons learned about the strengths and limitations of the review process.[21] A high-level classification of the findings of these reviews is provided in Table 1. Carnahan and Moores also identified gaps in our knowledge of the usefulness of administrative data for identifying these outcomes and offered suggestions for additional research in this area.[22]

Table 1. Utility of administrative data to identify health outcomes of interest

Good utility* | Moderate utility† | Little utility‡
Cerebrovascular accident and transient ischemic attacks[25] | Atrial fibrillation[26] | Anaphylaxis[27]
Heart failure[28] | Ventricular arrhythmia[29] | Hypersensitivity reactions other than anaphylaxis[30]
Venous thromboembolism[31] | Seizures, convulsions, or epilepsy[32] | Erythema multiforme and other serious skin reactions[33]
Angioedema[27] | Depression[34] | Acute respiratory failure[35]
Revision of total hip arthroplasty[36] | Pancreatitis[37] | Pulmonary fibrosis and interstitial lung disease[38]
 | | Infection related to blood products, tissue grafts, or organ transplantation[39]
 | | Transfusion-associated sepsis[40]
 | | Transfusion reaction caused by ABO incompatibility[41]
 | | Suicide – attempted or completed[42]
 | | Revision of knee arthroplasty[36]

*Positive predictive values consistently >70% for identifying acute or incident events across most of multiple studies examining relatively generalizable study populations.
†Positive predictive values 50–70%, inconsistent findings, few studies, limited information on identifying acute or incident events, questionable algorithm sensitivity, or limited generalizability of study populations.
‡Positive predictive values <50%, very limited or dated information on validity of algorithms compared with medical record review, or other substantial limitations in algorithm performance or evidence.

Our expectation is that potential outcomes identified through administrative data will usually require review and adjudication of full-text medical records when the positive predictive value of case designation must be very high. Such confirmation might be needed, for example, if signal refinement discovers evidence of excess risk. Cutrona and colleagues describe the process Mini-Sentinel developed to identify cases of acute myocardial infarction using distributed programs, to have data partners obtain the relevant portions of full-text inpatient medical records, and to provide either redacted records or abstracted information to an expert panel for adjudication.[23] Notably, it was possible to obtain redacted information for 93% (143/153) of requested full-text records.


The Mini-Sentinel distributed dataset became usable for distributed queries in early 2011. To date, the data partners have executed several hundred distributed programs in response to FDA queries. Examples of modular program queries included assessment of the occurrence of acute myocardial infarction or stroke among new users of drugs used to treat Parkinson's disease, celiac disease among recipients of angiotensin receptor blockers, and cardiac outcomes among individuals who were dispensed prescription drugs for smoking cessation.

One-time protocol-based assessments include initiation of assessments of intussusception after two rotavirus vaccines, and venous thromboembolism following human papilloma virus vaccine.[11] A prospective sequential evaluation of the occurrence of acute myocardial infarction among users of different antidiabetic drugs is also in progress.[24]


Near-term objectives include expanding the number and type of assessments, increasing the size and diversity of the covered population, including data from ambulatory and inpatient electronic health records and registries, and broadening the range of medical products and outcomes under observation. Additional data from two large national health plans are expected to become available within the next year, substantially increasing the size of the population. Expansion of available laboratory results and development of modular programs that incorporate height, weight, blood pressure, smoking status, and outpatient laboratory test results in conjunction with drug exposures and clinical diagnoses are planned. Algorithms will be developed to identify populations of special interest, such as pregnant women and patients with renal dysfunction. The availability of information about exposures to blood products will be explored.

Ongoing and planned methodologic studies include evaluation of inverse probability weighting to adjust for confounding within a sequential monitoring framework, evaluation of methods for anonymous linkage of individuals who are represented in more than one data source, methods for distributed multivariable-adjusted analysis, assessing the roles of propensity score and disease risk score methods in monitoring the safety of new medical products, additional simulation capabilities, and work on signal generation.

Systematic reviews of the validity of coded diagnoses for additional health outcomes of interest that are especially relevant to evaluation of vaccine safety will be performed. Validation studies that involve adjudication of full-text medical records will be performed for severe acute liver injury, venous thromboembolism, intussusception, and anaphylaxis.

Surveillance activities will include new prospective and retrospective assessments with customized protocols, as well as assessment of the impact of regulatory action.


Developing a robust system for active surveillance of medical product safety is a long-term, complex initiative. It will be necessary to implement it in stages as scientific methods and data infrastructure mature. Ongoing effort will be required to achieve an appropriate balance between the need for timeliness in assessing the safety of medical products and the need to avoid misleading conclusions. It will also be necessary to ensure privacy and security within the distributed system and to continue to address the concerns of stakeholders, including patients and the public. Finally, it will be important to consider ways in which the resources and methods that the Mini-Sentinel develops can serve as a national resource to support other secondary uses of electronic health data, including assessment of clinical effectiveness and quality of care.


The authors have declared that there is no conflict of interest.


  • Mini-Sentinel has created a distributed data network, analytic methods, and policies to enable use of routinely collected electronic health information to assess the safety of marketed medical products
  • The network is currently in routine use by FDA
  • Mini-Sentinel focuses on rapid assessment of past experience, prospective assessment of accumulating data, and assessment of changes in utilization and health outcomes after regulatory action
  • This network has the potential to address national needs beyond safety of medical products.


Mini-Sentinel is funded by the Food and Drug Administration through Department of Health and Human Services Contract Number HHSF223200910006I.