An algorithm to detect unexpected increases in frequency of reports of adverse events in EudraVigilance

Abstract
Purpose The European Medicines Agency developed an algorithm to detect unexpected increases in frequency of reports, to enhance the ability to detect adverse events that manifest as increases in frequency, in particular quality defects, medication errors, and cases of abuse or misuse.
Methods An algorithm based on a negative binomial time‐series regression model run on 6 sequential observations prior to the monitored period was developed to forecast monthly counts of reports. A heuristic model to capture increases in counts when the previous 4 observations were null supplemented the regression. Count data were determined at drug‐event combination. Sensitivity analyses were run to determine the effect of different methods of pooling or stratifying count data. Positive retrospective detections and positive predictive values (PPVs) were determined.
Results The algorithm detected 8 of the 13 historical concerns, including all concerns of quality defects. The highest PPV (1.29%) resulted from increasing the lower count threshold from 3 to 5 and including literature reports in the counts. Both the regression model and the heuristic model components to the algorithm contributed to the detection of concerns. Sensitivity analysis indicates that stratification by commercial product reduces the PPV but suggests that pooling counts of related events may improve it.
Conclusion The results are encouraging and suggest that the algorithm could be useful for the detection of concerns that manifest as changes in frequency of reporting; however, further testing, including in prospective use, is warranted.

Past attempts to define models to detect changes in frequency include an algorithm published by the Food and Drug Administration in 1992 5 which required data on exposure, and thus information beyond that which is available in pharmacovigilance databases.
The Uppsala Monitoring Centre has also tested a modification of their Information Component algorithm applied to the identification of substandard medicines. 6 The algorithm compares the Observed-to-Expected ratio of a country/year stratum to the Observed-to-Expected ratio of the other strata, using the Information Component.
The authors selected a list of 78 terms they considered indicative of substandard medicines. This algorithm was not tested under the hypothesis that changes in the frequency of reports of harm can be a proxy of changes in quality.
The Sequential Probability Ratio Test, and variations, have also been proposed as a method to allow for multiple looks at accumulating data over time, more recently by Chan et al. 7 The Sequential Probability Ratio Test is based on the difference, not the ratio, of
The methods based on ratios have the advantage of not generating spurious signals when there are abrupt changes in the usage of the product or increased awareness affecting the overall reporting rate. However, they could generate attenuated signals when changes occur simultaneously in several drug-event combinations. For instance, a quality defect leading to anaphylactic reaction may lead to increased reporting of angioedema, anaphylactic reaction, bradycardia, etc., and these could all fall below a signal threshold when considered separately.
The algorithm described in this paper was developed based on a regression modelling of counts of reports. This is simpler and more objective but may lead to a higher number of spurious signals from an increase in the use of a medicinal product or an artificial increase in the frequency of reporting. Whether this is an important disadvantage is determined by empirical evaluation of its performance in routine pharmacovigilance settings.
Certain types of events are likely to be reported in a reasonably concentrated period, such as product quality defects (QD), medication errors (ME), and abuse or misuse (A/M). Unintentional changes in the content of the medicinal product such as degradation of constituents and contamination result in deviation from the specified product quality. While not all quality issues will lead to a short-term increase in reporting (for example a loss of potency of a vaccine could lead to reporting of lack of efficacy over a prolonged period), quality defects have shown this type of reporting pattern. A well-studied example was the contamination of heparin with oversulfated chondroitin sulphate in 2008, which led to an increase in reports of allergic reactions. 9 Initially, these reactions were not thought to be related to quality defects. Similarly, cases of abuse or misuse have led to increases in frequency of reports such as with ephedra, 10 and medication errors have also manifested in the same way, such as with cabazitaxel. 11
This analysis was aimed at validating a novel algorithm to detect unexpected increases in frequency (UIF) of reports to be used as an indicator of potential quality defects, medication errors, and abuse or misuse.
As with other algorithms used in routine signal detection, only data from the EV Post Marketing Module, excluding reports from studies, were used.
The count of reports was based on the receive date, the date when the report was received by the sender.

| Algorithm
The algorithm consists of a negative binomial time-series regression model developed in SAS® version 9.3. The Poisson distribution is widely used in pharmacovigilance to model count data; however, monthly counts of reports in pharmacovigilance databases tend to be overdispersed. This violates the Poisson assumption that the variance equals the mean; thus, a negative binomial distribution was preferred, as it includes an extra parameter to model the overdispersion.
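The overdispersion argument can be checked informally on any series of monthly counts. The following is a minimal illustrative sketch in Python (standard library only), not the SAS implementation used in the study. It computes the variance-to-mean ratio, which is close to 1 for Poisson data, and a method-of-moments estimate of the extra parameter under the common parameterisation Var(Y) = μ + αμ². The function names, the choice of parameterisation, and the estimator are assumptions made for illustration; the paper does not specify them.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio of a series of counts.

    Roughly 1 under a Poisson model; values well above 1 suggest
    overdispersion. Returns None when the mean is zero.
    """
    m = mean(counts)
    if m == 0:
        return None
    return variance(counts) / m


def nb_alpha_moments(counts):
    """Method-of-moments estimate of alpha in Var(Y) = mu + alpha * mu**2.

    Assumed parameterisation (illustrative). The estimate is truncated
    at 0, which corresponds to the Poisson case.
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    return max(0.0, (variance(counts) - m) / m ** 2)
```

For a hypothetical series such as `[2, 0, 7, 1, 12, 3]` the variance is several times the mean, so a Poisson model would understate the spread of plausible counts.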
The unit of time was defined as a month. Let $t$ indicate the unit of time, $t_0$ the monitored period, and $T_6$ the 6 months preceding the monitored period, such that $T_6 = \{t_{-6}, t_{-5}, \ldots, t_{-1}\}$.

KEY POINTS
• A newly developed and tested algorithm to detect unexpected increases in frequency of reports of adverse events in EudraVigilance is presented.
• The algorithm correctly identified higher than expected frequencies of reports of several historical concerns related to quality defects, medication errors, and abuse and misuse.
• All quality defects were detected based solely on the reported harms, and not on terms related to product quality issues.

Let $y$ be an observed count of reports per month, and $y_0$ be the observed count of reports for the monitored period. The regression model is run based on the counts of the preceding 6 months to forecast the expected count at the monitoring period ($\hat{y}$) and respective confidence intervals. Let $\tau_n$ be a minimum count threshold of $n$ reports. Previous research on report count thresholds for routine signal detection suggests that appropriate minimum counts are between 3 and 5 reports. 14 Correspondingly, 2 thresholds were used to test the algorithm, 3 reports ($\tau_3$) and 5 reports ($\tau_5$).
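To make the idea of an upper bound on the forecast concrete, the sketch below uses a deliberately simplified stand-in: an intercept-only negative binomial model fitted by the method of moments to the 6 prior counts, with the upper bound taken as a quantile of that distribution. This is not the time-series regression described above (which was fitted in SAS and whose exact specification is not given here); the function name `nb_upper_bound`, the 97.5% level, and the moment estimator are all assumptions for illustration.

```python
import math
from statistics import mean, variance


def nb_upper_bound(history, level=0.975, max_count=100000):
    """Upper quantile of a negative binomial fitted to `history`.

    Simplified stand-in for a regression forecast: the mean is the
    sample mean and the dispersion alpha (Var = mu + alpha * mu**2) is
    a method-of-moments estimate; alpha == 0 falls back to Poisson.
    """
    mu = mean(history)
    if mu == 0:
        return 0
    alpha = max(0.0, (variance(history) - mu) / mu ** 2)

    def log_pmf(k):
        if alpha == 0.0:
            # Poisson limit of the negative binomial
            return k * math.log(mu) - mu - math.lgamma(k + 1)
        r = 1.0 / alpha          # "size" parameter
        p = r / (r + mu)         # success probability
        return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                + r * math.log(p) + k * math.log(1.0 - p))

    # Smallest k whose cumulative probability reaches `level`.
    cdf = 0.0
    for k in range(max_count + 1):
        cdf += math.exp(log_pmf(k))
        if cdf >= level:
            return k
    return max_count
```

With constant history the bound reduces to a Poisson quantile, while an erratic history with a similar mean gives a much wider bound, which is the practical effect of the extra dispersion parameter.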
There is a possibility that sequential null counts occur; in such cases, the regression model becomes less reliable. Hence, to allow for the detection of a sudden increase in counts following 4 sequential periods of null counts, a supplementary heuristic model was added to the algorithm. Where 4 or more of the immediately prior 6 observations were zero, the regression was not run, and the observed count y was compared with the threshold (τ). A theoretical example of the algorithm is presented in Figure 1.
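The way the two components combine can be sketched as a single decision rule. The code below is an illustrative reading of the description above and of the Figure 1 caption, not the authors' implementation: the function name `detect_uif` is hypothetical, the regression is represented by a caller-supplied `upper_bound` function, and the exact comparisons (count at least τ for the regression branch, count strictly above τ for the heuristic branch) are assumptions about how "threshold achieved" and "higher than the threshold" should be interpreted.

```python
from typing import Callable, Sequence


def detect_uif(history: Sequence[int],
               observed: int,
               threshold: int,
               upper_bound: Callable[[Sequence[int]], float]) -> bool:
    """Flag an unexpected increase in frequency for one drug-event pair.

    history     -- the 6 monthly counts preceding the monitored month
    observed    -- count in the monitored month
    threshold   -- minimum count threshold (3 or 5 in the paper)
    upper_bound -- hypothetical stand-in for the regression: returns the
                   upper confidence bound of the forecast count
    """
    if len(history) != 6:
        raise ValueError("expected exactly 6 prior monthly counts")

    zero_months = sum(1 for count in history if count == 0)
    if zero_months >= 4:
        # Heuristic branch: the regression is not run.
        return observed > threshold

    # Regression branch: threshold reached and forecast bound exceeded.
    return observed >= threshold and observed > upper_bound(history)
```

Passing the forecast as a function keeps the decision rule independent of how the upper bound is estimated, so any regression implementation could be substituted.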

| Selection of historical controls
Historical concerns were defined as safety issues of the type that the algorithm is designed to detect. The European Pharmacovigilance Issues Tracking Tool (EPITT) was searched to collate all candidate events. Concerns that were detected by batch testing or before the administration of the product were excluded from the list of historical concerns as these were detected prior to human exposure, which led to a final list of 13 concerns (Table 1).
The index date for the concerns was considered the date when the concern was first introduced in EPITT.

| Events
The medical terminology used in pharmacovigilance regulatory activities and databases is the Medical Dictionary for Regulatory Activities (MedDRA). 15 Each historical concern may have slightly different clinical manifestations; for instance, a contamination that causes thromboembolism may manifest as portal vein thrombosis or thrombophlebitis, etc.
Hence, the concerns were defined at appropriate higher levels of MedDRA, and calculations were performed at PT level. This reflects the fact that an increase in any of the PTs grouped under a clinically suitable higher level of hierarchy or SMQ could help identify the historical concern.
Only PTs classified as important medical events (IME), ie, events that may not be immediately life-threatening or result in death or hospitalisation but may jeopardise the patient or may require intervention to prevent one of these outcomes, 16 or as designated medical events 17 were included.
MedDRA contains terms related to product quality issues that were not included in the case definition of the historical concerns related to quality defects, as these would have been highlighted and would have triggered appropriate regulatory action prior to or concomitantly with reporting to EV. The underlying premise of the algorithm is that the detection of a quality defect can be achieved through detection of changes in the frequency of harms. In the case of medication errors or abuse or misuse, regulatory action is not programmatically set; hence, terms related to these were included.
FIGURE 1 Hypothetical example of the elements of the algorithm. For the regression model, 6 sequential monthly counts are used to forecast the count $\hat{y}$ at the monitoring period $t_0$ and the confidence intervals. If the observed count $y$ is higher than the upper bound of the $\hat{y}$ estimate and the threshold has been achieved, an unexpected increase in frequency is detected. For the heuristic model, an unexpected increase in frequency is detected where $y$ is higher than the threshold $\tau$.
TABLE 1 List of historical concerns to test the algorithm to detect unexpected increases in frequency. All historical concerns of quality defects (QD), medication errors (ME), and abuse or misuse (A/M) were extracted from EPITT. The final list resulted from exclusion of concerns that were detected before human exposure. The events that were considered as indicative of the historical concern were all PTs grouped under the grouping terms (HLTs, HLGTs, and SMQs), eg, an increase in frequency of any PT of the SMQ embolic and thrombotic events would assist in detecting the embolic and thrombotic concern of 2010. The commercial product name refers to the products affected by the concern but other product names may exist.

| Positive identification
An increase in frequency was considered a true positive if it occurred, for a MedDRA PT related to the reported concern, within the period from 1 year before to 6 months after the index date. An increase in frequency was considered a false positive if it occurred outside this period or for MedDRA terms not related to the concern.
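The classification rule above can be written down compactly at monthly resolution. This is an illustrative sketch only: the function name, the representation of dates as `(year, month)` pairs, and the treatment of both window boundaries as inclusive are assumptions, since the text does not state how boundary months were handled.

```python
def is_true_positive(detection_month, index_month, pt_related,
                     months_before=12, months_after=6):
    """Classify a detected increase against a historical concern.

    detection_month, index_month -- (year, month) tuples
    pt_related -- True if the PT belongs to the terms defined for the
                  historical concern
    The window runs from `months_before` months before the index month
    to `months_after` months after it, boundaries included (assumed).
    """
    det = detection_month[0] * 12 + detection_month[1]
    idx = index_month[0] * 12 + index_month[1]
    offset = det - idx
    return bool(pt_related) and -months_before <= offset <= months_after
```

Any detection for which this returns False, whether because of timing or because the PT is unrelated, would count as a false positive under the definition above.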
The window of time was defined empirically to take into account the fact that these historical examples were collected in the absence of a tool to monitor real-time increases in frequency. It is therefore possible that they only became evident over a relatively long period.
This would include the actual detection of the issue and additional regulatory timelines, such that the entry date in EPITT may have been several months after the event. As the event may persist beyond the initial detection, due to a wash-out of products still in the market, the time window was extended past the entry date in EPITT.

| Methodological choices for determining monthly count data
The main analysis focused on testing the algorithm in conditions that simulate the routine monitoring of drug safety concerns: monthly counts were determined as the count of an event, at PT level, for a substance, in a calendar month.
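As a rough illustration of how such monthly counts could be assembled, the sketch below groups reports by substance, PT, and calendar month of the receive date, and then extracts the 6 counts preceding a monitored month, filling months without reports with zero. The input layout (tuples of substance, PT, and date) and all function names are hypothetical; they do not reflect the EudraVigilance data model.

```python
from collections import Counter
from datetime import date


def monthly_counts(reports):
    """Count reports per (substance, PT, year, month).

    reports -- iterable of (substance, pt, receive_date) tuples
    """
    counts = Counter()
    for substance, pt, received in reports:
        counts[(substance, pt, received.year, received.month)] += 1
    return counts


def preceding_months(year, month, n=6):
    """The n calendar months before (year, month), oldest first."""
    idx = year * 12 + (month - 1)
    months = []
    for k in range(n, 0, -1):
        y, m0 = divmod(idx - k, 12)
        months.append((y, m0 + 1))
    return months


def history_series(counts, substance, pt, year, month, n=6):
    """Counts for the n months before the monitored month (zeros kept)."""
    return [counts[(substance, pt, y, m)]
            for y, m in preceding_months(year, month, n)]
```

Keeping explicit zeros matters here, because the number of null months in the series decides whether the regression or the heuristic component applies.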
Spontaneous reports include reports that stem from published scientific literature and including them in the monthly report count may hypothetically have an effect on the performance of the algorithm as they are associated with higher rates of duplicate reporting. 18 To assess this effect, the analysis was run including and excluding reports from the literature.
Furthermore, it is known that different approaches to pooling or stratifying counts of reports may influence the results of the algorithm.
This concern can derive from the level of MedDRA chosen 19 but also from the granularity at which the drug is identified, namely if it is stratified by commercial product name. Sensitivity analyses were run to assess the performance of changing these parameters.
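The effect of pooling versus stratifying can be illustrated with a small regrouping helper. In this sketch, counts are keyed by substance, commercial product, PT, and month, and are then collapsed either across products or across PTs mapped to a common grouping term. The key layout, the helper name `regroup`, and the example grouping are hypothetical and stand in for whatever MedDRA level or SMQ would be chosen in practice.

```python
from collections import Counter, namedtuple

# Hypothetical key for a stratified count.
Key = namedtuple("Key", ["substance", "product", "pt", "month"])


def regroup(counts, key_fn):
    """Re-aggregate counts under a coarser key produced by key_fn."""
    pooled = Counter()
    for key, n in counts.items():
        pooled[key_fn(key)] += n
    return pooled


def pool_products(counts):
    """Collapse commercial products: one count per substance, PT, month."""
    return regroup(counts, lambda k: (k.substance, k.pt, k.month))


def pool_terms(counts, pt_to_group):
    """Collapse products and map each PT to its grouping term.

    PTs missing from pt_to_group are kept as their own group.
    """
    return regroup(
        counts,
        lambda k: (k.substance, pt_to_group.get(k.pt, k.pt), k.month),
    )
```

With small counts, stratified cells may each stay below a threshold of 5 while the pooled cell reaches it, which is one way pooling related events could change what the algorithm detects.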

| Metrics
In the absence of a gold standard that would allow a comparative analysis, the positive predictive value (PPV) was defined as the ratio between all UIF for each historical concern and all UIF in the dataset for the substances included in the list of historical concerns.
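Expressed as a calculation, this definition is a simple proportion. The helper below is a minimal sketch with a hypothetical name; the counts used in any call are the caller's own, and the example figures in the note that follows are made up for illustration rather than taken from the study.

```python
def positive_predictive_value(true_positive_uif, all_uif):
    """PPV as defined above.

    true_positive_uif -- number of UIF attributable to the historical
                         concerns
    all_uif           -- number of all UIF raised for the substances in
                         the list of historical concerns
    Returns None when no UIF were raised at all.
    """
    if all_uif == 0:
        return None
    return true_positive_uif / all_uif
```

For instance, 13 true positive UIF out of 1000 UIF raised would give a PPV of 1.3%.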

| RESULTS
The algorithm developed correctly detected 8 of the 13 historical concerns, including all quality defects (Table 2). The PPVs ranged between 1.03% and 1.29% in the analysis that included reports from literature in the monthly counts and between 0.82% and 1.03% in the analysis that excluded these reports.
Increasing the lowest count threshold from 3 to 5 improved the PPVs in both analyses, at no expense in the number of concerns detected.

| DISCUSSION
The analyses showed that 8 out of 13 historical concerns were detected using the algorithm. These results are promising, particularly considering that all quality defects were detected solely through the increase in PTs related to harm rather than through PTs related to product quality defects. This is important, as these are the concerns that are more likely to go unnoticed (quality defects that are only reported as harm), particularly if the reported PTs refer to safety concerns already known for the medicinal product, such as those stemming from overdose.
The algorithm also detected the majority of concerns of abuse or misuse. This supports the potential use of the algorithm to detect the acute events of abuse or misuse that concentrate in short periods; however, the dynamics of reporting of cases of abuse or misuse needs further research.
The sensitivity analysis provides useful insight that pooling counts at higher levels of MedDRA may improve the PPV, but the dynamics of this effect are unclear, as at higher levels contradictory terms may be combined, such as for the HLGT Product use issues, which includes "overdose" and "underdose"; these would be distinct quality defects.
Future research is needed to understand if other levels might achieve better results, including using SMQs or bespoke groupings of terms.
The resulting PPVs might be construed as low; however, it should be noted that algorithms in pharmacovigilance act as decision support tools and expert review is always performed, and hence a relatively larger number of false positives can be considered acceptable as a trade-off to enhancing the toolkit of safety monitoring. At any rate, additional research is routinely performed to both adapt the pharmacovigilance toolkit to regulatory changes and to improve its performance and efficiency.
The highest PPV (1.29%) was achieved by using a minimum count threshold of 5 reports and including reports from the literature in the counts. A balance is needed in setting the minimum count threshold.
Increasing it is likely to reduce the number of false positives, and thereby increase the PPV: this is seen in the results of the analyses.
However, the threshold should not be so large as to require an undue number of events to occur prior to detection, especially considering that under-reporting means an unknown fraction of these events is never reported.
Whereas it is possible that including literature reports will lead to false positives due to a spurious increase in the frequency, the exclusion of these from the counts seems to have an important detrimental effect on the PPV.
The regression model did not include spatial-temporal adjustment.
Theoretically, it cannot be assumed that the geographical distribution of any of these events is different from that of any other reaction. A contamination in the water for sterile injections in a European-wide production facility, for instance, would lead to events across countries, whereas the abuse of a psychoactive product is equally unlikely to be bound to a location, unless different access restrictions exist. At any rate, by pooling the counts of reports from different countries it is still possible to understand a posteriori if the concern is geographically contained. On the other hand, adjusting for seasonality would require years of observations; the downside is that this would postpone the implementation of the algorithm until years after the introduction of a new product to the market.

| CONCLUSION
The algorithm developed to detect UIF allowed the detection of most historical concerns. Whereas the algorithm warrants further testing, including in prospective use, the results suggest it could be useful for the detection of concerns that manifest as changes in frequency of reporting.