A prediction model‐based algorithm for computer‐assisted database screening of adverse drug reactions in the Netherlands

Abstract
Purpose: The statistical screening of pharmacovigilance databases containing spontaneously reported adverse drug reactions (ADRs) is mainly based on disproportionality analysis. The aim of this study was to improve the efficiency of full database screening using a prediction model‐based approach.
Methods: A logistic regression‐based prediction model containing 5 candidate predictors was developed and internally validated using the Summary of Product Characteristics as the gold standard for the outcome. All drug‐ADR associations, with the exception of those related to vaccines, with a minimum of 3 reports formed the training data for the model. Performance was based on the area under the receiver operating characteristic curve (AUC). Results were compared with the current method of database screening based on the number of previously analyzed associations.
Results: A total of 25 026 unique drug‐ADR associations formed the training data for the model. The final model contained all 5 candidate predictors (number of reports, disproportionality, reports from healthcare professionals, reports from marketing authorization holders, Naranjo score). The AUC for the full model was 0.740 (95% CI, 0.734–0.747). The internal validity was good based on the calibration curve and bootstrapping analysis (AUC after bootstrapping = 0.739). Compared with the old method, the AUC increased from 0.649 to 0.740, and the proportion of potential signals increased by approximately 50% (from 12.3% to 19.4%).
Conclusions: A prediction model‐based approach can be a useful tool to create priority‐based listings for signal detection in databases consisting of spontaneous ADRs.

historically relied on a case-by-case clinical review of incoming reports submitted directly by health care professionals (HCPs) and consumers. This review is performed by trained pharmacovigilance assessors, most of whom are medical doctors or pharmacists. Reports that, in the view of the assessor, may represent a potential signal are discussed in a weekly scientific meeting. Potential signals then undergo a more detailed analysis. 4 Lareb has criteria in place for assessors to determine which reports should be discussed at the weekly scientific meeting. However, because multiple assessors are involved in this process and the selection of reports for the weekly scientific meeting is prone to some level of subjectivity, a computer-assisted database screening tool is in place as an additional approach to reduce the risk of missing potential signals. 5 The screening tool is even more important for ADRs reported by marketing authorization holders (MAHs) that may be indicative of potential signals, as these are not assessed on a case-by-case basis at Lareb.
The computer-assisted database screening tool used in the Netherlands relies on the number of reports of drug-ADR associations and on disproportionality based on the reporting odds ratio (ROR). In disproportionality analysis, the observed rate at which a drug and an ADR are reported together is compared with an expected value based on the relative frequencies with which they are reported individually in the spontaneous reporting database. 5,6 In the approach applied at our centre, the lower limit of the 2-sided 95% confidence interval (CI) of the ROR is used, combined with a minimum of 3 reports per association. Associations can be automatically selected by the screening tool based on 1 or more of the following predefined criteria: Anatomical Therapeutic Chemical code (allowing the assessor to screen more efficiently), ADR being unlabeled in the Summary of Product Characteristics (SPC), number of reports (≥3), threshold of the lower limit of the 2-sided 95% CI of the ROR (ROR025) (>1), and a prespecified calendar date set during a previous analysis. Associations highlighted by the screening tool undergo a short analysis by trained pharmacovigilance assessors. Based on the decision of the assessor, new thresholds can subsequently be specified (ADR unlabeled, number of reports or lower limit of the 95% CI, new date), or the association can undergo further detailed analysis. The association will be highlighted again as soon as one of the aforementioned criteria is met. 7 Although the current approach facilitates the selection of potential signals, its downside is that it yields a high number of associations that need an initial, short analysis, which is a time-consuming process. With the current methods, associations can be ranked on the basis of the number of reports or the level of disproportionality. However, it is not possible to prioritize based on other, possibly relevant features of the reported association.
Prioritizing the associations that would theoretically yield the highest number of potential signals is seen by Lareb as a way to improve the timeliness of the signal detection process.
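The screening criteria described above (at least 3 reports and ROR025 > 1) can be illustrated with the standard textbook formula for the ROR and its Wald-type confidence interval on the log scale. This is a minimal sketch: the function names and the example counts are hypothetical, and the exact computation used in the Lareb tool (for example, any continuity correction) is not described in the text.

```python
import math


def ror_with_ci(a, b, c, d, z=1.96):
    """Return (ROR, lower limit, upper limit) for a 2x2 table of report counts.

    a: reports with the drug and the ADR
    b: reports with the drug, without the ADR
    c: reports without the drug, with the ADR
    d: reports with neither the drug nor the ADR
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR); z = 1.96 gives a 2-sided 95% CI.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper


def is_highlighted(a, b, c, d, min_reports=3, threshold=1.0):
    """Apply the two numeric criteria: report count and ROR025 threshold."""
    if a < min_reports:
        return False
    _, ror025, _ = ror_with_ci(a, b, c, d)
    return ror025 > threshold


# Made-up counts, for illustration only.
ror, ror025, ror975 = ror_with_ci(12, 488, 300, 99200)
flag = is_highlighted(12, 488, 300, 99200)
```

With these invented counts the association would be highlighted, whereas the same table with fewer than 3 reports in cell `a` would not be, regardless of its ROR.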
The Uppsala Monitoring Centre, WHO Collaborating Centre for International Drug Monitoring, has developed a data-driven screening algorithm for emerging drug safety signals that accounts for report quality and content, called vigiRank. 8 VigiRank is a model that uses several predictors, as determined in the WHO Global ICSR database, VigiBase®. Some of the predictors that were found for this model, such as geographical spread, are not applicable to a national database. The Lareb database contains a high number of reports with free text and has a relatively high documentation grade, as represented by the vigiGrade® completeness score of the Lareb reports in VigiBase®. 9 Because Lareb performs a case-by-case analysis of all reports, except those received through the MAH, it is known for each association whether the ADR is labeled in the Dutch SPC. Also, for each report (except those received through the MAH), a causality score (Naranjo) is calculated. 10 Based on this and other additional information that is available for ICSRs, a more elaborate set of predictors would probably be suited for a screening tool on the Dutch national spontaneous reporting database.
The primary aim of this study was to develop a new prediction model-based screening tool in order to improve statistical signal detection. The secondary aim was to compare this new model with the old screening tool, which is based on the number of reports and the ROR025.

| Setting
In this study, we developed a logistic regression-based prediction model for drug-ADR associations present in the Lareb spontaneous reporting database. Using the linear predictor of this model, a prioritized list of associations not present in the SPC was made for comparison with the current method. The data for this study were derived from the database of the Netherlands Pharmacovigilance Centre Lareb. This database consists of spontaneous reports of suspected ADRs reported to Lareb directly by both HCPs and consumers. Additionally, reports from MAHs regarding events that occurred in the Netherlands are imported into our database from the European Medicines Agency database EudraVigilance. Each report contains 1 or more drug-ADR associations. For the development of the prediction model, all drug-ADR associations were extracted from each report. ADRs were coded using the preferred terms from the Medical Dictionary for Regulatory Activities. 11 Drugs were classified according to the WHO Anatomical Therapeutic Chemical classification system. 12
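As an illustration of how a linear predictor can be turned into a prioritized list of associations not present in the SPC, the sketch below scores each association and sorts the unlabeled ones in descending order. The coefficients, the intercept, the feature names, and the example associations are all hypothetical and are not the fitted values of the model described here.

```python
import math

# Hypothetical intercept and coefficients, chosen only for illustration.
INTERCEPT = -1.0
COEFFICIENTS = {
    "reports_over_8": 0.9,
    "ror025_over_1": 0.3,
    "hcp_share_75_plus": 0.6,
}


def linear_predictor(features):
    """Intercept plus the weighted sum of the (0/1) predictor indicators."""
    return INTERCEPT + sum(
        coef * features.get(name, 0) for name, coef in COEFFICIENTS.items()
    )


def predicted_probability(lp):
    """Logistic transform of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-lp))


def priority_list(associations):
    """Rank associations not present in the SPC by descending linear predictor."""
    unlabeled = [a for a in associations if not a["in_spc"]]
    return sorted(
        unlabeled, key=lambda a: linear_predictor(a["features"]), reverse=True
    )


# Invented example associations.
associations = [
    {"name": "drug A / ADR x", "in_spc": False,
     "features": {"reports_over_8": 1, "hcp_share_75_plus": 1}},
    {"name": "drug B / ADR y", "in_spc": True,
     "features": {"reports_over_8": 1, "ror025_over_1": 1, "hcp_share_75_plus": 1}},
    {"name": "drug C / ADR z", "in_spc": False,
     "features": {"ror025_over_1": 1}},
]
ranked = [a["name"] for a in priority_list(associations)]
```

Because the logistic transform is monotonic, ranking on the linear predictor gives the same order as ranking on the predicted probability.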

KEY POINTS
• Current methods for full database screening of ADRs are mainly based on disproportionality, which has its limits because of its sensitivity to several types of selection bias.
• We developed a prediction model-based approach to generate a priority list of drug-ADR associations to be analyzed.
• The performance of the model and the comparison with the current method showed that the prediction model-based approach is to be preferred over the current method.

| Outcome
The outcome of the model was defined as the presence in the SPC of each unique drug-ADR association at the time of the analysis.
Although the use of the SPC to determine whether an association is actually an ADR (implying causality) has its limitations, it has been used in several studies aimed at statistical signal detection. 13,14 At Lareb, for each association present in an ICSR received directly from an HCP or consumer, a causality assessment using the Naranjo score is performed. 10 For Naranjo question 1: "Are there previous conclusive reports on this reaction?", 3 options are available at our centre: 1) "Yes, listed in SPC",

| Inclusion / exclusion criteria
All reports received until 12-May-2016 were considered eligible for inclusion, with the exception of reports related to vaccines. For vaccine-related reports, a method other than Naranjo is used to determine causality. Because that method lacks information about the presence of the ADR in the SPC, reports related to vaccines were excluded. For statistical reasons, only associations with a minimum of 3 reports were selected, because this was deemed to be the minimum number of reports needed for a reliable ROR estimation.
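The selection steps above can be sketched as follows: vaccine-related reports are skipped, the number of reports per unique drug-ADR association is counted, and only associations with at least 3 reports are kept. The report structure (`is_vaccine`, `associations`) and the example data are assumptions made for illustration, as is the choice to count an association at most once per report.

```python
from collections import Counter


def count_associations(reports):
    """Count the number of non-vaccine reports per unique (drug, ADR) pair."""
    counts = Counter()
    for report in reports:
        if report.get("is_vaccine"):
            continue  # vaccine-related reports are excluded
        # set() so that a pair repeated within one report is counted once
        for pair in set(report["associations"]):
            counts[pair] += 1
    return counts


def eligible_associations(counts, min_reports=3):
    """Keep only associations with at least `min_reports` reports."""
    return {pair: n for pair, n in counts.items() if n >= min_reports}


# Invented example reports.
reports = [
    {"is_vaccine": False, "associations": [("drug A", "nausea"), ("drug A", "rash")]},
    {"is_vaccine": False, "associations": [("drug A", "nausea")]},
    {"is_vaccine": False, "associations": [("drug A", "nausea"), ("drug A", "nausea")]},
    {"is_vaccine": True, "associations": [("vaccine V", "fever")]},
]
counts = count_associations(reports)
selected = eligible_associations(counts)
```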

| Selection of candidate predictors
For each association, the following variables were selected as candidate predictors in the model: 1. The number of ICSRs.

| Performance and validation
The performance of the model was satisfactory, based on the area under the receiver operating characteristic curve (AUC = 0.740; 95% CI, 0.734-0.747; see Figure 1). The 3 strongest predictors in the model were more than 8 reports per association, followed by a percentage of HCP reports of 75% or higher, and a percentage of MAH reports between 0% and 20% (see Table 3).
The calibration curve of the model shows good calibration based on the observed versus predicted probabilities (see Figure 2).
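For readers less familiar with the AUC, it can be read as the probability that a randomly chosen association that is present in the SPC receives a higher model score than a randomly chosen association that is not, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is illustrative only: the labels and scores are invented, and a study of this size would normally rely on a statistical package.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 if the outcome is present (here: association in the SPC), else 0
    scores: model output (linear predictor or predicted probability)
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(positives) * len(negatives))


# Invented labels and scores, for illustration only.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = auc(example_labels, example_scores)
```

In this toy example, 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9.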

| Comparison with current method
A comparison of the new model with the current method based on a model with only the number of reports and ROR025 as predictors showed an increased performance (AUC new = 0.740; AUC old = 0.649; see Figure 3).
As mentioned previously, this is a theoretical comparison, because the current method used at Lareb is not based on a prediction model. Therefore, the models were not only compared in terms of their AUCs; priority lists of associations were also made for both models, and these were compared in terms of possible signals. The results of these analyses show that the proportion of possible signals increased by 58.2% (from 12.3% to 19.4%) and by 44.2% (from 9.6% to 13.9%) for the 800 and 1600 most highly ranked associations, respectively. Additional information is presented in Table 4.
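The arithmetic behind this comparison can be checked with a short sketch. The counts for the top-1600 lists (154 possible signals with the old method, 222 with the new model) are taken from Table 4; the function names are hypothetical, and `signals_in_top` assumes a list of 0/1 flags already sorted by descending priority.

```python
def signals_in_top(ranked_flags, n):
    """Number of possible signals among the n most highly ranked associations."""
    return sum(ranked_flags[:n])


def proportion(count, n):
    """Percentage of the top-n list that consists of possible signals."""
    return 100.0 * count / n


def relative_increase(old_count, new_count):
    """Relative increase (in percent) from the old to the new method."""
    return 100.0 * (new_count - old_count) / old_count


# Top-1600 counts as reported in Table 4.
old_prop = round(proportion(154, 1600), 1)
new_prop = round(proportion(222, 1600), 1)
increase = round(relative_increase(154, 222), 1)
```

These values reproduce the reported 9.6%, 13.9%, and 44.2% for the top-1600 lists.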

| DISCUSSION
In this study, we developed a prediction model-based screening tool aimed at improving statistical signal detection in our spontaneous ADR reports. Five relevant characteristics (number of reports, disproportionality, Naranjo score, proportion of MAH reports, proportion of HCP reports) were chosen as potential predictors in the model. For Naranjo, we considered using the scoring (doubtful, possible,
We found little difference in AUC values among individual predictors, although disproportionality (ROR025) seems to have the lowest predictive value. This may be explained by the fact that disproportionality is sensitive to selective reporting and other types of bias. 24-26 Within subgroups of predictors, we found some noteworthy results regarding the regression coefficients. The predictors "number of reports" and "percentage of HCP reports" showed a consistent increase in coefficients with increasing categories, as expected. For ROR025 and Naranjo, we anticipated similar results, but this was not the case (see Table 3). This may be explained by the fact that the outcome used in our model (presence of the asso-
The linear predictor-based priority lists comparing the old and new model showed a substantial increase in potential signals among the most highly ranked drug-ADR combinations not present in the SPC.
In this context, the increase in potential signals should be seen in terms of earlier detection due to prioritization and not in terms of signals that would or would not be picked up by either method.
Previous research suggests that results obtained from signal detection algorithms depend on the database the algorithm is applied to. 8,19 The same will hold for our algorithm. For example, in the Netherlands, we receive a substantial number of ICSRs reported by patients, but this is not necessarily the case in other countries. Therefore, the use of the proportion of HCP reports as a candidate predictor may not be a logical choice for other databases. Consequently, the development of such a model should be based on the reporting and database characteristics of the country or region it is applied to.
Nevertheless, the method of generating a prediction model-based priority list of signals could be useful in other (spontaneous reporting) databases.
One of the limitations of our study is the risk of bias due to selective reporting. Because the database contains well-established associations, it is reasonable to assume that these associations are reported more frequently than unknown associations, thereby influencing the predictors in the model. In an alternative approach, the values of the predictors immediately prior to the recognition of the association could be used in the model. However, recovering the date of recognition for several thousand associations may prove to be infeasible.
In conclusion, this study shows that a prediction model-based screening tool can be used to generate priority-based listings of drug-ADR associations for signal detection. Additionally, as seen in other studies, 8,27 the introduction of variables other than the number of reports and disproportionality can increase screening efficiency due to priority-based assessment of drug-ADR associations.
Table 4 (partial). Top-1600: old method 154 (9.6%), new model 222 (13.9%), increase 44.2%.

ETHICS STATEMENT
The authors state that no ethical approval was needed.