Establishment of the MID‐NET® medical information database network as a reliable and valuable database for drug safety assessments in Japan

Abstract
Purpose: To establish a new medical information database network (designated MID‐NET®) to provide real‐world data for drug safety assessments in Japan.
Methods: This network was designed and developed by the Ministry of Health, Labour and Welfare and the Pharmaceuticals and Medical Devices Agency in collaboration with 23 hospitals from 10 healthcare organizations across Japan. MID‐NET® is a distributed and closed network system that connects all collaborative organizations through a central data center. A wide variety of data are available for analyses, including clinical and administrative information. Several coding standards are used to standardize the data stored in MID‐NET® to allow the integration of information originating from different hospitals. A rigorous and consistent quality management system was implemented to ensure that MID‐NET® data are of high quality and meet Japanese regulatory standards (good post‐marketing study practice and related guidelines).
Results: MID‐NET® was successfully established as a reliable and valuable medical information database and was officially launched in April 2018. High data quality with almost 100% consistency was confirmed between original data in hospitals and the data stored in MID‐NET®. A major advantage is that approximately 260 clinical laboratory test results are available for analysis.
Conclusions: MID‐NET® is expected to be a major data source for drug safety assessments in Japan. Experiences and best practices established in MID‐NET® may provide a model for the future development of similar database networks.


KEYWORDS
data quality, drug safety, medical information database, MID-NET®, real-world data, regulatory science

| INTRODUCTION
The utilization of real-world data for regulatory purposes has been actively discussed and pursued in recent years. 1-3 For example, the US Food and Drug Administration established the Sentinel Initiative in 2008 to explore the creation of an electronic system to monitor the safety of its regulated products. 4,5 In 2011, Japan's Ministry of Health, Labour and Welfare (MHLW) and the Pharmaceuticals and Medical Devices Agency (PMDA) started an initiative to establish a medical information database network (designated MID-NET®) that would enable the use of real-world data for drug safety assessments. 6 This initiative involved collaborations with 23 hospitals from 10 healthcare organizations across Japan. Figure 1 presents an overview of the partner hospitals and the types of data stored in the MID-NET® system. All expenses for this project were financed from a government budget of the MHLW and from the PMDA's own budget, which was originally derived from contributions from the pharmaceutical industry for the purpose of developing safety measures.
MID-NET® was officially launched on April 1, 2018, whereupon the database network became available to analysts in the pharmaceutical industry and academia. Before the launch, the database had been used only by the MHLW, the PMDA, and the collaborative organizations. MID-NET® is anticipated to become a major data source for clinical research, post-marketing drug safety studies conducted by the pharmaceutical industry, and drug safety assessments conducted by the PMDA under the MIHARI framework. 1 In this article, we describe how MID-NET® was established and share our experiences in creating a reliable and valuable database that enables accurate assessments of drug safety and promotes the use of real-world data in regulatory decision-making.

| OVERVIEW OF THE MID-NET® SYSTEM
MID-NET® adopts a common data model that stores a wide variety of hospital information system (HIS) data (Figure 1), such as electronic medical records (EMRs), administrative claims data, and diagnosis procedure combination (DPC) data. 7,8 EMRs constitute a particularly important component of the MID-NET® system and are standardized based on the message specifications of SS-MIX2. 7 EMRs include different types of information, such as patient identifiers, medical examination history data (including admission and discharge data), diagnostic orders data, discharge summary data, prescription orders/execution data, injection orders/execution data, and laboratory test data. Administrative claims data are produced to determine reimbursements for inpatient and outpatient care according to a fee-for-service system. DPC data are produced to determine reimbursements for inpatient care according to diagnosis-related groups, and they provide important context in terms of patient case-mix.
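Because EMR, claims, and DPC records from the same hospital share a patient identifier, they can be combined into one patient-level view, whereas records from different hospitals cannot be linked. The sketch below illustrates that linkage rule under simplifying assumptions: the `Record` class, its field names, and the category labels are hypothetical and do not reproduce the actual SS-MIX2-based MID-NET® data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical, heavily simplified record; not the real MID-NET(R) schema.
    hospital_id: str   # partner hospital that produced the record
    patient_id: str    # anonymized identifier, meaningful only within one hospital
    category: str      # e.g. "EMR_prescription", "EMR_lab", "claims", "DPC"
    payload: dict = field(default_factory=dict)


def link_by_patient(records):
    """Group records by (hospital_id, patient_id), then by data category.

    Keying on the hospital as well as the patient identifier reflects that
    records are linked at the patient level only within a hospital: the same
    person seen at two hospitals becomes two separate keys.
    """
    linked = defaultdict(lambda: defaultdict(list))
    for rec in records:
        linked[(rec.hospital_id, rec.patient_id)][rec.category].append(rec.payload)
    return linked


records = [
    Record("H1", "P001", "EMR_prescription", {"drug": "drug A"}),
    Record("H1", "P001", "claims", {"claim_code": "X1"}),
    Record("H2", "P001", "DPC", {"dpc_code": "Y1"}),  # different hospital
]
linked = link_by_patient(records)
print(len(linked))  # 2 keys: ("H1", "P001") and ("H2", "P001")
```

Reusing "P001" at two hospitals is deliberate: identical identifier strings at different hospitals do not imply the same person, so the sketch never merges them.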
As shown in Figure 2, MID-NET® is a distributed and closed network system that connects all collaborative organizations through a central data center. The stored data are periodically updated (weekly or every 1-3 months, depending on the type of data) to provide access to the most up-to-date information from clinical practice. Analytical results are obtained through the following steps: (1) a user creates a program to extract and summarize the target data, such as data from patients who were prescribed a particular drug; (2) the user sends a request to approve the running of the program for analysis; (3) technical staff in the relevant collaborative organization approve the request (if applicable); (4) the program is used to extract the target data from MID-NET® and/or obtain summarized data; (5) … data center; (6) the extracted data are sent to the central data center; (7) the user remotely accesses the extracted data and conducts more detailed pharmacoepidemiological analyses using statistical programs such as SAS® as required; and (8) the user can locally access the summarized data after the analysis is completed, but cannot download individual-level data. However, users can reaccess individual-level data if necessary, because the data are stored and maintained in the central data center for a prespecified period of time (standard: 2 years; legally required study for a new molecular entity: 8 years or more to allow reexamination submissions). 9
MID-NET® is operated and managed under the Act on PMDA (Act No. 192, 2002), and is exempt from requirements to obtain informed consent from patients in accordance with the Act on the Protection … (see Figure S1 for details of the data anonymization process). Thus, users are only able to access anonymized data for their analyses.

KEY POINTS
• The high quality of MID-NET® data is ensured using various quality management systems, including the daily monitoring of messages and periodic data consistency checks.
• MID-NET® is expected to be a major data source for drug safety assessments in Japan.

FIGURE 1
Partner hospitals and data categories of MID-NET®. MID-NET® is a database network established by Japan's Ministry of Health, Labour and Welfare and the Pharmaceuticals and Medical Devices Agency (PMDA) in collaboration with 23 hospitals from 10 healthcare organizations. MID-NET® includes hospital information system (HIS) data such as electronic medical records (EMRs), claims data, and diagnosis procedure combination (DPC) data. Note that radiology examination data and physiological laboratory data only include order and execution data, not results such as images.

FIGURE 2
Outline of the MID-NET® system and the process of data extraction, transfer, and analysis.
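Step (8) and the retention rule above amount to a simple access policy: summarized results may leave the system, while individual-level data stay in the central data center until their retention period ends. The following sketch encodes that policy. The function names are hypothetical, the retention clock is assumed to start on the extraction date (the text does not say when it starts), and the 8-year value is treated as a minimum because the text says "8 years or more".

```python
from datetime import date

STANDARD_RETENTION_YEARS = 2   # standard retention period
NME_MIN_RETENTION_YEARS = 8    # legally required study for a new molecular entity (minimum)


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retention_end(extracted_on: date, nme_reexamination_study: bool) -> date:
    """End of the retention period (a minimum for NME reexamination studies).

    Assumption: the retention period is counted from the extraction date.
    """
    years = NME_MIN_RETENTION_YEARS if nme_reexamination_study else STANDARD_RETENTION_YEARS
    return add_years(extracted_on, years)


def may_download(data_level: str) -> bool:
    """Only summarized data may be downloaded; individual-level data stay central."""
    return data_level == "summarized"


print(retention_end(date(2018, 4, 1), nme_reexamination_study=False))  # 2020-04-01
```

Keeping the download rule as a separate predicate mirrors the text: how long data are kept and whether they may leave the data center are independent questions.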

| DATA QUALITY MANAGEMENT
The PMDA actively works with all collaborative organizations to ensure the quality of MID-NET® data. Here, data quality means that the original data from all partner hospitals are appropriately sent to and stored in MID-NET® in a standardized format (SS-MIX2 [HL7-based standard] 7 for EMRs, and the governmental reimbursement rules for administrative claims and DPC data) with high levels of accuracy, consistency, and completeness. In the real-world setting, however, the patterns in which data are entered into hospital systems can vary even where SS-MIX2 is applied. MID-NET® also receives data on a daily basis from a variety of systems in each hospital to achieve timely updates, which is a notable feature of this database.
For example, an EMR system can connect with several different specialized and independent clinical operating systems, including clinical laboratory testing, nursing, and radiology examination systems. Data in these systems are routinely sent through EMR systems to MID-NET®. Because hospital systems vary so widely, it is very difficult to anticipate every possible variety of data message during the system validation process. Furthermore, hospitals may implement configuration changes, modifications, or updates in one or more of these systems to improve daily clinical operations, even after system reliability has been confirmed. This makes it even harder to predict how such changes will affect the MID-NET® data stored through secondary data collection. In addition, MID-NET® data are used in post-marketing database studies that must comply with the quality standards stipulated in a ministerial ordinance for good post-marketing study practice (GPSP) 10 and its related guidelines. 11 These regulations require confirmation of database integrity in terms of data management and quality assurance (eg, accuracy, consistency, and completeness of data). Therefore, MID-NET® data quality cannot be ensured without daily and periodic monitoring of actual data conditions. Similar data quality assurance practices are implemented in the Sentinel Initiative's database, although daily management may not be required there because data are updated less frequently (daily updates in the MID-NET® database vs periodic updates in the Sentinel database). 12
In daily quality management, data logs and the actual number of messages sent to MID-NET® are monitored (Figure 3). If any errors or marked changes in data volume are detected, further investigations are conducted to identify the underlying causes and resolve the issues.
For example, the quantity of incoming messages from one of the partner hospitals was found to be generally consistent on weekdays ( Figure 3A). However, the daily monitoring system detected an irregular decrease in messages with several missing data elements over a 12-day period ( Figure 3B) because of an erroneous system setting. In this case, the PMDA promptly contacted the hospital and contractor, which immediately resolved the issue. Subsequently, the missing messages were recovered over 2 days in the following week to avoid incomplete data storage.
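The text does not state the exact rule used to flag an irregular day, so the sketch below assumes one for illustration: compare each day's message count with the median of recent comparable days (working days against working days, weekends and public holidays against each other, because volume drops on non-working days, as noted for Figure 3) and flag days that fall below a fraction of that baseline. `flag_irregular_days`, `window`, `drop_ratio`, `min_history`, and the synthetic counts are hypothetical choices.

```python
from datetime import date, timedelta
from statistics import median


def flag_irregular_days(counts, holidays=frozenset(), window=20, drop_ratio=0.5, min_history=5):
    """Return days whose incoming-message count falls well below a recent baseline.

    counts: dict mapping date -> number of incoming messages on that day.
    Working days are compared only with earlier working days, and weekends or
    holidays only with earlier non-working days.
    """
    def is_working(d):
        return d.weekday() < 5 and d not in holidays

    flagged = []
    days = sorted(counts)
    for i, day in enumerate(days):
        # Up to `window` most recent earlier days of the same kind.
        history = [counts[p] for p in days[:i] if is_working(p) == is_working(day)][-window:]
        if len(history) < min_history:
            continue  # not enough history to form a baseline yet
        if counts[day] < drop_ratio * median(history):
            flagged.append(day)
    return flagged


# Synthetic example: steady volume, then a 12-day drop like the one in Figure 3B.
holidays = {date(2018, 5, 3), date(2018, 5, 4)}
counts = {}
for k in range(50):
    d = date(2018, 4, 2) + timedelta(days=k)
    working = d.weekday() < 5 and d not in holidays
    low = date(2018, 5, 10) <= d <= date(2018, 5, 21)
    counts[d] = (400 if low else 1000) if working else (100 if low else 300)
print(flag_irregular_days(counts, holidays))  # the 12 days from 2018-05-10 to 2018-05-21
```

Without the `holidays` argument, 2018-05-03 and 2018-05-04 would also be flagged, because their holiday-level volume would be judged against working days; this is why the sketch compares like with like.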
In addition to daily quality management, we also periodically check data completeness and consistency between original data in hospitals … systems among the hospitals. Despite these issues, we were able to maintain high data quality, with almost 100% consistency, after implementing these quality management practices (Table 1). Although some inconsistencies are still occasionally observed, these are mainly due to a time lag between extracting the data and updating the information. At present, there appear to be no major reliability issues in MID-NET®. Consistency checks are scheduled for each partner hospital at least once a year to verify that no unexpected inconsistencies have occurred and to maintain the quality of MID-NET® data.
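Table 1 reports consistency between the original hospital data and the data stored in MID-NET® by data category. One simple way to express such a check is the share of original records that reappear unchanged in the stored data. The sketch below assumes records can be compared as tuples of field values; the function names and the example data are hypothetical. A value updated in the hospital after the data were stored shows up as an inconsistency, which is the time-lag effect described above.

```python
from collections import Counter


def consistency_rate(original, stored):
    """Percentage of original records found identically in the stored data.

    Records are hashable tuples; Counter intersection counts duplicates
    with multiplicity, so two identical original rows need two stored rows.
    """
    if not original:
        return 100.0
    matched = sum((Counter(original) & Counter(stored)).values())
    return 100.0 * matched / len(original)


def consistency_report(original_by_category, stored_by_category):
    """Consistency rate (%) for each data category, rounded to 2 decimals."""
    return {
        category: round(consistency_rate(records, stored_by_category.get(category, [])), 2)
        for category, records in original_by_category.items()
    }


original = {
    "prescription": [("P1", "drug A", "2018-06-01"), ("P2", "drug B", "2018-06-02")],
    "laboratory": [("P1", "ALT", 35), ("P1", "AST", 30), ("P2", "ALT", 50), ("P2", "AST", 41)],
}
stored = {
    "prescription": [("P2", "drug B", "2018-06-02"), ("P1", "drug A", "2018-06-01")],
    # P2's AST value was updated in the hospital system after the data were stored.
    "laboratory": [("P1", "ALT", 35), ("P1", "AST", 30), ("P2", "ALT", 50), ("P2", "AST", 40)],
}
print(consistency_report(original, stored))  # {'prescription': 100.0, 'laboratory': 75.0}
```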
The processes described above also facilitate prompt root cause identification and data recovery when any issues are detected in MID-NET® data.

| STANDARDIZED CODING PROCEDURES ACROSS ALL PARTNER HOSPITALS FOR INTEGRATED ANALYSES
As shown in Table 2, several coding standards are used to standardize EMR data in MID-NET® to allow the integration of data originating from different hospitals. In MID-NET®, data based on the local codes used in each hospital are converted to these standardized codes while preserving the original clinical implication. To ensure the accuracy and uniformity of data coding across different hospitals, the PMDA collaborates with the partner hospitals to select the most appropriate codes (see Figure S2 for a detailed description of these procedures). During this process, the PMDA first selects a candidate code for each item on the basis of scientific rationale, and the PMDA and the hospitals then confirm that the same code applies to clinically identical data across hospitals. If any differences in data or new local codes are identified, the PMDA and the relevant hospitals discuss which code should be applied to the data. In the case of laboratory tests, the data distribution of each test is compared among the hospitals to assess whether the codes can appropriately be applied across different hospitals. If an irregular case is identified, the PMDA contacts the relevant hospital to ascertain the reason for the irregularity and to find an appropriate solution. Similar to the development of the US Sentinel System, 13 many discussions and analyses were required to choose the most appropriate coding standard for each laboratory test. Approximately 260 laboratory tests (eg, tests for liver function, renal function, and bone marrow function) were targeted for this mapping process (as of December 2018). The details for each laboratory test are available on the PMDA website (URL: http://www.pmda.go.jp/safety/mid-net/0001.html). More standardized tests will become available in the future after undergoing similar checks.
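The two checks described above, converting hospital-specific codes to a shared standard and comparing each test's value distribution across hospitals, can be sketched as follows. The mapping table, the code names (`STD_CRE` and the local codes), the median-based comparison, and the two-fold tolerance are all hypothetical; the actual standard codes are listed in Table 2, and the comparison method used by the PMDA is not specified here. In the example, one hospital's creatinine values sit far from the others (for instance, because of a different unit), the kind of irregularity that would prompt the PMDA to contact that hospital.

```python
from statistics import median


def map_to_standard(results, mapping):
    """Convert (hospital, local_code, value) rows to (standard_code, hospital, value).

    Local codes without an agreed mapping are returned separately for review
    rather than guessed.
    """
    mapped, unmapped = [], []
    for hospital, local_code, value in results:
        standard_code = mapping.get((hospital, local_code))
        if standard_code is None:
            unmapped.append((hospital, local_code))
        else:
            mapped.append((standard_code, hospital, value))
    return mapped, unmapped


def irregular_hospitals(mapped, standard_code, tolerance=2.0):
    """Hospitals whose median for one test differs from the cross-hospital
    median of medians by more than `tolerance`-fold in either direction."""
    values = {}
    for code, hospital, value in mapped:
        if code == standard_code:
            values.setdefault(hospital, []).append(value)
    medians = {h: median(v) for h, v in values.items()}
    overall = median(medians.values())
    return sorted(h for h, m in medians.items() if m > tolerance * overall or m < overall / tolerance)


mapping = {("H1", "CRE01"): "STD_CRE", ("H2", "K-CRE"): "STD_CRE", ("H3", "cr"): "STD_CRE"}
results = [
    ("H1", "CRE01", 0.7), ("H1", "CRE01", 0.8), ("H1", "CRE01", 0.9),
    ("H2", "K-CRE", 0.9), ("H2", "K-CRE", 0.9), ("H2", "K-CRE", 1.0),
    ("H3", "cr", 65.0), ("H3", "cr", 70.0), ("H3", "cr", 75.0),
    ("H2", "NEW99", 1.0),  # new local code not yet mapped
]
mapped, unmapped = map_to_standard(results, mapping)
print(unmapped)                                # [('H2', 'NEW99')]
print(irregular_hospitals(mapped, "STD_CRE"))  # ['H3']
```

Returning unmapped codes instead of dropping or guessing them reflects the text: a new local code is a trigger for discussion between the PMDA and the hospital, not something to resolve automatically.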
In the case of administrative claims data and DPC data, the codes (eg, claims processing system codes and DPC codes) are standardized across hospitals based on the rules set by the government for reimbursement purposes. These administrative data represent the final data configuration actually used to determine reimbursements, and the codes are preserved in MID-NET® to reflect actual reimbursements. In addition, we confirmed that these data were appropriately sent to and stored in MID-NET® with high levels of data accuracy, consistency, and completeness.

| VERIFICATION OF SYSTEM RELIABILITY
Inspections were performed to examine the reliability of the MID-NET® system in the data extraction process at each hospital, data transfer from each hospital to the central data center, and data conversion into the SAS® format at the central data center. For example, the reliability of data extraction was confirmed through the following steps: (a) data were extracted from the MID-NET® database using a MID-NET® program, (b) the same data were also manually and independently extracted from the database using a SAS® program, and (c) reliability was examined by comparing the two sets of extracted data. Similar inspections were conducted for the other processes in the system.

FIGURE 3
An example of daily quality management in MID-NET®. A, Regular quantity of incoming messages to MID-NET® from a partner hospital. The quantity of incoming messages is generally consistent across weekdays, but lower on weekends. B, Irregular quantity of incoming messages to MID-NET® from a partner hospital. The box indicates a marked decrease in the number of incoming messages with several missing data elements (such as physiological examinations, laboratory test results, prescription orders, and hospitalization plan-related information) over a 12-day period (2018/5/10-2018/5/21) because of an erroneous system setting. The missing messages were recovered over 2 days in the following week (2018/5/22-2018/5/23) to avoid incomplete data storage. Note that the reduction in the number of messages on 2018/5/3 to 2018/5/4 was because of public holidays followed by the weekend.

TABLE 1
Data consistency in major data categories in MID-NET®
No major issues were detected during these inspections, which confirmed the reliability of the system.
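Step (c) of the inspection compares the output of the MID-NET® program with an independent extraction of the same data. A minimal sketch of such a comparison, assuming both outputs are lists of row tuples and that row order is irrelevant, is a multiset difference in each direction; `compare_extractions` and the example rows are hypothetical.

```python
from collections import Counter


def compare_extractions(program_rows, independent_rows):
    """Order-insensitive comparison of two extractions of the same target data.

    Returns (only_in_program, only_in_independent) as Counters that keep
    duplicate rows with multiplicity; two empty Counters mean full agreement.
    """
    a, b = Counter(program_rows), Counter(independent_rows)
    return a - b, b - a


program_rows = [("P1", "drug A", "2018-06-01"), ("P2", "drug B", "2018-06-02")]
independent_rows = [("P2", "drug B", "2018-06-02"), ("P1", "drug A", "2018-06-01")]
only_a, only_b = compare_extractions(program_rows, independent_rows)
print(not only_a and not only_b)  # True: same rows, different order
```

Counting multiplicity matters here: an extraction that silently duplicates or drops a row would pass a plain set comparison but not this one.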

| CURRENT FEATURES OF MID-NET®
Through the rigorous checks and analyses described above, MID-NET® was successfully established as a reliable and valuable medical information database. A general overview of MID-NET®, including its advantages and limitations, is summarized in Table 3. A major advantage of MID-NET® is the availability of many laboratory test results for analysis (approximately 260 tests; detailed lists are available on the PMDA website at http://www.pmda.go.jp/safety/mid-net/0001.html). For example, drug-associated changes in liver, renal, or bone marrow function can be measured directly using the relevant laboratory test parameters. MID-NET® is also designed to fulfill the requirements of GPSP and its related guidelines. 10,11 Accordingly, the pharmaceutical industry can use MID-NET® to provide post-marketing surveillance data for regulatory submissions in Japan. The general characteristics of MID-NET® indicate that data for analysis are available across a broad range of patient ages and diseases, as well as a wide variety of prescription drugs (see Figure S3 for more details). However, the partner hospitals are generally mid-sized and large hospitals, such as university hospitals and regional core hospitals. Therefore, the following trends may be observed in MID-NET® data when compared with the general patient population in Japan 14 : (a) a lower proportion of very elderly patients, who may be more likely to visit a nursing care hospital or rehabilitation hospital than a MID-NET® partner hospital; (b) a higher proportion of patients with acute and severe conditions, as seen in some diseases such as infectious diseases and cancer; and … for the analysis of rare diseases and orphan drugs. Another limitation is that data cannot be linked across hospitals when a patient moves from one hospital to another.
These points should be taken into consideration when evaluating the generalizability of analytical results based on MID-NET® data. We have recently reported the results of pilot pharmacoepidemiological studies using MID-NET® data for drug safety assessments. 15 These studies can help to promote an understanding of the characteristics and appropriate analysis of MID-NET® data.
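As an illustration of using laboratory results to measure drug-associated changes directly, the sketch below flags patients whose value for one standardized test exceeds a multiple of its upper limit of normal (ULN) within a fixed window after their first prescription. The threshold multiple, the window, the ULN of 40 in the example, and the function name are hypothetical parameters chosen for illustration, not a case definition used in MID-NET® studies. Because patients cannot be linked across hospitals, the inputs are assumed to come from a single hospital.

```python
from datetime import date


def patients_with_elevation(first_rx, lab_results, uln, multiple=3.0, window_days=90):
    """Patients with a lab value above `multiple` x ULN within `window_days`
    after (and including) the date of their first prescription.

    first_rx: dict patient_id -> date of first prescription of the drug of interest.
    lab_results: iterable of (patient_id, test_date, value) for one standardized test.
    """
    hits = set()
    for patient_id, test_date, value in lab_results:
        start = first_rx.get(patient_id)
        if start is None:
            continue  # patient never prescribed the drug in this data set
        days_after = (test_date - start).days
        if 0 <= days_after <= window_days and value > multiple * uln:
            hits.add(patient_id)
    return sorted(hits)


first_rx = {"P1": date(2018, 6, 1), "P2": date(2018, 6, 1), "P3": date(2018, 6, 1), "P5": date(2018, 6, 1)}
alt = [
    ("P1", date(2018, 6, 20), 150),   # > 3 x 40 within window -> flagged
    ("P2", date(2018, 6, 20), 100),   # below threshold
    ("P3", date(2018, 5, 20), 200),   # before first prescription
    ("P4", date(2018, 6, 20), 300),   # no prescription record
    ("P5", date(2018, 9, 9), 130),    # 100 days after start, outside window
]
print(patients_with_elevation(first_rx, alt, uln=40))  # ['P1']
```

Patient P5 illustrates why the window is a study-design decision: widening it to 120 days would add P5, so in practice such parameters belong in a prespecified protocol.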

| CHALLENGES IN ESTABLISHING A RELIABLE AND VALUABLE DATABASE FOR DRUG SAFETY ASSESSMENTS
On the basis of our experience in developing MID-NET®, we found that consistent data quality management was vital to establishing a reliable and valuable database with applications in regulatory science. Furthermore, hospital-level differences in the actual management and interpretation of coding standards for health and billing records should be taken into consideration to ensure data quality and reliability. Creating an organizational culture that supports synergistic collaboration among all involved parties (including the partner hospitals, the MHLW, the PMDA, and associated information technology companies) was also crucial to the success of this project. The experiences and best practices established in MID-NET® may provide a model for the future development of similar database networks.
Since its launch in April 2018, MID-NET® has continued to face many challenges, especially with regard to maintaining data quality. Periodic data quality checks will be necessary to confirm that newly stored data are consistent with the original data and that local codes are appropriately converted to the standardized codes while preserving the original clinical implication. In particular, the coding procedure must be timely and sustainable so that the most recent data are available for integrated analysis across partner hospitals. In addition, quality assurance is an indispensable prerequisite for using real-world data for regulatory purposes.
Another major challenge is the future expansion of MID-NET® without any loss in data quality. The Japanese government is considering linking MID-NET® with other databases to promote the use of real-world data in Japan. 16,17 Because MID-NET® is a complex distributed database that requires substantial resources, it will be necessary to establish an efficient and feasible process to manage and maintain data quality as the number of partner hospitals increases and linkages with other databases are formed.
The utilization of real-world data for regulatory purposes is still in its learning phase, and international regulatory collaborations (with an emphasis on experience sharing and common understanding) will be needed to promote international integration and advance regulatory science. We will continue to work to further the development of MID-NET® as an internationally recognized medical information database network for assessing and improving drug safety.

Advantages
✓ High data quality
  ➢ Daily and periodic quality checks are conducted.
  ➢ Accuracy, consistency, and completeness between extracted and original data are periodically confirmed.
✓ Frequent updates of stored data
  ➢ Data are updated every week or every 1-3 months.
✓ Wide variety of data, including EMR data, claims data, and DPC data
  ➢ Data categories can be linked at the patient level.
✓ Detailed checks have been conducted for approximately 260 standardized laboratory tests
  ➢ Reliability of coding has been confirmed.
✓ Regulatory requirements are met
  ➢ The requirements of regulatory standards ("good post-marketing study practice" standards) have been fulfilled.
  ➢ Can be used as a major source of data in regulatory assessments of drug safety.

Limitations
✓ Sample size is still relatively small.
✓ No patient-level linkage of data among hospitals
  ➢ Loss of follow-up when a patient moves across hospitals.
✓ Only medium-to-large hospitals are represented
  ➢ Mainly university hospitals and regional core hospitals, without general practitioners or clinics.
  ➢ Higher proportion of patients with acute and severe conditions.