Data resource profile: JMDC claims database sourced from health insurance societies

Abstract
JMDC, Inc. (JMDC) has created a database, using data collected from health insurance societies in Japan, consisting of ledgers of insured persons, claims (for hospitalization, outpatient treatment, drug preparation, and dental treatment), and health checkup results. The earliest claims data are from January 2005, except for dental claims (from December 2009) and health checkup results (from April 2008). As of the end of June 2020, the number of insured persons included is approximately 9.8 million. This database is unique for Japan and has the following characteristics: (a) the basic population can be ascertained; (b) standardization is carried out using a dictionary; and (c) anonymized individual IDs can be followed on the basis of a time‐series over various periods, with the earliest starting date being January 2005. However, it has certain limitations, in that the disease status and test results cannot be ascertained, and there is insufficient access to data for elderly people.


| DATA RESOURCE BASICS
In Japan, health insurance is provided under the social insurance system by multiple bodies, organized by occupation, geography, and age (eg, elderly, geriatric). These include health insurance associations, which cover medical costs, except for employment-related injuries, for people less than 75 years old, primarily employees of large businesses and their dependents.¹ JMDC, Inc. (JMDC) has created a database, detailed below, using data collected from health insurance societies, as well as databases sourced from medical institutions.¹ We describe the database in detail below, although the description partly overlaps with that of the medical institutions' databases¹:
1. The collected data consist of ledgers of insured persons, claims (for hospitalization, outpatient treatment, drug preparation, and dental treatment), and health checkup results. Ledgers of insured persons include information about all persons insured by health insurance societies. Claims include information about medical expenses for which health insurance societies have been invoiced by medical institutions and about the insured persons who incurred the medical expenses. Health checkups are carried out so that health insurance societies can ascertain the health status of insured persons, and special health checkups are required to be conducted with the aim of preventing lifestyle-related diseases in insured persons aged 40-74.
2. JMDC provides data analysis services, such as medical cost analysis by comparison with other insurers and identification of high-risk persons, to health insurance societies distributed throughout Japan. Some of these health insurance societies agree to provide to third parties anonymously processed data, which do not include area information, on employees working at all branches and offices in Japan of the companies covered by the societies, and on their families. Data are collected from these health insurance societies. The specific names and geographical information of the included societies are not disclosed by JMDC as a matter of policy.
3. Data are collected monthly, and collected data are added to the database approximately five months after the treatment date.
Data are currently still being added to the database.
4. In order to protect personal information, the collected data are added to the database as information that has been anonymized in accordance with Clause 2:9 of the Law for Protection of Personal
7. The monthly numbers of insured persons whose data can be accessed are shown in Figure 1. The number of insured persons tends to increase with increasing number of health insurance societies collecting data. Currently (the end of June 2020), the number of health insurance societies collecting data is 216, and the number of insured persons is approximately 6.1 million.
8. Currently (the end of June 2020), the number of hospital department and drug preparation claims included in the database is approximately 370 million, and the number of actual patients, that is, the number of insured persons for whom at least one hospital department and/or drug preparation claim is released, is approximately 8.9 million.
9. In the last 5 calendar years, the annual withdrawal rates by month from this database were about 8%-10%. It should be noted that those who moved to other included societies were also counted as withdrawals in this calculation, because it is difficult to determine the same person who has moved to another society.
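As an illustration of how monthly numbers of insured persons such as those in Figure 1 might be derived from a ledger, the following is a minimal sketch. The ledger layout (an anonymized ID with first and last enrolled month as `YYYY-MM` strings) and all function names are assumptions made for this example and are not the actual JMDC schema.

```python
from collections import Counter

def month_index(ym):
    """Convert a 'YYYY-MM' string to a running month number."""
    year, month = map(int, ym.split("-"))
    return year * 12 + (month - 1)

def month_label(idx):
    """Convert a running month number back to 'YYYY-MM'."""
    return f"{idx // 12:04d}-{idx % 12 + 1:02d}"

def monthly_insured_counts(ledger):
    """Count insured persons enrolled in each month.

    `ledger` is a hypothetical list of (person_id, first_month, last_month)
    tuples, with both months inclusive.
    """
    counts = Counter()
    for _person_id, first_month, last_month in ledger:
        for idx in range(month_index(first_month), month_index(last_month) + 1):
            counts[month_label(idx)] += 1
    return dict(sorted(counts.items()))

# Toy ledger with three anonymized persons (illustrative values only).
ledger = [
    ("A", "2020-01", "2020-03"),
    ("B", "2020-02", "2020-06"),
    ("C", "2020-03", "2020-03"),
]
counts = monthly_insured_counts(ledger)
for month, n in counts.items():
    print(month, n)
```

In this toy ledger, a person who leaves one society and joins another under a new ID would appear as two separate rows, which mirrors the caveat about withdrawals in item 9.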

| DATA COLLECTED
All data collected from health insurance societies are anonymized on the basis of personal ID so that individuals cannot be identified.  Table 1.

FIGURE 1 Numbers of insured persons whose data can be accessed each month

| DATA RESOURCE USE
The following analyses can be carried out using this database:
1. The data are from a complete-enumeration survey of the population, including healthy people. Therefore, for certain diseases, chosen on a discretionary basis, the prevalence, incidence rate, specific treatment rate, medical expenses per patient, and number of days with hospital visits or admissions can be calculated.
2. The data are from a survey that follows patients longitudinally. Therefore, it is possible to analyze sequential data (hospital visit and admission histories, and numbers of health checkups).

| STRENGTHS AND WEAKNESSES
This database offers the following advantages:
1. As the master data are accorded after standardization of all data, researchers need almost no data preprocessing for analysis.
2. Data are available about the populations insured by health insurance societies; therefore, it is possible to ascertain the prevalence and incidence rates for different genders and age-groups.
3. In the case of people insured by the same health insurance society (but not in other cases), all claims are handled in a centralized manner. Therefore, when the same individual is treated by more than one medical institution, the information about him/her can be determined comprehensively.
4. In addition to information about claims, health checkup data are available.
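Because the insured population is known, a prevalence for each gender and age-group can be computed by dividing the number of insured persons with a qualifying claim by the number of insured persons in the same stratum. The following is a minimal sketch of that calculation; the tuple layout, the `patient_ids` set, and the function name are assumptions made for illustration, and how a disease is defined from claims is left outside the sketch.

```python
from collections import defaultdict

def prevalence_by_stratum(insured, patient_ids):
    """Return prevalence per (sex, age_group) stratum.

    `insured` is a hypothetical list of (person_id, sex, age_group) tuples
    taken from the ledger; `patient_ids` is the set of person IDs that meet
    the chosen disease definition in the claims data.
    """
    denominators = defaultdict(int)
    numerators = defaultdict(int)
    for person_id, sex, age_group in insured:
        stratum = (sex, age_group)
        denominators[stratum] += 1          # everyone insured counts in the denominator
        if person_id in patient_ids:
            numerators[stratum] += 1        # only persons meeting the definition
    return {s: numerators[s] / denominators[s] for s in denominators}

# Toy data (illustrative values only).
insured = [
    (1, "F", "40-44"), (2, "F", "40-44"), (3, "F", "40-44"), (4, "F", "40-44"),
    (5, "M", "40-44"), (6, "M", "40-44"),
]
patient_ids = {2, 5}
result = prevalence_by_stratum(insured, patient_ids)
for stratum, value in sorted(result.items()):
    print(stratum, value)
```

The denominator includes insured persons with no claims at all, which is what distinguishes this kind of source from a database built only from patients who visited a medical institution.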
However, the database has the following disadvantages:
1. As the data source is health insurance societies, there are few data from people aged over 65 and none from people aged over 75.
2. If a member of a health insurance society withdraws, the data for that member are discontinued.
3. As the claim data are for the purpose of medical expense invoicing, the disease names used are those used for insurance purposes; so, if a disease has to be defined, some form of workaround is required.

FIGURE 3 Comparison between scaled-up data from health insurance societies and publicly available data (numbers of hospitalized patients for each gender and age-group)

FIGURE 4 Comparison between scaled-up data from health insurance societies and publicly available data (numbers of outpatients for each gender and age-group)

FIGURE 5 Ratios used for comparison of data from health insurance societies relative to publicly available data (database population relative to total population of Japan)

FIGURE 6 Comparison between scaled-up data from health insurance societies and publicly available data (total numbers of hospitalized patients and outpatients for each chapter of ICD-10)

FIGURE 7 Comparison between scaled-up data from health insurance societies and publicly available data (numbers of hospitalized patients for each chapter of ICD-10)

October 2018. In this context, the publicly available data for the total population of Japan were taken to be the total populations of each gender and age-group, as of October 1 each year, shown in Table 1. Comparisons were also made in a similar manner for chapters of ICD-10, which represent large categories of injuries/diseases. If two or more injuries/diseases are included in a single claim, the handling mode in the Statistics of Medical Care Activities in Public Health Insurance is unknown, and the following counting method is therefore used, as it is expected that this will limit the bias to the JMDC numbers. In this method, in the case of claims that have one or more principal injuries/diseases, the value counted is obtained by sharing 1 equally between all the principal injuries/diseases, whereas in the case of claims with no principal injuries/diseases, the value counted is obtained by sharing 1 equally between all the injuries/diseases. The comparison results for each chapter of ICD-10 are shown in Figures 6-8.
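The counting method described above can be sketched as follows. Each claim contributes a total weight of 1, split equally among its principal injuries/diseases, or among all of its injuries/diseases when none is flagged as principal. The claim representation (a list of `(chapter, is_principal)` pairs) and the function name are assumptions made for this example, not the actual data layout.

```python
from collections import defaultdict

def count_by_chapter(claims):
    """Sum fractional claim counts per ICD-10 chapter.

    `claims` is a hypothetical list in which each claim is a list of
    (icd10_chapter, is_principal) pairs.
    """
    totals = defaultdict(float)
    for diagnoses in claims:
        principal = [chapter for chapter, is_principal in diagnoses if is_principal]
        # Use the principal diagnoses if any exist; otherwise use all diagnoses.
        targets = principal if principal else [chapter for chapter, _ in diagnoses]
        if not targets:
            continue  # a claim with no recorded diagnosis contributes nothing
        share = 1.0 / len(targets)  # each claim is worth exactly 1 in total
        for chapter in targets:
            totals[chapter] += share
    return dict(totals)

# Toy claims (illustrative values only).
claims = [
    [("IX", True), ("IV", True), ("X", False)],  # two principal: 0.5 each
    [("X", False), ("IV", False)],               # no principal: 0.5 each
    [("IX", True)],                              # one principal: 1
]
totals = count_by_chapter(claims)
for chapter, value in sorted(totals.items()):
    print(chapter, value)
```

A property of this scheme is that the chapter totals sum to the number of claims with at least one recorded diagnosis, so no claim is counted more than once across chapters.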
In addition, taking the health insurance societies as the data source, JMDC data and publicly available data of health insurance societies were compared. Figure 9 shows the results of comparison between the age-group composition ratios of this database and the numbers in the column of health-insurance-society insurance in Table 1 Table 1 of

| CONCLUSION
The database of data from health insurance societies is unique for Japan and has the following characteristics: (a) the basic population can be ascertained; (b) standardization is carried out using a dictionary; and (c) anonymized individual IDs can be followed on the basis of a time-series over various periods, with the earliest starting date being January 2005. However, owing to the properties of the data source, the database has certain limitations, in that the disease status and test results cannot be ascertained, and there is insufficient access to data for elderly people. Use of this database is on a fee-paying basis. Given these characteristics, it is provided for a wide range of purposes.

CONFLICT OF INTEREST
This article was co-authored by academic researchers and JMDC Inc. (JMDC). The first four authors are affiliated with JMDC. The other authors have no connection with JMDC.

ETHICAL APPROVAL AND INFORMED CONSENT
The study about objective descriptions of the database was ap-