Assessing the concordance and accuracy between hospital discharge data, electronic health records, and register books for diagnosis of inpatient admissions of miscarriage: A retrospective linked data study

Despite the high prevalence of miscarriage, there are few studies which assess the concordance of a diagnosis of miscarriage in routinely collected health databases.


Introduction
Routinely collected health data, also called "big data," are becoming an essential source of information to generate research about population health. 1 Reliability and validity of the information recorded in the databases are key to ensuring consistency and high-quality evidence of the outcome investigated. 2 Miscarriage is one of the most common complications during early pregnancy, with up to one third of clinically recognized pregnancies ending in miscarriage. [3][4][5][6] Nevertheless, prevalence and trends in rates of miscarriage vary considerably depending on the type of miscarriage identified (i.e., biochemical versus clinically recognized miscarriage) and on the source of data from which it is measured (i.e., self-reported data 4,7-10 vs. routine hospital registered data 11 ). For example, a comparison of the prevalence of miscarriage across three Danish studies showed that between 17% and 30% of miscarriage diagnoses were not recognized in registered hospital data compared to self-reported data. [12][13][14] However, evidence on the reliability and accuracy of the diagnosis of miscarriage in routinely collected health sources is surprisingly scant in the literature. 11,15,16 The hospital in-patient enquiry (HIPE) is a computer-based system designed to collect demographic, clinical, and administrative data on discharges and deaths in the Republic of Ireland (ROI). 17 HIPE is a national health information system which serves as a reliable source of inpatient data from all 62 acute hospitals in the ROI. 18,19 Although HIPE is not designed as a research tool, a series of initiatives to improve data accuracy has been implemented in the ROI since 2001 (i.e., computer-based edits/checks, clinical coder training, chart-based audits). 18 However, the accuracy and reliability of the diagnosis of miscarriage using hospital charts have not been published, and consequently, errors in coding or "rule-out" diagnoses might be reported.
The lack of evidence raises the following questions in particular in the country where the study was conducted: is routine hospital discharge data of diagnosis of miscarriage comparable between three main clinical data sources in Ireland? and is routine hospital discharge data accurately identifying types of miscarriage in Ireland? Therefore, the first aim of this study was to assess the reliability of routine hospital discharge data of diagnosis of miscarriage at admission in the ROI by determining the level of agreement between three data sources: electronic health records (EHR), hospital discharge data using HIPE, and register books (paper-based hospital records) from January 1, 2017 to June 30, 2017. The second aim of this study was to evaluate the accuracy of routine hospital discharge data for classifying types of miscarriage on admission in the ROI.

Study design and setting
This is a concordance study of diagnosis and types of miscarriage using a retrospective chart review methodology of three data sources at a large, tertiary maternity hospital with approximately 7500 deliveries annually in the ROI. The three data sources reviewed were the HIPE database, the EHR, and register books.

Hospital discharge data (HIPE)
A list containing all hospital discharge data of diagnosis of miscarriage and early pregnancy loss from January 1, 2017 to June 30, 2017 was identified using the HIPE database. This list included the medical record number (MRN) for each inpatient admission of miscarriage, a unique identification number which is given to each woman who is pregnant and ordinarily resident in Ireland. All inpatient admissions of miscarriage and early pregnancy loss identified by the list were then searched for in the EHR and the register books. The data were therefore linked across the three data sources using the MRNs. The HIPE database records each individual hospital admission separately, even when several admissions relate to a single miscarriage event.
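The MRN-based linkage described above can be pictured with a short sketch. It is illustrative only: the function name, the source labels, and the example MRNs are hypothetical, and the study itself performed the linkage by manual chart review rather than with code.

```python
from collections import Counter


def link_by_mrn(hipe_mrns, ehr_mrns, register_mrns):
    """Map every MRN seen in any source to the set of sources that contain it.

    Each argument is an iterable of MRNs, one entry per admission, so the
    same MRN may appear more than once (HIPE records each admission).
    """
    sources = {
        "HIPE": set(hipe_mrns),
        "EHR": set(ehr_mrns),
        "register": set(register_mrns),
    }
    all_mrns = set().union(*sources.values())
    return {
        mrn: {name for name, mrns in sources.items() if mrn in mrns}
        for mrn in sorted(all_mrns)
    }


def repeated_admissions(mrns):
    """Return MRNs that occur more than once, i.e. candidate duplicates."""
    return {mrn: n for mrn, n in Counter(mrns).items() if n > 1}


# Hypothetical MRNs, for illustration only.
hipe = ["A1", "A2", "A2"]
ehr = ["A1", "A2", "A3"]
register = ["A1"]

linked = link_by_mrn(hipe, ehr, register)
duplicates = repeated_admissions(hipe)
```

Here `linked` shows which sources hold each MRN, and `duplicates` flags MRNs with several HIPE admissions that may belong to one miscarriage event and would need manual review.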
Diagnoses and procedures performed as an inpatient are recorded in all of the patient's notes and clinical coders translate the medical terminology into alpha-numeric codes (ESRI). All inpatient admissions for miscarriage and early pregnancy loss coded in the HIPE dataset were identified using the International Statistical Classification of Diseases and Related Health Problems, 10th Revision, Australian Modification (ICD-10-AM) and the Australian Refined Diagnosis Related Groups (AR-DRGs). The ICD-10-AM and AR-DRGs are the coding classification systems of diagnosis used in the HIPE system since 2005. The inclusion criteria, based on the main ICD-10-AM codes for the diagnoses of interest, are listed in Supporting Information, Table S1. Only inpatient admissions are recorded in the HIPE database, and consequently, data from the emergency department (ED) and outpatient settings were not available. 18,19

Electronic health records
Inpatient admissions identified by HIPE were manually reviewed using patient notes in the EHR for the same time period at Cork University Maternity Hospital (CUMH). The Maternal and Newborn Clinical Management System (MN-CMS) has recently been implemented in the Irish maternity services. 20 The main aim of the MN-CMS Project was to design and implement an EHR for all women and babies who access the maternity services, in order to move from paper-based records to electronic records in the 19 maternity hospitals in the ROI. 20 Even though the EHR contains all the clinical information for women admitted to the hospital, it does not provide a definitive diagnosis of miscarriage. That means that the information in the medical notes and clinical reports for each hospitalization had to be individually reviewed.
A specialist registrar in obstetrics and gynecology and in pregnancy loss (Karen McNamara) was responsible for identifying the diagnosis and type of miscarriage by assessing the available information gathered in the EHR and classifying the diagnosis and type of miscarriage. The information available included medical notes for both outpatient and inpatient admissions, nursing reports of surgical procedures such as evacuation of retained products of conception (ERPC) and manual removal of placenta (MROP), histological exam results and ultrasound scans. This detailed information was not available in the HIPE dataset nor in the register books; therefore, the diagnosis and type of miscarriage identified by the specialist registrar in the EHR was considered the gold standard.

Register books
For the purpose of this study, charts of consecutive admissions of miscarriage and early pregnancy loss were retrospectively reviewed using register books from a dedicated ward at CUMH. Register books are paper-based records which contain key information related to the diagnosis and procedures of inpatient admissions during the admission process. Information identified from the register books included: MRN, age, gravidity and parity, weeks of gestation at admission, nature of loss (e.g., missed miscarriage, miscarriage, incomplete or complete miscarriage, late [second trimester] miscarriage, ectopic pregnancy, molar pregnancy), main procedure during admission (e.g., ERPC, medical treatment, manual removal of placenta), and the main examinations and investigations carried out during admission (i.e., post-mortem examination, histology, and/or cytogenetic investigations).

Data collection form
A data collection form was designed to collect information about miscarriage from the three sources. This data collection form was designed to provide a standardized collection of the data from the three databases, but also to discern discrepancies or duplications between the three datasets. The main variables included were: MRN, date of admission and discharge, weeks of gestation at admission, and diagnosis at admission, including missed miscarriage, incomplete or complete miscarriage, late miscarriage, or other early pregnancy loss such as ectopic pregnancy and molar pregnancy. Additional information such as the type of treatment undertaken during the hospitalization and histological reports was also added in order to ascertain the diagnosis of miscarriage and the classification of the type of miscarriage (e.g., molar or partial molar pregnancies, ectopic pregnancies). All the variables included in the data collection form can be seen in Table S2.

Definitions of miscarriage
It is known that the definition of miscarriage varies between countries and health organizations. 21 According to the National Clinical Guideline in Obstetrics and Gynecology in Ireland, miscarriage is defined as the loss of a pregnancy before 24 completed weeks of gestation, excluding perinatal deaths. 22 However, according to the coding definition standards used by the HIPE database, miscarriage is defined as the expulsion or extraction of the products of conception before 21 completed weeks of gestation. 23 This study followed the Irish clinical guidelines to classify the type of miscarriage as early or late based on the weeks of gestation in the EHR and the register books. When a miscarriage occurred before 13 weeks of gestation, it was classified as an early (first trimester) miscarriage, and when a miscarriage occurred at 13 or more weeks up to 24 weeks of gestation, it was classified as a late (second trimester) miscarriage. 24 However, HIPE data only use ranges of <5, 5 to 13, 14 to 19, 20 to 25, 26 to 33, and 34 to 36 completed weeks of gestation. Therefore, our analysis was restricted to early miscarriage, which was defined as a miscarriage before 14 completed weeks of gestation.
Furthermore, this study was interested in assessing the agreement between types of early miscarriage in the routinely collected health records in Ireland. Thus, early miscarriages were classified as incomplete, complete, and missed miscarriages. Incomplete miscarriage was identified if the woman presented with symptoms of vaginal bleeding and/or pain, and with retained products of conception (RPOC). Complete miscarriage was identified if all the RPOC had been expelled or extracted from the uterine cavity. Missed miscarriage was identified when the woman had experienced no symptoms; in such cases, women only become aware of the miscarriage following a routine ultrasound. 25
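The effect of the differing gestational cut-offs described above can be made concrete with a small sketch. The function names and return labels are hypothetical, weeks are assumed to be completed weeks given as whole numbers, and the upper limit of the "late" category is assumed to be exclusive at 24 weeks, in line with the clinical guideline definition.

```python
def clinical_class(weeks):
    """Early/late split applied to the EHR and register books (assumed bounds)."""
    if weeks < 0:
        raise ValueError("weeks of gestation cannot be negative")
    if weeks < 13:
        return "early"
    if weeks < 24:
        return "late"
    return "not classified as miscarriage"


# HIPE gestational bands as listed in the text (completed weeks).
HIPE_BANDS = [(5, 13), (14, 19), (20, 25), (26, 33), (34, 36)]


def hipe_band(weeks):
    """Return the HIPE band label for a given number of completed weeks."""
    if weeks < 0:
        raise ValueError("weeks of gestation cannot be negative")
    if weeks < 5:
        return "<5"
    for low, high in HIPE_BANDS:
        if low <= weeks <= high:
            return f"{low}-{high}"
    return "outside listed bands"


def early_for_analysis(weeks):
    """Early miscarriage as used in the analysis: before 14 completed weeks."""
    return 0 <= weeks < 14


# A loss at 13 completed weeks is 'late' by the clinical rule but falls in
# the HIPE '5-13' band, so it counts as early under the analysis definition.
example = (clinical_class(13), hipe_band(13), early_for_analysis(13))
```

The example at 13 completed weeks shows why the analysis had to adopt the 14-week threshold: the HIPE bands cannot separate week 13 from the earlier weeks.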

Exclusion criteria
Records were excluded from the analysis when miscarriage was not confirmed (e.g., threatened miscarriage, still pregnant at the time of the inpatient admission) and when the indication for admission to the hospital was related to an intervention which was undertaken in another country (e.g., termination of pregnancy). Duplications and non-early pregnancy losses (i.e., neonatal deaths and stillbirths) were also excluded from our analysis. In Ireland, stillbirth is defined as a child born weighing 500 g or more or having a gestational age of 24 weeks or more who shows no sign of life, according to the Stillbirths Registration Act 1994. 26

Statistical Analysis
Inpatient admissions for miscarriage and other types of early pregnancy loss were compared using 2 × 2 tables for each pair of data sources (i.e., HIPE versus EHR, HIPE versus register books, and EHR versus register books). In HIPE, inpatient discharges are counted as unique cases even though several discharges might be related to a unique miscarriage event. 23 When a woman, who was previously admitted and managed for miscarriage in the hospital, was readmitted because of a complication without being managed for the miscarriage per se, these cases were included in the analysis as complications after miscarriage.
The crude prevalence of miscarriage was calculated for each data source. Cohen's kappa was calculated to provide a measure of agreement for the diagnosis of miscarriage between two data sources (raters) using Stata's "kap" command. In addition, positive and negative predictive values (PPV and NPV) were calculated to assess the concordance of diagnosis of miscarriage between each pair of data sources. An exploration for more than two raters with binary outcomes was also carried out using Stata's "kappa" command. In this case, the nonunique rater case had two possible ratings, which were positive (when a diagnosis of miscarriage was made) and negative (when no diagnosis of miscarriage was made). Negative ratings were calculated by subtracting the total number of positive ratings from the total number of raters for each admission to the maternity hospital. All analyses were undertaken using Stata v.12. A summary of main formulas can be seen in Table S3.
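For readers who wish to see the pairwise measures written out, the following sketch computes Cohen's kappa, PPV, and NPV from a single 2 × 2 table using the standard textbook formulas. It is not the Stata code used in the study; the function name, the cell labels, and the counts in the example are hypothetical.

```python
def two_by_two_metrics(a, b, c, d):
    """Agreement and predictive values from one 2 x 2 table.

    a: positive in both the index source and the reference source
    b: positive in the index source only
    c: positive in the reference source only
    d: negative in both sources
    Assumes no row or column total is zero.
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    kappa = (observed - expected) / (1 - expected)
    ppv = a / (a + b)  # share of index positives confirmed by the reference
    npv = d / (c + d)  # share of index negatives confirmed by the reference
    return {"kappa": kappa, "ppv": ppv, "npv": npv}


# Hypothetical counts, for illustration only (not study data).
metrics = two_by_two_metrics(a=40, b=10, c=5, d=45)
```

With these illustrative counts, observed agreement is 0.85 and chance agreement is 0.50, which gives a kappa of 0.70, a PPV of 0.80, and an NPV of 0.90.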

Ethical approval and consent to participate
This study received ethical approval from the Clinical Research Ethics Committee of the Cork Teaching Hospital on ECM 4 (I) 17/10/2017. A patient consent form was not required by the Ethics Committee because this was an observational study which did not include any intervention and which examined routinely collected data. All data and information were stored safely and securely.

Results
Overall, a total of 405 records were reviewed across the three sources (i.e., HIPE, EHR, and register books). Figure 1 outlines the records which were included and excluded in this study for each data source. This study included 385 records after excluding duplicates and other types of inpatient admissions which did not meet the inclusion criteria (i.e., neonatal death, stillbirth, gynecology hospital admissions). After excluding duplicates, 4 records were excluded from EHR, 12 records from HIPE, and 127 records from the register books. These records were excluded if they were not registered in HIPE but were registered in the EHR and the register books, or vice versa, or when the diagnosis of miscarriage was not identified. The EHR did not have any missing records compared to the HIPE database or the register books.
Following exclusions, 304 inpatient admissions of miscarriage were identified out of a total of 370 records of early pregnancy loss in EHR (82.2%), 291 out of 360 records in HIPE (80.8%), and 219 out of 255 records in register books (85.9%).
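The percentages reported above follow directly from the counts for each source. The short sketch below reproduces the arithmetic; the function and variable names are hypothetical, and only the counts are taken from the text.

```python
def crude_prevalence(miscarriages, total_records):
    """Percentage of early pregnancy loss records diagnosed as miscarriage."""
    return round(100 * miscarriages / total_records, 1)


# (miscarriage admissions, total early pregnancy loss records) per source.
counts = {
    "EHR": (304, 370),
    "HIPE": (291, 360),
    "register books": (219, 255),
}

prevalence = {
    source: crude_prevalence(miscarriages, total)
    for source, (miscarriages, total) in counts.items()
}
```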

Comparing EHR, HIPE, and register books
When comparing the three data sources (N = 385), 3 (0.78%) records were rated by one database (rater), 137 (35.5%) records were rated by two databases, and 245 (63.6%) by the three databases (raters). Of the 316 diagnosed miscarriages, more than half of the records were rated by all three databases as admissions to the hospital because of a diagnosis of miscarriage (n = 196; 50.9%); 27.5% (n = 106) of records had two positive diagnoses of miscarriage, and 3.6% (n = 14) of records had only one positive diagnosis of miscarriage (Figure 2). A total of 17.9% (n = 69) of the records were identified as having no diagnosis of miscarriage at admission to the maternity hospital or were excluded. This study obtained a very good level of agreement when comparing the three data sources (k = 0.84; p-value <0.001).
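Because records here were rated by one, two, or three sources, agreement has to be measured with an estimator that allows a varying number of raters per record. The sketch below uses a Fleiss-type kappa for a binary outcome; it is assumed, but not verified here, to be comparable to the statistic reported by Stata for this case. The function name and the example ratings are hypothetical, and the study's record-level data are not reproduced.

```python
def kappa_binary_varying_raters(ratings):
    """Fleiss-type kappa for a yes/no rating with varying raters per record.

    ratings: list of (positives, raters) pairs, one per record. Negative
    ratings are raters minus positives, as described in the methods.
    Assumes at least one record has two or more raters and that both
    positive and negative ratings occur.
    """
    n = len(ratings)
    total_raters = sum(m for _, m in ratings)
    mean_raters = total_raters / n
    p_bar = sum(x for x, _ in ratings) / total_raters  # overall share positive
    q_bar = 1 - p_bar
    # Within-record disagreement; zero when all raters of a record agree.
    within = sum(x * (m - x) / m for x, m in ratings)
    return 1 - within / (n * (mean_raters - 1) * p_bar * q_bar)


# Hypothetical ratings: full agreement on every record.
perfect = kappa_binary_varying_raters([(3, 3), (0, 3), (2, 2), (0, 2)])

# Hypothetical ratings: one record on which two raters disagree.
partial = kappa_binary_varying_raters([(3, 3), (0, 3), (1, 2)])
```

Full agreement gives a kappa of 1, while the single split record in the second example lowers it to 0.6, which illustrates how a small number of discordant records reduces the statistic.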

Classification of type of miscarriage by three sources
A considerable discrepancy was identified in the classification of the type of miscarriage across the three data sources (Table 1). For example, the percentage of missed miscarriages recorded in HIPE (n = 16; 4.2%) was considerably lower than the percentage classified as missed miscarriage by EHR (n = 173; 44.9%) and in the register books (n = 150; 39.9%). In fact, 60% of admissions were classified as incomplete miscarriage according to HIPE (n = 231; 60.0%; Table 1). A higher number of ectopic pregnancies was identified in HIPE (n = 58; 15.1%) compared to the EHR (n = 44; 11.4%) or the register books (n = 32; 8.3%). The number of molar pregnancies was almost identical between the EHR and the register books, but this number increased moderately in the HIPE database (n = 6, 1.6%; n = 4, 1.0%; and n = 11, 2.9% for EHR, register books, and HIPE, respectively; Table 1).

Classification of late miscarriage
Less discrepancy was found between the three data sources when classifying late (second trimester) miscarriage. Both HIPE and register books identified 28 (7.3%) inpatient admissions for late miscarriage compared to 37 (9.6%) identified by EHR.

Classification of missing records
This review found some missing records in the HIPE database and the register books compared to the EHR. A total of 95 records were not recorded or missing in the register books but were recorded in the EHR or in HIPE (Table 2). Of these, the most frequent classifications in EHR were missed miscarriage (n = 44; 46.3%), ectopic pregnancy (n = 20; 21.1%), and incomplete miscarriage (n = 14; 14.7%). Of these 95 cases, the most frequent classifications in HIPE were incomplete miscarriage (n = 50; 52.6%), ectopic pregnancy (n = 29; 30.5%), and missed miscarriage (n = 5; 5.3%). Of the 30 records with a missing diagnosis of miscarriage in the register books, but identified in the EHR and HIPE, the most frequent classification in EHR was missed miscarriage (n = 15; 57.7%) and the most frequent classification in the HIPE database was incomplete miscarriage (n = 24; 80%; Table 2). Of the 12 records identified in the EHR, but not identified in HIPE, 4 (33.3%) were classified as incomplete miscarriage, 3 (25.0%) were classified as late miscarriage, and another 3 (25.0%) were classified as ectopic pregnancy (Table 2). Of the 12 records identified in the register books, but not identified in HIPE, 3 (25.0%) were classified as missed miscarriage, 3 (25.0%) were classified as late miscarriage, and another 3 (25.0%) were classified as ectopic pregnancy (Table 2).

Main findings
In this retrospective concordance study, EHR records confirmed 98.3% of HIPE and 96.3% of register book diagnoses of miscarriage. The level of agreement between each pair of data sources was found to be good or very good, and the level of agreement between the three data sources was found to be very good. However, a considerable discrepancy in the identification of the type of miscarriage was found between the three data sources. EHR and register books were more likely to classify admissions as missed miscarriages compared to the HIPE dataset. This could be explained by the fact that HIPE does not include a standardized definition for missed miscarriage in its codebook. In addition, EHR identified late (second trimester) miscarriages more often than HIPE and register books. There is a lack of standardization in definitions of late miscarriage between the three data sources, which explains the considerable misclassification when comparing this specific type of miscarriage. Based on our analysis, this study recommends EHR and HIPE as the preferred sources of reliable data to report the number of inpatient admissions of miscarriage. However, the authors believe that EHR is the preferred source of reliable information to identify the type of miscarriage, given that HIPE uses different definitions to classify types of miscarriage.

Comparison with other studies
Only three studies that assessed the concordance of the outcome of miscarriage in hospital settings were identified in the literature. The first study compared the diagnosis of miscarriage between the Danish National Registry of Patients (DNRP) and discharge records from hospital files between 1980 and 2008. 11 16 The third study assessed the concordance of the diagnosis of missed miscarriage using the ICD-9 diagnosis code from hospital electronic The authors concluded that the code for missed miscarriage "632" had low sensitivity for identifying stable women with a missed miscarriage (41.9%), with high specificity (98.6%), and moderately high PPV (75%). 15 Therefore, the cases identified using ICD-9 code "632" included few false positives, although many true cases of stable miscarriage were missed. Part of the explanation for such a low sensitivity was that other similar codes were used to diagnose miscarriage, such as threatened miscarriage or hemorrhage complications.
In keeping with our findings, the lack of a standardized definition of type of miscarriage, or the lack of training to identify types of miscarriage will introduce variation in the hospital data records. Therefore, it may affect reliability and accuracy of these data in epidemiological studies.

Strengths and limitations
To our knowledge, this is the first study validating the diagnosis of miscarriage using three sources in Ireland. In this study, a team of trained administration staff was in charge of the coding process at HIPE. As a consequence, our findings might not extend to other hospitals where care providers are responsible for coding the diagnosis of miscarriage themselves. However, it also implies less variability, because the ICD-10-AM codes were assigned by professional coders rather than by health professionals, and our findings could be generalized to other hospitals where medical coders are in charge of the coding process of miscarriage. Secondly, the lack of standardized definitions of miscarriage and type of miscarriage in the literature, nationally and internationally, might have influenced the variation in the classification of the type of miscarriage between our sources. The discrepancies between the cut-offs of weeks of gestation do not affect the overall diagnosis of miscarriage (i.e., miscarriage yes or no) but influence the classification of early and late miscarriage. This will affect the type of treatment and investigations associated with the type of miscarriage; therefore, it influences the hospital activity and costs, national rates, and morbidity investigated.
Finally, the authors are aware that a "true gold" standard would have involved a retrospective review of all ultrasound scans examining intrauterine contents and viability for women with suspected or confirmed miscarriage, paired with a quantitative human chorionic gonadotropin (hCG) hormone level by a specialist in the field. This review would have determined whether or not this was a true loss, ectopic pregnancy, molar pregnancy, or a viable pregnancy with complications, and whether or not this pregnancy was correctly recorded in the EHR.

Implications for practice and/or policy
Reporting reliable prevalence and trends in incidence rates provides information about the burden of miscarriage at national and international levels. Although the level of agreement for the diagnosis of miscarriage was found to be good or very good between the three data sources, this study found high variability when comparing the classification of the type of miscarriage. This might be explained by the fact that the process of miscarriage is a continuum where the diagnosis might evolve during the stay at the hospital. For example, a woman can be diagnosed with an incomplete miscarriage at the beginning of the hospitalization process, but she could also be diagnosed as having a complete miscarriage before discharge and after expulsion or retrieval of the RPOC. Identifying the correct type of miscarriage during the hospitalization process is very important because the treatment pathways available are intrinsically related to the type of miscarriage (i.e., incomplete, complete, or missed miscarriage). There is a need to standardize definitions of miscarriage between data sources not only in the ROI, but also at an international level. An effort needs to be made to modify the definitions used by HIPE, in Ireland, in accordance with the Irish legislation. The European Society of Human Reproduction and Embryology (ESHRE) published a series of papers which attempted to standardize the terminology for the classification of the different types of pregnancy loss for research purposes worldwide. 27,28 These definitions and classifications should be incorporated in the ICD codes in order to establish consistency in the semantics used in national and international electronic health records in this field. In doing so, the classification of miscarriage and data reliability of miscarriage may improve.
This study found a considerable number of records missing in the register books compared to EHR and HIPE. It is well reported that doctors, midwives, and nurses experience high levels of workplace stress and burnout in Ireland and in Europe. 29,30 The introduction of the EHR represents a change in the way data are routinely entered in the maternity services in Ireland, and it may have increased the workload of healthcare professionals. Currently, patients' information must be entered in both electronic and paper-based records. There is evidence of inconsistency in patients' medical documentation when both paper-based and electronic health records are used in the same system; this can impact health care staff's decision making in their daily practice. 31 The duplication of information might result in subsequent human error, as healthcare professionals have to enter the same clinical information into two different databases at the same time. One of the recommendations is to combine the information from both systems where appropriate. 32 In our concordance study, register books had the highest number of missing records compared to the EHR and the HIPE database. Record keeping is a fundamental part of clinical practice. 33 In order to provide high quality data in record keeping, the World Health Organization set out a list of strategies to maintain the quality of documentation practice. 33 If books have to be used, regular audits of their contents and of the accuracy of their data should be carried out. However, the ultimate goal in medical documentation is to combine all the data into a single electronic system. 32 It may be time to move to electronic-only records to decrease the workload of health professionals in the maternity services. Ultimately, health care professionals need to be aware of their important role when entering medical information in either paper-based records or EHR.
It is also time to enable access to the electronic health records for simple data analysis such as numbers of miscarriage. Consequently, ensuring reliable data on type of miscarriage would allow investigation of the implications and morbidity of different management strategies. It might also influence future changes in the supports available to women who miscarry nationally and internationally. Finally, it is becoming more common that women who miscarry are managed and medically treated at the outpatient department, and in outpatient early pregnancy units. In order to improve protocols and care of women who experience miscarriage, both outpatient and inpatient data should be available in the national health systems.
In conclusion, using electronic health records (EHR) as a "gold standard," this study found a good and very good level of agreement between HIPE and register books (paper-based records) for identifying inpatient admissions for miscarriage in a tertiary maternity hospital in Ireland. However, the high number of missing records or unclear diagnosis limited the usefulness for monitoring and reporting the prevalence of miscarriage based on the register books. In addition, identification of the type of diagnosis of miscarriage varied significantly between the three data sources. According to the statistical analysis, EHR and HIPE are sufficiently reliable and valid databases for monitoring and reporting prevalence and trends in inpatient admissions of miscarriage at a national level, but some improvements are needed. However, the authors believe that EHR is the preferred source for obtaining types of diagnosis of miscarriage when it is assessed by an experienced and specialized professional in pregnancy loss.

Supporting information
Additional Supporting Information may be found in the online version of this article at the publisher's website: Table S1. Description of the main ICD-10-AM codes for the diagnosis of early pregnancy loss in HIPE.