Sixteen years of creatinine measurements among 460 000 individuals—The Funen Laboratory Cohort (FLaC), a population‐based pharmacoepidemiological resource to study drug‐induced kidney disease

Register‐based administrative data comprise the backbone of pharmacoepidemiological research. However, information from these registers lacks biochemical details. The aim of our study was to describe the creation, coverage and content of the Funen Laboratory Cohort (FLaC). FLaC is a database comprising all inhabitants of Funen, Denmark, who had their creatinine levels measured in the study period of January 2000 to December 2015. Data were linked to the Danish nationwide registers with information on vital status, redeemed prescriptions, discharge diagnoses, and socio‐economic status. A total of 693 843 individuals lived on Funen during the study period, and we included 460 365 (66.4%) individuals with a creatinine measurement. In total, 7 742 124 creatinine measurements were performed during the study period. The coverage increased with increasing age, reaching 90%‐100% of all 65‐90+ year‐olds in 2015. The overall coverage of individuals redeeming prescriptions from public pharmacies, that is, the proportion of such individuals in the entire Funen population who were recorded in FLaC with at least one creatinine measurement, was 83% (interquartile range [IQR] 75%‐89%). In total, 94.1% of all individuals with a discharge diagnosis of chronic kidney disease (CKD) were covered in FLaC, but only 16.5% (n = 3136) of all individuals with a laboratory‐confirmed CKD also had a discharge diagnosis of CKD. We described the creation and content of the FLaC, a haven and a valuable resource for pharmacoepidemiological research using Danish nationwide administrative registers enriched with individual‐level biochemical information in a population‐based setting.


| INTRODUCTION
Register-based administrative healthcare and socio-economic data comprise the backbone of pharmacoepidemiological research. 1 Denmark has been named "the epidemiologist's dream" due to government-funded universal healthcare, a tradition of record-keeping, and the possibility of individual-level linkage. 2 This is made possible by the Danish Civil Registration System, where all Danish citizens are assigned a unique personal identification number at birth (a CPR number). 3 A common way to identify diseases or conditions in Danish register-based studies is to use the Danish National Patient Register, which contains information on all in- and outpatient contacts, including discharge diagnoses. 4 A large number of validation studies have been conducted, showing that sensitivity, specificity and positive predictive values differ widely depending on the discharge diagnoses studied. 4 Discharge diagnoses of, for example, chronic kidney disease (CKD) and acute kidney injury (AKI) have been used to identify patients with these conditions. However, discharge diagnoses as surrogate markers of AKI and CKD have been shown to underestimate the true incidence. 5,6 Large prospective cohorts exist where values of plasma or serum creatinine are collected, especially in the United States. 7 Unfortunately, these prospective cohort studies are prone to a lack of generalizability, because enrolled individuals are likely to differ substantially from the general population. 8 The aim of the study was to describe the content and coverage of the Funen Laboratory Cohort (FLaC), a population-based pharmacoepidemiological resource combining administrative register information enriched with results from blood sample analyses.

| METHODS
The content and coverage of FLaC during the period of January 2000 to December 2015 were described using utilization statistics for individual-level data as used in pharmacoepidemiological studies. 9 The study was conducted in accordance with the Basic & Clinical Pharmacology & Toxicology policy for experimental and clinical studies. 10

| Setting
The main island of Funen and the small islands surrounding Funen (hereafter referred to as Funen), with a population of 491 474 in 2016, are located centrally in Denmark and are a part of one of the five regions in Denmark, the Region of Southern Denmark (1 211 770 inhabitants in 2016). We have previously demonstrated that the Region of Southern Denmark in most socio-economic and healthcare aspects is representative of the entire Danish population, 11 and this also applies to Funen 12 (Figure 1).

| Cohort definition
Eligible individuals were all inhabitants on Funen at any time during the period of 1 January 2000 to 31 December 2015, who had one or more entries in the NetLab/BCC database.
We excluded individuals who had their creatinine levels measured, but did not reside on Funen at any time within the entire study period.
We also retrieved information from the above-mentioned nationwide Danish administrative registers on all individuals residing on Funen within the study period who did not have their creatinine levels measured. We did this to characterize individuals who had their creatinine levels measured vs those who had not. Because we had administrative information on all Funen inhabitants, including individual-level information on deaths and migration, we were able to keep track of individual follow-up. Thus, FLaC refers to the cohort of individuals with biochemical information available (individuals who had at least one creatinine measurement within the study period), but the data resource also contains administrative information on all individuals who lived on Funen within the study period.
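The inclusion and exclusion rules above can be sketched as simple set operations. This is an illustrative sketch only, not the authors' implementation (the analyses were run in Stata): the function names, the dictionary-based inputs and the representation of residence as (start, end) spells are all assumptions made for the example.

```python
from datetime import date

STUDY_START = date(2000, 1, 1)
STUDY_END = date(2015, 12, 31)


def in_study_period(sample_date):
    """True if a blood sample date falls inside the study period."""
    return STUDY_START <= sample_date <= STUDY_END


def overlaps_study_period(start, end):
    """True if a residence spell overlaps the study period.

    end=None is taken to mean the person is still resident (assumption).
    """
    return start <= STUDY_END and (end is None or end >= STUDY_START)


def build_cohort(creatinine_dates, residence_spells):
    """Split residents into the FLaC cohort and unmeasured residents.

    creatinine_dates: dict person_id -> list of creatinine sample dates
    residence_spells: dict person_id -> list of (start, end) spells on Funen
    Both input structures are hypothetical.
    """
    residents = {
        pid for pid, spells in residence_spells.items()
        if any(overlaps_study_period(s, e) for s, e in spells)
    }
    measured = {
        pid for pid, dates in creatinine_dates.items()
        if any(in_study_period(d) for d in dates)
    }
    # Measured individuals who never resided on Funen are excluded.
    flac = measured & residents
    unmeasured_residents = residents - measured
    return flac, unmeasured_residents
```

Keeping the unmeasured residents as a separate set mirrors the point made above: the resource holds administrative data on everyone who lived on Funen, while FLaC proper is the measured subset.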

| Data sources
We created the FLaC by combining information from a clinical laboratory database (the laboratory database of Odense University Hospital [NetLab/BCC]) and several nationwide Danish registers, including the Danish National Prescription Register, 13 the Danish National Patient Register, 4 and the other registers described below.

| NetLab/BCC
The laboratory database of Odense University Hospital is a clinical laboratory system which comprises data on all blood samples analysed in all hospital laboratories on Funen since 11 November 1999. The database is considered to provide full coverage of the geographical area. The coverage includes both primary and secondary healthcare providers as well as both in- and outpatients. However, results from point-of-care testing at general practitioners are not included. These tests represent an unknown proportion of the analysis results for C-reactive protein, sedimentation rate, haemoglobin and blood glucose that may also be measured point-of-care. Also, some arterial blood gas analyses and blood glucose measurements were performed at hospital wards (primarily intensive care units and emergency departments) with independent equipment.
From March 2013 and onwards, the database was extended to include all blood samples analysed in hospital laboratories in the entire Region of Southern Denmark.
We retrieved all analysis results from individuals who had their plasma creatinine levels measured in the study period of 1 January 2000 to 31 December 2015. The variables included the personal identification number, the date of blood sample analysis, the nomenclature and results of the analysed blood test.

| Nationwide Danish registers
The Danish National Prescription Register contains data on all prescription drugs redeemed by Danish citizens at outpatient pharmacies since 1995. 13 Prescription data include the Central Person Register number, date of dispensing, the substance, formulation, unit strength, brand name and quantity. Drugs are categorized according to the Anatomical Therapeutic Chemical (ATC) code, developed by the World Health Organization for purposes of drug use statistics. 15 The quantity for each prescription is expressed by the defined daily dose (DDD) unit, also developed by the World Health Organization. The prescribed daily dose and the treatment indication are not recorded systematically.
The Danish National Patient Register contains data on all contacts to Danish hospitals since 1977. 16 From 1995, outpatient diagnoses and emergency department contacts have been included in the register. Discharge diagnoses are coded according to International Classification of Diseases, eighth revision (ICD-8), from 1977 to 1993 and ICD-10 since 1994.
The Danish Civil Registration System contains data on vital status (date of birth and death) and migrations to and from Denmark. 3 Owing to the unique Central Person Register number, and as the Danish National Health Service provides universal tax-supported healthcare for the entire Danish population, it is possible to conduct true population-based register-linkage studies covering the entire population. 3 The Income Statistics Register includes anyone who has submitted a tax return to the Danish Tax Administration, and as such the register covers anyone who is economically active with residence in Denmark, whether it be permanent or not. 14 The Income Statistics Register is updated annually with a delay of approximately 14 months (from the last day in the calendar year).
The Danish Education Register holds information on the individual's education length, unique institution identification number, individual enrolment and completion dates, as well as identifiers for ongoing and completed education. 17 Further, the database holds information from several other nationwide registers: the Danish Cause of Death Register 18 and the Danish Register of Sickness Absence Compensation Benefits and Social Transfer Payments. 19 A comprehensive list of variables and data formats is available in the referenced papers.

| Creatinine level measurements
Plasma creatinine was analysed on an Architect instrument (Abbott, Wiesbaden, Germany) using an enzymatic colorimetric method with end-up reaction. The measurement range was 8.8-3536 μmol/L. Quality control was assured with internal (SERO, Norway) and external (Ringversuche, Germany) control programmes. The coefficient of variation (CV) for inter- and intra-assay variation was <3.6%.
In June 2009, the creatinine analysis changed to a more sensitive enzymatic measurement method, which is accounted for in the dataset.
We defined CKD based on the KDIGO definition, 20 requiring an eGFR below 60 mL/min to be measured and recorded at least twice during a period of 6 months. The 6-month period was divided into two windows: 0-3 and 4-6 months. If an individual had more than one eGFR measured within the 0-3-month window, the mean eGFR had to be below 60 mL/min; the same applied to the 4-6-month window. We calculated the eGFR based on the CKD-EPIcrea formula. 21
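As a rough illustration of this case definition, the sketch below computes eGFR from plasma creatinine and applies the two-window rule. Several points are assumptions rather than details from the text: the formula is taken to be the 2009 CKD-EPI creatinine equation (which returns mL/min/1.73 m²), the race coefficient is optional and off by default, the windows are read as days 0-90 and 91-180 after a hypothetical index date, and all function names are invented for the example.

```python
from datetime import date
from statistics import mean

UMOL_PER_MG_DL = 88.4  # creatinine: 1 mg/dL corresponds to 88.4 umol/L


def egfr_ckd_epi_2009(creatinine_umol_l, age_years, female, black=False):
    """eGFR by the 2009 CKD-EPI creatinine equation (assumed formula)."""
    scr = creatinine_umol_l / UMOL_PER_MG_DL  # convert to mg/dL
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def has_lab_ckd(index_date, egfr_results, threshold=60.0):
    """Apply the two-window rule to a list of (sample_date, egfr) pairs.

    Both windows must contain at least one measurement, and the mean eGFR
    in each window must be below the threshold. Window boundaries in days
    are an assumption; the text only states 0-3 and 4-6 months.
    """
    early = [v for d, v in egfr_results if 0 <= (d - index_date).days <= 90]
    late = [v for d, v in egfr_results if 91 <= (d - index_date).days <= 180]
    if not early or not late:
        return False
    return mean(early) < threshold and mean(late) < threshold
```

Averaging within each window, rather than requiring every single value to be below the threshold, follows the rule as described above and makes the definition less sensitive to one transient fluctuation.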

| Analyses
Categorical demographic variables were presented as absolute numbers and proportions, and continuous variables with medians and interquartile ranges (IQR).
We calculated period prevalence proportions, that is the number of individuals with a creatinine measurement per 1000 in the population, from 2000 to 2015 using the total population living in Funen on 1 January of the relevant year as the denominator.
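The calculation can be written out as a short sketch. The function name is hypothetical and the counts in the example are made up for illustration; they are not FLaC data.

```python
def period_prevalence_per_1000(n_measured, population_on_1_jan):
    """Individuals with a creatinine measurement in a year, per 1000 residents.

    The denominator is the population on 1 January of the same year.
    """
    if population_on_1_jan <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * n_measured / population_on_1_jan


# Made-up yearly counts: year -> (measured individuals, population on 1 January)
example_years = {2000: (1_000, 5_000), 2015: (2_100, 5_000)}
prevalence = {
    year: period_prevalence_per_1000(n, pop)
    for year, (n, pop) in example_years.items()
}
```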
We presented the 20 most frequent types of laboratory tests among all individuals with a creatinine measurement within the study period.
We presented an adapted version of the Lorenz curve to describe the proportion of the total number of creatinine measurements that is accounted for by percentiles of individuals with a creatinine measurement. 22 We presented the proportion of drugs redeemed from public pharmacies covered in FLaC specified by each ATC code (2nd level, therapeutic subgroup) for the period of January 2000 to December 2015.
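A minimal sketch of the Lorenz-type calculation is given below. It assumes that individuals are ranked from most to fewest measurements, so that the curve answers "what share of all measurements is accounted for by the top x% of individuals"; the function names and the rounding rule for the group size are choices made for this example, not taken from the cited method.

```python
def share_of_measurements(counts, top_fraction):
    """Share of all measurements held by the most-measured individuals.

    counts: number of creatinine measurements per individual
    top_fraction: fraction of individuals to include, in (0, 1]
    """
    if not counts:
        raise ValueError("counts must not be empty")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ordered = sorted(counts, reverse=True)
    # Always include at least one individual.
    k = max(1, round(top_fraction * len(ordered)))
    return sum(ordered[:k]) / sum(ordered)


def lorenz_points(counts, fractions=(0.01, 0.1, 0.5, 1.0)):
    """Evaluate the curve at selected fractions of individuals."""
    return [(f, share_of_measurements(counts, f)) for f in fractions]
```

A curve that rises steeply at small fractions indicates that a small group of frequently tested individuals contributes a large part of all measurements.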
Stata Version 14.1 (StataCorp, College Station, TX, USA) was used for all analyses.

| Ethics and data protection
This study, as well as future studies regarding drug-induced kidney injury using the above-mentioned data sources, was approved by the Danish Data Protection Agency (j.nr 2008-58-0034) and the Danish Patient Safety Authority (j.nr. 3-3013-809/1). According to Danish law, studies based solely on register data do not require approval from an ethics review board. 23 The data from NetLab/BCC were securely transferred to Statistics Denmark and linked to the administrative registers hosted by Statistics Denmark. All access to data was through a research server placed at Statistics Denmark, separated from the production network and reached via VPN. The research server contained only de-identified microdata for research purposes subject to strict data protection protocols, approved by the Danish Data Protection Agency. All variables containing information which potentially could identify the individual were removed by Statistics Denmark. Statistics Denmark has strict rules against transfer of microdata back to the researcher, and only aggregated information could be emailed back. 24

| RESULTS

| Demographics
In total, 468 823 individuals had their plasma creatinine levels measured, of whom 8458 individuals did not live on Funen at any time within the study period and were excluded from the cohort; thus, we included 460 365 individuals with a creatinine measurement within the study period. A total of 693 843 individuals lived on Funen during the study period, which leaves 233 478 (33.6%) without a creatinine measurement.
The entire population of Funen contributed with a total of 10 349 042 years of follow-up time within the study period, where individuals with a creatinine measurement contributed with 7 937 358 (76.7%) years of follow-up time.
The median age of patients at their first creatinine measurement in the study period was 44 years. Demographic characteristics of individuals at their initial creatinine measurement within the study period are presented in Table 1.

| Distribution of number of measurements
We described the distribution of number of measurements by use of the Lorenz curve. 9 In total, 1% of individuals represented 11.7% of the total number of creatinine measurements, and 50% of individuals represented 91.0% of the total number of creatinine measurements (Figure S1).

| Prevalence, laboratory tests and coverage
The prevalence of individuals who had their creatinine levels measured increased during the study period from 198.7 per 1000 individuals in 2000 to 423.4 per 1000 individuals in 2015 (Figure 2). A higher prevalence among females was observed throughout the study period.
The most frequent laboratory test to be analysed among all individuals with a creatinine measurement (excluding creatinine) was potassium (number of tests: 6 564 021; number of persons: 451 355) followed by sodium (number of tests: 6 456 346; number of persons: 451 032) and haemoglobin (number of tests: 5 769 845; number of persons: 421 652). The twenty most frequent tests are presented in Table 2.
The coverage was generally high and increased with increasing age, severity of comorbidity and among individuals with discharge diagnoses of different conditions (Table 1). When assessing the coverage of the FLaC, only about 10% of all 5-10-year-old Funen inhabitants were covered. The coverage increased with age, thus covering 90%-100% of all 65-90+ year-olds in 2015 (Figure S2).
The coverage of patients with co-morbid conditions based on identification using ICD-10 discharge diagnoses was high. The lowest coverage was among individuals with a discharge diagnosis of asthma (74.1%), whereas 94.1% of all Funen inhabitants with rheumatoid arthritis were covered in FLaC.
The overall coverage of individuals redeeming prescriptions from public pharmacies, that is, the proportion of such individuals in the entire Funen population who were recorded in FLaC with at least one creatinine measurement, was 83% (interquartile range [IQR] 75%-89%). Among ATC categories with at least 10 individuals redeeming prescriptions, the highest coverage was for the ATC code H05 (calcium homeostasis) at 98%, followed by C10 (lipid modifying agents) at 95% and M09 (other drugs for disorders of the musculo-skeletal system) at 94.5%. The lowest coverage was in the ATC group P02 (anthelmintics) at 62% (Figure 3).
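The coverage by drug group can be sketched as follows. This is an illustration under assumptions: coverage for an ATC group is taken to be the proportion of all Funen residents redeeming a drug in that group who are members of FLaC, the summary is taken to be the median and IQR across ATC groups, and the 10-user minimum is applied to all groups. The function names, the set-based inputs and the choice of quartile method are hypothetical.

```python
from statistics import median, quantiles


def atc_coverage(redeemers_by_atc, flac_ids, min_users=10):
    """Coverage per ATC group: share of redeemers who are in FLaC.

    redeemers_by_atc: dict ATC code -> set of person ids redeeming that group
    flac_ids: set of person ids with at least one creatinine measurement
    Groups with fewer than min_users redeemers are skipped.
    """
    coverage = {}
    for atc, users in redeemers_by_atc.items():
        if len(users) >= min_users:
            coverage[atc] = len(users & flac_ids) / len(users)
    return coverage


def median_iqr(values):
    """Median with lower and upper quartiles (needs at least two values)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3
```

Quartile conventions differ between statistical packages, so the IQR from this sketch need not match Stata output exactly for small numbers of groups.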

| Chronic kidney disease-laboratory measurements vs ICD-10 discharge diagnoses
Based on laboratory measurements, we identified 19 053 (2.8%) individuals with CKD of the 693 843 individuals who lived or had lived on Funen in the study period, whereas 8992 (1.3%) of the population had an ICD-10 discharge diagnosis of CKD. In total, 94.1% of all individuals with an ICD-10 discharge diagnosis of CKD were covered in FLaC, whereas only 16.5% (n = 3136) of all individuals with a laboratory-confirmed CKD also had an ICD-10 discharge diagnosis of CKD. Two thirds (n = 5856) of all individuals with an ICD-10 discharge diagnosis of CKD did not have a laboratory-confirmed CKD diagnosis.
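The agreement between the two definitions can be re-derived from the counts above. In the sketch below, laboratory-confirmed CKD is treated as the reference standard, which is an assumption made for the illustration; under that assumption the 16.5% corresponds to the sensitivity of the discharge diagnosis, and the proportion of diagnosed individuals with laboratory confirmation plays the role of a positive predictive value. The function name is hypothetical.

```python
def agreement(n_lab, n_icd, n_both):
    """Agreement measures, taking laboratory-confirmed CKD as the reference.

    n_lab: individuals with laboratory-confirmed CKD
    n_icd: individuals with an ICD-10 discharge diagnosis of CKD
    n_both: individuals fulfilling both definitions
    """
    return {
        "sensitivity": n_both / n_lab,
        "ppv": n_both / n_icd,
        "icd_without_lab": n_icd - n_both,
        "lab_without_icd": n_lab - n_both,
    }


# Counts as reported in this section.
result = agreement(n_lab=19053, n_icd=8992, n_both=3136)
sensitivity_pct = round(100 * result["sensitivity"], 1)
ppv_pct = round(100 * result["ppv"], 1)
```

With these counts, `result["icd_without_lab"]` reproduces the 5856 individuals with a discharge diagnosis but no laboratory-confirmed CKD.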

| DISCUSSION
In this study, we have shown that two-thirds of all inhabitants on Funen had their creatinine level measured within the period of January 2000 to December 2015. Individuals who had their creatinine levels measured were more likely to be female and were older than those who did not have their creatinine measured. We found a high coverage, especially among individuals with a high level of comorbidity and high age. We also found a high coverage of individuals redeeming prescriptions from public pharmacies. Finally, we were able to show that only 16% of individuals with a laboratory-based CKD had an ICD-10 discharge diagnosis of CKD.
Large population-based studies of drug-induced kidney injury are currently gaining popularity, yielding interesting results with the potential to change current clinical practice. 25-30 However, a number of these studies use surrogate markers of renal impairment or use laboratory values from selected patient populations. As we have reported in the current study, the accuracy of the CKD diagnosis is low, which increases the risk of selection bias.
Within the framework of FLaC, we can reproduce these studies or explore signals from clinical trials in a "real-world" setting. The framework allows other kinds of study methodologies, like the association between CKD and other diseases, 31 socio-economic status and CKD 32 or hypothesis-free screening studies. 33 One other resource of laboratory test results exists in Denmark, the clinical laboratory information system (LABKA) research database at Aarhus University, 34 and in Sweden, a cohort similar to ours has been developed. 35 Also, several international resources exist. The Kent Integrated Dataset (KID) contains information on diagnostic tests and test results. 36 The Stockholm CREAtinine Measurements project (SCREAM) covers all Stockholm residents with a creatinine level measured during the period of 2006 to 2011. The coverage and representativity were high. Similar to our findings, the coverage increased with age.
The current set-up has several strengths. The population-based setting allowed us to record creatinine measurements regardless of an individual's socio-economic or insurance status. The completeness and systematic collection of the laboratory database and completeness of the nationwide administrative registers allowed the analyses to be conducted over a 16-year period with no risk of recall bias or drop-out.
The data of the current database are stored on approved research servers hosted by Statistics Denmark, containing only de-identified microdata undergoing strict data protection protocols. The access is only possible via VPN, which minimizes the risk of data breach.
We assessed the use of ICD-10 discharge diagnoses as a surrogate marker of CKD and found that only 16% of all individuals with a laboratory-based CKD had an ICD-10 discharge diagnosis of CKD. Interestingly, two-thirds of all individuals with an ICD-10 discharge diagnosis of CKD did not have a laboratory-confirmed diagnosis. This could be because the look-back period of the two data sources was different, but also because the ICD-10 discharge diagnoses were more susceptible to heterogeneous coding practice.
Some limitations apply to the current set-up as well. Creatinine is widely used as a marker of renal function, and many primary and secondary care physicians rely on the blood test to monitor patients' renal function. The limitations of using creatinine to assess renal function have previously been addressed, where one in six patients with a creatinine level measured within normal range was diagnosed with renal impairment. 37 Therefore, several algorithms have been developed to estimate the glomerular filtration rate (eGFR) and to stage chronic kidney disease (CKD). 20 The widely used Modification of Diet in Renal Disease (MDRD) formula was recommended by the International Society of Nephrology until 2009, when it was replaced by the CKD-EPIcrea formula. 21 Although it was possible to calculate both MDRD and CKD-EPIcrea, we chose to present creatinine measurements in this study as our aim was to assess the use of creatinine measurements in a population-based setting and not to assess the renal function in a population. However, we presented measures of validity in ICD-10 discharge diagnoses of CKD.
Another limitation of the study is the potential for selection. While creatinine is widely assessed in in-hospital patients, individuals with severe or co-morbid conditions are more likely to have their creatinine measured than "healthier" individuals. However, this limitation also applies to other ways of identifying diseases (ie, by using discharge diagnoses).
It is not possible to extract data from FLaC to local projects, but we are very interested in collaborations targeted at FLaC's research aims, so please contact the corresponding author for further enquiry.

| CONCLUSIONS
We described the creation and content of the Funen Laboratory Cohort (FLaC), a haven and a valuable resource for pharmacoepidemiological research using Danish nationwide administrative registers enriched with individual-level biochemical information in a population-based setting. Two thirds of all inhabitants on Funen had their creatinine levels measured during the period of January 2000 to December 2015. The regional representativeness and coverage were deemed high, especially among older individuals with comorbid conditions, as well as among individuals who redeemed prescriptions from public pharmacies.