Utilizing Large Electronic Medical Record Data Sets to Identify Novel Drug–Gene Interactions for Commonly Used Drugs

Real-world prescribing of drugs differs from the experimental systems, physiological-pharmacokinetic models, and clinical trials used in drug development and licensing, with drugs often used in patients with multiple comorbidities with resultant polypharmacy. The increasing availability of large biobanks linked to electronic healthcare records enables the potential to identify novel drug–gene interactions in large populations of patients. In this study we used three Scottish cohorts and UK Biobank to identify drug–gene interactions for the 50 most commonly used drugs and 162 variants in genes involved in drug pharmacokinetics. We defined two phenotypes based upon prescribing behavior: drug-stop and dose-decrease. Using this approach, we replicate 11 known drug–gene interactions including, for example, CYP2C9/CYP2C8 variants and sulfonylurea/thiazolidinedione prescribing and ABCB1/ABCG2 variants and statin prescribing. We identify eight novel associations after Bonferroni correction, three of which are replicated or validated in the UK Biobank or have other supporting results: The C-allele at rs4918758 in CYP2C9 was associated with a 25% (15–44%) lower odds of dose reduction of quinine, P = 1.6 × 10⁻⁵; the A-allele at rs9895420 in ABCC3 was associated with a 46% (24–62%) reduction in odds of dose reduction with doxazosin, P = 1.2 × 10⁻⁴, and altered blood pressure response in the UK Biobank; the CYP2D6*2 variant was associated with a 30% (18–40%) reduction in odds of stopping ramipril treatment, P = 1.01 × 10⁻⁵, with similar results seen for enalapril and lisinopril and with other CYP2D6 variants. This study highlights the scope of using large population bioresources linked to medical record data to explore drug–gene interactions at scale.

Given the large number of drugs currently available and being licensed each year and the presence of potentially multiple pharmacokinetic (PK) pathways affecting absorption, metabolism, and excretion 2 for each drug, it is not feasible to undertake clinical studies of drug-gene interaction (DGI) for all drugs covering all PK pathways. Therefore, currently used DGI tools mainly report "potential" rather than clinically relevant actual interactions based on clinical studies. In addition, for many drugs, the entire PK pathway is not known, limiting the ability to undertake candidate gene-based clinical studies. Thus, most of the current work on DGIs focuses only on well-known drug PK pathways and their established genetic variants.
Electronic medical records (EMRs) linked to biobanks offer the potential to explore interactions between multiple drug exposures and multiple genotypes and their impact on clinical outcomes at scale in a real-world setting. A limited number of studies have utilized such bioresources for pharmacogenomics, and while some of these have been done at scale using UK Biobank (UKBB), they have largely focused on known pharmacokinetic drug interactions. 3 In this study, we aimed to discover novel and potentially clinically relevant DGIs between a large variety of commonly prescribed medications for chronic conditions and genetic variants in important enzymes and transporters, using multiple large population-based bioresources linked to EMRs. Our discovery cohort was the combination of three Scottish cohorts (Genetics of Diabetes Audit and Research in Tayside Scotland (GoDARTS), Generation Scotland: Scottish Family Health Study (GS:SFHS), and Genetics of the Scottish Health Research Register (GoSHARE)). For replication or validation of our top findings we used the recently released primary care data from UKBB.
The GS:SFHS study [5][6][7] recruited participants from Scotland as families rather than individuals. Data for around 24,000 participants aged 18-98 years were collected in the period from 2006 to 2011. Around 20,000 participants were genotyped using high density genome-wide chips (Illumina HumanOmniExpressExome-8v1-2_A or HumanOmniExpressExome-8v1_A). The number of included SNPs was 604,858 before imputation and increased to ~ 24 million after imputation. Longitudinal prescribing data were available for ~ 10,000 participants at the time of our study.
In the SHARE project, 8,9 individuals from Scotland aged ≥ 16 years have been recruited since 2011 with recruitment ongoing, aiming to recruit 1 million individuals. The GoSHARE project, 10 which is currently held in the Tayside area only, is a substudy from the SHARE project where participants are also asked to allow use of their genetic data. Approximately 5,000 participants had genotype data at the time of our study.
The UKBB study 11,12 recruited ~ half a million participants aged 40-69 years from the United Kingdom including England, Scotland, and Wales. Two novel closely related genotyping arrays were specifically developed for the UKBB genotyping project: Applied Biosystems UK BiLEVE Axiom Array by Affymetrix and Applied Biosystems UK Biobank Axiom Array (Thermo Fisher Scientific, Waltham, MA, USA). The former was used to genotype 49,950 individuals participating in the UK Biobank Lung Exome Variant Evaluation (UK BiLEVE) study, and the latter was used for genotyping the remaining 438,427 subjects with a total of ~ 825,000 markers included, which increased to ~ 92 million markers after imputation. In September 2019, longitudinal prescribing data from primary care records for ~ 230,000 individuals were released.
The combined Scottish cohort used in the present study for discovery consists of ~ 27,000 participants, which were combined and analyzed as one cohort as all data were from the same region using the same EMR. The primary care data on ~ 230,000 participants from the UKBB cohort were utilized for replicating/validating the top discovery findings.
Other than the GoDARTS cases who were selected to have diabetes, all other cohorts included unselected populations with a large variety of chronic conditions for which long-term treatment prescribing was required.

Selection of candidate common drugs
The cross-sectional self-reported prescribing data from the UKBB study was utilized to select the most frequently prescribed drugs for the 500,000 participants. More than 3,100 drugs were utilized at least once within the UKBB. Of these we selected the top 122 most frequently used drugs (i.e., have usage frequency no less than 1,000 times) and refined this to 50 drugs by selecting drugs used nonacutely (i.e., more than one prescription is usually required) and excluding natural products and supplements, hormonal replacement treatments, eye drops, topically used drugs, and inhalers. Supplementary Figure S1 illustrates the process of selecting the 50 drugs and Supplementary Table S1 shows the names of the 50 selected drugs.

Selection of candidate genetic variants
Full details of how SNPs were selected are provided in the Supplementary Methods and Supplementary Figure S2. In brief, we selected 35 genes that encoded all common drug-metabolizing enzymes or drug transporters 2 or were established as genes for which at least one of the 50 selected drugs is a substrate according to the DrugBank database (https://go.drugbank.com/); 1 additional gene encoding a relevant enzyme (PTGS1) was also included as, although it is not a gene involved in drug PK, it is relevant for some of the cardiovascular drugs included in our study. From these 36 genes, 757 SNPs were extracted from PharmGKB. 13 Of these, 156 SNPs were absent in European populations and were excluded. We then further filtered on minor allele frequency (MAF) (≥ 5%) and excluded SNPs that deviated from Hardy-Weinberg equilibrium (HWE) (P < 1 × 10⁻⁸) for each of the three Scottish cohorts. This resulted in 320 SNPs (GoDARTS), 322 SNPs (Generation Scotland), and 312 SNPs (GoSHARE) being selected. As a last step, we pruned the SNP selection based upon linkage disequilibrium (LD). Here we first prioritized 26 well-known pharmacogenetic variants reported by PharmGKB and excluded 82 correlated SNPs (r² ≥ 0.5). Next, we reviewed the remaining SNPs and identified pairs or groups that were correlated with each other; of these we selected one SNP from each pair or group, retaining 60 SNPs. Lastly, the 76 remaining SNPs were retained. Overall, this resulted in 162 independent SNPs (26 + 60 + 76) being taken forward for analysis. MAF and HWE results for these SNPs are presented in Supplementary Table S2.
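The MAF and HWE filters described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the genotype counts are hypothetical, and the HWE check shown is a 1-degree-of-freedom chi-square goodness-of-fit test, which is an assumption because the text does not state which HWE test was used.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / n_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for deviation from HWE (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, maf_min=0.05, hwe_p_min=1e-8):
    """Keep a SNP if MAF >= 5% and it does not deviate from HWE (P >= 1e-8)."""
    maf_ok = minor_allele_frequency(n_aa, n_ab, n_bb) >= maf_min
    hwe_ok = hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p_min
    return maf_ok and hwe_ok
```

In the study the filters were applied separately in each of the three Scottish cohorts, so a function like `passes_qc` would be evaluated once per cohort and per SNP.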

Defining drug response phenotypes
We developed two drug response phenotypes that could be applied generically across all the drugs under study. As such, these relied on drug prescribing behavior rather than measured effects of each drug (e.g., blood pressure or cholesterol reduction).
The two phenotypes were "drug-stop" and "dose-decrease," as stopping a drug after only one prescription was considered to be a surrogate for an intolerable side effect, lack of therapeutic efficacy, or both, and a dose decrease similarly indicated intolerance or extreme efficacy. While we acknowledge that people may reduce their dose or stop a drug for other reasons that do not reflect drug efficacy or tolerance, we have previously used these surrogate indicators successfully to identify patients with potential statin 14 or metformin 15 intolerance.
For the "drug-stop" phenotype, cases were those who received a single prescription of a drug that is usually prescribed for the long term. Controls were those who stayed on treatment for two or more prescriptions. A single prescription usually covers 3-4 months of treatment supply.
For the "dose-decrease" phenotype, cases were those who reduced their daily dose at some point during the treatment period. Controls were those who never reduced their dose while treated with the drug.
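The two case/control definitions above can be sketched in Python as follows. This is a hedged illustration only: the record layout (a list of `(issue_date, daily_dose)` pairs for one patient and one drug) and the function names are hypothetical. Two points are assumptions not stated in the text: a dose decrease is taken to mean any prescription with a lower daily dose than the one immediately before it, and patients with a single prescription are labeled as dose-decrease controls.

```python
def drug_stop_status(prescriptions):
    """Drug-stop phenotype: one prescription = case, two or more = control.

    `prescriptions` is a list of (issue_date, daily_dose) pairs for one
    patient and one drug (hypothetical layout).
    """
    if not prescriptions:
        return None  # never exposed to the drug
    return "case" if len(prescriptions) == 1 else "control"


def dose_decrease_status(prescriptions):
    """Dose-decrease phenotype: any drop in daily dose over time = case."""
    if not prescriptions:
        return None
    ordered = sorted(prescriptions, key=lambda record: record[0])
    doses = [dose for _, dose in ordered]
    # Assumption: compare each prescription with the one just before it.
    decreased = any(later < earlier for earlier, later in zip(doses, doses[1:]))
    return "case" if decreased else "control"
```

For example, a patient prescribed 40 mg, then 40 mg, then 20 mg daily would be a drug-stop control and a dose-decrease case under these assumptions.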
We did not consider a "dose-increase" phenotype as a proxy for inefficacy in the present study because some drugs are introduced at a low dose with subsequent dose titration; in that setting, a dose increase does not reflect inefficacy. Similarly, we chose not to include another measure of inefficacy, the addition of a second drug. This is in part because this measure is difficult to define when considering all commonly used drugs, and also because addition of a second drug may reflect progression of the underlying disease rather than drug inefficacy.
Testing the association between genetic variants and the drug response phenotypes
A logistic regression model with a log-additive genetic model was fitted for the drug-stop and dose-decrease phenotypes to explore the associations between the 162 selected genetic variants and both phenotypes for all 50 drugs. Given the 162 SNPs being assessed, we considered a Bonferroni-adjusted P value ≤ 0.00030 (0.05/162) to be significant. Analysis was performed with the "SNPassoc" package in R (RStudio, Boston, MA, USA) (version 4.0.0). 16 Where there was strong prior support, i.e., evidence for a drug-gene variant interaction already existed in the literature, we accepted P < 0.05 as significant.
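To make the log-additive test concrete, the sketch below fits a single-SNP logistic regression by Newton-Raphson in plain Python and reports the per-allele odds ratio. It is an illustration under stated assumptions, not the SNPassoc implementation used in the study: the function and constant names are hypothetical, no covariates are included, and the P value comes from a Wald test, which may differ from the test SNPassoc reports.

```python
import math

N_SNPS = 162
BONFERRONI_THRESHOLD = 0.05 / N_SNPS  # about 0.00031; reported as 0.00030


def fit_additive_logistic(genotypes, outcomes, n_iter=25):
    """Fit logit P(case) = b0 + b1 * g, with g = minor-allele count (0, 1, 2).

    Returns (per-allele odds ratio, two-sided Wald P value).
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for g, y in zip(genotypes, outcomes):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient w.r.t. intercept
            g1 += (y - p) * g      # gradient w.r.t. genotype effect
            h00 += w
            h01 += w * g
            h11 += w * g * g
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + inverse(information) * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error from the information matrix of the final iteration
    # (identical to the value at the estimate once the fit has converged).
    se_b1 = math.sqrt(h00 / det)
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(b1), p_value


def percent_change_in_odds(odds_ratio):
    """Express an odds ratio as a percentage change in odds per allele."""
    return (odds_ratio - 1.0) * 100.0
```

Under this convention, a per-allele odds ratio of 0.54 corresponds to the "46% reduction in odds" wording used for the results, and an association would be called Bonferroni-significant when its P value is at or below `BONFERRONI_THRESHOLD`.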
We also defined a group of DGIs that were of potential interest but lacked sufficient statistical significance or prior evidence. We include these for interest but acknowledge that further replication is required. We selected drug-gene pairs where (i) the drug is a known substrate for or is affected by the protein coded by the gene; (ii) the genetic variant has been associated with at least one drug response phenotype or we have supporting evidence from the UKBB; and (iii) the P value significance level is moderate (>0.0003 and ≤0.009).
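The three reporting tiers described in this section can be summarized as a small decision function. This is a simplified sketch: the function name and labels are hypothetical, and criteria (i) and (ii) for the "potential interest" group are collapsed into a single boolean flag that a curator would have to set by hand.

```python
BONFERRONI_P = 0.0003   # 0.05 / 162, as reported in the text
PRIOR_EVIDENCE_P = 0.05
MODERATE_P_MAX = 0.009


def classify_association(p_value, has_prior_evidence, has_supporting_biology):
    """Assign a drug-variant association to one of the reporting tiers."""
    if p_value <= BONFERRONI_P:
        return "bonferroni_significant"
    if has_prior_evidence and p_value < PRIOR_EVIDENCE_P:
        return "significant_with_prior_evidence"
    if has_supporting_biology and BONFERRONI_P < p_value <= MODERATE_P_MAX:
        return "potential_interest"
    return "not_highlighted"
```

Applied to values reported in the Results, doxazosin and rs9895420 (P = 1.2 × 10⁻⁴) would fall in the first tier, gliclazide and CYP2C9*3 (P = 0.007, with prior evidence) in the second, and clopidogrel and rs12353214 (P = 5.3 × 10⁻⁴) in the third.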

Replicating the top findings in UKBB
We utilized the UKBB primary care prescribing data to replicate the top findings from the discovery cohort. We mirrored the "drug-stop" and "dose-decrease" phenotypes used in the discovery cohorts. We also explored the drug-specific phenotype of blood pressure reduction. For systolic blood pressure (SBP) reduction we analyzed White British UKBB participants for whom SBP measurements were available from 1 year prior to starting treatment and up to 1 year after initiation of treatment. The pretreatment SBP was the mean of all measures in the 1 year prior to treatment; the on-treatment SBP was the mean of all measures in the 1 year after treatment initiation. A multiple linear regression model was then used with on-treatment SBP as the dependent variable and pretreatment SBP, age, sex, and genotype as the explanatory variables.
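The pretreatment and on-treatment SBP summaries described above can be sketched as below. This covers only the windowing step, not the regression itself. The function name and the `(measurement_date, sbp)` record layout are hypothetical, the 1-year window is approximated as 365 days, and the exclusion of measurements taken on the treatment start date is an assumption, since the text does not say how they were handled.

```python
from datetime import date, timedelta
from statistics import mean


def sbp_window_means(measurements, start_date, window_days=365):
    """Mean SBP in the year before and the year after treatment initiation.

    `measurements` is a list of (measurement_date, sbp_mmHg) pairs for one
    participant. Returns (pretreatment_mean, on_treatment_mean), or None if
    either window contains no measurement.
    """
    window = timedelta(days=window_days)
    pre = [sbp for d, sbp in measurements
           if start_date - window <= d < start_date]
    post = [sbp for d, sbp in measurements
            if start_date < d <= start_date + window]
    if not pre or not post:
        return None  # participant cannot be included in the SBP analysis
    return mean(pre), mean(post)
```

The two means would then enter the linear model as the covariate (pretreatment SBP) and the dependent variable (on-treatment SBP), alongside age, sex, and genotype.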

RESULTS
The results for the analysis of the 50 commonly used drugs and 162 independent genetic variants for both the drug-stop and dose-decrease phenotypes are available via an online database at: https://c1abo933.caspio.com/dp/d81f7000f897fd11b108465c9be4. Overall, we identified 815 DGIs with a significance level of P ≤ 0.05; 8 of these were significant after Bonferroni correction and a further 11 had strong prior evidence for a drug-variant association.

Significant associations after Bonferroni correction
These results are summarized in Table 1, along with replication results from UKBB. We found supporting evidence for replication in UKBB for two of these drug-gene pairs: quinine and rs4918758 in CYP2C9 and doxazosin and rs9895420 in ABCC3.
We also outline here one other nonreplicated but potentially interesting association: ramipril and rs1135840 (CYP2D6*2).
Doxazosin-rs9895420 (ABCC3). The odds of decreasing the daily dose of doxazosin were reduced in A-allele carriers at rs9895420 (T > A) in ABCC3, with each A-allele associated with a 46% (24-62%) reduction in odds of dose decrease (P = 1.2 × 10⁻⁴). In UKBB the variant allele was associated with a 10% reduction in odds of dose decrease, but this was not significant (P = 0.339).
We then explored the association between rs9895420 and SBP lowering with doxazosin in the replication cohort (UKBB). Compared with the T-allele, each A-allele was associated with a 1.0 mmHg greater reduction in SBP related to doxazosin treatment (P = 0.0089). AA homozygotes had a 2.1 mmHg greater reduction in SBP than the TT homozygotes. Overall, our results suggest that the A-allele at rs9895420 in ABCC3 is associated with greater tolerability and greater efficacy (either directly or through increased adherence) of doxazosin.
Ramipril-rs1135840 (CYP2D6). The C-allele at rs1135840 (G > C, CYP2D6*2), which represents an extensive metabolizer phenotype, was associated with a 30% (18-40%) reduction in odds of stopping ramipril treatment (P = 1.01 × 10⁻⁵). Of note, the loss-of-function variant rs3892097 (CYP2D6*4), which is in LD with rs1135840 (D′ = 1, R² = 0.21), was also associated with the same phenotype but with the opposite direction of effect: a 29% (8-54%; P = 0.0065) increased odds of stopping the drug. Unfortunately, the rs1135840 CYP2D6*2 SNP and *4 deviated significantly from HWE in the UKBB cohort and therefore could not be used for replication. However, further support for this finding can be seen with other angiotensin-converting enzyme (ACE) inhibitors in our data. The CYP2D6*4 variant was also associated with a 31% (1-72%) increased odds of decreasing enalapril daily dose (P = 0.053), while the CYP2D6*2 variant was associated with a 13% (1-23%) lower odds of decreasing lisinopril daily dose (P = 0.032).
Table footnote: All results show the effect of the minor allele compared with the major allele. The common notation is rs-id (major allele > minor allele) unless stated otherwise in the table.
Oral hypoglycemic agents and CYP2C9/CYP2C8 variants. Of note, two well-known functional variants, rs1057910 (A > C, CYP2C9*3) and rs10509681 (T > C, CYP2C8*3), previously shown to reduce catalytic activity of the CYP2C9 enzyme and increase activity of the CYP2C8 enzyme, respectively, were found to be associated with response to oral hypoglycemic drugs. Compared with the wild type CYP2C9*1, the CYP2C9*3 allele was associated with a 26% (7-48%) increased odds of reduction in daily dose of gliclazide (P = 0.007). In addition, the CYP2C8*3 allele was associated with a 33% (3-70%) increased odds of stopping pioglitazone treatment (P = 0.026). Consistent with these findings, these two variants (CYP2C9*3 and CYP2C8*3) have been reported to be associated with increased gliclazide-induced hypoglycemia and decreased pioglitazone plasma levels, respectively. 17,24
Statins and the ABC transporter family. The minor allele (A) in the ABCB1 SNP rs2032582 (Ala893Thr) and the T-allele at the ABCG2 SNP rs2231142 (G > T) (Gln141Lys) were associated with a 15% (1-26%) lower odds of stopping atorvastatin (P = 0.035) and 18% (5-30%) lower odds of stopping simvastatin (P = 0.0069) treatments, respectively. These two alleles have previously been reported to be associated with increased atorvastatin efficacy and decreased simvastatin clearance, respectively. 18,26

Other results of potential interest
We have identified 12 novel drug-variant interactions of potential interest. All of these 12 results with their supporting evidence are summarized in Supplementary Table S3. Here, we highlight two examples.
Clopidogrel-rs12353214 (PTGS1). The rs12353214 (C > T) variant is in the gene encoding the PTGS1 enzyme. This enzyme is responsible for the production of prostaglandins which facilitate clot formation, and therefore genetic variability in this gene could affect clopidogrel efficacy as has been shown previously. 28 Each T-allele at rs12353214 was associated with 43% (20-59%) lower odds of stopping clopidogrel treatment (P = 5.3 × 10⁻⁴). A similar, albeit nonsignificant, result was seen in UKBB where each T-allele was associated with 14% lower odds of stopping clopidogrel therapy (P = 0.067).
Atenolol-rs628031 (SLC22A1). Each A-allele at rs628031 in SLC22A1 (encoding OCT1) was associated with a 21% (7-38%) increased odds of stopping atenolol treatment (P = 0.0034). Consistent with this, atenolol has recently been reported to be transported by OCT1, 38 similarly to metformin, and the A-allele has been shown to be associated with increased risk of metformin intolerance. 39

DISCUSSION
This is the first pharmacogenomic study covering a large variety of commonly used drugs for chronic conditions in the United Kingdom and a comprehensive range of 162 genetic variants across the key enzymes and transporters involved in drug pharmacokinetics. We have identified 11 drug-variant interactions that have been previously reported, and 8 novel drug-variant interactions of which 3 have been replicated or validated in an independent data set. This study highlights the scope of using large population bioresources linked to prescribing and other medical record data to explore DGIs at scale.
The fact that we replicate a number of known interactions provides validation of the approach used. These are outlined in Table 2; identifying that CYP2C9/CYP2C8 variants alter prescribing behavior with sulfonylureas/thiazolidinediones and ABCB1/ABCG2 variants alter prescribing behavior with statins in ways that are consistent with the known literature is reassuring in that the surrogate phenotypes of "drug-stop" and "dose-decrease" can be used to identify biologically plausible DGIs when large population biobanks are studied.
It is beyond the scope of this report to discuss all potentially novel DGIs identified here; instead we focus on the three drug-gene pairs where we have independent replication or validation in the UKBB. We have detected these using a surrogate for drug efficacy or side effects manifest in altered prescribing behavior. There are many ways that this might occur. The most plausible mechanism would be via alteration of drug metabolism or transport, i.e., a direct consequence of the variant's alteration of enzyme or transport function and altered drug PK and that this in turn leads to a difference in clinical condition, such as evidence of increased or reduced efficacy or the experience of side effects and a consequent decision to change prescribing. However, this does not necessarily have to apply to the parent drug; the altered PK may affect a metabolite of the drug, and this may result in the altered drug use. Thus, detailed PK studies of all drug metabolites in relation to the variants identified would be required to establish whether the altered prescribing behavior is indeed due to altered PK. It is also possible that the variant may have an impact on drug prescribing via drug-PK independent mechanisms. Although we have limited this analysis to include only variants in PK genes, these variants can result in phenotypes unrelated to the drug under consideration that might account for the altered prescribing patterns (for example, as outlined below for quinine and CYP2C9).

Quinine/CYP2C9
The C-allele at rs4918758 (T > C) in the CYP2C9 gene was associated with lower odds of a dose decrease of quinine. This finding was replicated among quinine users from the UKBB and can be explained either by reduced efficacy or better tolerability in variant carriers. Quinine is extensively metabolized in the liver, primarily by CYP3A4, but other enzymes including CYP2C8, CYP2C19, and CYP2C9 have been reported to be involved. 51 The rs4918758 (T > C) variant in CYP2C9 has been previously associated with decreased warfarin dosage requirements in a Korean population, 30 suggesting that the variant might be associated with decreased enzyme activity. In addition, rs4918758 is in strong LD with the rare loss-of-function missense variant CYP2C9*8 (rs7900194 (G > A), MAF = 0.02 in the European population). This variant has been previously correlated with decreased warfarin clearance 52 and decreased phenytoin metabolism. 53 Of note, CYP2C9 reduced activity variants CYP2C9*2/*3/*11 were recently reported to have a possible role in decreased therapeutic efficacy of the quinine derivatives chloroquine and primaquine in treating malaria. 54 Consistent with this, our results also show that both CYP2C9*2 and CYP2C9*3 variants are associated with decreased odds of reducing quinine dose. Taken together these results suggest that the observed quinine/CYP2C9 interaction with reduced efficacy or increased tolerability is associated with reduced function of CYP2C9. This seems unlikely to be an effect on the metabolism of quinine per se, which is largely metabolized by CYP3A4, but might reflect altered metabolism of downstream metabolites.
As discussed above, the interaction observed may not be mediated via a PK interaction. A recent study 55 reported that rs4918758 was associated with decreased coronary heart disease risk. Given that a main side effect of quinine is cardiac toxicity (i.e., prolongation of QT and arrhythmias), the DGI could be explained by a reduction in cardiac toxicity with quinine in carriers of this cardioprotective variant.
In the United Kingdom the predominant use of quinine is for the treatment of leg cramps rather than as an antimalarial. The side effects that might lead to cessation of quinine are wide ranging including abdominal pain, headache and vertigo, skin reactions, prolongation of the QT interval, and thrombocytopenia. To fully explore the interaction between quinine and CYP2C9-rs4918758 would require access to primary care free text as these side effects are not likely to be well recorded in structured data. Primary care unstructured (free text) data are not yet available in the UKBB or the Scottish cohorts, but hopefully will become available in due course, allowing further interrogation of this potentially important DGI.

Doxazosin/ABCC3
The A-allele at rs9895420 (T > A, A-189 T) promoter variant in ABCC3 was associated with lower odds of dose reduction of doxazosin, and in the UKBB we showed the same allele was associated with greater blood pressure reduction with doxazosin treatment. Doxazosin is mainly metabolized in the liver, and 63% of the dose is excreted in the feces, 56 suggesting a potential role of hepatic transporters in its elimination. The ABCC3 transporter is expressed in the hepatic basolateral membrane pumping its substrates back into systemic circulation. It is unknown whether the ABCC3 transporter contributes to the elimination of doxazosin or its metabolites, but the A-allele at rs9895420 variant has been previously linked with increased ABCC3 activity, 19 reduced efficacy of methotrexate in the treatment of childhood acute lymphoblastic leukemia, and increased plasma levels of methotrexate but also reduced gastrointestinal toxicity. 57 These results suggest that, if doxazosin is transported by ABCC3, the A-allele at rs9895420 that increases ABCC3 activity could result in increased doxazosin systemic exposure and the increased efficacy of doxazosin we observe. Further studies on ABCC3 and doxazosin transport are required.

Ramipril/CYP2D6
We report a strong association between the variant allele at rs1135840 (G > C) in CYP2D6 (CYP2D6*2) and lower odds of stopping ramipril treatment, suggesting greater efficacy or better tolerability. Ramipril is a prodrug which undergoes renal and hepatic metabolism to be converted into its active metabolite ramiprilat. 58 Seventy-five percent of ramipril metabolism occurs in the liver with 25% catalyzed by esterases; 58 however, whether the CYP2D6 enzyme also plays a role in ramipril metabolism or metabolism of ramipril metabolites is not yet known. The minor allele (C) at rs1135840 (G > C) (CYP2D6*2) represents the extensive (normal) metabolizer phenotype. 59 Interestingly, supportive of a potential role of CYP2D6 in ramipril metabolism, we also show that the loss-of-function variant rs3892097 (C > T) (CYP2D6*4; D′ = 1, R² = 0.21 with CYP2D6*2) is associated with a 29% (8-54%) increased odds of stopping ramipril (P = 0.00654). The fact that these variants did not pass quality control (out of HWE) in the UK Biobank limits our ability to directly replicate these findings; however, we do find a similar signal in the Scottish discovery cohorts for both enalapril and lisinopril, suggesting that this finding is consistent and applies to all ACE inhibitors. Interestingly, we also note a case report of a patient homozygous for the CYP2D6*4 variant who discontinued ramipril therapy shortly after starting it due to ramipril-induced dry cough, which is consistent with our finding that this variant is linked with increased likelihood of stopping ramipril. 60
In order to predict the degree of clinical relevance for the above three drug-variant associations, it could be helpful to look at the percentages of drug users in the Scottish cohort along with the distribution of the variant allele in different ethnic groups as shown in Supplementary Table S4. The higher the frequency of the variant and the more the drug is used, the more common the expected DGI.

Strengths and limitations
There are advantages and disadvantages to the approach we apply. Firstly, we have access to two large longitudinal population biobanks with linked medical record data, and this enables the comprehensive study of all important ADME variants for all commonly used drugs. However, to apply this approach we have to develop generic response phenotypes that apply to all drugs, resulting in the use of the prescribing behaviors drug-stop and dose-decrease. Other longitudinal data are available to enable drug-specific phenotypes to be modeled (e.g., blood pressure reduction and glycated hemoglobin A1c reduction), but this would limit the drugs we could study as many adverse drug reactions (ADRs), such as nausea, vomiting, back pain, etc., cannot be quantified. In addition, direct records on ADRs were not available from the Scottish cohort, with very limited data from the UKBB on drug-induced ADRs. The other advantage of our selected phenotypes is that they provide direct evidence on a potential change in prescribing behavior as a result of a certain DGI. However, the exact reason behind these resultant phenotypes cannot be identified due to the lack of data. More confidence can be gained, however, for results which have been replicated, validated, or are supported by previous research as shown in our study.
Secondly, there are differences in the cohorts we use for discovery and replication; notably, the discovery cohorts utilize dispensed prescription data, whereas the replication UK Biobank cohort utilizes only issued prescriptions. This may have an impact on the prescribing phenotypes we use, especially the drug-stop phenotype, where we have more certainty in the Scottish data that a prescribed drug has actually been taken. Finally, we do not consider concomitant medication. We recognize that drug-drug-gene interactions are important 2 and that this is an area to explore in further studies as even larger data sets become available.
Although our approach has identified many novel and replicated/validated associations along with results consistent with previous findings, our analysis also identified a few drug-gene associations which were not statistically significant even though they were significant in previous studies, or were discordant in direction. This is to be expected due to differences in sample sizes, ethnic groups, and/or the nonspecific and relatively noisy phenotypes used for discovery.

CONCLUSIONS
In this study we provided, for the first time, a large coverage of clinical pharmacogenomic associations between 162 genetic variants and 50 commonly used drugs for chronic conditions in the United Kingdom, with all results available in an online resource. We replicate 11 associations consistent with previously reported findings, validating the methods we have applied in this study, and identify 8 novel associations, 3 of which have been replicated or validated. Overall, we have established the utility of large population biobanks linked to electronic healthcare records to interrogate for potential clinically important DGIs. Our approach parallels the more targeted approach undertaken in the UKBB recently, 3 which also establishes the merits of "pharmacogenetics at scale." As these resources increase (for example, in Scotland the SHARE bioresource has ~ 270,000 individuals consented for use of spare blood for genetic analysis and linked healthcare records, and the UKBB has only released half of the cohort primary care data for non-COVID research), there will be considerable potential for discovery of novel clinically actionable drug-gene and drug-drug-gene interactions.