Advancing drug discovery using the power of the human genome

Abstract
Human genetics plays an increasingly important role in drug development and population health. Here we review the history of human genetics in the context of accelerating the discovery of therapies, present examples of how human genetics evidence supports successful drug targets, and discuss how polygenic risk scores could be beneficial in various clinical settings. We highlight the value of direct-to-consumer platforms in the era of fast-paced big data biotechnology, and how diverse genetic and health data can benefit society. © 2021 23andMe, Inc. The Journal of Pathology published by John Wiley & Sons, Ltd. on behalf of The Pathological Society of Great Britain and Ireland.

From non-clinical models to human genetics
All drugs entering human trials have shown evidence of efficacy in non-clinical models of disease, and yet a large fraction fail to demonstrate efficacy in humans. Of phase II trials conducted between 2005 and 2015, 51% failed to achieve their prespecified primary objective [1]. Within AstraZeneca from 2005 to 2010, lack of efficacy was responsible for the closure of 57% of phase IIa projects and 88% of phase IIb projects [2]. Clearly, efficacy in treating non-clinical disease models is not always an adequate proxy for efficacy in treating human disease. Human genetic studies take advantage of naturally occurring genetic variations that may mimic the effect of therapeutically perturbing a gene. Unlike studies of animal or in vitro models, human genetic studies are well-suited to the task of establishing a relationship between human disease and variation in the activity of a potential drug target or pathway, thereby decreasing the probability that a drug trial will fail due to lack of efficacy [3].
When the draft human genome was published in 2001, authors from the International Human Genome Consortium wrote: 'Knowing the complete set of human genes and proteins will greatly expand the search for suitable drug targets. Although only a minority of human genes may be drug targets, it has been predicted that the number will exceed several thousand, and this prospect has led to a massive expansion of genomic research in pharmaceutical research and development' [4]. Initial efforts were focused on identifying the consensus sequence of all genes that were homologous to existing drug targets and all druggable genes so that they could be tested for therapeutic potential, but the effect of genetic variation on gene function or activity has since come to play a much larger role in the field.
Genetics-driven drug discovery has had notable successes for Mendelian disorders (see Glossary of terms), in which rare genetic variants have large effects on the function of a single gene. Examples include enzyme replacement therapies for lysosomal storage diseases [5] and nusinersen for spinal muscular atrophy [6]. Many of the diseases that cause the greatest global morbidity and mortality also have Mendelian subtypes. For example, about 11% of early onset Alzheimer's disease cases are due to mutations in APP, PSEN1, and PSEN2 [7]. Nelson et al [8] found that drugs were about 7.2 times more likely to be approved if the drug's target was linked to a Mendelian form of the disease for which the drug was indicated. Follow-up work by King et al [9] also estimated that the odds of approval were more than six times higher given Mendelian genetic support. With the advancement of sequencing technologies, more rare genetic causes of common diseases have been discovered [10][11][12][13][14][15][16]. The increasing number of whole-exome and whole-genome sequences will further shed light on the low-frequency end of the spectrum of human genetic variation (e.g. The 1000 Genomes Project [17]; Haplotype Reference Consortium [18]; The Genome Aggregation Database [19,20]; and Trans-Omics for Precision Medicine program [21]).
However, for the vast majority of highly prevalent diseases, the heritable risk is driven by a large number of common variants (often in the form of single nucleotide polymorphisms, i.e. SNPs, see Glossary of terms) with much smaller individual effect sizes [22]. This finding comes as a result of the widespread application of genome-wide association studies (GWAS, see Glossary of terms) to scan the genome to look for associations of genetic variants with disease risk. Nelson et al [8] and King et al [9] investigated whether genetic support from GWAS was predictive of drug approval. Retrospectively, they found that drugs with GWAS support were at least two times more likely to be approved, particularly if the GWAS signal appeared to be driven by a mutation that altered the amino acid sequence of the gene product [8,9].
The era of human genetics-driven drug discovery
Increasing focus on human genetics by academia and industry has caused the number of genetic associations recorded in the GWAS Catalog (https://www.ebi.ac.uk/gwas/) to expand rapidly in the past few years, providing novel leads for genetics-driven drug discovery. This growth will probably continue, given the availability of large and diverse databases of genotyped individuals, such as The China Kadoorie Biobank (www.ckbiobank.org), Biobank Japan (http://jenger.riken.jp/en/), the UK Biobank (https://www.ukbiobank.ac.uk/), the Million Veteran Program (US Department of Veterans Affairs, https://www.research.va.gov/mvp/), the All of Us Research Program (NIH, Bethesda, MD, USA, https://allofus.nih.gov/), and direct-to-consumer databases. In addition, several countries with single-payer healthcare systems (such as Denmark, Estonia, Finland, Iceland, and The Netherlands) have established national biobanking infrastructure and large-scale population genotyping initiatives [23].
The number of individuals who volunteer their data through various platforms for advancing biomedical research has led to substantially larger genetic studies than would have been possible otherwise. In 2016, the largest GWAS meta-analysis at the time was published on major depressive disorder [24]. In 2018, genetic analyses were conducted in over 1 million individuals for blood pressure traits [25]. In 2019, a meta-analysis of tobacco and alcohol use and a meta-analysis of insomnia included approximately 1.2 million and over 1.3 million individuals, respectively [26,27].
One approach to increase the power of GWAS for drug discovery is to scale participation through direct-to-consumer platforms. Conventional biobanks create repositories of biospecimens from recruited participants that are later analyzed. Under the direct-to-consumer model, customers' DNA is genotyped and analyzed to provide insights regarding their ancestry, health risks, and other traits that are influenced by genetics. These customers may then volunteer their genetic and phenotypic data for research purposes, engaging and empowering a wide range of participants. Today, 23andMe, Inc. (http://www.23andme.com), a direct-to-consumer genetics company established in 2006, has a database that includes more than 12 million customers. Approximately 80% of the customers actively opt in and consent to research and have contributed over 3 billion phenotypic data points. Genealogy companies with large customer bases, such as MyHeritage, have also recently expanded to include DNA testing and health products, and have significant potential to grow in scale.
Within the GWAS catalog, studies on adult height as a model polygenic trait have achieved some of the largest sample sizes and continue to grow considerably over time. The number of independent risk loci identified for height has grown proportionally to the increase in sample size (Figure 1A), as previously observed by Panagiotou et al [28]. A similar trend is seen in the 23andMe database across a wide range of disease phenotypes (Figure 1B). Even with very large sample sizes, we anticipate that the availability of large-scale genotyped cohorts will continue to yield approximately proportional increases in the number of discovered GWAS associations. Larger study cohorts often correlate with greater discovery power and, therefore, should accelerate therapeutic target discovery. GWAS is powered to find associations that explain the largest proportion of phenotypic variation first. As sample sizes increase, the individual effect sizes of the newly discovered associations will probably be smaller, or allele frequencies lower [29]. Even so, these associations may drive new therapeutic hypotheses, as the effect of the allele in the population usually differs from the therapeutic effect of a drug (e.g. statins). For fine mapping, van de Bunt et al [30] showed, via simulation and empirical data, that the sizes of credible sets, defined as the minimum set of variants 95% likely to contain the causal variant [31], negatively correlate with the power to detect association signals; greater power therefore increases the confidence in identifying a causal variant or gene. Meta-analyses are now regularly employed to achieve larger study sample sizes. Additionally, heterogeneity analysis may identify possible false-positive findings due to biases originating from single studies and serve as some level of replication [32] to further boost the confidence in therapeutic hypotheses.
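The definition of a credible set given above (the minimum set of variants 95% likely to contain the causal variant) can be illustrated with a minimal Python sketch. It assumes that per-variant posterior probabilities of causality at a locus have already been computed by some fine-mapping method; the function name `credible_set` and the variant identifiers are hypothetical and are not taken from the cited studies.

```python
from typing import Dict, List


def credible_set(posteriors: Dict[str, float], coverage: float = 0.95) -> List[str]:
    """Return the smallest set of variants whose normalized posterior
    probabilities sum to at least `coverage` (illustrative sketch only)."""
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")

    # Rank variants from most to least likely to be causal.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)

    selected: List[str] = []
    cumulative = 0.0
    for variant, prob in ranked:
        selected.append(variant)
        cumulative += prob / total  # normalize so probabilities sum to 1
        if cumulative >= coverage:
            break
    return selected


# Hypothetical locus in a modestly powered study: the signal is spread out.
diffuse = {"rsA": 0.70, "rsB": 0.20, "rsC": 0.06, "rsD": 0.04}
# Hypothetical locus in a well-powered study: one variant dominates.
sharp = {"rsA": 0.97, "rsB": 0.02, "rsC": 0.01}

print(credible_set(diffuse))  # three variants are needed to reach 95%
print(credible_set(sharp))    # a single variant already exceeds 95%
```

In this toy example the better-powered locus yields a smaller credible set, which is the relationship between power and credible set size described in the text.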
Whereas phenotyping of individuals in cohorts derived from health care systems may be performed by both computational analysis of electronic medical records and data that are self-reported via web- or smartphone-based questionnaires, direct-to-consumer companies primarily rely on the latter. Self-reporting has proven to be an effective method to collect health and medically relevant data at scale. A proof-of-concept study showed 100% concordance between self-reported Parkinson's diagnosis and neurologist assessments in 50 patients [33] and an early set of GWAS based on self-reported medical phenotypes was able to replicate 75% of National Human Genome Research Institute (NHGRI)-curated genetic associations [34]. A two-stage GWAS design that used self-reported data in the discovery phase and clinically ascertained patients in the replication phase has further validated the use of 'self-reported data as a platform for discovery' [35].
Self-reported phenotypes are imperfect. For example, numerical laboratory values are not well-suited for self-reporting. These phenotypes may suffer from both reporting of misdiagnoses (e.g. mild cases of eczema versus psoriasis) and incorrect reporting of diagnoses (e.g. osteoarthritis versus rheumatoid arthritis). Whereas the latter can be mitigated by asking follow-up questions and aggregating answers to several related questions, the former will be a much greater challenge. The construction of accurate disease phenotypes from medical records also has its difficulties, as diagnoses may only be present in the unstructured text of clinical notes or in the form of billing codes justifying tests or procedures that are later rejected with additional information [36]. In the case of both electronic medical record-based phenotyping and self-reporting, these potential shortcomings are typically offset by the scalability and speed of data collection for GWAS purposes, where scale can be a dominant factor for discovery. As a testament to the validity of the self-report approach, the UK Biobank has also adopted self-reporting for data collection, in addition to the use of medical records. However, as a result of either misdiagnosis or misreporting, the potential non-specificity of the association between a locus and a disease will need follow-up confirmation [37]. A recent analysis using UK Biobank data compared GWAS using cases derived via hospital records versus those via verbal questionnaires. Importantly, the study examined variants beyond previous replication studies that focused mostly on genome-wide significant associations. They found high genetic correlations (>0.8) for 27 of 41 phenotypes studied and showed that combining the two phenotyping methods does not significantly alter GWAS effect size estimates. The increase in sample size by leveraging both phenotyping methods improved the power of identifying alleles associated with disease risk. Hence, utilizing self-reported data together with structured hospital records can enhance human genetics studies [38].
A disproportionate number of published GWAS so far have focused on individuals of European descent [39][40][41][42]. As of 2018, fewer than 20% of study participants in the GWAS catalog were non-European, despite non-Europeans making up greater than 80% of the global population [43]. To increase the understanding of human diversity and to improve on health equality, establishing study cohorts from under-represented populations is critical. Individuals of European descent represent only a limited fraction of the total human genetic variation. Studies in populations with African and/or Latino ancestry tend to find a greater number of genetic associations when compared with studies in an equivalent number of European-ancestry individuals [44]. Diverse cohorts represent unique opportunities for identifying novel drug targets based on genetic variants that are less frequent or even absent in people of European ancestry. Multiple APOL1 gene variants that are specific to African Americans were found to be associated with chronic kidney disease [45,46]. Many diseases have greater prevalence in non-Europeans. For example, according to the most recent data from the US Centers for Disease Control and Prevention (https://www.cdc.gov/asthma/most_recent_national_asthma_data.htm), Puerto Rican children are two to four times more likely to have asthma compared with non-Hispanic Whites [47]; data from the National Institute of Diabetes and Digestive and Kidney Diseases (https://www.niddk.nih.gov/health-information/kidney-disease/race-ethnicity) show that African Americans are four times more likely to have end-stage kidney disease compared with Americans of European ancestry [48]. Genetic studies will have greater discovery power in populations where a disease is more prevalent and, hence, with larger disease cohorts; at the same time, these discoveries will be more relevant and be beneficial for these populations.

Figure 1. (A) [...] Table S1 for details of the studies used). The associated publication for each study was manually assessed, excluding (1) GWAS of traits other than adult height, (2) GWAS of individuals of European ancestry with fewer than 19 000 cases, and (3) GWAS conducted using whole-genome or whole-exome sequencing data. SNPs with p > 5 × 10^−8 and SNPs that were only identified by conditional analysis were also excluded. The color of the points represents the ancestry of the individuals included in the study (black = East Asian; gray = European; gold = multi-ethnic). (B) Trajectories for a selection of GWAS for 126 23andMe disease phenotypes conducted in individuals of European ancestry at four time points between October 2017 and August 2019. Effective sample size is defined as N_eff = 4/(1/N_cases + 1/N_controls) for binary phenotypes and is equal to the sample size for continuous phenotypes. Trajectories for autoimmune diseases and infection phenotypes are highlighted in blue and pink, respectively.
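The effective sample size used for binary phenotypes in the Figure 1 legend, N_eff = 4/(1/N_cases + 1/N_controls), can be computed directly. The short Python sketch below is only an illustration of that formula; the function name and the case and control counts are hypothetical and do not correspond to any study in the figure.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Effective sample size for a case-control (binary) phenotype,
    N_eff = 4 / (1/N_cases + 1/N_controls), as defined in the Figure 1 legend."""
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("case and control counts must be positive")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Hypothetical balanced design: N_eff equals the total sample size.
balanced = effective_sample_size(1_000, 1_000)

# Hypothetical unbalanced design: adding 99 000 extra controls raises N_eff
# to just under 4 x N_cases, far less than the total of 101 000 individuals.
unbalanced = effective_sample_size(1_000, 100_000)

print(round(balanced), round(unbalanced))
```

The formula shows why the number of cases, which is tied to disease prevalence in a cohort, limits discovery power: with a fixed number of cases, N_eff can never exceed four times the case count, however many controls are added.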
Improving participation and recruitment is one important avenue for increasing the ethnic diversity of human genetic studies [49,50], and one in which very large genetic cohorts can play a vital role. For example, although the majority of the 23andMe customer base is made up of individuals of predominantly European ancestry (73%), given the large number of research participants, even relatively smaller Latino (12%) and African-American (4%) cohorts are among the largest in the world. As of 2019, among those who have consented to participate in research, the 23andMe database included over 300 000 African-American individuals, compared with approximately 148 500 (18% of approximately 825 000) veterans enrolled so far in the Million Veteran Program (2019) [51,52] or approximately 46 000 (20% of approximately 230 000) participants enrolled in the NIH All of Us study cohort (2020) [53]. 23andMe launched the African Genetics Project in 2016 and the Global Genetics Project in early 2018 to recruit customers from under-represented countries.
Studies of populations with historically small population sizes (e.g. Iceland's deCODE database [https://www.decode.com/] and Finland's FinnGen research project [https://www.finngen.fi/en/]) and cohorts with a high rate of consanguinity (e.g. the Pakistan Risk of Myocardial Infarction Study [54], https://www.phpc.cam.ac.uk/ceu/promis/) also offer unique opportunities for therapeutic discovery. deCODE genetics was acquired by Amgen in 2012 [55], and FinnGen currently has 12 industry partners [56]. Strongly deleterious mutations that disrupt gene function may persist at higher frequencies in smaller populations and provide insights into the function of human genes. As such, some of the genetic variants with the largest effect sizes have been identified in cohorts with unique population structures [57][58][59], with PCSK9 being an example [60]. One limitation of these cohorts is that they only have access to the genetic variation within the population. If these populations are bottlenecked, then they will present limited opportunities for understanding the full spectrum of human genetic diversity.
Recognizing the untapped potential of human genetics, the biotechnology and pharmaceutical industries have had a longstanding interest in investing in large genomics initiatives, consortia, and databases in order to accelerate drug discovery efforts. Below we illustrate a variety of examples of this investment since the Human Genome Project (https://www.genome.gov/human-genome-project). In 2007, the Genetic Association Information Network (GAIN) collaborative research group was established as a public-private partnership in order to 'investigate the genetic basis of common diseases' [61]. In the following years, a large number of industry-funded studies found genes linked to different diseases, such as schizophrenia and type II diabetes [13,62]. The Global Alliance for Genomics and Health (https://www.ga4gh.org/) formed in 2013 to accelerate research and medicine, with a specific mission to foster 'effective and responsible data sharing'. In 2014, OpenTargets [63] was established as a public-private consortium that integrates the wealth of data from publicly available genomic resources to enhance the ability to systematically identify and prioritize drug targets. In 2018, Genomics plc and Vertex Pharmaceuticals signed a 3-year contract to use machine learning and human genetics in target discovery and precision medicine [64]. In the same year, GlaxoSmithKline plc (GSK) entered into a collaboration with 23andMe Inc. to leverage human genetics for the discovery of novel medicines [65]. More recently, several companies, including Regeneron, AbbVie, Alnylam, AstraZeneca, Biogen, and Pfizer, have invested in the UK Biobank exome sequencing initiative to accelerate data generation [66,67].

Human genetics can identify successful drug targets
Many successful drug targets were first identified as a result of genetic associations. For example, gain-of-function variants in PCSK9 were first discovered in 2003 in French families with high rates of heart disease, suggesting that this gene may play a causal role in cardiovascular risk [60]. Cohen et al [68] later found that a loss-of-function mutation in PCSK9 correlated with significantly lower plasma cholesterol levels in 2% of African-Americans in the Dallas Heart Study. Spurred on by these associations, the first PCSK9 inhibitors were approved by the FDA to lower LDL cholesterol levels in 2015 (alirocumab and evolocumab) [69,70] and to prevent heart attack and stroke in 2017 (evolocumab) [71], thereby improving cardiovascular outcomes.
Human genetics can also retrospectively identify important features of successful drug targets. Cancer immunotherapies activate the immune system to recognize and kill tumors [72]. Variants in some immunotherapy targets show risk associations in opposite directions for cancer and immune phenotypes. This suggests that boosting the immune system could reduce cancer risk and that it may be possible to identify novel immunotherapies by screening for similar types of genetic associations. For example, CTLA4 is an immune checkpoint for T-cell activation and is the target for ipilimumab and tremelimumab. Genetic variants near this gene are associated with an increased risk of immune phenotypes, including thyroid diseases [73][74][75], rheumatoid arthritis [76], and type I diabetes [77], but are also associated with a decreased risk of multiple skin cancers [78] (Figure 2A). Recognition of the potential of this cancer-autoimmunity signature may help to identify the pivotal nodes in the vast interconnected network of the human immune system to increase the likelihood of clinical success for future therapies.
Genetic associations have been able to successfully predict drug side-effects and drug repurposing opportunities. Basiliximab is an immunosuppressant that is used to prevent transplant rejection. It is a monoclonal antibody targeting the gene product of IL2RA but has been shown to increase the risk of diabetes [77,81]. Variants near IL2RA show genetic associations with various immune phenotypes [82][83][84], as expected for an immunosuppressant, but also for type I diabetes [77,81] (Figure 2B). Topiramate, an anticonvulsant used to treat epilepsy and prevent adult migraines, was later shown to be effective in chronic weight management [85,86]. Topiramate targets the gene product of SCN1A. Genetic variants near SCN1A are associated with epilepsy [87] and body mass index [88,89]. Topiramate has been shown retrospectively to be an unsuccessful treatment for inflammatory bowel disease (IBD) [90]. Although it has been suggested that a well-designed and powered clinical trial could show that topiramate is effective for IBD, there is no association of SCN1A with IBD in the GWAS catalog (with approximately 29 000 cases in the largest study cohort) [91]. Ustekinumab is an anti-IL12B monoclonal antibody used to treat psoriasis [3,92] and has since been successfully repurposed to treat Crohn's disease [93,94]. Genetic variants near IL12B are associated with both psoriasis [95] and Crohn's disease [84] (Figure 2C). Denosumab, a monoclonal antibody against TNFSF11, is used to treat osteoporosis. Franke et al [96] subsequently found that variants near TNFSF11 were also associated with Crohn's disease, ultimately leading to denosumab being successfully repurposed for Crohn's disease [85,93,97]. TNFSF11 variants are associated with both heel bone mineral density [98,99] and Crohn's disease in multiple studies (Figure 2D).
In summary, human genetics has prospectively identified successful drug targets, is often able to retrospectively recapitulate the genetic profile of successful drugs informing future development efforts and relevant toxicities, and can provide evidence for opportunities to repurpose existing drugs.

Polygenic risk scores in precision medicine
The era of human genetics-driven drug discovery is an exciting time, not only for gene-focused efforts, but also for advancing precision medicine. Most common diseases are driven by a complex genetic architecture that involves a large number of genetic variants. The cumulative effect of these genetic variants is informative of an individual's overall risk of disease and could help to personalize treatment and preventative measures. To that end, polygenic risk scores (PRS) combine the risk effects from many genetic variants and have been widely used to predict disease risk [100,101]. PRS applied in clinical settings can improve disease diagnosis and the prediction of health outcomes. Many studies demonstrated the potential of PRS to predict risks of individuals and improve risk stratification for different diseases, such as Alzheimer's disease [102], ischemic stroke [103], and skin cancer [104]. Relative to monogenic mutations, PRS can identify a larger fraction of the population that is at high disease risk and are thus potentially more clinically relevant. A PRS constructed for cardiovascular disease can identify up to 20-fold more people at comparable or greater risk than those identified with only the known monogenic mutations [105]. For some diseases, PRS have been able to further stratify risks on top of known genetic risk variants, such as in BRCA1 and BRCA2 for breast cancer [106][107][108], in MSH2, MLH1, MSH6, and PMS2 for Lynch syndrome [108], and in APOE for Alzheimer's disease [109]. PRS also show great promise as a tool in refining disease diagnosis. It is particularly challenging to accurately diagnose diseases with similar symptoms or to diagnose diseases that progress slowly. Knevel et al [110] reported that adding PRS of different inflammatory diseases to existing clinical information can improve correct diagnosis at the first visit from the initial 39% to 51% (McFadden's R², see Glossary of terms).
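The idea that a PRS combines the risk effects from many genetic variants can be sketched as a weighted sum of risk-allele counts. The Python example below is a minimal illustration under stated assumptions: the per-variant weights stand in for effect sizes such as log odds ratios from a GWAS, genotypes are coded as risk-allele dosages (0, 1, or 2), and variants missing from an individual's genotype are treated as dosage 0. The function name, variant identifiers, and weights are hypothetical, and published PRS methods typically add further steps (for example, variant selection and adjustment for linkage disequilibrium) that are not shown here.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages over the variants in `weights`.

    Assumption for this sketch: variants absent from `dosages` count as 0.
    """
    return sum(weight * dosages.get(variant, 0.0) for variant, weight in weights.items())


# Hypothetical per-variant effect sizes (e.g. log odds ratios).
weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20}

# Hypothetical individual: two risk alleles at rs1, one at rs2, none at rs3.
individual = {"rs1": 2, "rs2": 1, "rs3": 0}

score = polygenic_risk_score(individual, weights)
print(f"{score:.2f}")  # 2*0.10 + 1*(-0.05) + 0*0.20
```

In practice a raw score of this kind is usually interpreted relative to its distribution in a reference population, for instance by assigning individuals to percentiles.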
With the potential ability to better stratify risk and identify disease subtypes, and, therefore, to better enrich patient populations, PRS show great promise for clinical trials. Traditional trial designs compare the effects of the treatment relative to a placebo within a typically homogenous patient population. Inherent patient heterogeneity can lead to challenges due to insufficient biomarkers or outcome measures [111]. PRS that are disease subtype-specific may better capture the clinical heterogeneity among individual patients, including their response to available treatments, development of complications, and rate of disease progression. In fact, partitioned PRS have been proposed as a promising tool to capture disease subtypes in type II diabetes [112]. In amyotrophic lateral sclerosis, identifying fast progressing patients in a lead-in period was shown to have the potential to shorten clinical trials, and result in cost and time savings [113]. For diseases such as non-alcoholic steatohepatitis, there are currently no approved therapies, despite significant clinical and economic burden. In addition to searching for better drug targets [114], selecting the faster progressors within non-alcoholic steatohepatitis patients may be a key to successful trials, which have been long and often complicated by high placebo responses [115].
To define appropriate patient populations for successful drug development and use, identifying accurate, predictive biomarkers may be pivotal. We are still at a very early stage of applying PRS to predict a patient's response to a given therapy, but there have been some early successes in cardiovascular and neurological diseases. Statin therapy was shown to lead to greater risk reduction in those with high genetic risk for the first coronary event [116]; and a high PRS for coronary artery disease (>90th percentile) was associated with a greater reduction (37% versus 13%) in major adverse cardiovascular events compared with a lower PRS (≤90th percentile) upon treatment with alirocumab/anti-PCSK9 [117]. Recently, a PRS constructed for migraine was able to identify subgroups of individuals with a higher likelihood of responding to triptans when looking for associations between migraine PRS and migraine-specific drug efficacy [118].
The potential for PRS to predict response to therapy could have large impacts on clinical trials. Treatment of cancer patients with PD1/PD-L1 checkpoint inhibitors has been associated with immune-related adverse events, most commonly in the skin. Furthermore, the development of these adverse events is associated with longer overall survival. Consistent with the role of immune checkpoints in self-tolerance and autoimmunity, Khan et al [119] set out to apply PRS constructed for skin autoimmunity (psoriasis, vitiligo, atopic dermatitis) to a failed phase III clinical trial that tested the efficacy of the immune checkpoint inhibitor atezolizumab/anti-PD-L1 (CD274) as a bladder cancer treatment. Individuals with high polygenic risk for skin autoimmunity had longer overall survival, making the PRS predictive of the treatment effects. Future trials are needed to test whether selecting individuals whose genetics predict a high likelihood of response will lead to a successful trial [119].

Discussion
Non-clinical models of disease play a critical role in target validation and the screening of drug candidates. However, the efficacy of a drug in a non-clinical model does not always translate into efficacy in patients. Human genetic data can serve as a complementary tool to increase confidence that modulating a target is likely to improve patient outcomes. In this regard, GWAS have been successful in identifying variants and genes associated with many human diseases, helping us to understand their biological underpinnings and informing drug discovery efforts that we anticipate will have a higher likelihood of clinical success.
Many diseases have both rare and common genetic risk factors. Rare variants in a gene can lead to Mendelian forms of a disease, whereas common variants affecting the same gene can influence non-Mendelian disease susceptibility. For example, the LRRK2 p.G2019S variant confers an approximately 25% lifetime risk of Parkinson's disease (minor allele frequency = 0.15%, odds ratio = 11.3 in Europeans), whereas a common variant (rs76904798, minor allele frequency = 14.4% in Europeans) that is linked to a LRRK2 expression quantitative trait locus (eQTL, see Glossary of terms) is associated with an odds ratio of 1.15 [120]. Having multiple variants in a locus that influence a disease creates an allelic series, which can potentially demonstrate that larger perturbations of gene function lead to larger effects on disease susceptibility [3]. These dose-response curves are an important aspect when establishing a causal relationship between gene function and disease [121], and show how GWAS can build upon established, high-penetrance genetic links to disease to inform disease pathology in 'idiopathic' subsets.
Case-control GWAS of disease phenotypes conventionally identify genetic variants associated with lifetime susceptibility. With increasingly large cohorts and availability of diverse study populations, GWAS that focus on disease severity and progression may reveal further opportunities for novel therapies [122]. As societal disease burden increases due to an aging population, treatments to slow disease progression and to lessen the effects of a disease are needed. However, the use of GWAS in drug discovery and development has a number of limitations. For example, perturbing pathways and gene functions that influence developmental processes may not make for effective therapies in adults. Drug discovery that is informed by human genetics is also not equally applicable to all disease areas. Medicines to combat infectious diseases and new antibiotics are highly unlikely to be derived from GWAS (Figure 1B). Host-microbial interactions, rapid selection, and drug resistance are all factors that play a large role in the effectiveness of these treatments and that are not easily captured in genetic studies. However, genetic susceptibility may still prove useful for understanding variation in infection rates, symptoms, and response to therapy [123][124][125][126]. Moreover, intrinsic differences in genetic architecture may explain why some phenotypes yield significantly more genetic associations than others for a given sample size (Figure 1B). These include differences in heritability, polygenicity, and the distribution of effects and allele frequencies of causal variants.
Most GWAS associations are in non-coding regions, some of which have been shown to influence disease risk via regulating gene expression [127]. The increasing availability of large functional datasets and genomics resources, such as the Encyclopedia of DNA Elements (ENCODE) project [128] and the Genotype-Tissue Expression (GTEx) project [129], has advanced the functional annotation of these variants. However, causal gene identification and linking causal genes to function remain challenging. With the availability of genome editing tools, such as zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and CRISPR/Cas systems (Nobel Prize in Chemistry, 2020) [130], we are now able to perturb the entire genome at unprecedented scale and with fine control. Functional genomics screens with phenotypic assay readouts are a promising avenue that can deconvolute this complexity. Some cancer types have been the first to benefit from these screens, as fitness and survival of tumor cells are relatively straightforward phenotypic readouts. Ptpn2 was identified by an in vivo CRISPR screen as a promising target to increase the efficacy of immunotherapy [131]. The Wellcome Trust Sanger Institute [132] and the Broad Institute [133] later prioritized Werner syndrome RecQ helicase as a key survival gene and an attractive drug target in tumors characterized by high microsatellite instability. Although they were not initially discovered from GWAS, these examples reveal the potential of such an approach. In contrast to knock-out screens, in which a gene is disrupted and its function ablated, knock-in assays introduce precise changes to the DNA sequence and are more challenging because they rely on the less-efficient homology-directed repair. Gupta et al [134] utilized both deletion and base editing to link a GWAS-identified SNP to a distal regulation mechanism in five cardiovascular diseases.
However, most GWAS associations are not resolved to a single variant due to linkage disequilibrium (LD, see Glossary of terms) [135], which complicates the identification of candidate causal variants for functional follow-up and underscores the value of genome-wide knock-in screens. Recently, methods were developed to screen transcriptional or splicing variants endogenously [136] and to perform high-throughput screens using base editors [137], greatly increasing the scalability of functional genomics assays. Additional methods, such as CRISPR-QTL [138] and TAP-seq [139], have expanded CRISPR's potential by mapping enhancer-gene pairs. These innovations may further enable linking GWAS associations to genes and their functions and potentially offer new therapeutic modalities for genes that are not easily targeted with current approaches.
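To make the LD problem concrete, the following minimal sketch computes two standard pairwise LD statistics, D and r², from haplotype and allele frequencies. The function name `ld_stats` and the example frequencies are illustrative assumptions and are not taken from any study cited here. When r² between two variants approaches 1, association statistics alone cannot distinguish which of the two is causal.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    # D measures the departure of the observed haplotype frequency
    # from the frequency expected if the loci were independent.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    # r2 is the squared correlation between alleles at the two loci.
    r2 = d * d / denom
    return d, r2


if __name__ == "__main__":
    # Hypothetical frequencies: independent loci, then perfectly linked loci.
    print(ld_stats(0.25, 0.5, 0.5))  # (0.0, 0.0)
    print(ld_stats(0.5, 0.5, 0.5))   # (0.25, 1.0)
```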
PRS are a promising tool for precision medicine. Many studies have shown that PRS have great potential for improving diagnosis, prediction of health outcomes, response to therapy, and clinical trials. Validated PRS can also impact individual behaviors, clinical decision making, and the implementation of population screening strategies. For example, research shows that polygenic risk influences the penetrance of monogenic disease risk factors [105,108], indicating the utility of PRS in counseling and clinical decision making for carriers of pathogenic variants. In a recent study, Forgetta et al [140] were able to use a PRS for quantitative ultrasound speed of sound at the heel, a heritable risk factor for osteoporotic fracture, to identify low-risk individuals who can be safely excluded from an expensive fracture risk screening.
However, PRS are limited by disease heritability, and genetics generally contributes less than the environment to overall phenotypic variation. Future risk models will probably need to incorporate both genetic and environmental factors to be of maximal predictive value. In the short term, assessing PRS alongside existing risk factors (such as age and sex) will be important for understanding their clinical utility. Recently, a genetic risk score for coronary heart disease was shown to have minimal value in improving risk stratification to predict incident events compared with a guideline-based risk equation [141].
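A PRS is commonly computed as a weighted sum of risk-allele counts, with per-variant weights taken from GWAS effect estimates (for a binary trait, typically log odds ratios). The sketch below shows only that additive form. The function name, the odds ratios, and the genotype are hypothetical, and real pipelines add steps such as LD-aware variant selection or effect-size shrinkage that are not shown.

```python
import math


def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages.

    dosages : risk-allele counts per variant (0, 1, or 2; imputed
              dosages may be fractional)
    weights : per-variant effect sizes, e.g. log odds ratios from a GWAS
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


if __name__ == "__main__":
    # Hypothetical odds ratios for three variants (illustrative only).
    odds_ratios = [1.15, 1.10, 0.95]
    weights = [math.log(x) for x in odds_ratios]
    # One individual's risk-allele counts at the same three variants.
    genotype = [2, 1, 0]
    print(polygenic_risk_score(genotype, weights))
```

A raw score of this kind is usually interpreted relative to a reference distribution, for example as a percentile within a cohort, and alongside non-genetic risk factors such as age and sex.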
One of the main limitations of many PRS studies is that they are carried out retrospectively. In order to validate these PRS, more rigorous and prospective studies are needed to replicate the results, including randomized controlled clinical trials. Another challenge in establishing the clinical utility of PRS is ensuring that they are applicable across diverse populations, especially under-represented groups. Due to the vast overrepresentation of European-ancestry individuals in GWAS, the majority of PRS are generated using European-based associations and tend to have attenuated prediction accuracy when applied to non-European populations [142]. Consequently, the clinical application of PRS is currently most suitable to a small proportion of the global population. Substantial investments in methodology development and research infrastructure improvements are needed to achieve transferability of PRS across diverse populations, and to ensure thorough exploration of the value of PRS within clinical settings. The ability to create predictive polygenic models requires large training cohorts, both to identify genetic variants associated with a disease and to estimate their joint contribution to risk [143]. Large-scale and diverse databases and biobanks, including direct-to-consumer platforms, are in a unique position to develop better, more transferable PRS.
In conclusion, public and private investment in human genetics to date has improved our understanding of human health and will continue to play an important role in drug development. Continued investment to scale these efforts, refine phenotypes, improve computational methods, and increase the diversity of the individuals being studied is essential if we are to fully leverage the human genome and ensure that the products of this research benefit the full breadth of humankind.

Glossary of terms
Common variant
A variant (most often a SNP) with a minor allele frequency of at least 1%.

Expression quantitative trait loci (eQTL)
Genomic loci that explain variation in the expression level of mRNAs. An expression trait is the amount of an mRNA transcript for a protein. Chromosomal loci that explain variance in expression traits are called eQTL(s).

Genome-wide association study (GWAS)
An approach used in genetics research to associate genetic variations with disease risk. The method involves scanning the genomes from many different people and looking for genetic markers that can be used to predict the presence of a disease. Once such genetic markers are identified, they can be used to understand how genes contribute to the disease and develop better prevention and treatment strategies.

Linkage disequilibrium (LD)
The non-random association of alleles at different loci in a given population. Loci are said to be in linkage disequilibrium when the frequency of association of their different alleles is higher or lower than what would be expected if the loci were independent and associated randomly.

McFadden's R²
A measure of explained variation, defined as $R^2 = 1 - \log(L_{\text{current}})/\log(L_{\text{null}})$, where $L_{\text{current}}$ denotes the maximum likelihood value from the current fitted model and $L_{\text{null}}$ denotes the maximum likelihood value from the null model with only an intercept and no covariates.

Mendelian disorder/disease
A disorder/disease that is controlled by a single locus. In such cases, a mutation in a single gene can cause a disease that is inherited according to Mendel's principles.

Single nucleotide polymorphism (SNP)
A substitution of a single nucleotide at a specific genomic location.
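
As an illustration of the McFadden's R² entry above, defined as 1 − log(L_current)/log(L_null), the following minimal sketch evaluates it for a binary outcome. It assumes the fitted probabilities come from a maximum-likelihood model such as a logistic regression. The function names and the toy data are hypothetical.

```python
import math


def bernoulli_log_likelihood(y, p):
    """Log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi)
               for yi, pi in zip(y, p))


def mcfadden_r2(y, p_fitted):
    """McFadden's R2 = 1 - log(L_current) / log(L_null).

    The null model has only an intercept, so its maximum-likelihood
    prediction for every individual is the observed case fraction.
    """
    base_rate = sum(y) / len(y)
    if base_rate in (0, 1):
        raise ValueError("outcomes must include both cases and controls")
    ll_null = bernoulli_log_likelihood(y, [base_rate] * len(y))
    ll_current = bernoulli_log_likelihood(y, p_fitted)
    return 1 - ll_current / ll_null


if __name__ == "__main__":
    # Toy data: two cases, two controls (illustrative only).
    y = [1, 0, 1, 0]
    print(mcfadden_r2(y, [0.5, 0.5, 0.5, 0.5]))  # 0.0, no better than the null
    print(mcfadden_r2(y, [0.9, 0.1, 0.9, 0.1]))  # closer to 1
```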