Heritability Estimation Approaches Utilizing Genome‐Wide Data

Prior to the development of genome‐wide arrays and whole genome sequencing technologies, heritability estimation mainly relied on the study of related individuals. Over the past decade, various approaches have been developed to estimate SNP‐based narrow‐sense heritability ( hSNP2${\rm{h}}_{{\rm{SNP}}}^2$ ) in unrelated individuals. These latter approaches use either individual‐level genetic variations or summary results from genome‐wide association studies (GWAS). Recently, several studies compared these approaches using extensive simulations and empirical datasets. However, sparse information on hands‐on training necessitates revisiting these approaches from the perspective of a stepwise guide for practical applications. Here, we provide an overview of the commonly used SNP‐heritability estimation approaches utilizing genome‐wide array, imputed or whole genome data from unrelated individuals, or summary results. We not only discuss these approaches based on their statistical concepts, utility, advantages, and limitations, but also provide step‐by‐step protocols to apply these approaches. For illustration purposes, we estimate hSNP2${\rm{h}}_{{\rm{SNP}}}^2$ of height and BMI utilizing individual‐level data from The Northern Finland Birth Cohort (NFBC) and summary results from the Genetic Investigation of ANthropometric Traits (GIANT;) consortium. We present this review as a template for the researchers who estimate and use heritability in their studies and as a reference for geneticists who develop or extend heritability estimation approaches. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.


INTRODUCTION
A long-standing question in quantitative and behavioral genetics is whether the variation in a particular trait is due to genetic or environmental factors (Visscher et al., 2008). A key step in finding an answer to this question is partitioning the observed phenotypic variance into variance components attributable to unobserved genetic and environmental factors. R. A. Fisher (Fisher, 1918) first modeled and partitioned the phenotypic variance into genetic and environmental components without any knowledge of specific genes affecting the trait (Visscher & Goddard, 2019). Although he did not use the term 'heritability', his research laid the foundation for various future approaches for the estimation of heritability (Falconer, 1960;Walsh, 1998).
Heritability is defined as the proportion of phenotypic variance that is attributable to genetic factors in a given population at a specific time (Falconer, 1960;Walsh, 1998). Heritability can be defined in two ways. The broad-sense heritability (H 2 ) estimates the proportion of phenotypic variance attributable to all genetic factors, including additive genetic effects (A), dominant genetic effects (D), and epistatic effects (G × G) (Mayhew & Meyre, 2017;Visscher et al., 2008;Zhu & Zhou, 2020). In contrast, narrow-sense heritability (h 2 ) estimates the proportion of phenotypic variance attributable to additive genetic effects (A) or breeding values (Mayhew & Meyre, 2017;Visscher et al., 2008;Zhu & Zhou, 2020). Since h 2 is more relevant to the evaluation of genetic influence on phenotypic resemblance of relatives and predicting evolutionary responses to selection, it is a commonly used parameter for heritability estimation and applications (Visscher et al., 2008).
Heritability plays an important role in several areas of biology such as agriculture, medicine, and evolution (Visscher et al., 2008). It facilitates selective breeding programs to improve the quality of plant and domestic animals (Alvarez, 2017;Bernardo, 2020;Berry et al., 2003Berry et al., , 2014Cassell, 2009;Manjula et al., 2018;Miglior et al., 2017;Palmquist & Jenkins, 2017;Utrera & Van Vleck, 2004;Velasco & Fernández-martínez, 2002). Heritability also provides insights into the genetic architecture of complex traits and diseases (Eichler et al., 2010;Friedman et al., 2021;Lunde et al., 2007;Manolio et al., 2009;Silventoinen et al., 2003;Tenesa & Haley, 2013;Vinkhuyzen et al., 2013;Visscher et al., 2007;Wray et al., 2007). For over a century, heritability has played a key role in measuring the genetic influence on various traits and diseases (Dempster & Lerner, 1950;Falconer, 1960Falconer, , 1965. A large heritability implies that genetic factors have a strong influence on a trait or disease. Being an important parameter of genetic influence on phenotype, heritability has been frequently used as the basis for genetic linkage and genetic association studies (Boomsma et al., 2002;Friedman et al., 2021;Institute of Medicine, 2006). These studies led to the discovery of genes associated with various anthropometric and behavioral traits and diseases such as birth defects, psychiatric disorders, etc. (Institute of Medicine, 2006;Visscher et al., 2008). Therefore, an accurate estimation of heritability can help prioritize the use of resources for further genetic studies (Zhu & Zhou, 2020). Heritability acts as a key for understanding the evolution of quantitative traits and diseases (Bateson, 1922;Fisher, 1930;Grant & Grant, 1995;Hadfield, 2008;Kelly, 2011;Kingsolver et al., 2001;Lande & Arnold, 1983;Mousseau & Roff, 1987;Wood et al., 2016). Particularly, heritability determines how a population will respond to selection. Therefore it can be utilized to compare the evolution of a particular trait or disease across different populations at the same time and within a population at different timepoints (Mayhew & Meyre, 2017).
New approaches to estimate heritability (VanRaden, 2008;Yang et al., 2010) were developed in parallel with advanced genotyping and sequencing technologies (1000Genomes Project Consortium et al., 2010, 2012, 2015International HapMap, 2005) that facilitated the estimation of realized genetic similarity; these approaches became popular for estimating SNP-heritability in natural populations, as they did not require large pedigree recruitment. In the last decade, several approaches have been developed for estimation of SNP-heritability of complex human traits and diseases. These approaches utilize either individual-level genetic variations such as genome-wide complex trait analysis (GCTA; Yang et al., 2010;Yang, Lee, et al., 2011), linkage-disequilibrium-adjusted kinships (LDAK; Speed et al., 2012Speed et al., , 2017, threshold genome-based restricted maximum likelihood (Threshold GREML; Zaitlen et al., 2013), or summary results from GWAS such as LD Score (LDSC) regression Zaitlen et al., 2013) and SumHer (Speed & Balding, 2019) (Fig. 1). Recently, several studies compared SNPheritability estimation approaches using simulated and empirical datasets Hou et al., 2019;Speed et al., 2017Speed et al., , 2020Tang et al., 2022;Uricchio, 2020;Yang et al., 2017;Zhu & Zhou, 2020). However, few resources provide hands-on training for these various approaches, thereby warranting an overview along with step-by-step protocols for practical applications.
The current review provides an updated summary of SNP-heritability estimation approaches utilizing either individual-level genome-wide data or summary results from previous GWAS. These approaches utilize a variety of methods such as Maximum Likelihood (ML; Thompson, 1971;Visscher et al., 2006), Restricted Maximum Likelihood (REML; Lee & van der Werf, 2006;Yang et al., 2010), Haseman-Elston (HE) Regression (Haseman & Elston, 1972;Sham & Purcell, 2001), and Phenotype Correlation-Genotype Correlation (PCGC; Golan et al., 2014). REML is the most widely used method for individual-level genetic data, and is employed in linear mixed model (LMM) to simultaneously estimate the contribution of fixed and random effects. Therefore, we focus here on the REML methods while discussing the approaches developed for individuallevel genetic data. Likewise, we discuss LDSC , and SumHer (Speed & Balding, 2019) that utilize the regression method and REML, respectively, to estimate SNP-heritability from GWAS summary results. We discuss each heritability estimation approach in the context of statistical basis, utility, advantages, and limitations. We also provide stepwise protocols to apply commonly used approaches utilizing individual-level genetic data such as GCTA Yang, Lee, et al., 2011), LDAK (Speed et al., 2012(Speed et al., , 2017, and Threshold GREML (Zaitlen et al., 2013), as well as GWAS summary results such as LDSC , and SumHer (Speed & Balding, 2019). We present this review as a template to the researchers who need to estimate and use heritability in their research and as a reference to the geneticists who want to develop or extend heritability estimation approaches.

AN OVERVIEW OF SNP-HERITABILITY ESTIMATION APPROACHES
Genome-wide SNP arrays and whole genome sequencing technologies have revolutionized many aspects of human genetics such as determination of genetic susceptibility and underlying mechanisms that increase risk of diseases, estimation of heritability, and understanding the evolution of complex traits (Eichler et al., 2010;Visscher et al., 2008;Zaitlen & Kraft, 2012). Genome-wide association studies (GWAS) have discovered a multitude of genetic variants associated with various complex human traits and diseases (Buniello et al., 2019). However, variants derived from GWAS explain only a small proportion of phenotypic variance as compared to family studies, leading to a major question of missing (hidden) heritability (Manolio et al., 2009). There are several possible reasons for missing heritability, such as weak linkage disequilibrium (LD) between genotyped variants and ungenotyped causal variants, common variants with small effects that do not reach the canonical significance threshold (5 × 10 -8 ) in GWAS, rare variants with large effects not captured by genotyping arrays, contribution of non-additive effects, gene-environmental interactions, and overestimation of narrow-sense heritability in pedigree-based studies due to shared environmental confounding (Eichler et al., 2010;Gibson, 2012;Manolio et al., 2009;Yang et al., 2017;Zhang, 2015). In last decade, several approaches have been developed that utilize genome-wide variations instead of only statistically significant variations to solve the problem of missing heritability of complex traits Speed & Balding, 2019;Speed et al., 2012Speed et al., , 2017Yang, Lee, et al., 2011;Zaitlen et al., 2013). These approaches successfully explained a large proportion of phenotypic variance attributable to genome-wide SNPs for a variety of complex human traits and diseases. Unlike widely studied continuous traits such as anthropometric traits, behavioral traits, and pre-and perinatal traits, dichotomous traits such as diseases are represented on a discrete scale (e.g., 0-1). Therefore, observed heritability on a risk scale is usually parameterized on an unobserved continuous liability scale so that the heritability is independent of disease prevalence (Falconer, 1965;Lee et al., 2011;Tenesa & Haley, 2013;Yang et al., 2017). Here, we explain major advances in the approaches based on genome-wide SNP data at individual or summary level with their advantages and limitations (Table 1).

Approaches Utilizing Individual-Level Genetic Data
The fundamental idea behind the approaches developed for individual-level genomewide data is to estimate the realized genetic relationship among individuals by using genome-wide variants and using this relationship matrix to estimate the genetic variance. Yang et al. (2010) first utilized such approach to address the problem of missing heritability of human height. The study used 294,831 SNPs genotyped on 3925 unrelated individuals to calculate realized genetic relatedness and fitted this relatedness matrix in LMM to estimate SNP-based narrow-sense heritability (ĥ 2 SNP ) of human height. The results explained 45% of phenotypic variance in human height. Here, we discuss the most widely used SNP-heritability estimation approaches utilizing individual-level genetic data.

Genome-wide Complex Trait Analysis (GCTA)
One of the most popular software packages for estimating SNP-heritability using genome-wide data from unrelated individuals is Genome-wide Complex Trait Analysis (GCTA), which uses a genome-based restricted maximum likelihood (GREML; Yang et al., 2010;Yang, Lee, et al., 2011) method. GCTA depends upon LD between genotyped variants and ungenotyped causal variants to estimate additive genetic variance in unrelated individuals.
The basic concept behind the method is to fit the effects of all the SNPs as random effects via an LMM. In this design, phenotype Y can be represented in simple equation form: Y = Zu + e, where Z is standardized genotype matrix (scaled genotypic values from unrelated individuals), u is the vector of random genetic effects with u i ∼ N(0, σ 2 i )) for SNP i, and e is the vector of residual effects [e ∼ N(0, Iσ 2 e )]. This model can be rewritten as Y = g + e, g = Zu is the additive genetic values of phenotype Y [g ∼ N(0, Aσ 2 g )], A is the genetic relatedness matrix (GRM) among unrelated individuals (A = ZZ T m , m is the number of variants), and σ 2 g is additive genetic variance of all variants (σ 2 g = mσ 2 i ). Similarly, phenotypic variance of Y can be expressed in terms of variance attributable to random (additive) effects (σ 2 g ) and residual effects (σ 2 e ).
var (Y) = Aσ 2 g + Iσ 2 e where A is GRM with each cell representing pair-wise genetic relatedness and I is identity matrix, assuming independence of environmental influence and no gene-gene or geneenvironment interaction. For example, A jk represents genetic relatedness between individuals j and k from m genotyped SNPs:

Multi-component GREML that bins
SNPs by their MAF and regional LD scores.
Same as GREML-MS with additional advantage due to LD bins.
(i) Similar to GREML-MS-if regional LD scores of causal variants differ from surrounding SNPs used to create GRM; (ii) Relatively large standard errors.

Multi-component GREML that bins
SNPs by their MAF and individual LD scores.
To date, best version of GREML (least biased approach) which can address the uneven tagging of causal variants and the influence of MAF on SNP effects.

Multi-component GREML with two
GRMs: first GRM is created from all SNPs and second GRM is created by setting off-diagonals below a set threshold to 0.
Generally useful in samples with extended genealogy.
It can be upwardly biased by shared environmental influences. (Continued) where p i is the minor allele frequency (MAF) and x i is the genotype code of the SNP i (x i = 0, 1, or 2).
A limitation of the GCTA approach is that it relies heavily on LD between assayed and causal variants (Speed et al., 2012;Zhu & Zhou, 2020). Therefore, it overestimates and underestimates the contribution of causal variants in high LD (strong LD between ungenotyped causal and genotyped variants) and low LD (weak LD between ungenotyped causal and genotyped variants) regions, respectively. In addition, genetic relatedness between a pair of individuals based on genotyped variants may not reflect genetic relatedness based on ungenotyped causal variants. If ungenotyped causal variants are in strong LD as compared to genotyped variants, heritability estimated using genotyped variants will be underestimated. GCTA suggests a uniform transformation of relatedness matrix [scaling the genotype matrix with 2p(1-p) -1 , where p is MAF]. Such scaling implies that effect sizes are inversely proportional to MAF and each causal variant contributes equally to the phenotypic variance. However, equal contribution of each causal variant to the phenotypic variance is not realistic due to uneven LD patterns across the genome. Additionally, assortative mating, epistasis, and gene-environment interaction can bias heritability estimates by incorrectly allocating variance due to these phenomena to additive genetic effects. Likewise, population structure (admixed population) can bias the estimation of heritability. This bias can usually be avoided by identifying population structure through principal component analysis (PCA) and eliminating outliers from the data or correcting for admixed samples in the analysis by including the first few PCs as fixed effects in the LMM.
Later, several other variants of GCTA based on MAF stratified variants (GCTA-MS), LD and MAF stratified variants (GCTA-LDMS), were developed to overcome these limitations (see Alternate Protocol 1). These approaches facilitated not only partitioning of genetic variance into additive and non-additive components, but also variance attributed to different chromosomes, genes and inter-genic regions, biological pathways, and SNP functions Yang, Manolio, et al., 2011). In addition, an approach was introduced to estimate SNP-heritability in individuals with close or extended relationships (Zaitlen et al., 2013). This approach essentially uses GREML with two GRMs; the first GRM is created using all SNPs, whereas a threshold is applied on the second GRM by setting off-diagonals < threshold to zero (see Basic Protocol 3). However, each approach has advantages and disadvantages (Table 1). Speed et al. (2012) developed a method (Linkage Disequilibrium Adjusted Kinships; LDAK) to overcome the bias arising from ungenotyped causal variants in regions of high or low LD. Yang and colleagues suggested a uniform scaling of the SNP-based kinship matrix [2p(1-p) -1 , where p is MAF]. This transformation adjusts for the average bias caused by variable LD leading to uneven tagging of ungenotyped causal SNPs across the genome; however, it depends upon the MAF spectrum of the causal SNPs, which is generally not known. In contrast, LDAK suggests modification of the GRM according to local LD-contribution of the SNPs to the genetic similarity between a pair of individuals is weighted according to the LD with their neighboring SNPs. Estimating heritability using genetic similarity adjusted for local LD reduces the potential bias and increases the precision of the heritability estimate.

Linkage Disequilibrium Adjusted Kinships (LDAK)
Reanalysis of the height data with LD-adjusted GRM showed a slight change in the estimated SNP-based heritability (ĥ 2 SNP ), suggesting that the underestimation of h 2 in low-LD regions was balanced by overestimation of h 2 in high-LD regions. However, approximately a quarter increase inĥ 2 SNP of hypertension and type I diabetes using LDAK as compared to GCTA suggested that causal SNPs were poorly tagged by genotyped SNPs-i.e., causal SNPs had lower MAF than genotyped SNPs, and a uniform transformation did not represent these LD patterns (Speed et al., 2012). In contrast, nearly one tenth decrease inĥ 2 SNP of rheumatoid arthritis suggested that causal SNPs had higher MAF as compared to genotyped SNPs (Speed et al., 2012). Further, LDAK was used for imputed data across 19 human traits to develop a model for accurately describing the variation inĥ 2 SNP with MAF, LD, and genotype certainty. The improved model led to on average 43% (S.D. 3%) higherĥ 2 SNP than those obtained from GREML and 25% (S.D. 2%) higherĥ 2 SNP than those from GREML-LDMS (Speed et al., 2017).
Like GCTA, LDAK estimates are highly sensitive to MAF of causal variants, population stratification, and SNP data type [arrays, imputed or Whole Genome Sequence (WGS)].
In addition, as LD is a function of MAF, the weighting strategy in LDAK can introduce MAF bias because it gives more weight to SNPs with lower MAF. An analysis using all SNPs from WGS data showed that LDAK weighted SNPs inversely proportional to their LD, which resulted in near-zero weights for common SNPs and very high weights for rare SNPs. This led to underestimated h 2 SNP for common variants and overestimated h 2 SNP for rare variants. The LDAK-induced MAF bias can be substantial, especially when there are several rare variants, leading to an inflated estimate of h 2 SNP (Yang et al., 2017). LDAK assumes that the variance explained by rare variants is very large in comparison to that explained by common variants, which predicts that the power to detect rare variants would be higher than that to detect common variants in the same order. This prediction is not consistent with empirical results in the cases of human height, schizophrenia, and type 2 diabetes (Yang et al., 2017).

Approaches Utilizing Summary Results From Previous GWAS
We discussed the approaches to estimate h 2 SNP from individual-level genome-wide SNPs data; however, availability of such data with relevant phenotype information is often limited. Therefore, other approaches were developed that utilize summary results of GWAS (estimated SNP effects and their standard errors for hundreds of thousands of SNPs analyzed in a study) instead of per-individual genome-wide information.

LD Score (LDSC) regression
In GWAS, the deviation of observed χ 2 test statistic for an SNP from its expected value under the null hypothesis (no association) is a function of LD between a target SNP and underlying causal variants (Yang et al., 2017). Therefore, h 2 SNP can be directly estimated from the summary results by regressing the observed χ 2 test statistic against LD score of genome-wide SNPs. This is the basic principle of the LD Score regression approach (LDSC; . Under the polygenic model, average variance explained by each SNP is h 2 SNP /m, where m and h 2 SNP are number of SNPs and total variance attributable to all SNPs in the summary data, respectively. Therefore, the expected χ 2 can be represented as E(χ2|l i ) = 1 + h 2 SNP N m l i for SNP i, where N is number of individuals and l i is the sum of LD r 2 values between SNP i and all SNPs (including itself). This approach requires only summary data from GWAS because LD scores can be estimated in a reference population (for example, the 1000 Genomes Project). Besides LD between the target SNP and the underlying causal variants, cryptic relatedness/population stratification can also inflate the expected χ 2 test statistics. Therefore, an extra term (a) can be included in the model for confounding biases: E(χ2|l i ) = 1 + Na +

of 31
Current Protocols due to confounding factors such as population stratification (b 0 >1 represents confounding bias) but also estimate h 2 SNP (ĥ 2 SNP = b 1 m N ). Like GCTA, LDSC has also been extended to estimate genetic correlation (r g ) between traits using summary results. Genetic correlation can be defined as genetic covariance , where ρ xy , σ x , and σ y represent genetic covariance between trait x and y, the square root of genetic variance of x and y, respectively, which can be approximated as additive genetic covariance between x and y (h 2 xy ) and square roots of SNP-based narrow-sense heritability of x (h 2 x ) and y (h 2 y ). Unlike pedigree-based methods, bivariate (multivariate) GREML, an extension of GREML, estimates genetic correlations between traits (or diseases) in unrelated individuals, for example two independent studies. While LDSC also allows the traits sought for genetic correlation to be measured on different sets of samples, a major advantage of LDSC is that it requires only summary statistics. Calculations for genetic correlation are quite similar to heritability estimations via LDSC. χ 2 statistics for a single study are replaced with the product of z-scores from two studies of traits with non-zero genetic correlation. An expected value of product of z-scores from two studies, study 1 and 2, based on SNP i can be expressed as E(z 1i z 2i ) = ρN S √ N 1 N 2 + √ N 1 N 2 ρ g m l i , where N 1 and N 2 are sample sizes from study 1 and 2, N S is the number of individuals included in both studies, ρ is the phenotypic correlation among the N S overlapping samples, ρ g is the genetic covariance between trait 1 and 2, and l i is the sum of LD r 2 values between SNP i and all SNPs (including itself). Hence, ρ g can be estimated by regressing the product of z-scores from two studies (z 1 z 2 ) on LD r 2 values (ρ ). If study 1 and study 2 are the same study, then genetic covariance between a trait and itself is the estimate of heritability (ĥ 2 SNP ), and χ 2 =z 2 . Once genetic covariance (ρ g ) is estimated, genetic correlation (r g ) can be estimated in the same way as with GCTA. As compared to GCTA, LDSC provides great flexibility to estimate r g between any two GWAS data sets.
A major advantage of LDSC is that it is faster than individual-based approaches and its computing time does not scale up with sample size. LDSC only requires summary data, which allows the reanalysis of summary data available from published meta-analyses. The LD score regression intercept can be used to estimate population stratification. Since summary results are available usually only for common variants, LDSC is limited in estimation of the variance explained by rare variants even with imputed or WGS data, and it is more sensitive to the genetic architecture of a trait. The estimates using LDSC depend on the LD scores and, thereby, the reference population in which LD scores were calculated. If there is a mismatch between the LD scores from the reference population and the target population used for GWAS, then LD score regression can be biased. A previous study showed thatĥ 2 SNP measures from LDSC are consistently smaller than those from GREML in the same data set, which is likely due to errors in LD scores estimated from the reference population (LDSC recommends using LD scores from HapMap 3 SNPs in the 1000 Genomes Project; Ni et al., 2018).
LD score regression has been frequently applied to summary statistics from GWAS-to estimate the SNP-heritability of a trait, average bias due to confounding, heritability enrichments of SNP categories, and genetic correlation between a pair of traits. Like GCTA, LDSC also assumes that all causal variants contribute equally to the phenotypic variance, and therefore provides equal weight to each SNP. Although this model is widely used in statistical genetics, it usually underestimates the averageĥ 2 SNP in regions of high LD (due to multiple tagging of causal variants). LDSC tends to over-estimate confounding bias, under-estimate SNP heritability, and produce exaggerated estimates of enrichments, due to misspecification of heritability model (Speed & Balding, 2019).

of 31
Current Protocols SumHer Speed et al. (2019) proposed an approach (SumHer) to overcome the limitations of LDSC (Speed & Balding, 2019;Speed et al., 2020). The basic idea behind SumHer is that SNP heritability (e.g.,ĥ 2 i for SNP i) varies across the genome, and therefore, E(ĥ 2 i ) ∝ q i = [p i (1 − p i )] 1+α w i , where p i is MAF of SNP i, w i is the weight calculated for SNP i based on local LD, and α is a constant (like LDAK, SumHer chooses α = − 0.25 by default)]. The main difference between LDSC and SumHer is that, unlike LDSC (which assumes all q j are same; q i =1), SumHer allows users to choose any heritability model. i.e., the user can specify arbitrary values for q i . As compared to additive modeling of inflation in LDSC, SumHer models inflation of test statistics multiplicatively. A recent analysis of 24 large-scale GWAS using the recommended model in SumHer showed that these studies were inclined to over-correct for confounding, which reduced the discovery of genomewide significant loci by about a quarter (Speed & Balding, 2019). Heritability estimate enrichment analyses using LDSC concluded that heritability is highly concentrated in specific functional categories. For example, an analysis across 17 diseases showed that conserved regions contribute 35% of SNP heritability, indicating that they were 13-foldenriched for casual variants. By contrast, analyses across 24 traits using SumHer finds that none of the categories have enrichment above 2-fold (Speed & Balding, 2019).
SumHer proposes a solution to unequal contribution ofĥ 2 per SNP due to differential LD pattern, overestimation of confounding due to population stratification and exaggerated heritability enrichment due to misspecification of the heritability model. However, it also suffers from limitations in common with LDSC. It is not known yet how well the SumHer heritability model would fit while estimating variance attributable to rare variants. Like LDSC, heritability estimates from SumHer depend on LD scores from reference samples indicating a mismatch in LD scores from reference samples and GWAS samples would result in biased estimation ofĥ 2 .

STEP-BY-STEP GUIDE FOR SNP-HERITABILITY ESTIMATION
Here, we provide a stepwise guide to estimate SNP-heritability using various approaches. For illustration purposes, we use individual-level genetic data and summary results from previous GWAS to estimate SNP-heritability of height and BMI. Individual-level genetic data from The Northern Finnish Birth Cohort (NFBC; 1966) consists of several metabolic trait measurements in 5402 individuals (Sabatti et al., 2009), genotyped for 364,580 SNPs using the Illumina HumanCNV370-Quadv3_C platform. Likewise, we use summary results from meta-analysis of height and BMI using UK Biobank and GI-ANT GWAS (2018) (Yengo et al., 2018).
The NFBC dataset is available through DbGaP authorized access. These data are for general research use-i.e., use of the data is limited only by the terms of the model Data Use Certification. There is no limitation in the usage of the genomic results outside the study for which they were originally consented. Summary results from the GIANT consortium are publicly available and can be accessed without restriction.
Although not an integral part of the protocol, we also provide a brief overview of widely used quality control procedure for phenotype and genotype data. Current approaches, developed for SNP-heritability estimation can also be used for various other purposes for example, estimation of genetic correlation, confounding due to population structure and cryptic relationship, gene enrichment analysis. However, we provide protocols only for the SNP-heritability estimation to align with the focus of the current review.

Resources
Before starting the estimation of SNP-heritability, it is necessary to install appropriate software; assemble the data files; input genotype (usually, plink format is preferred; if Srivastava, Williams and Zhang

of 31
Current Protocols genotypes are present in variant call format (vcf) file, it can be converted to plink format for further analyses), phenotype (phenotype file should have at least three columns in the order family id, individual id, and phenotypic values), and summary results (usually contains an identifier/SNP id, effect allele, other allele, sample size, p value, and summary statistics); and download pre-calculated tagging (LD score) information from reference population (e.g., 1000 Genomes database, UK Biobank database). Reference population used for LD scores should be ancestrally similar to the GWAS samples. Similarly, we should be cautious while using summary results from previous GWAS and use the large studies with rigorous quality control.

Hardware
Any laptop/computer with 4 cores and 8-16 GB RAM is sufficient for most of the analyses in a reasonable time (Yang, Lee, et al., 2011).

Operating system
Linux-based operating system such as Ubuntu, Fedora etc.

Quality Control of Phenotype and Genotype Data
A detailed description of quality control for genome-wide analysis can be found elsewhere (Truong et al., 2022;Turner et al., 2011;Weale, 2010). Here, we briefly summarize the routinely used quality control procedures. In general, quality control of phenotypes depends on the research question, trait type (discrete or continuous), and other phenotypes/covariates available in the dataset. R (R Team, 2020)/R-Studio (R Team, 2019) (https:// www.R-project.org) is a well-established platform for phenotype quality control. The first step is to select a key phenotype in a given dataset with multiple phenotypes, followed by summarizing the data. A density plot for continuous traits and bar plot for discrete traits can provide a rough idea about phenotype distribution and outliers. As normal distribution of variables is one of the assumptions in commonly used analyses, removing outliers (mean ±4*S.D.) is a common practice to attain normality in the dataset. However, one should be cautious while removing outliers and should choose this option only when alternative approaches such as data transformation (log or exponential) or adjusting with other covariates do not work. Phenotype data can be supplied to linear models either adjusted for the covariates or without adjusting for covariates where covariates can be supplied separately into the model.
Quality control of the genotype file is performed based on individuals and markers. PLINK (Chang et al., 2015;Purcell et al., 2007) (https:// www.cog-genomics.org/ plink/ 1.9) and R/R-Studio (R Team, 2019; R Team, 2020) (https:// www.R-project.org) are routinely used for genotype quality control and plotting various quality measures, respectively. In general, individuals are filtered on the basis of genotype missing rate, average heterozygosity (inbreeding), inconsistency between biological and reported sex, and Mendelian errors (if pedigree information is available). Similarly, markers are filtered on the basis of call rate, minor allele frequency (MAF), and Hardy-Weinberg equilibrium (HWE). In addition, heritability estimation methods assume a homogeneous population; therefore it is advisable to check for population stratification and remove outliers using principal component analysis (PCA) or multi-dimensional scaling (MDS) based on the set of markers in the genome that are independent.
We performed the quality control procedure for NFBC dataset as mentioned above. After careful examination of the density plot of height and BMI, 33 samples were removed from the analysis. Phenotypes were adjusted for sex before fitting into LMM. Similarly, genotype data were controlled for individual and marker quality. Individuals were examined and excluded on the basis of genotype missing rate >5%, average heterozygosity ± 4*S.D., and inconsistency between reported and biological sex, whereas SNPs were

of 31
Current Protocols examined and excluded on the basis of call rate <95%, MAF <1%, and HWE with p < 1.0E-6. After genotype and phenotype quality control, 5348 individuals genotyped on 324,851 autosomal SNPs with available phenotype information remained for SNPheritability estimation analyses.

Protocols
Assuming that genotype and phenotype files are pre-processed for quality control, we provide the protocols below to run SNP-heritability analyses using different heritability estimation approaches. We believe that these protocols will make it easy for readers to estimate SNP-heritability (h 2 SNP ) using individual-level data and summary statistics.

GREML (GCTA)
This GREML protocol can be broadly categorized into three steps-(1) create genetic relatedness matrix (GRM); (2) remove one of the cryptically related individual pairs; (3) run restricted maximum likelihood (REML). GCTA allows multi-threading that can be enabled by using the flag --thread-num or -threads. Depending upon the requirements of the analysis, GRMs can be created in different ways, such as by using only autosomes, using each chromosome separately, using the X chromosome alone, or using a subset of SNPs.

Current Protocols
The phenotype file typically has three columns-Family ID, Individual ID, and Phenotype. However, more than one phenotype can also be provided and assigned to a specific column (phenotype) for h 2 SNP estimation by providing an additional option --mpheno [(column-number) -2] in the above command.

sex.txt is a list of individuals' sexes (discrete variable) and PCs.txt is a file with first 10-20 principal components (continuous variable). Similar to the phenotype files, covariate files also have the first two columns as family id and individual id followed by covariate columns.
d. Run REML for case control data (test_cc.phen-phenotype file with casecontrol information). Let us assume that the prevalence of the disease is 0.1 in the general population. The option --prevalence is used to specify the disease prevalence and transformation ofĥ 2 SNP from observed discrete (0-1) scale to unobserved continuous liability scale.
gcta64 --reml --grm test_grm_0.05 --pheno test_cc.phen --prevalence 0.1 --out test_greml_cc --thread-num 4; Usually, GCTA runs REML in a constrained manner such that 0 <ĥ 2 < 1. If there are multiple matrices each with a small contribution toĥ 2 , one or more random effects may hit the boundary. REML stops if more than half of the total components hit the boundary. To avoid such situation, --reml-no-constrain can be used to run REML in an unconstrained manner.

STRATIFIED GREML
As seen in the previous example, SNP-heritability attributable to each chromosome can be estimated by simultaneously fitting GRMs based on each chromosome in to REML. Similarly, GREML can be run in various other stratified ways, for example, using GRMs created by a subset of SNPs stratified by either minor allele frequency (MAF) bins alone or both linkage disequilibrium (LD) and MAF bins. These variations of GREML were developed to adjust for the influence of MAF and local LD on the estimated SNPheritability, and known as the GREML-MAF Stratified (GREML-MS) and GREML-LD and MAF Stratified (GEML-LDMS) approach, respectively. Like original GREML, stratified GREML is also performed in three major steps: (1) create GRM, (2) remove one of the cryptically related individual pairs, and (3) run REML. However, GREML-LDMS includes an additional step-calculation of LD scores (summation of r 2 values between a SNP and all SNPs in a given genomic region) prior to creating GRMs. It is noteworthy that multiple GRMs are created and fitted in REML based on the stratification criteria in stratified GREML.

GREML-LDMS (based on LD and MAF bins)
1b. Calculate LD scores: LD scores are calculated using option --ld-score-region [window size]. GCTA uses default window size of 200 Kb with 100Kb overlapping regions between two segments: gcta64 --bfile test --autosome --ld-score-region 200 --out test_ld --thread-num 4; Import the output of the above command (test_ld.score.ld) to R and create quartiles based on either ldscore_SNP or ldscore_region. Save SNPs corresponding to each quartile as test_ld_q*.txt, where * is 1/2/3/4. Different bins are created on the basis of LD score quartiles and MAF ranges. For example, SNPs within each MAF range such as 0.01 < MAF ≤ 0.1, 0.1 < MAF ≤ 0.2" 0.2 < MAF ≤ 0.3" 0.3 < MAF ≤ 0.4 and 0.4 < MAF ≤ 0.5 can be binned on the basis of quartiles of regional or SNP LD scores. 3b. Remove one of the cryptically related individual pairs: One individual from the cryptically related pairs can be removed using the command provided in the GCTA protocol. Alternatively, an already filtered list of unrelated individuals can be used as in GREML-MS.

LDAK
This LDAK protocol can be divided into five steps-(1) Thinning of SNPs; (2) calculating weights of thinned SNPs based on the pair-wise LD with all nearby SNPs in a bin (e.g., 100 kb); (3) creating kinship matrix; (4) removing one of the cryptically related individual pairs; (5) running REML. In the following protocols, we use a default setting of α = −0.25; the user may change this depending on the desired model. Like GCTA, LDAK also allows multi-threading for most of the analyses which can be enabled by using the option --max-threads.

Calculate weights of thinned SNPs:
All the thinned SNPs are weighted equally for the LDAK-Thin model, whereas variant specific weights are calculated for the LDAK model. Prior to calculation of variant specific weights, LDAK cuts the thinned SNPs into multiple sections. We save these sections and corresponding SNP weights for each chromosome in a sub-directory ./sections/section${j}, where j represents chromosome number 1-22.

of 31
Current Protocols Thinned SNPs in the file weights.short have SNP-specific weights and are used to calculate kinship matrix using the LDAK model. weights.short usually has a smaller number of SNPs than initially thinned SNPs because many of the thinned SNPs have zero weight and are not included in the calculation of kinship matrix.
a. Calculate Kinship matrix using same weight for all thinned SNPs (LDAK-Thin model): The above commands produce two files-test_ldak_thin_0. 05

STRATIFIED LADK
A stratified version of LDAK can be run using already calculated weights of thinned SNPs (see LDAK protocol). Unlike, GCTA, LDAK does not allow -min-maf or -maxmaf option along with -calc-kins-direct. Therefore, markers based on MAF bins should be extracted from 'test.bim' files and the list should be used to extract the set of markers while creating kinship matrix (-extract list-of-SNPs.txt). Since, we are using pre-computed weights and advise one uses already pruned set of individuals (see LDAK protocol), we provide rest two steps here -i) create kinship matrix; ii) Run REML.

THRESHOLD GREML
The threshold GRM approach uses two GRMs corresponding to one genetic component: a first GRM is the same as that created in GREML (without threshold) and a second GRM is created with a threshold by setting the off-diagonals that are <0.05 to 0. Here, we do not need to remove samples based on the GRM threshold. SNP-heritability attributable

of 31
Current Protocols to the first kinship matrix is same as the SNP-heritability estimated by GREML. Overall, the estimate represents pedigree-based heritability, and h 2 attributable to second GRM (h 2 Ped − h 2 SNP ) represents h 2 attributable to shared environment. Frist, a GRM is created using commands in the GREML protocol (except, removing one of the cryptically related individuals), and then the following steps can be used to estimate SNP and pedigree-based heritability.

LD SCORE (LDSC) REGRESSION
LDSC allows h 2 SNP , the SNP heritability, to be directly estimated from the summary results by regressing the observed χ 2 test statistic against LD score of genome-wide SNPs. Estimation of h 2 SNP using LDSC can be broken down into four simple steps: i) installing the program, ii) obtaining the summary results from the study in question, iii) formatting summary results for use in LDSC, and iv) running the program to estimate common SNP heritability.

Installation and activation.
LDSC can be installed from the resource provided earlier using following command: git clone https://github.com/bulik/ldsc.git; LDSC is a python package and an Anaconda environment (environment.yml) present in the original package must be created before using LDSC. It installs a list python dependency for LDSC: conda env create --file environment.yml; Before running LDSC an Anaconda environment is installed as above and must be activated as below:

of 31
Current Protocols source activate ldsc 2. Download summary results.
The next step is to download the summary results that can be downloaded from the resource provided above directly or using command line via wget.
LDSC accepts a specific format of summary statistics with six columns-a unique identifier (rs id), allele 1 (effect allele), allele 2 (other allele), sample size, p-value and a signed summary-statistics (effect, odds ratio, log odds ratio, Z score). Sometimes sample size is not provided in the summary results. In that case, a uniform sample size can be provided by using a flag --N [sample size]. In the case of unsigned effects, LDSC assumes allele 1 to be a risk increasing/positively associated allele and processes summary result accordingly. Although summary results can be formatted manually, LDSC recommends using the python script munge_sumstats.py provided in the original package because it checks for several things besides converting summary result to LDSC format. In addition, it is recommended to use SNPs from summary results that are common in the HapMap3 dataset, particularly if the summary result is obtained from imputed data. To estimate heritability attributable to common variants present in summary result, χ 2 values from the output of above command (sumstats-ldsc.gz) is regressed on the ld scores (sum of r 2 values for a SNP with surrounding SNPs in a predefined window) calculated in a reference population such as the 1000 Genomes Project or UK Biobank. LD scores can be downloaded from the link provided in the resource. Assuming the GWAS included European population, LD scores should be used from European population, for example eur_w_ld_chr. In addition to LD scores, LDSC requires a regression weight file that includes r 2 values for the SNPs used in the regression, i.e., GWAS SNPs. Generally, LDSC is not very sensitive to regression weights. Therefore, it is currently recommended to use the same LD scores for both flags. For partitioned h 2 estimation, one may choose a subset of GWAS SNPs to calculate LD scores using 1000 Genomes data separately, and use them as regression weight. If the original GWAS already controlled for population stratification and cryptic relatedness, the intercept can be constrained by adding a flag --intercept-h2 [threshold] or --no-intercept which constrains the intercept to 1.

SUMHER
SumHer is integrated into LDAK software; therefore, no extra software needs to be installed. Unlike LDSC, one must modify summary results to SumHer-compatible format manually. A compatible summary stats file has 5 or 6 columns (column names are case sensitive) with core columns: 'Predictor', 'A1', 'A2', 'n'; then, there are three options to choose additional 1-2 columns. The last column could be 'Z', or last two columns could be 'Direction', 'Stat' or 'Direction', 'P'. Predictor should be in 'chr:position' format.

of 31
Current Protocols

ESTIMATION OF SNP-HERITABILITY USING INDIVIDUAL-LEVEL DATASET AND SUMMARY RESULTS
We compared eleven approaches for the estimation of SNP-heritability of height and BMI utilizing individual-level dataset (NFBC) and summary results from the GIANT consortium (Table 1). Using GREML, LDAK, and Threshold GREML approaches, we observed  that genome-wide variations explained 56.9%-61.8% and 25%-28.1% variance in height and BMI respectively, in NFBC (Fig. 2, Table 2). We also used stratified analysis such as stratified-GREML and stratified-LDAK to estimate SNP-heritability attributable to different MAF and LD bins in NFBC. The sum of the heritability attributable to different bins was consistent with the results using single GRM ( Fig. 2; Table 2). Comparison of the results from stratified-GREML (GREML-LDMS-R and GREML-LDMS-I) and stratified-LDAK (LDAK-Thin-MS, LDAK-MS) showed that the variance attributable to different bins based on MAF and LD scores were similar in both stratified-GREML and stratified-LDAK approaches ( Fig. 3; Table 3). Likewise, variances attributable to different MAF bins in GREML-MS were similar to those in GREML-LDMS-R and GREML-LDMS-I ( Fig. 3; Table 3). As reported previously Speed & Balding, 2019;Yang et al., 2017) Table 2). The behavior of SumHer as compared to LDSC is examined in detail elsewhere (Speed & Balding, 2019).

CONCLUSION AND FUTURE DIRECTION
Heritability has been widely used to improve the quality of crops and farm animals, to understand the genetic basis of complex human traits and diseases, and to estimate the response of evolutionary forces such as selection in a population. However, the utility of heritability has been limited to a certain extent, mainly due to lack of appropriate data types and heritability models. Over the past decade, several approaches fitting a variety of analytical models have been developed to estimate SNP-heritability in unrelated individuals. In the current review, we provide an overview of these approaches along with step-by-step protocol to run widely used approaches for SNP-heritability estimation.
Despite advances in heritability models and availability of genome-wide SNP information in large datasets, estimates of h 2 SNP are one third to two thirds of the heritability estimated through family-based approaches. With the availability of whole genome sequence (WGS) information in large data sets, population-based approaches can be used to estimate the variance attributable to all variants (including rare variants), which should bring estimates from the two approaches closer. Such datasets also demand development of integrative models that can fit other types of genetic variations such as structural and copy number variants into the LMM. Inclusion of rare variants and other types of variants should not only resolve the problem of incomplete tagging of causal variants, but also fill the gap of missing heritability. However, selection of appropriate heritability models for different data types and their precision still present an ongoing debate (Speed et al., 2020;Zhu & Zhou, 2020). Therefore, a consensus needs to be reached regarding heritability models for different data types. Previously, some efforts were made to estimate SNP-heritability utilizing identity-by-descent information inferred from genome-wide data (Browning & Browning, 2013;Evans, Tahmasbi, Jones, et al., 2018). In near future, we can expect new and better approaches that can estimate unbiased h 2 SNP from identityby-descent information inferred from WGS data. Likewise, methods using pedigree data     LD_Q1-4 represent quartiles one to four based on regional or SNP LD scores. p-values were calculated using one sided z test.
to estimate narrow-sense heritability should account for shared environmental effects and assortative mating. Moreover, new methods to estimate heritability free of assumptions about the relationship between effect size and minor allele frequency are also required in the near future.

ACKNOWLEDGMENTS
This work was supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health under Award Number R01HD101669, the Burroughs Wellcome Fund (Grant 10172896), and the March of Dimes Prematurity Research Center Ohio Collaborative.