Estimating heritability using genomic data

Summary

  1. Heritability (h2) represents the potential for short-term response of a quantitative trait to selection. Unfortunately, estimating h2 through traditional crossing experiments is not practical for many species, and even for those in which mating can be manipulated, it may not be possible to assay them in ecologically relevant environments.
  2. We evaluated an approach, GCTA, that uses relatedness estimated from genomic data to estimate the proportion of phenotypic variance due to genotyped SNPs, which can be used to infer h2. Using phenotypic and genotypic data from eight replicates of experimentally grown plants of the annual legume Medicago truncatula, we examined how h2 estimates from GCTA (h2GCTA) related to traditional estimates of heritability (clonal repeatability for these inbred lines). Further, we examined how h2GCTA estimates were affected by SNP number, minor allele frequency, the number of individuals assayed and the exclusion of causative SNPs.
  3. We found that the average h2GCTA estimates for each trait made with the full data set (>5 million SNPs, 200 individuals) were strongly correlated (r = 0·99) with estimates of clonal repeatability. However, this result masks considerable variation among replicate estimates of h2GCTA, even in relatively uniform greenhouse conditions. h2GCTA estimates with 250 000 and 25 000 SNPs were very similar to those obtained with >5 million SNPs, but with 2500 SNPs, h2GCTA estimates were lower and had higher variance than those with ≥25 000 SNPs. h2GCTA estimates were slightly lower when only common SNPs were used. Excluding putatively causative SNPs had little effect on the estimates of h2GCTA, suggesting that genotyping putatively causative SNPs is not necessary to obtain accurate estimates of h2. The number of accessions sampled had the greatest effect on h2GCTA estimates, and variance greatly increased as fewer accessions were included. With only 50 accessions sampled, h2GCTA estimates ranged from 0 to 1 for all traits.
  4. These results indicate that the GCTA method may be useful for estimating h2 using data sets of the size available from reduced-representation genotyping, but that hundreds of individuals may need to be sampled to obtain robust estimates of h2.
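
The idea behind points 2 to 4 can be illustrated with a small, self-contained sketch. It builds a genomic relationship matrix from SNP allele counts and then estimates the proportion of phenotypic variance tagged by those SNPs. This is not the GCTA software and not the analysis reported here: GCTA fits a mixed model by REML, whereas the sketch substitutes the simpler Haseman-Elston regression, and the function names (grm, he_regression_h2, simulate) and all simulated values are hypothetical, chosen only for illustration.

```python
import random
from statistics import mean, pvariance


def grm(genotypes):
    """Genomic relationship matrix from an individuals x SNPs table of 0/1/2 allele counts."""
    n = len(genotypes)
    standardized = []
    for j in range(len(genotypes[0])):
        col = [genotypes[i][j] for i in range(n)]
        p = mean(col) / 2.0  # sample allele frequency
        if p <= 0.0 or p >= 1.0:
            continue  # skip monomorphic SNPs, which carry no information
        sd = (2.0 * p * (1.0 - p)) ** 0.5
        standardized.append([(x - 2.0 * p) / sd for x in col])
    if not standardized:
        raise ValueError("no polymorphic SNPs")
    k = len(standardized)
    # A[i][j] is the mean product of standardized genotypes across SNPs.
    return [[sum(c[i] * c[j] for c in standardized) / k for j in range(n)]
            for i in range(n)]


def he_regression_h2(A, phenotypes):
    """Haseman-Elston regression (a stand-in for REML, not GCTA's method).

    Regresses products of standardized phenotypes on off-diagonal relatedness;
    the slope estimates SNP heritability and is clamped to [0, 1] here.
    """
    n = len(phenotypes)
    mu = mean(phenotypes)
    sd = pvariance(phenotypes) ** 0.5
    z = [(y - mu) / sd for y in phenotypes]
    xs, ys = [], []
    for i in range(n):
        for j in range(i + 1, n):
            xs.append(A[i][j])
            ys.append(z[i] * z[j])
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return min(1.0, max(0.0, sxy / sxx))


def simulate(n, m, h2, seed=0):
    """Hypothetical toy data: n individuals, m SNPs, additive trait with 0 < h2 <= 1."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(m)]
    # Each allele count is the sum of two Bernoulli draws (0, 1 or 2).
    G = [[(rng.random() < p) + (rng.random() < p) for p in freqs] for _ in range(n)]
    effects = [rng.gauss(0.0, 1.0) for _ in range(m)]
    g = [sum(b * x for b, x in zip(effects, row)) for row in G]
    noise_sd = (pvariance(g) * (1.0 - h2) / h2) ** 0.5
    y = [gi + rng.gauss(0.0, noise_sd) for gi in g]
    return G, y


if __name__ == "__main__":
    G, y = simulate(n=60, m=200, h2=0.5, seed=1)
    print(he_regression_h2(grm(G), y))
```

Rerunning the example with different seeds, or with fewer individuals, gives a rough feel for how unstable such estimates become at small sample sizes, which is the pattern described in point 3. The toy simulation draws unrelated outbred genotypes, so it does not mimic the inbred Medicago truncatula accessions used in the study.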
