On the Usage of HWE for Identifying Genotyping Errors


*Corresponding author: E-mail: teo@well.ox.ac.uk

A recent publication by Zou & Donner (2006) discussed the merits of testing Hardy-Weinberg Equilibrium (HWE) in the setting of unmatched case-control association studies. Specifically, the authors highlighted that testing for HWE should not be used as a criterion for identifying single nucleotide polymorphisms (SNPs) with genotyping errors. The authors went on to discuss the potential problems associated with a two-stage analysis procedure, which incorporates checks for adherence to HWE prior to performing the association analyses. The authors supported their assertions through numerical derivations, and two applications to real data from Sun et al. (2004) and Zorzetto et al. (2002).

While we believe that the authors have raised valid concerns, and we share their concerns with regard to the naïve usage of HWE as a tool for searching for disease genes, we wish to highlight that, although minor genotyping errors are unlikely to be detected by testing for HWE, gross genotyping error can and often does cause extreme deviation from HWE (Hosking et al. 2004; Leal, 2005; Cox & Kraft, 2006). This is especially true in genomewide studies, where the number of markers genotyped can exceed 300,000 SNP sites and genotypes are assigned through automated procedures that analyse hybridization intensities.

A number of genomewide association studies are currently underway for various diseases, and these studies typically make use of pre-designed high density oligonucleotide array-based chips on various platforms (for example, the Affymetrix GeneChip Human Mapping 500K Array Set (Affymetrix Inc. 2006a) or the Illumina Sentrix HumanHap550 Genotyping BeadChip (Illumina Inc. 2006)). Due to the massive scale of the genotyping, genotype assignment on these platforms typically relies on unsupervised and automated calling procedures, which may be prone to making erroneous calls. Such erroneous calls have been shown to have the potential to cause confounding in an association study (Clayton et al. 2005).

We use a simple example from a set of real genomewide data to illustrate how genotyping errors can cause extreme deviations from HWE. The data are from the first phase of a study investigating genetic variants associated with high density lipoprotein cholesterol concentration in the blood, and consist of 298 male subjects sampled from the Chinese population in Singapore. These subjects (as with the remainder of the subjects in the study) were genotyped on the Nsp array of the Affymetrix 500K Array Set, and the genotypes were called using the BRLMM algorithm by Affymetrix (Affymetrix Inc. 2006b). While the sampling of the participants was subject to certain phenotypic criteria, and thus they were not expected to represent samples from a randomly mating population, we emphasize that the issue here is the possibility of extreme deviations from HWE due to genotyping errors. Using only data from chromosomes 18, 19, 20, 21 and 22 (cumulatively consisting of 23,136 SNPs), we used the standard Pearson chi-square test to evaluate the HWE assumption and plotted the quantile-quantile plot of the test statistics (Figure 1a). We identified 411 SNPs that each had a test statistic > 11 (corresponding to a significance of ∼0.001), and visually inspected the cluster plots of these SNPs (Figure 2). Upon visual inspection, 345 of the 411 SNPs had elevated test statistics because of obvious genotyping errors, and 132 of these were clearly due to homozygote-to-heterozygote miscalls. Such miscalls significantly affect the assessment of HWE (Kang et al. 2004). Upon removing the miscalled SNPs, the quantile-quantile plot of the test statistics collapsed towards the null distribution, except for a handful of SNPs which we subsequently noted were in strong LD with each other (Figure 1b). Interestingly, the removed SNPs were not in high LD (defined by r² > 0.5) with each other, nor with the remaining 66 SNPs.
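The Pearson chi-square test of HWE described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `hwe_chi_square` and the genotype counts are hypothetical, chosen only to show how calling an entire homozygote cluster as heterozygotes inflates the statistic in a sample of 298 subjects. The p-value uses the identity that, for one degree of freedom, the chi-square upper tail equals erfc(√(x/2)).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic (1 df) and p-value for HWE.

    Arguments are the observed counts of the AA, AB and BB genotypes.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")

    # Allele frequencies estimated from the observed genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # Monomorphic marker: HWE cannot be violated, so report no deviation.
        return 0.0, 1.0

    # Genotype counts expected under HWE: n*p^2, 2*n*p*q, n*q^2.
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts for 298 subjects (not taken from the study data).
clean_stat, clean_p = hwe_chi_square(150, 120, 28)
# Same SNP if the whole BB cluster were miscalled as AB.
miscalled_stat, miscalled_p = hwe_chi_square(150, 148, 0)
print(f"clean:     chi2 = {clean_stat:.2f}, p = {clean_p:.3g}")
print(f"miscalled: chi2 = {miscalled_stat:.2f}, p = {miscalled_p:.3g}")
```

With these illustrative counts, the correctly called SNP falls well below the threshold of 11 used above, whereas the miscalled version exceeds it.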

Figure 1.

Quantile-quantile plots of Pearson chi-square test statistics for testing Hardy-Weinberg Equilibrium on 298 subjects for (a) all 23,136 SNPs on chromosomes 18, 19, 20, 21 and 22 from the Nsp array of the Affymetrix 500K Array Set; (b) the remaining 22,791 SNPs after removing 345 SNPs with genotyping errors. Lines with zero intercepts and unit gradients are included, and data under the null of HWE are expected to lie along these lines. SNPs in each of the two obviously deviating clusters in (b) have been ascertained to be in high LD (r² > 0.5) with each other.
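For readers wishing to reproduce a plot of this kind, the expected null quantiles can be obtained as in the sketch below. It assumes the common plotting positions (i + 0.5)/m, which may differ from those used for Figure 1, and the function names are hypothetical. The quantile of a chi-square distribution with one degree of freedom is computed as the square of a standard normal quantile.

```python
from statistics import NormalDist


def chi2_1df_quantile(prob):
    """Quantile of the chi-square distribution with 1 df, for 0 < prob < 1.

    If Z is standard normal, Z^2 is chi-square with 1 df, so the quantile
    is the square of the normal quantile at (1 + prob) / 2.
    """
    z = NormalDist().inv_cdf((1.0 + prob) / 2.0)
    return z * z


def qq_points(statistics_observed):
    """Pair expected null quantiles with the sorted observed statistics.

    Plotting positions (i + 0.5) / m are an assumption of this sketch.
    """
    observed = sorted(statistics_observed)
    m = len(observed)
    expected = [chi2_1df_quantile((i + 0.5) / m) for i in range(m)]
    return list(zip(expected, observed))


# Illustrative statistics only; under the null the points should lie
# close to the line with zero intercept and unit gradient.
for exp_q, obs_q in qq_points([0.02, 0.45, 1.3, 3.1, 32.5]):
    print(f"expected {exp_q:.3f}  observed {obs_q:.3f}")
```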

Figure 2.

Cluster plots of SNP rs5746679 on chromosome 22. The x-axis (and y-axis) represents the processed signal intensities which effectively measure the extent of hybridization for the designated A allele (and B allele). Every dot represents the signal for this SNP from one individual. Clusters are coloured according to the assigned genotypes: red – AA, green – AB, blue – BB, black – No call. No call genotypes are assigned when the metric assessing the confidence of the assigned genotypes fails to satisfy a user-determined threshold. (a) shows the output from the automated genotype calling procedure which, due to genotyping error, resulted in gross departures from HWE. (b) shows the output from manually curated genotypes for the same SNP, which supports the null of HWE. The difference between (a) and (b) is indicative of a homozygote-heterozygote miscall of an entire genotype cluster.

Automatic elimination of markers from association analysis based solely on a HWE threshold is imprudent, particularly if further assessment is not performed. Such an approach risks removal of potentially interesting SNPs, while some markers with obvious miscalling could be rescued (Figure 2). Manual inspection of HWE outliers can become impractical when handling large datasets. In this situation, one strategy is to set a less stringent HWE filter (e.g. flagging only SNPs with a significance below 10⁻⁵), which would highlight few markers, even in a dataset of 500,000 SNPs, but is still likely to flag extreme genotyping errors. Alternatively, combining a stringent HWE threshold with further automated assessment of SNPs could help investigators focus on problem markers. Such strategies, combined with streamlined methods for visualizing cluster plots, mean that it is practical and possible to check and maximise the performance of the current genomewide genotyping platforms. Also, if flagged SNPs could be grouped, on the basis of SNPs in high LD within each group, one could theoretically reduce the number of markers requiring visual inspection by only examining a few in each set. Unfortunately, the design of genomewide genotyping platforms may undermine this approach – the Affymetrix 500K Array Set has uneven coverage of SNPs with regard to LD structure, while chips designed based on tagging approaches (e.g. the Illumina Sentrix HumanHap550 Genotyping BeadChip) intentionally genotype markers with weak inter-marker LD.
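The grouping of flagged SNPs by LD suggested above could be sketched as follows. This is one possible approach under stated assumptions, not a procedure described in the text: genotypes are coded as 0, 1 or 2 copies of the B allele with `None` for no calls, r² is approximated by the squared Pearson correlation of these codes, and the greedy grouping rule, the function names and the SNP labels are all hypothetical.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between two genotype vectors.

    Genotypes are coded 0/1/2 (copies of the B allele); None marks a no call.
    Only individuals called at both SNPs are used.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


def group_by_ld(genotypes, threshold=0.5):
    """Greedily group SNPs by LD with the first member of each group.

    `genotypes` maps a SNP identifier to its genotype vector. A SNP joins
    the first existing group whose representative has r2 > threshold with
    it; otherwise it starts a new group.
    """
    groups = []
    for snp_id, calls in genotypes.items():
        for group in groups:
            if genotype_r2(genotypes[group[0]], calls) > threshold:
                group.append(snp_id)
                break
        else:
            groups.append([snp_id])
    return groups


# Hypothetical flagged SNPs genotyped in six individuals.
flagged = {
    "snp_a": [0, 1, 2, 0, 1, 2],
    "snp_b": [0, 1, 2, 0, 1, None],
    "snp_c": [0, 0, 1, 2, 2, 1],
}
for group in group_by_ld(flagged):
    print("inspect", group[0], "as representative of", group)
```

Only the first SNP of each group would then need visual inspection of its cluster plot, subject to the caveat about array design noted above.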

In summary, contrary to the assertions of Zou & Donner, testing for departure from HWE provides a very useful indicator for identifying gross genotyping errors. Given the advent of genomewide association studies which rely on genotypes called using automated and unsupervised procedures on one or more of the large-scale oligonucleotide genotyping arrays, it is advisable that tests of gross deviation from HWE be incorporated to flag SNPs that warrant further assessment, to ensure that the signals are not caused by erroneous genotype calls.