The Value of Statistical or Bioinformatics Annotation for Rare Variant Association With Quantitative Trait
Article first published online: 8 JUL 2013
© 2013 WILEY PERIODICALS, INC.
Volume 37, Issue 7, pages 666–674, November 2013
How to Cite
Byrnes, A. E., Wu, M. C., Wright, F. A., Li, M. and Li, Y. (2013), The Value of Statistical or Bioinformatics Annotation for Rare Variant Association With Quantitative Trait. Genet. Epidemiol., 37: 666–674. doi: 10.1002/gepi.21747
- Issue published online: 15 OCT 2013
- Manuscript Accepted: 3 JUN 2013
- Manuscript Revised: 20 MAY 2013
- Manuscript Received: 25 MAR 2013
Keywords: rare variants; variable selection; variant annotation
In the past few years, a plethora of methods for testing rare variant association with phenotype have been proposed. These methods aggregate information from multiple rare variants across genomic region(s), but there is little consensus as to which method is most effective. The weighting scheme adopted when aggregating information across variants is one of the primary determinants of effectiveness. Here we present a systematic evaluation of multiple weighting schemes through a series of simulations intended to mimic large sequencing studies of a quantitative trait. We evaluate existing phenotype-independent and phenotype-dependent methods, as well as weights estimated by penalized regression approaches including Lasso, Elastic Net, and SCAD. We find that the difference in power between phenotype-dependent schemes is negligible when high-quality functional annotations are available. When functional annotations are unavailable or incomplete, all methods suffer from power loss; however, the variable selection methods outperform the others at the cost of increased computational time. Therefore, in the absence of good annotation, we recommend applying variable selection methods (which can be viewed as “statistical annotation”) on top of regions implicated by a phenotype-independent weighting scheme. Further, once a region is implicated, variable selection can help to identify potential causal single-nucleotide polymorphisms for biological validation. These findings are supported by an analysis of a high-coverage targeted sequencing study of 1,898 individuals.
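To make the idea of a phenotype-independent weighting scheme concrete, the following minimal Python sketch aggregates rare variants in a region into a single weighted burden score per individual and regresses a quantitative trait on that score. It is an illustration under stated assumptions, not the procedure evaluated in the article: the weight 1/sqrt(p(1-p)) based on estimated allele frequency, the add-one smoothing, the function names (`maf_weights`, `burden_scores`, `slope_t_statistic`), and the simulated data are all choices made here for illustration.

```python
import math
import random


def maf_weights(genotypes):
    """Phenotype-independent weights 1/sqrt(p(1-p)), one per variant.

    genotypes: list of rows (individuals), each a list of minor-allele
    counts (0, 1, or 2) per variant. The weight formula is an assumption
    for this sketch, chosen so that rarer variants are up-weighted.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    weights = []
    for j in range(m):
        # Add-one smoothing keeps p strictly between 0 and 1.
        p = (sum(row[j] for row in genotypes) + 1) / (2 * n + 2)
        weights.append(1.0 / math.sqrt(p * (1.0 - p)))
    return weights


def burden_scores(genotypes, weights):
    """Weighted sum of minor-allele counts for each individual."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def slope_t_statistic(x, y):
    """t statistic for the slope in a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    rss = sum((b - mean_y - beta * (a - mean_x)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta / se


def demo(seed=1, n=1000, m=10, maf=0.01, n_causal=3, effect=1.0):
    """Simulate a toy region (hypothetical parameters) and test it."""
    rng = random.Random(seed)
    # Each genotype is the sum of two independent allele draws.
    genotypes = [
        [(rng.random() < maf) + (rng.random() < maf) for _ in range(m)]
        for _ in range(n)
    ]
    # Only the first n_causal variants affect the trait.
    trait = [
        effect * sum(row[:n_causal]) + rng.gauss(0.0, 1.0) for row in genotypes
    ]
    scores = burden_scores(genotypes, maf_weights(genotypes))
    return slope_t_statistic(scores, trait)


if __name__ == "__main__":
    print("burden test t statistic:", demo())
```

Because the weights here depend only on allele frequency, non-causal variants in the region dilute the burden score. Replacing the weights with coefficients estimated by a penalized regression of the trait on the individual variants would correspond to the "statistical annotation" idea described above.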