Volume 40, Issue 3
Research Article

Smooth‐Threshold Multivariate Genetic Prediction with Unbiased Model Selection

Masao Ueki

Corresponding Author

Biostatistics Center, Kurume University, Kurume, Fukuoka, Japan

Correspondence to: Masao Ueki, Biostatistics Center, Kurume University, 67 Asahi‐Machi, Kurume, Fukuoka 830‐0011, Japan. E‐mail: uekimrsd@nifty.com.
Gen Tamiya

Tohoku Medical Megabank Organization, Tohoku University, Aoba‐Ku, Sendai, Miyagi, Japan

for the Alzheimer's Disease Neuroimaging Initiative

Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp‐content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

First published: 06 March 2016
Citations: 7

ABSTRACT

We develop a new genetic prediction method, smooth‐threshold multivariate genetic prediction, that uses single nucleotide polymorphism (SNP) data from genome‐wide association studies (GWASs). The method has two stages. In the first stage, unlike the discontinuous SNP screening used in the gene score method, it screens SNPs continuously, based on the output of a standard univariate analysis of each SNP's marginal association. In the second stage, the predictive model is built by a generalized ridge regression that uses the screened SNPs simultaneously, with each SNP's weight determined by the strength of its marginal association. Continuous SNP screening by smooth thresholding not only stabilizes prediction but also yields a closed‐form expression for the generalized degrees of freedom (GDF). The GDF in turn gives Stein's unbiased risk estimate (SURE), which enables a data‐dependent choice of the optimal SNP screening cutoff without cross‐validation. The method is fast because the computationally expensive genome‐wide scan is required only once, in contrast to penalized regression methods such as the lasso and elastic net. Simulation studies that mimic real GWAS data with quantitative and binary traits show that the proposed method outperforms the gene score method and genomic best linear unbiased prediction (GBLUP), and performs comparably to, or sometimes better than, the lasso and elastic net, which are known to have good predictive ability but at heavy computational cost. Application to whole‐genome sequencing (WGS) data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) shows that the proposed method has higher predictive power than the gene score and GBLUP methods.
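The two‐stage idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation, and the abstract does not give their formulas. The weight function `1 - cutoff / z**2`, the penalty `n * (1 - w) / w`, the simulated genotypes, and the assumption that the noise variance `sigma2` is known are all hypothetical choices made for illustration. In particular, the GDF is approximated here by finite differences of the fitted values, following its definition as the sum of the sensitivities of each fitted value to its own response, rather than by the closed‐form expression the paper derives.

```python
import numpy as np

def standardize(X):
    """Center each SNP column and scale it to unit (population) variance."""
    Xc = X - X.mean(axis=0)
    return Xc / Xc.std(axis=0)

def fit_predict(Xs, y, cutoff):
    """Stage 1: continuous screening; stage 2: generalized ridge fit."""
    n = len(y)
    yc = y - y.mean()
    # Marginal statistic per SNP: sqrt(n) times the Pearson correlation.
    z = np.sqrt(n) * (Xs.T @ yc) / (n * yc.std())
    # Hypothetical smooth-threshold weight in [0, 1]; 0 means screened out.
    w = np.clip(1.0 - cutoff / np.maximum(z**2, 1e-12), 0.0, 1.0)
    keep = w > 0
    if not keep.any():
        return np.full(n, y.mean())
    Xk = Xs[:, keep]
    # SNP-specific ridge penalty (assumed form): it grows without bound as
    # w -> 0, so a coefficient shrinks to zero continuously at the cutoff.
    penalty = n * (1.0 - w[keep]) / w[keep]
    beta = np.linalg.solve(Xk.T @ Xk + np.diag(penalty), Xk.T @ yc)
    return y.mean() + Xk @ beta

def gdf_numeric(Xs, y, cutoff, eps=1e-4):
    """Finite-difference estimate of sum_i d(yhat_i)/d(y_i)."""
    base = fit_predict(Xs, y, cutoff)
    total = 0.0
    for i in range(len(y)):
        y_pert = y.copy()
        y_pert[i] += eps
        total += (fit_predict(Xs, y_pert, cutoff)[i] - base[i]) / eps
    return total

def sure(Xs, y, cutoff, sigma2):
    """SURE: RSS - n*sigma2 + 2*sigma2*GDF, with sigma2 assumed known."""
    resid = y - fit_predict(Xs, y, cutoff)
    return resid @ resid - len(y) * sigma2 + 2.0 * sigma2 * gdf_numeric(Xs, y, cutoff)

# Toy data (simulated, not ADNI): 60 subjects, 30 SNPs, 3 causal SNPs.
rng = np.random.default_rng(0)
n, p = 60, 30
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
Xs = standardize(X)
beta_true = np.zeros(p)
beta_true[:3] = 0.5
sigma2 = 1.0
y = Xs @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Choose the screening cutoff by minimizing SURE, with no cross-validation.
cutoffs = [0.5, 1.0, 2.0, 4.0, 8.0]
scores = [sure(Xs, y, c, sigma2) for c in cutoffs]
best_cutoff = cutoffs[int(np.argmin(scores))]
print("cutoff chosen by SURE:", best_cutoff)
```

Because the weights vary continuously with the marginal statistics, the fitted values are a continuous function of the response, which is what makes a SURE‐type risk estimate applicable. A hard include/exclude rule would make the fitted values jump as a SNP crosses the cutoff. The finite‐difference loop refits the model once per subject, so it is only practical for toy problems; a closed‐form GDF avoids that cost.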

