Volume 63, Issue 4

Semiparametric Regression of Multidimensional Genetic Pathway Data: Least‐Squares Kernel Machines and Linear Mixed Models

Dawei Liu

Corresponding Author

Center for Statistical Sciences, Brown University, Providence, Rhode Island 02912, U.S.A

email:daweiliu@stat.brown.edu

Xihong Lin

Corresponding Author

Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts 02115, U.S.A

email:xlin@hsph.harvard.edu

Debashis Ghosh

Corresponding Author

Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A

email:ghoshd@umich.edu

First published: 01 May 2007
Citations: 173

Abstract

Summary: We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least‐squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function, and it accommodates the possibility that each gene expression effect is nonlinear and that genes within the same pathway interact with each other in a complicated way. This semiparametric model also makes it possible to test for the overall genetic pathway effect. We show that the LSKM semiparametric regression can be formulated using a linear mixed model. Estimation and inference can hence proceed within the linear mixed model framework using standard mixed model software. Both the regression coefficients of the covariate effects and the LSKM estimator of the genetic pathway effect can be obtained using the best linear unbiased predictor in the corresponding linear mixed model formulation. The smoothing parameter and the kernel parameter can be estimated as variance components using restricted maximum likelihood. A score test is developed to test for the genetic pathway effect. Model/variable selection within the LSKM framework is discussed. The methods are illustrated using a prostate cancer data set and evaluated using simulations.
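The mixed model formulation described in the summary can be illustrated with a minimal NumPy sketch. It assumes the working model y = Xβ + h + e with h ~ N(0, τK) and e ~ N(0, σ²I), a Gaussian kernel K with scale parameter ρ, and variance components that are simply fixed by hand rather than estimated by restricted maximum likelihood as the article proposes. The function names `gaussian_kernel` and `lskm_blup`, the toy data, and the chosen values of `tau`, `sigma2`, and `rho` are illustrative assumptions, not taken from the article.

```python
import numpy as np


def gaussian_kernel(Z, rho):
    """Gaussian kernel matrix with K[i, j] = exp(-||z_i - z_j||^2 / rho)."""
    sq_norms = np.sum(Z ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / rho)


def lskm_blup(y, X, K, tau, sigma2):
    """BLUP of (beta, h) for y = X beta + h + e,
    with h ~ N(0, tau * K) and e ~ N(0, sigma2 * I)."""
    n = len(y)
    V = sigma2 * np.eye(n) + tau * K          # marginal covariance of y
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    # Generalized least-squares estimate of the covariate effects.
    beta_hat = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
    # Predicted pathway effect at the observed gene expression profiles.
    h_hat = tau * K @ np.linalg.solve(V, y - X @ beta_hat)
    return beta_hat, h_hat


# Toy data (hypothetical): intercept plus one covariate, five "genes".
rng = np.random.default_rng(0)
n, p = 60, 5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, p))
h_true = np.sin(Z[:, 0]) * Z[:, 1]            # nonlinear effect with interaction
y = X @ np.array([1.0, 2.0]) + h_true + rng.normal(scale=0.5, size=n)

# Variance components and kernel scale are fixed here for illustration only.
tau, sigma2, rho = 1.0, 0.25, float(p)
K = gaussian_kernel(Z, rho)
beta_hat, h_hat = lskm_blup(y, X, K, tau, sigma2)
```

With the smoothing parameter taken as λ = σ²/τ, the predicted pathway effect equals K(K + λI)⁻¹(y − Xβ̂), which is the kernel ridge form of a penalized least-squares fit. In practice τ, σ², and ρ would be estimated, for example by maximizing the restricted likelihood, rather than set by hand as in this sketch.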

