Identification of Partially Linear Structure in Additive Models with an Application to Gene Expression Prediction from Sequences

Authors

  • Heng Lian,

    Corresponding author
    1. Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
  • Xin Chen,

    1. Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
  • Jian-Yi Yang

    1. Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore

email: henglian@ntu.edu.sg

Abstract

Summary: The additive model is a semiparametric class of models that has become extremely popular because it is more flexible than the linear model and can be fitted to high-dimensional data when fully nonparametric models become infeasible. We consider the problem of simultaneous variable selection and parametric component identification using spline approximation aided by two smoothly clipped absolute deviation (SCAD) penalties. The advantage of our approach is that one can automatically choose between additive models, partially linear additive models, and linear models in a single estimation step. Simulation studies are used to illustrate our method, and we also present its applications to motif regression.
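For readers unfamiliar with the SCAD penalty named above, the following is a minimal sketch of the standard SCAD penalty function in its usual textbook form. It is not taken from the paper: the function name `scad_penalty`, the default `a = 3.7` (a commonly used value), and the scalar interface are assumptions made for illustration. The abstract does not spell out which quantities the two penalties are applied to, so the sketch shows only the penalty itself.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty evaluated at a scalar theta.

    Illustrative sketch only: the name, signature, and default a=3.7
    are assumptions, not the authors' implementation.
    """
    if lam < 0 or a <= 2:
        raise ValueError("SCAD requires lam >= 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Near zero the penalty is linear, like the lasso.
        return lam * t
    if t <= a * lam:
        # Quadratic transition that gradually relaxes the penalty.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large values receive a constant penalty, so they are not shrunk further.
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    # Evaluate the penalty on a small grid to see the three regimes.
    for value in (0.0, 0.5, 1.0, 2.0, 3.7, 10.0):
        print(value, scad_penalty(value, lam=1.0))
```

The flat tail is what distinguishes SCAD from the lasso: large quantities incur no additional penalty as they grow, while small ones can still be shrunk exactly to zero.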
