
Comparison between splines and fractional polynomials for multivariable model building with continuous covariates: a simulation study with continuous response

Authors

  • Harald Binder,

    Corresponding author
    1. Institute of Medical Biometry and Medical Informatics, University Medical Center Freiburg, 79104 Freiburg, Germany
    • Institute of Medical Biostatistics, Epidemiology and Informatics, University Medical Center of Johannes Gutenberg University Mainz, 55101 Mainz, Germany
  • Willi Sauerbrei,

    1. Institute of Medical Biometry and Medical Informatics, University Medical Center Freiburg, 79104 Freiburg, Germany
  • Patrick Royston

    1. Hub for Trials Methodology Research, MRC Clinical Trials Unit, London WC2B 6NH, U.K.

Correspondence to: Harald Binder, Institute of Medical Biostatistics, Epidemiology and Informatics, University Medical Center Johannes Gutenberg University Mainz, 55101 Mainz, Germany.

E-mail: binderh@uni-mainz.de

Abstract

In observational studies, many continuous or categorical covariates may be related to an outcome. Various spline-based procedures or the multivariable fractional polynomial (MFP) procedure can be used to identify important variables and functional forms for continuous covariates. This is the main aim of an explanatory model, as opposed to a model only for prediction. The type of analysis often guides the complexity of the final model. Spline-based procedures and MFP have tuning parameters for choosing the required complexity. To compare model selection approaches, we perform a simulation study in the linear regression context based on a data structure intended to reflect realistic biomedical data. We vary the sample size, variance explained and complexity parameters for model selection. We consider 15 variables. A sample size of 200 (1000) and R² = 0.2 (0.8) is the scenario with the smallest (largest) amount of information. For assessing performance, we consider prediction error, correct and incorrect inclusion of covariates, qualitative measures for judging selected functional forms and further novel criteria. From limited information, a suitable explanatory model cannot be obtained. Prediction performance from all types of models is similar. With a medium amount of information, MFP performs better than splines on several criteria. MFP better recovers simpler functions, whereas splines better recover more complex functions. For a large amount of information and no local structure, MFP and the spline procedures often select similar explanatory models. Copyright © 2012 John Wiley & Sons, Ltd.
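As a rough illustration of the kind of functional-form selection the abstract refers to, the sketch below fits a first-degree fractional polynomial (FP1) to a single simulated covariate by trying each power in the conventional set {-2, -1, -0.5, 0, 0.5, 1, 2, 3} (with 0 denoting log x) and keeping the one with the smallest residual sum of squares. It is not the MFP procedure or the simulation design of the study: the data-generating function, the covariate range, the seed, and the names `fp_transform` and `fit_fp1` are all assumptions made for this example. Only the sample size of 200 and the target R² of 0.8 are taken from the values quoted above.

```python
import numpy as np

# Conventional fractional polynomial power set; 0 stands for log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, p):
    """Return x**p, or log(x) when p == 0. Assumes x is strictly positive."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if p == 0 else x ** p


def fit_fp1(x, y):
    """Fit y = b0 + b1 * x^(p) by least squares for each candidate power p.

    Returns (best_power, coefficients, residual_sum_of_squares).
    """
    best = None
    for p in FP_POWERS:
        X = np.column_stack([np.ones(len(x)), fp_transform(x, p)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        if best is None or rss < best[2]:
            best = (p, beta, rss)
    return best


# Hypothetical data: one positive covariate with a log-shaped effect.
rng = np.random.default_rng(0)
n = 200                       # smaller sample size quoted in the abstract
r2_target = 0.8               # larger R^2 quoted in the abstract
x = rng.uniform(0.5, 5.0, size=n)
signal = np.log(x)

# Choose the noise level so that var(signal) / var(y) is roughly r2_target.
sigma = np.sqrt(np.var(signal) * (1 - r2_target) / r2_target)
y = signal + rng.normal(0.0, sigma, size=n)

power, beta, rss = fit_fp1(x, y)
print("selected power:", power)
print("coefficients:", beta)
```

With noisy data the selected power need not match the true generating function, since neighbouring powers such as 0 and 0.5 produce similar curves over a limited covariate range. Lowering `r2_target` or `n` makes this easier to see.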
