Methods for categorizing a prognostic variable in a multivariable setting

Authors

  • Madhu Mazumdar,

    Corresponding author
    1. Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021, U.S.A.
    Correspondence to: Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, Box 44, New York, New York 10021, U.S.A.
  • Alex Smith,

    1. Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021, U.S.A.
  • Jennifer Bacik

    1. Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021, U.S.A.

Abstract

The literature is filled with examples of categorization of a continuous prognostic variable in a univariable setting followed by the addition of this categorical variable to an existing multivariable model. Typically, an ‘optimal’ cutpoint for a new prognostic variable is obtained through a systematic search relating the variable to the outcome in a univariable manner. The corresponding categorical variable is then fitted in a multivariable model along with other already established prognostic covariates to assess the additional value of the new variable. This raises the question of whether the cutpoint search should have been performed in the same multivariable setting where the variable will ultimately be used. In this paper, we extend the univariable cutpoint search methods (the split-sample approach and the two-fold cross-validation approach) to the multivariable setting, using the −2 × log-likelihood statistic as the correlative measure. A Monte Carlo simulation study demonstrates that both methods are more efficient in detecting the true cutpoint and in estimating the effect size under the multivariable setting than under the univariable setting. The cross-validation method performs better than the split-sample method in univariable as well as multivariable scenarios. For the cross-validation method in the multivariable setting, there is still a substantial loss of power when a cutpoint model is used in cases where the relationship between the covariate and the outcome is continuous. An example is provided to illustrate the value of the multivariable cross-validation approach. Copyright © 2003 John Wiley & Sons, Ltd.
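
As a rough illustration of the multivariable cutpoint search described in the abstract, the sketch below runs a two-fold cross-validated search on simulated data. It rests on several assumptions that do not come from the abstract: a binary outcome modeled by logistic regression (the abstract does not name the outcome model), a grid of candidate cutpoints taken from sample quantiles, and one common reading of two-fold cross-validation in which the cutpoint found in one half classifies the patients in the other half. The names `fit_logistic`, `best_cutpoint` and `two_fold_cv_indicator` are hypothetical helpers, not the authors' code.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Newton-Raphson logistic fit; returns (coefficients, -2 * log-likelihood)."""
    X = np.column_stack([np.ones(len(y)), X])      # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)       # guard against overflow
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return beta, -2.0 * loglik

def best_cutpoint(x, Z, y, candidates):
    """Candidate c minimizing -2 log-likelihood of the model y ~ I(x > c) + Z."""
    scores = [
        fit_logistic(np.column_stack([(x > c).astype(float), Z]), y)[1]
        for c in candidates
    ]
    return candidates[int(np.argmin(scores))]

def two_fold_cv_indicator(x, Z, y, rng, quantiles=np.linspace(0.1, 0.9, 17)):
    """Classify each half of the data with the cutpoint found in the other half."""
    fold = rng.permutation(len(y)) % 2             # random, near-equal halves
    group = np.empty(len(y))
    cuts = []
    for k in (0, 1):
        train, test = fold != k, fold == k
        candidates = np.quantile(x[train], quantiles)
        c = best_cutpoint(x[train], Z[train], y[train], candidates)
        group[test] = x[test] > c                  # held-out high/low label
        cuts.append(c)
    return group, cuts

# Simulated data (assumed design): one established covariate z and a new
# marker x that is correlated with z and acts through a true cutpoint at 0.
rng = np.random.default_rng(0)
n = 600
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)
eta = -0.5 + 1.5 * (x > 0.0) + 1.0 * z
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
Z = z[:, None]

group, cuts = two_fold_cv_indicator(x, Z, y, rng)
beta, deviance = fit_logistic(np.column_stack([group, Z]), y)
print("cutpoints chosen in each half:", cuts)
print("adjusted log odds ratio for the cross-validated indicator:", beta[1])
```

The established covariate `Z` enters every model fitted during the search, so the cutpoint is chosen in the same multivariable setting in which the indicator is finally assessed. Passing an array with zero columns as `Z` to `best_cutpoint` would give the univariable search for comparison.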
