Novel head and neck cancer survival analysis approach: Random survival forests versus Cox proportional hazards regression


  • Conflict of interest: none.



Electronic patient files generate an enormous amount of medical data. These data can be used for research, such as prognostic modeling. Automating statistical prognostication allows models to be updated automatically as new data are gathered, and the resulting increase in statistical power makes an automated model's predictions more reliable. Cox proportional hazards regression is the technique most frequently used in prognostication. A Cox model can be automated, but we expect the updating process to be time-consuming. A possible solution lies in an alternative modeling technique called random survival forests (RSFs). RSF is easily automated and is known to handle the proportional hazards assumption coherently and automatically. The performance of RSF has not yet been tested on a large head and neck oncological dataset. This study investigates the performance of RSF models in predicting overall survival in head and neck cancer and compares it with that of a Cox model as the “gold standard.” If performance is similar, RSF might be an attractive alternative modeling approach for automation.


RSF and Cox models were created in R (the Cox model also in SPSS). Four RSF splitting rules were used: log-rank, conservation of events, log-rank score, and log-rank approximation. Models were based on historical data of 1371 patients with primary head and neck cancer diagnosed between 1981 and 1998. The models contain 8 covariates: tumor site, T classification, N classification, M classification, age, sex, prior malignancies, and comorbidity. Model performance was assessed by Harrell's concordance error rate, with 33% of the original data serving as a validation sample.
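To make the performance measure concrete, the following is a minimal Python sketch of Harrell's concordance error rate, taken here as 1 minus the concordance index C. It is an illustration only, not the R code used in the study: the function name `harrell_error_rate` and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped rather than handled as a full implementation would.

```python
def harrell_error_rate(times, events, risk):
    """Return 1 - C, where C is Harrell's concordance index.

    times  : observed follow-up times
    events : 1 if death was observed, 0 if the patient was censored
    risk   : predicted risk scores (higher = worse predicted survival)

    Simplifying assumption: pairs with tied follow-up times are skipped.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i had an observed event
            # strictly before patient j's last follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # shorter survivor had higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs: cannot compute concordance")
    return 1.0 - concordant / usable


# Hypothetical validation sample of five patients (not study data).
toy_times = [5, 8, 12, 20, 25]
toy_events = [1, 0, 1, 1, 0]
toy_risk = [0.9, 0.4, 0.7, 0.3, 0.1]

print(harrell_error_rate(toy_times, toy_events, toy_risk))  # 0.0
```

An error rate of 0 means every usable pair is ranked correctly, 0.5 corresponds to random ranking, and 1 to a perfectly reversed ranking. In a hold-out design such as the one described above, the risk scores would come from a model fitted on the training portion and the error rate would be computed on the validation portion only.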


RSF and Cox models delivered similar error rates. The Cox model performed slightly better (error rate, 0.2826). Among the RSF models, the log-rank splitting rule gave the best performance (error rate, 0.2873). The Cox and RSF models agreed that high T classification, high N classification, and severe comorbidity are very important covariates, whereas sex, mild comorbidity, and a supraglottic larynx tumor are less important. A discrepancy arose regarding the importance of M1 classification (see Discussion).
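Assuming the error rate is defined as 1 minus Harrell's concordance index C, as is conventional for RSF, the reported error rates can be restated as concordance indices. The short conversion below is a reader's aid and not an additional result of the study:

```latex
\begin{align*}
C_{\text{Cox}} &= 1 - 0.2826 = 0.7174,\\
C_{\text{RSF, log-rank}} &= 1 - 0.2873 = 0.7127,\\
C_{\text{Cox}} - C_{\text{RSF, log-rank}} &= 0.0047.
\end{align*}
```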


Both approaches delivered similar error rates. The Cox model gives a clinically understandable output on covariate impact, whereas RSF is more of a “black box.” RSF complements the Cox model by providing additional insight into, and confidence in, the relative importance of model covariates. RSF can be recommended as the approach of choice in automating survival analyses. © 2011 Wiley Periodicals, Inc. Head Neck, 2012