Improving lung cancer prognosis assessment by incorporating synthetic minority oversampling technique and score fusion method

Authors

  • Yan Shiju,

    1. School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China and School of Electrical and Computer Engineering, University of Oklahoma, Norman, Oklahoma 73019
  • Qian Wei,

    1. Department of Electrical and Computer Engineering, University of Texas, El Paso, Texas 79968 and Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang 110819, China
  • Guan Yubao,

    1. Department of Radiology, Guangzhou Medical University, Guangzhou 510182, China
  • Zheng Bin

    1. School of Electrical and Computer Engineering, University of Oklahoma, Norman, Oklahoma 73019

Abstract

Purpose:

This study aims to investigate the potential to improve lung cancer recurrence risk prediction performance for stage I NSCLC patients by integrating oversampling, feature selection, and score fusion techniques, and to develop an optimal prediction model.

Methods:

A dataset involving 94 early stage lung cancer patients was retrospectively assembled, which includes CT images, nine clinical and biological (CB) markers, and outcome of 3-yr disease-free survival (DFS) after surgery. Among the 94 patients, 74 remained disease-free and 20 had cancer recurrence. Applying a computer-aided detection scheme, tumors were segmented from the CT images and 35 quantitative image (QI) features were initially computed. Two normalized Gaussian radial basis function network (RBFN) based classifiers were built, one using QI features and the other using CB markers. To improve prediction performance, the authors applied a synthetic minority oversampling technique (SMOTE) and a BestFirst based feature selection method to optimize the classifiers, and also tested fusion methods to combine the QI and CB based prediction results.
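The oversampling step can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic minority case is placed at a random point on the line segment between a real minority case and one of its nearest minority neighbors. This is not the authors' implementation. The function name `smote`, the neighbor count `k=5`, the two-feature toy data, and the choice to add 54 synthetic cases (so that 20 recurrence cases would match 74 disease-free cases) are all assumptions made for illustration.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    Hypothetical helper: `minority` is a list of feature vectors (lists of
    floats) from the minority class; at least two samples are required.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # Pick one of the k nearest neighbours and interpolate toward it.
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # fraction of the way from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data only: 20 made-up minority cases with 2 features each.
data_rng = random.Random(1)
toy_minority = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(20)]

# Assumed target: balance 20 minority cases against 74 majority cases.
synthetic = smote(toy_minority, n_synthetic=54)
balanced_minority = toy_minority + synthetic
```

In a cross-validated setting, oversampling of this kind would normally be applied to the training cases only, so that no synthetic case derived from the held-out case leaks into training.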

Results:

Using a leave-one-case-out cross-validation method (i.e., K-fold cross-validation with K equal to the number of cases), the computed areas under the receiver operating characteristic curve (AUCs) were 0.716 ± 0.071 and 0.642 ± 0.061 when using the QI and CB based classifiers, respectively. By fusion of the scores generated by the two classifiers, AUC significantly increased to 0.859 ± 0.052 (p < 0.05) with an overall prediction accuracy of 89.4%.
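How score fusion can raise AUC is easiest to see on a small example. The sketch below assumes a weighted-average fusion rule, which is only one of several possible rules and is not stated in the abstract to be the one the authors selected. The helper names `auc` and `fuse`, the weight `w_qi`, and all scores and labels are hypothetical toy values, not study data.

```python
def auc(scores, labels):
    """Empirical AUC: the fraction of (positive, negative) pairs in which the
    positive case has the higher score, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def fuse(qi_scores, cb_scores, w_qi=0.5):
    """Weighted average of the two classifiers' scores (assumed fusion rule)."""
    return [w_qi * q + (1.0 - w_qi) * c for q, c in zip(qi_scores, cb_scores)]


# Toy example: 1 = recurrence, 0 = disease-free. In practice each score would
# come from the classifier trained with that case held out.
labels = [1, 1, 1, 0, 0, 0]
qi_scores = [0.9, 0.4, 0.7, 0.5, 0.2, 0.3]
cb_scores = [0.3, 0.8, 0.6, 0.2, 0.6, 0.1]

fused_scores = fuse(qi_scores, cb_scores)
auc_qi = auc(qi_scores, labels)
auc_cb = auc(cb_scores, labels)
auc_fused = auc(fused_scores, labels)
```

In this toy case each classifier misranks different cases, so averaging their scores separates the two groups better than either score alone. Fusion helps in this way only when the two classifiers make partly independent errors.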

Conclusions:

This study demonstrated the feasibility of improving prediction performance by integrating SMOTE, feature selection, and score fusion techniques. Combining QI features and CB markers and performing SMOTE prior to feature selection in classifier training enabled the RBFN based classifier to yield improved prediction accuracy.
