Evaluating classification accuracy for modern learning approaches
Abstract
Deep learning neural network models such as the multilayer perceptron (MLP) and the convolutional neural network (CNN) are novel and attractive artificial intelligence computing tools. However, guidance on evaluating the performance of these methods is not yet readily available to practitioners. We provide a tutorial on evaluating classification accuracy for various state‐of‐the‐art learning approaches, including familiar shallow and deep learning methods. For qualitative response variables with more than two categories, many traditional accuracy measures, such as sensitivity, specificity, and the area under the receiver operating characteristic curve, are not directly applicable and must be properly extended. In this paper, we review a few important statistical concepts for multicategory classification accuracy and demonstrate their utility for various learning algorithms with real medical examples. We offer problem‐based R code to illustrate how to perform these statistical computations step by step. We expect that such analysis tools will become more familiar to practitioners and find broader application in biostatistics.