Volume 69, Issue 3

Idiot's Bayes—Not So Stupid After All?

David J. Hand

Department of Mathematics, Imperial College, London, UK. E‐mail: d.j.hand@ic.ac.uk

First published: 21 May 2007
Citations: 234

Summary

Folklore has it that a very simple supervised classification rule, based on the typically false assumption that the predictor variables are independent, can be highly effective, and often more effective than sophisticated rules. We examine the evidence for this, both empirical, as observed in real data applications, and theoretical, summarising explanations for why this simple rule might be effective.
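As a rough illustration of the kind of rule the summary refers to, the sketch below scores each class by its prior multiplied by per-variable conditional probabilities, treating the predictors as independent within each class. It is a minimal sketch, not the paper's own formulation: it assumes categorical predictors and add-one (Laplace) smoothing, and the names `fit_naive_bayes`, `predict`, and `alpha` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class, per-variable value frequencies.

    alpha is an assumed smoothing constant (add-one smoothing by default).
    """
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, variable index) -> value counts
    feature_values = defaultdict(set)     # variable index -> set of observed values
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(label, j)][value] += 1
            feature_values[j].add(value)
    return class_counts, value_counts, feature_values, alpha


def predict(model, row):
    """Return the class with the largest log prior plus summed log conditionals."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for j, value in enumerate(row):
            count = value_counts[(label, j)][value]
            n_values = len(feature_values[j])
            # Independence assumption: each variable contributes its own factor.
            score += math.log((count + alpha) / (n_c + alpha * n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids numerical underflow when many variables are multiplied together, and the smoothing keeps a single unseen value from forcing a class probability to zero.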

Résumé (translated from the French)

Tradition has it that a very simple rule assuming the independence of the predictor variables, an assumption that is false in most cases, can be very effective, and often even more effective than a more sophisticated method, at assigning classes to a group of objects. We examine the empirical evidence on this question as well as the theoretical evidence, that is, the reasons why this simple rule might help the classification process.
