Volume 47, Issue 2
ORIGINAL ARTICLE

Outlier detection in contingency tables using decomposable graphical models

Mads Lindskou

Corresponding Author

E-mail address: mads@math.aau.dk

Department of Mathematical Sciences, Aalborg University, Aalborg, Denmark

Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark

Mads Lindskou, Department of Mathematical Sciences, Aalborg University, 9200 Aalborg, Denmark.

Poul Svante Eriksen

Department of Mathematical Sciences, Aalborg University, Aalborg, Denmark

Torben Tvedebrink

Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark

First published: 16 August 2019
Citations: 1

Abstract

For high‐dimensional data, detecting anomalies such as outliers is a tedious task. We present a novel outlier detection method for high‐dimensional contingency tables. We use the class of decomposable graphical models to model the relationships among the variables of interest; these relationships can be depicted by an undirected graph called the interaction graph. Given an interaction graph, we derive a closed‐form expression for the likelihood ratio test (LRT) statistic and an exact distribution for efficient simulation of the test statistic. An observation is declared an outlier if it deviates significantly from the approximated distribution of the test statistic under the null hypothesis. We demonstrate the use of the LRT outlier detection framework on genetic data modeled by Chow–Liu trees.
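The following is a minimal, stdlib-only Python sketch of the idea. It is not the authors' method or the molic implementation, and it rests on three labeled assumptions. First, the alternative hypothesis gives the new cell y its own unrestricted distribution, so y has likelihood 1 under H1. Second, the interaction graph is a connected Chow–Liu tree, so its cliques are the edges and its separators are the vertices, each repeated deg(v) − 1 times. Third, the null distribution is approximated by forward-sampling y from the fitted model, which is a simple stand-in for the paper's exact simulation scheme.

Under these assumptions, maximizing the decomposable-model likelihood with and without y gives

-2 log λ(y) = -2 [ Σ_C H(n_C(y_C)) − Σ_S H(n_S(y_S)) − (|C| − |S|) H(n) ],

where H(m) = (m + 1) log(m + 1) − m log m, and n_A(y_A) counts the n observations that agree with y on the variables in A. All function names below (`chow_liu_edges`, `lrt_statistic`, `sample_tree`, `outlier_pvalue`) are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    cij = Counter((row[i], row[j]) for row in data)
    ci = Counter(row[i] for row in data)
    cj = Counter(row[j] for row in data)
    return sum(c / n * math.log(c * n / (ci[a] * cj[b])) for (a, b), c in cij.items())


def chow_liu_edges(data, p):
    """Chow-Liu tree: maximum-weight spanning tree on pairwise MI (Kruskal + union-find)."""
    parent = list(range(p))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    weighted = sorted(
        ((mutual_information(data, i, j), i, j) for i, j in combinations(range(p), 2)),
        reverse=True,
    )
    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # keep the edge only if it joins two components
            parent[ri] = rj
            edges.append((i, j))
    return edges


def H(m):
    """(m + 1) log(m + 1) - m log m, using the convention 0 log 0 = 0."""
    return (m + 1) * math.log(m + 1) - (m * math.log(m) if m > 0 else 0.0)


def lrt_statistic(y, cliques, separators, counts, n):
    """-2 log LR for cell y, given marginal counts from n observations."""
    s = sum(H(counts[C][tuple(y[v] for v in C)]) for C in cliques)
    s -= sum(H(counts[S][tuple(y[v] for v in S)]) for S in separators)
    s -= (len(cliques) - len(separators)) * H(n)
    return -2.0 * s


def sample_tree(data, edges, p, rng):
    """Forward-sample one cell from the fitted tree model, rooted at variable 0."""
    adj = {v: [] for v in range(p)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    y = [None] * p
    y[0] = rng.choice(data)[0]  # draw from the empirical marginal of the root
    stack, seen = [0], {0}
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                # A uniform row with row[u] == y[u] gives x_v w.p. n(y_u, x_v) / n(y_u)
                y[v] = rng.choice([row[v] for row in data if row[u] == y[u]])
                seen.add(v)
                stack.append(v)
    return tuple(y)


def outlier_pvalue(data, y, n_sim=500, seed=1):
    """Observed statistic and Monte Carlo p-value for y (requires p >= 2 variables)."""
    p = len(y)
    edges = chow_liu_edges(data, p)
    cliques = list(edges)  # in a tree, the cliques are the edges
    deg = Counter(v for e in edges for v in e)
    separators = [(v,) for v in range(p) for _ in range(deg[v] - 1)]
    counts = {A: Counter(tuple(row[v] for v in A) for row in data)
              for A in set(cliques) | set(separators)}
    n = len(data)
    rng = random.Random(seed)
    t_obs = lrt_statistic(y, cliques, separators, counts, n)
    sims = [lrt_statistic(sample_tree(data, edges, p, rng), cliques, separators, counts, n)
            for _ in range(n_sim)]
    return t_obs, (1 + sum(t >= t_obs for t in sims)) / (n_sim + 1)
```

As a usage example, take `data = [(0, 0, 0)] * 50 + [(1, 1, 1)] * 50`, in which the three variables are perfectly dependent.

* The cell `(0, 1, 0)` breaks that dependence. It gets the statistic 2H(100) ≈ 11.22 and the smallest attainable p-value, 1/501.
* The cell `(0, 0, 0)` gets a p-value of 1.0.

A cell identical to every observation would give a statistic of exactly 0, since every marginal count then equals n.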

Number of times cited according to CrossRef: 1

  • molic: An R package for multivariate outlier detection in contingency tables, Journal of Open Source Software, 10.21105/joss.01665, 4, 42, (1665), (2019).