Volume 28, Issue 2

Modelling Heterogeneity With and Without the Dirichlet Process

First published: 21 December 2001
Citations: 101

Abstract

We investigate the relationships between Dirichlet process (DP) based models and allocation models for a variable number of components, based on exchangeable distributions. It is shown that the DP partition distribution is a limiting case of a Dirichlet–multinomial allocation model. Comparisons of posterior performance of DP and allocation models are made in the Bayesian paradigm and illustrated in the context of univariate mixture models. It is shown in particular that the unbalancedness of the allocation distribution, present in the prior DP model, persists a posteriori. Exploiting the model connections, a new MCMC sampler for general DP based models is introduced, which uses split/merge moves in a reversible jump framework. Performance of this new sampler relative to that of some traditional samplers for DP models is then explored.
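
The limiting relationship stated in the abstract can be checked numerically. The sketch below is not taken from the paper; it assumes a Dirichlet–multinomial allocation model with k components and symmetric Dirichlet(alpha/k, ..., alpha/k) weights, and compares its partition probability with the standard DP (Ewens) partition probability with concentration alpha. The function names and the alpha/k parameterisation are illustrative assumptions.

```python
from math import lgamma, log

def log_dma_partition_prob(sizes, k, alpha):
    """Log-probability of a partition with the given block sizes under a
    Dirichlet-multinomial allocation model with k components and
    symmetric Dirichlet(alpha/k) weights (assumed parameterisation)."""
    n, d = sum(sizes), len(sizes)
    if d > k:
        return float("-inf")  # cannot occupy more blocks than components
    delta = alpha / k
    # k!/(k-d)! ways to give the d occupied blocks distinct component labels
    log_labels = lgamma(k + 1) - lgamma(k - d + 1)
    log_norm = lgamma(alpha) - lgamma(alpha + n)
    log_blocks = sum(lgamma(delta + m) - lgamma(delta) for m in sizes)
    return log_labels + log_norm + log_blocks

def log_dp_partition_prob(sizes, alpha):
    """Log-probability of the same partition under the DP (Ewens formula):
    alpha^d * Gamma(alpha)/Gamma(alpha+n) * prod_j (n_j - 1)!"""
    n, d = sum(sizes), len(sizes)
    return (d * log(alpha) + lgamma(alpha) - lgamma(alpha + n)
            + sum(lgamma(m) for m in sizes))

def gap(sizes, k, alpha):
    """Absolute difference between the two log-probabilities."""
    return abs(log_dma_partition_prob(sizes, k, alpha)
               - log_dp_partition_prob(sizes, alpha))

if __name__ == "__main__":
    sizes, alpha = [3, 2, 1], 1.0  # hypothetical partition of 6 items
    for k in (10, 1000, 10**6):
        print(k, gap(sizes, k, alpha))
```

Under these assumptions the gap between the two log-probabilities shrinks as k grows, which is the sense in which the DP partition distribution arises as a limit of the finite allocation model.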
