ESTIMATING BAYESIAN NETWORKS FOR HIGH-DIMENSIONAL DATA WITH COMPLEX MEAN STRUCTURE AND RANDOM EFFECTS

  • Acknowledgments. The authors thank an editor and two anonymous reviewers for constructive comments and suggestions. They also thank Terry Speed for helpful advice on an earlier version of the paper.


Summary

The estimation of Bayesian networks from high-dimensional data, in particular gene expression data, has been the focus of much recent research. Whilst several methods are available for estimating such networks, they typically assume that the data consist of independent and identically distributed samples. Often, however, the available data have a more complex mean structure and additional components of variance, both of which must be accounted for when estimating a Bayesian network. In this paper, we propose score metrics that take account of such complexities, for use with score-based methods for estimating Bayesian networks. We propose two metrics: a fully Bayesian score metric, and a metric inspired by the notion of restricted maximum likelihood. We demonstrate the performance of these new metrics using simulated data with known complex mean structures. We then present an analysis of the expression levels of grape-berry genes, adjusting for exogenous variables believed to affect those expression levels. Demonstrable biological effects can be inferred from the estimated conditional independence relationships and correlations amongst the grape-berry genes.
