Application of model selection technique in chemogenomic data analysis



With the advent of high-throughput chemogenomic data, it has become crucial to identify, among a huge number of candidate genes, those that influence drug activity. Using a model selection technique designed specifically for high-dimensional data, we propose a systematic approach to constructing a network that elucidates the dependency relationships among drugs and genes. Using the extended Bayesian Information Criterion, we select the best parsimonious network structure. A real data set from the National Cancer Institute (NCI)-60 panel is analyzed to demonstrate the utility of the method, and the biological implications of the results are discussed. Copyright © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2: 186–191, 2009
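
For readers unfamiliar with the extended Bayesian Information Criterion, the following is a minimal sketch of its general form as it is usually stated in the model selection literature: EBIC = -2 log L + k log n + 2γ log C(p, k), where n is the sample size, p the number of candidate covariates, k the number selected, and γ ≥ 0 a tuning constant. The abstract does not say how the criterion is applied to the drug-gene network, so the function name `ebic`, its parameters, the default `gamma=0.5`, and the choice to count one parameter per selected covariate are all illustrative assumptions rather than the authors' implementation.

```python
import math


def ebic(log_likelihood, n_samples, n_selected, n_candidates, gamma=0.5):
    """Extended BIC for a model with n_selected of n_candidates covariates.

    Hypothetical helper for illustration; assumes one free parameter
    per selected covariate.
    """
    # log of the binomial coefficient C(p, k), computed via log-gamma
    # so that it stays finite when p is in the thousands.
    log_binom = (
        math.lgamma(n_candidates + 1)
        - math.lgamma(n_selected + 1)
        - math.lgamma(n_candidates - n_selected + 1)
    )
    return (
        -2.0 * log_likelihood                # goodness of fit
        + n_selected * math.log(n_samples)   # ordinary BIC penalty
        + 2.0 * gamma * log_binom            # extra penalty for model space size
    )


# Made-up numbers, for illustration only: two candidate models fitted to
# 60 samples, chosen from 1000 candidate genes.
small_model = ebic(log_likelihood=-100.0, n_samples=60, n_selected=3, n_candidates=1000)
large_model = ebic(log_likelihood=-95.0, n_samples=60, n_selected=10, n_candidates=1000)
preferred = "small" if small_model < large_model else "large"
```

Setting `gamma=0` recovers the ordinary BIC. A larger `gamma` penalizes large models more heavily, which is why the criterion suits settings where the number of candidate genes far exceeds the number of samples.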