Time course analysis of microarray data for the pathway of reproductive development in female rainbow trout



New statistical procedures are introduced to investigate gene activity in support of the hypothalamus–pituitary–gonad–liver signaling network that provides the neuroendocrine regulation for reproduction in female, oviparous fishes. The methods include Shrunken Centroid Ordering by Orthogonal Projections (SCOOP) and a robust encoding of B-splines via Friedman's Generalized Elastic Net (GEN). SCOOP orders genes according to a novel criterion that balances discriminatory and correlative information. It is particularly useful in the present context where genes are only partially annotated, relevant networks are not known a priori, and the sample size is adequate for finding natural clusters. In this application, microarray measurements of gene expression were made in the pituitary, liver, and ovary of female rainbow trout (Oncorhynchus mykiss) over the course of their 1-year spawning cycle, and new methods were developed to detect systematic changes in potential gene networks. B-splines were fitted to gently smooth the estimates of expression versus time and provide a common framework for analysis. Unlike other methods, SCOOP selected not only the genes whose curves vary the most over time, but also genes closely correlated with these. This tended to recognize genes that may be part of an active network, but whose expressions undergo more modest fluctuations. To compensate for the high degree of uncertainty in fitting B-splines to individual genes, GEN methods were used to provide robust fits, and these were summarized through a novel GEN-transform as variable importance measures (VIMs). Clustering of genes via VIMs produced much more stable results than directly via B-spline coefficients, and the mean time course pattern of each cluster provided biologists with a reliable summary from which to interpret systematic patterns.
Ultimately, the genes selected by SCOOP and clustered through the GEN-transform strongly suggested supportive pathways involving immunology, muscle contraction, reproduction, protein transport, metabolism, and reduction/oxidation. Copyright © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2: 192–208, 2009