Regression models, scan statistics and reappearance probabilities to detect regions of association between gene expression and copy number

Authors

  • Jennifer L. Asimit,

    1. Samuel Lunenfeld Research Institute of Mount Sinai Hospital, University of Toronto, Toronto, ON, Canada
    Current affiliation:
    1. Wellcome Trust Sanger Institute, Hinxton, Cambridge, U.K.
  • Irene L. Andrulis,

    1. Samuel Lunenfeld Research Institute of Mount Sinai Hospital, University of Toronto, Toronto, ON, Canada
  • Shelley B. Bull

    Corresponding author
    1. Samuel Lunenfeld Research Institute of Mount Sinai Hospital and Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
    • Samuel Lunenfeld Research Institute of Mount Sinai Hospital, University of Toronto, 60 Murray Street, Box #18, Prosserman Centre for Health Research, Toronto, ON, Canada M5T 3L9.

Abstract

Early studies of breast cancer microarray data used linear models to quantify the relationship between measures of gene expression (GE) and copy number (CN) obtained from tumour samples. Motivated by a study of women with axillary node-negative breast cancer, we propose a regression-based scan statistic to identify within-chromosome clusters of genetic probes that exhibit association between GE and CN, while accounting for tumour characteristics known to be prognostic for clinical outcome. To measure the association between GE and CN, we regress GE on CN separately for each genetic probe on the microarray, including subject-specific covariates in the model. In developing the scan statistic, we approximate the within-chromosome spatial distribution of the probes with a statistically significant association by a Poisson process. Because it incorporates the distances between probe positions, the scan statistic accounts for the spatial nature of CN alterations. Regions identified as clusters of significant associations are hypothesized to harbour genes involved in breast cancer progression. Using simulations, we examine the sensitivity of the method to certain factors. To address repeatability, we consider reappearance probabilities for each probe within detected regions and assess the utility of a quantity estimated from bootstrap sample frequencies. Applying the proposed method to the joint analysis of GE and CN in breast tumours, with and without an informative covariate, and comparing it with alternative methods suggest that including covariates and using a regional test statistic can refine the regions selected for further investigation, including analysis of their association with outcome.

Copyright © 2011 John Wiley & Sons, Ltd.
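The core idea of a scan statistic over significant probes can be illustrated with a minimal sketch. It assumes the per-probe GE-on-CN regressions have already been run and that the chromosomal positions of the probes declared significant are available as a list. The function names `scan_statistic` and `mc_pvalue`, the window length `w`, and the use of Monte Carlo simulation are hypothetical illustrative choices, not the authors' published statistic: where the abstract describes a Poisson-process approximation, this sketch instead simulates the null. It relies on the fact that, conditional on the number of events, a homogeneous Poisson process places them uniformly and independently on the chromosome.

```python
import bisect
import random


def scan_statistic(positions, w):
    """Largest number of significant probes falling in any window [x, x + w].

    `positions` (hypothetical input) are chromosomal positions of probes whose
    GE~CN association was declared significant. Returns (count, window_start).
    """
    pts = sorted(positions)
    best, best_start = 0, None
    for i, x in enumerate(pts):
        # A maximal window can always be taken to start at an observed point,
        # so only windows beginning at each sorted position need checking.
        j = bisect.bisect_right(pts, x + w)
        if j - i > best:
            best, best_start = j - i, x
    return best, best_start


def mc_pvalue(observed, n_points, chrom_length, w, n_sim=999, seed=1):
    """Monte Carlo p-value for the scan statistic (an assumed stand-in).

    Null model: a homogeneous Poisson process on [0, chrom_length]; given
    n_points events, their positions are i.i.d. uniform on that interval.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        sim = [rng.uniform(0, chrom_length) for _ in range(n_points)]
        if scan_statistic(sim, w)[0] >= observed:
            exceed += 1
    # Add one to numerator and denominator so the observed data count as a draw.
    return (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical positions (arbitrary units) of significant probes.
    sig = [1.0, 1.5, 2.0, 2.2, 10.0, 30.0, 55.0]
    count, start = scan_statistic(sig, w=2.0)
    print(count, start)  # 4 probes in the window starting at 1.0
    print(mc_pvalue(count, len(sig), chrom_length=100.0, w=2.0))
```

A small p-value suggests that the significant probes cluster more tightly than a homogeneous Poisson process would predict, which is the kind of region the abstract proposes to follow up. Simulation is used here only because it is easy to verify; an analytic Poisson approximation would avoid the cost of repeated draws.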
