Block principal component analysis with application to gene microarray data classification

Authors

  • Aiyi Liu,

    Corresponding author
    1. Biostatistics Unit, Lombardi Cancer Center, Georgetown University Medical Center, 3800 Reservoir Road, NW, Washington, DC 20007, U.S.A.
    • Biostatistics Unit, Lombardi Cancer Center S112, Georgetown University Medical Center, 3800 Reservoir Road, Washington, DC 20007, U.S.A.
  • Ying Zhang,

    1. Biostatistics Unit, Lombardi Cancer Center, Georgetown University Medical Center, 3800 Reservoir Road, NW, Washington, DC 20007, U.S.A.
  • Edmund Gehan,

    1. Biostatistics Unit, Lombardi Cancer Center, Georgetown University Medical Center, 3800 Reservoir Road, NW, Washington, DC 20007, U.S.A.
  • Robert Clarke

    1. Department of Oncology, Lombardi Cancer Center, Georgetown University Medical Center, Washington, DC 20007, U.S.A.

Abstract

We propose a block principal component analysis method for extracting information from a database with a large number of variables and a relatively small number of subjects, such as a microarray gene expression database. The procedure is computationally simple, and both theory and numerical results show it to be as efficient as ordinary principal component analysis when used for dimension reduction, variable selection, data visualization and classification. The method is illustrated with the well-known National Cancer Institute gene microarray expression database of 60 human cancer cell lines (NCI60), in the context of classifying cancer cell lines. Copyright © 2002 John Wiley & Sons, Ltd.
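
The abstract does not spell out the steps of the procedure, so the following is only a minimal sketch of one plausible reading of "block" principal component analysis: split the variables into blocks, run an ordinary PCA within each block, keep a few leading components per block, and run a final PCA on the pooled block components. The function names (`pca_scores`, `block_pca`), the use of contiguous column blocks, the parameter values and the synthetic data are all assumptions made for illustration; they are not taken from the paper, and the data are not the NCI60 database.

```python
import numpy as np


def pca_scores(X, n_components):
    """Leading principal component scores of X (rows = subjects)."""
    Xc = X - X.mean(axis=0)                      # center each variable
    # Thin SVD of the centered data; U * S gives the PC scores.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def block_pca(X, block_size, per_block, n_final):
    """Hypothetical two-stage PCA: within blocks, then on pooled scores.

    Blocks are contiguous column ranges here, which is an assumption;
    other groupings of the variables are equally possible.
    """
    n, p = X.shape
    stage_one = []
    for start in range(0, p, block_size):
        block = X[:, start:start + block_size]
        # Cannot keep more components than variables or n - 1.
        k = min(per_block, block.shape[1], n - 1)
        stage_one.append(pca_scores(block, k))
    pooled = np.hstack(stage_one)                # n x (total kept components)
    return pca_scores(pooled, min(n_final, pooled.shape[1]))


# Synthetic "many variables, few subjects" data (not real expression data).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 1000))                  # 60 subjects, 1000 variables
scores = block_pca(X, block_size=100, per_block=3, n_final=2)
```

Each within-block decomposition works on a 60 × 100 matrix rather than the full 60 × 1000 matrix, which is one way the computational simplicity mentioned in the abstract could arise; the final `scores` array has one row per subject and could be plotted or passed to a classifier.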
