Use of an artificial neural network to detect association between a disease and multiple marker genotypes

Authors

  • D. CURTIS,

    Corresponding author
    1. Joint Academic Department of Psychological Medicine, St Bartholomew's and Royal London School of Medicine and Dentistry, 3rd Floor Alexandra Wing, Turner Street, London E1 1BB, UK
      D. Curtis. E-mail: dcurtis@hgmp.mrc.ac.uk
  • B. V. NORTH,

    1. Joint Academic Department of Psychological Medicine, St Bartholomew's and Royal London School of Medicine and Dentistry, 3rd Floor Alexandra Wing, Turner Street, London E1 1BB, UK
  • P. C. SHAM

    1. Department of Psychological Medicine, Institute of Psychiatry, De Crespigny Park, London SE5 8AF, UK

Abstract

Single nucleotide polymorphisms (SNPs) are very common throughout the genome and hence are potentially valuable for mapping disease susceptibility loci by detecting association between SNP markers and disease. However, because SNPs are biallelic, they may have relatively little power in association studies compared with the information that would be obtainable if marker haplotypes were available and could be used efficiently. Modelling the evolutionary events leading to linkage disequilibrium is very complex, and many methods that seek to use information from multiple markers simultaneously need to make simplifying assumptions and may be applicable only when marker haplotypes, rather than genotypes, are available for analysis. We explore the properties of a simple application of a standard artificial neural network to this problem. The pattern-recognition properties of the network are used in the hope that marker haplotypes implicit in the genotypes will differ between cases and controls in a way that allows the network to classify subjects correctly according to their marker genotypes. This method makes no assumptions at all regarding population history or the marker map, and can be applied to genotypes, as would be available from a simple case-control sample, without any need to determine haplotypes. Through application to data simulated under a very wide range of assumptions, we show that such an analysis produces a useful gain in power over that achieved by testing each marker individually, in particular when more than one mutation has occurred in a disease gene at different points in evolution. The application of neural networks to such problems shows considerable promise, and further work could usefully be directed towards optimising the design and implementation of such networks.
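
The general idea of classifying subjects from unphased multi-marker genotypes can be illustrated with a minimal sketch. This is not the network used in the study: the abstract does not specify the architecture, the genotype coding or the training procedure, so the one-hidden-layer layout, the 0/0.5/1 coding, the learning rate, the toy data and all names (`encode_genotypes`, `TinyNet`, `mean_loss`) are assumptions made purely for illustration.

```python
import math
import random


def encode_genotypes(genotypes):
    """Code each unphased genotype (a pair of alleles labelled 1 or 2)
    as the proportion of '2' alleles: 0.0, 0.5 or 1.0 (assumed coding)."""
    return [sum(1 for allele in g if allele == 2) / 2.0 for g in genotypes]


class TinyNet:
    """One hidden tanh layer and a sigmoid output (hypothetical layout)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        """Estimated probability that the subject is a case."""
        return self.forward(x)[1]

    def train_step(self, x, y, lr=0.5):
        """One stochastic gradient step on the cross-entropy loss."""
        h, p = self.forward(x)
        d_out = p - y  # gradient with respect to the output pre-activation
        for j, hj in enumerate(h):
            # Back-propagate through the old output weight before updating it.
            d_hid = d_out * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * d_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hid * xi
            self.b1[j] -= lr * d_hid
        self.b2 -= lr * d_out


def mean_loss(net, X, y):
    """Average cross-entropy of the network over a sample."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = net.predict(xi)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)


# Toy sample (made up, two markers per subject): cases carry more '2' alleles.
cases = [[(2, 2), (1, 2)], [(2, 2), (2, 2)]]
controls = [[(1, 1), (1, 1)], [(1, 2), (1, 1)]]
X = [encode_genotypes(subject) for subject in cases + controls]
y = [1, 1, 0, 0]

net = TinyNet(n_in=2, n_hidden=3)
loss_before = mean_loss(net, X, y)
for _ in range(500):
    for xi, yi in zip(X, y):
        net.train_step(xi, yi)
loss_after = mean_loss(net, X, y)
```

In a sketch like this, no haplotypes are reconstructed and no marker order is supplied; the network sees only per-marker genotype codes and the case-control label. Assessing whether the resulting classification reflects a real association would need a separate significance test, which is not shown here.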
