Review of proteomics with applications to genetic epidemiology



Mapping of the human genome has the potential to transform the traditional methods of genetic epidemiology. The complete draft sequence of the 3.3 billion nucleotides comprising the genome is now available over the Internet, including the location and nearly complete sequence of the 26,000 to 31,000 protein-encoding genes. However, aside from water, almost everything in the human body is either made of, or by, proteins. Although the DNA code provides the instructions for their amino acid sequence, there are an estimated 1.5 million proteins, far more than the number of genes that encode them. Thus, the correlation between DNA sequence and protein is low, reflecting alternative splicing as well as post-translational modification. The purpose of this article is to explore ways in which the emerging field of proteomics, the study of proteins in a cell, may inform our approach to gene mapping. We review the technical approaches currently available for proteomics. Technologies are available to quantify protein expression (and compare normal versus disease states), identify proteins through comparison with sequence information in databases or direct sequencing (which can then be mapped to chromosomal locations to ensure appropriate markers), elucidate protein-protein interactions (which may underlie disease), determine localization of proteins within the cell (abnormal trafficking of proteins could have an inherited basis), and characterize modifications of proteins (which is relevant to modifier gene candidates). Several examples are presented to illustrate the potential application of proteomics to the field of genetic epidemiology, and we conclude with considerations regarding design and analysis. Genet Epidemiol 24:83–98, 2003. © 2003 Wiley-Liss, Inc.