Multiple approaches to data-mining of proteomic data based on statistical and pattern classification methods



The data-mining challenge presented comprises two fundamental problems. The first is to separate forty-one subjects into two classes based on the mass spectrometry data obtained from each subject's protein samples. The second is to find the specific differences between the protein expression data of two sets of subjects. In each problem, one group of subjects has a disease, while the other group is nondiseased. Each problem was approached with the intent of introducing a new and potentially useful tool for analyzing protein expression from mass spectrometry data. A variety of methodologies, both conventional and nonconventional, were used in the analysis of these problems. The results presented show both overlap and discrepancies. What is important is the breadth of the techniques and the future directions this analysis suggests.