Using bootstrapping to identify protein locations

Authors

  • Catherine Blake

    1. Center for Informatics Research in Science and Scholarship (CIRSS), Graduate School of Library & Information Science, University of Illinois at Urbana-Champaign, 501 E Daniel St, Champaign, Illinois 61820-6211
  • Wu Zheng

    1. Center for Informatics Research in Science and Scholarship (CIRSS), Graduate School of Library & Information Science, University of Illinois at Urbana-Champaign, 501 E Daniel St, Champaign, Illinois 61820-6211

Abstract

Automated methods that leverage both large quantities of text and knowledge resources are increasingly being used to identify relations from the web. One challenge with these approaches is that the sheer quantity of relations identified makes it difficult to evaluate system performance. Our first goal in this paper is to demonstrate how bootstrapping can be used to identify the subcellular locations of proteins, information that is critically important to biologists seeking to understand protein function. Specifically, we use protein-location pairs in the UniProt knowledge base and dependency paths from a collection of text to infer new proteins, new locations, and new protein-location pairs. Our second goal is to conduct a detailed manual analysis of the first iteration of the bootstrapping process. This analysis reveals pitfalls of the approach and enables us to conclude with specific recommendations to improve bootstrapping performance in subsequent iterations.
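
As a rough illustration of the kind of bootstrapping iteration described above, the following Python sketch treats each sentence as a (protein, dependency path, location) triple. It first collects the paths that connect known seed pairs, then uses those paths to propose new pairs, proteins, and locations. The function name `bootstrap_once`, the `min_support` threshold, the path strings, and the toy data are all assumptions made for illustration; they are not the authors' implementation and do not reflect actual UniProt content.

```python
from collections import Counter


def bootstrap_once(seed_pairs, triples, min_support=1):
    """Run one bootstrapping iteration over (protein, path, location) triples.

    seed_pairs: set of known (protein, location) pairs (hypothetical seeds).
    triples: iterable of (protein, dependency_path, location) tuples.
    min_support: assumed threshold on how many seed pairs a path must link.
    """
    triples = list(triples)

    # Step 1: count the dependency paths that link a known seed pair.
    path_support = Counter(
        path
        for protein, path, location in triples
        if (protein, location) in seed_pairs
    )
    trusted_paths = {p for p, n in path_support.items() if n >= min_support}

    # Step 2: any unseen pair linked by a trusted path becomes a candidate.
    new_pairs = {
        (protein, location)
        for protein, path, location in triples
        if path in trusted_paths and (protein, location) not in seed_pairs
    }

    # Step 3: separate out entities that never appeared in the seeds.
    known_proteins = {p for p, _ in seed_pairs}
    known_locations = {loc for _, loc in seed_pairs}
    new_proteins = {p for p, _ in new_pairs} - known_proteins
    new_locations = {loc for _, loc in new_pairs} - known_locations
    return trusted_paths, new_pairs, new_proteins, new_locations


# Toy data, invented purely for illustration.
seeds = {("histone H3", "nucleus")}
triples = [
    ("histone H3", "nsubj<-localizes->prep_to", "nucleus"),
    ("cytochrome c", "nsubj<-localizes->prep_to", "mitochondrion"),
    ("actin", "nsubj<-binds->dobj", "myosin"),
]
paths, pairs, proteins, locations = bootstrap_once(seeds, triples)
```

In this sketch the candidates from one iteration could be added to the seed set before the next iteration, which is also where errors can compound; that is one reason a manual analysis of the first iteration is informative.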
