Developmental Dynamics: What was the motivation to launch your database?
Monte Westerfield: The first international meeting of the entire zebrafish research community was held at Cold Spring Harbor in 1994. In addition to discussion of science, the community identified two outstanding infrastructure needs: to establish a forum for information exchange and to establish a stock center. At that time, the World Wide Web was relatively new but held promise for online databases, so we started ZFIN as a Web-based resource that serves as the official zebrafish model organism database. Subsequently, we also established ZIRC, which serves as the zebrafish stock center.
Akhilesh Pandey: I feel that the promise of Systems Biology can only be realized if we have good building blocks, i.e., basic information about individual pieces before we start attempting a systems approach. If the building blocks do not exist in sufficient numbers or are defective, I do not think that we could move beyond proof-of-principle systems biology studies. In 2002, we were struck by the lack of crucial information about the building blocks of life, i.e., proteins, in publicly available databases and decided to do something to correct the situation. To accomplish this, however, I first founded a nonprofit research institute called the Institute of Bioinformatics in Bangalore, India, where most of the curation and database work for the HPRD is carried out.
Where does funding come from?
M.W.: ZFIN is funded primarily by the National Human Genome Research Institute of the National Institutes of Health with contributions from some of the other Institutes and Centers.
A.P.: The entire HPRD effort came from my conviction that such a resource was sorely needed. I did not have any funding from any agency to accomplish such a goal. Therefore, the Institute of Bioinformatics and consequently the HPRD were funded for the first 2 1/2 years from several personal credit card loans that I took out. Since then, it has been funded sporadically by the NCBI (Entrez-Gene and RefSeq pages link out to HPRD and also display information on protein–protein interactions and posttranslational modifications that is derived from HPRD).
What is the purpose of the database?
M.W.: The long-term goals for ZFIN are (a) to be the community database resource for the laboratory use of zebrafish; (b) to develop and support integrated zebrafish genetic, genomic, developmental, and physiological information; (c) to maintain the definitive reference data sets of zebrafish research information; (d) to link this information extensively to corresponding data in other model organism and human databases; (e) to facilitate the use of zebrafish as a model for human biology; and (f) to help serve broadly the needs of the biomedical research community.
A.P.: The purpose of HPRD was to provide carefully curated and accurate information from the published literature. The features of proteins that we focused on were those features that were essentially absent from other databases: protein–protein interactions, posttranslational modifications, subcellular localization, and tissue expression. Another goal of this project was to provide this information to the users in a simple manner.
What is the most used feature?
M.W.: Gene expression patterns are the most popular data, with mutant phenotypes running a close second. ZFIN has tens of thousands of images that illustrate gene expression and mutant phenotypes. In addition to using gene names, researchers can search these data using key words from ontologies that describe biological processes, cellular components, molecular functions, and zebrafish anatomy, including organs and cells.
A.P.: Protein–protein interactions are the most used features of HPRD.
How is HPRD different from other protein databases?
A.P.: Currently, HPRD is the largest database in terms of its coverage of protein–protein interactions, posttranslational modifications, subcellular localization, and tissue expression of human proteins. All of the features of this database can be queried in a simple manner. It also has a very nice graphical user interface that is lacking in most other databases.
What is the most important thing that someone who is not a zebrafish expert needs to know when using ZFIN?
M.W.: Non-zebrafishologists should know that ZFIN has a diverse, competent and highly professional staff of scientific curators and software developers who are available to help. All the curators were originally trained as researchers and can handle scientific questions. Also, because ZFIN needs to be responsive to the needs of the scientific community, nonzebrafish researchers should feel encouraged to contact ZFIN if they have questions or suggestions for changes or improvements. There is a button on every ZFIN page to provide input.
What is the most often requested item or service that your Web site does not provide?
M.W.: Users very much want to see ZFIN data integrated with the zebrafish genome sequence. ZFIN plans to implement this feature during the coming year, using a genome browser that will place ZFIN data onto the genome.
A.P.: We do not provide any data that are automatically extracted from abstracts of published papers. This is based on our philosophy that we would like to have experimentally proven data in HPRD.
What are the most difficult aspects of running the database and keeping it up to date?
M.W.: The information generated by scientific research is a moving target. New techniques are always being developed. Because it takes considerable planning and work to extend the database, the ZFIN staff has to work very hard to anticipate the future and implement new features in a timely manner while keeping up with curation.
A.P.: The data in the published literature are growing at a tremendous pace; this growth makes it increasingly difficult to keep the database current and yet error-free. The bottleneck is essentially that we do not have enough trained biologists to carry out the curation on the scale that is necessary. This is primarily due to the lack of a funding mechanism for HPRD.
What new features do you anticipate adding in the next year or two?
M.W.: In addition to a genome browser, ZFIN plans to add support for antibodies and to start curating antibody labeling patterns. ZFIN will also provide expanded support for mutant phenotypes, including much more powerful methods to search the data.
A.P.: We would like to have greater community participation. In this regard, we have already initiated a community effort called Human Proteinpedia (http://www.humanproteinpedia.org), which is a portal like Wikipedia that allows scientists to annotate their own data (for which they have experimental results) on top of HPRD data. We have over 70 proteomics labs already participating in this and hope to expand it such that, one day, every scientist who generates proteomic data will participate. We are also developing a pathway resource that is linked to HPRD called NetPath (http://www.netpath.org). We have been thinking of adding transcript level information as well as more detailed information related to human diseases such as cancers and levels of proteins in health and disease.
What have been the most significant advances over the past few years?
M.W.: One of the most significant recent advances was our successful negotiation with most of the scientific journals for permission to include figures from zebrafish research articles in ZFIN. The majority of these figures illustrate gene expression patterns and mutant phenotypes. Soon, ZFIN will also display journal figures with antibody labeling patterns. These data records link directly to the original journal article at the publisher's site, thus providing an easy and powerful means for researchers to find what they need.
A.P.: The amount of data in HPRD has been steadily increasing and the community is making greater use of the data (we have 70,000 downloads per month).
Do you anticipate that the database will look much different 5 or 10 years from now?
M.W.: ZFIN will unquestionably look much different in 5 to 10 years. The research enterprise constantly changes as new (and old) questions are studied with new methods. ZFIN will similarly change as it extends to support the new types of data generated by these studies. ZFIN will also need to change in response to constantly changing Web technology. The Web is much different today than it was in 1994, and we can expect similarly dramatic changes in the next 5 to 10 years.
A.P.: Yes, I think it will turn into a systems biology platform and become the reference for information about human proteins, much like an encyclopedia. It will serve as a model for how the community can get involved to continuously update and curate biological datasets in general.
What career opportunities are there for biologists in bioinformatics?
M.W.: One of the most interesting and exciting careers for biologists in bioinformatics is to become a scientific curator. In addition to working with a broad range of different kinds of data, curators often communicate with researchers to help resolve contradictions and ambiguities. Curators also help drive development of new software by providing an analysis of the needs of the scientific community and by participating in interface development. Recently, curators at the model organism databases formed an organization of biocurators that holds an annual meeting to discuss new developments in the research community. There is currently a shortage of scientific curators; more developmental biologists, in particular, are desperately needed.
A.P.: I think that biologists make the best bioinformaticians. We are already at a point where we are producing far more data than we can effectively analyze in a reasonable time frame. Some parts of the data are simply discarded or wrongly analyzed because of a lack of competent bioinformaticians. With training in specific disciplines of biology such as genomics or proteomics along with additional expertise in computer programming and statistics, I think such biologists would be instantly recruited by most labs that produce lots of data. In addition, such biologists can also pore over the vast amounts of data in the public domain for meta-analysis or data mining purposes.