Standards and best practices in scientific data management: Promoting interoperability and re-use (parts 1 & 2)



The practice of science has changed in the last three decades due to the rapid development of information and communication technologies and massive increases in computing capacity. The networked environment enables scientists around the world to access real-time data and information directly from their desktops. As the International Council for Science states, “Secondary analyses of data, and the combining of data from multiple sources, are opening up exciting new scientific horizons. Scientific publication practices are changing rapidly” (ICSU, 2005, 16–17). These changes have inspired the introduction of the term e-Science to refer to the “dynamic, multi-sector, and highly heterogeneous” scientific practices that new technologies have made possible (ARL, 2007). E-Science is science that will be increasingly carried out through distributed global collaborations enabled by the Internet. A typical feature of such collaborative scientific enterprises is that they require access to very large data collections, very large-scale computing resources and high-performance visualization delivered back to the individual user scientists (National e-Science Center, 2004, p. 4). At the same time, there is a significant increase in small-scale science, allowing disparate yet related research to be unified via e-Science developments.

This session will reflect on the impact that e-Science has had on scientific data management. Computers and computerized instruments, in the laboratory, in the field, across the world, and above the earth, produce and analyze large volumes of data, manipulate variables that cannot easily be controlled in nature, run models and simulations of experiments that would otherwise be harmful to humans or to the environment, replicate situations that occur at a single point in time, and produce results and analysis faster and more cheaply than science conducted at the bench or in the field. E-Science not only uses large datasets, it creates them. The link between online journals and digital data, relatively inexpensive storage, the ability of ever-improving network infrastructures to transport these data, increased computing power, even at the desktop, and the implementation of service-oriented architectures make the data more accessible and promote their reuse. Governments, foundations and academic institutions are increasingly requiring the deposit of data. Increasingly, validating research means finding the underlying data and verifying their chain of custody.

This session will give insight into the role and activities of libraries in this new science environment. Librarians, information scientists and information management professionals in the sciences have been organizing, managing and describing traditional scientific products, such as journal articles and technical reports, for years, using a body of standards and best practices developed over time. However, scientists, practitioners, engineers, educators and students are now looking for and using data from a variety of sources, and information professionals supporting science are called upon to manage data to ensure accessibility, educated selection of the data, appropriate combination across disciplines, informed re-use, and preservation and archiving.

This double session, organized by the ASIST Standards Committee and SIG-STI, will explore the emerging standards and best practices related to scientific data management. The findings and recommendations of a multi-agency, multi-disciplinary U.S. federal government effort to promote reliable, accessible digital data will provide an introduction to the requirements and issues surrounding the management of scientific data. The importance of basing standards and best practices on the needs of a community of practice will be discussed. Dryad, a repository for data underlying published research in evolutionary biology and related disciplines, will serve as an example of how specific standards and practices can be employed in an information architecture to enable data discovery, access, preservation and reuse as well as interoperability with other data repositories. The unique characteristics of data and efforts to make data more findable through federation and data modeling will be described, and the application of existing library standards will be assessed. International practices and standards, such as the W3C's Simple Knowledge Organization System (SKOS), the Geospatial Metadata Standard (ISO 19115) and the Metadata Registries Standard (ISO/IEC 11179), will be discussed in the context of exchanging international environmental data. Finally, the impact of these activities on the emerging field of data librarianship/curation will be described.

The speakers' presentations will be followed by a panel discussion, which will encourage audience participation. The objective will be to identify similar approaches across the science communities represented, and to propose next steps that the ASIST Standards Committee and others might undertake.