DataONE: Protecting the future of environmental and ecological data

Authors


Abstract

This interactive panel addresses the conference theme, “Thriving on Diversity - Information Opportunities in a Pluralistic World”, by exploring the role of librarians and information scientists in DataONE (Observation Network for Earth), a cyberinfrastructure project that supports the full data lifecycle for scientists in the diverse domains embodied in environmental and ecological science. Collaborations like this between scientists and information professionals (see Table 1 for the full list of DataONE team members) are increasingly important, as reflected by the “Science and Metadata Community” that is forming within the Dublin Core Metadata Initiative.

The research activities of environmental and ecological scientists produce diverse multi-scale, multi-discipline, and multi-national observational data. These data provide insights to address new environmental, social, and technological challenges caused by climate variability, altered land use, population shifts, and changes in resource availability (e.g., food, water, and oil). Therefore, scientists, educators, librarians, land managers, and the public need open, persistent, and secure access to well-described and easily discovered Earth observational data. These data are critical because they form the basis for good scientific decisions, wise management and use of resources, and informed decision-making.

Securing our global scientific knowledge base by preserving these documents and datasets requires active management of the resources and the supporting technology, as well as an awareness of current research. DataONE is one of two National Science Foundation DataNet Partners. The DataNet Partners serve as exemplars for national and global data management and research infrastructure organizations, and also provide unique opportunities for communities of researchers to advance science and engineering research and learning. Eventually there will be five DataNet Partners.

DataONE focuses on multi-disciplinary observational data collected by biological (genome to ecosystem) and environmental (atmospheric, ecological, hydrological, and oceanographic) scientists, national and international research networks, and environmental observatories. However, the DataONE structure is designed to be domain-agnostic, so that it can be extended to serve a broader range of science domains both directly and through interoperability with other DataNet Partners.

This panel of DataONE investigators focuses on four areas related to the challenges of the preservation of digital scientific data in any setting, not just for DataONE. The panel will be facilitated to encourage the audience to provide feedback, exchange ideas, and ask questions related to the role of information science, libraries and librarians in the process of creation, discovery, access, and manipulation of electronic scientific data.

About DataONE

A Case Study: DataONE's potential is best shown through a case study that illustrates the relationship between scientists and librarians/information professionals (see figure below).

STEP 1: The scientist needs efficient and effective tools to manage and upload her work to a database that can be accessed through the DataONE network. Usability design and testing will create tools that practicing scientists can use easily, with minimal impact on their normal workflow. These tools will also help tag data so that it can be more easily discovered. Additionally, the tools provide the foundation for better preservation practices. The scientist takes the time to learn these tools because she has learned, through various media, the importance of best practices for electronic ecological data.

STEP 2: Proper labeling of the data through standardized metadata and interoperability of the system allow other ecologists to discover and access her records.

STEP 3: Metadata and interoperability also provide the foundation for these data to become available to researchers outside the original domain.

STEP 4: Communication with citizen scientists is enhanced by communication best practices.

[Figure: case study of the relationship between scientists and librarians/information professionals]

DataONE is enhanced by expertise in communication and information. From a socio-cultural perspective, research areas include studying the creation, use, and communication of information between scientists and with the non-scientific community; the management and sustainability of virtual data organizations; the creation of data preservation best practices in the scientific community; the role of libraries in the scientific process with regard to datasets; and the influence of social networks within and between scientific domains. From a usability and assessment perspective, research areas include the usability and effectiveness of DataONE services and tools, and assessment of the effectiveness of DataONE across the many sectors interested in the data sets. From a cyberinfrastructure perspective, DataONE will focus on capabilities such as architecture of portals; distributed approaches to preservation and access; replication; secure, controlled access; authentication methods; tools deployed and supported; and data discovery and interoperability methods.

Areas the panel will discuss

Preservation challenges and opportunities: DataONE faces myriad preservation challenges, but there are also exciting opportunities presented by leveraging the expertise of academic and digital librarians. This discussion will be led by Patricia Cruse, who is the founding director of the California Digital Library's Digital Preservation Program. She works collaboratively with the ten University of California libraries to develop strategies for the preservation of content that is important to the research, teaching, and learning mission of the University.

Infrastructure primer: The architectural underpinnings of DataONE aim to provide agility and sustainability. Libraries and librarians play pivotal roles in building and maintaining this infrastructure. This discussion will be led by Robert Sandusky, Assistant University Librarian for Information Technology and Clinical Associate Professor at the University of Illinois at Chicago's Richard J. Daley Library. He has experience designing, building, and operating highly secure and reliable national-scale data communications and networks.

Creating a preservation culture: Digital preservation can be hindered by socio-cultural barriers such as resistance to change, unwillingness to learn new technologies, lack of incentives for adopting new processes, and lack of supporting policies. Librarians and libraries can be essential in helping to establish a preservation culture at their organizations and among scientists. This discussion will be led by Suzie Allard, Assistant Professor at the University of Tennessee School of Information Sciences. Her work focuses on how scientists and engineers use and communicate information in both informal and formal channels, and how these communication processes influence the data cycle from creation to preservation. Allard will also serve as the session facilitator.

Helping scientists use DataONE - Usability and Assessment: Project success relies on tools and processes that are easy for scientists to use. Considering usability during the design phase, and conducting usability tests throughout development, can help assure success. Additionally, ongoing assessment is essential to determining whether DataONE is meeting its goals. This discussion will be led by Carol Tenopir, Professor at the University of Tennessee School of Information Sciences. She has studied patterns of scientific communication and scholarly publishing, in particular the use and design of digital publications for researchers and the role of the library with digital resources.

Table 1. DataONE (Observation Network for Earth) Team and their affiliations

PI: William Michener, U. New Mexico (UNM)

Co-PIs: Robert Cook, Oak Ridge National Laboratory (ORNL); Mike Frame, U.S. Geological Survey (USGS) National Biological Information Infrastructure (NBII); Stephanie Hampton, National Center for Ecological Analysis and Synthesis (NCEAS), UC-Santa Barbara; Kathleen Smith, National Evolutionary Synthesis Center (NESCent), Duke U.

Co-Is (Core Cyberinfrastructure Team): Paul Allen, Cornell; Jeffery Horsburgh, Utah State U.; Matthew Jones, NCEAS; Robert Sandusky, U. Illinois-Chicago; Ryan Scherle, NESCent; Mark Servilla, UNM; Dave Vieglais, U. Kansas; Bruce Wilson, ORNL

Co-Is (Working Group and Education/Outreach Leaders and International Participants): Suzie Allard, U. Tennessee; Peter Buneman, U. Edinburgh; Randy Butler, UIUC-NCSA; John Cobb, ORNL; Patricia Cruse, California Digital Library (CDL); Ewa Deelman, USC-ISI; David DeRoure, U. Southampton; Cliff Duke, Ecological Society of America; Carole Goble, U. Manchester; Donald Hobern, CSIRO, Australia; Peter Honeyman, U. Michigan; Vivian Hutchison, NBII; Steve Kelling, Cornell U.; Jeremy Kranowitz, The Keystone Center; John Kunze, CDL; Bertram Ludaescher, UC Davis; Maribeth Manoff, U. Tennessee; Ricardo Pereira, Brazil; Line Pouchard, ORNL; Carol Tenopir, U. Tennessee; Jake Weltzin, USGS; Von Welch, UIUC-NCSA

Acknowledgements

This session is sponsored by SIG-DL, SIG-STI, and SIG-KM.
