Data management and curation of research data in academic scientific research environments



The Structural Bioinformatics Core Facility at the University of North Carolina at Chapel Hill (SBI Core) assists researchers university-wide in applying computational structural biology techniques and in incorporating structural biology and bioinformatics into their grants and publications. The SBI Core works with a diverse population of researchers from numerous departments and supports an ever-changing body of research. Its computational biology services are data-intensive and rely on a diverse, distributed set of applications for data processing, storage, and management.

As the amount of data and the number of projects have grown, the SBI Core has come to need an effective strategy for managing data and for sharing data with the researchers it assists. The UNC-CH Health Sciences Library (HSL) has begun a collaborative project with the SBI Core to identify its crucial data management needs and to envision new roles for the library in e-science and data management. Working together, the SBI Core and the HSL have identified major obstacles to data sharing, data management, and data access. The two partners will also develop solutions in which the library facilitates collaboration among campus resources and matches unmet needs to external resources. One of the library's goals in this proof-of-concept project is to become a central campus resource for research support and data management.


Scientific research data is increasingly digital and large in volume. Digital experimental data varies in configuration parameters, systems used, and output formats in ways that print or textual data recorded in laboratory notebooks does not. Managing these growing, complex, computationally generated data sets effectively and efficiently is a particular challenge for researchers in academic scientific research environments. One issue is that grant funding typically neither requires nor pays for a data management plan. A second is that much of the research data is generated, organized, and stored primarily by graduate students and postdoctoral fellows, who regularly join and leave the research team. For this project, the director of the SBI Core is collaborating with information and library science students and professionals to analyze the data environment and its workflows and to determine viable ways to improve data management processes there.

SBI Core Facility at UNC: Computational Research and Data Generation

The Structural Bioinformatics Core Facility at the University of North Carolina at Chapel Hill (SBI Core) works with a variety of research laboratories to perform X-ray crystallographic structure determination, nuclear magnetic resonance (NMR) spectroscopy, molecular dynamics simulations, and bioinformatic analyses for UNC research projects. The data from these analyses is generated and managed primarily by graduate students and postdoctoral fellows working in the particular research lab. Frequently there are no mandated formats, processes, or policies governing access to, storage of, and retention of this data.

Challenges Curating, Sharing, and Archiving Computational Biomedical Data in an Academic Setting

A common problem arises in this environment when graduate students and postdoctoral fellows leave the institution: the data the departing student or postdoc was managing is often inaccessible, incomplete, or missing altogether, costing valuable researcher time and compromising research outcomes.

A further complication is that although the SBI Core generates and stores the data, it is not directly affiliated with the research departments it works with. The data is stored only temporarily in the SBI Core environment, yet it must remain accessible to the principal investigator and the lab team conducting the research. In addition, no adequate and secure institutional research computing storage space is available for shared use by the lab team or by the SBI Core.

Cooperative Efforts Between the SBI Core and the UNC Health Sciences Library to Model Research Data Management Practices

Evidence suggests that these challenges are common in scientific research environments [1, 3, 6, 7]. In this case, the SBI Core Facility has the assistance of the UNC Libraries in addressing them. To date, staff from the UNC Health Sciences Library have met with the SBI Core director to define and describe the facility's context, its specific data workflows, and the issues of concern. This team is working with a committee of campus science librarians to identify partners among campus Information Technology Services, the School of Information and Library Science, and other key information management experts. The project team will review data management heuristics, guidelines, and best practices, as well as relevant institutional and external resources and policies.

The team is led by members of the Health Sciences Library and the SBI Core as an initiative to identify the opportunities created by e-science [2, 4, 5, 6]. Based on its findings, the group will recommend process, policy, and software solutions to help this Core Facility standardize and stabilize its data management practices. This prototype project will serve as a proof of concept for extending services campus-wide, and the team will use it as a model for working with other units on campus.