Data Practices and Curation Vocabulary (DPCVocab): An empirically derived framework of scientific data practices and curatorial processes



Conceptual frameworks and taxonomies are an important part of the emerging knowledge base on the curation of research data. We present the Data Practices and Curation Vocabulary (DPCVocab), a functional vocabulary for specifying relationships among data practices in research, the types of data produced and used, and curation roles and activities. The vocabulary consists of 3 categories (Research Data Practices, Data, and Curation) with 187 terms validated through empirical studies of scientific data practices in the Earth and life sciences. This article describes the DPCVocab development process and examines applications for mapping relationships across the 3 categories and for identifying both factors for projecting curation costs and important differences in curation requirements across disciplines. As a tool for curators, the vocabulary provides a framework for charting curation options and guiding the systematic administration of curation services. It can also serve as a shared terminology, or lingua franca, to support interaction and collaboration among curators, data producers, system developers, and other stakeholders in data infrastructure and services. As a whole, the DPCVocab supports both the technical and the human aspects of professional curation work essential to the modern research system.