Organization as a means for informing repository design: The convergence of knowledge organization and personal information management in scientific data



Motivated by an international need for a better understanding of knowledge organization systems in science, this study compares the formalized knowledge organization systems used in libraries with the personal organization schemes of scientists, identifying the differences and similarities between them. This poster reports findings from a study that compares the knowledge organization practices of scientists and information professionals when organizing scientific data.


Big data, and the need to develop a sustainable infrastructure that supports the sharing and long-term use of scientific data, are becoming foremost concerns for many grant-giving organizations, including the National Science Foundation (2010) and the United States Government (2009). Scientific data management and curation are quickly becoming pressing topics for scientists depositing data, repository designers, librarians, and information professionals (Heidorn, 2008). To address these growing needs, a range of repositories has emerged. Examples include Dryad [1], a repository for data supporting publications in evolutionary biology, and eCrystals [2], a repository for crystallography data.

These data repositories present new knowledge organization challenges for information professionals. Traditional knowledge organization systems rely heavily on standardized, unified, 'whole-world' approaches. By contrast, the data deposited into these repositories are organized using personal information management approaches (White, forthcoming). Are formalized repositories prepared to accept a variety of personal organization methods that differ from traditional knowledge organization practices?



Previous exploratory research in this area has shed some light on how scientists organize their own data sets. This research, conducted by White in 2008, indicates that 6 of the 7 scientists studied use metadata to organize their own data sets. The findings also show that scientists tend to organize data by research question: over half of the participants used this organizing unit when preparing data for analysis (White, forthcoming).

These findings support many claims in the personal information management literature that personal information is handled differently from information in formal knowledge organization systems (Barreau, 1995; Jones, 2007). The difference is especially pronounced in how information is organized. Personal organization schemes rely less than traditional knowledge organization schemes on the overarching foci of form and subject (Barreau and Nardi, 1995).


This study aims to understand how the underlying knowledge base of scientific repositories can better accommodate the differences between personal organization practices and more formalized systems. To that end, it identifies the differences and similarities in how scientists and information professionals organize scientific data. The study examines not only how each group arranges data but also how each group creates metadata and surrogates.

The goals of this study are to:

  • Characterize similarities and differences between how information professionals organize data and how scientists organize data

  • Establish how each community uses metadata, through surrogate creation and tagging

  • Determine what information professionals can learn about knowledge organization by observing how scientists organize their own data

This study uses a mixed-methods approach to collect and analyze data. It combines a quasi-experimental research design, two questionnaires, and a content analysis for comparative qualitative analysis.

Quasi-experimental design: Research subjects are divided into two naturally occurring groups: scientists who create data and information professionals who curate scientific data. Both groups receive the same data set, along with relevant task scenarios, so that differences in their organizing practices can be observed and analyzed.

Questionnaires: Participants in both groups complete two questionnaires. The first is completed before the experiment begins and collects basic descriptive information about each participant's experience working with research data. The second is completed after the experiment and asks each participant to explain the rationale behind the organizing practices they performed during the experiment.

Content Analysis: The data collected from the experiment and the questionnaires will be analyzed using a qualitative content analysis approach to categorize similarities both between the two participant groups and within each group.


This poster presents the pilot study described above, which examines the convergences and divergences of personal organization and knowledge organization. It explains the mixed-methods research approach used to collect and analyze data from the scientific data community and from information professionals. It also reports preliminary results from the content analysis.


Findings from this pilot study, as reported in this poster, will inform a more in-depth study. That study will examine how personal research data are transformed when organized in larger information systems such as Dryad. It will compare personal organization schemes for scientific data more closely with actual data deposition practices in scientific repository systems.


Many thanks to the members of the Dryad repository team and the Metadata Research Center for support and advice during the research process.