Four Things You Can Do To Extend the Impact, Reach, and Longevity of Your Research Data

Authors

  • Leslie Hsu

    • Leslie Hsu (ORCiD: 0000-0002-5353-807X; Researcher ID: D-5881-2012) is a postdoctoral research scientist in the Integrated Earth Data Applications (IEDA) research group at Lamont-Doherty Earth Observatory, Columbia University (www.iedadata.org). Her current work includes data and project management of IEDA EarthChem systems such as the EarthChem Library, SESAR System for Earth Sample Registration, and EarthChem Portal. She is also one of the co-PIs of the Sediment Experimentalists Network (SEN), a disciplinary network for communication about experimental Earth-surface process research. (e-mail: lhsu@ldeo.columbia.edu)


This section highlights new and emerging areas of technology and methodology. Topics may range from hardware and software to statistical analyses and other technologies that could be used in ecological research. Articles should be no longer than a few thousand words and should be sent to the editor, Collin Bode (e-mail: collin@berkely.edu).

As a scientist, you may spend years toiling over data collection and analysis for a single project. But in 20 or 50 years, will future researchers be able to reuse your data for new scientific studies? Several technological advances are improving the way investigators manage and preserve research data, but the scientific research community has been comparatively slow to adopt these tools. How is the collection, analysis, and storage of your data different from when you started your research career? Given the time and money invested in producing data, a relatively small additional investment in data curation could greatly increase the impact of your research data. This article suggests four technology-related activities that are evolving to help your data and science achieve maximum impact, reach, and longevity.

Like many others in graduate school, I spent most of my time figuring out how to make my experiments work. My research was in the Earth Sciences: I collected data on debris flows in a geomorphology lab. Left with a virtual pile of data after completing my dissertation (only a small fraction of which appeared in the thesis), I seized the opportunity to work in the field of geoinformatics and explore more efficient ways of collecting, analyzing, and sharing data. I found a community of scientists, computer scientists, and software developers working on a multitude of exciting projects, and I was surprised at how little I had heard about these efforts before.

But the world of informatics is becoming much more visible. You may have noticed terms like “cyberinfrastructure,” “data management,” and “altmetrics” appearing more often in proposal guidelines and journal articles. With the capacity to create and store data growing exponentially over the past decade, tools and services are emerging to help scientists analyze and manage their data. Funding agencies are taking steps to protect their investment in research data by issuing more specific guidelines for preserving it, such as the U.S. Office of Management and Budget Memo—Open Data Policy—Managing Information as an Asset (M-13-13) [http://www.whitehouse.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf]. In this rapidly changing landscape, it's hard to keep up with new developments. The next sections introduce four ways to get involved and start increasing the power of your data.

1. Participate in community efforts like EarthCube

In the Earth Sciences community, the NSF-led EarthCube initiative is working to bring data and knowledge management to the forefront of discussion. In its web site's own words, “EarthCube is a bold new NSF activity to create a data and knowledge management system for the 21st Century.” Although EarthCube is an Earth Sciences initiative, its community extends into the biosciences. What does this mean for Earth science investigators, and more specifically, for you? For software developers and computer scientists to build tools that help you find, access, analyze, manage, archive, and curate your data, they need to communicate with the people who will use them. Thus, you should actively participate in forums where you can voice your needs and requirements as a researcher.

To date, EarthCube has gathered both the cyberinfrastructure community and disciplinary scientists at over a dozen events to collect needs for the future of data collection, access, and analysis. Discussion topics include science drivers, current capabilities, future requirements, and the long-term vision for data and knowledge management. EarthCube does not aim to replace current disciplinary solutions (databases and software that already have community support), but to improve and connect them. Many disciplinary resources have been developed and refined for over a decade; rather than replacing them, EarthCube can provide long-term stability and interoperability with other services. Summaries and schedules of past and future EarthCube meetings are available on its web site.

EarthCube is still in its early stages. Usable outputs are not yet available, but initial projects have been funded, including “Research Coordination Networks,” which aim to build the communication necessary to develop EarthCube, and “Building Blocks” and “Conceptual Designs,” which aim to create and connect the many pieces of the overall infrastructure. Disciplinary communities that already have active discussions about data and knowledge management have an advantage when proposing new EarthCube-related projects. Whatever the eventual outcome, EarthCube and similar ventures will be more successful with greater community participation.

Visit the EarthCube site to learn more:

2. Publish your full data sets, software, and other digital resources, not just manuscripts

Not too long ago, most data set publication was confined to supplemental material attached to peer-reviewed manuscripts and was subject to strict format and length requirements. Now, publishing data sets in their own right is becoming more common, complete with a permanent, citable identifier such as a DOI (Digital Object Identifier). Deciding what it means to “publish” a data set raises questions of quality assurance (peer review), long-term availability, and the worth of a published data set compared with a peer-reviewed manuscript. One organization working to advance data publication is DataCite, which works with data repositories to assign and maintain DOIs for data sets that have been approved by an allocating agent.
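To make the idea of a permanent, citable identifier concrete, the Python sketch below normalizes a DOI written in a few common forms, splits it into its registrant prefix (which always begins with “10.”) and the registrant-chosen suffix, and builds a link through the public doi.org resolver. The function names and the example DOI are hypothetical, and the pattern is a simplification that accepts the common “10.NNNN/suffix” shape rather than every form the DOI specification allows.

```python
import re
from urllib.parse import quote

# Simplified DOI shape: "10." + registrant digits, a slash, then any suffix.
DOI_PATTERN = re.compile(r"^(10\.\d{4,9})/(\S+)$")

# Common ways a DOI is written in references and web pages.
WRAPPERS = ("https://doi.org/", "http://doi.org/",
            "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(text: str) -> str:
    """Strip resolver or 'doi:' wrappers and lowercase the DOI.

    DOI names are case-insensitive, so lowercasing yields a stable key
    for comparing or de-duplicating identifiers.
    """
    doi = text.strip()
    for wrapper in WRAPPERS:
        if doi.lower().startswith(wrapper):
            doi = doi[len(wrapper):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {text!r}")
    return doi.lower()


def split_doi(doi: str) -> tuple[str, str]:
    """Return (prefix, suffix) for a normalized DOI."""
    prefix, suffix = doi.split("/", 1)
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build a link through the doi.org resolver, percent-encoding the suffix."""
    return "https://doi.org/" + quote(doi, safe="/")
```

For example, `resolver_url(normalize_doi("doi:10.5072/Example-Dataset"))` gives `https://doi.org/10.5072/example-dataset`. Because the link goes through a resolver rather than pointing at a repository's own address, the identifier keeps working even if the data set later moves.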

When publishing data sets, you can choose between disciplinary and nondisciplinary data repositories. Disciplinary repositories offer customization for specific data types and the presence of disciplinary experts, and thus better data set quality control. Nondisciplinary repositories have the advantage of being available to communities that have not yet established their own repository; university libraries are common nondisciplinary repositories for electronic content. Even the traditional peer-reviewed journal article has evolved to accommodate data: several peer-reviewed journals publish data-centric articles [Wiley's Geoscience Data Journal and Copernicus' Earth System Science Data], and some disciplinary journals accept so-called data papers [e.g., Water Resources Research has a Paper Type called “Data and Analysis” to present “database access/context when they are likely to have wide interest in the hydrologic community.”].

Data set publication is still evolving. Unsolved issues include the best way to deal with real-time data and very large data sets, different disciplinary standards for publication, and the existence of multiple identifiers for the same data source. But despite the challenges, publication encourages proper citation of full data sets that cannot be included in traditional journal articles, making the data much more discoverable and reusable. The near future should bring many more options for data set publication.
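Publishing a data set with a DOI also means supplying a small metadata record, and that record is what makes a proper citation possible. As an illustration, the sketch below models the six properties that the DataCite Metadata Schema treats as mandatory (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType) and formats a citation loosely following the “Creator (PublicationYear): Title. Publisher. Identifier” pattern DataCite recommends. The class, its field names, and the example record are hypothetical; real repositories collect this metadata through their own deposit forms or interfaces.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical container for DataCite's mandatory metadata properties."""
    identifier: str        # the DOI, e.g. "10.5072/example-dataset"
    creators: list[str]    # "Family, Given" strings, in author order
    title: str
    publisher: str         # usually the repository holding the data
    publication_year: int
    resource_type: str     # e.g. "Dataset"

    def missing_fields(self) -> list[str]:
        """Names of mandatory properties that are empty."""
        return [name for name, value in vars(self).items() if not value]

    def citation(self) -> str:
        """Format as 'Creators (Year): Title. Publisher. https://doi.org/DOI'."""
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"incomplete record, missing: {missing}")
        authors = "; ".join(self.creators)
        return (f"{authors} ({self.publication_year}): {self.title}. "
                f"{self.publisher}. https://doi.org/{self.identifier}")
```

For a hypothetical record with creators `["Doe, Jane", "Roe, Richard"]`, title `"Example flume experiment data"`, publisher `"Example Repository"`, year 2013, and DOI `10.5072/example-dataset`, `citation()` returns `Doe, Jane; Roe, Richard (2013): Example flume experiment data. Example Repository. https://doi.org/10.5072/example-dataset`. Checking for missing fields before deposit mirrors the way a repository would reject an incomplete record.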

For more information on data publication, check out

3. Register for your researcher identifiers

Finding all publications by a single researcher is challenging in today's world of information overload. A first and last name is not enough to distinguish different investigators in citation indices. Individually managed sites such as personal web sites or Google Scholar pages can help identify an author uniquely, but two initiatives, ResearcherID and ORCiD (Open Researcher and Contributor ID), strive to maintain a consistent, worldwide registry of researchers with unique identifiers. Both efforts are integrating with citation indices and other software in order to identify researchers unambiguously.

To get started, visit the web sites to sign up for your IDs, make sure your list of publications is up to date, and start using the IDs on your web sites, e-mail signatures, and more. You will benefit from unique identification and web profiles that gather all of your research products.
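Unique identifiers only help if they are copied correctly. ORCiD iDs guard against typing errors with a final check character computed with the ISO 7064 MOD 11-2 algorithm, which ORCID documents publicly. The sketch below recomputes that character; the function names are illustrative, and it checks only the 0000-0000-0000-000X layout and checksum, not whether the iD is actually registered.

```python
def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCiD iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the four-block layout and the final check character."""
    parts = orcid.strip().upper().split("-")
    if len(parts) != 4 or any(len(part) != 4 for part in parts):
        return False
    compact = "".join(parts)
    base, check = compact[:-1], compact[-1]
    # Only ASCII digits are allowed in the first 15 positions.
    if not all(ch in "0123456789" for ch in base):
        return False
    return orcid_check_digit(base) == check
```

As a usage example, the iD in the author note above, `is_valid_orcid("0000-0002-5353-807X")`, returns `True`, while changing its last character to `1` makes it return `False`. The final “X” simply stands for a check value of 10.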

4. Make your research and data available to altmetrics

Altmetrics, short for alternative metrics, is a way to get a fuller picture of how research is being shared and used. In contrast to the traditional citation in a published, peer-reviewed manuscript, altmetrics include more informal “citations” such as tweets, Facebook mentions, shares, likes, and blog posts. Although the traditional citation is still the trusted and well-established metric, altmetrics provides usage data much faster: same-day statistics, as opposed to the year or more it can take for a citation to appear in a peer-reviewed article. Altmetrics also provides a way to track the use of software and other research products that are not well served by the traditional citation paradigm. Many altmetric statistics are enabled by the unique identifiers (e.g., DOIs) assigned to papers and data sets.
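Because each mention can be tied to an identifier, tallying altmetrics is largely a matter of grouping events by DOI. The sketch below assumes a hypothetical list of (DOI, source) pairs that has already been collected; real altmetrics services gather such events from many platforms through their own harvesting pipelines, which are not modeled here. DOIs are trimmed and lowercased so that differently written copies of one identifier count together.

```python
from collections import Counter, defaultdict
from typing import Iterable


def tally_mentions(mentions: Iterable[tuple[str, str]]) -> dict[str, Counter]:
    """Count mentions per product (DOI) and per source.

    Each item in `mentions` is a hypothetical (doi, source) pair, such as
    ("10.5072/example-dataset", "blog").
    """
    tallies: dict[str, Counter] = defaultdict(Counter)
    for doi, source in mentions:
        # DOIs are case-insensitive, so normalize before using them as keys.
        tallies[doi.strip().lower()][source] += 1
    return dict(tallies)
```

The total for one product is then `sum(tallies[doi].values())`, and the per-source breakdown shows where the discussion is happening.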

Skeptics are justifiably suspicious of altmetrics: they wonder how these metrics reflect contributions to science, or predict that the system will be easily gamed and manipulated. I argue that, if used honestly, tweets and shares measure how much discussion a piece of research generates, and possibly give a better picture of its reach into the non-academic community. Other community-sourced review systems, such as Yelp, have evolved their algorithms to reduce the impact of false reviews, and altmetrics software will need to do the same as it matures. Finally, altmetrics provides a valuable picture of scientific reach and impact to authors themselves, provided they are not gaming the system.
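One simple way to blunt the kind of inflation skeptics worry about is to count how many distinct accounts mention a product rather than how many posts there are. The sketch below illustrates that idea on a hypothetical list of (DOI, account) pairs; it is not the method used by Yelp or by any particular altmetrics service, which combine many more signals.

```python
from collections import defaultdict
from typing import Iterable


def distinct_account_counts(mentions: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Count how many different accounts mention each product.

    Each item is a hypothetical (doi, account) pair. Repeat posts by the
    same account about the same product are counted only once, so one
    account cannot raise a product's score by posting many times.
    """
    accounts: dict[str, set[str]] = defaultdict(set)
    for doi, account in mentions:
        accounts[doi.strip().lower()].add(account)
    return {doi: len(seen) for doi, seen in accounts.items()}
```

This only addresses repeated posting by a single account; it does nothing against many coordinated or fake accounts, which is why production systems need additional safeguards.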

Two examples of software currently being developed to capture these metrics are ImpactStory and Plum Analytics. These two applications provide detailed reports for the products of a specific researcher, listing the absolute and relative magnitude of nontraditional citations for each product. The reports are useful for investigators to demonstrate their research impact to funders or promotion committees.
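A report of “absolute and relative magnitude” can be sketched as a raw count paired with a percentile rank against a reference set of comparable products. The reference set, the percentile definition used here (the share of reference counts strictly below a product's count), and the function names are assumptions for illustration, not how ImpactStory or Plum Analytics compute their figures.

```python
from bisect import bisect_left
from typing import Iterable


def percentile_rank(value: int, reference: Iterable[int]) -> float:
    """Percentage of reference counts strictly below `value` (0 to 100)."""
    ordered = sorted(reference)
    if not ordered:
        raise ValueError("reference set is empty")
    # bisect_left returns how many sorted values are < value.
    return 100.0 * bisect_left(ordered, value) / len(ordered)


def impact_report(product_counts: dict[str, int],
                  reference_counts: Iterable[int]) -> dict[str, tuple[int, float]]:
    """Pair each product's raw mention count with its percentile rank.

    `reference_counts` holds counts for comparable products (for example,
    data sets published in the same year); how to choose it is an assumption.
    """
    reference = list(reference_counts)  # allow a one-shot iterator as input
    return {product: (count, percentile_rank(count, reference))
            for product, count in product_counts.items()}
```

With a reference set of counts 1 through 10, `impact_report({"dataset": 5}, range(1, 11))` returns `{"dataset": (5, 40.0)}`: the raw count gives the absolute magnitude, and the percentile places it relative to similar products.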

To get involved, join the social media outlets and altmetrics communities, where sharing and discussing other researchers' work creates altmetrics for them, or take a look at your own profile.

Summary

This article has reviewed some steps you can take to expand the impact, reach, and longevity of your research. Many of these activities require actions that are new to the long-established and accepted research procedure, so skepticism of the efficacy and worth of these activities is not surprising. But I hope you see that these activities provide the opportunity to expand the audience and lengthen the life of your research output.

Each discipline is different, but progress will be made and rules will be set by the community itself. The examples here come from the Earth Sciences and from science more generally, but ecology has, and will continue to develop, its own initiatives. One thing is certain: research data volumes will continue to grow, and taking steps to promote and preserve your own painstakingly derived data will help both you and future scientists.
