Volume 30, Issue 1 p. 81-86
Opinion Piece

Full discovery: What is the publisher's role?

First published: 13 January 2017

Abstract

Key points

  • Our collective authorship and publishing practices do not always end up ensuring that scholarly content is discoverable by readers.
  • Readers of all kinds rely on a variety of ‘discovery pathways’, such as search engines, library systems, and various electronic links, some of which are blind to the content they desire.
  • Efforts over the years to improve content discoverability have made great progress, but an increasing amount of freely available content brings up new issues.
  • The National Information Standards Organization (NISO)’s Discovery to Delivery (D2D) Topic Committee has developed a grid comparing various ways in which content is shared with various ways in which users discover such content.
  • This article brings to light a few of the current obstacles and opportunities for innovation by publishers, aggregators, search engines, and library systems, and invites Learned Publishing readers to step up and identify others.

In 2014 I accepted an invitation to join the National Information Standards Organization (NISO)’s ‘Discovery to Delivery Topic Committee’. This is one of three committees whose remit is to observe the landscape of technologies relevant to libraries, publishers, and system vendors with an eye towards identifying areas where a set of standards or recommended practices might serve the industry. An example is the Open Discovery Initiative (2014), which established a set of recommended practices for all members of the scholarly content supply chain to increase transparency and success in library search services.

I joined with the impression that I would be helping to identify ways in which serendipity can play an important role in discovery, just as it does with online encyclopaedias like Credo Reference. It is what I would call ‘helping people find things that they need, but are not looking for’. On the other hand, the discovery problem that is called ‘known-item searching’ seemed to me, at the time, to be pretty much a solved problem.

But, as luck would have it, in June 2015, drawn by work with some of my clients and prospects, I started looking deeply into issues that related to open access. My interest in open access had me doing a significant number of informational interviews with various stakeholders in scholarly publishing, including scholars, researchers, publishers, system vendors, librarians, policy people, open access advocates (and opponents), and funding agencies. Through these interviews, I started to keep track of the various mechanisms by which various actors have tried to accomplish full sharing of their content with the world.

My thoughts about a variety of use-cases for how such open content could be easily discovered came into sharper focus when I read the extensively researched study by Tracy Gardner and Simon Inger on ‘How readers discover content in scholarly publications’ (Gardner & Inger, 2016).

It is an excellent body of research conducted with input from almost 30,000 researchers, scholars, librarians, students, and other library users around the world across a wide variety of disciplines about where they go to find information, and it offers up a good set of possibilities from open web search tools to various tools provided by libraries.

This set of discovery pathways got me thinking: If an institution passes an open access policy, it is in large part motivated by having the results of scholarship at that institution have maximum reach and influence. If a funding agency establishes an open access mandate, as most of them have, it often stems from a desire to accelerate research in areas in which they provide funding. If a scholarly publisher comes out with an open access journal or offers for a fee to make certain articles ‘open’, they should certainly seek to have that article or that journal achieve maximum reach. And scholars/researchers who share their papers on personal websites or on scholar sites, like academia.edu or ResearchGate, tell me that this is their way of making sure that colleagues in their field have access to their works.

This spawned in my mind a simple grid comparing locations of shared content and ‘discovery pathways’ by which someone might look for and hope to find such content. At a subsequent meeting of the NISO D2D Committee, I offered up an Open Content Discovery Grid (Fig. 1). In constructing my ‘grid’, I borrowed heavily from the choices Gardner and Inger offered up in their questionnaire.

Fig. 1. Working version of the NISO D2D Committee's Open Content Discovery Grid (Dove, 2016).

Notice that this grid comes with lots of cells, each having no facts, no data, and no opinions. But it does cover the landscape of where there might be unnecessary friction between someone's intent to share and a user's desire to discover, and as such, it does cry out for facts, data, and opinions. For example, a university may have a policy that scholars who are on its payroll agree to provide submitted manuscripts reporting on research conducted at that university so that the results of the research can be shared with the world. But there may be problems with this intention to share being fully realized. Some of the problems are related to compliance by scholars with the policy. Others are related to whether the appropriate metadata is provided so that people searching for a specific submitted manuscript can find it. And some problems arise when links that ought to bring users to this shared version of the article are simply not provided. Reasonable people may disagree about how serious each of these problems is, but the purpose of this grid is to shine a light on the wide variety of cases where improvement is possible so that various stakeholders can either advocate for improvements or make the improvements themselves.

This is not a finished grid and continues to evolve and take shape, so I encourage comments and shared insights, opinions, and suggestions about areas where recommended practices by various stakeholders could eliminate unnecessary obstacles standing between a producer's desire to share and a user's desire to know.

It is clear that some parts of this grid are especially relevant to various players in the scholarly communication supply chain. Most scholarly publishers now have open access journals in their portfolio. Are libraries paying sufficient attention to which new publishers are appearing on the scene and perhaps should be included in their catalogues and knowledge bases that support their discovery services? And a question for those who have open monograph publishing programmes: Are these openly available books (at no cost to the library) being included in their online catalogues or discovery services?

As I say, this grid cries out for opinions and suggestions. Here are a couple of examples raised by people at conferences, like COASP 2016 and the Charleston Library Conference, where I have socialized this grid:
  • Gold OA journal articles hosted by publisher: Some publishers, especially those who have subscription-based journals, are fully experienced with what it means to pay attention to the use of their content in libraries; they have an ongoing revenue stream to protect, which necessitates this. They know what metadata is necessary to provide, which facilitates their inclusion in library catalogues, link-resolver and discovery tool databases, and even usage stats (COUNTER). But do new pure open access publishers know all the various mechanisms by which their content can be included in library systems? From a NISO perspective, this may reveal a need to provide better education with perhaps a ‘punch-list’ of all the various NISO standards and recommended practices that a new open access publisher ought to be aware of and follow.
  • Articles in hybrid journals that have been paid to be ‘open’: The discovery landscape for these articles is truly problematic. Recently, my niece was proud to be the lead author for an article published by a highly respected subscription journal. Her employer, a genetics company, paid $3,500 for her article to be freely available. It is certainly discoverable by Google Scholar, so her parents and her boss can see that it is ‘open’. But inside of a library that does not subscribe to this particular journal, this single open access article is very unlikely to participate in any of the discovery services, and when other articles cite her article, the link-resolvers that would normally provide the reader with a direct link to her article will not do so. This is because link-resolvers and the databases that support them are architected at the journal level, not the article level. Links will only be applied to that article if the library subscribes to the journal. In the case of library discovery tools, such as EDS from EBSCO or Primo and Summon from ProQuest, limitations arise from uneven implementation of the appropriate tagging by both publishers and discovery service vendors: the tag must be assigned by the publisher to ‘open’ hybrid articles and then needs to be noticed by the discovery tools. This tag is part of a recommended practice developed by the Access License Indicators Working Group and approved by NISO on 5 January 2015.
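
The journal-level limitation described in the second example can be illustrated with a toy model. This is only a minimal sketch and not any vendor's actual implementation: the `Article` record, its `free_to_read` flag (loosely modelled on the access indicator mentioned above), the holdings set, and all identifiers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    doi: str
    issn: str            # ISSN of the journal the article appears in
    free_to_read: bool   # hypothetical article-level 'open' flag


def journal_level_link(article: Article, subscribed_issns: set) -> bool:
    """Offer a full-text link only if the library subscribes to the journal."""
    return article.issn in subscribed_issns


def article_level_link(article: Article, subscribed_issns: set) -> bool:
    """Also offer a link when the individual article is flagged as open."""
    return article.free_to_read or article.issn in subscribed_issns


# Illustrative values only: a paid-to-be-open article in a journal
# that this library does not subscribe to.
hybrid_oa = Article(doi="10.9999/example.123", issn="0000-0000", free_to_read=True)
holdings = {"1111-1111"}

print(journal_level_link(hybrid_oa, holdings))   # False: no link is offered
print(article_level_link(hybrid_oa, holdings))   # True: the open article is linked
```

The only difference between the two functions is whether the decision can see anything about the individual article, which is why the tag has to be both supplied by the publisher and read by the discovery tool.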

WHERE MIGHT SOME INNOVATION BE CALLED FOR?

I think that a serious look at some of the rows and columns in this grid may also uncover areas where it is too early to define a standard or recommended practice but which call out, instead, for some innovation by publishers, aggregators, and recommendation services.

One such area for innovation is the row labelled ‘Publisher provided links in reference lists’. This could also apply to anywhere that citations are presented to users, including such things as annotated bibliographies, and recommendation features in various scholarly systems.

The 2016 Gardner/Inger research included a question related to how users search for citations, specifically ‘where people start their search when they know exactly what article they are looking for’ (Gardner & Inger, 2016, p. 36). However, their question in this situation skips over a very important step in defining the use-cases where people already know what article they are looking for, namely, where did they come to know about that specific article?

There are multiple possibilities to consider where someone is likely to have obtained a citation (from an e-mail, from notes taken in class, from a syllabus, etc.). But a highly likely place is in an online resource, like an annotated bibliography, an index, or other subject guides. I am of the opinion, given what academics I have interviewed have told me about how they go about reading articles in their field, that one of the main ways people find citations that they want to follow up is from references in other articles that they are reading.

In my interviews of academics, I often ask the question ‘When do you need help from others to find articles?’ A typical answer is something like ‘As I'm reading I keep a list which I pass on to my [admin, librarian, etc.] of articles I need to find. They then chase after them for me.’

Citations were developed and standardized with a view both to acknowledge prior work and to provide the user with an ability to go check the referenced work. When journal content went online, links were developed that would allow the reader to simply click on a link and hopefully find more information about the referenced source (abstracts, etc.) and even the full text of the article. If a link to the full text is presented, then the user following this particular citation is never in the situation where they need to start a ‘search when they know exactly what article they are looking for’.

I think the users’ need in this case is a link or a set of links that give them their best access choices. In my opinion, these links should reveal whether or not clicking on them will get you to the full text. One important choice is the most available link that will not cost anything. And one or another link should be able to discover a version of that article that may appear in any of the columns of this grid. Some readers will know that they need the article of record. A very high percentage, however, who are reading this article for the first time and want to follow a citation simply need to reassure themselves that they understand the argument being presented. They will simply need read-access to the cited work. This is a very important use-case in our community. Ask researchers/scholars in any discipline about how they read a paper in their field, ‘How important is your ability to read some of the cited works?’ They will tell you that they cannot fully absorb the argument of the new paper without being able to check through a few of the cited sources.

So, if nothing else, if you are a publisher, an aggregator, or provide an online journal platform, you will want to facilitate your readers in the absorption of the articles you publish, by providing unfettered access to as many of the cited works as possible. This will improve the reading experience of your articles.

I am reminded of the first sentence in the preface of the first edition of Encyclopaedia Britannica (Encyclopaedia Britannica, 1771). This sentence is engraved in granite in the vestibule of the Encyclopaedia Britannica offices in Chicago, where my mother worked as a library researcher back in the 1950s and 1960s:

Utility ought to be the principal intention of every publication. Wherever this intention does not plainly appear, neither the books nor their authors have the smallest claim to the approbation of mankind.

So what innovation would earn today's publisher ‘the approbation of mankind’? I think a good candidate would be links provided by publishers, aggregators, and journal platforms that give users their best chance to find what they are looking for with a minimum of clicks.

Over 15 years ago, as full-text access to journals was beginning to be ubiquitous, a problem arose: researchers, students, and scholars in academic libraries were sometimes, in the press of time, purchasing online content they needed when, in fact, their library had already purchased that content. To address this situation, it was necessary to offer the users a link that would take them to their best choices for access provided by their library. Herbert Van de Sompel stepped forward and designed the OpenURL Framework (Van de Sompel & Beit-Arie, 2001), which allowed citations to be presented to the users along with a link that was smart enough to take them to the various access options available. These so-called link-resolvers saved the libraries and the users a lot of money and saved the reader time by providing quick access right at the point of need. No searching was necessary.
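
For readers unfamiliar with what such a link looks like, here is a minimal sketch of a citation encoded as an OpenURL query string in the key/value style of the later NISO Z39.88-2004 version of the framework. The resolver address, the citation values, and the dictionary keys on the citation are hypothetical; a real library would substitute the base address of its own link-resolver.

```python
from urllib.parse import urlencode


def build_openurl(resolver_base, citation):
    """Encode a journal-article citation as an OpenURL key/value query."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": citation["article_title"],
        "rft.jtitle": citation["journal_title"],
        "rft.date": citation["year"],
    }
    # The DOI is optional; many citations carry only descriptive metadata.
    if citation.get("doi"):
        params["rft_id"] = "info:doi/" + citation["doi"]
    return resolver_base + "?" + urlencode(params)


# Hypothetical resolver address and citation, for illustration only.
example_citation = {
    "article_title": "An Example Article",
    "journal_title": "Journal of Examples",
    "year": "2001",
    "doi": "10.9999/example.123",
}
print(build_openurl("https://resolver.example.edu/openurl", example_citation))
```

The link itself carries only a description of the cited item. Deciding which copy the reader is entitled to is left to the resolver at the reader's own library, which is what makes the link ‘smart’.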

Today, the world of academic research is changing significantly. In 2016, 16% of scholarly articles were published as open content. A full 80% of publishers acknowledge that authors can share their submitted manuscripts (Sherpa/Romeo, 2016), and almost all funding agencies, and a good number of major universities, are requiring that submitted manuscripts be shared. As a result, if you want to facilitate researchers’ access to content referred to by a citation, in today's world, this means facilitating their access to shared versions of that content as well as the article of record, which may not be available to them (either at the moment or ever).

Some publishers already attempt to facilitate a reader's ability to get to versions of articles that support a particular citation. For example, it is becoming more common for different links to be provided, see, for example, Fig. 2, which shows a reference list from a journal published by MDPI, an open access publisher.

Fig. 2. Example of a publisher providing multiple ways for their readers to find available versions of referenced sources.

You will notice that each citation has one to three links. The first is always to Google Scholar. This link is prefilled with metadata from the citation and provides the user with the best chance at finding a version of the article, which may reside in any of the columns in the grid. If the citation includes a DOI, then there is a link to CrossRef. And finally, if there is a PubMed-ID, there is a link to PubMed.
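
A rough sketch of how such a set of links could be generated for one reference follows. The URL patterns used here (a Google Scholar query, the doi.org resolver, and a PubMed record address) are assumptions for illustration; the links in the reference list shown in Fig. 2 may be constructed differently.

```python
from urllib.parse import quote, quote_plus


def reference_links(citation_text, doi=None, pmid=None):
    """Return (label, url) pairs for one entry in a reference list."""
    # A search link prefilled with the citation is always available.
    links = [
        ("Google Scholar",
         "https://scholar.google.com/scholar?q=" + quote_plus(citation_text)),
    ]
    # The DOI and PubMed links are added only when the identifiers exist.
    if doi:
        links.append(("CrossRef", "https://doi.org/" + quote(doi)))
    if pmid:
        links.append(("PubMed", "https://pubmed.ncbi.nlm.nih.gov/" + str(pmid) + "/"))
    return links


# Hypothetical citation, for illustration only.
for label, url in reference_links("Doe J. An example article. 2015",
                                  doi="10.9999/example.123"):
    print(label, url)
```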

This set of links almost gets the users what they need, but not quite. The Google link does not reveal before clicking whether or not it is going to be successful at finding a shared full text. Not uncommonly, clicking on that link takes the user on an unstructured web search in the hope of finding a version of the article that is cited. One might think that someone could build a link that used Google Scholar under the covers and could then reveal its likely success. Unfortunately, however, as I understand it, Google Scholar does not provide a suitable application programming interface (API) for this and even detects and prevents ‘screen-scraping’ by which someone could build such a feature. Perhaps Google should be pressured to provide such an API. The CrossRef link will take the user to the article of record but not to other versions of the article. Another interesting possibility is that CrossRef could build a link that would bifurcate between shared versions of an article and the article of record. As CrossRef serves publishers, they could build such a link so that it provides traffic information back to the publisher of the article of record of where and how many users went to a shared version of the article. This might be useful marketing information for the publisher, but there needs to be a willingness by publishers who are members of CrossRef to allow them to put these needs of the reader front and centre.

SOME PIECES OF WHAT IS NEEDED

So, what would it take to build what I would call an ‘Open Web Smart Link’, which would facilitate a user getting to a shared version of an article via a clearly labelled link so that the user knows what to expect? Librarians have been working on things like this for many years (Sugita et al., 2007), but I am not aware of any publishers adopting similar linking technologies.

I have recently become aware of two projects that are developed (or will be developed) with open source code. They show that some of the thorniest parts of this problem can be solved. And each, in its own way, is part of what is needed.

Late last year, the OADOI website was launched. It was built by Heather Piwowar and Jason Priem at Impactstory.org (OADOI, 2016). Appending a DOI to ‘http://oadoi.org/’ gives you a link to an open access version of the article, if there is one; otherwise it takes the user to the article-of-record via the DOI. This site is built with open source software, and unlike Google or Google Scholar, it has an API, so its search functionality can be embedded in other tools. It finds content in many, although not all, of the columns in the Open Content Discovery Grid (Fig. 1). It is being actively developed, so it will continue to get broader and broader reach.
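
As a minimal sketch of how these pieces might combine into a clearly labelled link, consider the following. Only the ‘http://oadoi.org/’ prefix comes from the description above. The `find_open_version` lookup is a hypothetical stand-in for whatever service reports an open copy, it is not the oaDOI API, and the labels and identifiers are invented for illustration.

```python
OADOI_PREFIX = "http://oadoi.org/"  # prefix as given in the text above


def oadoi_link(doi):
    """Build an oaDOI link by appending the DOI to the prefix."""
    return OADOI_PREFIX + doi.strip()


def labelled_link(doi, find_open_version):
    """Return a (label, url) pair that tells the reader what to expect.

    `find_open_version` is a hypothetical callable that returns the URL
    of an open copy for a DOI, or None when it knows of none.
    """
    open_url = find_open_version(doi)
    if open_url:
        return ("Free full text", open_url)
    return ("Article of record (may require access)", "https://doi.org/" + doi)


# Hypothetical stand-in for a real lookup: one DOI with a known open copy.
known_open_copies = {
    "10.9999/example.open": "https://repository.example.edu/123.pdf",
}

print(oadoi_link("10.9999/example.open"))
print(labelled_link("10.9999/example.open", known_open_copies.get))
print(labelled_link("10.9999/example.closed", known_open_copies.get))
```

The point of the second function is that the label is decided before the reader clicks, so the link reveals its likely success up front.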

The other is an open access project within the Wikipedia community called ‘signalling OA-ness’ (Wikipedia:WikiProject, 2016). This project currently focuses on providing a ‘tag-suite’ by which editors of Wikipedia articles can give indications of the relative open-ness of a cited source. It does not currently serve as a link to shared content, but it does have a very compelling one-minute video showing the user's experience of reading an article with citations and what it would be like if accessibility options were revealed up front (Wikimedia.org, 2016). It is easy to imagine the reader in this video being delighted by a link that was both revealing of its likely success and capable of finding an open version of the cited source (see video available at: https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Open_Access/Signalling_OA-ness).

In preparation for writing this article, I contacted Herbert Van de Sompel and asked his opinion about whether or not it is technically feasible for someone to build a link like I have described. His reply was affirmative (while pointing out where some of the challenges lie). In particular, he points out that for this to work smoothly, publishers, repositories, and any of the locations designated by columns in the Open Content Discovery Grid would need to implement the NISO Standard for the ResourceSync Framework (ResourceSync Framework, 2014). This would facilitate the automatic updating of the links upon changes to any of the versions.

FINAL THOUGHTS

Anyone in the scholarly publishing supply chain who presents information to users needs to recognize that their most important constituencies are readers and authors. Although these readers and authors are not yet working in a fully ‘open world’, they are all now working in a world that includes both ‘open’ and pay-walled content simultaneously. Facilitating users’ access to shared versions of cited sources is not just an advantage to the reading experience. It can also be argued that making an article more useful contributes positively to the citation benefit that the article gains from being openly shared.

Just imagine two cases, each with a reader who is in an under-served context (away from his/her host institution or in an under-served institution). In the first case, a newly published article contains no effective linking to shared content. In the other, links are provided that keep the reading of this article what some user-experience professionals call ‘in the flow’. Readers are not interrupted in their thought processes by having to stop and write down a task to be taken up later, ‘check out this cited source’, thereby delaying their full appreciation of this new article's argument. Across all the readers who are going to be reading this article, which situation maximizes the chances that this new article will be fully understood and thereby be more likely to be cited in future articles? This is what is called the ‘citation benefit of open’, and such citation benefits accrue not just to the article that is easier to read but to the author of the article, to the journal it appears in, and to the publisher of the journal as well.

Hopefully, this Open Content Discovery Grid can spawn discussions that will lead to recommended practices or whole new innovations on how scholars absorb content in a world that is increasingly open.

Biography

    John Dove