Altmetrics as a means of assessing scholarly output

Career progression for scientists involves an assessment of their contribution to their field and a prediction of their future potential. Traditional measures, such as the impact factor of the journal that a researcher publishes in, may not be an appropriate or accurate means of assessing the overall output of an individual. The development of altmetrics offers the potential for fuller assessments of a researcher's output based on both their traditional and non‐traditional scholarly outputs. New tools should make it easier to include non‐traditional outputs such as data, software and contributions to peer review in the evaluation of early‐ and mid‐career researchers.

Academic researchers work on highly specialized projects which are, by nature, at the limits of current understanding. When it comes to evaluating the quality of a researcher's work for promotion or other purposes, the esoteric nature of the work being assessed means that a proxy measure for the quality of the research is often used. The number of times a particular work has been cited has traditionally been used to measure the quality of a researcher's output relative to others in the same field. However, an article's citation rate is no longer the only metric available to assess scholarly impact.
In addition, the formally recognized outputs of scholarly research are changing. The peer-reviewed article is no longer the sole measure by which a researcher's productivity can be assessed. The generation of scientific data, the coding of useful software, and contributions to the peer-review process are increasingly recognized as being vital to scientific progress.
In this paper we discuss the changing nature of accreditation for scholarly work. We begin with a brief overview of the existing and emerging metrics for assessing the impact of the traditional scholarly output, the article. We then provide an overview of the novel outputs that are increasingly being formally recognized and discuss how the emerging impact metrics might be applied to them. We end by discussing how these metrics and research outputs might feed into current assessment processes.

Citation-dependent measures of impact
A journal's impact factor, as defined and calculated by Thomson Reuters' Journal Citation Reports (JCR; thomsonreuters.com/journalcitation-reports), is the mean number of citations received in a given year by the articles published in that journal in the preceding two years, and is extrapolated to indicate the likely quality of the research published by that journal in subsequent years. Journal impact factors are often used as a proxy for forecasting the quality and impact of future work that will be produced by the scientists who have previously published in these journals. This forecasting of a researcher's future productivity is used during hiring and promotion processes, as well as for other activities such as evaluating funding proposals. Thus the impact factor of the journal in which a paper is published can have a bearing on the career prospects of the authors of that paper. However, the use of journal impact factors for researcher evaluation is not ideal.
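For concreteness, the two-year impact factor described above can be written as follows; this is a sketch of the standard JCR-style calculation, with our own symbols:

```latex
% Two-year impact factor of a journal for year y:
% citations received in year y to items published in the two preceding
% years, divided by the number of citable items published in those years.
\[
  \mathrm{IF}_{y} \;=\; \frac{C_{y}(y-1) + C_{y}(y-2)}{N_{y-1} + N_{y-2}}
\]
```

Here \(C_{y}(x)\) denotes the citations received in year \(y\) to items the journal published in year \(x\), and \(N_{x}\) the number of citable items published in year \(x\). Because this is a simple mean, a handful of very highly cited articles can dominate the numerator, which is the skew discussed below.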
As impact factors depend on an average, extreme deviations from the mean can significantly skew the resultant impact factor. Because of the way impact factors are calculated (using article citation counts), the citation counts of articles published in a particular journal have been shown to correlate with the journal impact factor (Hansen and Henriksen 1997). However, a lack of correlation between the impact of an individual research paper and the impact factor of the journal in which it was published has also been well documented (Seglen 1997, Hecht et al. 1998, Rostami-Hodjegan and Tucker 2001, Tort et al. 2012, Casadevall and Fang 2014). Another problem with the use of the impact factor for judging research impact is that there is often a time lag before citations to an article begin to appear, with the maximum number of citations per year usually appearing 3-7 years after publication (Hansen and Henriksen 1997). Thus these later citations will not contribute to a journal's impact factor. The Eigenfactor Score and the Article Influence Score have been developed by the Eigenfactor Project as alternative (and openly available) measures to the JCR impact factors (Bergstrom et al. 2008). Although based on JCR data, the Article Influence Score for a journal is a normalized measure of the average influence of each of its articles over the first 5 years after publication and so addresses the citation time lag that can occur.
The use of the journal impact factor for research evaluation also relies on an underlying assumption of the citation count having a direct correlation to quality. The assumption that papers published in highly selective journals with high impact factors will be of higher quality and correspondingly have higher rates of citation is not necessarily correct. It is possible that highly cited papers are being cited due to their findings being questioned, as shown by Aksnes (2006).
In recognition of the limited usefulness of the impact factor, researcher-centric metrics have been developed in an attempt to provide more direct measures of the quality of an individual researcher's output. The h-index was developed to measure both the productivity and impact of an individual, based on the number of articles that they have published and the number of citations those publications have received (Hirsch 2005). Whether the h-index is better or worse than using journal impact factors for predicting a researcher's future productivity is unclear (Lehmann et al. 2006, Hirsch 2007, Penner et al. 2013).
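To make the definition above concrete, a minimal sketch of the h-index calculation is shown below; the function name and example citation counts are ours, purely for illustration.

```python
def h_index(citation_counts):
    """Return the h-index: the largest h such that at least h of the
    researcher's papers have received at least h citations each
    (Hirsch 2005)."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(counts, start=1):
        if citations >= rank:
            h = rank  # this paper still has enough citations for its rank
        else:
            break
    return h


# Five hypothetical papers with these citation counts give an h-index of 3:
# three papers have at least three citations each.
print(h_index([10, 8, 5, 2, 1]))  # -> 3
```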
Several modified versions of the h-index have also been developed, e.g. Google Scholar's i10-index. A detailed discussion of these variations is outside the scope of this paper. However, a drawback of any metric relying on article citations is the length of time it takes for the citations to accumulate, with the effect that mid- to late-career scientists will have a higher measured impact than early-career researchers. As it is the early-career researchers who are likely to be undergoing evaluation for promotion, earlier measures of research impact are required.

Citation-independent measures of impact
The emergence of the Internet in the 1990s, and the associated development of novel methods of disseminating research articles, allowed the development of citation-independent measures of scholarly impact. In addition, the availability of usage data at the article level supported the development of metrics measuring the impact of individual articles as opposed to journal averages (Neylon and Wu 2009). PLOS has collated and displayed several article-level metrics (ALMs) for its published articles since 2009 (Fenner 2013). PLOS ALMs report usage statistics, comments on articles in forums and on social media, social bookmarking statistics and expert recommendations, alongside citations (Fenner 2013).
The term 'altmetrics' was coined in 2010 as a grouping term for these alternatives to citation-dependent and journal-level metrics (Priem et al. 2012), but is sometimes also used as short-hand for article-level metrics.

In this paper we use the 'alternative metrics' (non-citation based) definition of altmetrics, and discuss the application of altmetrics to articles as well as to other outputs of scholarly research. In general terms, these alternative measures seek to judge the impact of a research output by examining the number of times that it is viewed, downloaded, saved, discussed by the scientific community, and recommended to others. The most commonly implemented altmetrics for online scholarly publications are simple counts of the numbers of views and downloads (Figure 1). In order to standardize the way in which online usage is presented, some journals may adhere to standards such as COUNTER (www.projectcounter.org) when assessing the number of views and downloads, so as to exclude any possible 'gaming' activity due to robots. Additional altmetrics can include the number of times an article has been shared via email or social media platforms such as Twitter, mentioned in blogs, or added to citation management tools such as CiteULike (www.citeulike.org) or Mendeley (www.mendeley.com). These are now well-established article-level metrics that have been implemented by many publishers and can be provided by companies such as Altmetric (www.altmetric.com) or generated in-house.
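As an illustrative sketch of how such article-level counts might be gathered programmatically, the snippet below queries Altmetric's public per-DOI endpoint; the URL pattern and field names are our assumptions about that API and should be verified against the provider's documentation, and the DOI shown is a placeholder.

```python
import json
import urllib.request


def fetch_article_altmetrics(doi):
    """Fetch a few attention counts for a single DOI.

    Assumes Altmetric's public per-DOI endpoint and the field names
    below; both should be checked against the current documentation.
    """
    url = f"https://api.altmetric.com/v1/doi/{doi}"
    with urllib.request.urlopen(url) as response:
        record = json.load(response)
    return {
        "tweets": record.get("cited_by_tweeters_count", 0),
        "blog_mentions": record.get("cited_by_feeds_count", 0),
        "mendeley_readers": (record.get("readers") or {}).get("mendeley", 0),
        "altmetric_score": record.get("score"),
    }


# Placeholder DOI for illustration only:
# print(fetch_article_altmetrics("10.1371/journal.pone.0000000"))
```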
Key advantages of altmetrics include the fact that they begin to accumulate as soon as the article is published, and that they are independent of the altmetrics being generated for any other article published in the same journal at the same time. Additionally, several studies have shown a significant correlation between altmetrics and traditional measures of impact (Priem et al. 2011, Eysenbach 2012, Shema et al. 2013) and have demonstrated that altmetrics may outperform traditional measures of scholarly impact (Liu et al. 2013). However, the use of article-level altmetrics for scholarly evaluation and accreditation is not widespread.

Post-publication commentary as an evaluation tool
Post-publication comments can also contribute to researcher evaluation. Researchers could be evaluated both in terms of their contribution to post-publication discussions of others' work and in terms of evaluations of their own work by their peers. We consider the latter case now, and discuss the former case in the next section.
There are several post-publication selection and discussion platforms currently available, including F1000Prime (www.F1000.com/prime), PubMed Commons (www.ncbi.nlm.nih.gov/pubmedcommons), PubPeer (www.pubpeer.com), and Publons (www.publons.com). Studies have shown a correlation between articles that are selected for recommendation on platforms such as F1000Prime and the number of citations they receive (Waltman and Costas 2013). It has also been stated that expert reviews could outperform bibliometric indicators such as citations for some articles in the evaluation of research (Allen et al. 2009). Thus post-publication assessment and discussion could prove useful when a relatively fast evaluation of the importance and quality of an article is required.

Non-traditional scholarly outputs
Traditionally, peer-reviewed articles have been the main research output that has counted towards scholarly accreditation. However, the generation of research data, software to help analyse the data, and peer-review contributions are all important outputs of academic activity. For example, in order to gain the maximum benefit from large-scale sequencing projects such as the human genome project, specialized software to annotate, curate, and understand the data had to be developed (Lander et al. 2001). The development of relevant software has been essential in order to gain the full benefit from the time, effort and financial cost of undertaking large-scale research studies.
However, the impact of this alternative research output is not always adequately captured by the number of citations to the relevant papers. For example, to date, the paper announcing the draft of the human genome (Lander et al. 2001) has gained 16,528 citations (Google Scholar, accessed 30 July 2014), whereas the two papers describing the PHRED software package, crucial to making sense of the sequence, have only 4,430 (Ewing and Green 1998) and 4,950 citations (Google Scholar, accessed 30 July 2014).
Data have always been a cornerstone of scientific endeavour, but as the scientific questions being asked become increasingly complex, so the generation of datasets to answer those questions becomes more complicated. It is often assumed that data-generating scientists are the same individuals who will analyse the dataset, but this may not be the case due to the increasingly specialized skills required to both produce and analyse the data. Scientists analysing data produced by others are highly dependent on the skills of the data-producing researchers in ensuring the data were generated using appropriate methods and controls, and that any equipment used was calibrated correctly. Hence both data and software are valuable and important research outputs.
As these different types of scholarly outputs become increasingly important for future research, finding ways in which they can legitimately feed into the scholarly accreditation process is necessary to ensure retention and recognition of these skills within academia.
Currently the most mature method of incorporating data generation into the scholarly accreditation process is via the publication of a data paper. Data papers allow the data generators to provide context for their dataset, stating why and how the data were generated. This then feeds into the current accreditation framework via the production of a peer-reviewed article, thus providing a way for researchers to gain formal recognition for this non-traditional scholarly output.
Likewise for software, a paper describing the application and use of the code for a scientific problem is currently the best way for this work to be formally recognized. However, this does rely on the scholarly community ensuring they cite the appropriate papers when utilizing data and software for their own research. As discussed above in the example of the PHRED software, the citation rates of such papers often do not fully represent the usefulness and importance of these alternative academic outputs to future research.
One significant contribution to scientific research that is often overlooked by traditional methods of researcher evaluation is participation in the peer-review process. Peer review is a skilled task and is vital for the evaluation of the output of the scientific community. With traditional closed peer-review systems, a researcher's contribution might only be noted in a 'Thank you to reviewers' notice published annually in a journal. As more journals adopt open peer-review models, it is now possible for researchers to include this important contribution as part of their scientific output.

Metrics for non-traditional scholarly outputs
Implementing Digital Object Identifiers (DOIs) for research data (Figure 2a) adds weight to the notion of data as a legitimate research object, as DOIs enable data to be traceably cited. F1000Research additionally mints DOIs for source code that has been shared as part of an academic publication. Readers of F1000Research papers are alerted to the data and software contained within the context of an article with a data and software availability section (Figure 2b). Citation-based indices such as the Thomson Reuters Data Citation Index (wokinfo.com/products_tools/multidisciplinary/dci) can track data and software citations as part of their normal tracking of citations (Figure 2c).
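As a small illustration of what DOI-based citation of data enables in practice, the sketch below retrieves a formatted citation for a dataset DOI using DOI content negotiation via doi.org; the exact citation styles supported depend on the registration agency (e.g. DataCite), and the DOI shown is a placeholder.

```python
import urllib.request


def format_data_citation(doi, style="apa"):
    """Fetch a formatted citation string for a (dataset) DOI.

    Uses DOI content negotiation on doi.org; 'style' is a Citation
    Style Language style name, support for which depends on the DOI's
    registration agency.
    """
    request = urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8").strip()


# Placeholder dataset DOI for illustration only:
# print(format_data_citation("10.6084/m9.figshare.0000000"))
```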
Altmetrics-based indicators such as downloads, views, and shares have also been applied to datasets. Repositories for data, such as figshare (www.figshare.com), already provide information on views, shares, and citations of datasets. It has been debated whether these metrics, originally developed to assess article impact, are best suited to measure data impact (Wouters and Costas 2012; Costas et al. 2013), though it is also recognized that these are currently the best metrics that are widely available to capture data usage (Ingwersen and Chavan 2011).
Within software-specific repositories, e.g. GitHub (www.github.com), software-specific altmetrics include the number of times a piece of code has been forked (copied) and the number of pull requests (when a useful update is merged back into the original code). GitHub also allows users to follow each other, the assumption being that you will follow people producing code that you find useful, and so the larger the number of followers, the more useful the work you are sharing.
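A minimal sketch of how such software-specific indicators could be collected is shown below, assuming the public GitHub REST API repository endpoint and the field names used; the repository named in the example is hypothetical.

```python
import json
import urllib.request


def repo_software_metrics(owner, repo):
    """Collect simple software 'altmetrics' for one repository from the
    public GitHub REST API: forks, stars, and watchers."""
    request = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return {
        "forks": data.get("forks_count"),
        "stars": data.get("stargazers_count"),
        "watchers": data.get("subscribers_count"),
    }


# Hypothetical example repository:
# print(repo_software_metrics("example-lab", "sequence-tools"))
```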
Basic altmetrics, such as views, downloads, and shares, can also be applied to peer reviews that are open. If the reviews are also attributed, it is possible for the quality of a specific researcher's peer-reviewing record to be assessed. The assignment of DOIs to open peer reviews, as already implemented by journals such as F1000Research and eLife, allows researchers to include these in a list of publications and also enables the citation of referee reports by others.
While adding altmetric information to a single article may add value and demonstrate impact, researchers and academic institutes need to aggregate this information so that the impact of all their activities can be assessed.
Several services are now available for researchers to 'harvest' their outputs and generate a personal profile. If the researcher has an ORCID profile (www.orcid.org), all their published articles can be imported into a profile page associated with their unique ID (often facilitated by publishers or bibliographic databases such as Scopus, www.scopus.com, or Web of Science, www.wokinfo.com). The addition of DOIs to peer reviews, data, and software will enable these non-traditional outputs to feed into individual ORCID profiles and thus be recognized as a part of their formal scholarly output. A link to the researcher's ORCID page can be added to a curriculum vitae, job application, or grant application. Similar profiles can be generated by the use of Google Scholar, ImpactStory, and ResearchGate, and these sites also include aggregated information on article views, downloads, and citations. An additional feature of ImpactStory is the ability to link to repositories of non-traditional outputs that have been deposited in GitHub, figshare, and SlideShare. Examples of one of the author's profiles in these different aggregators of scholarly output can be found at ResearchGate (www.researchgate.net/profile/Varsha_Khodiyar), ORCID (orcid.org/0000-0002-2743-6918), and ImpactStory (impactstory.org/VarshaKhodiyar).
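As a sketch of how such profile aggregation might be done programmatically, the snippet below lists the works attached to a public ORCID record; the endpoint and response structure are our assumptions about ORCID's public API (an access token may be required depending on current ORCID policy), and the example uses the ORCID iD cited above.

```python
import json
import urllib.request


def list_public_works(orcid_id):
    """List the titles of works attached to a public ORCID record.

    Assumes ORCID's public API works endpoint and the JSON structure
    shown; check the current API documentation (a token may be needed).
    """
    request = urllib.request.Request(
        f"https://pub.orcid.org/v3.0/{orcid_id}/works",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    titles = []
    for group in data.get("group", []):
        summary = (group.get("work-summary") or [{}])[0]
        title = ((summary.get("title") or {}).get("title") or {}).get("value")
        if title:
            titles.append(title)
    return titles


# Example using the ORCID iD mentioned in this article:
# print(list_public_works("0000-0002-2743-6918"))
```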

Conclusions
It is possible for researchers to spend their time being extremely productive and positively impacting progress in their field without producing formally recognized outputs. As early-career researchers are most likely to be producing novel research output, their research impact could be grossly underestimated exactly during the time that evaluations are being carried out for career progression purposes. The majority of grant and tenure processes are set up to recognize citation-dependent metrics and, in some cases, altmetrics. However, it will take time for data- and software-specific metrics to be developed, widely implemented, and finally recognized by scholarly accreditation bodies.
A specific issue with data publication is the anxiety that publishing data 'too early' could be detrimental to a researcher's career progression. Researchers, especially those subject to the 'publish or perish' ethos, may worry about having their key findings scooped or not being able to publish all the papers that may be possible based on a large dataset. A solution to this is for academic institutions to implement formal recognition of FAIR data sharing during tenure and promotion processes (Nokes et al. 2013). While researchers remain reluctant to share data because doing so might hamper their individual promotion prospects (Harley 2013), scientific progress will continue to suffer relative to what could be achieved if data were shared openly and earlier.
For researchers who currently produce non-traditional research outputs or wish to make their data available to the wider scientific community, publishing their data and software formally as a scholarly article is currently the best way to obtain formal academic recognition. The acceptance of anything other than peer-reviewed publications for decisions on promotion and funding (at both individual and institutional levels) is still controversial (Nokes et al. 2013, Sanberg et al. 2014). However, since traditional measures of impact are generally recognized as being not very informative in predicting the quality of a researcher's future output, the formal consideration of more immediate and alternative measures such as altmetrics that cover the full range of scholarly outputs may become more widespread.
A possible barrier to adoption of these alternative measures may be a lack of standardization across the platforms that host the various research outputs. A project funded by the Alfred P. Sloan Foundation and run by NISO is currently investigating best practices and standards for the new wave of metrics and is aiming to incorporate datasets, software, and visualizations (NISO 2013). Once a standard has been adopted, it may be easier to persuade institutions to recognize altmetrics as a valid tool for researcher evaluation and accreditation.