Measuring the Measurable: A Commentary on Impact Factor


  • The impact factor for year x, reported in June of year x + 1, is calculated as the total number of citations in year x to articles published in the journal in years x – 1 and x – 2, divided by the number of “substantive articles” or “citable items” published in the journal in years x – 1 and x – 2. “Substantive articles” include “articles,” “reviews,” and “proceedings papers”[2]; editorials/commentaries, correspondence, essays, and several other types of papers are not included in the denominator. Academic Emergency Medicine (AEM)'s 2011 impact factor, reported in June 2012, is calculated as follows: (486 citations to 2009 articles + 305 citations to 2010 articles)/(194 articles published in 2009 + 231 articles published in 2010) = 791/425 = 1.861.

  • A related article appears on page 1248.

Measure what is measurable, and make measurable what is not so.

–Galileo Galilei (1564–1642)

We (that is, just about everyone) have taken Galileo's dictum to heart. A TV station's importance is measured by viewers in “Sweeps Month,” notwithstanding that the programming that month consists of specials and sensational news features. A medical student measures the value of a subject by the number of questions on the board exams, regardless of what the faculty thinks is important. Sports statisticians produce innumerable performance metrics (e.g., at least 82 separate and two summary football performance measures[1]). Clinicians measure patients' blood and urine to characterize their health and have summary statistics such as APACHE II to describe illness severity. Journals are ranked by impact factor, a performance metric that reports how often the papers published in a journal are cited in other papers. However, impact factor has been a very controversial topic in the scientific publishing industry, and a source of angst for journal editors, for years. A MEDLINE search for the string impact factor in the title returns well over 500 articles, mostly editorials or commentaries, with surprisingly few actual studies of impact factor. In this issue of Academic Emergency Medicine we publish such a study in which Reynolds and colleagues[3] report data on the impact factors of emergency medicine (EM) journals taken as a whole and how the impact factors of journals in our specialty compare to those in other specialties. The findings are discouraging: despite a general increase in the impact factor of EM journals, we rank no better than 24th of the 31 specialty categories.

The concept of impact factor was first proposed in 1955[4] (although not really developed until the early 1960s[5]), and the Journal Citation Reports currently provide impact factors on well over 10,000 journals. Given that it is well established and widespread, why is impact factor so controversial?

First, there are a number of purely mathematical considerations. Since the impact factor represents the mean number of citations per published article, it can be elevated by a very small number of very highly cited articles. In other words, since the mean as a measure of central tendency is subject to skew from outliers, even one very highly cited paper can significantly boost a journal's impact factor, as illustrated by Reynolds et al. in their discussion.[3] Also, miscategorization of a nonsubstantive article as substantive increases the denominator, thus decreasing the impact factor. (A number of AEM's humanistic “Reflections” papers were miscategorized into the denominator in 2010, lowering our impact factor by perhaps as much as 0.15.)
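These mathematical points can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not how Journal Citation Reports performs its calculation: the function name `impact_factor` is ours, the AEM counts come from the note at the top of this commentary, and both the 30 miscounted papers and the 100-article journal are hypothetical numbers chosen for the example.

```python
from statistics import mean, median


def impact_factor(citations, citable_items):
    """Citations to the two prior years' articles, divided by the
    citable items published in those two years."""
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations / citable_items


# Figures from the note at the top of this commentary (AEM, 2011).
aem_citations = 486 + 305   # citations to 2009 and 2010 articles
aem_items = 194 + 231       # citable items published in 2009 and 2010
aem_if = impact_factor(aem_citations, aem_items)

# Hypothetical: 30 nonsubstantive papers wrongly counted as citable items.
HYPOTHETICAL_MISCOUNTED = 30
deflated_if = impact_factor(aem_citations, aem_items + HYPOTHETICAL_MISCOUNTED)

# Hypothetical journal: 99 articles cited twice each, plus one cited 300 times.
per_article = [2] * 99 + [300]
mean_cites = mean(per_article)      # what an impact-factor-style mean reports
median_cites = median(per_article)  # what a typical article actually receives

print(f"AEM 2011 impact factor: {aem_if:.3f}")
print(f"With {HYPOTHETICAL_MISCOUNTED} miscounted items: {deflated_if:.3f}")
print(f"Mean citations: {mean_cites:.2f}, median: {median_cites}")
```

With these inputs the AEM figure reproduces the 1.861 given in the note, the hypothetical miscount pulls it down to about 1.738, and the single highly cited paper lifts the hypothetical journal's mean to 4.98 while its median stays at 2.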

Second, its use in academia has spread far beyond its stated purpose. While intended to provide a performance measure for journals, impact factor often is used to assess authors, by examining the impact factors of the journals in which they publish their papers. However, as noted by the editors of PLoS Medicine, “because a journal's impact factor is derived from citations to all articles in a journal, this number cannot tell us anything about the quality of any specific research article in that journal, nor of the quality of the work of any specific author.”[6] Using impact factor, nevertheless, is a common practice among appointment, promotion, and tenure committees and grant funding agencies. It should be noted that Garfield and others argue that it would be preferable for these committees to examine the actual citation counts of the articles written by the authors being assessed,[5] something that is becoming quite easy thanks to online search systems and indexing. Impact factor (or a slight modification) has even been used to “rank” institutions[7]—a purpose for which it was never intended.

Third, there is clear evidence of “gaming the system” emerging in recent years as the perceived academic importance of the impact factor has grown. While obvious examples such as that which occurred at Folia Phoniatrica et Logopaedica (described by Reynolds et al.[3]) are rare, there are other, more subtle examples. Many of these are probably quite legitimate and benign: a journal can improve its impact factor by focusing on categories of submissions that have been highly cited in the past,[8] by crafting article titles to maximize electronic searchability,[9] and by clustering articles on the same topic in one issue.[10] Journals seeking to improve their impact factor can publish methodology papers, which then are cited by papers reporting on studies using those methods. They can publish editorials and letters to the editor, since each editorial and letter cites the source article being discussed. (Simply by writing this commentary, we have increased our own impact factor a bit since we cite the Reynolds et al. article.) Journals can also favor review articles, which are cited about twice as often as other articles,[5] although these present a bit of a paradox. Generally, a nonsystematic review does not add new knowledge and is typically counted by appointment, promotion, and tenure committees as less important than an original research article. However, despite being a work considered of lesser impact to the body of research and to professional development, the review article may increase the impact factor by being cited frequently itself and by citing many articles from the parent journal.

Other methods are, at best, disagreeable: we know from firsthand experience as peer reviewers that some journals ask reviewers to suggest other articles published in that journal in the past 2 years for inclusion in “the references” list. While a complete and up-to-date examination of the extant literature is an important part of the “discussion” section of any research study, the wording of these requests seems aimed at increasing citations of papers to improve the journal's impact factor. Of course, a smart author may initially cite a number of papers from the target journal, hoping that the editor will look favorably on this—perhaps another unintended consequence of the impact factor system.

Fourth is the potential for citation. EM has relatively few investigators (hopefully, the newly created NIH Office of Emergency Care Research will increase these numbers) and many questions. This demographic, together with editorial policies that discourage articles that add little, reduces the probability of citation. This is probably less of an issue for basic science articles, where the nature of the enterprise is small incremental advances. As Reynolds and colleagues discuss, specialties such as EM that overlap with many other specialties are likely to have investigators publishing in the journals of those other specialties, thus reducing the number of citations in the journals of the “home” specialty.

Finally, and perhaps most important, we ask what is meant by “impact.” To us, it is the value to the readership. Will the results improve patient care, enhance educational offerings and approaches, or increase understanding of fundamental concepts? How often articles are cited does not begin to assess these crucial reasons for the very existence of scientific journals. None of the citation-based metrics (including newer measures such as Eigenfactor and h-index) address the real utility of an article to the reader. A study that is truly practice-changing may not engender additional research and thus may not be cited heavily (except in review articles), despite its usefulness to the reader. Our primary interest as editors, in selecting articles for publication, is how useful the ideas presented are to clinical practice, education, and research, not how likely they are to result in citations down the road. What is needed in the area of journal performance metrics is a set of measures of utility to the reader, metrics that will be much more complicated to develop than mathematically based citation measures. About four decades ago, the late Alvan Feinstein used the term “Overmensuration” to describe a situation in which some convenient and measurable metric is used when what the researcher is actually interested in is difficult or impossible to measure (personal communication). One example used back then was measuring death when the procedure or treatment was designed to improve quality of life. A JAMA commentary speaks to this: “The impact factor has gone from being a measure of a journal's citation influence in the broader literature to a surrogate that assesses the scholarly value of work published in that journal. These misappropriated metrics have been used to assess individual researchers, institutions, and departments.”[2]

It seems clear, then, that impact factor has grown from its humble beginnings as a simple citation metric to something that even its creator could not have predicted. Garfield himself, presenting a plenary session to the International Congress on Peer Review and Biomedical Publication in 2005, 50 years after first proposing the idea, considered “Citation sanity and insanity: the obsession and paranoia of citations and impact factors” and “Uses and abuses of impact factors” as titles for the session before settling on “The agony and the ecstasy: the history and meaning of the journal impact factor.”[5] We suppose he could have called his session “Caveat Lector.”