The Distribution of the H-index Among Academic Emergency Physicians in the United States

  • Presented at the Society for Academic Emergency Medicine Western Regional Research Forum, Sonoma, CA, March 2010; and the Society for Academic Emergency Medicine Annual Meeting, Phoenix, AZ, June 2010.

  • There was no financial support provided for this work. The authors have no conflicts of interest to disclose.

Abstract

Background

Hirsch's h-index (h) attempts to measure the combined academic impact and productivity of a scientist by counting an author's publications, ranked in descending order by number of citations, until a paper's rank equals its number of citations. This approach provides a single natural number that reflects both the number of publications and the number of citations per publication. The h-index was first described in physics and was shown to be highly predictive of continued academic activity, including recognized markers of scientific excellence such as membership in the National Academy of Sciences and Nobel laureate status. Citation rates, research environments, and years of experience all affect h, making comparisons appropriate only between scientists working in the same field for a similar period of time. The authors are unaware of any report describing the distribution of h among academic emergency physicians (AEPs).

Objectives

The objective was to describe the distribution of h for AEPs and to determine whether Hirsch's demonstration of the h-index as a predictor of continued scholarly activity among physicists would also apply to AEPs.

Methods

Academic EPs were identified from lists provided on allopathic U.S. emergency medicine (EM) residency program websites. “Harzing's Publish or Perish,” a free program available on the Web that queries Google Scholar, was used to calculate h for each AEP. Agreement between raters was analyzed on a subset of 100 EPs. An analysis of the 20 EPs with the top h-indices was performed to characterize the entire body of their scholarly work, and their h-indices were calculated at 12 and 24 years into their careers.

Results

A total of 4,744 AEPs from 136 programs were evaluated. Nine programs did not publicly list the faculty at their institutions and were excluded. A linear weighted kappa was used to measure rater concordance, with agreement of 98.3% and κ = 0.92 (95% confidence interval [CI] = 0.861 to 0.957). The majority of AEPs had h-indices of zero or one (59%), 85% had h-indices less than six, 95% less than 13, and 99% less than 24. Ten percent of AEPs had h/(years in publication) of 0.5 or greater. For the top 20 EPs, the mean (± standard deviation [±SD]) h-index increased from 7.6 (±4.6) to 23.5 (±9.4) between years 12 and 24. The mean (±SD) increase in h-index was 15.8 (±7.6).

Conclusions

The h-index can be used to characterize the academic productivity of AEPs. An h/year of 0.5 or greater is characteristic of the most productive EPs and represents only 10% of all AEPs. The 12-year h-index of top-performing EPs was strongly related to their future academic productivity. The distribution of h among EPs may provide a means for individual investigators and academic leaders to evaluate performance and identify EPs with future success in EM research.

Scholarly activity has long been accepted as one of the means of building a reputation in academic medicine, alongside teaching, excellence in clinical practice, and service at the university, national, and international levels. Evaluating scholarly performance is required for decisions about who to hire, who to promote, or who needs additional mentorship. Even among those with well-established track records of publication, there is an ongoing question of relevance: is what they have written of value or use to the academic community?

Simple frequency counts of publications cannot address the issue of scholarly relevance and the effect of an individual's work on his or her field of study. Measures such as the impact factor, a means of ranking journals by citation frequency to determine their relative importance, have been introduced in an attempt to measure the impact of scholarly work.[1] However, the impact factor is a measure of the relative quality of academic journals and not of individual scholars who may be published in them. It is also possible that a clinician-scientist working in a relatively narrow area of study could be producing significant numbers of works of scientific importance. However, such works might appear in more specialized journals with lower impact factors, given the smaller audience inherent in such specialized fields, and thus may have fewer citations.

A potentially useful measure of academic impact would be a measure not only of the number of articles produced by an individual or the total number of citations, but a measure that combines publications and citations to estimate the effect of those articles within an individual's field of endeavor. An ideal metric should be sufficiently robust that it is not easily perturbed by a single, widely read article from a researcher who makes no further contributions to the field (although one or two seminal papers with large numbers of citations is itself a useful measure of academic impact), nor should it be susceptible to inflation by large numbers of publications that are rarely referenced or referenced only by self-citation. Such a metric would be easy to calculate so that comparisons could be made across a given academic field to assist in hiring and promotion decisions, as well as grant funding decisions, and more broadly, to judge the standing of individuals within their profession as a whole.

Hirsch's h-index (h) is a relatively new measure of academic productivity that has been suggested to meet all of the criteria outlined above.[2] The h-index is defined as follows:

A scientist has index h if h of his or her Np papers have at least h citations each, and the other (Np – h) papers have no more than h citations each.[2]

The two independent variables in the h-index (the number of publications and the number of citations for each paper) are intended to make it a stable measure of continued productivity. This is depicted in Figure 1, where articles are ranked from most to least cited on the x axis and the number of citations for a given article is shown on the y axis.

Figure 1.

Calculating the h-index for an individual researcher.

Neither a single, widely acclaimed paper nor a large number of obscure publications will result in a high h-index. An individual's h-index will increase only if the researcher produces large numbers of papers that are also widely cited. Therefore, the larger the h-index, the larger the presumed impact the individual has had in his or her academic discipline. Hirsch also describes the growth rate of the h-index per year (h/year, or m) and empirically derives characteristic values of m for subgroups of prominent scientists. While a high h-index in general identifies individuals producing high-impact research, the h per year may further stratify the most productive subgroup.
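
To make the definition concrete, the following is a minimal Python sketch (illustrative only; the function names and example numbers are ours, not part of the original study) that computes h from a list of per-paper citation counts and derives the h/year rate m from the number of years an author has been publishing:

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first, as in Figure 1
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # this paper's citation count still meets or exceeds its rank
        else:
            break
    return h


def m_rate(citations, years_publishing):
    """Hirsch's m: average growth of the h-index per year of publishing activity."""
    return h_index(citations) / years_publishing


# Hypothetical author with six papers: h = 3, and m = 0.5 after six years.
print(h_index([25, 8, 5, 2, 1, 0]))    # 3
print(m_rate([25, 8, 5, 2, 1, 0], 6))  # 0.5
```

In this invented example, neither the single 25-citation paper nor the trailing, rarely cited papers move h beyond 3, which illustrates the robustness property described above.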

Anne-Wil Harzing has developed a software tool known as “Publish or Perish,” which calculates the h-index and several other numeric descriptors of academic productivity and uses Google Scholar as its database.[3] She notes that Google Scholar has two main advantages. First, it is freely available to the general public. Second, it avoids several limitations she describes in a widely used academic database, Web of Science, including that it is limited to ISI journals, that its citation reference list is limited to citations from ISI-listed journals, and that its cited reference counts for non-ISI journals are directed only toward the first author.[4-6] Google Scholar also does a superior job of eliminating duplicates compared to Web of Science and has better coverage of non-English sources.[7-9]

While the h-index and the h/year (m) are relatively easy to calculate for any individual, they have less value without a context in which to consider relative ranking. For example, in a study of radiation oncology, the mean h-index was 8.5, with a breakpoint of 15 between junior and senior faculty.[10] Yet in a study of urology, the mean h-index for full professors was only 10.7, with associate and assistant professors at 6.9 and 6.2, respectively.[11] It is therefore essential that the h-index distribution for any given specialty be adequately characterized so that accurate and generalizable conclusions can be drawn about an individual's place within the distribution (e.g., is this person on a par with other individuals up for promotion to associate professor in their specialty?).

In this study we characterize the h-index distribution for academic emergency physicians (AEPs) in the United States. We also demonstrate the predictive power of the h-index for top research performers in academic emergency medicine (EM).

Methods

Study Design

This was a cross-sectional design using existing publicly accessible databases. The study protocol was reviewed by the University of Arizona Human Subjects Protection Program and determined to be exempt from review.

Study Setting and Population

The Society for Academic Emergency Medicine maintains a list of all active allopathic EM residency programs. From this list of programs, the corresponding websites were accessed to identify individual faculty members.

Study Protocol

For each faculty member, Harzing's Publish or Perish software was used to generate a preliminary h-index.[12] Harzing's software calculates multiple measures of academic productivity, including the h-index. Manual review was performed iteratively by data collectors to eliminate duplicate and extraneous references. A final h-index was then calculated for each faculty member. This process is outlined in Table 1.

Table 1. Algorithm Used for Calculation of the H-index
Procedure for h-index search:
1. Find program through SAEM website.
2. Visit program's website, find faculty profiles or list of faculty names.
3. Type the full name into the author field and “emergency medicine” into the “all of the words” field of the general citation search.
4. If all papers by this author are published with a middle initial, use the middle initial in addition to the first initial in all searches for this author. Otherwise, only use the first initial.
5. New search: For author type first initial (+/– middle initial) + last name.
6. Check some of the resulting papers to ensure that it is the correct author. Choose a few papers from each field of study as determined by the journal name.
7. Results:
  1. If all results are from the correct author, you are done.
  2. If a small proportion of the results is not from the correct author, individually uncheck those papers.
  3. If a large proportion of the results is not the correct author, modify your search parameters.
8. Modifying search parameters:
  1. If there are a small number of results (<50), find the names of authors who are not the desired author and include those names in the “none of the words” search field. Now repeat step 7.
  2. If there are a large number of results (>50), further modify search parameters.
9. Include year parameters if available on faculty profile or state board of medical examiners website. Start from 8 years before graduation from medical school; end at the present year.
10. Repeat step 7.
11. Type “emergency medicine” into the “all of the words” search field.
12. Repeat step 7.
13. Check faculty profiles and the author's state board of medical examiners to try to find out where the author went to medical school and residency.
14. If author's medical school and residency are available, include the name of each as well as the name of the author's current program into the “any of the words” search field. If the information is not available, skip this step.
15. Individually uncheck undesired papers until you have only papers written by the desired author.
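
The steps in Table 1 describe a manual, interactive procedure carried out through the Publish or Perish interface rather than a programmatic one. Purely to illustrate the record-cleaning idea behind steps 6, 7, and 15, the following Python sketch (hypothetical record fields and function name, not the study's actual tooling) drops duplicate entries and papers whose author list does not contain the target author:

```python
def clean_records(records, target_author):
    """Illustrative post-retrieval cleanup: remove duplicates and papers not by the target author.

    Each record is assumed to be a dict with "title", "year", and "authors" keys;
    this mirrors the manual review in Table 1 but is not the study's software.
    """
    target = target_author.lower()
    seen = set()
    kept = []
    for rec in records:
        key = (rec["title"].strip().lower(), rec.get("year"))
        if key in seen:
            continue                      # duplicate reference, analogous to manual de-duplication
        seen.add(key)
        authors = " ".join(rec.get("authors", [])).lower()
        if target in authors:             # analogous to unchecking papers by other authors
            kept.append(rec)
    return kept
```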

When all h-indices were calculated across all available programs, plots of the h-index distribution and the h-index increment per year (m) were created, as described by Hirsch[2] and Burrell.[13] From these curves the percentiles for key h-indices of interest were calculated.
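
As an illustration of how such percentiles can be read from the distribution (using NumPy and invented example values, not the study data), the share of physicians below selected h-index cut points can be computed directly from the list of individual h-indices:

```python
import numpy as np

def share_below(h_values, cutoff):
    """Fraction of physicians whose h-index is strictly less than the cutoff."""
    return float(np.mean(np.asarray(h_values) < cutoff))

# Invented example values, not the study data:
h_values = [0, 0, 1, 1, 2, 3, 5, 8, 13, 24]
for cut in (2, 6, 13, 24):
    print(cut, share_below(h_values, cut))
```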

A second, exhaustive search was performed for each of the top 20 academic EPs to identify their entire body of scholarly work. Their h-indices were then manually calculated at both 12 and 24 years from the start of their first academic appointment, reproducing the seminal work done by Hirsch, which demonstrated the predictive power of the h-index to identify future research productivity among physicists.[14]

Outcome Measures

The key outcome measures were the h-index for each AEP, and the associated h-index increment per year (m). A subgroup analysis was performed measuring the change in h-index between academic years 12 and 24 for the 20 AEPs with the highest indices in our analysis.[14]

Data Analysis

The h-index was calculated by one author (LM) as described in the introduction. The h-index increment per year was calculated by dividing the h-index by the number of years since the physician's first publication. These values were plotted for each AEP identified, providing an overall distribution of the h-index for academic EM.

To ensure reliability, for 100 of the faculty in the data set, a second reviewer (ASJ) independently calculated the h-index and the two were compared. This group was selected as a stratified sample designed to be representative of the population as a whole. The distribution of this sample is depicted in Figure 2. The interrater agreement for this verification step was analyzed using a weighted kappa (linear weights).[15] The 95% confidence interval (CI) for the weighted kappa was estimated using a bias-corrected bootstrap method (500 replications, N = 100).[16] Interrater agreement was also assessed using the methods outlined by Bland and Altman.[17]

Figure 2.

Distribution of h-indices for the representative, stratified sample of 100 individuals used for interrater reliability assessment.

We used a linear weighted kappa coefficient to measure interrater agreement to account for the graded, ordinal nature of the h-index.[15] The linear weighted kappa gives a greater penalty (or weight) to a greater difference between two independent raters, so that a difference of 1 between two raters is weighted less than a difference of 2, which is weighted less than a difference of 3, and so on. A standard yes/no agreement classification (unweighted kappa) would be overly conservative because it ignores this gradation. Linear weighting was chosen because it has a relatively straightforward interpretation and derivation from standard disagreement tables and has more desirable properties than quadratic weighting, a common alternative.[18, 19] Statistical analyses were conducted using Stata (version 12.1, StataCorp, College Station, TX).
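
The kappa statistic and its bootstrap CI reported here were computed in Stata; purely as an illustration of the linear weighting and the Bland-Altman limits of agreement described above (not the authors' code), a minimal NumPy sketch might look like this:

```python
import numpy as np

def linear_weighted_kappa(r1, r2):
    """Cohen's kappa with linear disagreement weights for two integer-valued raters."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    lo, hi = int(min(r1.min(), r2.min())), int(max(r1.max(), r2.max()))
    k = hi - lo + 1
    if k == 1:
        return 1.0                        # both raters gave a single identical value throughout
    obs = np.zeros((k, k))
    for a, b in zip(r1, r2):
        obs[a - lo, b - lo] += 1          # observed rating table
    obs /= obs.sum()
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))   # expected table under independence
    i, j = np.indices((k, k))
    w = np.abs(i - j) / (k - 1)           # linear weights: larger disagreements are penalized more
    return 1 - (w * obs).sum() / (w * exp).sum()

def limits_of_agreement(r1, r2):
    """Bland-Altman mean difference and 95% limits of agreement."""
    d = np.asarray(r1, dtype=float) - np.asarray(r2, dtype=float)
    return d.mean(), d.mean() - 1.96 * d.std(ddof=1), d.mean() + 1.96 * d.std(ddof=1)
```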

Results

There were 145 allopathic EM residency programs in the United States at the time of data collection and analysis, and of these, 136 listed their faculty members on their website. The remaining nine programs were excluded from the analysis. From the 136 programs, 4,744 AEPs were identified. The distribution of the h-indices is presented in Figure 3. The annual change in h-index is presented in Figure 4.

Figure 3.

The distribution of the h-index among academic EPs.

Figure 4.

The distribution in growth of the h-index per year for academic EPs.

As shown in Figure 3, the vast majority (59%) of EM faculty had h-indices of either 0 or 1, and 85% had h-indices ≤ 6. Only 5% of EM faculty had h-indices ≥ 13, and only 1% had h-indices ≥ 24. The annual change in h-index reflects a similar distribution (Figure 4). Only 10% of academic EPs had an increase of ≥ 0.5 points per year.

Interrater agreement for the subset of scores calculated by two evaluators was as follows: 81% had a difference of zero (identical h-index), 13% differed by one point, and 6% differed by two or more points (1% by 2 points, 2% by 3 points, 2% by 4 points, and 1% by 7 points). The weighted kappa statistic was 0.921 (95% CI = 0.861 to 0.957), indicating excellent agreement between raters. For the two reviewers, the mean difference was 0.060 with limits of agreement of –2.102 to 2.222. Pitman's test of the difference in variance demonstrated r = 0.259, n = 100, p = 0.009. The Bland-Altman plot is provided in Figure 5.

Figure 5.

Bland-Altman plot showing limits of agreement for interrater reliability assessment.

For the top 20 AEPs, the mean (±SD) h-index increased from 7.6 (±4.6) to 23.5 (±9.4) between years 12 and 24. The mean increase in h-index for any given academician was 15.8 (±7.6). These results are presented in Figure 6.

Figure 6.

Changes in the h-index between years 12 and 24 for the top 20 academic EPs.

Discussion

Our study encompasses the bulk of the peer-reviewed literature that comprises the specialty of EM, given that all but 9 of 145 programs are represented in the data set. The data set includes nearly the entire population of AEPs from allopathic programs. While the majority of individuals have an h-index of zero or one, the range is from zero to 44 and encompasses the most junior and the most senior and accomplished AEPs. These facts together lead us to conclude that the h distribution for AEPs can be used to assess the relative scholarly productivity of AEPs.

The h-index was first described in the field of physics and has also been shown to be relatively resistant to distortion by self-citations.[2, 20] While the h-index has been used to describe academic productivity in several disciplines, it is not a useful metric to compare individuals across all of academia, as the publication patterns of individuals vary widely between disciplines.[21] Therefore, while the productivity of individuals in any given field of study might well be characterized and compared using the h-index, it is essential that the distribution be rederived for each academic specialty to ensure that there is a rational basis for comparison between individuals.

Interest in the h-index has been increasing in medicine, and it has recently been used to characterize fields as diverse as otolaryngology, neuroradiology, urology, radiation oncology, and complementary and alternative medicine (CAM).[10, 11, 22-25] It has been used to predict the overall influence of senior scientists and has been shown to be a strong predictor of academic rank in radiology and urology.[11, 22] These same data illustrate the variability between specialties, as a full professor of urology has a mean h-index of 22, while a full professor of radiology has a corresponding h-index of only 12.5. Choi et al.[10] describe a mean h-index of 8.5 for radiation oncologists, compared with the mean h-index of 15 reported by Kulasegarah and Fenton[25] for otolaryngologists. The CAM literature does not yet provide a comprehensive distribution of the h-index among CAM scholars, but rather describes “a few prominent” CAM researchers, with h-indices of 21 to 38.[24]

The utility of an individual's h-index is only as great as the quality of the underlying data set to which it is compared.[26, 27] Large disciplines with tens of thousands of practitioners and aspirants will have correspondingly large bodies of work, and the small number of citations miscategorized because of similar name spellings, typographical errors in references, and the like may arguably be insignificant. When the specialty involves a smaller number of practitioners, however, accuracy becomes paramount. While h-indices generated by automated tools can provide a useful starting point, the citation list output by such programs must be carefully reviewed and corrected before a final h-index for any individual is calculated.

Our own experience of generating the h-index distribution for the population of AEPs reflects the difficulties and limitations of using such tools. Despite trained reviewers, a clearly defined algorithm, and strict attention to detail, we were able to achieve excellent, but not perfect, interrater reliability. Erroneous faculty rosters, present or absent middle initials, common given names or surnames, and changes in academic affiliation between the time of a paper's submission and its ultimate publication all contributed to errors in the calculation of any given h-index. Our kappa analysis suggests, however, that our interrater agreement is sufficiently robust that the overall distribution should not be negatively affected. Our Bland-Altman plot and agreement analysis show that, as might be expected, the higher the h-index, the higher the variance of the difference between two raters.

With respect to any given individual, even a small number of missed or misattributed articles or citations may affect the calculation of his or her individual h-index. However, it is expected that an individual calculating his or her own h-index will have a vested interest in ensuring that it is accurate and will provide supporting documentation where necessary.

While there is a broad range of h-indices among AEPs, the majority are at the low end of the range. There are several possible reasons for this. First, EM is still a relatively young specialty, having only been recognized as a primary specialty by the American Board of Medical Specialties in 1989.[28] This fact alone has the combined effect of sharply limiting the available time frame, population of potential authors, and number of venues for publication. Given the rapid expansion of the specialty, it is likely that this effect will be diminished in the coming years as larger numbers of academics continue to enter the profession and contribute to the body of knowledge for the specialty.

For the top academicians in the specialty of EM, there is a significant increase in productivity between the 12- and 24-year values of the h-index, demonstrating that the h-index was a predictor of future academic productivity for these EPs. This is again consistent with the performance of the h-index in other specialties.

While Hirsch uses an h/year (m) of 1, 2, or 3 to differentiate among the most productive individuals in physics, we find that the top 10% of AEPs increase their h-indices by 0.5 or more points per year. This finding again underscores the importance of defining the h distribution for each specialty to place an individual's h-index or m in context. Further analysis is needed to determine the breakdown of this group and what characteristics separate them from their colleagues besides their publication track records. Such information may be useful to increase the amount of scholarly output (and therefore academic impact) of the remaining group of AEPs.

Academic medical specialties also fundamentally differ from basic science disciplines in that their mission encompasses patient care and bedside teaching, as well as peer-reviewed publications. The academic contributions of clinical faculty who may provide excellent patient care but do not publish large numbers of papers will not be reflected in the h-index. This is similar to the situation for the social sciences, where larger proportions of an academic's output may be in books and book chapters rather than journal articles.[3, 4, 7] The presence of these clinically oriented, nonresearch faculty in academic medical centers may be one factor contributing to the large proportion of AEPs who have h-indices of zero or one. This finding serves to reinforce Harzing's caveat that the h-index alone is insufficient to describe the totality of an academic career.[3]

When any new phenomenon is identified, description and characterization are necessary first steps. Future research using the data set compiled here may answer many other questions germane to academic EM. The relationship between academic rank and h-index among AEPs, and differences between programs and between AEPs within programs, can be characterized and studied further. Continued analysis of high-performing individuals or programs might help elucidate the reasons for their academic success.

Limitations

The greatest limitation of this study is that it provides a snapshot of the current scholarly activity of AEPs in the United States. As the specialty continues to mature, it is certainly possible and perhaps inevitable that the publishing patterns of AEPs will change or that other metrics may be developed that better characterize the totality of AEPs' scientific productivity.

Because this study is limited to AEPs in U.S. allopathic EM residency programs, the extent to which osteopathic residency programs, nonacademic EPs, or EPs outside of the United States may contribute to the body of scholarly work for the specialty is not known. Harzing notes that Google Scholar has some significant limitations, mostly related to what content is indexed in Google Scholar.[3] However, she notes that the social sciences in particular may benefit from improved coverage compared to Web of Science. Given the ease of calculating the h-index, though, it should be a relatively straightforward matter to start with an h-index estimated by Publish or Perish and then modify it based on any additional citations uncovered from other sources.

The h-index seeks to summarize one of the principal metrics of academic performance: research productivity. By itself it cannot provide a full assessment of the value of any individual AEP, as it does not take into account teaching, clinical care, or academic and professional service.

Conclusions

The h-index can be used to characterize the academic productivity of academic emergency physicians. An average increase of h per year of 0.5 points or more is characteristic of the most productive academic emergency physicians. An academic emergency physician's h-index at 12 years is predictive of future productivity at 24 years.
