Journal self‐citation trends in 1975–2017 and the effect on journal impact and article citations

This paper investigates journal self‐citation trends between 1975 and 2017. The research sought to answer whether articles that include journal self‐citations are more highly cited, whether such self‐citations affect the journal impact factor, and whether such articles are more relevant to their journals' content than others. We analysed approximately 24,000 active journals indexed in the Web of Science from 1975 to 2018 and found that, over time, whilst overall citations have increased dramatically, the percentage of journal self‐citations has reduced. Although self‐citations to recently published articles have increased since 2004, self‐citations also seem to have a decreasing effect on impact factor. High‐impact journals self‐cite recent publications more than lower‐impact journals but also cite recently published content more frequently than lower‐impact journals. There is a positive citation effect on articles that include journal self‐citations, and these appear more related to the current content of that journal based on the relatedness of references. Journal self‐citation can therefore be seen as a useful indicator to determine whether an article is a good fit for a journal and of interest to its readers. It may also contribute to higher visibility and impact of an article.


INTRODUCTION
The journal impact factor was initially proposed by Garfield (1955) as a method of evaluating scientific journals. However, it is now a major factor in the quantification of the quality of science in most countries (Not-so-deep impact, 2005; Powell, 2016; Buela-Casal & Zych, 2012), used to make decisions on hiring, promotion, tenure, and grants and to assess the scientific merit of universities, scholars, and departments. Meanwhile, this highly polemic metric is criticized for being vulnerable to gaming and manipulation.
Journal editors may attempt to manipulate their impact factors by means of coercive citations (The impact factor game, 2006; Agrawal, 2005; Falagas & Alexiou, 2008; Jacobs, 2016; Lynch, 2012; Reedijk & Moed, 2008), and it seems to happen more in high-impact journals (Fong & Wilhite, 2017). A coercive citation occurs where the editor of a journal forces an author to add a reference to a non-relevant document published in that journal. The editors may request that authors cite recent papers in their journal during the peer review process (Heneberg, 2016; Martin, 2013). In an even less satisfactory situation, an editor may reject an article because it lacks sufficient references to their journal (Mahian & Wongwises, 2015) or may choose to accept articles because they have already cited recent papers in their journal (Chorus & Waltman, 2016). Alternatively, it is thought that a large number of authors add superfluous citations to their articles to increase the chances of them being accepted (A. W. Wilhite & Fong, 2012). Chorus and Waltman (2016) introduced an indicator to investigate whether journals disproportionally cite their own papers from the past 2 years compared to the past 5 years. They found that, from 2004 onwards, journal self-citations to the past 2 years have risen sharply compared with previous decades. Their results are in line with the studies that claim prevalence of coercive citations in science.
Since coercive citations are difficult to identify, actual journal self-citations are generally seen as having the same purpose and are criticized for inflating the journal impact factor. Humphrey, Kiseleva, and Schleicher (2019) recently reported a dramatic increase in the number of journal self-citations in business and management journals. As a consequence, the reliability of the journal impact factor has been questioned, and there have been recommendations that this indicator be calculated excluding journal self-citations (A. Wilhite, Fong, & Wilhite, 2019). However, some uncertainties have been expressed around removing journal self-citations from the impact factor calculation due to insufficient evidence as to the prevalence of coercive citations (Andrade, González-Jonte, & Campanario, 2009; Kirchhof, Bornfeld, & Grehn, 2006; Sugimoto & Cronin, 2013). Moreover, there does not seem to be a consensus among researchers that editorial requests for self-citations are unethical; rather, they are often considered an expression of the loyalty of authors and editors to the journal (Krell, 2010). It is possible that journal self-citations represent a convergence of interests between the journal's authors, editors, and readers. Articles with more journal self-citations are presumably more relevant to their own journal content, especially in niche fields where a relatively high level of journal self-citations might be expected and reasonable. See Retraction Watch (2020) for a debate on journal suppression by Clarivate due to excessive self-citation and note that Clarivate does provide a citation metric, Journal Impact Factor (JIF) Without Self Cites.
While, as discussed above, a great deal of research presents journal self-citation as a tool that lets journal editors artificially boost their journal impact factor, no previous study has explored whether journal self-citations are more relevant to the journal content and may help enhance the visibility and citation impact of an article. Moreover, there has been no previous examination of the extent of journal self-citations over a long period in relation to their effect on the impact factor. Therefore, this study aims to uncover the incentives for journal self-citation by looking at the citation impact of articles where there is a high instance of journal self-citation and the articles' appropriateness for the journals' readers. We also investigate journal self-citation rates over a long period. These objectives were addressed by asking three questions:
• Have journal self-citations become more widespread over time?
• Do papers with higher levels of journal self-citations subsequently receive more citations than other papers?
• Are articles with more journal self-citations more appropriate for the journal's readers?

Journals
This study uses all journals indexed in the Science Citation Index  (e.g. 2005-2007). This 3-year window was chosen because the calculation of a journal's impact factor requires citable documents from three consecutive years. The year 1975 was chosen because the first comprehensive list of impact factors was produced in the 1970s (Garfield, 1972). The 24,313 active journals published around 55 million papers in the selected time span.

Subsets of citations
Citations to articles are categorized into two groups as follows:
• Total citations represent the total number of references that an article made to other papers published in the previous 2 years, that is, the impact factor years, for example, all references in an article published in 2018 to papers published in 2016 and 2017. It should be noted that only citations to articles indexed in WoS were considered in this study.

Key points
• Articles that cite the journal in which they are published may receive higher citations themselves and may be more closely aligned to the journal's community.
• There is no evidence that journal self-citation is having a positive impact on the impact factor, and self-citation has reduced from over 20% of all references in 1975 to less than 15% by 2017.
• High-impact journals are more likely to include self-citations to articles from the previous 2 years.

Journal impact factor based on journal self-citations
To calculate this, 'the impact factor without self-citations' (1) for each journal together with 'the total impact factor', that is, the impact factor measured based on total citations including self-citations (2), were retrieved from Journal Citation Reports (JCR).
Then, (1) was subtracted from (2), and the result was called 'journal impact factor based on journal self-citations only'. The average journal impact factor based on journal self-citations was then calculated for all journals in each year from 1997 to 2017.
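The calculation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the journal names and impact factor values are invented, and the function name `self_citation_impact_factor` is hypothetical. The final division mirrors the share reported in the Results, where the average impact factor based on journal self-citations is divided by the average total impact factor for the same year.

```python
from statistics import mean


def self_citation_impact_factor(total_if: float, if_without_self: float) -> float:
    """Return (2) minus (1): the part of the impact factor due to self-citations."""
    return total_if - if_without_self


# Hypothetical records for one JCR year:
# journal -> (total impact factor, impact factor without self-citations).
journals_one_year = {
    "Journal A": (4.0, 3.5),
    "Journal B": (2.0, 1.5),
    "Journal C": (1.0, 1.0),
}

# Impact factor based on journal self-citations only, per journal.
self_ifs = [
    self_citation_impact_factor(total, without_self)
    for total, without_self in journals_one_year.values()
]

# Yearly averages across all journals, then the share used in the Results.
average_self_if = mean(self_ifs)
average_total_if = mean([total for total, _ in journals_one_year.values()])
share = average_self_if / average_total_if

print(f"share of impact factor from self-citations: {share:.1%}")
```

With these invented values, the three journals contribute 0.5, 0.5, and 0.0 impact factor points from self-citations, so the share is one seventh of the average total impact factor.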

Normalized citations
The number of citations received by a document was normalized by its type, subject field (i.e. WoS research area), publication year, and its number of references made to WoS documents. It is well established that citation behaviour varies across fields, publication years, and document types. Previous research also shows that articles with more references may receive more citations (Didegah & Thelwall, 2013). So, the number of references in each article was also normalized to prevent the increase in references observed during this period from affecting the journal self-citation rate. Based on the number of references citing WoS-indexed journals, papers are grouped into five categories: papers that include 1-5 references, 6-11 references, 12-20 references, 21-33 references, and more than 33 references. Each group of papers comprises 20% of total papers in the WoS published between 1997 and 2017.
To normalize the number of citations, two steps were taken.
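The two steps themselves are not reproduced in this text. The Python sketch below therefore only illustrates the grouping described above: it assigns each paper to one of the five reference-count categories and builds a reference set from document type, subject field, publication year, and reference category. Dividing a paper's citations by the mean of its reference set is an assumption (a common form of normalization), not necessarily the authors' procedure, and all field names and records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def reference_bin(n_refs: int) -> str:
    """Map a count of WoS-indexed references to one of the five categories."""
    if n_refs < 1:
        raise ValueError("the categories in the text start at one reference")
    if n_refs <= 5:
        return "1-5"
    if n_refs <= 11:
        return "6-11"
    if n_refs <= 20:
        return "12-20"
    if n_refs <= 33:
        return "21-33"
    return ">33"


def group_key(paper: dict) -> tuple:
    """Reference set: document type, field, year, and reference category."""
    return (paper["doc_type"], paper["field"], paper["year"],
            reference_bin(paper["n_refs"]))


def normalize_citations(papers: list[dict]) -> dict[str, float]:
    """ASSUMPTION: citations divided by the mean citations of the reference set."""
    groups = defaultdict(list)
    for paper in papers:
        groups[group_key(paper)].append(paper["citations"])
    normalized = {}
    for paper in papers:
        group_mean = mean(groups[group_key(paper)])
        # A reference set with no citations at all yields 0.0 for its papers.
        normalized[paper["id"]] = paper["citations"] / group_mean if group_mean else 0.0
    return normalized


# Hypothetical records, invented for illustration only.
papers = [
    {"id": "p1", "doc_type": "Article", "field": "Physics", "year": 2010,
     "n_refs": 4, "citations": 10},
    {"id": "p2", "doc_type": "Article", "field": "Physics", "year": 2010,
     "n_refs": 5, "citations": 30},
    {"id": "p3", "doc_type": "Article", "field": "Physics", "year": 2010,
     "n_refs": 25, "citations": 8},
]
normalized = normalize_citations(papers)
print(normalized)
```

Here p1 and p2 fall in the same reference set (both have 1-5 references), so their citations are compared with each other, while p3 forms its own set.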

Article relevancy
The article's relatedness to all the content published in the same journal was measured through bibliographic coupling, that is, the number of references shared by the article and all other articles published in the same journal within the last 2 years. To illustrate the relatedness measure, consider article X, published in 2017. The number of references that are common between article X and other articles published in the same journal between 2015 and 2016 was counted. Suppose article X has one shared reference with article Y and three shared references with article Z. Article X then has four common references with its journal's articles. Even if its common reference with article Y is the same as one of the shared references with article Z, we do not restrict the count to distinct common references, and that reference is counted as two common references.
The number of references a paper has and the number of references that are shared by different papers vary from one subject field to another; hence, there was a need for normalization. The shared references were normalized by document type, subject field (i.e. WoS research area), publication year, and number of references made to WoS documents. To control for the effect of journal self-citation, the number of shared references was measured twice: (1) with and (2) without journal self-citations.
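The counting rule in the article X example can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: reference identifiers such as `r1` are invented, the function name is hypothetical, and the `exclude` argument is one possible way to obtain the count without journal self-citations (by dropping references that point to the same journal before counting).

```python
def shared_reference_count(article_refs, journal_articles, exclude=frozenset()):
    """Sum the references shared with each recent article in the same journal.

    The count is taken per pair of articles, so a reference shared with two
    different articles is counted twice, as in the article X example.
    `exclude` (an assumption) holds references to leave out, for example
    journal self-citations.
    """
    kept = set(article_refs) - set(exclude)
    return sum(len(kept & (set(other) - set(exclude)))
               for other in journal_articles.values())


# Article X (2017) and the articles its journal published in 2015-2016.
x_refs = {"r1", "r2", "r3", "r4"}
recent_articles = {
    "Y": {"r1", "r9"},              # shares r1 with X
    "Z": {"r1", "r2", "r3", "r8"},  # shares r1, r2, r3 with X
}

with_self = shared_reference_count(x_refs, recent_articles)

# Suppose r1 is a journal self-citation (hypothetical) and is excluded.
without_self = shared_reference_count(x_refs, recent_articles, exclude={"r1"})

print(with_self, without_self)
```

Article X shares one reference with Y and three with Z, giving four common references even though `r1` appears in both overlaps. Excluding `r1` leaves two.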

Journal quartiles
Journals in the JCR are assigned to a quartile based on their impact factor. Since a journal may be classified under several subject categories, an average impact factor percentile was calculated for each journal in each JCR year. To calculate the average impact factor percentile for journal X in a given year, suppose journal X is classified under categories A, B, and C. In category A, journal X may fall in quartile one, while in categories B and C, it may be in the second and fourth quartiles, respectively. The average impact factor percentile is then the average of the three percentiles for journal X, which prevents the journal being included in the calculations more than once.
The journals were assigned to four quartiles: quartile 1 (Q1), quartile 2 (Q2), quartile 3 (Q3), and quartile 4 (Q4); Q1 journals are the highest-impact journals, and Q4 journals are the lowest-impact journals. As the JCR data have only been available online since 1997, only journals published between 1997 and 2017 were assigned to a quartile.
As an alternative method, we also classified journals into quartiles based on their best score in the subject categories. For example, suppose journal X is categorized in subject categories A and B. In subject A, journal X gets into Q1, while it falls under Q2 in subject B. The best score for this journal is Q1. We examined the difference between the average percentile and the best score methods for the share of papers and citations. The difference was trivial, falling between 1 and 2%. We therefore decided to pick the average impact factor percentile to classify journals into quartiles.
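The two classification methods can be compared with a short Python sketch. It is illustrative only: the percentile values are invented, the function names are hypothetical, and the mapping assumes that a higher percentile means higher impact, with the top 25% forming Q1. The text does not state the exact cut-offs used.

```python
from statistics import mean


def quartile_from_percentile(percentile: float) -> str:
    """ASSUMPTION: higher percentile = higher impact; the top 25% is Q1."""
    if percentile >= 75:
        return "Q1"
    if percentile >= 50:
        return "Q2"
    if percentile >= 25:
        return "Q3"
    return "Q4"


def average_percentile_quartile(percentiles: dict[str, float]) -> str:
    """Quartile from the mean percentile over all subject categories."""
    return quartile_from_percentile(mean(percentiles.values()))


def best_score_quartile(percentiles: dict[str, float]) -> str:
    """Quartile from the journal's best percentile in any subject category."""
    return quartile_from_percentile(max(percentiles.values()))


# Journal X in categories A, B, and C: Q1, Q2, and Q4 respectively.
journal_x = {"A": 90.0, "B": 60.0, "C": 15.0}

avg_quartile = average_percentile_quartile(journal_x)  # mean percentile is 55.0
best_quartile = best_score_quartile(journal_x)         # best percentile is 90.0

print(avg_quartile, best_quartile)
```

With these invented percentiles the average method places journal X in Q2, while the best-score method places it in Q1. Either way the journal is counted once.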

RESULTS
Have journal self-citations become more widespread over time?
To investigate trends of journal self-citations over a long period of time, we studied both the 'total number of journal self-citations' and the 'share of journal self-citations' over the period. Investigating only citations to the previous 2 years, we found that the total number of journal self-citations seems to be levelling off (Fig. 1: dashed line), while there is a sharp increase in total number of citations over time (Fig. 1: dotted line); this shows that the percentage of recently published articles being cited is generally increasing.
However, the share of journal self-citations from total number of citations has decreased from 21 to 12% between 1975 and 2018 (Fig. 1: blue line).
To investigate the impact of self-citations on the journal impact factor, 'the average impact factor based on journal self-citations' in each year was divided by 'the average total impact factor' in the same year. Based on the results, the share of 'journal impact factor based on journal self-citations only' decreased from 13% in 1997 to 9% in 2017 (Fig. 2).
Moreover, we calculated the proportion of journal self-citations counted for the impact factor across different quartiles. As illustrated in Fig. 3, the proportion of journal self-citations counting towards the impact factor (as a percentage of total citations) is higher in the Q1 journals (15%) than in the lower-impact journals (9%). However, when looking at the percentage of self-citations to publications in the past 2 years (the impact factor window) compared to older articles, the picture is reversed, with Q4 journals citing a higher percentage of recent articles than Q1 journals (although the difference is small).
The percentage of articles with journal self-citations to the impact factor years is higher in the high-impact journals than in the lower-impact ones (44 and 19% for Q1 and Q4 journals, respectively; see Fig. 4). However, regardless of their impact, journals cite about the same percentage of articles published in other journals during the impact factor years (48 and 47% for Q1 and Q4 journals, respectively; see Fig. 4). Thus, even though Q1 journals publish more papers with journal self-citations to impact factor years than the lower-impact journals, they still cite articles from other journals more (44% of their articles cite their own journals, while 48% of their articles cite other journals also; see Fig. 4).
Do papers with higher levels of journal self-citations subsequently receive more citations?
Figure 5A,B show that the more an article self-cites papers published recently in the same journal, the more citations it may subsequently receive. Figure 5A demonstrates a positive correlation between the number of times an article self-cited the journal it was published in and the average number of normalized citations it has received. The same result is obtained when the self-citations are removed from the calculation of normalized citations; however, the degree of association between the number of journal self-citations made and the average number of normalized citations received varies by the impact of journals. While the increase in the normalized citations is significant for Q1 journals, no significant association is found for Q3 and Q4 journals (Fig. 5A,B).
Are papers with more journal self-citations more appropriate for the journal's readers?
A measurement was made of the relatedness of the articles with journal self-citations to the journal content. The results show that the more an article cites papers published in the same journal, the greater the relationship it has to the content recently published in that journal. This applies to both high-impact and lower-impact journals. However, the degree of relatedness varies by the impact of journals (Fig. 6A,B). The articles from Q1 journals are more related to their journal (degree of similarity ranges from 1.08 to 3.38) than those from the lower-impact journals (degree of similarity ranges between 0.38 and 1.27) (Fig. 6A).

LIMITATIONS OF THE STUDY
This study has limitations of its own that require further investigation. For instance, the decline in the journal self-citation rate (the blue line in Fig. 1) may be due to other factors such as an increase in research over the years, more journals indexed in WoS, or the emergence of research communication channels other than journals. It is therefore possible that the journal self-citation rate could have been normalized by these factors. Moreover, the decreasing trend could also partly reflect increased multidisciplinary research, which should be controlled for. Such analyses are beyond the scope of this study, but they may be considered in a follow-up study. An analysis across different disciplines to compare the results could be very interesting, but it may be more suitable for a second study.

DISCUSSION AND CONCLUSION
This study aimed to shed light on aspects of journal self-citation where it contributes to article impact and relevancy. The findings reveal that the number of journal self-citations is levelling off and shows a steady trend, while total number of citations has dramatically increased over the last few decades. The proportion of journal self-citations to the impact factor years as a fraction of total citations to these years has decreased over the last few decades (Fig. 1: blue line); however, the proportion of journal self-citations to the impact factor years as a fraction of total journal self-citations to all years shows an upward trend since 2004 (Fig. 1: orange line), which concurs with the results in Chorus and Waltman (2016). Chorus and Waltman (2016) suggest that their findings may be the result of increasing malpractices in science, but they further explain that there are many legitimate reasons for journal self-citations.
Figure 2 also shows that, over time, journal impact factor has become more dependent on the number of citations received from other journals and not on journal self-citations, which contradicts claims about the unreliability of this indicator in relation to self-citation (Agrawal, 2005; Jacobs, 2016).
High-impact journals appear to publish more papers with journal self-citations to impact factor years than the lower-impact journals. However, the Q1 journals also cite articles from other journals more (Fig. 4). The claim, therefore, that higher-impact journals contain a larger number of coercive citations, as Fong and Wilhite (2017) also found in a survey analysis, is open to debate and requires more in-depth investigation.
The results of this study also show that articles with more journal self-citations to the impact factor years are more related to the current content of the journal (Fig. 6). Moreover, they also receive more citations, although this varies by the journal quartile (Fig. 5). Authors are inclined to publish their articles in prestigious journals that have a vast number of readers. However, in order to be published in such journals, they need to submit relevant work that is of interest to the readers and meets their needs. The references given in an article to some extent determine the types of issues addressed in an article and how far the chosen issues and recommended solutions align with the needs of the readers. Bibliographic coupling was used to gauge the degree of relatedness between an article citing recent articles in the same journal and other recent articles in the same journal. The results suggest that articles with higher numbers of journal self-citations are more related to other articles in the same journal on the basis of the references they share. Therefore, it is not surprising that the editors of high-impact journals have a strong tendency towards accepting articles with higher numbers of journal self-citations as their authors will have taken into consideration related issues and problems that are of great interest to the reader community. More interestingly, the findings reveal that articles with more journal self-citations also receive more citations, confirming their relevancy to the journal and the reader community.
In conclusion, while some studies have criticized journal self-citations because of their probable role in impact factor inflation (A. Wilhite et al., 2019), this kind of citation may very likely demonstrate the convergence in research interests between the author and the journal, which will be rewarded by a higher citation impact. Journal self-citation is, therefore, a useful indicator that determines the extent to which an article is a good fit for a journal and of interest to its readers and helps the article achieve greater visibility and impact. Here, high-impact journals benefit to an even greater extent because their editors accept articles that are more relevant to, and meet the informational needs of, their readers. Hence, we recommend that a balance be achieved between the number of journal self-citations and the number of citations to other journals.