The effectiveness of journals as arbiters of scientific impact

Abstract Academic publishers purport to be arbiters of knowledge, aiming to publish studies that advance the frontiers of their research domain. Yet the effectiveness of journal editors at identifying novel and important research is generally unknown, in part because of the confidential nature of the editorial and peer review process. Using questionnaires, we evaluated the degree to which journals are effective arbiters of scientific impact on the domain of Ecology, quantified by three key criteria. First, journals discriminated against low‐impact manuscripts: The probability of rejection increased as the number of citations gained by the published paper decreased. Second, journals were more likely to publish high‐impact manuscripts (those that obtained citations in 90th percentile for their journal) than run‐of‐the‐mill manuscripts; editors were only 23% and 41% as likely to reject an eventual high‐impact paper (pre‐ versus postreview rejection) compared to a run‐of‐the‐mill paper. Third, editors did occasionally reject papers that went on to be highly cited. Error rates were low, however: Only 3.8% of rejected papers gained more citations than the median article in the journal that rejected them, and only 9.2% of rejected manuscripts went on to be high‐impact papers in the (generally lower impact factor) publishing journal. The effectiveness of scientific arbitration increased with journal prominence, although some highly prominent journals were no more effective than much less prominent ones. We conclude that the academic publishing system, founded on peer review, appropriately recognizes the significance of research contained in manuscripts, as measured by the number of citations that manuscripts obtain after publication, even though some errors are made. We therefore recommend that authors reduce publication delays by choosing journals appropriate to the significance of their research.

Meanwhile, the great disparity among the opinions of reviewers (Fox, 2017) and the lack of incentives for researchers to contribute to peer review (Hochberg, Chase, Gotelli, Hastings, & Naeem, 2009) limit the effectiveness of the peer review process and threaten to reduce the quality of published research. These concerns have led to calls to reform the peer review-based system of academic publishing (Aarssen & Lortie, 2009;Grossman, 2014;Lortie et al., 2007).
Nevertheless, we have little understanding of the quality of editorial decisions (Ioannidis, 2018). Study of the academic publishing system is complicated by its sensitive and confidential nature. The submission history of manuscripts is rarely public, especially for rejections. Moreover, controlled experiments, such as the simultaneous submission of manuscripts to multiple journals, are disallowed on ethical grounds (Larivière & Gingras, 2010). There is thus a pressing need to evaluate the effectiveness of journals as arbiters of scientific quality.
We surveyed authors of papers published in 146 journals in the research domain of ecology. We queried them on the passage of their manuscripts through the stages of submission, peer review, rejection, and revision, prior to eventual publication. Here, we examine the relationship between editorial decisions and the eventual impact of published papers. We examine the effectiveness of journals as scientific arbiters using three complementary analyses: discrimination against manuscripts of low perceived quality or significance, positive selection of manuscripts that have the greatest scientific impact, and the frequency of mistaken rejections. Higher profile journals are expected to strive to be effective arbiters of scientific influence, given their goal of publishing high-impact research (Bornmann, Mutz, Marx, Schier, & Daniel, 2011). Therefore, we also assess the degree to which journal performance varies with journal impact factor (JIF). In the absence of a perfect metric of quality and significance for scientific research, we examine the number of citations obtained by a published article. Citations are an imperfect measure of influence because they miss types of impact that do not lead to a reference in a research publication. However, they can be objectively quantified (Rashid, 1991), and they covary with other measures of scientific influence (Mingers & Xu, 2010), including article downloads (Perneger, 2004). Thus, we use the number of citations as the best available indicator of the quality and significance of research, remaining mindful of the fact that they are an imperfect proxy.

| DATA COLLEC TI ON
We obtained metadata, including author details and citations ob- We sent questionnaires to the corresponding authors of a subsample of these manuscripts. First, we randomly selected 100 articles from every journal by publication-year combination, excluding those classified by WoS as review papers. We then filtered this dataset to include only one randomly chosen article per corresponding author, yielding 38,017 articles. This stratified sampling assured that we had a representative sampling of articles from every journal and publishing year. Second, we extracted articles written by corresponding authors not in the first dataset and again randomly selected one article per corresponding author such that each author was represented only once, yielding an additional 15,579 articles. After excluding duplicated email addresses, 52,543 unique corresponding authors remained.
Using the Qualtrics platform, we sent questionnaires to each corresponding author to request information about the publication history of their paper (Appendix Data 1). We requested details on the history of the published article, including the journals to which the manuscript had been sent, whether it was invited by the journal and the year and outcome of each submission. In total, 12,655 authors, or 24.1% of those contacted, responded to our questionnaire. After removal of incomplete or unintelligible responses, invited manuscripts, and repeated rounds of submission to the same journal, 10,580 questionnaires remained, with a total of 16,981 rounds of manuscript submission.
The data collected through our questionnaire included personal identifiers. Thus, it was essential to maintain the confidentiality of all participants. The manuscript, figures, appendices, and datasets were anonymized to maintain the privacy of authors. CET Paine was the only person with access to data that contained personal identifiers.
Human subjects' ethical approval for this study was obtained from the University of Stirling.

| DE SCRIP TIVE RE SULTS
Of the 10,580 manuscripts about which authors answered our query, 64.8% were published in the first journal to which they were submitted, whereas those rejected at least once were submitted to a mean of 2.42 journals and took on average 478 days to be published ( Figure 1). 13.0% of manuscripts took two or more years from first submission to publication, and 1.6% took four or more years, with the time to publication increasing by 234 days per rejection. 3.4% of manuscripts were rejected from four or more journals.
Fourteen journals accepted all submissions and 17 rejected >50% of the manuscripts (Appendix Table A1). Overall, rejection rate was strongly positively associated with journal impact factor (Appendix Figure A1). Rejection after review was more frequent than desk rejection at most journals, but this reversed at journals with JIF greater than nine (Generalized linear mixed-effect model: p < 0.0001).
Questionnaire-derived rejection rates were positively associated with, but consistently underestimated, true (journal-reported) rejection rates for the five journals for which independent estimates of rejection rates were available (major axis regression; desk rejection: p = 0.0009, rejection after review: p = 0.07; Appendix Figure A2); author-reported rejection rates were 58%-71% of true rejection rates.

| ANALY TI C AL ME THODS AND RE SULTS
To determine the effectiveness of journals as arbiters of scientific significance, that is, how well they screen out low-impact papers and accept high-impact papers, we need a metric of the significance of both accepted and rejected papers. However, manuscripts rejected by a particular journal are (by definition) never published by that journal. This means that the effects of rejection on the number of citations obtained are confounded with the effects of constructive feedback from the editor and reviewers and the influence of the publishing journal (Larivière & Gingras, 2010).
Moreover, rejected manuscripts are published later, following additional rounds of resubmission and review. Finally, there is a "halo effect" of publishing in prominent journals, which stems, in part, from the tendency of authors to disproportionately cite articles from higher impact journals (Merton, 1968). Thus, it is difficult to directly compare the counts of citations between accepted and rejected manuscripts. We do so using four complementary metrics, the performance ratio, the gatekeeping ratio, the rejection rates of future high-impact studies, and the frequency of mistaken rejections (summarized in Table 1). To reduce heteroscedasticity, these metrics were log-transformed prior to analysis. Analyses were conducted in R 3.4.4 (R Core Team 2018). Significance tests and confidence intervals were determined through parametric bootstrapping, implemented in packages lme4 and pbkrtest (Bates, Mächler, Bolker, & Walker, 2015;Halekoh & Højsgaard, 2014).

| Methods
The first aspect of scientific arbitration is the degree to which journals identify and reject low-quality manuscripts. If a journal is an effective arbiter of scientific impact, the manuscripts they reject should go on to gain fewer citations, once published in a different journal, than papers published in that journal and year that had not previously been rejected. A mean "performance ratio" for manuscripts rejected from a particular journal of <1.0 indicates that that journal, on average, correctly rejects low-impact manuscripts and is therefore an effective scientific arbiter (Table 1).
To assess this aspect of scientific arbitration, we first calculated, for every combination of journal and publishing year, the median number of citations obtained by studies that were published by that journal that had not previously been rejected from another journal (i.e., "first intents", Calcagno et al., 2012). Medians capture the central tendency of the skewed distribution of citation numbers for articles within journals better than do means. We then calculated, for each manuscript that had previously been rejected from another journal, the ratio of the number of citations it obtained to the median number of citations for first-intent manuscripts at the relevant publishing journal and year. We added one to the numerator and denominator to avoid divide-by-zero errors for journals where the median number of citations was zero. Because both the numerator (citations obtained by previously rejected papers) and denominator (median citations obtained by papers not previously rejected) are calculated including F I G U R E 1 A summary of time to acceptance and number of journals to which manuscripts were submitted. Points are scaled to the number of manuscripts in each category, which is also represented numerically. Distributions are truncated to six journals and six years for presentation. Histograms at the top and right summarize the univariate distributions only papers for which we had survey responses, any nonresponse bias toward (or against) high-impact papers should not influence this metric. Because we expected that the degree of underperformance of previously rejected papers would covary with difference in JIF between the rejecting journal and the publishing journal, we predicted this "performance ratio" as a function of the percentage change in JIF between journals in a linear mixed-effect model. The percentage change in JIF between journals was binned into categories of < −60%, −60% to −20%, −20% -+20%, and > +20%. The model included rejecting journal and publishing year as random effects.

| Results
78.0% of rejected manuscripts were resubmitted to journals of lower JIF. Manuscripts that were resubmitted to a journal of similar or higher JIF obtained 16.3% and 21.9% fewer citations, respectively, than the median study published in the same year and journal that had not previously been rejected ( Figure 2). Papers resubmitted to journals with 20-60% lower JIF than the rejecting journal performed equivalently to other manuscripts published in that same journal and thus appear to have found the right outlet for their significance. Papers resubmitted to substantially lower JIF journals, on the other hand, obtained 3.3% more citations than did direct submissions (p < 0.0001; statistical details of all analyses are available in Appendix Table A2). These results indicate that manuscripts rejected by editors generally underperform (in terms of number of citations) other papers had they not been rejected and thus that journals disproportionately reject manuscripts with reduced potential to advance their field of study, compared to those they accept.

| Methods
The performance ratio, described above, quantifies the degree to which previously rejected manuscripts underperform first intents (manuscripts published in the first journal to which they are submitted) within a publishing journal. Next, we examine the degree to which editors are appropriately rejecting low-impact manuscripts.
Specifically, we asked whether papers rejected from each individual journal k generally go on to under or overperform the average paper in their final publication outlet, j, and tested how this varied across journals. We assessed this by calculating a "gatekeeping ratio," which is similar to the performance ratio ( Figure 2) in that previously rejected papers are compared to papers published in the same final journal, but differ in that the gatekeeping index averages relative performance of papers across submissions to the rejecting journal, rather than across submissions to the publishing journal. We calculated the median number of citations obtained by all articles published in journal j in year y, divided by the number of citations obtained by manuscript i rejected from journal k and published in journal j in year y (Table 1). We calculated the gatekeeping ratio of every manuscript that experienced a rejection prior to publication. We estimate the gatekeeping ratio for journal k by taking the mean over manuscripts rejected by that journal. A "gatekeeping ratio" of 1.0 for a journal indicates that papers it rejected go on to be just as well cited as does the median paper in the journal in which they are eventually published. Ratios > 1.0 indicate that manuscripts rejected by the journal obtain fewer citations than the median paper published in their final outlets, with higher ratios indicating a greater difference.
We predicted the gatekeeping ratio of each rejected manuscript as a function of the editorial stage at which it was rejected (preor postreview) and tested how it varied with the impact factor of the rejecting journal, using a linear mixed-effect model that allowed for random variation among rejecting journals. The magnitude of the gatekeeping ratio is also affected by the difference in prominence between journals j and k. We account for this difference by including the log-transformed ratio of their JIFs as a predictor in the model. There was no evidence for an effect of rejection stage (i.e., desk rejection versus postreview rejection) on gatekeeping ratio (p = 0.28), so we deleted that term and its interaction with JIF from the models.
TA B L E 1 Summary of metrics used to assess the effectiveness of journals as scientific arbiters. Although this study focuses on studies published in 146 journals in the domain of ecology, some analyses included more journals, as many of the studies we examined had been rejected from journals outside that set

| Results
Papers that had been rejected underperformed the median paper in the publishing journal for 93% of journals ( Figure 3). Manuscripts that were rejected gained just 40.3% as many citations (averaged across rejecting journals) as the median paper published in the publishing journal (IQR: 29.2%-52.4%). The number of citations obtained by rejected manuscripts was negatively associated with the change in JIF between rejecting and publishing journals (p < 0.0001), meaning that papers that cascade further down in JIF rankings between submissions underperform less, consistent with the performance ratio results (Figure 2). Regardless of the change in JIF, however, gatekeeping ratios were consistently >1.0.
Gatekeeping ratio was positively associated with JIF, meaning that papers rejected from higher JIF journals underperformed papers published in the publishing journal more strongly than did papers rejected from lower JIF journals (p < 0.0001; Figure 3).

| Methods
An effective scientific arbiter would be sensitive to, and more likely to accept for publication, high-impact manuscripts that go on to make greater contributions to the field of study than do solid but run-ofthe-mill manuscripts (Siler et al., 2015). In contrast to the previous analyses, here we focus on the ability of journals to detect the most impactful studies. We categorized submissions to each journal in each year into citation quantiles as "run-of-the-mill" or "high-impact," depending on whether they received more than the 90th percentile of citations received by manuscripts published in that journal in that year. This categorization was performed separately for each journal and year, making it independent of the rejection rates of journals and accounting for the nonlinear accumulation of citations through time.
Journal-by-year combinations for which fewer than ten manuscripts were available in our dataset were excluded. We predicted the probability of manuscript rejection as a function of citation quantile and editorial stage (desk rejection or rejection after review), allowing for random variation in the effect of citation quantile among journals, in a generalized mixed-effect model. Residuals were assumed to follow a binomial distribution, as manuscript rejection is a binomial (reject/ not reject) process.
To test the assertion that high-JIF journals are more effective scientific arbiters than are less prominent journals, we predicted the rejection rate for every combination of journal, citation quantile, and editorial stage from a generalized linear mixed-effect model. To do so, we calculated the "rejection ratio" for each journal at each stage as the estimated rejection rate for future run-of-the-mill papers divided by the estimated rejection rate for future high-impact papers (Table 1). A ratio of 1.0 indicates that a journal was equally likely to reject "high-impact" and "run-of-the-mill" manuscripts, with increased effectiveness indicated by larger values. Finally, we predicted the rejection ratio as a function of journal impact factor and editorial decision (desk rejection or rejection after review) in a weighted linear regression. Weights were the number of manuscripts submitted to each journal, to account for variation in the number of submissions among journals. Note that this analysis does not assess how gatekeeping effectiveness may vary through time. Accordingly, we used a time-averaged impact factor for each journal. There was no evidence of different JIF rejection ratio relationships between stages, so we simplified the model by deleting the interaction.

| Results
Overall, manuscripts that went on to become high-impact papers were significantly less likely to have been rejected, both before (desk rejection) and after review, than were manuscripts that became runof-the-mill papers (Figure 4; p < 0.0001). By this metric, journals were twice as effective as scientific arbiters at the desk rejection stage than after review (p < 0.0001). At the average journal, highimpact manuscripts were 23.0% as likely to have previously been F I G U R E 2 Manuscripts that are rejected from one journal and resubmitted to another tend to gain fewer citations than those accepted at the first journal to which they were submitted (i.e., "first intents"). The "performance ratio" of resubmitted manuscripts was lowest for manuscripts sent to journals of a greater impact factor than the original journal. The shaded region highlights manuscripts that were resubmitted to a journal of similar JIF to that from which they were rejected. Mean estimates are shown with 95% confidence intervals, obtained by parametric bootstrapping. Points indicate the mean performance ratio for each rejecting journal and are sized according to the number of manuscripts rejected by each journal in each category. Selected prominent journals are indicated with colored symbols. The number of manuscripts in each percentage change category is shown. All groups differ significantly in performance ratio (Tukey's honest significant difference, p ≤ 0.0051) desk-rejected than were run-of-the-mill manuscripts, but 41.1% as likely to have been rejected after review. This indicates that editors are less likely to reject future high-impact papers than they are to reject future run-of-the-mill papers, but that they have a fairly high error rate, especially after review. Journals varied significantly in the rate at which they rejected high-impact studies, as indicated by a significant interaction between publishing journal and citation quantile (p = 0.0001).
High-JIF journals were generally better at distinguishing future high-impact papers than low-JIF journals (p < 0.0001; Figure 5).
High-JIF journals were much more likely to accept high-impact manuscripts relative to run-of-the-mill manuscripts, whereas the disparity in rejection rates between these two types of manuscripts was smaller at low-JIF journals. Doubling JIF was associated with a 24% increase in a journal's rejection ratio. The strength of the relationship between JIF and rejection ratio did not differ between editorial decision stages (p = 0.36; Figure 5). By this metric, 74% of journals examined could be considered effective arbiters of science, in that they were less likely to reject future high-impact papers than future run-of-the-mill papers both before and after review. 18% of journals were ineffective at distinguishing these two types of papers at one stage or the other, and 6% were ineffective gatekeepers at both stages (Appendix Figure A3). Notably, some of the most prominent journals were no more likely to accept high-impact manuscripts than were substantially less prominent journals ( Figure 5).

| Methods
As a final metric of scientific arbitration, we define two types of errors in manuscript handling by journals: the rejection of manuscripts that go on to (a) obtain more citations than the median paper published in the rejecting journal or (b) become high-impact papers in the publishing journal. For the former analysis, we considered only rejected manuscripts that were resubmitted to journals of lower JIF.
If such manuscripts outperform their original journal, they are likely to be true mistakes, as they have had to overcome a decrease in journal prominence. However, the possibility that authors have increased the quality of their manuscript through substantial revision cannot be discounted. In both analyses, we predicted the frequency of mistaken rejection as a function of the JIF of the rejecting journal using generalized linear models with a binomial error distribution.

| Results
3.8% of rejected manuscripts went on to become papers that gained more citations than the median article in the journal that rejected them, whereas 9.2% of rejected manuscripts went on to become high-impact papers in the publishing journal ( Figure 6). Thus, both types of rejection mistakes were infrequent. The frequency of the first type of error was independent of JIF, whereas the frequency The effectiveness of journals as scientific arbiters, as measured through the "gatekeeping ratio," increased with the impact factor (JIF) of the rejecting journal. Each point is one journal mean, with point size scaled to the number of manuscripts evaluated. Regression lines are shown with 95% confidence intervals. The solid line indicates rejected manuscripts resubmitted to journals with JIF half of the rejecting journal, whereas the dashed line indicates rejected manuscripts resubmitted to journals with JIF twice that of the rejecting journal. The horizontal dotted line indicates a gatekeeping ratio of 1.0, meaning that rejected manuscripts obtain the same number of citations as the average paper in their publishing journal F I G U R E 4 Most journals were more likely to reject run-ofthe-mill manuscripts than high-impact manuscripts. High-impact manuscripts are defined as the 10% of manuscripts that obtained the most citations in each individual journal, and run-of-themill manuscripts are the remaining 90%. Mean estimates are shown with 95% confidence intervals, obtained by parametric bootstrapping. Each point represents a journal, with point sizes scaled by the number of manuscripts evaluated of the second type increased with increasing JIF (generalized linear models: p = 0.685 and p = 0.0012, respectively).

| D ISCUSS I ON
Interacting with the academic publishing system imposes a steep burden on researchers, who struggle with rejections of their manuscripts, potentially from multiple journals. They tolerate it because publication in peer-reviewed journals is linked to professional rewards, including hiring, promotion, and research funding (Merton, 1968;Ziman, 2000;Zuckerman & Merton, 1971). The continuing acceptance of the academic publishing system depends, at least in part, on the understanding that editors effectively sort manuscripts into outlets appropriate to their level of impact and that they help to improve manuscript quality (Ioannidis, 2018). Our results indicate that the journals we investigated distinguish, by and large, the quality of the studies they receive and publish and are therefore effective arbiters of scientific quality. Our results thus counsel for moderation in efforts to reform the system of academic publishing (Aarssen & Lortie, 2009;Grossman, 2014;Lortie et al., 2007). Previous studies of journals as arbiters of research impact have examined at most a few journals (Bornmann et al., 2011;Braun, Dióspatonyi, Zsindely, & Zádor, 2007;Siler et al., 2015;Zuckerman & Merton, 1971), whereas our study includes all 146 of the Web of Science-indexed journals in the domain of ecology. We therefore expect our results to apply widely across academic publishing.
Journals varied tremendously in the degree to which they screened out low-impact manuscripts and identified high-flyer manuscripts. On average, high-JIF journals were more likely to reject manuscripts that underperform (in citations) an average paper published in that same journal (Figure 3). They were also less likely to reject high-impact studies ( Figure 5). The success of high-JIF journals at identifying high-impact papers may be because they generally have large editorial boards, allowing them to assign papers to editors with the most appropriate experience for the papers being evaluated. Similarly, their prominence likely allows them to recruit higher profile editors and reviewers, the experience of whom presumably makes high-JIF journals better able to identify studies of great scientific merit. High-JIF journals likely also receive more papers that are inappropriately targeted to the wrong tier of journal, because of benefits accruing to authors that successfully publish in a high-impact journal (Ziman, 2000;Zuckerman & Merton, 1971 and not-for-profit publishers and assess gatekeeping effectiveness in terms of a journal's support for reproducible research (Ioannidis, 2005). Overall, our analyses strongly support the interpretation that high-JIF journals are effective, but not perfect, at discerning highimpact research from less significant research.
What is the role played by editors and reviewers in scientific gatekeeping? The gatekeeping ratio-the degree to which previously rejected papers underperform the average manuscript published in their final outlet-did not differ between editorial stages (i.e., preand postreview), but the rejection ratio-a measure of the ability to distinguish high-flying manuscripts-was greater before than after review (Figures 4 and 5). If editors were able to identify high-impact manuscripts before review and thus sent only the most promising manuscripts out for review, we would expect the rejection ratio to be negatively associated between editorial stages because there would be less variance in quality among manuscripts sent for review, making it more difficult to select among those manuscripts. We see no support for such a trade-off among journals (Appendix Figure A3).
In fact, there was a positive relationship; journals that had a high rejection ratio at the desk rejection stage also tended to have a high F I G U R E 5 Sensitivity to high-impact research, as measured through the "rejection ratio," increased with journal impact factor (JIF) and was greater before review (i.e., desk rejection; solid line and uptriangular points) than postreview (dashed line and downtriangular points). Each point represents one journal mean and is sized according to the number of manuscripts evaluated at each editorial stage. Regression lines are derived from a linear model, which was weighted by the number of manuscripts evaluated by each journal and are shown with 95% confidence intervals. Horizontal dotted line indicates a rejection ratio of 1.0, corresponding to journals that are equally likely to reject "highimpact" and "run-of-the-mill" manuscripts rejection ratio following review (major axis regression: p < 0.0001).
This suggests that editors and reviewers contribute differently to the gatekeeping role of journals. Editors desk reject the weakest papers, shape the remit of their journals by rejecting papers that are out of scope, and shepherd studies that they see as having great potential through peer review (Rowland, 2002;Smith, 1985). Reviewers, on the other hand, contribute by distinguishing among, and helping to improve, the mass of average-quality manuscripts (Bakanic, McPhail, & Simon, 1987;Goodman et al., 1994), but likely play a lesser role in distinguishing high-impact manuscripts from run-of-the-mill ones.
High-JIF journals desk reject a greater proportion of manuscripts (Appendix Figure A1) and more selectively discriminate between high-impact and run-of-the-mill papers than do less prominent journals ( Figure 5). Although the causality of this relationship is not easy to decipher, it suggests that the relative importance of editors in assessing manuscript impact increases with JIF, relative to that of reviewers, consistent with a recent simulation study (Esarey, 2017).
Author-reported rejection rates, assessed through questionnaires, underestimated true, journal-reported, rejection rates in our study (Appendix Figure A4). There are several possible explanations for this. First, questionnaires were only sent to authors whose research was published in journals indexed by Web of Science (WoS).
Thus, authors whose papers were published in nonindexed journals, or not published at all, were never sent questionnaires and are thus omitted from the current study. If we assume the rejected papers that were published in nonindexed journals or were never published are those of the lowest quality, then our analyses underestimate gatekeeper effectiveness, as the lowest impact factor papers are not included in our dataset. It is possible that this bias is greatest for low-JIF journals, as there are fewer WoS-indexed outlets of lower JIF available for manuscripts rejected from such journals. Additionally, authors whose manuscripts were rejected prior to publication may have been less likely to respond to the questionnaire than were authors of manuscripts that were accepted at one of the first journals they targeted.
We investigated the potential for nonresponse bias by evaluating the rate of response to our questionnaire as a function of the JIF of the publishing journal and whether the published paper gained more citations than the median for papers published in the same journal and year. Authors were more likely to respond to the questionnaire if their papers were published in higher JIF journals, but also if their paper gained fewer than the median number of citations (generalized linear mixed-effect model: p < 0.0001; Appendix Figure A4). These two patterns run counter to each other-higher response rates for higher impact papers among journals, but lower response rates for higher impact papers within journals-such that their effects on outcomes could largely counteract each other. There are two further reasons to expect that nonresponse bias was minor. First, only authors who were ultimately successful in publishing their manuscript in an indexed journal were contacted. Eventual publication lessens the sting of rejection (Clapham, 2016). Second, many authors described in their responses the difficulties they had experienced in the publication of their manuscript. Although we cannot confirm that nonresponse biases would have no effect on our results, such biases would need to be large to obscure the main effects we report; we are thus confident that the results we present are reliable.
Authors appeared receptive to the feedback received from journals, in that the great majority of resubmissions were sent to lower JIF journals (Figure 2). Controlling for publication year and rejecting journal, resubmitted manuscripts were less well cited than were papers accepted at the first journal they targeted, regardless of whether they were resubmitted to a higher or lower JIF journal (Figures 2   and 3). However, the degree of underperformance varied with the change in JIF between submissions; papers resubmitted to higher JIF journals tended to underperform most (Figure 2). These results are F I G U R E 6 The frequency of mistaken rejections per journal, as assessed by (a) the proportion of rejected manuscripts that attain more citations than the median article in the rejecting journal or (b) the proportion of rejected manuscripts that go on to be highimpact papers in the publishing journal. Each point represents one journal mean and is scaled according to the number of manuscripts evaluated. Mean estimates are shown with 95% confidence intervals, obtained by parametric bootstrapping. Note that in panel A, the y-axis is on a log scale, and points are slightly jittered to improve visualization not driven by the delay incurred in revision and resubmission, as our analysis controlled for publication year. Our finding is consistent with a study of chemistry manuscripts (Bornmann & Daniel, 2008), but stands in contrast to two recent studies that found that resubmitted manuscripts received more citations (Calcagno et al., 2012;Siler et al., 2015). The studies of Bornmann and Daniel (2008) Table A2). We are therefore confident that for a diverse set of journals, resubmitted manuscripts are less cited than are those accepted at the first targeted journal.
One possible explanation for the underperformance of previously rejected papers is that authors commonly target for their first submission journals with higher JIF than is warranted from the significance of their manuscript, leading to rejection (Figures 2 and   3). Moreover, when their paper is rejected, some authors persist in resubmitting to overly high-JIF journals, likely explaining why some manuscripts are repeatedly rejected (Figure 1). Such papers are then poorly cited in their final publication outlet, relative to submissions that did not submit to overly high-JIF journals and, thus, were published in the first journal to which they were submitted. This interpretation implies that some authors disregard the message sent by editors who declined their paper, either explicitly in editorial comments or implicitly by the decision made (i.e., that the quality or significance of the paper is not appropriate for such a high-JIF journal).
Many authors may also fail to revise their manuscripts following rejection in a manner that increases their quality or significance, which is unfortunate, given the evidence that peer review improves scientific articles (Bakanic et al., 1987;Goodman et al., 1994).
Our results suggest that researchers often submit their manuscripts to journals that for which they are unsuitable, but that editors and reviewers are generally good at identifying and rejecting such papers (Figures 3 and 4). Choosing the wrong outlet for one's scholarly manuscript, especially when editorial advice is ignored, can lead to repeated rejection and substantially delayed publication. It also imposes substantial costs on editors and reviewers, due to the need for repeated reconsideration of the same manuscripts. The amount of work involved in reviewing and rejecting a manuscript is similar to that of reviewing and accepting it (Rowland, 2002). In theory, researchers should know their study better than any editor or reviewer and thus should be uniquely well equipped to judge its quality. In reality, however, authors often appear to overestimate the significance of their own work (Wynder, Higgins, & Harris, 1990).
It also may be that the rewards accruing to authors who publish at a high-impact journal outweigh the costs (delays in publication if rejected). That many investigators persist in resubmitting to journal of similar impact factor despite multiple rejections suggests that both mechanisms are at work, contributing to the crisis of reviewing that confronts scientific publishing (Hochberg et al., 2009).
Reducing the number of low-quality manuscripts flung at high-JIF journals would make it less difficult to find peer reviewers (Fox, 2017;Fox, Albert, & Vines, 2017), accelerate scientific publishing, and potentially forestall a decline in the quality of published research (Higginson & Munafò, 2016).
Despite the well-publicized flaws in peer review-based academic publishing, our results show that journals are effective at identifying the most impactful research from the diversity of submissions that they receive and are therefore effective scientific gatekeepers.

ACK N OWLED G EM ENTS
The authors owe an enormous debt of thanks to many researchers who took the time to respond to our questionnaire. Samuel Bennett and Rachel Crockett assisted with questionnaire implementation.
We thank Lisa Haddow and Massimo Guinta for providing access to Thomson Reuters Article Match Service. Comments from Sabine Both, Sean Burns, Luc Bussière, Allyssa Kilanowski, Josiah Ritchey, and Boris Sauterey improved the manuscript.

C. E. T. Paine is an associate editor at Functional Ecology and at
Biotropica, and C. W. Fox is an executive editor at Functional Ecology. Both of these journals were examined in this study.

AUTH O R CO NTR I B UTI O N S
CETP and CWF conceived the study, and CETP collected the data and led the analyses with contributions from CWF. CETP and CWF wrote the manuscript together. Please answer the following questions with reference to the initial submission of the manuscript.

1.
Which was the FIRST journal where this manuscript was submitted?

2.
In what year did you initially submit this manuscript for publication?

3.
What was the outcome of your initial submission?

4.
How many peer reviews were performed on your initial submission? (free text)

5.
Rate the overall quality of the peer reviews performed on your initial submission: (Excellent, Good, Average, Poor, Terrible)

Please provide any further comments about your initial submission (Optional)
NOTE: Questions 1-6 repeated for every subsequent round of submission, until submissions were marked as "Accepted for publication without peer review." 7. Please rank the importance of factors that led you to select this/these scientific journal(s) to publish your study: Dear Dr $Last_Name: I am conducting a study of the fluxes of manuscripts among scientific journals in the biological sciences, and one of your articles has been selected: $Authors. $Year_Published}. $Title. $Publication_Name} $Volume:$Beginning_Page-$Ending_Page.
The questionnaire will inquire about each time you submitted the manuscript that led to this paper to a scientific journal. For the purposes of the questionnaire, please consider every round of review separately. In other words, if your manuscript went through multiple rounds of review at a single journal, report each of them separately.
Why participate? You could win one of three £50 Amazon gift certificates, to be awarded to three randomly selected participants at the end of the data-collection period. This study will generate recommendations to authors, editors and publishers to maximize the likelihood that any given manuscript will result in a highly cited paper. Your participation will help to increase the quality of submitted manuscripts and the peer-review process.

Privacy, independence and ethics
The data collected in this study include personal identifiers. Thus, it is essential to maintain the confidentiality of all participants. I take your privacy very seriously. I will be the only person with access to data that contains personal identifiers. The information you provide will be used solely for the purpose of this study. Any data that I share with collaborators will be stripped of all personally identifiable markers, such as names, addresses, institutions, and the titles of manuscripts. This study is independent of the journals and publishers. They have had no role in the study design or execution. Nor will they have any role in the analysis, reporting or interpretation of the results. I have obtained human-subjects ethical approval for this study from the University of Stirling. Please complete the questionnaire with one month, by Sunday, August 14th. If you have any questions or concerns about this study please contact me directly. Additionally, if you feel that one of your co-authors would be better able to complete the questionnaire, please forward this email to them.
Thank you very much for your cooperation!  F I G U R E A 4 Response rates of authors to the manuscript history questionnaire, as predicted by the impact factor of the publishing journal and whether the published paper obtained more citations than the median paper published in the same journal and year. Authors were more likely to respond to the questionnaire if their papers were published in higher JIF journals and gained fewer than the median number of citations. These contrasting patterns suggest that nonresponse bias was minor and therefore had little effect on our study We present the number of manuscripts examined, the overall rejection percentage, the percentage of rejection before review (desk rejection), and rejection after review, calculated from author responses to our survey. We also present the performance of each journal in terms of our four metrics scientific arbitration. Empty cells indicate journals for which too few data were available to calculate the relevant statistic

TA B LE A 2 St
at is t ic a l d et ails fo r a ll a n a l y s e s All p-values and confidence intervals for mixed-effect models were determined through parametric bootstrapping.