Artificial intelligence to support publishing and peer review: A summary and review

Technology is being developed to support the peer review processes of journals, conferences, funders, universities, and national research evaluations. This literature and software summary discusses the partial or complete automation of several publishing-related tasks: suggesting appropriate journals for an article, providing quality control for submitted papers, finding suitable reviewers for submitted papers or grant proposals, reviewing, and review evaluation. It also discusses attempts to estimate article quality from peer review text and scores as well as from post-publication scores but not from bibliometric data. The literature and existing examples of working technology show that automation is useful for helping to find reviewers and there is good evidence that it can sometimes help with initial quality control of submitted manuscripts. Much other software supporting publishing and editorial work exists and is being used, but without published academic evaluations of its efficacy. The value of artificial intelligence (AI) to support reviewing has not been clearly demonstrated yet, however. Finally, whilst peer review text and scores can theoretically have value for post-publication research assessment, they are not yet widely enough available to be a practical evidence source for systematic automation.


INTRODUCTION
Within the academic publishing system, complex decisions are made tens of millions of times annually by highly trained researchers, raising the possibility that substantial efficiency improvements could be made by software that would automate or otherwise support them. The decision makers include authors (Is this manuscript written and formatted appropriately? Which journals are suitable publication venues for it?); editors (Is this submission plagiarised? Is it in scope? Does it have clear style/guideline/methodological flaws? Who are suitable reviewers for it?); reviewers (Does the submission contain clear errors? What is its quality? What changes should be recommended?); editors again (What decision should be made from the reviewer reports?); and research managers (How good is the published paper?). This review discusses existing software to support these tasks.
The time savings from automation are potentially huge because reviewing alone consumes a large amount of expert time. It was estimated that 'over 15 million hours' are spent on reviewing rejected papers each year (American Journal Experts, 2018). For example, 1.2 million manuscripts are submitted to 2300 Elsevier journals every year and only 30% (about 350,000) are published (Tedford, 2015). Due to increasing numbers of submissions and peer review workloads, a report from BioMed Central and Digital Science entitled 'What might peer review look like in 2030?' recommended to 'use technology to support and enhance the peer review process, including finding automated ways to identify inconsistencies that are difficult for reviewers to spot' (Burley & Moylan, 2017, p. 3). For instance, about 20% of biomedical researchers conducted '69% to 94% of the reviews' in 2015 (Kovanis et al., 2016).
This review discusses artificial intelligence (AI) and other software to fully or partially automate aspects of the academic publication process, including post-publication quality evaluation. It starts with software to recommend journals to authors, then covers programs for initial quality control of submitted manuscripts (e.g., plagiarism detection), followed by initiatives to suggest appropriate reviewers for papers that have not been desk rejected and then programs to conduct reviews or make recommendations from human reviewers' reports or scores. Finally, the review discusses software to estimate the quality of submitted articles or predict their future citation counts from peer review text or scores. Research quality assessment and citation impact prediction from bibliometric data is not covered here. Bias and transparency in technology-assisted assessment are also not in scope (all are covered in separate reviews), although software to detect bias and transparency within manuscripts is mentioned. This paper primarily summarizes existing software, including where there is no academic evidence of its capabilities, but also analyses relevant academic research in the minority of cases where it exists, primarily for reviewer identification and post-publication peer review.

IDENTIFYING SUITABLE JOURNALS FOR A SUBMISSION
Identifying a relevant journal or conference is a significant step towards publishing research because a good article might be rejected from an inappropriate journal or find a smaller audience.
Several publishers have developed web services to suggest journals for manuscripts, either for initial submissions or to redirect rejected manuscripts to another of the publisher's journals.
This software seems to be mainly based on comparing article texts (titles, abstracts, or keywords) against previously published articles. Current journal recommendation tools include Springer Nature Journal Suggester, Wiley Journal Finder, and IEEE Publication Recommender. EndNote Manuscript Matcher also uses a manuscript's title, abstract, and references with Web of Science data to suggest related journals for manuscripts. The Journal/Author Name Estimator (JANE) is another free service that uses text similarity scores and PubMed data to suggest the most relevant journals based on manuscript titles or abstracts. Similarly, JournalGuide from Research Square claims to recommend a journal from a title and abstract.
Various types of AI are used in these systems to match the subjects of manuscripts to related journals. For instance, Elsevier's JournalFinder service 'uses smart search technology and field-of-research specific vocabularies' to match papers to scientific journals; Taylor & Francis' Journal Suggester applies 'artificial intelligence to match the subjects covered in articles'; and the Sage Journal Selector utilizes 'an advanced AI technology' to recommend journals with similar published articles.
Several studies have shown that AI or machine learning can be useful to identify appropriate academic journals or conferences with relatively high accuracy for papers (e.g., Feng et al., 2019; Ghosal, Raj, et al., 2019; Pradhan & Pal, 2020; Wang et al., 2018). For instance, a recent experiment used the XGBoost [...] Central journals, reporting an accuracy of 87%. This was claimed to be much higher than that of Elsevier's Journal Finder and Springer's Journal Suggester tools (Feng et al., 2019). GraphConfRec has also been developed to recommend relevant computer science conferences based on paper text, co-authorship, and citation networks (Iana & Paulheim, 2021).
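The text-matching idea behind these recommenders can be illustrated with a minimal sketch. It assumes, purely for illustration, that each journal is represented by a short text profile built from its previously published abstracts and that manuscripts are ranked against those profiles by TF-IDF cosine similarity. The function names and journal profiles are hypothetical, and this is not the algorithm of any tool named above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (a dict of term weights) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency, so no term gets a zero weight.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0
           for term, df in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_journals(manuscript_text, journal_profiles, top_n=3):
    """Rank journals by the similarity of their profile text to the manuscript."""
    names = sorted(journal_profiles)
    vectors = tfidf_vectors(
        [journal_profiles[name] for name in names] + [manuscript_text]
    )
    query = vectors[-1]  # the manuscript is the last document
    scored = [(cosine(query, vec), name) for name, vec in zip(names, vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:top_n]]


# Hypothetical journal profiles, for illustration only.
example_profiles = {
    "Journal A": "gene expression genome sequencing protein cells",
    "Journal B": "citation analysis bibliometrics peer review scholarly publishing",
    "Journal C": "neural network image classification deep learning",
}
for name, score in suggest_journals(
    "automated peer review and citation analysis for scholarly journals",
    example_profiles,
):
    print(f"{name}: {score:.3f}")
```

A production system would presumably use much larger profiles and, as the studies above suggest, supervised models or citation and co-authorship networks rather than raw term overlap.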

Key points
• Artificial Intelligence in peer review is useful for helping to find reviewers.
• Artificial Intelligence can sometimes help with initial quality control for manuscripts submitted to academic journals.
• This article reviews the use of Artificial Intelligence to support academic peer review and publishing.

[...] Kim et al., 2018). Tools that can automate some tasks of editorial management or the peer review process include the following.
• Plagiarism detection: iThenticate detects partially or fully copied text in a manuscript by comparing it to other manuscripts in the system (Kalnins et al., 2015). A procedure to detect tortured phrases can identify plagiarism in the form of papers or sections of papers that have been processed by paraphrasing software or that have been translated from English to another language and back again. It identifies meaningless phrases that are non-idiomatic translations of scientific phrases, such as 'counterfeit consciousness' from 'artificial intelligence' (Cabanac et al., 2021).
• Robot author detection: A more recent threat is the use of AI based on large language models (LLMs) like ChatGPT to write parts of articles in a way that may be plausible but lacks deep understanding of the research issues (Khalil & Er, 2023). Software to detect ChatGPT-authored text, such as ZeroGPT, although not yet designed for manuscript reviewing, may help with this issue.
• Methods checking: SciScore generates an automated assessment of article methods on a scale of 1-10 and other reports (Design Analysis Reporting checklist and the Rigor and Transparency Index), assisting reviewers to find key information throughout a paper in a standard format (see also Menke et al., 2020). RobotReviewer attempts to detect study design problems and bias in randomized controlled trials.
• Automated statistical checking: StatReviewer checks manuscripts against standardized reporting guidelines and StatCheck detects some statistical errors in the submitted works (Nuijten & Polanin, 2020). It is possible to check the plausibility of some statistical results in papers by testing if the numbers reported for a test are theoretically capable of having been generated at the level of rounding reported.
• Transparency and reproducibility checking: Dimensions Research Integrity preCheck analyses submitted manuscripts for evidence of transparency and reproducibility, such as data access statements and naming the versions of software used.
• Manuscript structure checking: Penelope.ai checks if the structure of a manuscript meets a journal's submission guidelines for the title page, abstract, citation style, references, tables and figures and information about other sections of articles (e.g., funding, acknowledgements, keywords, and data/ethics statements). This reduces the need for manual checks by reviewers, publishers, or editors.
• Reference matching with in-text citations: Recite automatically checks and highlights if citations in the manuscript text match the reference list and vice versa.
• Multipurpose manuscript evaluation: Other AI-assisted tools assess multiple quality control aspects of manuscripts, such as Frontiers AI Review Assistant or UNSILO Manuscript Evaluation (see also Heaven, 2018). AuthorOne performs multiple surface level checks on manuscripts to see if they are written and formatted appropriately.
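The plausibility check mentioned under automated statistical checking, testing whether a reported number could have been generated at the reported level of rounding, can be sketched for the simplest case of a reported mean. The sketch assumes non-negative integer-valued responses (e.g., Likert items), so the sum of n responses must be a whole number; a check of this kind is sometimes called a granularity or GRIM test. The function name is hypothetical and this is not the procedure of StatCheck or any other tool listed above.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def mean_is_attainable(reported_mean: str, n: int) -> bool:
    """Return True if some whole-number total of n integer responses rounds
    to the reported mean at the number of decimals reported.

    The mean is passed as a string so that trailing zeros ("5.20") keep
    their meaning as reported precision.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    reported = Decimal(reported_mean)
    if reported < 0:
        raise ValueError("this sketch assumes non-negative responses")
    # Smallest step at the reported precision, e.g. 0.01 for two decimals.
    quantum = Decimal(1).scaleb(reported.as_tuple().exponent)
    approx_total = int(reported * n)
    # Only totals next to reported_mean * n can round back to the mean.
    for total in range(max(approx_total - 1, 0), approx_total + 3):
        exact_mean = Decimal(total) / Decimal(n)
        # Assumption: authors may have rounded half-up or half-to-even.
        for rounding in (ROUND_HALF_UP, ROUND_HALF_EVEN):
            if exact_mean.quantize(quantum, rounding=rounding) == reported:
                return True
    return False


# With 28 integer responses the possible means near 5.19 are 145/28 and
# 146/28, which round to 5.18 and 5.21, so a reported 5.19 is inconsistent.
print(mean_is_attainable("5.19", 28))
print(mean_is_attainable("5.18", 28))
```

A failed check only flags a value for human attention: typographical errors, unreported exclusions, or non-integer data can all explain an apparent inconsistency.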

IDENTIFYING APPROPRIATE REVIEWERS FOR A SUBMISSION
Selecting an appropriate set of reviewers for a submitted journal article, conference paper, or grant proposal is an important and challenging task. It requires knowledge of the skills needed to assess the work as well as the people that possess those skills.
The same is true for large-scale post-publication peer review exercises such as the UK's Research Excellence Framework (REF), because the 34 subpanel chairs (senior professors) must start the review process by assigning each output (185,594 for REF2021; REF, 2022) to at least two reviewers, an enormous task. Fully or partly automating this labour-intensive process might improve the overall match between subpanel reviewers and articles and save time. Editorial systems for publishers and some conferences already support this task by suggesting possible reviewers for submitted articles, perhaps based on references in the submitted outputs or by matching article keywords to the keywords of registered reviewers. Conference-based reviewer bidding systems are also sometimes used to allow reviewers to choose articles that they would like to review, with algorithms or human judgements selecting the final set of reviewers (Fiez et al., 2020).
Several programs have been developed by commercial publishers to help journal editors identify suitable reviewers, but these do not seem to have publicly available algorithm descriptions or evaluations.
The same principles that work to identify journal reviewers can also work for research grants. The Natural Science Foundation of China (NSFC) has developed an AI-assisted reviewer recommender for grant applications using natural language processing and an assignment decision support system to help select expert panels. An initial version of the AI system had chosen 'at least one member of each of nearly 44,000 panels that approved projects' in 2018, and the accuracy of the system was about 80% (Cyranoski, 2019, p. 317), although accuracy improvements are still being made (Liu et al., 2022). The system classifies the reviewers and proposals by discipline and uses information from scientific databases (e.g., Web of Science) and referee profiles in NSFC databases about the publication records or research projects of potential reviewers and then uses lexical semantic analysis to compare the extracted information with the grant applications. Different rules were used in the system to avoid conflicts of interest between reviewers and applicants (e.g., affiliation, co-authorship, project, and tutor-student relationships; Liu et al., 2016).
Automatic reviewer assignment is also possible for conferences. The Toronto Paper Matching System automatically suggested reviewer assignments for the NIPS 2010 conference using a topic modelling approach to estimate reviewers' expertise areas. The system extracts publication records from Google Scholar to generate profiles for reviewers and uses supervised score prediction models to suggest reviewer assignments (Charlin & Zemel, 2013). Several other studies have also suggested algorithms for the automatic assignment of reviewers to conference papers (e.g., Al Mahmud et al., 2018; Kalmukov, 2020; Li & Watanabe, 2013), mostly for AI-related conferences. No robust accuracy measures seem to have been generated for these systems yet, however. Presumably the ground truth for such a system would be human editor assignments or (in conferences that allow this) reviewer requests to review. A simulation platform has also been developed to generate artificial review threads, reducing peer review time by about 30% (Mrowinski et al., 2017).
Automation does not seem to have been used yet to find reviewers in post-publication peer review exercises like the REF, but the technologies described above could presumably achieve this quite easily, at least in journal-based fields.
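The two ingredients described in this section, matching reviewer expertise to a submission and applying rules that exclude conflicts of interest, can be combined in a short sketch. It assumes keyword sets for both reviewers and submissions, Jaccard overlap as the expertise score, and only two conflict rules (shared affiliation and prior co-authorship). All class names, fields, and data are hypothetical; this is not the NSFC system, the Toronto Paper Matching System, or any commercial tool.

```python
from dataclasses import dataclass, field


@dataclass
class Reviewer:
    name: str
    affiliation: str
    keywords: set
    coauthors: set = field(default_factory=set)


@dataclass
class Submission:
    title: str
    authors: set
    affiliations: set
    keywords: set


def jaccard(a, b):
    """Share of keywords common to both sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def has_conflict(reviewer, submission):
    """Simplified conflict rules: authorship, affiliation, co-authorship."""
    return (
        reviewer.name in submission.authors
        or reviewer.affiliation in submission.affiliations
        or bool(reviewer.coauthors & submission.authors)
    )


def suggest_reviewers(submission, reviewers, k=2):
    """Return up to k conflict-free reviewers with the best keyword overlap."""
    candidates = [
        (jaccard(r.keywords, submission.keywords), r.name)
        for r in reviewers
        if not has_conflict(r, submission)
    ]
    # Drop reviewers with no relevant expertise, then rank by overlap.
    candidates = [c for c in candidates if c[0] > 0]
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [name for _, name in candidates[:k]]


# Hypothetical data, for illustration only.
paper = Submission(
    title="AI for peer review",
    authors={"A. Author"},
    affiliations={"University X"},
    keywords={"peer review", "bibliometrics", "machine learning"},
)
pool = [
    Reviewer("R1", "University Y", {"peer review", "bibliometrics"}),
    Reviewer("R2", "University X", {"peer review", "bibliometrics", "machine learning"}),
    Reviewer("R3", "University Z", {"machine learning", "computer vision"}),
    Reviewer("R4", "University W", {"peer review"}, coauthors={"A. Author"}),
    Reviewer("R5", "University V", {"ecology"}),
]
print(suggest_reviewers(paper, pool))
```

The default of k=2 mirrors the REF requirement of at least two reviewers per output. A real system would also need to balance reviewer workloads across all submissions, which turns the task into a global assignment problem rather than an independent ranking per paper.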

REVIEWING A SUBMISSION
Conducting peer review can be a fruitful learning experience for the reviewer but is usually primarily considered by reviewers to be a necessary task to contribute to the scientific community (Zaharie & Osoian, 2016), in addition to its quality control, improvement, and educational roles (Resnik & Elmore, 2016). As such, any technological solutions to save time in peer review or associated editorial work could create an immediate tangible benefit to science.
Although some software has been developed to review papers or make editorial decisions, these tasks are challenging, with limited progress on a few topics so far.Whilst positive correlations between human and automated decisions have been generated, no current system challenges human reviewing yet.
The positive correlations between peer review judgements and machine learning found so far do not necessarily mean that further progress is likely soon, because an AI system could achieve a positive correlation simply by rejecting papers with obvious flaws, such as very poor grammar, insufficient length, or missing references.
ReviewAdvisor is a natural language processing toolkit designed to help select good manuscripts for a journal and provide feedback to help authors improve their submitted articles.
Whilst its performance on the authors' ASAP-review set of 28,119 machine learning conference paper reviews was weak, it provides a starting point and might help reviewers by suggesting comments on paper aspects that they may have overlooked.
Another study developed a neural network AI tool trained on features from papers including word frequencies, readability scores, and formatting measures, finding that automated systems developed unethical biases, such as against grammar and formatting errors, that helped them be more accurate (Checco et al., 2021).
Similarly, the pReview software package was developed for automatically generating summarization, contribution detection, writing quality analysis, and potential related works of academic papers to support reviewers (Roberts & Fisher, 2020). Another study found that natural language processing models to generate reviews for scientific papers could make the peer review task easier and more effective, but the reviews were not good enough to replace human experts (Yuan et al., 2021).
Review decision-making software makes recommendations without writing detailed comments. For example, a study used [...] or perhaps even strategies to help authors correct problems identified by reviewers (Lund et al., 2023). Nevertheless, reviewers normally agree to keep the manuscripts that they assess confidential; as such, papers under assessment should not be uploaded to LLMs because they may be saved and incorporated into responses to future questions for other users (Hosseini & Horbach, 2023), and uploading may even grant the LLM owner the right to repurpose content. Similarly, some authors may wish to avoid using LLMs to protect their ideas.

MAKING REVIEW DECISIONS FROM PEER REVIEW COMMENTS
After reviewers have submitted their recommendations and comments, the next stage is for the journal editor, grant award board, or conference organizers to make a final decision. Since it is common for reviewers to disagree and their recommendations might not align with their comments due to inadequate norm referencing, systems to automate final judgements might be helpful, especially if they can give reasons for their decisions that can be checked.
Based on a dataset of scientific peer reviews from PeerRead (https://paperswithcode.com/paper/a-dataset-of-peer-reviews-peerreadcollection), a deep learning network has been used to predict the acceptance or rejection of articles from peer review reports and to generate the final meta-review, finding 'good consistency between the recommended decisions and original decisions', with 74%-86% accuracy at predicting the binary decision accept/reject, which was better than standard machine learning algorithms and prior bespoke peer review judgement algorithms (Pradhan et al., 2021, p. 237). There is also evidence that sentiment analysis of review reports could be helpful to predict the final decision (acceptance or rejection) of conference papers (Chakraborty et al., 2020; Ghosal, Verma, et al., 2019; Wang & Wan, 2018) or review scores of funding programmes (Luo et al., 2021). For example, PeerJudge (http://sentistrength.wlv.ac.uk/PeerJudge.html) uses AI-assisted sentiment detection to estimate the strength of praise and criticism in peer review reports on academic papers that could be useful for editorial management decisions based on analysing a large number of review reports. PeerJudge can predict F1000Research reviewer decisions with a moderate degree of accuracy (Thelwall et al., 2020).
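The idea of estimating praise and criticism in review reports and using them to anticipate a decision can be illustrated with a deliberately simple lexicon-based sketch. The word lists, weights, and decision rule below are invented for illustration; they are not the PeerJudge lexicon or algorithm, and the deep learning approaches cited above learn such signals from data rather than from a fixed list.

```python
import re

# Hypothetical lexicons: term -> strength. Not taken from any published tool.
PRAISE = {"novel": 2, "clear": 1, "rigorous": 2, "convincing": 2,
          "interesting": 1, "thorough": 1}
CRITICISM = {"unclear": 2, "flawed": 3, "weak": 2, "missing": 1,
             "unconvincing": 2, "poorly": 2}


def praise_criticism(review_text):
    """Return (praise, criticism) strengths summed over matched terms."""
    tokens = re.findall(r"[a-z]+", review_text.lower())
    praise = sum(PRAISE.get(token, 0) for token in tokens)
    criticism = sum(CRITICISM.get(token, 0) for token in tokens)
    return praise, criticism


def predict_decision(reviews, margin=0):
    """Predict 'accept' if total praise exceeds total criticism by more
    than the margin, otherwise 'reject'."""
    total_praise = total_criticism = 0
    for review in reviews:
        praise, criticism = praise_criticism(review)
        total_praise += praise
        total_criticism += criticism
    return "accept" if total_praise - total_criticism > margin else "reject"


example_reviews = [
    "The method is novel and the writing is clear.",
    "Rigorous and convincing experiments.",
]
print(praise_criticism(example_reviews[0]), predict_decision(example_reviews))
```

A word list of this kind ignores negation ('not convincing') and field-specific vocabulary, which is one reason why only moderate accuracy should be expected from sentiment-based decision prediction.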

POST-PUBLICATION PEER REVIEW ANALYSIS
After the formal publication of research outputs, an increasing minority have peer review reports or post-publication comments associated with them. This can occur because the publishing journal or conference either mandates (e.g., BMJ, BMC Cancer) or allows (e.g., MDPI journals) peer review reports to be published, because reviewers published their reports elsewhere, or because other academics posted comments to the publishing website (e.g., BMJ, F1000Research) or a bespoke academic commenting website (e.g., PubPeer). There are some parallel initiatives to move to a publishing-first model in which preprints are first posted and then reviews are solicited either through editorial requests (e.g., F1000Research) or on a voluntary basis. Pre- and post-publication reviews may help universities to access formal peer review reports to 'improve and promote research excellence assessment' (Wilkinson & Down, 2018). This also raises the possibility that in the future the public availability of peer review comments will increase to the extent that automated processing of them could be useful in post-publication research assessment, such as the UK REF.
This section reviews related work in the hope that this will happen.
A technical problem with the automatic processing of peer review reports is that they are often difficult to parse for text mining because of their format, such as unstructured PDFs with extra text or commented copies of reviewed article PDFs or Word documents (e.g., Thelwall et al., 2022). A deeper problem is that most open reviews are from reviewers and address pre-final versions of the article, so it is not clear that they provide useful information about the final published article. Moreover, if public peer reviews are used in formal research assessments, then this might incentivize manipulation of them.

Sources of open peer review
There are several open review platforms sharing pre-publication reviews (formal reviews or editorial comments from the publishing journal) or post-publication comments (recommendations or feedback by researchers or experts) for scholarly publications.
Publons (publons.com) was an open review platform, now taken over by Clarivate Analytics, that claimed to include 'over 6.9-million reviews for more than 5,000 partnered journals' (https://publons.freshdesk.com/support/solutions/articles/12000012231-what-is-publons-and-why-partner-with-us-). Partner journals could share pre-publication reviews publicly on this site, with reviewers and authors deciding what information to reveal (e.g., review text, reviewer identities). In contrast, PubPeer (pubpeer.com) is an online open platform that focuses on post-publication peer review, where researchers can provide feedback or comments about published research and authors can respond. Reviews in PubPeer have identified mistakes published in leading cell biology journals, suggesting that post-publication reviews might be a helpful source for quality assessment of published research. A study of a sample of PubPeer comments about publications found that two-thirds were related to some type of misconduct (Ortega, 2022). ScienceOpen (scienceopen.com) combines publishing and promotion services for journals with a recommendation capability where other researchers or experts can write public reviews and use a five-star score about the 'importance', 'validity', 'comprehensibility', and 'completeness' of published research. Peer Community In (peercommunityin.org) is a free recommendation platform for preprints. It publishes peer reviews of preprints in 14 subject areas, including Ecology, Genomics, Animal Science, and Evolutionary Biology. The peer review process is managed by 1700 'Recommenders' making editorial decisions about public reviews.
The Multidisciplinary Digital Publishing Institute (MDPI) and several other publishers and journals provide an open peer review option, where authors can decide to publish their reviews and reviewers can choose to be named or remain anonymous (for a review see Wolfram et al., 2020). These provide a collective source of open peer review reports for the journals covered. This can be used for systemic quality control purposes. For example, a study of 45,385 open standard article reviews for 288 MDPI journals found large disciplinary differences in review lengths, reviewer anonymity, review outcomes, and the use of attachments. For instance, reviewers in the Physical Sciences are most likely to ask for major revisions and to use attachments in the review process, although they are less likely to disclose their identity. In the Life Sciences and Social Sciences fields, reviewers tend to write longer review reports than in the Physical Sciences (Thelwall, 2022). The rest of this subsection reviews research into a former and a current major peer review site to illustrate how this type of content might be helpful for research assessment systems, including systemic self-assessment.

Publons
Publons.com was a website to help academic reviewers record and display their peer review work and editorial roles. It served to promote peer reviewing by tracking it publicly so that academics could actively or passively use it as evidence of their reviewing contribution to science, which would otherwise be invisible. In August 2022, the website started to redirect to the Web of Science and, at the time of writing, its basic free services seemed to be no longer available. Nevertheless, several studies took advantage of the information it provided before August 2022 to investigate characteristics of peer review.
A study of 45,819 articles from Publons found low or insignificant correlations between bibliometric scores (e.g., WoS or Scopus citations) and Publons metrics (e.g., Quality, Significance and Overall Publons score of articles; Ortega, 2019; see also Ortega, 2017). For four small experimental groups of papers from Publons with neutral, negative, positive, and both negative and positive post-publication reviews, papers with positive reviews had significantly more citations (rho = 0.498, p < 0.05) while very low or non-significant associations were found between citation counts and other review polarities (Zong et al., 2020). Thus, peer review metrics might not be useful indicators of citation impact or, by extension (because the two correlate in many fields), research quality. The availability of open peer review reports varies substantially between journals and fields (Ortega, 2019), and between reviewer countries (Severin et al., 2021), further undermining its value as an input for AI.
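The rho statistic reported in studies of this kind is a rank correlation, which suits skewed citation counts better than a correlation of raw values. As a minimal sketch, assuming paired lists of review scores and citation counts for the same articles, Spearman's rho can be computed as the Pearson correlation of average ranks. The function names and numbers are hypothetical and are not data from the studies cited above.

```python
import math


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical post-publication review scores and citation counts.
review_scores = [3, 5, 4, 2, 5, 1]
citation_counts = [10, 40, 12, 15, 33, 2]
print(round(spearman_rho(review_scores, citation_counts), 3))
```

With small groups of papers, as in the study described above, a significance test and confidence interval would be needed before interpreting any such coefficient.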

Faculty opinions (formerly F1000Prime)
In contrast to the above, Faculty Opinions (facultyopinions.com, previously F1000 Prime) is a paywalled source of post-publication biomedical research reviews written 'by over 8,000 experts in the Life Sciences and Medicine'. Articles are classified based on contribution type, such as 'Good for Teaching', 'New Finding', 'Technical Advance,' or 'Interesting Hypothesis' and can be given one (Good), two (Very Good) or three (Exceptional) stars. The article recommendations are exclusively positive: 'Good' (58.6%), 'Very Good' (34.6%), and 'Exceptional' (6.9%) (Waltman & Costas, 2014).
Several investigations have reported significant positive associations between Faculty Opinions ratings and citation metrics, suggesting that they have value as quality or impact indicators (Bornmann, 2015; Bornmann & Leydesdorff, 2013, 2015; Du et al., 2016; Li & Thelwall, 2012; Mohammadi & Thelwall, 2013; Smith et al., 2019; Waltman & Costas, 2014), with one early exception (Wardle, 2010). [...] Wellcome Trust reviewer ratings and citation counts to articles 3 years after the reviews had been conducted. The Journal Impact Factor had the strongest association (0.625) with Wellcome Trust reviewer ratings, however (Allen et al., 2009).

Potential for artificial intelligence
Despite the research reviewed above, there do not seem to have been any AI analyses of post-publication peer review. This is presumably because it is scarce and AI needs a large amount of data to work well. If the number of post-publication peer review comments increases sufficiently then it would become possible to develop applications to help estimate the quality of published research through these post-publication comments. In this situation, the above results suggest that there will be disciplinary differences in the number of reviews and therefore probably the accuracy of future programs. Moreover, thought should be given to ethical considerations to avoid penalizing some sectors of society through algorithmic bias due to the gender and geographic imbalances mentioned above.

CONCLUSIONS
The evidence above shows that AI is useful for helping to find reviewers and the spread of this technology to other contexts where it is not yet used, such as REF reviewer assignments, is recommended. There is also evidence that AI can sometimes support the initial quality control of submitted manuscripts.
Although plagiarism detection is the obvious example, and is presumably widely used by publishers, statistical checking also seems useful and extending the capability of such software would be valuable. In contrast, there is insufficient evidence yet to use AI to support reviewing and it should not be used to replace human reviewers. Further testing of software for reviewing is important, however, perhaps for desk rejection of obviously poor papers.
Finally, whilst peer review text and scores are currently too sparsely available to support post-publication research assessment, further research will be helpful if this changes in the future.

AUTHOR CONTRIBUTIONS
Kayvan Kousha wrote the article. Mike Thelwall revised it.

• Clarivate's Reviewer Locator automatically suggests reviewers based on data from the Web of Science and Publons peer review databases and connects to the ScholarOne submission management system integrating editorial and peer-review processes.
• Reviewer Discovery is a tool from Aries Systems that uses