Disclosing quantitative RT‐PCR raw data during manuscript submission: a call for action

Accuracy and transparency of scientific data are becoming more and more relevant with the increasing concern regarding the evaluation of data reproducibility in many research areas. This concern is also true for quantifying coding and noncoding RNAs, with the remarkable increase in publications reporting RNA profiling and sequencing studies. To address the problem, we propose the following recommendations: (a) accurate documentation of experimental procedures in Materials and methods (and not only in the supplementary information, as many journals have a strict mandate for making Materials and methods as visible as possible in the main text); (b) submission of RT‐qPCR raw data for all experiments reported; and (c) adoption of a unified, simple format for submitted RT‐qPCR raw data. The Real‐time PCR Data Essential Spreadsheet Format (RDES) was created for this purpose.

Accurate quantification of coding and noncoding RNAs constitutes an integral part of a multi-component workflow used to establish differences in gene expression levels among samples in biomedical, agricultural, environmental, and industrial research [1]. Reverse transcription-quantitative polymerase chain reaction (RT-qPCR) lies at the core of this workflow and has become a ubiquitous method for gene expression analysis. A search of PubMed entries for the words 'quantitative PCR or real-time PCR' in either title or abstract identified about 22 000 papers for 2021 alone, corresponding to ~60 publications per day. The past 20 years have witnessed a persistent effort aimed at establishing technical parameters for reliable, reproducible, and biologically meaningful RT-qPCR experiments [2][3][4][5][6]. This resulted in the 2009 compilation of the minimum information for publication of quantitative real-time PCR experiments (MIQE) guidelines [7], as well as in generic requirements for evaluating assay performance, as described in several International Organization for Standardization documents (such as ISO 20395:2019 or ISO 17822:2020). MIQE defines the basic information that should be provided in publications and is necessary for evaluating the technical validity of published RT-qPCR experiments (especially essential ones such as biomarker development) [8].
However, the quality of most published RT-qPCR-based results remains inconsistent, resulting in varying levels of reproducibility and evident concern among researchers, clinicians, journal reviewers, and editors. For example, most published papers provide the reader with no information about RNA purity or integrity [9], RT-qPCR efficiency [2], detailed amplification conditions, or the rationale for the chosen normalization strategy. The MIQE guidelines were drafted by scientists to address these exact shortcomings for the benefit of scientists. Moreover, including the PCR efficiency when calculating target quantity, normalized gene expression, or fold-difference is essential for unbiased reporting of the results of RT-qPCR experiments [10]. However, reporting quantification cycle (Cq) and PCR efficiency values is insufficient to enable reviewers or readers of a paper to assess bias [11]. Evaluation of the validity of conclusions relying on RT-qPCR results can be considerably improved if reviewers and readers can examine the amplification curves on which the results were based.
The last two decades have witnessed an increasing concern regarding the evaluation of data reproducibility in many research areas [12]. The conclusions of many assessments, including the most extensively funded and coordinated, the 'Reproducibility Project: Cancer Biology' [13], were that more than half of the experiments under scrutiny could not be reproduced, either in part or in full [14]. Scientists readily acknowledge this to be a major issue. For example, a Nature online survey revealed that about 90% of respondents believed there is a reproducibility crisis in the peer-reviewed scientific literature, with two-thirds of respondents experiencing failure to repeat their own results [15,16]. However, in the case of RT-qPCR experiments, an essential source of the lack of reproducibility might be the failure to calculate efficiency-corrected results. Ignoring the assay-specific PCR efficiency and the inability to standardize the setting of the quantification threshold can lead to significant Cq-dependent biases in the reported absolute and relative results [17].
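To give a sense of the size of the bias that can arise when the assay-specific PCR efficiency is ignored, the following minimal sketch compares a fold-difference computed under the common assumption of perfect doubling per cycle with one computed using a measured amplification factor. It uses the widely known efficiency-corrected ratio (often referred to as the Pfaffl model), which is not necessarily the calculation used in the cited references. The function name, the Cq values, and the amplification factor of 1.8 are hypothetical and chosen only for illustration.

```python
def fold_difference(cq_target_control, cq_target_sample,
                    cq_ref_control, cq_ref_sample,
                    e_target=2.0, e_ref=2.0):
    """Efficiency-corrected expression ratio of sample versus control.

    e_target and e_ref are amplification factors per cycle
    (2.0 corresponds to a PCR efficiency of 100 %).
    """
    target_term = e_target ** (cq_target_control - cq_target_sample)
    ref_term = e_ref ** (cq_ref_control - cq_ref_sample)
    return target_term / ref_term


# Hypothetical Cq values: the target appears 3 cycles earlier in the sample,
# and the reference gene is unchanged.
uncorrected = fold_difference(25.0, 22.0, 18.0, 18.0)                # assumes E = 2
corrected = fold_difference(25.0, 22.0, 18.0, 18.0, e_target=1.8)    # measured E = 1.8

print(f"assuming E = 2.0: {uncorrected:.2f}-fold")            # 8.00-fold
print(f"using E = 1.8:    {corrected:.2f}-fold")              # 5.83-fold
print(f"overestimation:   {uncorrected / corrected:.2f}x")    # 1.37x
```

In this illustrative case, assuming perfect efficiency overstates the fold-difference by about 37%, and the discrepancy grows with the Cq difference, which is why the bias is Cq-dependent.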
One way to tackle this widespread scientific crisis is to start addressing the concerns associated with each component individually. As scientists with wide-ranging and extensive peer-reviewed work on the use of RT-qPCR in the biomedical sciences, we regard the submission of comprehensive RT-qPCR data as an essential and straightforward step toward addressing this reproducibility crisis. Therefore, we propose the following to authors, editors, reviewers, publishers, and publication integrity and ethics committees from the biomedical field, as well as to RT-qPCR equipment producers:
1 Transparent documentation of the whole experimental process, including factors such as specimen collection, extraction procedure, RNA quality, choice of reverse transcription strategy, oligonucleotide choice (and sequences), and reference gene justification in Materials and methods of a research article, as defined in the MIQE guidelines [7]. Where the laboratory procedures do not change significantly over time, this information, once collected, can be reused for several publications. Successful examples of minimum information that should be included when describing microarray or sequencing studies are the MIAME (Minimum Information About a Microarray Experiment) and the MINSEQE (Minimum Information About a Next-generation Sequencing Experiment), respectively. In addition, many indexed journals require specific raw data to comply with these standards at the time of submission.
2 Submission of all RT-qPCR raw data used to generate results reported in a manuscript, ideally at the time of submission to the journal. This will increase the quality of the review, allowing reviewers to assess datasets early on during the peer review process. Furthermore, it will allow editors and editorial staff to analyze data completeness and data integrity even before the initiation of peer review.
Alternative options could include requesting RT-qPCR raw data at the revision step or through preacceptance checklists. These options should be seriously examined by journals and incorporated into editorial workflows as soon as possible.
Technically, we envisage two options for making RT-qPCR data available. First, raw data could be directly submitted to the journal site. This requires that publishers have in-house storage capacity and the appropriate security systems to ensure confidentiality while editors and reviewers access the raw data to assess their quality, and to make the data publicly available only upon publication. This will also improve the transparency of data reporting for many journals. Although there is vast variability in the format of results, depending on the types of analyzed samples (from cell lines with abundant high-quality RNA to clinical samples from few diseased cells with minute amounts of low-quality RNA), we support the submission of RT-qPCR raw data for each experiment included in a specific manuscript. With the expansion of cloud data storage capabilities, and given that a 384-well PCR plate generates less than 1 MB of data (usually the amplification data are 150-400 KB and melting curve data 500 KB), we believe the data size issues will be insignificant. Second, dedicated data publishing platforms, such as Scientific Data (https://www.nature.com/sdata/), or repositories such as figshare (https://figshare.com/) and GitHub (https://github.com), can be used. Data deposited on such platforms could also be cited in the original paper, and the use of data repositories would overcome the need for each journal to create a searchable database of results. For sizeable experiments of at least 20 samples and 20 genes analyzed, RT-qPCR raw data could be deposited with a public database such as the Gene Expression Omnibus (GEO) database: https://www.ncbi.nlm.nih.gov/geo/info/geo_rtpcr.html.
3 Broad adoption of a simple format of RT-qPCR raw data for submission to (preferably all) biomedical journals. We present several options to achieve this (see Appendix S1 for the format description and for examples of the format): (a) The use of the Real-time PCR Data Essential Spreadsheet Format (RDES, https://rdml.org/rdes.html; Table 1). (b) The use of the Real-time PCR Data Markup Language (RDML), which was developed initially to enable the direct exchange of data and related information between RT-qPCR instruments and third-party data analysis software, between colleagues and collaborators, and between experimenters and journals or public repositories [18]. We further request that instrument manufacturers implement an option permitting the export of one of the formats in their software. (c) The submission of files according to the requests of the selected database. For example, GEO has specific submission guidelines at https://www.ncbi.nlm.nih.gov/geo/info/geo_rtpcr.html.
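As an illustration of how a simple tabular raw data file lends itself to automated completeness checks before peer review, the sketch below writes per-well fluorescence readings as tab-separated text and flags wells with missing readings. The column names (Well, Sample, Target, followed by one column per cycle), the function names, and the example values are hypothetical and are not the RDES specification; the actual column definitions are those given in Appendix S1 and at the RDES web page.

```python
import csv
import io


def write_raw_table(wells, n_cycles):
    """Serialize per-well fluorescence readings as tab-separated text.

    `wells` maps a well ID to (sample, target, [fluorescence per cycle]).
    The column layout is a hypothetical example, not the RDES definition.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    header = ["Well", "Sample", "Target"] + [str(c) for c in range(1, n_cycles + 1)]
    writer.writerow(header)
    for well_id, (sample, target, readings) in wells.items():
        writer.writerow([well_id, sample, target] + list(readings))
    return buf.getvalue()


def incomplete_wells(table_text, n_cycles):
    """Return IDs of wells lacking a numeric reading for every cycle."""
    rows = list(csv.reader(io.StringIO(table_text), delimiter="\t"))
    flagged = []
    for row in rows[1:]:  # skip the header row
        readings = row[3:]
        try:
            values = [float(v) for v in readings]
            complete = len(values) == n_cycles
        except ValueError:  # empty or non-numeric cell
            complete = False
        if not complete:
            flagged.append(row[0])
    return flagged


# Hypothetical three-cycle example; well A2 is missing its last reading.
wells = {
    "A1": ("control", "GAPDH", [0.1, 0.2, 0.4]),
    "A2": ("control", "GAPDH", [0.1, 0.2]),
}
table = write_raw_table(wells, n_cycles=3)
print(incomplete_wells(table, n_cycles=3))  # ['A2']
```

A check of this kind could, in principle, be run by editorial staff on any submitted spreadsheet-style file, regardless of the instrument that produced it.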
This coordinated effort between scientists, authors, editors, publishers, and equipment producers will pave the way for more data transparency and fewer erroneous published data. The development of simplified submission tools will also be helpful in the near future for raw data deposition from novel technologies that are currently expanding rapidly, such as digital PCR or CRISPR genetic screenings. This uncomplicated effort will enhance the quality of RT-qPCR nucleic acid analysis at a time when ever-increasing demands are being made regarding precision and throughput across the life science sector.

Conflict of interest
GAC is the scientific founder of Ithax Pharmaceuticals. AU drafted the RDES format and is a member of the RDML consortium. The other authors declare no conflict of interest.