Criteria for considering studies for this review
Types of studies
Studies of educational or policy interventions are usually conducted in natural settings where a true randomised controlled design may not always be feasible. This is probably even truer for the topic of research integrity, which has only relatively recently become the focus of scientific investigations. For these reasons, we will include in our review not only randomised controlled trials, but also non-randomised controlled trials, such as controlled before and after studies, interrupted time series and regression discontinuity designs. We will exclude observational (survey) data when there is no clear intervention or manipulation. We will also exclude study designs without a comparison group; we will examine quasi-experimental designs closely for threats to validity.
Studies will be included irrespective of publication status and language.
Types of data
We will include studies which measured the effects of one or more interventions, such as teaching, training, mentoring, checklists, screening and policy, on research integrity or responsible conduct of research in its broadest sense, including ‘questionable research practices’ and publication misconduct. We will consider interventions in any type of researcher or student, in all fields of research, including sciences, social sciences and humanities, and at any period of their research career. We will not evaluate the effectiveness of reporting guidelines for improving the presentation of research data, as this is not directly related to research integrity, but rather to the quality of reporting, and has been covered by other systematic reviews (e.g. Plint 2006; and a protocol by Moher et al. from March 2010).
Types of participants
The participants will include all stakeholders in the research and publication process, such as: 1) undergraduate students, who may or may not have an interest in becoming a researcher, if they receive intervention related to research integrity; 2) health workers involved with research; 3) researchers working at institutions or commercial research establishments; 4) authors, peer reviewers and editors of scholarly journals; 5) professional and/or research organisations; and 6) policy makers.
Types of interventions
Acceptable interventions include any direct or indirect procedure that may have an impact on research integrity, from direct educational interventions, such as a formal course or training required by institutions or authorities (such as training required by Institutional Review Boards/Ethics Committees), to indirect interventions, such as policy change (e.g. introduction of statements on conflict of interest or authorship contribution declarations in journals).
Types of methods
We will compare outcomes in the intervention versus a non-intervention group, or before versus after the intervention. We will assess the groups for baseline comparability on age, gender, educational/professional level and other relevant variables. We will include studies which we judge to have reasonable baseline comparability on the pre-test score, and to be similar in important demographic characteristics that might reasonably be thought to influence reaction to the intervention or otherwise affect outcomes.
Studies without a control group will not be included in the review.
Types of outcome measures
The basis for our classification of outcomes will be the four-level typology first described by Kirkpatrick (Kirkpatrick 1967) and modified by Barr et al (Barr 2000):
Level 1 outcomes refer to learners' reaction to the intervention, including participants' views of their learning experience and satisfaction with the programme.
Level 2a outcomes refer to modification of attitudes, perceptions or both regarding responsible conduct of research.
Level 2b outcomes refer to acquisition of knowledge, skills or both related to responsible conduct of research.
Level 3 outcomes refer to behavioural change transferred from the learning environment to the workplace prompted by modifications in attitudes or perceptions, or the application of newly acquired knowledge/skills in practice. We will further divide this level into:
3a – behavioural intentions; and
3b – actual change in research or publication practices, or both.
Level 4 outcomes refer to organisational changes attributable to the intervention.
Both outcomes at the individual level (e.g. individual behaviour change) and at aggregated units of analysis (e.g. frequency of retracted articles) will be included in our review.
There is no outcome measure or set of outcome measures that we could consider 'standard' for the purposes of this review as we expect a wide range of different outcomes to be found in included studies. Our intention is to classify them in a theoretically grounded way, so that we can meaningfully present the study results even if meta-analysis is not appropriate due to high heterogeneity of the studies. Kirkpatrick’s four-level model (Kirkpatrick 1967) is a standard approach in educational research (Barr 2000). As we will assess interventions aimed to reduce/prevent misconduct, we will consider actual change in behaviour, either on the individual level (3rd level in Kirkpatrick’s model) or organisational level (4th level) a hierarchically higher (or 'better', more desirable) outcome than participants’ satisfaction with an intervention (1st level). As 1st level outcomes are the easiest to assess, they are commonly used in educational research. However, they are the least informative and relevant, so we categorised them as secondary outcomes.
The inclusion of studies addressing perceptions and attitudes in the systematic review is based on Ajzen's theory of planned behaviour in which the main predictors of behavioural intentions are attitudes towards the behaviour, subjective norms and perceived behavioural control (Ajzen 2005; Armitage 2001).
The primary outcome will be change in research practice. Ideally, this would be assessed by: 1) organisational change or 2) change in research and publication practice. However, we will also consider other outcomes, in keeping with Ajzen's theory (Ajzen 2005). Those include but are not restricted to 3) change in knowledge and/or skills, 4) change in behavioural intentions, and 5) change in attitudes, perceptions or both (e.g. subjective norms and perceived behavioural control).
The secondary outcome will be the level of participants' satisfaction with the intervention.
Search methods for identification of studies
As the concepts of research integrity and responsible conduct of research emerged in the scientific community only after the establishment and active work of the Office for Research Integrity (ORI) in the USA in 1989 (Steneck 2006) and in Denmark in 1992 (Nylenna 1999), we will limit our search to 1990 to present.
We will search the following.
The Cochrane Central Register of Controlled Trials (CENTRAL) (The Cochrane Library, current issue)
MEDLINE via OVID
LILACS via BIREME
CINAHL via EBSCO
We will also search other specialised or general electronic databases.
Academic Search™ Complete – multi-disciplinary full-text bibliographical database from the EBSCO Publishing platform (http://www.ebscohost.com/academic/academic-search-complete).
Agricola, multidisciplinary database from the US National Agricultural Library, available via the OVID platform (http://www.ovid.com/site/catalog/DataBase/9.jsp?top=2&mid=3&bottom=7&subsection=10).
GeoRef – database from the American Geological Institute, available via the EBSCO Publishing platform (http://www.ebscohost.com/academic/georef).
PsycINFO® – database from the American Psychological Association, available via the OVID platform (http://www.ovid.com/site/catalog/DataBase/139.jsp?top=2&mid=3&bottom=7&subsection=10).
ERIC – database of education literature, via EBSCO platform (http://www.eric.ed.gov/).
SCOPUS – citation database from Elsevier (http://www.info.sciverse.com/scopus/scopus-in-detail/facts).
Web of Science (WoS) – citation database from Thomson Reuters (http://thomsonreuters.com/products_services/science/science_products/a-z/web_of_science).
We will not separately search the EMBASE bibliographical database because SCOPUS includes EMBASE data (Burnham 2006).
For the identification of studies included or considered for this review, we will develop detailed search strategies for each database searched. These will be based on the search strategy developed for MEDLINE but revised appropriately for each database to take account of differences in controlled vocabulary and syntax rules.
The subject search will use a combination of controlled vocabulary and free-text terms based on the search strategy for searching MEDLINE (Appendix 1).
Searching other resources
We will search conference proceedings and abstracts in the following resources.
Research presented at Office of Research Integrity (ORI) Research Integrity Conferences (http://ori.dhhs.gov/conferences/past_conf.shtml).
Peer Review Congresses (http://www.ama-assn.org/public/peer/program_2009.html and http://www.ama-assn.org/public/peer/previous.html).
Public Responsibility in Medicine and Research Conferences (http://www.primr.org/Conferences.aspx?id=56).
Research Ethics site (http://researchethics.ca/).
We will also search the Institute of Medicine book on promoting research integrity through education (Institute of Medicine 2002).
We will also search publications from ORI funded research, listed at: http://ori.dhhs.gov/research/extra/rri_publications.shtml.
We will handsearch the electronic tables of contents of the following journals that regularly publish on research integrity topics.
Journal of Empirical Research on Human Research Ethics (available online from volume 1 in 2006).
Science and Engineering Ethics (available online from volume 1 in 1995).
Accountability in Research (available online from volume 1 in 1989).
Ethics and Behavior (available online from volume 1 in 1991).
Journal of Higher Education (last available volume: 2002).
Journal of Medical Ethics (available online from volume 1 in 1975).
Academic Medicine (available online from volume 1 in 1926).
Medical Education (available online from volume 1 in 1966).
Medical Teacher (available online from volume 1 in 1979).
Teaching and Learning in Medicine (available online from volume 1 in 1989).
Professional Ethics: A Multidisciplinary Journal (available online from 1992; merged in 2004 with Business and Professional Ethics, which is available online since 1981).
American Psychologist (available online from volume 1 in 1946).
Journal of Business Ethics (available online from volume 16 in 1997).
Journal of Academic Ethics (available online from volume 1 in 2003).
We will search the references of all the included studies, other reviews, guidelines and related articles using both 'forward' (through citation databases such as Web of Science) and 'backward' (examining reference lists) citation searching (Horsley 2011).
For abstracts whose results cannot be confirmed in subsequent publications, we will contact authors to collect the required data and details of unpublished studies (Young 2011).
Data collection and analysis
Selection of studies
At least two members of the review team will carry out selection of articles and decisions about eligibility for inclusion independently. If the relevance of an article cannot be ascertained based on its title and abstract, we will obtain and review the full text and make a decision on inclusion. All disagreements will be resolved by discussion and consensus.
We will translate studies into English when necessary.
Data extraction and management
At least two members of the review team will independently extract data. Two review authors will compare the two sets of extracted data against each other, and identify any disagreements, which will then be resolved by consensus. The review authors will not be blinded to the authors, interventions or results obtained in the included studies.
We will extract and enter the following data, if appropriate and available, in a customised collection form:
Study design (e.g. randomised controlled trial, cohort study, controlled before and after study), date and length of follow-up.
Participants: inclusion and exclusion criteria, and demographic characteristics such as age, sex/gender, country of origin, ethnicity, field of research, academic level or research experience.
Setting: type of institution or broader setting where the intervention(s) took place.
Intervention: details on the type and duration of intervention and comparison (quality of implementation, was the trainer/teacher/interventionist a researcher on the study).
Outcome: detailed description of the outcomes of interest, including the method and timing of measurement.
We will extract results for pre-specified outcomes of interest. If any relevant outcome not pre-specified in this protocol appears in any of the included studies, we will record it. We will extract the raw data (means and standard deviations for continuous outcomes and number of events and participants for dichotomous outcomes) for outcomes of interest.
We will design the data extraction form for this review and pilot it before use. When needed, coding instructions will accompany the data extraction form. If several articles report different outcomes of the same study, we will consider them a single entry in the data extraction form. In cases of studies reporting both preliminary and final results, only the latest or most complete report will be included.
Assessment of risk of bias in included studies
For randomised controlled trials, we will carry out assessment of risk of bias using The Cochrane Collaboration's 'Risk of bias' tool, which addresses the following domains: sequence generation, allocation sequence concealment, blinding, incomplete outcome data, selective outcome reporting and other sources of bias. Since blinding of the study participants for the interventions of interest may not be realistic, we will give primary consideration to the blinding of the outcome assessors.
For the assessment of non-randomised studies, we will use a modified Cochrane 'Risk of bias' tool. We will use four of the six original domains (blinding, incomplete outcome data, selective outcome reporting, other sources of bias) and two additional domains (comparability of groups and confounding factors) to assess the risk of bias in the included non-randomised studies. For the two additional domains we will use the following questions for assessment: "Were the study groups comparable at baseline?" and "Were potential confounding factors adequately addressed?". As recommended in Chapter 13 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011), we will collect factual information on the confounders considered in the included studies and report them in a table. We will not assess risk of bias on the domains of sequence generation and concealment of the allocation sequence, as a high risk on these domains is inherent in the design of non-randomised studies and therefore expected by default.
For all study designs, we will assess compliance with the intervention and possible contamination (spillover effect) between the groups under the domain of other sources of bias. For studies with missing information relevant to the risk of bias, we will follow the method suggested by Young and Hopewell (Young 2011).
We will record each piece of information extracted for the 'Risk of bias' tool together with the precise source of this information. We will test data collection forms and assessments of the risk of bias on a pilot sample of articles. The assessors will not be blinded to the names of the authors, institutions, journal or results of a study. At least two assessors will carry out the assessment of risk of bias independently. If any piece of information important for the assessment of risk of bias is missing in the included reports, we will make attempts to contact the study investigators and obtain the needed information by use of open-ended questions.
We will tabulate risk of bias for each included study, along with a judgement of low, high or unclear risk of bias, as described in Chapter 8 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011).
Measures of the effect of the methods
As a starting point, we will identify the data type of each outcome measurement. We expect the measures of intervention effect to be mostly continuous. We also expect studies to use different scales to measure outcomes, so we will use standardised mean differences (SMD) with 95% confidence intervals. For outcomes reported as frequencies or proportions, we will first convert them to odds ratios and then re-express these as SMDs. The SMD expresses the size of the intervention effect in each study relative to the variability observed in that study. We will calculate the SMD by dividing the mean difference (MD) by the pooled standard deviation of the outcome among participants.
If results of included studies are reported as dichotomous data, we will express them as risk ratio (RR) with corresponding 95% confidence interval (95% CI). If it seems sensible, we will convert dichotomous data to SMDs and combine them with the continuous data.
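As an illustration, the two conversions described above — the SMD from continuous summary data, and re-expressing a dichotomous result on the SMD scale via the log odds ratio (a standard Cochrane Handbook approach) — can be sketched in Python; the function names and example numbers below are our own, not taken from any included study:

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of the two groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))

def smd_continuous(mean1, sd1, n1, mean2, sd2, n2):
    """SMD (Cohen's d): mean difference divided by the pooled SD."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)

def smd_from_log_odds_ratio(log_or):
    """Re-express a log odds ratio on the SMD scale: SMD = ln(OR) * sqrt(3) / pi."""
    return log_or * math.sqrt(3) / math.pi

# Hypothetical example: intervention mean 12 (SD 4, n 30) vs control mean 10 (SD 4, n 30)
d = smd_continuous(12, 4, 30, 10, 4, 30)  # -> 0.5
```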
Unit of analysis issues
The unit of analysis will be individual study participants. Where the unit of assignment to condition is not the individual, but some larger entity (e.g. a department or an institution) the unit of analysis will be the larger entity.
Dealing with missing data
For continuous outcomes, we will calculate the MD or SMD based on the number of participants analysed at the time point. If the number of participants analysed is not presented for each time point, we will use the number of participants in each group at baseline. For missing summary data, we will use the available data to calculate relevant data, such as calculating missing standard deviations from other statistics (standard errors, confidence intervals or P values), according to the methods recommended in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). Whenever possible, we will contact the original investigators to request missing data.
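The back-calculation of a missing standard deviation from a reported standard error or confidence interval follows directly from the defining formulae in the Handbook; a minimal sketch (function names ours, assuming the CI is for a single mean):

```python
import math

def sd_from_se(se, n):
    """SD from the standard error of a mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)

def sd_from_ci(lower, upper, n, z=1.96):
    """SD from a 95% CI of a mean: SD = sqrt(n) * (upper - lower) / (2 * z)."""
    return math.sqrt(n) * (upper - lower) / (2 * z)
```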
For studies with missing data, we will contact the authors to request the data (Young 2011). The possible impact of missing data that could not be recovered will be addressed in the discussion of our review.
Assessment of heterogeneity
Prior to meta-analysis, we will assess studies for clinical and methodological heterogeneity, that is, variability in the participants, interventions and outcomes. Considering the review question, we expect a high level of heterogeneity. We will test statistical heterogeneity with the Chi² (Q) test and quantify it using Tau², Tau and the I² statistic. Assuming that we have sufficient statistical power to reasonably use the Chi² test, we will interpret a P value < 0.10 as indicating significant statistical heterogeneity. To assess and quantify the possible magnitude of inconsistency (i.e. heterogeneity) across studies, we will (1) use the I² statistic, with a rough guide for interpretation as follows: 0% to 40% might not be important; 30% to 60% may represent moderate heterogeneity; 50% to 90% may represent substantial heterogeneity; and 75% to 100% considerable heterogeneity; and (2) construct predictive intervals around the mean overall effect, based on the value of Tau.
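These quantities can be computed directly from the study effect estimates and their variances; the following sketch uses the standard Q and I² formulae and the DerSimonian–Laird estimator for Tau² (function name and data illustrative, not prescribed by this protocol):

```python
def heterogeneity(effects, variances):
    """Return Q, I2 (as %) and DerSimonian-Laird Tau2 for study effects and variances."""
    w = [1.0 / v for v in variances]              # inverse-variance weights
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, i2, tau2
```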
Assessment of reporting biases
We will examine within-study selective outcome reporting as a part of the overall 'Risk of bias' assessment. As the studies of interest are not clinical trials, and some of them may not be trials at all, we do not expect to find their published protocols.
We will compare outcomes listed in the methods sections of the articles against those in the results section. If there are at least 10 studies included in the review, we will create a funnel plot of effect estimates against their standard errors. If an asymmetry of the funnel plot is found either by visual inspection (Palmer 2008; Peters 2008) or statistical tests (Egger 1997; Harbord 2006), we will consider possible explanations and take these into account in the interpretation of the overall estimate of treatment effects (Sterne 2011). We will perform publication bias analyses using STATA, Comprehensive Meta Analysis (CMA) or both.
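Regression tests for funnel-plot asymmetry such as Egger's regress the standardised effect on precision and examine the intercept; a minimal sketch of that intercept calculation is shown below (names ours, and not a substitute for the STATA/CMA routines mentioned above, which also provide standard errors and P values):

```python
def egger_intercept(effects, standard_errors):
    """Intercept of the regression of (effect/SE) on (1/SE); values far from
    zero suggest funnel-plot asymmetry (the basis of Egger's test)."""
    z = [e / s for e, s in zip(effects, standard_errors)]   # standardised effects
    prec = [1.0 / s for s in standard_errors]               # precisions
    n = len(z)
    mz, mp = sum(z) / n, sum(prec) / n
    slope = (sum((p - mp) * (zi - mz) for p, zi in zip(prec, z))
             / sum((p - mp) ** 2 for p in prec))
    return mz - slope * mp
```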
Data synthesis
Meta-analyses will include only studies reporting the same outcomes, analysed separately for randomised controlled trials and non-randomised studies. As purely narrative summaries may sometimes be misleading and quantitative synthesis may be preferable to qualitative interpretation of the results (Ioannidis 2008), we will try to find the broadest common denominators among the included studies to statistically pool their findings. This would include combining all intervention versus non-intervention groups and all before versus after study groups. Regarding the time elapsed after the intervention, we will consider ≤ 6 months, ≤ 12 months and > 12 months as the most relevant time categories.
Studies that are judged as considerably different (heterogeneous) from others and those without sufficient data to allow inclusion in meta-analyses will be displayed in a separate forest plot or series of forest plots without a summary line. The decision about the inclusion in meta-analyses will not be based on whether the findings were significant or not.
We will perform a qualitative content analysis of interventions used in the included studies and present a typology of interventions to prevent misconduct and promote integrity in research and publication. Two authors will code and categorise interventions independently and the final typology will be agreed among all authors.
Subgroup analysis and investigation of heterogeneity
If enough data are available (Valentine 2010) we will perform the following subgroup analyses:
healthcare researchers only versus other researchers;
early-career researchers versus experienced researchers;
developed country settings versus developing country settings;
educational intervention versus policy intervention.
We will examine the 'Risk of bias' results for each study. If there are sufficient studies with low overall risk of bias, we will perform meta-analyses on these studies first. If insufficient studies exist with low overall risk of bias, we will determine the number of studies with low risk of bias on the following domains: selective reporting, incomplete outcome data and other biases. If there are a sufficient number of such studies we will meta-analyse them first. We will then perform sensitivity analysis to assess how the results of meta-analyses might be affected if studies with unclear or high risk of bias in these domains are included.
If the analysis of heterogeneity finds one or two outlying studies with results that conflict with the rest of the studies, we will perform sensitivity analysis to assess their effect on the results of meta-analyses. However, we will perform and interpret this sensitivity analysis with caution, as considerable heterogeneity is expected among all of the studies included in the meta-analyses. Alternatively, we will perform an analysis where we sequentially leave one study out at a time and redo the analyses, to see if any of the studies have an undue influence on the results (Fanelli 2009).
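The leave-one-out procedure described above re-pools the effect k times, omitting a different study each time; a sketch assuming a fixed-effect inverse-variance pool (function names ours — in practice the random-effects model would be re-fitted at each step):

```python
def pool(effects, variances):
    """Fixed-effect inverse-variance pooled estimate."""
    w = [1.0 / v for v in variances]
    return sum(wi * e for wi, e in zip(w, effects)) / sum(w)

def leave_one_out(effects, variances):
    """Pooled estimate recomputed with each study omitted in turn."""
    return [pool(effects[:i] + effects[i + 1:], variances[:i] + variances[i + 1:])
            for i in range(len(effects))]
```

A result that shifts markedly when one particular study is dropped flags that study as unduly influential.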
We will also perform sensitivity analysis to test how different assumptions about the missing data may affect the results. We will also assess publication bias as described in the section on assessment of reporting biases.