Factors influencing STEM researchers' data sharing behaviors



In modern research, scientific data sharing is essential to data-intensive science and scholarly communication. Scientific communities are making continuous efforts to promote scientific data sharing. Currently, however, data sharing is not well established across all science and engineering disciplines. The objective of this research is to investigate the factors that influence scientists' data sharing behaviors. Two theoretical perspectives, institutional theory and the theory of planned behavior, are employed in developing a conceptual model, which shows the complementary nature of the institutional and individual factors influencing scientists' data sharing behaviors. Institutional theory can explain the context in which individual scientists are acting, whereas the theory of planned behavior can explain the underlying motivations behind scientists' data sharing behaviors in an institutional context. This research will use a survey method to investigate the data sharing factors at individual and institutional levels. The findings from this study have the potential to accelerate scientific collaboration and further enable data-intensive scientific research. This research can provide useful guidelines for designing data sharing repositories, developing relevant policies for data sharing, and facilitating individual scientists' data sharing within different scientific communities.


Data sharing is a critical issue in modern scientific research with the emergence of e-Science or cyberinfrastructure. e-Science revolutionized the process of scientific discovery by enabling data-centric science, in which scientists share their data through technological development and collaborative effort (Hey et al. 2008). From the perspective of scholarly communication, primary data collected by individual scientists becomes an important “information currency” along with the research analyses and findings in traditional publications (Davis et al. 2007). As primary data becomes important in terms of data-intensive scientific research and scholarly communication, data sharing is now essential in most modern research activities.

In the last few decades, the science and engineering communities have made continuous efforts to promote scientists' data sharing, both to improve scholarly communication and eventually to realize the vision of data-centric scientific research. However, despite the continuous efforts of science funding agencies and science institutions, data sharing is not well established across science and engineering disciplines. Disciplinary traditions, institutional barriers, lack of technological infrastructure, intellectual property concerns, and individual perceptions prevent scientists from sharing their data with others. This research assumes that both institutional contexts and individual scientists' perceptions of data sharing significantly influence their data sharing behaviors.


Although data sharing is desirable according to scientific communities' norms of communalism and disinterestedness and can contribute to the advancement of scientific research, there is ample evidence that scientists nonetheless withhold their data rather than sharing it in popular science journals (Campbell et al. 2003; Cohen 1995; Piwowar 2011). Previous literature on scientists' data sharing and withholding has paid considerable attention to (1) the prevalence of data sharing and withholding, (2) the motivations behind and barriers to data sharing and withholding, and (3) the benefits and other consequences of data sharing and withholding (Campbell et al. 2002; Campbell et al. 1998; Campbell et al. 2000; Louis et al. 2002).

In particular, prior studies have examined diverse factors influencing scientists' data sharing and withholding. Those factors can be categorized into three groups: institutional factors (i.e. funding agency's policy (McCullough et al. 2008; Piwowar et al. 2008a), journal requirements (McCain 1995; Piwowar et al. 2008a; Piwowar et al. 2008b), and contract with industry sponsors (Louis et al. 2002)), IT resource factors (i.e. metadata (Bietz et al. 2010; Field et al. 2008; Hey et al. 2004; Karasti et al. 2010) and data repositories (Choudhury 2008; Witt 2008)), and individual factors (i.e. personal characteristics (Campbell et al. 2003; Campbell et al. 2002), perceived benefit (Kim 2007; Kankanhalli et al. 2005; Kling et al. 2003), perceived effort (Campbell et al. 2002; Louis et al. 2002; Tenopir et al. 2011), and perceived risk (Reidpath et al. 2001; Savage et al. 2009; Stanley et al. 1988)). In addition, other organizational and environmental factors have been studied as important influences on scientists' data sharing and withholding (Tenopir et al. 2011; Vogeli et al. 2006).


Drawing upon institutional theory (Scott 2001) and the theory of planned behavior (Ajzen 1991), this research proposes a conceptual model to investigate how both institutional and individual drivers influence scientists' data sharing behaviors. Scientists' data sharing behavior can be understood through the lens of individual motivation and institutions seeking organizational legitimacy. Institutional theory (Scott 2001) provides significant insights regarding the importance of institutional environments, including organizational rules, norms, and culture, for individuals' actions (Tolbert 1985; Tolbert et al. 1983). In contrast, the theory of planned behavior explains how individuals' attitudes, subjective norms, and perceived behavioral control influence their behaviors, mediated by intention (Ajzen 1991).

According to Scott (2001), institutions shape individuals' beliefs and behaviors, including non-rational behaviors. Individuals are embedded in institutional environments, which provide individuals with a basis for actions and shape individuals' behaviors (Powell 1991; Thornton et al. 2008). Scott's (2001) institutional theory posits that three kinds of institutional pressures influence behaviors: regulative, normative, and cultural-cognitive. Regulative pressure arises from the rules that an authoritative organization or actor sets for desirable behaviors of other organizations or its organizational members. Normative pressure refers to social obligation caused by collective expectations in a community. Lastly, cultural-cognitive pressure refers to the shared understanding of the world that is taken for granted and deeply embedded in communities.

The theory of reasoned action and its successor, the theory of planned behavior, are well-established social psychology theories that describe how salient beliefs influence behavioral intentions and subsequent behavior (Ajzen 1991; Fishbein et al. 1975). The theory of planned behavior explains an individual's behavior based on his or her behavioral intention, which is influenced by his or her attitude toward a behavior, perception of the subjective norms regarding that behavior, and perceived behavioral control. Behavioral intention refers to a person's aim to perform a particular behavior (Ajzen 1991). An attitude is a cognitive and emotional evaluation of an object (Ajzen 1991). A subjective norm is a person's belief that people who are important to him or her expect that he or she should or should not perform a particular behavior (Ajzen 1991). Perceived behavioral control is an individual's perception of his or her ability to perform a given behavior easily (Ajzen 1991). Each of the determinants of behavioral intention is in turn influenced by underlying belief structures such as behavioral, normative, and control beliefs (Ajzen 1991; Fishbein et al. 1975).
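The relationships described above are often summarized in a simple linear form. The following is only an illustrative sketch, not the model estimated in this research: the symbols $BI$ (behavioral intention), $A$ (attitude), $SN$ (subjective norm), $PBC$ (perceived behavioral control), and $B$ (behavior) are notation introduced here, and the weights $w$ are assumed to be estimated empirically for a given behavior and population.

```latex
\begin{align}
BI &= w_{A}\,A + w_{SN}\,SN + w_{PBC}\,PBC \\
B  &= f(BI,\ PBC)
\end{align}
```

The second line reflects the idea that behavior follows mainly from intention, with perceived behavioral control possibly having a direct effect as well. The function $f$ is left unspecified.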

Drawing on these theories and previous literature, this research identifies two groups of factors, individual influences and institutional influences, that influence scientists' data sharing behaviors. The research model shows the complementary nature of the institutional and individual factors influencing scientists' data sharing behaviors. In other words, the combination of the two theoretical perspectives provides an opportunity to examine scientists' data sharing behaviors from both individual and institutional perspectives. Institutional theory explains the context within which individual scientists are acting, whereas the theory of planned behavior explains the underlying motivations behind scientists' data sharing behaviors in an institutional context. Figure 1 below shows the proposed research model for scientists' data sharing behaviors.

Figure 1. Research Model for Scientists' Data Sharing Behaviors.

The multi-level model above shows how institutional and individual factors influence scientists' data sharing behaviors. For the institutional level factors, this research includes resource-facilitating conditions (i.e. institutional resources: data standard and repository), normative pressure, and regulative pressures from funding agencies and journal publishers. For the individual level factors, this research considers individual scientists' extrinsic motivations (i.e. perceived efforts, benefits, and risks) and intrinsic motivation (i.e. altruism), attitude, intention, and actual behavior of data sharing. Scientists' data sharing behaviors can be best explained by considering both institutional and individual level factors together, and this research can show the dynamic relationships between the institutional and individual factors that lead scientists to their decisions on data sharing.


This research will use a survey method to investigate the data sharing factors at individual and institutional levels. The theoretical framework will be translated into measurements of the constructs. The survey method can help to examine the constructs and hypothesized relationships of the scientists' data sharing model. By conducting the survey across scientific disciplines, this research can validate the proposed model of scientists' data sharing. The survey method can also produce more generalizable results about the factors influencing scientists' data sharing practice.

The target population of this research includes faculty members, post-doctoral researchers, and graduate student researchers in the U.S. The sampling frame can be identified from the scholar list in the Community of Science's (CoS) Pivot (http://pivot.cos.com), which provides a worldwide directory of researcher profiles, mainly from universities and colleges. The sample will be randomly selected from the list of scholars who are registered at U.S. academic institutions. Based on previous studies (Chan et al. 2004; Shin 2009) and statistical calculations for sample size (Iacobucci 2010; Keith 2005), the target sample size for this research is 250–300 scientists. An email message will be sent to the email addresses provided in the CoS Pivot profile directory. The online survey questionnaire will consist of a research introduction and purpose, specific questions to measure the constructs, and questions on respondents' demographic information.
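The random selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual sampling procedure: the sampling frame is a hypothetical list of placeholder email addresses (no real CoS Pivot data or interface is used), and the function name `draw_sample`, the frame size, and the seed are all invented for the example.

```python
import random


def draw_sample(frame, n, seed=None):
    """Draw a simple random sample of n entries without replacement.

    frame: list of sampling-frame entries (hypothetical placeholders here).
    seed:  optional seed so the draw can be reproduced and documented.
    """
    if n > len(frame):
        raise ValueError("sample size exceeds the size of the sampling frame")
    rng = random.Random(seed)  # independent generator, leaves global state alone
    return rng.sample(frame, n)


# Hypothetical frame standing in for scholars registered at U.S. institutions.
frame = [f"scholar_{i:04d}@example.edu" for i in range(5000)]

# 300 is the upper end of the 250-300 target stated in the text.
sample = draw_sample(frame, 300, seed=42)
```

Fixing the seed is a design choice that makes the draw repeatable. In practice the number of invitations would likely need to exceed the target sample size to allow for non-response.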

The majority of survey items will be adopted from previous studies and modified for this research. Before the actual survey, the survey items will be validated through a pre-test procedure with 10–15 scientists in different disciplines to ensure content validity, completeness, readability, and understandability. Well-developed items will support reliability in terms of test-retest stability and internal consistency. For construct validity, the items are adapted from the supporting literature. For content validity, a pilot study will be conducted to examine the questionnaire. Previous studies have already addressed internal validity, and the random sampling process should improve external validity. The survey method will help validate the new model and predict scientists' data sharing behavior.

The collected survey data will be analyzed using appropriate statistical methods, including multilevel structural equation modeling, confirmatory factor analysis, and Cronbach's alpha. The statistical analysis will show whether the proposed conceptual framework provides an acceptable fit to the empirical data.
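As an illustration of the internal consistency check, Cronbach's alpha for a set of k items can be computed as k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The sketch below uses only the Python standard library. The function name `cronbach_alpha` and the response data are hypothetical and are not drawn from this study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one construct.

    responses: list of rows, one per respondent; each row holds that
    respondent's scores on the k items measuring the construct.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical 5-point Likert responses: five respondents, three items.
example = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(example)
```

Sample variances are used throughout (dividing by n-1). Because the same denominator appears in the numerator and denominator of the ratio, using population variances would give the same alpha.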


This research is significant in terms of theory, method, research, and practice. From a theoretical perspective, the integration of institutional theory and individual motivation theory (i.e. the theory of planned behavior) can provide a new theoretical lens for understanding scientists' data sharing behaviors. From a methodological perspective, this research employs a survey method with multi-level analysis. From a research perspective, this research can provide valuable insights for the domains of scholarly communication, data curation, and data science. From a practical perspective, this research can help scientific communities by possibly accelerating scientists' data sharing as a part of their scientific collaborations, and eventually by enabling the vision of data-intensive scientific research.