Behavior in cheating paradigms is linked to overall approval rates of crowdworkers

Published in: Journal of Behavioral Decision Making
DOI: 10.1002/bdm.2195
Publication date: 2021
Document version: Publisher's PDF, also known as Version of Record
Document license: CC BY
Citation (APA): Schild, C., Lilleholt, L., & Zettler, I. (2021). Behavior in cheating paradigms is linked to overall approval rates of crowdworkers. Journal of Behavioral Decision Making, 34(2), 157-166. https://doi.org/10.1002/bdm.2195


| INTRODUCTION AND THEORETICAL BACKGROUND
Individuals and societies are constantly affected by dishonest and fraudulent behavior. Dishonesty comes in many forms, including recent large-scale examples such as the emissions cheating scandal (Volkswagen, 2015), money laundering (Danske Bank, 2018), and systematic college admission fraud (Thelin, 2019). Irrespective of the kind of dishonest behavior, most acts of dishonesty undermine interpersonal and/or societal well-functioning and can have tremendous negative consequences (Del Monte & Papagni, 2001; Gyimah-Brempong, 2002; Judge, McNatt, & Xu, 2011; Mo, 2001).
In recent years, many studies investigating the occurrence and extent of dishonesty as well as its predictors, correlates, and consequences have used (variants of) a few well-established cheating paradigms, which are conceptually quite similar to each other. In a recent meta-analysis on dishonesty, for instance, Gerlach, Teodorescu, and Hertwig (2019) considered (variants of) four different cheating paradigms, namely, the coin flip paradigm (Bucciol & Piovesan, 2011), the die roll paradigm (Fischbacher & Föllmi-Heusi, 2013), the matrix task (Mazar, Amir, & Ariely, 2008), and the sender-receiver game (Gneezy, 2005). In each of these paradigms, participants have the opportunity to act dishonestly in order to obtain an incentive.
In the coin flip paradigm, for instance, participants are asked to flip a coin in private and to report their outcome (i.e., "heads" or "tails"). Typically, the report of a specific outcome is incentivized (e.g., a participant earns $1 for reporting "heads"), making it possible for a participant to misreport their outcome in order to obtain the specified incentive. Other cheating paradigms follow a similar logic; that is, participants are given a chance to misreport the outcome of an event in a highly anonymous setting in order to obtain an incentive (or to avoid losing an advantage). Importantly, in such paradigms, it is typically not recorded (and, thus, not known) whether any specific individual has cheated or not. 1 Rather, researchers draw conclusions about the proportion of dishonest individuals (and characteristics of these) by comparing the number of alleged wins for the whole sample with the stochastic baseline of winning (e.g., in the coin flip paradigm with one trial, the stochastic baseline of winning is 50%; for more details, see Moshagen & Hilbig, 2017).
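As a toy illustration of this aggregate logic (a sketch, not code from the article), the share of dishonest respondents can be backed out from the observed win rate under the simplifying assumption that every cheater claims a win and honest respondents win at the chance baseline:

```python
def estimate_cheater_proportion(observed_win_rate: float, baseline: float) -> float:
    """Back out the proportion d of dishonest respondents from an observed
    win rate, assuming every cheater reports a win and honest respondents
    win at the chance baseline:
        observed = d + (1 - d) * baseline
    =>  d = (observed - baseline) / (1 - baseline)
    """
    return (observed_win_rate - baseline) / (1 - baseline)

# Hypothetical coin flip sample: 62% report "heads" against a 50% baseline.
d = estimate_cheater_proportion(0.62, 0.50)
print(round(d, 2))  # -> 0.24
```

Note that this identifies only the sample-level proportion of cheaters; no individual respondent can be classified, which is exactly the power limitation discussed below.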
Clearly, investigating dishonest behavior under such controlled and anonymized conditions has many advantages (e.g., protecting participants' anonymity, which should also reduce socially desirable reporting of outcomes). At the same time, one might question whether people's behavior in such paradigms is externally valid and transferable to (larger-scale) real-life behavior (i.e., actual behavior outside a lab or online research studies). Surprisingly, though, how people's behavior in such cheating paradigms relates to socially questionable real-world behavior 2 has hardly been investigated to date.
Indeed, to the best of our knowledge, only six studies have so far investigated how behavior in cheating paradigms relates to socially questionable real-world behavior (see Table 1). Therein, behavior in cheating paradigms has been linked to classroom misbehavior in schools (Cohn & Maréchal, 2018), offenses against prison regulation among inmates (Cohn, Maréchal, & Noll, 2015), fare dodging in public transport (Dai, Galeotti, & Villeval, 2017), absence from work among nurses (Hanna & Wang, 2017), fraudulent salesmen behavior (Kröll & Rustagi, 2016), and nonreporting of overpayment (Potters & Stoop, 2016). Overall, these findings are clearly in line with personality trait theory (Allport, 1961), which assumes that individuals have rather stable personality characteristics that influence behaviors, thoughts, and emotions across different contexts; that is, next to situational characteristics affecting the occurrence and/or extent of certain behavior, personality trait theory predicts that some individuals are generally more likely than others to engage in certain behaviors (such as socially questionable behavior) and that this increased likelihood of engaging in a certain kind of behavior can be observed across different contexts. With regard to socially questionable behavior, this assumption is well supported by meta-analytic evidence. For instance, Zettler, Thielmann, Hilbig, and Moshagen (2020) recently found that people with rather low levels in the personality dimension of Honesty-Humility tend to show not only more cheating/dishonesty but also, among other things, more aggression, antisocial behavior, counterproductive behavior, or criminality/delinquency.
Although all of the studies mentioned in Table 1 do support that behavior in cheating paradigms is a valid indicator of socially questionable real-world behavior, they are limited by their sample sizes in particular. Specifically, with an average sample size of 161, and considering that cheating paradigms come with certain limitations of statistical power (i.e., cheating is typically unknown on the individual level), more well-powered studies are needed to test whether behavior in cheating paradigms can indeed be linked to real-world socially questionable behavior. We tackle this gap with a series of four studies.

1 Note that in some studies, each participant is (un)knowingly observed by the experimenter, making it possible to identify honest and dishonest respondents on the individual level (e.g., Kocher, Schudy, & Spantig, 2017; Kröll & Rustagi, 2016).
2 Because not all of the following criteria might be clearly classified as dishonesty, we use the broader term socially questionable behavior. Please note, though, that each of the following criteria (as well as the criteria in our studies) relates to dishonest behavior to some degree.

TABLE 1 Overview of studies linking behavior in cheating paradigms to real-world socially questionable behavior

| THE PRESENT INVESTIGATION
Adding to the existing literature on the external validity of cheating paradigms, we link two cheating paradigms-namely, the coin flip (Bucciol & Piovesan, 2011) and the Mind Game (Jiang, 2013) paradigm-to crowdworkers' approval rates on crowdworking platforms. Approval rates reflect the percentage of a worker's past submissions that requesters have approved rather than rejected. Reasons for rejection are manifold but can include misrepresentation of study inclusion criteria, deception of the requester, or provision of quick random responses (e.g., Hydock, 2018; Johnstone, Tate, & Fielt, 2018; Prolific Team, 2018). It can thus be assumed that approval rates partly indicate workers' dishonesty in their past submissions on crowdworking platforms. Importantly, acting dishonestly on crowdsourcing platforms with regard to the task requirements 3 comes with an important trade-off: Crowdworkers can act dishonestly in order to save time and/or increase their financial benefit (e.g., by being able to participate in more studies in a specific timeframe), but they also risk rejections, which lower their approval rates and, in turn, might prevent them from participating in future tasks on the platform (for some tasks, requestors-i.e., the ones who commission the tasks-set a minimum approval rate as a requirement for task participation; e.g., Ensor et al., 2019; Grysman, 2015).
Overall, crowdworkers' approval rates thus represent an indicator of real-world socially questionable behavior (with lower approval rates indicating more socially questionable behavior across numerous tasks).
In line with the predictions of personality trait theory, we hypothesize that individuals with lower approval rates are more likely to cheat in a cheating paradigm than individuals with higher approval rates. While most researchers set a minimum approval rate as a requirement for study participation (e.g., a Prolific score of at least 90; Ensor et al., 2019; or an approval rate of at least 95 on MTurk; Grysman, 2015), we also investigate whether the relation holds more generally beyond commonly used thresholds for study inclusion by using a broader range of approval rates.
Next to the aim of testing the external validity of cheating paradigms, this study also allows us to potentially identify a new control variable for studies conducted on crowdworking platforms, which are often used for studying dishonesty (e.g., Gerlach et al., 2019).

Only participants with a Prolific score of 95 or higher were invited to the study. Prolific scores represent the upper bound of the 99th percentile binomial confidence interval (with respect to the percentage of approved submissions from the total) and range from 0 to 100.
Note that approximately 97% of the active users on Prolific (i.e., users that were active during the last 90 days) do have a Prolific score of 95 or higher (as of June 4, 2020).
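For intuition, a score defined as the upper bound of a 99% binomial confidence interval can be sketched with a Wilson score interval. This is a common choice for binomial proportions, but whether Prolific uses exactly this interval is an assumption here:

```python
import math

def score_upper_bound(approved: int, total: int, z: float = 2.576) -> float:
    """Upper bound of a Wilson score confidence interval for the
    proportion of approved submissions (z = 2.576 for a two-sided 99%
    interval). A sketch only: Prolific's exact computation may differ."""
    p = approved / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return min(1.0, center + half)

# A worker with 95 of 100 submissions approved gets an upper bound
# above the raw 95% approval rate (roughly 0.98 here).
print(round(100 * score_upper_bound(95, 100)))
```

An interval-based score is more forgiving toward workers with few submissions, for whom a single rejection would otherwise move the raw percentage a lot.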
Participants were informed that the main aim of the study was to investigate decision-making processes. After consenting to participate in the study, participants provided demographic information. Next, participants completed an adapted version of the Mind Game paradigm (Schild et al., 2019). Specifically, participants were asked to write down a target number between 1 and 8 in private. Subsequently, a random number between 1 and 8 was displayed, and participants were asked whether the displayed number matched the target number they wrote down beforehand. Importantly, in addition to their flat fee for participation (£0.40), participants received a bonus incentive of £0.40 when reporting a match. Consequently, participants had the opportunity to cheat in order to obtain the bonus incentive by reporting that the numbers matched even if they did not. Directly after the data collection was finished, approval rates (M = 99.59, SD = 0.83) were downloaded via the "export" function on Prolific.

| Analyses
In the cheating paradigm, the proportion of dishonest individuals d was estimated as described in Moshagen and Hilbig (2017). In contrast to analyses of binary cheating paradigms that simply compare the expected percentage of winners (which equals 12.5% in our case due to using eight random digits) with the observed proportion of winners, the modeling approach by Moshagen and Hilbig takes into account that the observed proportion of alleged wins is contaminated by honest respondents who actually won. To estimate the relation between the proportion of dishonest individuals and the approval rate scores, a modified logistic regression model was used. The described analyses were conducted using the RRreg package (Heck & Moshagen, 2018). Although our hypothesis is directional (i.e., lower Prolific scores are linked to a higher proportion of dishonest individuals d), two-tailed tests were used, because we originally conducted the study for a different purpose.
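The core of such a modified logistic regression can be sketched as follows (a simplified illustration of the Moshagen and Hilbig logic, not the actual RRreg implementation): the probability of a reported win is the chance baseline plus the remaining probability mass times a logistic cheating probability, and the regression coefficients are those that maximize the resulting likelihood.

```python
import math

def report_probability(b0: float, b1: float, x: float, baseline: float) -> float:
    """P(reported win | predictor x): honest chance wins (the baseline)
    'contaminate' the logistic cheating probability."""
    cheat_prob = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return baseline + (1.0 - baseline) * cheat_prob

def log_likelihood(b0, b1, xs, wins, baseline):
    """Bernoulli log-likelihood of the reported wins under the model above."""
    ll = 0.0
    for x, won in zip(xs, wins):
        p = report_probability(b0, b1, x, baseline)
        ll += math.log(p) if won else math.log(1.0 - p)
    return ll

# Toy data (hypothetical): low-score workers (x = 0) report wins more
# often than high-score workers (x = 1) in a 1-in-8 Mind Game.
xs   = [0, 0, 0, 0, 1, 1, 1, 1]
wins = [1, 1, 1, 0, 1, 0, 0, 0]
# A negative slope (more cheating at lower x) fits better than no slope:
print(log_likelihood(0.0, -2.0, xs, wins, 0.125) >
      log_likelihood(0.0, 0.0, xs, wins, 0.125))  # -> True
```

Maximizing this likelihood over (b0, b1), as RRreg does numerically, yields the slope estimate reported in the analyses.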

| Results
A total of 32.42% of the participants indicated a matching number, which is significantly different from the stochastic baseline of 12.5%.

Overall, N = 1,737 participants completed the same cheating task at one of two measurement occasions (2 weeks apart); that is, 867 participants completed the cheating task at the first measurement occasion, and 914 different participants completed the same cheating task at the second measurement occasion. There was no difference in the experimental setup between the two measurement occasions (i.e., we ran the exact same study, just 2 weeks apart), so we merged these participants. Two measurement occasions were not initially planned but became necessary because of technical problems during the first measurement occasion.
However, these technical problems did not influence the data (i.e., the conditions) reported herein. Forty-four participants had previously participated in Study 1 and were thus not included in the analyses; including them did not change the pattern of the results.
Participants were relatively heterogeneous with respect to gender (61.49% female, 37.82% male, 0.69% other) and age (M = 36.02, SD = 12.36 years). In the experiment, participants were first informed about the background of the study, followed by providing consent and demographic information. Next, the participants were asked to play a standard version of the coin flip task as used by Zettler, Hilbig, Moshagen, and de Vries (2015). In this version of the coin flip task, participants were asked to flip a real coin twice and report the outcome in private. If participants reported flipping two heads in a row, they received a monetary payoff of £0.40, in addition to their flat fee for participation (£0.40).
For these data, we downloaded approval rates (M = 99.51, SD = 1.27) via the "export" function on Prolific approximately 4 months after the experiment (second measurement occasion) had been conducted. 4 In contrast to Study 1, the data also include participants whose approval rates were lower than 95 (namely, between 85 and 100) at the time when this information was downloaded (when the experiment was launched, the required approval rate for participation was 95).

| Results
The same analytical approach as in Study 1 was used. However, note that the expected percentage of winners was 25% in this study. As in Study 1, individuals with lower Prolific scores were more likely to be dishonest.

| Discussion
Based on a much larger sample, Study 2 conceptually replicated the findings of Study 1 using a different cheating paradigm. Unlike in Study 1, however, when age and gender were controlled for, Prolific scores did not turn out to be a significant predictor of the proportion of dishonest individuals. We ran a third study, again altering the administered cheating paradigm (this time, a computerized coin flip paradigm was used), in order to further investigate the generalizability of the link between Prolific scores and cheating behavior. As in Study 2, the data include participants whose approval rates were lower than 95 (range 72-100) at the time when this information was downloaded (when the experiment was set up, only crowdworkers with an approval rate of min. 95 were allowed to participate).

| Results
The same analytical approach as in Studies 1 and 2 was implemented. As in Study 2, the expected percentage of winners was 25%. A total of 36.60% of the participants indicated observing two heads in a row, which is significantly different from the stochastic baseline of 25%. This indicates that male individuals with lower Prolific scores were more likely to be dishonest.

| Discussion
Study 3 conceptually replicated the findings of Studies 1 and 2 using a different implementation of a cheating paradigm (namely, via an external panel provider). We ran a final study on a different crowdworking platform-MTurk-in order to further test the generalizability of the results.

| Procedure and variables
We again conducted an online experiment using the open-source survey framework formr. Participants were recruited in batches covering different ranges of approval rates, including batches for lower approval rates (i.e., 81-90). Indeed, after 1 week, we had fewer than 1,500 participants overall (N = 1,027), because there were too few participants in the lower batches. In line with our preregistration, additional batches were opened for very high approval rates (i.e., 98, 99, and 100) until 1,500 participants were reached. We only recruited participants with more than 100 HITs, as workers with fewer than 100 HITs always have an approval rate of 100, regardless of how many studies were accepted/rejected.
Participants were relatively heterogeneous with respect to gender (42.33% female, 57.40% male, 0.27% other) and age (M = 33.02, SD = 9.72 years). After consenting to participate in the study, participants provided demographic information. Next, as in Study 1, participants were asked to participate in an adapted version of the Mind Game paradigm. As in Study 1, participants received a monetary payoff of $0.40 when reporting a match, in addition to their flat fee for participation ($0.40).

| Results
The same analytical approach as in Studies 1, 2, and 3 was implemented. Again, individuals with lower approval rates were more likely to be dishonest.

| Discussion
Study 4 replicated the findings of Studies 1-3 on a different crowdworking platform, namely, MTurk.

| Exploratory analyses across Studies 1-4
Although our hypothesis that lower approval rates are linked to a higher proportion of dishonest individuals was supported, we ran several further exploratory analyses. 5 First, we calculated an additional exploratory modified logistic regression including the quadratic term of the approval rates in Study 4, which was found to describe the data significantly better than the original model (ΔG²(1) = 16.00, p < .001).
Following this exploratory finding, we also tested whether curvilinear models are superior in Studies 1-3. A curvilinear model was found to describe the data better in Study 3 (ΔG²(1) = 10.79, p = .001) but neither in Study 1 (ΔG²(1) = 0.03, p = .860) nor Study 2 (ΔG²(1) = 0.19, p = .660). Plots including the curvilinear models for Studies 3 and 4 can be found in Figure S1, showing inverted U-shaped relations between approval rates and the proportion of dishonest individuals (i.e., there is a lower proportion of dishonest individuals among people with particularly low and high approval rates as compared with people with intermediate approval rates).
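These ΔG² comparisons are likelihood-ratio tests against a chi-square distribution with 1 degree of freedom; for df = 1 the p-value has a closed form and can be checked in a few lines (a sketch for verification, not the analysis code used in the article):

```python
import math

def lrt_pvalue_df1(delta_g2: float) -> float:
    """p-value of a likelihood-ratio statistic delta_G^2 against a
    chi-square distribution with 1 degree of freedom; for df = 1 the
    survival function reduces to erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(delta_g2 / 2.0))

# Reproduces the reported p-values (up to rounding of the statistics):
print(lrt_pvalue_df1(16.00) < 0.001)      # Study 4 quadratic term
print(round(lrt_pvalue_df1(10.79), 3))    # -> 0.001 (Study 3)
print(round(lrt_pvalue_df1(0.19), 2))     # Study 2, approximately .66
```

A comparison with more than one extra parameter would need the general chi-square survival function rather than this df = 1 shortcut.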
Previous research has found that honest and dishonest behavior can vary across samples and crowdworking platforms. Further, requestors and researchers trying to avoid dishonest participants might use approval rates as a filter.
The underlying idea of this investigation-that people show a somewhat similar kind of behavior across situations-is indeed supported by the observed findings; that is, the finding that crowdworkers who are more likely to cheat in a cheating paradigm are also more likely to show socially questionable behavior in other situations (namely, other tasks on the crowdsourcing platform) is well aligned with personality trait theory (Allport, 1961; Jaffé et al., 2019). 6 In line with this, a recent meta-analysis (Gerlach et al., 2019) showed that participants recruited via MTurk act more dishonestly than other populations such as students. Future studies might thus also consider the platform on which the cheating paradigms are run, although one can (so far) only speculate about potential reasons for such observed differences.
Despite the consistency of the findings across Studies 1-4, relations between approval rates and cheating behavior were relatively weak overall. This is likely because approval rates are affected not only by dishonesty itself but also by certain other factors, such as sloppiness or actual performance; that is, in contrast to other studies linking cheating behavior in paradigms to "pure" real-life dishonesty (e.g., Dai et al., 2017; Kröll & Rustagi, 2016), our outcome measure can only be expected to be partly influenced by dishonesty. In fact, some tasks and studies on crowdworking platforms might not even allow for pure dishonesty. On the other hand, Prolific Team (2018) lists participant behaviors such as "little effort," "failed attention checks," and "lying [the] way into [a] study" as potential reasons for valid rejections, which can-at least partly-be labeled as dishonest or socially questionable behavior. In a similar vein, deception of the requester has also been listed as a valid rejection reason on MTurk (Johnstone et al., 2018). Future studies could test potential explanations for the link between cheating behavior in paradigms and approval rates by examining which kinds of dishonesty affect approval rates.
Further, exploratory analyses suggested that relations between approval rates and dishonest behavior are better described by a curvilinear (namely, an inverted U-shaped) model in Studies 3 and 4. One potential reason for this could be that participants with lower scores try to act more honestly in order to increase their scores, as they might have noticed no longer being invited to many tasks/studies. However, another explanation might be that participants with (very) low scores tend to provide random responses in surveys (e.g., Kennedy,