Effectiveness of incentives offered by mobile phone app to encourage cycling: A long-term study

Reduction of car use is one of the most effective ways to tackle congestion-related problems. Using positive incentives to stimulate bicycle use is one possibility to reduce car use. Cycling is a sustainable transport mode that uses little space and is healthy. There is evidence that positive incentives may be more effective than punishing travellers for undesirable behaviour, and the emergence of mobile applications for delivering interventions has opened up new opportunities for influencing travellers. So far, few studies have focused on exploring the effectiveness of positive incentives on long-term behavioural change. We used the SMART app to deliver positive incentives to more than 6000 travellers in the Dutch region of Twente. The app automatically tracks users and provides incentives such as challenges with rewards, feedback, and messages. This study covers the period from March 2017 to June 2018, in which more than 1000 SMART users participated in monthly challenges. We evaluated the effects of the challenges and rewards and found that the challenges did encourage cycling and reduced car use in the short term. There is also some evidence for behavioural change over a longer time period.


INTRODUCTION AND BACKGROUND
Traffic and its externalities, such as the emission of greenhouse gases, are increasingly causing problems for almost all major cities. One of the challenges for transportation researchers is to change people's travel choices and get them to use sustainable transport modes. The negative health effects of a sedentary lifestyle provide another compelling reason to encourage active transport modes. In the Netherlands, people cycle about 10% of all trip kilometres, which is a significantly larger fraction than in other countries. The Dutch government wants to encourage cycling even more and aims to increase the cycled distance by 20% between now and 2027 [1].
Offering positive incentives is a relatively cheap way to encourage cycling because it does not require infrastructure investment. Voluntary travel behavioural change (VTBC) schemes that use incentives such as rewards, feedback, subsidies and public transport discounts can result in a shift from car use towards more sustainable travel modes; see, for example, the work of Brög et al. [2], Sanjust et al. [3], Ben-Elia and Ettema [4], and Lachapelle [5]. Such schemes work better than fiscal measures in the sense that they do not encourage socioeconomic inequity [6].
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. IET Intelligent Transport Systems published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.
Traditional VTBC solutions require person-based interaction, either by phone or through home interviews, which is inherently expensive and may induce biases stemming from social interaction and communication. The emergence of the social web and mobile applications for delivering interventions offers opportunities to reduce costs and enables the use of ICT-based persuasion technology for influencing travellers [7]. Table 1 lists eight mobile app projects intended to encourage VTBC [8-20]. The app designs in those studies followed, or fitted in, the persuasive systems design model [21][22][23][24], which offers a way to analyse, design and evaluate the persuasion context. Common persuasive design features used in those studies were personalised feedback, self-monitoring, challenges and goal setting, social comparison, and rewards.

TABLE 1
App, year [ref] | Sample size | Duration | Main findings | Credibility system
... | ... | Field experiments | The main findings support the effectiveness of the incentives to encourage travel behaviour change | Clear system architecture of the app, security layer, third-party services, city authorities; high accuracy of trip, mode and location recognition (needs improvement), around 25% total corrections
SUPERHUB app, 2013 [14] | 8 | Four weeks | A modest increase (14%) in 'sustainable transport choices' | No automatic tracking data; no detail of support systems
Quantified traveller (QT), 2013 [15,16] | 135 | Three weeks | QT is useful to significantly reduce the car mileage and, to a lesser extent, to encourage walking/cycling | Clear system architecture, third-party services; high accuracy of trip, mode and location recognition (needs improvement)
... 13 [20] | 76 | Six weeks | Apps were used to reward sustainable transport, including cycling. The amount of cycling respectively almost doubled or even more than doubled as a result | No detail of support systems

These studies provide evidence that these features are important for influencing users to change their travel behaviour; however, clear results on their effectiveness based on field evaluations are still missing (Table 1). Most evaluations of behaviour change interventions proposed in this area are rather short term, involve small groups of participants, and provide limited evidence of lasting behavioural impact [24,25]. This study focuses on exploring the effect of positive incentives on behaviour change, especially on the modal shift from car to bike, by using a mobile app (the SMART app) in Enschede, the Netherlands. The new travel app includes persuasive design features such as feedback, travel information, self-monitoring, nudging and gamification elements. Because VTBC-based travel apps are collaborative by nature, it is critical to design the app and the experiment so that they attract broad engagement and can run for a long time. A better grasp of long-term behaviour change induced by a mobile app will help authorities and private entrepreneurs to design effective and appealing apps, eventually translating into a broader potential of VTBC.
Mobility is a highly habitual activity, which is not easily interrupted, except for short periods [26]. Many studies also highlight that long-term involvement more often leads to positive results [26][27][28]. Several studies are optimistic about gamification and its potential to extend the commitment of the user [11,27,29,30,31,32]. That gamification would have a positive impact on participants' involvement is also supported by Seaborn and Fels [33]; however, they also implied that there is a lack of longitudinal study designs. Thus, more research is needed to discover the gamification effect on app involvement and commitment and its impact on behaviour change, especially under a long-term study period.
Current travel apps use different gamification strategies. As rewards, they often offer tangible rewards such as money [20][34], in-kind rewards such as points [9,10,11,17,32], and non-tangible rewards such as praise [12,14,18]. Based on our previous study, in-kind rewards are suggested for travel apps. A web shop associated with in-kind rewards may make people more enthusiastic about the in-kind reward than about money, because money has an impersonal character, which would decrease the feeling of enjoyment [35][36][37]. Challenges and goal setting are also essential features of gamification. Many goal-setting studies showed that specific and challenging tasks led to higher performance than easy or "do your best" tasks [38][39][40]; however, current apps for cycling promotion reward users by directly calculating the kilometres they cycled, without a specific challenge, or offer massive challenges that are not suitable for every user. On the other hand, setting specific goals for long-term projects may be disadvantageous for user participation, as users tend to drop out immediately upon finishing that goal [41]. Moreover, before considering personalised challenges or challenges that explicitly target a certain socio-economic segment of travellers, it should be acknowledged that in many cases targeting specific segments or personalised targeting may be costly and even ethically troublesome. Hence, it is important to understand how to design challenges that keep users involved.
App credibility is an essential persuasive element that was often not achieved in mobile phone related studies. Ignorance or technical flaws may directly affect recruitment and engagement rates. Gamification may increase involvement and support long-term interest, but app credibility can be more crucial for a successful app. Current travel apps often suffer from technology, privacy and reliability issues (Table 1, column 'credibility system') [42], which can decrease the credibility of a mobile app [43][44][45][46]. Since users will engage with apps that they perceive as credible but navigate away from those they do not consider credible, high detection accuracy and security features are important so that the app does not award incentives unfairly or inappropriately. The greater the app's accuracy and security, the more users it will attract. Besides, social trust is important as a motivator for sustainable behaviour change since trust reinforces people's engaging behaviour, that is, acceptability and public involvement [47,48]. Therefore, third-party endorsements, authority and surface credibility, which increase the trust between individuals and the community at large, can also enhance app credibility [43]. Thus, a travel app that aims to encourage sustainable behaviour in society needs multidisciplinary cooperation: local authorities to manage it, commercial parties to support the rewards, and an IT company to support the technology.
Given the gap in current literature, this study contributes to the body of knowledge by designing a monthly self-chosen challenge and by analysing travel behaviour tracked by the SMART app in a real-world environment for over a year, supported by local government, commercial parties and IT companies. The main aim of this study is to analyse and explore the app users' changes in travel behaviour and changes in interest and commitment, in both the short and the long term, as well as to identify key factors that influence the response to gamification (challenge and reward), and to explore whether the type of challenge played a role in the change in behaviour. The hypotheses related to this objective are as follows:
1. H1: The app has broad recruitment and high retention, and users with high challenge commitment stay longer in the app.
2. H2: The challenges and rewards, as gamification means, motivate changes in travel behaviour and travel patterns in the short term; users' challenge commitment affects behavioural change.
3. H3: There is long-term sustained behaviour change, but only for users with high engagement.
The study is organised as follows: Section 2 describes the methods, including the SMART app and the case study, and introduces the data. Section 3 presents the results, and Section 4 discusses the findings and limitations and concludes the study.

METHODS
This research is based on the SMART application, which records the users' mobility on their mobile phones. Everyone can download and use this app and choose to participate in monthly cycling-related challenges.

SMART app
The SMART app attempts to nudge travellers towards sustainable transport modes, especially cycling, by providing incentives [49]. It runs on the Android and iOS platforms and records the users' trips. We designed our interventions by drawing on other mobile app-based studies, which aimed to inspire VTBC and contained common persuasive design features (personalised feedback, self-monitoring, challenges and goal setting, social comparison, rewards and praise, traffic information and travel suggestions) [50,51]. The cooperation between different stakeholders (among which the ICT company behind the app, the city of Enschede, and the University) supports the app's credibility. Figure 1(a) shows the SMART app's dashboard, from which users can explore all functions. The following persuasive strategies are used.
1. Self-monitoring and feedback: On the SMART home page and 'my mobility' page, users can see their trip history per travel mode and daily CO2 emission.
2. Challenge and goal setting: In SMART, the living lab operator (municipality) sets up challenges to promote cycling. Users who download the SMART app need to select and join challenges on the challenge page ('take a new challenge' in Figure 1(a)), which shows all available challenges. Users are able to choose any suitable challenge. When a user accepts a challenge, the system starts to keep track of the targeted behaviour and shows the progress. This is done by continuous tracking (using GPS, accelerometer data etc.) and by using advanced algorithms that combine travel speeds and routes to determine the transportation mode the traveller is using.
3. Rewards and praise: Rewards are provided upon completion of the challenge. The SMART app starts tracking the user's target behaviour when a challenge is chosen. When the challenge is completed, the system immediately awards the corresponding number of points, which is also shown in the dashboard. On the 'reward' page, the earned points can then be redeemed towards discounted products and services (Figure 1(b)). Multiple stakeholders joined the project to support the web shop in the app, which created a win-win for the municipality (funding) and for the participating shops (sales). Huang et al. [37] concluded that in-kind gifts from a web shop have a more positive impact than cash rewards and may ultimately make users more satisfied [35]. Additionally, the SMART app is able to pop up notifications to praise the users if they travel sustainably.
4. Traffic information and suggestions: Event and traffic information are also offered through messages ('messages' in Figure 1(a)). SMART is able to give useful information about the actual local traffic situation and notifies users of road works or large-scale events that lead to extra traffic. Based on this, SMART may also suggest travel alternatives to enable the users to optimise their travel plans.
5. System credibility: System credibility provides users with credible and authentic information so that it decreases the dropout rate. The SMART app is updated regularly and maintained by Mobidot and supported by the city of Enschede. Moreover, its users are informed in advance about the app's privacy protection, and the travel mobility data is updated every day. Finally, new challenges are updated regularly and the shops where vouchers can be redeemed are all well known (Figure 1(b)).

Recruitment
Enschede (the municipality in question) used Facebook, flyer actions on the streets, digital advertisements, and cooperation with local employers to promote the SMART app, aiming to help travellers to minimise travel time and cost and/or to travel sustainably and healthily. To decrease the risk of selection bias, we emphasised that the SMART app can help travellers to travel smarter and make them more aware of the region. The recruited users were not asked to provide extra information (such as age or gender) and could participate in challenges immediately. In other words, they were not recruited for an experiment; they could simply use the app if they wanted to. This way, we created a realistic real-life context to analyse the travellers' true behaviours. However, as the users could immediately use all functionalities of the SMART app, we were not able to do a 'before' measurement, which is an important drawback of this study.

Case study: Monthly choice challenges
Based on literature reviews [38][39][40][41], we aimed to design a challenge that is specific and can be finished within a set time period, that is not too easy, but that is also not so specific over the long term that users lose interest and drop out. Moreover, considering cost, we wanted to avoid personalised challenges. Based on the above considerations, we designed a self-chosen challenge that comes every month, which is named 'monthly choice challenge'. From March 2017, SMART users could join the 'monthly choice challenge'; those who joined were challenged to accomplish a certain cycling distance or frequency or to cycle to a certain location. The monthly choice challenge is still ongoing, but we were allowed to use data from March 2017 until June 2018 for this study. Every month, one monthly choice challenge is offered. In the first two weeks of each month, the SMART users can choose one challenge out of five or six; the challenge starts immediately when a SMART user clicks to join it. The duration of the challenge is up to two weeks; participants can complete their challenge in less time.
We introduced different types of choice challenges: frequency challenges, distance challenges, mixed challenges (combined frequency and distance), minimum-distance challenges (on certain days, cycle a minimum distance) and location challenges. The five different types of challenges were designed and distributed throughout the year to see which type is more effective. Rotating challenges may also encourage users to keep participating [52]. Table 2 lists the options and related rewards. The more difficult the challenge that the user chooses, the more points the user can gain by completing that challenge. The relation between challenge difficulty and reward is a simple linear function with a constant (baseline points) to ensure that users who rarely cycle can still collect a reasonable number of points if they cycle a short distance or a few times. Choosing a difficult challenge means more rewards; however, if the participant fails to complete the challenge within the challenge period (two weeks), he or she will not receive any points. Participants therefore have to choose between an easy challenge with a relatively small reward but a greater chance of success and a difficult challenge with a relatively high reward but a greater chance of failing. Participants are awarded their points immediately if they meet their chosen challenge in time. They can redeem the points in the app's web shop (see Figure 1(b)).
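The reward rule described above (a linear function of difficulty plus a baseline, with no points for an unfinished challenge) can be sketched as follows. This is a minimal illustration only: the baseline of 50 points, the slope of 10 points per difficulty unit, and the function names are hypothetical and are not taken from Table 2.

```python
# Hypothetical parameters; the actual point values are listed in Table 2.
BASELINE_POINTS = 50.0       # constant awarded even for the easiest challenge
POINTS_PER_DIFFICULTY = 10.0  # slope of the linear difficulty-reward relation


def offered_points(difficulty: float) -> float:
    """Points attached to a challenge: baseline + slope * difficulty."""
    if difficulty < 0:
        raise ValueError("difficulty must be non-negative")
    return BASELINE_POINTS + POINTS_PER_DIFFICULTY * difficulty


def awarded_points(difficulty: float, completed_in_time: bool) -> float:
    """All-or-nothing payout: nothing is awarded if the challenge is not
    completed within the challenge period."""
    return offered_points(difficulty) if completed_in_time else 0.0
```

Under these assumed values, a participant weighing a challenge of difficulty 1 against one of difficulty 5 trades 60 near-certain points for 100 points that are lost entirely on failure, which is the choice the text describes.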
A post-survey was sent through the SMART experience sampling question service to collect more information on the reasons why users did or did not change their behaviours. The post-survey questionnaire was sent out immediately after completion of the challenge or after the challenge period if participants did not complete the challenge.

Data resource
We used SMART data between March 2017 and June 2018. The data consists of trip data, post-survey data, and information about when users started and completed a particular challenge. The trip data contains the origin, destination, departure time, arrival time, transport mode, activity, average speed, and trip distance. The transport mode and trip activity are determined in the back-end by processing and learning algorithms (see [53]). When a user uses the functions of the SMART app, trigger data are generated. These contain the user's ID, type of trigger (challenge, survey question, message), the status of trigger (accepted, answered, awarded, read), response to the trigger (number of points awarded, reply to a survey question), trigger date and trigger time.
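The two record types described above can be pictured with the following sketch. The class and field names, the units, and the presence of a user ID on the trip record are assumptions made for illustration; the actual SMART back-end schema is not given in the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Trip:
    """One recorded trip (field names are hypothetical)."""
    user_id: str          # assumed: needed to build person-month data
    origin: str
    destination: str
    departure: datetime
    arrival: datetime
    mode: str             # e.g. "bike" or "car", inferred in the back-end
    activity: str
    avg_speed_kmh: float
    distance_km: float

    def duration_minutes(self) -> float:
        """Travel time derived from departure and arrival."""
        return (self.arrival - self.departure).total_seconds() / 60.0


@dataclass
class Trigger:
    """One trigger event generated when a user uses an app function."""
    user_id: str
    kind: str                # "challenge", "survey question" or "message"
    status: str              # "accepted", "answered", "awarded" or "read"
    response: Optional[str]  # points awarded or survey reply, if any
    timestamp: datetime      # trigger date and time combined
```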

Indicators and segments
The indicators we chose to evaluate behavioural change are distance-based mode share and daily distance per mode. We inspected these indicators by aggregating data on a monthly level. We chose a period of a month because a new set of choice challenges was offered each month. We thus had data points per month for each user ('person-month data'). The distance-based mode share is the percentage of the total distance that is covered by each mode during the measurement period. We used distance rather than trip frequency because the latter might be disproportionally influenced by people shifting from walking to cycling for the shortest trips. The daily mode distance is the total distance covered by a certain mode divided by the number of days in the measurement period. We considered several measurement periods per month, that is, the whole month, the period during the challenge, and the period outside the challenge period. Note that the latter two vary among users, depending on when participants started and completed a challenge. In the long-term analysis, we also used the last week of each month (corresponding with days outside the challenge period for almost all users) to determine the sustained effect of the campaign. We also performed segmentations to analyse differences between challenge and trip types. As mentioned earlier, we did not have data on personal characteristics of users. However, we made a distinction between users who chose easy challenges and users who chose difficult challenges and checked whether behavioural changes occurred in both groups. We defined the difficulty of the challenge by comparing the challenge against the user's behaviour in the previous month. In other words, if a challenge required the user to cycle more than in the previous month, we assessed the challenge as (relatively) difficult. By contrast, if a challenge required less effort than before, we considered the challenge (relatively) easy.
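As an illustration of the two indicators defined above, the following sketch computes the distance-based mode share and the daily mode distance for one measurement period. The function names and the representation of a trip as a (mode, distance in km) pair are assumptions for the example.

```python
from collections import defaultdict


def mode_share(trips):
    """Distance-based mode share in percent.

    trips: iterable of (mode, distance_km) pairs for one measurement period.
    Returns {mode: percentage of the total distance covered by that mode}.
    """
    totals = defaultdict(float)
    for mode, km in trips:
        totals[mode] += km
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {mode: 100.0 * km / grand_total for mode, km in totals.items()}


def daily_mode_distance(trips, mode, n_days):
    """Total distance covered by one mode divided by the number of days
    in the measurement period."""
    if n_days <= 0:
        raise ValueError("n_days must be positive")
    return sum(km for m, km in trips if m == mode) / n_days


trips = [("bike", 3.0), ("car", 6.0), ("bike", 1.0)]
shares = mode_share(trips)                      # bike 40%, car 60%
bike_per_day = daily_mode_distance(trips, "bike", 2)  # 2.0 km per day
```

Because the share is weighted by distance, a single 6 km car trip outweighs two short bike trips, which is the property that motivated the 20 km trip cut-off used in the data selection.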
In addition, we distinguished between different types of trips. For example, commuting behaviour may be more difficult to change than recreational travel behaviour [54]. More generally, we assumed that habitual or regular trips may be more difficult to change. If more than eight trips were made between the same origin and destination in one month, then we considered these trips regular trips. This criterion was chosen to obtain more or less equal sample sizes per group.
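The regular-trip criterion can be sketched as below. The example assumes that origins and destinations have already been mapped to location identifiers and that a trip pair is directed (home to work differs from work to home); neither detail is specified in the text, and the names are hypothetical.

```python
from collections import Counter

REGULAR_TRIP_THRESHOLD = 8  # "more than eight trips" per month, as stated above


def split_regular(trips):
    """Split one user's trips in one month into regular and non-regular.

    trips: list of (origin_id, destination_id) pairs.
    """
    counts = Counter(trips)
    regular = [t for t in trips if counts[t] > REGULAR_TRIP_THRESHOLD]
    non_regular = [t for t in trips if counts[t] <= REGULAR_TRIP_THRESHOLD]
    return regular, non_regular


month = [("home", "work")] * 9 + [("home", "shop")] * 2
regular, non_regular = split_regular(month)
```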

Data selection
In total, 6214 users downloaded and used the SMART app between March 2017 and June 2018, with 23,233 person-months of data. To make sure that the tracking data from the SMART app reflected the real travel behaviour, we excluded person-months that had fewer than 15 days of recorded trips. Moreover, only trips below 20 km were counted for travel behaviour analysis, as cycling is not a viable option beyond 20 km [32]. These trips cover almost all trips within the urban area of Enschede (and the neighbouring city of Hengelo) and therefore can be regarded as urban trips. By selecting these trips, we also neutralised the disadvantage of distance-based mode share in which non-recurrent long-distance trips have a disproportionally large effect on mode shares. After error-checking, cleaning and filtering the data based on the above criteria, 5525 users remained with a total of 22,174 person-months of data. In total, 1868 out of 5525 users joined the monthly choice challenge at least once. We call those users 'participants'; they make up the 'experiment group'. The other 3657 users never joined a challenge but used the SMART app regularly for other incentives, such as trip history and travel information. They are in the No Challenge group. The 1868 participants had 11,076 person-months in total, but the participants only participated in the monthly choice challenge about every other month on average (50%). To check whether participants perhaps only used SMART during challenges and turned off the app to extend battery life after a challenge, we compared the days with recorded trips inside and outside challenge periods. Figure 2(a) shows the average proportion of days with trips recorded. Figure 2(b) displays the distribution of the differences between days inside and outside the challenge periods for the participants.
As the top panel in Figure 2(a) shows, the proportion of days with recorded trips was significantly higher in the challenge period. This may imply that some participants turn off SMART when there is no challenge. Figure 2(b) shows a clearly skewed distribution with a peak located around 0. In most cases, SMART reports more or less the same fraction of days with trips inside and outside of the challenge period. However, the mean is significantly greater than 0 (greater than 0.15). For some participants, trips appear to have been underreported outside the challenge period. To correct for this possible bias, we only included participants whose average difference was smaller than 0.2 in absolute value, that is, on both sides of the distribution. This left us with 1056 participants and 3269 person-months of data. The bottom panel of Figure 2(a) shows the results after the correction. The figure shows there are still more days with recorded trips in the challenge periods. The remaining difference may be attributed to the fact that there were more workdays in the challenge periods, and days without trips are relatively rare during workdays. The remaining difference may also represent a real effect of the challenges, namely, that participants might have become encouraged to cycle on days on which they normally would not make a trip.
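The three selection rules in this section can be summarised in a short sketch. The thresholds come from the text; the function names and the reading of the 0.2 criterion as a bound on the absolute value of a participant's mean inside-minus-outside difference are assumptions.

```python
MIN_DAYS_WITH_TRIPS = 15   # person-months with fewer recorded days are dropped
MAX_TRIP_KM = 20.0         # only trips below 20 km are analysed
MAX_ABS_MEAN_DIFF = 0.2    # bound on the inside/outside day-fraction difference


def keep_person_month(days_with_trips: int) -> bool:
    """Keep a person-month only if enough days have recorded trips."""
    return days_with_trips >= MIN_DAYS_WITH_TRIPS


def keep_trip(distance_km: float) -> bool:
    """Keep only trips short enough for cycling to be a viable option."""
    return distance_km < MAX_TRIP_KM


def keep_participant(monthly_diffs) -> bool:
    """monthly_diffs: per-month values of (fraction of days with trips inside
    the challenge period) minus (the same fraction outside it)."""
    if not monthly_diffs:
        return False
    mean_diff = sum(monthly_diffs) / len(monthly_diffs)
    return abs(mean_diff) < MAX_ABS_MEAN_DIFF
```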

Data analysis
To analyse short-term behaviour, we compared the users' travel behaviour inside and outside the challenge period for each month. This analysis was done using descriptive statistics. We compared the behaviours in the groups to gain a better understanding of the influence of the monthly challenges on cycling.
The longitudinal analysis focused on quantifying the travellers' behaviour across the months. To capture long-term behavioural change, we only included users who used SMART for at least six months. All users have six continuous months of data; therefore, there is no missing data.
As already explained, we had no 'before' data as users could immediately use all functionalities of the SMART app upon downloading. We used ANOVA tests with repeated measures in the SPSS software package to identify statistically significant differences between time-series averages.
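For readers unfamiliar with the test, the core of a one-way repeated-measures ANOVA can be sketched in plain Python as follows. This is not the SPSS procedure used in the study: it computes only the F statistic and its degrees of freedom for a single within-subject factor (time), without a p-value, sphericity correction, or between-group factor, and the function name is hypothetical.

```python
def repeated_measures_f(data):
    """One-way repeated-measures ANOVA F statistic.

    data: list of subjects, each a list with one measurement per time point
          (no missing values). Returns (F, df_time, df_error).
    """
    n = len(data)
    k = len(data[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 subjects with the same >= 2 time points")

    grand_mean = sum(sum(row) for row in data) / (n * k)
    time_means = [sum(row[j] for row in data) / n for j in range(k)]
    subject_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_time = n * sum((m - grand_mean) ** 2 for m in time_means)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    # Removing between-subject variation is what distinguishes the
    # repeated-measures design from an ordinary one-way ANOVA.
    ss_error = ss_total - ss_time - ss_subjects

    df_time, df_error = k - 1, (k - 1) * (n - 1)
    # Raises ZeroDivisionError if the residual variation is exactly zero.
    f_stat = (ss_time / df_time) / (ss_error / df_error)
    return f_stat, df_time, df_error


# Three users measured in three consecutive months (made-up numbers).
f_stat, df_time, df_error = repeated_measures_f([[1, 2, 3], [2, 3, 5], [3, 5, 6]])
```

The requirement that every subject has a value for every time point is why the analysis is restricted to users with six continuous months of data.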

RESULTS
This section begins with an analysis of the users, followed by a description of the short-term effects of incentives. Finally, long-term behavioural changes are analysed. We mainly report statistically significant results, using a significance level of 0.05, and mention it explicitly when differences are not statistically significant. The error bars in the figures indicate two times the standard deviation of the mean.

Participants and SMART usage
After data selection, 1057 participants remained in the experiment group and 3657 users in the No Challenge group. Figure 3 shows the distribution over the number of months. Participants on average used SMART over a longer period than users in the No Challenge group. About 40% of the 1057 participants used SMART for six months or longer. However, around 40% of the 1057 participants joined a challenge only once, while 10% joined six times or more. In other words, we could only use a small part of the sample for the long-term analysis. In the whole 18-month period, only 68 participants and 36 users in the No Challenge group continuously used the SMART app for more than one year. Table 3 shows how long users used the SMART app continuously. For example, we had 526 active participants and 196 users in the No Challenge group who used the app for six continuous months. We used the data in Table 3 to examine long-term trends.

FIGURE 4
Behavioural change inside and outside the challenge periods, with mode share change on the left, and daily distance change on the right, the error bars indicating the two-sigma error

The direct effects of monthly challenges
The No Challenge group had an average bike share of 37.8%. The bike share for participants of monthly choice challenges, at 47.4%, is clearly higher. This implies that the monthly choice challenge might attract travellers who are already interested in cycling. Hence, there may be some self-selection among participants. However, it should be noted that for the other SMART users, the observed car mode shares for trips shorter than 20 km are lower than what is found in National Travel Surveys. Even so, the participants still have a high car mode share, which could be reduced further. Table 4 shows the travel behaviour changes and allows a comparison between the periods inside and outside the challenge. On average, the bike mode share was higher and the car share was lower in challenge periods. The daily bike distance was also greater in challenge periods. However, the car distance was not significantly lower.
In Table 4, we distinguish between regular and non-regular trips. Around 30% of the total covered distance was made by regular trips (at least eight trips per month between the same origin and destination), 65% by non-regular trips, and the rest by round trips. Table 4 shows that regular trips had a higher bike share and a lower car share, mostly because trip distances on average are shorter for regular trips. However, behavioural changes were greater for non-regular trips. This suggests that positive incentives to encourage cycling may be more effective for non-regular trips.
In SMART, participants can freely select any monthly challenge. The question is how participants choose and what the effect may be on behavioural change. Therefore, we used challenge difficulty and challenge accomplishment to categorise person-months into four groups: (1) A participant chose a difficult challenge and accomplished the challenge in the corresponding month (DiffACP), (2) a participant chose a difficult challenge but did not accomplish the challenge (DiffNotACP), (3) a participant chose an easy challenge and accomplished the challenge (EasyACP), and (4) a participant chose an easy challenge but did not accomplish the challenge (EasyNotACP). We also excluded the first month for each participant because it was not possible to assess the relative difficulty of the challenge based on previous behaviour. Figure 4 shows the travel behavioural changes for the four groups. More than 90% of the participants who chose easy challenges also accomplished the challenge, while the completion rate was about 50% for participants who chose difficult challenges. On the other hand, the strongest behaviour change occurred for participants who accomplished difficult challenges. This suggests that participants need to choose difficult challenges to obtain large positive effects. However, bike and car shares also increased and decreased, respectively, when participants accomplished easy challenges. The observed overall behaviour change was more or less the same, regardless of whether participants chose easy or difficult challenges. While the change in behaviour was relatively large when participants completed difficult  challenges, difficult challenges were completed much less frequently. These results were unexpected. Obviously, participants who choose easy challenges do not need to change their behaviour to accomplish the challenge. The fact that they did suggests that the challenges in themselves encourage participants to change their behaviour. 
Perhaps participants are more aware of their behaviour when they participate in a challenge and therefore more likely to change their behaviour even when they can accomplish the challenge without behaviour change. This would be a positive result and suggests that choice challenges encourage behaviour change in the short term even when participants do not need to change their behaviour to accomplish the challenge.
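The four-group categorisation used in this analysis can be sketched as follows. The example assumes that the challenge requirement and the previous month's behaviour are expressed in the same unit, and it treats a requirement equal to the previous behaviour as easy; both choices are assumptions, as is the function name.

```python
def categorise_person_month(required, previous, accomplished):
    """Label a person-month as DiffACP, DiffNotACP, EasyACP or EasyNotACP.

    required:     cycling amount demanded by the chosen challenge
    previous:     the user's comparable cycling amount in the previous month
    accomplished: True if the challenge was completed in that month
    """
    difficulty = "Diff" if required > previous else "Easy"
    outcome = "ACP" if accomplished else "NotACP"
    return difficulty + outcome


labels = [
    categorise_person_month(30.0, 20.0, True),
    categorise_person_month(30.0, 20.0, False),
    categorise_person_month(10.0, 20.0, True),
    categorise_person_month(10.0, 20.0, False),
]
```

The first month of each participant has no previous month to compare against, which is why those person-months are excluded from this categorisation.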

Longitudinal analysis
New users downloaded and started using the SMART app throughout the whole study period. Figure 5 shows the retention rate of users over time after they start using the app. For both participants in choice challenges and users in the No Challenge group, most of the dropouts occurred in the first six to nine months. After this period, retention rates tended to stabilise. The average retention rate was greater than 50% for participants but around 30% or even lower for the No Challenge group. Importantly, users who started using the SMART app after September had a higher retention rate, which appeared to be linked to an update of the SMART app (updated on 13 August 2017). Table 5 details the number of participants in different stages, that is, those using the app for six, seven to nine, 10 to 12, and more than 12 months (13 or more). We only included the participants who were still using the app in June 2018 (at the end of the study period). Since the retention rates tended to stabilise after six months, Table 5 contains almost all participants who used the app for six months or longer, and we assume that most of the participants in Table 5 have continued using the app afterwards. For the long-term analysis, it, therefore, makes sense to compare these groups in the January to June 2018 period because they more or less represent the same type of participants but who are in a different stage of usage.
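One way to compute a retention rate of the kind plotted in Figure 5 is sketched below. The exact definition used for the figure is not given in the text, so this is an assumed definition: among users who started early enough to be observable for a given number of months, the share who were still active at that point. Months are represented as integer indices, and all names are hypothetical.

```python
def retention_rate(users, months_since_start, study_end):
    """Share of eligible users still active a given number of months after
    they started.

    users: list of (first_active_month, last_active_month) index pairs.
    A user is eligible only if first_active_month + months_since_start
    falls within the study period. Returns None if nobody is eligible.
    """
    eligible = [(first, last) for first, last in users
                if first + months_since_start <= study_end]
    if not eligible:
        return None
    retained = sum(1 for first, last in eligible
                   if last - first >= months_since_start)
    return retained / len(eligible)


# Month indices 0..8; the last user started too late to be observed for 3 months.
users = [(0, 5), (0, 1), (2, 8), (7, 8)]
rate = retention_rate(users, months_since_start=3, study_end=8)
```

Restricting the denominator to eligible users matters here because new users joined throughout the study period, so late starters cannot be counted as dropouts at horizons they never reached.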
In Figure 6, we show the modal shares for the different groups of participants (right panel) and the control group. The control group consisted of users who did not participate in the challenges (No Challenge group) but still may have been influenced by other features of the app over a longer period of time. To exclude those potential effects and enable a fair comparison between the groups of participants, the control group only includes users who started using the app in January 2018. Figure 6 reveals similar seasonal effects for all groups of participants and the control group. Due to increasingly favourable weather conditions, bike shares increased during the first six months, while car shares declined. As mentioned earlier, participants in challenges tended to have a higher bike share and lower car share than users in the control group as Figure 6 shows. The difference is about 15% points. We assume that this is mainly due to self-selection, but we cannot exclude the possibility that there was also some increase in bike share (and decrease in car share) when participants started choosing monthly challenges. Unfortunately, we were not able to verify this as we had no 'before' measurements. Figure 7 shows the bike and car mode share difference between the participants and the control group from January to June. We defined groups in the different stages of app usage as follows: 1. Six months (group 1); 2. seven to nine months (group 2); 3. 10 to 12 months (group 3); 4. more than 12 months (group 4).
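The stage definitions and the share differences plotted in Figure 7 can be illustrated with a short sketch. The function names and the dictionary layout are hypothetical, and mode shares are assumed here to be distance-based.

```python
def stage_group(months_of_use):
    """Map months of app usage to the stage groups 1 to 4 defined above."""
    if months_of_use < 6:
        return None   # shorter usage is not part of the long-term groups
    if months_of_use == 6:
        return 1
    if months_of_use <= 9:
        return 2
    if months_of_use <= 12:
        return 3
    return 4


def mode_share(distance_by_mode, mode):
    """Share of one mode in total travelled distance, in percent."""
    total = sum(distance_by_mode.values())
    if total == 0:
        return 0.0
    return 100.0 * distance_by_mode.get(mode, 0.0) / total


def share_difference(participants, control, mode):
    """Difference in mode share (participants minus control), in percentage points."""
    return mode_share(participants, mode) - mode_share(control, mode)
```

With made-up distances of 45 km by bike out of 100 km for participants and 30 km out of 100 km for the control group, `share_difference` returns a bike share difference of 15 percentage points.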
We compared each group with the control group separately for bike and car share differences. We therefore ran eight repeated measures ANOVAs comparing the experimental groups with the control group. Moreover, we used four segmentations as experimental subgroups. Thus, there were 40 ANOVA tests in total. Table 6 presents the repeated measures ANOVA results related to Figure 7. The challenge join rate and the share of easy and difficult challenges for each group are also shown in Table 6. A power analysis was performed with the program GLIMMPSE, as recommended by Guo et al. [55]. The result shows that a sample size of 40 travellers per group, or a total of 80 travellers, would give a power of at least 0.8 for testing the hypothesis of whether there is a time × intervention interaction. All ANOVA tests in our study met this requirement. Users in all groups had six continuous months of data, so there were no missing data. The upper panel of Figure 7 displays the trends for all participants per group (green lines). Groups 2 and 3 showed a slow increase in bike share and a decrease in car use over time relative to the control group. We found a significant linear increase in bike share and a decrease in car use after the first month (p < 0.05 and a medium effect size, r = 0.15-0.25, in the corresponding one-way repeated measures ANOVA in Table 6). However, there is no significant trend for groups 1 and 4. This suggests that sustained behaviour change may only take place after several months and tends to be stable after one year. As a result, the bike share fractions are on average about 4% points higher for groups 2, 3 and 4, while car shares are about 3% points lower relative to group 1. However, this difference is only significant for the month of May. In this month, bike shares were 6% points higher for groups 2, 3, and 4 (combined). Figure 7 presents a long-term analysis.
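The count of 40 tests can be reproduced by enumerating the design: four stage groups times two modes gives eight tests for all participants, and the same eight tests are repeated for each of the four segmentations. The segmentation labels below are an assumption based on the subgroups shown in Figure 7 (highly active participants, final-week trips, mainly easy and mainly difficult challenges), not names taken from the study.

```python
from itertools import product

STAGE_GROUPS = (1, 2, 3, 4)
MODES = ("bike", "car")
# "all participants" plus four assumed segmentation labels
SELECTIONS = (
    "all participants",
    "highly active",
    "final week only",
    "mainly easy",
    "mainly difficult",
)


def planned_tests():
    """List every (selection, stage group, mode) comparison with the control group."""
    return list(product(SELECTIONS, STAGE_GROUPS, MODES))
```

Each tuple corresponds to one repeated measures ANOVA of a participant group against the control group over the six months from January to June.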
The upper panel of Figure 7 also shows highly active participants who entered the monthly challenge more than three times in the last six months (blue lines). After more than a year of using the app, there was a significant increase in bike share and a significant decrease in car share for the participants in group 4 relative to the control group. Bike shares were also higher in group 4 than for non-active users. This suggests that sustained behaviour change may be more likely for the most active participants.
To have enough trips in our sample, we took all the data for each month. This means that we also included trips that took place in the challenge periods. When considering long-term trends, it is useful to exclude short-term effects due to the challenges. During the last week of each month, almost all participants are outside of the challenge period. Therefore, we also looked at trips from the final week of each month only (yellow lines). However, the trip sample sizes then became so small that the total variation mostly reflected random variation, making significant differences harder to detect. We therefore did not find significant trends over the months. However, for group 1, the participants' bike shares appeared to be lower and their car shares higher during each month's last week. This was expected as this last week was outside the challenge period. Interestingly, this difference became smaller after six months, again suggesting that if sustained behaviour change takes place, this may only happen after several months.
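Restricting the sample to final-week trips can be sketched as below. The sketch assumes that "final week" means the last seven calendar days of each month and that each trip carries a `date` field; both are assumptions made for illustration.

```python
import calendar
from datetime import date


def in_final_week(day: date) -> bool:
    """True if the day falls within the last seven calendar days of its month."""
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    return day.day > days_in_month - 7


def final_week_trips(trips):
    """Keep only trips whose (hypothetical) 'date' field lies in a final week."""
    return [trip for trip in trips if in_final_week(trip["date"])]
```

Because this filter keeps roughly a quarter of the trips, the resulting monthly mode shares rest on much smaller samples, which is consistent with the larger random variation described above.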
In the lower panel of Figure 7, we distinguish between easy and difficult challenges. The group 'difficult challenges' includes users who entered a difficult challenge more than half of the time. For the long-term behaviour change, Figure 7(b) shows the same trends as for all the participants. In the first six months (far left), there is no clear trend, while there is an increase over time for groups 2 and 3, and bike shares seem to be higher for groups 2, 3 and 4 (relative to the control group). There also appears to be some difference depending on whether participants chose mainly difficult or mainly easy challenges. The bike distance difference between group 1 and the control group is slightly larger for those who predominantly chose difficult challenges. However, for groups 3 and 4, it is the other way around: bike shares are higher and car shares are lower for the easy challenges (relative to the control group). In fact, for the difficult challenges, mode shares are almost the same for groups 2, 3, and 4 relative to group 1. On the other hand, for the choosers of easy challenges, bike shares are on average about 6% points higher and car shares are about 7% points lower for groups 2, 3 and 4 relative to group 1. This result suggests that sustained behaviour change is mainly achieved through easy challenges, which are also accomplished much more often. However, the results are not conclusive as none of the differences is significant (at a significance level of 0.05) due to the small sample size. Nevertheless, these differences are still substantial, that is, they are of the same magnitude as the differences between inside and outside the challenge period (slightly above %). The main difference between the short-term and long-term behaviour analyses is the sample size. The sample size is still relatively large compared with other studies, and a substantial fraction of the app users participated in at least one monthly challenge.
However, although the retention rates are relatively high among the active participants, the number of participants that used the app over a longer period was relatively low as many users joined at a late stage of the study. Moreover, not all of the longer-term users showed sustained behaviour change. Therefore, if substantial sustained behaviour changes occurred, this would only be seen in a small fraction of the users, namely, the ones who downloaded the app and used it for more than six months.
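The two participant labels used in Figure 7 follow simple threshold rules, which can be written down as a sketch. The input format is hypothetical: one boolean per challenge entered in the last six months, set to `True` when that challenge was difficult.

```python
def is_highly_active(entries):
    """Highly active: entered the monthly challenge more than three times."""
    return len(entries) > 3


def chose_mainly_difficult(entries):
    """Mainly difficult: more than half of the entered challenges were difficult."""
    if not entries:
        return False
    return sum(entries) > len(entries) / 2
```

A participant with exactly as many easy as difficult challenges is not counted as 'mainly difficult' under this reading of "more than half".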

DISCUSSION AND CONCLUSION
In this study, we examined the impact of monthly challenges on travellers' cycling behaviour. In the SMART app, participants were able to choose a cycling-related challenge each month and received points upon completion of a challenge. The app also provided other incentives such as feedback, messages and traffic information.
Over 6000 users downloaded the SMART app with a steady 51% half-year retention rate. About 25% of these users joined the monthly choice challenge, indicating that other interventions, such as self-monitoring and feedback, are also important for attracting users. The users who participated in monthly challenges tended to cycle more, suggesting some form of self-selection among the participants in the monthly challenges. (For the other users, the mode shares were similar to those found in the National Travel Survey.) While car use was still substantial among this group of participants, retention rates were relatively high. This shows that there is the potential, at least in theory, to encourage more cycling and reduce car use.
We found no clear evidence for behaviour change among users who never joined any of the monthly challenges, which indicates that for behavioural change, they need to become actively involved.
During the monthly two-week challenge periods, the intervention caused a modal shift from car to bike, suggesting that challenge-and-reward interventions are effective for short-term behavioural change. However, the differences between inside and outside the challenge periods were still relatively small. The distance-based modal shift rate was about 5% points. Interestingly, short-term changes appeared to be greater for non-regular trips than for regular trips. One reason may be that the car mode share for non-regular trips is greater than for regular trips, which would create more room to change.
We assumed that users with high challenge commitment would show greater change; however, we observed that the overall behaviour change was more or less the same regardless of whether participants chose easy or difficult challenges. These results were unexpected. Participants who choose easy challenges do not need to change their behaviour to accomplish the challenge. The fact that they did suggests that the challenges in themselves encourage participants to change their behaviour. Perhaps participants are more aware of their behaviour when they participate in a challenge and therefore more likely to change their behaviour even when they can accomplish the challenge without behaviour change. This would be a positive result and suggests that choice challenges encourage behaviour change in the short term even when participants do not need to change their behaviour to accomplish the challenge. In the longer term, users tend to choose easier challenges, and sustained behaviour change is more likely achieved through easy challenges, which are also accomplished much more often.
There is some indication that the interventions were effective for some long-term behavioural changes. In general, we observed some increase in cycling (relative to the control group) after between 6 and 12 months of usage. After 12 months, the bike share remained slightly larger for participants in the monthly challenges, even more so for highly active challenge participants. This pattern is in accordance with the transtheoretical model, which states that sustained behaviour is only observed after some time (often after one year). However, our sample sizes were too small to arrive at definitive conclusions. If sustained behaviour changes occurred, this may only have been the case for a small fraction of the app's users.
Behaviour change has often been equated with action since action is observable, but decisions to change a habit are preceded by a change in awareness [56,57] as habitual behaviour is relatively unintentional [58,59]. The change in awareness before action is often ignored in studies because the monitoring periods when incentives are offered are relatively short. This limits the potential for understanding awareness changes and why people may revert to old habits in the long term. We found that different users may take action over different time periods and that some people may not consciously change behaviour but will return to their old habit after a short period of behaviour change related to an incentive. This suggests that in order to cause conscious changes, experiments need to run long enough to cover all the early stages.
One way to raise awareness is to provide feedback and education; the SMART app does this. In addition, Geller [60] has found that several small incentives can be more effective than one large incentive since this causes individuals to develop internal motivation for behavioural change. In SMART, incentives are relatively small and recurrent. At first sight, it is therefore not obvious how sustained behaviour change could still be increased within the context of our experiment. Also, it is quite hard to control for all these parameters in a real-world environment. However, the interventions had a strong impact on the most active participants and accomplished long-term behavioural changes. Increasing the variety of challenge types and making the challenges feasible for all types of travellers might increase retention [52], increase the challenge join rate, and lead to more sustained behaviour change.
A limitation of this study is that we only analysed the period during which challenges were provided. We did not have 'before' data and had no information on behaviour after the monthly challenges ended. Moreover, there is self-selection among SMART users in the experimental group. Yet the self-selected group still had a significant car mode share that can be decreased further, and this study indicates that the behaviour of this group has the potential to be changed in the long term by the interventions. Moreover, some SMART users in the control group had not yet joined a challenge but used the app for other functions and could join a challenge later. In line with this, future research could focus on designing apps that attract travellers who do not cycle often, or on an even longer-term experiment, because these users may need longer interventions to change their awareness. In addition, in order to let travellers behave as they would in a real-life environment, the real-world experiment was designed to avoid privacy issues. As a result, no sociodemographic data on the travellers are available.
However, this study does show the realistic potential of using positive interventions via a commercial app in the real world. The SMART app provides larger samples than most studies have had so far and makes it possible to monitor behaviour continuously over a more extended period of time. As Prochaska and Velicer [61] found with regard to quitting smoking, the so-called maintenance period lasts from six months to about five years. Therefore, we will monitor SMART users for an even longer period of time in a future study.
In conclusion, our study has shown that monthly challenges can be effective persuasive interventions to accomplish short-term change. The effects in the long term are less clear and suggest that behaviour change is a gradual and cyclical process, in line with the transtheoretical model. VTBC schemes need to pay more attention to the different stages of behaviour change and monitor them accordingly. Results from short-term studies probably overestimate the change that can be sustained.