Comparing Econometric Methods to Empirically Evaluate Activation Programs for Job Seekers

We test whether different identification strategies give similar results when evaluating activation programs. Budgetary problems at the Dutch unemployment insurance (UI) administration in March 2010 caused a sharp drop in the availability of these programs. Using administrative data provided by the UI administration, we evaluate the effect of the program (1) exploiting the policy discontinuity as a quasi-experiment, (2) using dynamic matching assuming conditional independence, and (3) applying the timing-of-events model. All three strategies use the same data to consider the same program in the same setting, and show that the program reduces job finding directly after enrollment. However, the magnitude of the estimated drop in job finding differs between the three estimation methods. In the longer run, all three methods show a zero effect on employment.


Introduction
In 2002, the Dutch market for activation programs was privatized, implying that the unemployment insurance (UI) administration buys services of private companies to assist benefits recipients in their job search. Due to the economic crisis, the demand for programs increased sharply in 2009 and early 2010, leading to budgetary problems in March 2010. The government refused to extend the budget, which terminated the purchase of new programs within a period of two weeks. New UI benefits recipients could no longer enroll in these programs. In this paper, we exploit this policy discontinuity to evaluate the effects of participating in the programs on job finding. In addition, we estimate the same effects using non-experimental estimators (matching and timing-of-events) and compare the results of the three identification strategies.
The main challenge in evaluating active labor market programs is selective participation (Heckman et al. (1999), Abbring and Heckman (2007)). As shown in a meta-analysis by Card et al. (2010), over 50% of evaluation studies use longitudinal data and compare a treatment group with a control group that is typically constructed by matching on observed characteristics (e.g. Brodaty et al. (2001), Sianesi (2004) and Lechner et al. (2011)). Some studies exploit institutional features or policy changes that generate random variation in program participation (e.g. Dolton and O'Neill (2002) and Cockx and Dejemeppe (2012)). Less than 10% of the studies use an experimental design (e.g. Van den Berg and Van der Klaauw (2006), Graversen and Van Ours (2008) and Card et al. (2011)). While Card et al. (2010) and Kluve (2010) find no evidence of a relationship between the identification strategy and the empirical results in their meta-analyses, we know since LaLonde (1986) that non-experimental estimators may produce results that do not concur with those from experimental evidence. Heckman et al. (1997) stress that experimental and non-experimental estimates can only produce similar results if three requirements are fulfilled. First, the data source should be the same for the treatment and control group. Second, treated and control individuals should be active in the same local labor market. Third, the data should contain a rich set of variables that affect both program participation and labor market outcomes. Smith and Todd (2005) argue that each of these requirements is likely to be violated in the non-experimental evaluations by LaLonde (1986). 1 However, Bléhaut and Rathelot (2014) show for a French job search assistance program that even when all conditions are satisfied the estimated average treatment effect on the treated is very different when using a random control group compared to using a matching estimator on the usual non-participants in the treatment group. 
We contribute to this literature by empirically comparing identification strategies. 2 Our contribution is threefold. First, our quasi-experimental estimates are identified from a nationwide policy discontinuity in 2010 and we use high-quality administrative data. 3 Therefore we observe, for a large sample of individuals around the policy discontinuity, detailed information on individual characteristics, pre-unemployment labor market variables, current unemployment spell characteristics and the timing of privately provided activation programs. Second, for the non-experimental analysis we use a recently proposed dynamic matching estimator (Vikström (2017)) as well as the timing-of-events model (Abbring and Van den Berg (2003)).
All our identification strategies take the dynamic nature of program assignment into account, and correct for dynamic selection. 4 We discuss our identification strategies within a framework for treatment effects with dynamic enrollment (see Kastoryano and Van der Klaauw (2011) for a more extensive discussion). Third, we consider the private provision of activation programs that are commonly offered by public institutions. Current evidence on the effectiveness of private providers varies: Krug and Stephan (2016) show that placement services are less effective when offered privately, and Behaghel et al. (2014) and Cottier et al. (2015) find that assistance provided through the private market is ineffective in stimulating exit to work. On the other hand, Bennmarker et al. (2013) find positive effects of privately provided employment services.
The policy discontinuity that we consider reduced the weekly enrollment in the privately provided activation programs from 1300 to less than 80 within one month and it remained below 50 afterwards. We estimate the treatment effect on the treated by comparing employment outcomes of cohorts entering unemployment relatively shortly after each other. Since these cohorts reach the policy discontinuity at different unemployment durations they are affected differently, which identifies the effect of the programs (Van den Berg et al. (2014)). Seasonal variation in the labor market is controlled for using cohorts from the previous year. The estimation results show that after starting a program, employment probabilities are reduced significantly. After two to three months these increase again, up to a zero difference in employment 12 to 18 months after enrolling in the program.
2 Mueser et al. (2007), Lalive et al. (2008), Biewen et al. (2014) and Kastoryano and Van der Klaauw (2011) provide recent comparisons of (quasi-)experimental and non-experimental estimators for active labor market programs.
3 We refer to the program discontinuity as a "quasi-experiment".
4 Evaluations with dynamic enrollment struggle to find the appropriate control group (Fredriksson and Johansson (2008)). Often the dynamic setting is simplified to static enrollment (Brodaty et al. (2001), Bléhaut and Rathelot (2014) and Sianesi (2004)).
Where the quasi-experimental estimator compares employment outcomes between different cohorts, the matching estimator and timing-of-events model compare treated and non-treated individuals within the same cohorts. The timing-of-events model allows for selection on unobservables, but adds more structure to the model. The estimation results of both approaches again show a significantly negative effect of program participation directly after entering the program, which disappears in the longer run. These results are in line with our quasi-experimental estimates, but the magnitude of the negative effect varies across the different estimators and differences are sometimes significant. 5 The dynamic matching estimator and the timing-of-events model can use larger samples than the quasi-experimental approach. Therefore, the quasi-experimental treatment effects are estimated much less precisely, also because intention-to-treat effects are inflated with small shares of program participants. 6 The results from the dynamic matching estimator and the timing-of-events model are both robust against extending the sample beyond the discontinuity cohorts, but the treatment effects estimated using the timing-of-events model are closer to zero. This may suggest that there is some selection on unobservables and that, in particular, more disadvantaged unemployed workers are more likely to participate in the program.
The results from the three different approaches yield the same policy recommendations, but interpreting the results should be done with care. The quasi-experimental results rely on a common trend assumption, stating that in the absence of the policy discontinuity, different cohorts experience similar employment probabilities, up to a constant difference. Furthermore, the activation programs contain a mixture of caseworker meetings, job referrals, training or schooling, subsidized employment and goal setting in the job search process. These elements are common in activation programs in other countries, but with different intensities. The programs are offered in addition to the "basic" assistance of the UI administration (mostly irregular meetings with caseworkers). Caseworkers have substantial discretion when assigning job seekers to programs, which is common to many UI administrations.
5 The negative impact on employment in the short run is in line with the well-documented lock-in effect (Lechner et al. (2011)).
6 In absolute terms the monthly number of individuals entering the program is large, but this remains a small fraction of all eligible UI benefits recipients in the Netherlands.
Finally, our observation period describes the period of recession during the economic crisis and relates to the Dutch setting, which offers similar benefits levels as other Northern-European countries, but more generous benefits than the US or UK.
The remainder of the paper is structured as follows. We describe the institutional setting and the budgetary problems which led to the policy discontinuity in Section 2. An overview of the data is provided in Section 3. In Section 4 we define our treatment effect of interest. In Section 5 we present non-experimental results from the matching and timing-of-events estimators. In Section 6 we discuss how the discontinuity allows identification of the treatment effect and present estimation results. Section 7 compares the results from the different approaches and provides a discussion. Section 8 concludes. Supplementary material and more results are available in the Online Appendix.
2 Institutional setting and the policy discontinuity
A UI benefits recipient is required to search for work actively, which implies making at least one job application each week. Benefits recipients are obliged to accept any suitable job offer. 7 Caseworkers at the UI administration provide basic job search assistance and monitor compliance with job search obligations, but the intensity of individual meetings is low. In 2009, if caseworkers judged that a benefits recipient required more than the usual guidance, they could assign the individual to a program with the goal of increasing the job finding rate. A large diversity of programs existed, including job search assistance, vacancy referral, training in writing application letters and CVs, wage subsidies, subsidized employment in the public sector and schooling. Some of these were provided internally by the UI administration, while others were purchased externally from private companies. These were for-profit companies and they faced payment schemes that consisted partly of lump-sum payments and partly of performance payments. 8 Our analysis focuses solely on the programs that were externally provided by private companies. These can be broadly classified as (with relative frequency in parentheses) job search assistance programs (50%), training or schooling (28%), and subsidized employment and other programs (22%). However, different programs were frequently combined, so schooling is often followed by job search assistance, or job search assistance can be accompanied by training or followed by subsidized employment. We consider the first moment of participating in a program as the start of treatment, which implies that our analysis relates to the mixture of programs.
Though some guidelines existed, caseworkers had a large degree of discretion in deciding about the assignment of the different programs.
The lack of centralized program assignment, together with an increased inflow into unemployment due to the recession, meant that many more individuals were assigned to these programs in 2009 and early 2010 than the budget allowed. Therefore, the entire budget had been exhausted by March 2010. Authorities refused to extend the budget and declared that no new programs could be purchased from that moment onward. 9 Assistance offered internally by the UI administration continued without change. In Section 3 we show that indeed the number of new program entrants dropped to almost zero in March 2010 and remained very low afterwards.
In our empirical analysis, we focus on job finding. Table 1 presents summary statistics for the full sample, as well as for three cohorts defined by their month of inflow into unemployment.
The original dataset contains 671,743 spells. We exclude 34,968 spells from individuals previously employed in the public sector and 17,454 spells from individuals older than 60 years. Next, we drop 25,778 spells for which important variables are missing, 524 spells from individuals working more than 60 hours or fewer than 12 hours in their previous job, and 9,504 spells during which the job seeker also had ongoing employment. Finally, we exclude 288 spells with a negative unemployment duration and 34 spells with negative benefits eligibility. Column (1) shows that for the full sample the median duration of unemployment is 147 days (almost five months). About 44% of those exiting UI find work and for 13% the reason of exit is unknown. We include both in our outcome measure "outflow to work". 11 The end of the benefits entitlement period is reached for 32% of the spells. Almost 5% leave unemployment due to sickness or disability; the rest leave for other reasons.
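For concreteness, the exclusion steps above can be sketched as a filter chain. The column names below are hypothetical stand-ins, not the actual variable names in the UI administration data:

```python
# Sketch of the sample-selection steps; column names ("prev_sector", "age", ...)
# are assumptions for illustration, not the real dataset's variables.
import pandas as pd

def select_sample(spells: pd.DataFrame) -> pd.DataFrame:
    """Apply the exclusion criteria to a spell-level dataset."""
    out = spells[spells["prev_sector"] != "public"]           # previously employed in public sector
    out = out[out["age"] <= 60]                               # older than 60 years
    out = out.dropna(subset=["education", "wage_prev_job"])   # important variables missing
    out = out[out["hours_prev_job"].between(12, 60)]          # <12 or >60 hours in previous job
    out = out[~out["has_ongoing_job"]]                        # ongoing employment during the spell
    out = out[(out["duration_days"] >= 0) & (out["benefit_months"] >= 0)]
    return out
```

Applying the filters sequentially, as above, mirrors how the paper reports the number of spells dropped at each step.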
In the full sample about 13% of the benefits recipients participate in one of the externally provided programs. Slightly more than half of these programs focus on job search assistance, over a quarter involve some sort of training, and less than a quarter consist of some other program. 12 Note that these numbers refer only to the first program that an individual participates in. About 26% of all individuals
11 Often UI recipients do not give a reason for terminating their benefits or these reasons are not registered by caseworkers. In most cases the individual starts generating income from other sources related to working. Our results are robust against including unknown as outflow to work.
The dataset contains many individual characteristics. The lower panel of Table 1 presents sample means for some characteristics. The average individual is almost full-time unemployed (34 hours) and 32% have experienced an earlier period of unemployment in the three years prior to entering UI. Each inflow cohort reaches the policy discontinuity at a different moment during their UI spell. Figure 3 illustrates that each subsequent cohort experiences the drop in the first program entry hazard one month earlier in their unemployment duration.
12 To increase precision in exploiting the policy discontinuity, it is possible to select individuals with characteristics that are correlated with participation in the external activation programs. This has been done in the Discussion Paper version of this paper (Muller et al. (2017)). The gains of reducing the sample are small though, because the correlation between individual characteristics and program participation is weak.
Participation in an external program is not restricted to a certain unemployment duration. Before the policy discontinuity the participation hazards of the different cohorts are very similar, indicating that there were no earlier policy changes. The cohort of March 2010 has an almost zero probability of entering an external program.
A concern might be that caseworkers responded to the inability to assign unemployed workers to external programs. However, resources for the internal programs (offered by the UI administration itself) remained unaffected around March 2010, limiting the scope for scaling up internal programs. The number of internal programs starting in each month is shown in panel (b) of Figure 2. There is no indication of a response around the date of the policy discontinuity. Separate graphs by type of program are provided in Online Appendix D (Figure D3). The hazard rates into an internal program for different cohorts are shown in Figure 4. The hazard rates are very similar, supporting the assumption that internal program provision was unaffected by the policy change. 14 A further concern might be that even though the number of internal programs did not change, caseworkers may have reacted to the unavailability of external programs by shifting their internal programs to those individuals that might otherwise have participated in external programs. This would imply that the policy does not
13 Figure D1 in Online Appendix D shows that the discontinuity occurs for all types of programs.
14 In theory, job seekers could decide to pay for an external program themselves once it is no longer offered through the UI administration. However, the costs of these programs are considerable, especially for unemployed workers, such that this almost never happens in the Netherlands.

4 Treatment effects
In this section we define the treatment effects that we aim to estimate. Recall that only a small share of all unemployed workers enter an external program during their unemployment spell. Due to selectivity in the participation decision, the composition of program participants and non-participants is different. We focus on treatment effects for participants. These treatment effects are nonstandard because enrollment in the program is dynamic.
Our outcome variable is the duration until employment, which is a random variable. We define the potential outcome when treated as Y*_{1,t}(s), the outcome at duration t when the program starts at duration s. The potential outcome under no treatment is defined as Y*_{0,t} = lim_{s→∞} Y*_{1,t}(s). We adopt the so-called no-anticipation assumption (Abbring and Van den Berg (2003)), which imposes that program participation at duration s only affects potential outcomes at durations t > s. This is required to define counterfactuals and thereby treatment effects. The no-anticipation assumption allows us to write the potential untreated outcomes as Y*_{1,t}(s) = Y*_{0,t} for all t ≤ s. The no-anticipation assumption is strict since it rules out that individuals change their job search behavior prior to s in anticipation of learning that they will enroll in the program at time s. 15 Such behavior may be unlikely for the external programs we consider in this paper. Programs are assigned by caseworkers on an individual basis. There are no strict criteria for participation and only a small fraction of the unemployed workers can enroll, so it is impossible for job seekers to know in advance when they will enter a program.
Individuals leave unemployment after different durations, such that the composition of the surviving treated and non-treated groups changes with elapsed duration. The average treatment effect on the treated survivors, ATTS(s, t), provides a series of effects for different values of s and t. Policy makers often focus on all participants, which requires knowledge about the treatment assignment policy. Let f(s) denote the density function of starting treatment at time s and S_0(s) the survivor function in unemployment until time s for individuals who did not participate in treatment before s. We define the average treatment effect on the treated evaluated at time t after entering unemployment as

ATET(t) = [ ∫_0^t ATTS(s, t) f(s) S_0(s) ds ] / [ ∫_0^t f(s) S_0(s) ds ].

This describes the average treatment effect on those individuals who started participating in the treatment before time t. For tractability of the comparison of the different estimation approaches, we focus on estimating the ATTS(s, t) and only briefly discuss the estimated ATET(t) in Section 7.
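The aggregation of ATTS(s, t) into ATET(t) can be illustrated with a short numeric sketch. The weights f(s)S_0(s) and the ATTS values used here are made-up illustrative numbers, not estimates from the paper:

```python
# Numeric sketch of aggregating ATTS(s, t) into ATET(t); all inputs are
# illustrative assumptions, not estimates from the paper.

def atet(t, atts, f, S0):
    """ATET(t): weighted average of ATTS(s, t) over treatment starts s < t.

    atts: dict mapping (s, t) -> ATTS estimate
    f:    dict mapping s -> density of starting treatment at s
    S0:   dict mapping s -> survival in unemployment, untreated, until s
    """
    weights = {s: f[s] * S0[s] for s in f if s < t}
    total = sum(weights.values())
    return sum(atts[(s, t)] * weights[s] for s in weights) / total

# Treatment can start in months 1-3; effect measured at t = 6 (made-up numbers).
f = {1: 0.02, 2: 0.03, 3: 0.03}
S0 = {1: 0.9, 2: 0.8, 3: 0.7}
atts_vals = {(1, 6): -0.10, (2, 6): -0.15, (3, 6): -0.12}
effect = atet(6, atts_vals, f, S0)
```

Because participants who enroll later are weighted by both the enrollment density and survival up to enrollment, ATET(t) leans toward the ATTS values of the most common treatment-start durations.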

Choice of samples
The different identification strategies that we apply require sample selections that do not necessarily coincide. Exploiting the policy discontinuity requires using a specific sample of individuals entering unemployment around the time of the discontinuity.
Matching and timing-of-events can use a much larger sample including individuals entering unemployment earlier or later. To apply each identification strategy as they would be applied in a usual application, we construct two subsamples from the full sample (see Table 2).
A difference in results can be due to different sample selection rules. We argue that the choice of sample is an essential part of an identification strategy. However, to investigate to what extent the sample choice causes differences in results, we perform each analysis also with a smaller sample that is the same across all strategies (column (2) in Table 2). A more extensive discussion of the selection of the smaller sample is presented in Section 6, where we discuss the quasi-experimental approach exploiting the policy discontinuity. In addition, we apply the matching and timing-of-events estimators to a third sample that excludes the discontinuity period (column (3) in Table 2). This sample contains only individuals entering unemployment before the discontinuity and censors all observations at the time of the discontinuity. The rationale for analyzing such a sample is that the discontinuity creates exogenous variation in program participation, and we study how non-experimental methods perform without including such variation.
a Note to Table 2: In addition to restricting the inflow period, observations in this sample are also censored at the discontinuity.
Our comparison considers the approaches presented in Table 2. The full sample and the pre-discontinuity sample are used for the non-experimental methods only, while the discontinuity sample is used for all three methods.
5 Non-experimental analysis

Matching estimator
We start the empirical analysis by applying a matching estimator. The identification strategy does not exploit the policy discontinuity, but instead compares individuals with similar characteristics differing only in treatment status. We apply a dynamic matching estimator (proposed by Vikström (2017)) to account for the dynamic setting and dynamic selection.
The approach relies on two main assumptions. First, selection into treatment is on observables only. This unconfoundedness assumption implies that, after conditioning on a set of observed characteristics, assignment to treatment at each duration is independent of the potential outcomes. See Vikström (2017) for a discussion of how this assumption generalizes to a setting of dynamic treatment assignment. Our administrative data include a rich set of covariates, which is crucial for the likelihood that unconfoundedness holds. Employment histories are argued to be particularly important, because they tend to be strong predictors of future labor market outcomes as well as program participation (e.g. Card and Sullivan (1988), Heckman et al. (1999), Gerfin and Lechner (2002), Lechner et al. (2011) and Lechner and Wunsch (2013)). In addition to employment history (previous hourly wage, unemployment history, industry), we observe individual characteristics (age, gender, education level, marital status, region) and variables describing the current unemployment spell (unemployment size in hours, sickness or disability, maximum benefits entitlement). This set of covariates is at least as extensive as is usually available when evaluating active labor market programs. Second, a common support condition requires that, conditional on the observed characteristics, individuals at risk have a positive probability of both starting and not starting treatment. This ensures that if the sample size is sufficiently large, counterfactuals can always be found. This assumption is likely to hold, since there are no (combinations of) individual characteristics that perfectly predict program participation in our data.
As baseline, we consider the full sample, and as robustness checks we use two subsamples (as discussed in Subsection 4.1). We compare those that are treated at s with those that are never treated. Clearly, the probability of never receiving treatment depends on the length of an unemployment spell. That is, the longer someone remains unemployed, the larger the probability of starting treatment. To correct for the dynamic selection, we apply the estimator proposed by Vikström (2017). 16 The treatment group contains all individuals that start treatment at s, while the control group includes all individuals that have not been treated and are still unemployed by duration s. When measuring the outcome at time t > s, only those in the control group that have not yet started treatment at time t are included (spells of those in the control group that start treatment between s and t are censored).
Since program assignment is not random, the surviving control group individuals are weighted using propensity scores, making them comparable to the treatment group in terms of observed characteristics. Given the continuously changing composition of the control group, the weights are recalculated for each period. The estimator thus corrects for both selective exits and potentially selective right-censoring. A formal description of the estimator is provided in Online Appendix A.
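A stylized sketch of this re-weighting step, for a single treatment-start duration s and a single outcome horizon, might look as follows. It assumes propensity scores have already been estimated (e.g. by logistic regression) and is a simplification of Vikström's estimator, not a full implementation:

```python
# Stylized re-weighting step for one treatment-start duration s and one
# outcome horizon; a simplification of the dynamic matching estimator,
# not the paper's actual implementation.

def weighted_survival_gap(treated, controls, p_score):
    """Difference in job-finding shares: treated at s vs. IPW-weighted controls.

    treated:  list of dicts with key "found_job" (bool)
    controls: list of dicts with keys "found_job" and "x" (covariate cell)
    p_score:  dict mapping covariate cell -> P(start treatment at s | X, survived)
    """
    y_treated = sum(c["found_job"] for c in treated) / len(treated)
    # Weight p/(1-p) re-weights surviving controls toward the treated covariate mix.
    w = [p_score[c["x"]] / (1.0 - p_score[c["x"]]) for c in controls]
    y_control = sum(wi * c["found_job"] for wi, c in zip(w, controls)) / sum(w)
    return y_treated - y_control
```

In the dynamic version, the control group (and hence the weights) is recomputed at every outcome period, since controls who start treatment between s and t are censored.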
We aggregate all spell data into 30-day intervals and estimate the impact of treatment on job finding for different values of s, that is, programs starting at different elapsed durations. The estimates are presented in Figure 6 for treatment starting at s = 3, 4, 5, 6 months, and measured up to an unemployment duration of 16 months. We focus on these four values of s because inflow into programs is highest at these durations (see Figure D2 in Online Appendix D). The presented estimates are differences in survivor probabilities between the treatment group and the (weighted) control group.
The estimation results show that program participants are less likely to find employment. Around two to three months after the program has started, the probability of finding employment in the control group is 10 to 20 percentage points higher than in the treatment group. This difference remains for at least 12 months. The pattern of program effects is very similar across the four panels (a)-(d) and estimated impacts for other values of s (s = 1, . . . , 8) show very similar results (see Figure D4 in Online Appendix D). The effects of the program are thus equally negative when a benefits recipient enters the program relatively fast or later during the period of unemployment. The declining impact of program participation during the first couple of months is consistent with a lock-in effect. Recall that we estimate the effect of a mixture of activation programs that regularly includes schooling or training.
16 An alternative approach that has been applied in the literature is to compare those treated at s with those not yet treated at s and interpret the difference as the effect of early treatment relative to potential later treatment (see Sianesi (2004)).
Note to Figure 6: Estimates based on too few observations are not shown in the graphs. In panel (d) the discontinuity sample is omitted, because no individual starts a program beyond six months of unemployment. Figures D4-D7 in Online Appendix D show separate graphs for all estimates, also including confidence intervals. The negative effects are statistically significant until at least 12 months of unemployment. After that some estimates become insignificant, both because estimated effects move to zero and sample sizes become small.
Participation in such programs often causes unemployed workers to invest less time in job search. Furthermore, job search assistance programs may crowd out certain types of job search effort which are more effective for finding work quickly (e.g. Van den Berg and Van der Klaauw (2006)). Figure 6 shows that using different samples yields very similar estimates for the impact of program participation. In the discontinuity sample selectivity in program participation should be less of an issue than in the other samples. The similarity of estimated effects can either imply that the set of covariates is sufficiently rich to deal with selective program assignment or that programs are assigned relatively randomly. Therefore, we repeat the matching estimator using only age dummies as observed covariates. This hardly affects the estimates, which provides evidence that selectivity on observed covariates is not large when assigning benefits recipients to the programs. Whether selectivity on unobserved characteristics is important when assigning the program requires comparing the findings to those from the other empirical approaches.
We conclude that the negative impact of the program on employment probabilities is (1) the same for assignment at different durations, (2) very robust across different samples and (3) even robust against excluding most observed characteristics from the matching estimator. 17

Timing-of-events model
Matching requires very modest functional-form assumptions, but relies on a potentially strong unconfoundedness assumption. The timing-of-events model (Abbring and Van den Berg (2003)) allows for selection on unobserved variables, but makes stronger functional-form assumptions. This model has been applied regularly in the recent literature on dynamic treatment evaluation. 18 The timing-of-events model jointly specifies job finding and entry into the program using continuous-time duration models. To control for unobserved characteristics, the unobserved heterogeneity terms in both hazard rates are allowed to be correlated. Identification relies on the mixed proportional structure of both hazard rates. Since the model is formulated in continuous time, we use daily spell data and do not have to make an assumption about the unit of time when discretizing the unemployment durations.
We present a concise description of the model here, while a detailed version is presented in Online Appendix B. Consider an individual entering unemployment at calendar date τ0. The job finding (hazard) rate depends on the number of days of unemployment t (modeled by ψe(t)), on the calendar time τ0 + t (modeled by ψe(τ0 + t)), observed characteristics x and unobserved characteristics ve. When starting the activation program after s periods of unemployment, the hazard rate shifts by the treatment effect δ(t−s), which can depend on the elapsed duration t − s since entering the program. The treatment effect is modeled as a piece-wise constant function of the elapsed duration since starting the program. The job finding rate is given by

θe(t | τ0, x, s, ve) = ψe(t) ψe(τ0 + t) exp( x'βe + δ(t−s) 1{t > s} ) ve. (4)

Estimation of equation (4) yields an inconsistent estimate of the treatment effects if program participation is (conditional on the observed characteristics) non-random.
17 Recall that we also include exits from UI for unknown reasons as job finding. Figure D8 in Online Appendix D shows that estimated program effects are slightly more positive if we consider exits for unknown reasons as job finding and slightly more negative when we consider all exits as job finding. Our estimation results are thus not specific to the definition of job finding and this also holds for the estimation results in the following sections.
To account for this, program participation is modeled jointly, also using a mixed proportional hazard rate:

θp(t | τ0, x, vp) = ψp(t) ψp(τ0 + t) exp( x'βp ) vp, (5)

with all notation similar to equation (4), but subscript e replaced by subscript p.
The unobserved term v p is allowed to be correlated with v e , with joint discrete distribution g(v e , v p ). We take g(v e , v p ) to be a bivariate discrete distribution with an unrestricted number of mass points. The duration dependence patterns and the calendar time effects are parameterized with a piece-wise constant function.
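To fix ideas, the survivor function implied by a piece-wise constant hazard with a multiplicative treatment shift after program entry can be computed as follows. The parameter values are toy numbers, not the model's estimates:

```python
# Survivor function implied by a piece-wise constant monthly hazard with a
# multiplicative shift delta after program entry at month s (toy values only).
import math

def survivor(t, s, base_hazard, delta):
    """Probability of still being unemployed at month t when the program
    starts at month s. The integrated hazard is accumulated month by month;
    exp(-cumulative hazard) gives the survival probability."""
    cumhaz = 0.0
    for m, lam in enumerate(base_hazard[:t]):
        shift = delta if m >= s else 1.0  # hazard shifts only after program entry
        cumhaz += lam * shift
    return math.exp(-cumhaz)

# A delta below one depresses job finding (lock-in), raising survival:
lock_in = survivor(6, 3, [0.1] * 6, 0.5)
no_treat = survivor(6, 3, [0.1] * 6, 1.0)
```

This is the mechanism behind translating the estimated hazard ratios into survival probabilities with and without treatment, from which differences in employment probabilities can be read off.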
Estimation of the parameters is performed by maximizing the log-likelihood, in which right-censoring is straightforwardly taken into account. Table 3 shows the estimated hazard ratios using the full sample, the smaller discontinuity sample and the pre-discontinuity sample. 19 Program participation has a statistically significant negative effect on the job finding rate in the first three months (δ (1−3 months) ). In the next three months the effect is very close to zero (hazard ratio is about one). After six months (δ (≥6 months) ) program participation has a significantly positive effect on the probability of finding a job (1.247). The smaller discontinuity sample and pre-discontinuity sample give slightly smaller positive program effects. Standard errors are larger due to the smaller sample size.
All estimation results show evidence of correlated unobserved heterogeneity, expressed by two distinct mass points. 20 Conditional on the observed characteristics, there is negative selection into the activation program: unemployed workers whose unobserved characteristics indicate longer UI spells participate in the activation programs at a higher rate.
The estimates for the parameters δ provide a multiplicative effect on the job finding rates, and cannot be directly interpreted as a measure of the treatment effects ATTS(s, t). Therefore, we follow Kastoryano and Van der Klaauw (2011) and translate them into the average treatment effect on the treated survivors, which requires conditioning on the rate of receiving treatment after s periods. We use the hazard rate model for entering the program, which gives

θ_p(s | x_i, τ_0^i, v_p) = ψ_p(s) ψ_p(τ_0^i + s) exp(x_i′β_p) v_p,    (6)

the rate at which individual i enters the activation program after s periods. Since we have estimated all right-hand-side parameters in (6), we can obtain an estimate for the ATTS(s, t) (presented in Online Appendix D; confidence intervals are computed using the delta method).
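The translation from multiplicative hazard effects to effects on employment probabilities can be illustrated with a simple discrete-time sketch. The hazard level and ratios below are made-up numbers, not our estimates; the actual computation uses the estimated model and delta-method confidence intervals.

```python
import math

def survival(hazards):
    """Probability of still being unemployed after len(hazards) months, given
    small monthly hazard rates, using the exponential formula."""
    return math.exp(-sum(hazards))

def atts_sketch(s, t, base_hazard, hazard_ratios):
    """Sketch of ATTS(s, t): the difference in the probability of being employed
    at month t between entering the program at month s and never entering, for
    those still unemployed and untreated at s. hazard_ratios[k] multiplies the
    base hazard in the (k+1)-th month after entry (last value is extended)."""
    untreated = [base_hazard] * t
    treated = list(untreated)
    for k in range(s, t):
        treated[k] = base_hazard * hazard_ratios[min(k - s, len(hazard_ratios) - 1)]
    # employment probability is one minus survival in unemployment
    return (1 - survival(treated)) - (1 - survival(untreated))

# Hypothetical pattern: a three-month lock-in period, neutral afterwards
effect = atts_sketch(s=3, t=12, base_hazard=0.08, hazard_ratios=[0.7, 0.7, 0.7, 1.0])
```

With these made-up numbers, the employment probability after 12 months is lower for participants, reflecting the lock-in period; a neutral program (all hazard ratios equal to one) gives a zero effect.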
Finally, the overall magnitude of the negative impact in each of the four panels is smaller compared to what we found using the dynamic matching estimator.

Exploiting the policy discontinuity

In this section we focus on the policy discontinuity. We first discuss how the exogenous variation due to the policy discontinuity allows us to identify the effect of the program on outflow to work. Next, we discuss the estimation results.

Identification
The policy discontinuity creates exogenous variation in program participation over inflow cohorts in unemployment. Consider two cohorts of entrants in unemployment.
The first enters unemployment t_1 periods before the policy change. The second cohort enters unemployment later, but still before the policy change. For this cohort the time until the policy change equals t_2 < t_1 (see Figure 8). The two cohorts face the same policy of potential program assignment for t_2 time periods, implying that dynamic selection is the same up to this point. After t_2, the first cohort faces another period of potential program assignment, with length t_1 − t_2, while the second cohort is excluded from program participation. As a result, we can compare the outflow to employment in the two cohorts, for those individuals that survived up to t_2 and did not enroll in a program prior to t_2.
To assign differences in job finding between these cohorts to program participation, three conditions should hold. First, the policy discontinuity should be unanticipated by unemployed workers and caseworkers: job seekers and caseworkers should not have changed their behavior in anticipation of the policy change in the period just before March 2010. Our setting has the advantage that the UI administration only realized late that the budget for these programs had run out, and expected that the Ministry of Social Affairs would extend the budget. Enrollment in the external programs stopped immediately after an extension of the budget was unexpectedly rejected.
The second condition is that there should not be compositional differences between cohorts in terms of unobserved characteristics. This condition is equivalent to a conditional independence assumption, but has milder implications. The composition of cohorts is less likely to suffer from selectivity than program assignment.
We use weighting to make each cohort equivalent to the March 2010 cohort in the composition of observed characteristics. We distinguish 288 groups based on interacting covariates. 22 Define the share of group g in cohort c by α_{c,g}, and let m denote the March 2010 cohort. The weight assigned to an observation belonging to group g in cohort c is given by

w_{c,g} = α_{m,g} / α_{c,g}.

We define the survivor functions that will be estimated in the analysis as the weighted average of the survivor functions of each cohort-group:

S_c(t) = Σ_g α_{m,g} S_{c,g}(t).

The third condition is that there should be no differences between the two cohorts in factors that affect job finding, other than the difference in program assignment.
However, job finding probabilities change over the business cycle and may fluctuate due to seasonality. To reduce the impact of the business cycle and seasonality, cohorts that are close in time should be compared. But evaluating the program requires cohorts that are substantially spaced apart, so that there is sufficient difference in program participation between the cohorts. In the short run, seasonalities are the main source of fluctuations in unemployment.
22 As characteristics we use three previous hourly wage categories, an indicator for having been unemployed in the past three years, an indicator for being married or cohabiting, age categories, three education categories and an indicator for being part-time unemployed (less than 34 hours per week).

Also inflow into and outflow from UI are relatively stable around the policy discontinuity. Changing labor market conditions may affect outcomes in two ways. First, they affect the composition of the inflow into unemployment, and this affects aggregate outflow probabilities (which we address by weighting). Second, labor market conditions affect outflow probabilities directly, as it is more difficult to find work when unemployment is high (e.g. Van den Berg and Van der Klaauw (2001)).
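The reweighting of a cohort to the March 2010 covariate composition can be sketched as follows. The group labels and shares are invented for illustration; the actual analysis uses the 288 covariate groups described in footnote 22.

```python
from collections import Counter

def cohort_weights(cohort_groups, reference_groups):
    """Weights that make a cohort's covariate-group composition match that of
    the reference (March 2010) cohort: w_{c,g} = alpha_{ref,g} / alpha_{c,g}."""
    alpha_c = {g: k / len(cohort_groups) for g, k in Counter(cohort_groups).items()}
    alpha_ref = {g: k / len(reference_groups) for g, k in Counter(reference_groups).items()}
    return [alpha_ref.get(g, 0.0) / alpha_c[g] for g in cohort_groups]

# Two illustrative covariate groups; the earlier cohort over-represents "low"
earlier = ["low", "low", "low", "high"]   # shares 0.75 / 0.25
reference = ["low", "high"]               # shares 0.50 / 0.50
w = cohort_weights(earlier, reference)

# After weighting, the "low" share in the earlier cohort matches the reference
weighted_low = sum(wi for wi, g in zip(w, earlier) if g == "low") / sum(w)
```

Each observation receives the ratio of its group's share in the reference cohort to its share in its own cohort, so the weighted composition of every cohort equals that of the March 2010 cohort.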
To deal with business cycle and seasonal effects, we consider a model with an additive structure. Let cohort τ be defined by the month of inflow into unemployment. The survivor function in unemployment S_τ(t) has some baseline shape λ(t) and further depends on the effect of the business cycle b_τ(t), the effect of seasonalities l_τ(t) and the effect of entering the program after s periods on being employed after t periods (which is the ATTS(s, t)). Furthermore, f(s) describes the density function of program enrollment after s periods. Our additive specification of the survivor function is given by

S_τ(t) = λ(t) + b_τ(t) + l_τ(t) − ∫_0^t ATTS(s, t) f(s) ds.

The policy discontinuity caused a stop on program entry at moment τ̄, without affecting f(s) prior to this moment. Comparing the survivor functions of two cohorts that entered unemployment before the policy discontinuity, and assuming that τ′ describes the more recent inflow cohort (τ′ > τ), gives

S_τ(t) − S_{τ′}(t) = [b_τ(t) − b_{τ′}(t)] + [l_τ(t) − l_{τ′}(t)] − ∫_{τ̄−τ′}^{τ̄−τ} ATTS(s, t) f(s) ds.    (9)

This simple cohort difference identifies the impact of program participation plus potential business cycle and seasonality differences. To eliminate the effect of seasonality we consider the inflow cohorts 12 months earlier, so τ − 12 and τ′ − 12. We take t such that both these inflow cohorts are up to t unaffected by the policy discontinuity; then, using l_τ(t) = l_{τ−12}(t),

[S_τ(t) − S_{τ′}(t)] − [S_{τ−12}(t) − S_{τ′−12}(t)] = [b_τ(t) − b_{τ′}(t)] − [b_{τ−12}(t) − b_{τ′−12}(t)] − ∫_{τ̄−τ′}^{τ̄−τ} ATTS(s, t) f(s) ds.    (10)

This double difference identifies the impact of program participation if we assume

b_τ(t) − b_{τ′}(t) = b_{τ−12}(t) − b_{τ′−12}(t).

This assumption imposes that during the observation period the business cycle effects change only very smoothly. Recall that Figure 9 suggests that, if the observation period is sufficiently small, seasonal effects are larger than business cycle effects. Our double-difference estimator is an extension of the approach suggested by Van den Berg et al. (2014), who exploit a policy discontinuity to estimate effects on a duration variable. In Online Appendix C we discuss in more detail the assumption that business cycle effects are small (which is similar to the standard common trend assumption in difference-in-differences), and provide placebo estimates that support the assumption.
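The double-difference estimator in equation (10) amounts to a simple computation on four cohort survivor functions. In the sketch below the survivor values are hypothetical; in the application they are the weighted empirical survivor functions.

```python
def double_difference(S_tau, S_tau_prime, S_tau_lag, S_tau_prime_lag, t):
    """Double difference of equation (10): the survivor difference between two
    cohorts at duration t, minus the same difference for the cohorts entering
    twelve months earlier. Seasonal effects cancel by construction; the
    identifying assumption is that business-cycle differences cancel as well."""
    recent = S_tau[t] - S_tau_prime[t]
    year_earlier = S_tau_lag[t] - S_tau_prime_lag[t]
    return recent - year_earlier

# Hypothetical survivor probabilities at t = 12 months for the four cohorts
dd = double_difference({12: 0.55}, {12: 0.50}, {12: 0.53}, {12: 0.51}, t=12)
```

The year-earlier pair absorbs the seasonal component of the raw cohort difference, leaving the program exposure difference plus any residual business-cycle trend.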
In our empirical approach we estimate the intention-to-treat effects, specified as ∫_{s=τ̄−τ′}^{τ̄−τ} ATTS(s, t) f(s) ds, but using the empirical program participation rate f(s) it is straightforward to obtain ATTS(s, t), which describes the average treatment effect on the treated survivors.

Estimation results
We start by defining which cohorts to compare. The time interval between cohorts should be small to minimize business cycle effects, but the trade-off is that more time between cohorts increases the difference in exposure to program participation.
We use cohorts three months apart. To exploit the policy discontinuity, the cohorts should not enter unemployment too long before March 2010. Therefore, we use the cohorts of October 2009 until January 2010, facing between five and two months of potential program participation, respectively. Each cohort is compared to the cohort entering unemployment three months earlier. The survivor functions of each cohort are presented in Figure 10, showing that around 50% of the UI recipients find work within 12 months.
We first take the difference between each cohort's survivor function and the survivor function of the cohort entering unemployment three months earlier. 23 This compares the outflow in a cohort without enrollment in the program to the outflow in a cohort with regular enrollment in the program during the period before the policy discontinuity. We condition on survival and no treatment up to the duration at which the later cohort reaches the policy discontinuity. The three-month differences are presented in panel (a) of Figure 11. We find very similar patterns across the different comparisons: a negative effect on job finding ranging from 3%-points to almost 8%-points, which decreases in magnitude over time. The negative effects persist up to at least 13 to 17 months of unemployment.
23 All estimates presented in this section are estimated using the weights discussed in Subsection 6.1.

Figure 11: Intention-to-treat effect estimates exploiting the policy discontinuity. These estimates are based on simple differences between cohorts (equation (9)).
By subtracting the same differences from a year earlier, we correct for seasonalities (equation (10)). Estimates from such a "difference-in-differences" approach are presented in panel (b) of Figure 11. 24  We find a negative impact of program participation on job finding that is consistent across different cohort comparisons and across the two estimators. This finding is in line with the lock-in effect that we have discussed earlier. These estimates measure intention-to-treat effects, and should be divided by the cohort differences in treatment participation to obtain average treatment effects.
24 When estimating the effects we only present estimates up to the duration at which the cohorts from a year earlier reach the policy discontinuity, which is between 15 and 18 months. Estimates at longer durations are biased as these earlier cohorts are affected by the policy discontinuity as well.

We estimate the difference in program participation by

∫_{s=τ̄−τ′}^{τ̄−τ} f(s) ds,

where τ̄ is the moment of the policy discontinuity (March 2010). The difference in program participation can also be computed using the same "difference-in-differences" approach; these estimates are presented in panel (b) of Figure 12. The differences are less smooth, but exhibit the same pattern. 25 The estimated ATET(s, t) are presented in Figure 13. Since differences in treatment participation between cohorts accrue over a three-month period, the initially small differences inflate the intention-to-treat effects substantially. To facilitate the visual presentation, we remove some of the extreme estimates in the early months. 26 We find that the impact on the probability of being employed is around 20 to 50%-points, both when using simple differences (panel (a)) and double differencing (panel (b)). Note that from an identification perspective it is a major advantage that participation was reduced to zero (rather than to some positive value), because that implies that our estimates can be interpreted as average treatment effects on the treated, rather than local average treatment effects. 27
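The rescaling of intention-to-treat effects by the cohort difference in program participation, including the removal of unstable early-duration estimates, can be sketched as follows (all numbers and the cutoff are hypothetical):

```python
def atet_from_itt(itt_effect, participation_diff, min_diff=0.01):
    """Rescale an intention-to-treat effect on the employment probability by the
    cohort difference in cumulative program participation. Very small
    participation differences inflate the estimate, so values below min_diff
    are flagged as unreliable (None), mirroring the removal of extreme
    early-month estimates."""
    if abs(participation_diff) < min_diff:
        return None
    return itt_effect / participation_diff

# A -2%-point ITT effect combined with a 10%-point participation difference
# scales to a -20%-point treatment effect on the treated
atet = atet_from_itt(-0.02, 0.10)
```

This makes the mechanics visible: a modest intention-to-treat effect divided by a small participation difference yields a large treatment effect, which is why the early-duration estimates are imprecise.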

Discussion
In this section we compare the estimated effects of the activation programs obtained using the three approaches. Table 4 shows the estimated program effects of enrolling after s = 3, 4, 5, 6 months on being employed after 12 months. 28 The pattern of program effects is very similar for the three methods, but the magnitudes of the estimates differ. Immediately after enrolling in the program, the job finding rate declines, and after some months the negative effect becomes smaller. The program thus postpones job finding, but does not affect the probability to have work in the long run. 29 This finding is also robust to using different samples, and consistent with the results of Vikström (2017) for a Swedish work practice program. The common explanation for this pattern is the presence of lock-in effects (Lechner et al. (2011)), which cause effects on employment to be negative and to decline immediately after starting a program. In the period in which the program effects start increasing, the effect on job finding is dominant. The program effects decline for about two to three months, which can be considered the lock-in period. Lechner et al. (2011) report that lock-in periods can be quite long, even longer than the lock-in period that we find.

27 In the terminology of instrumental variables, there is full compliance with the instrument (the discontinuity) and there are no "always-takers".

28 The dynamic matching estimates coincide with those depicted in Figure 6, while the timing-of-events estimates coincide with Figure 7. For the quasi-experimental estimates, we include the double-difference estimates presented in panel (b) of Figure 13.

29 We cannot predict what happens in the very long run beyond our observation period, but job search assistance programs (which constitute a large share of the activation programs that we consider) typically do not have very strong long-run effects (e.g. Card et al. (2010)).

Notes to Table 4: All coefficients are estimates of ATTS(s, t). a Confidence intervals (95%) of the difference between estimators (only for the discontinuity sample), computed using bootstrapping (100 repetitions). Stars indicate significance levels (** 0.05, * 0.10).
In their meta-analysis, Card et al. (2010) classify evaluation study results as a positive, negative or zero impact. However, if one goes beyond the sign of the impact, we find substantial differences in the magnitudes of the estimated treatment effects.
The quasi-experimental estimate is largest in magnitude. This approach estimates the average treatment effect by dividing the intention-to-treat effect by a small fraction of treated individuals, which causes the treatment effect to be estimated relatively imprecisely. In the lower panel of Table 4 we present confidence intervals for the differences between estimates from the different approaches (only for the discontinuity sample). These confidence intervals are computed using bootstrapping.
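A percentile-bootstrap computation of such confidence intervals can be sketched as follows. The estimators and data below are toy examples; only the 100 repetitions and the 95% level follow Table 4.

```python
import random

def bootstrap_ci_difference(sample, estimator_a, estimator_b,
                            reps=100, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the difference between two
    estimators applied to the same resampled data."""
    rng = random.Random(seed)
    n = len(sample)
    diffs = []
    for _ in range(reps):
        # resample the data with replacement and re-estimate both quantities
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        diffs.append(estimator_a(resample) - estimator_b(resample))
    diffs.sort()
    lo = diffs[int((alpha / 2) * reps)]
    hi = diffs[min(reps - 1, int((1 - alpha / 2) * reps))]
    return lo, hi

# Toy example: difference between the mean and the median of a skewed sample
data = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 5.0]
mean = lambda xs: sum(xs) / len(xs)
median = lambda xs: sorted(xs)[len(xs) // 2]
lo, hi = bootstrap_ci_difference(data, mean, median)
```

A difference is judged statistically significant at the 5% level when the resulting interval excludes zero.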
We find that the quasi-experimental estimate for (s = 3, t = 12) differs significantly from the matching estimate, while this is not the case for s = 4 and s = 5. The timing-of-events estimates are substantially smaller in magnitude than the quasi-experimental and matching estimates, and we find that these differences are mostly statistically significant.
The difference between the magnitudes of the estimates may suggest that not all identifying assumptions hold. First, the timing-of-events model finds some relevant unobserved heterogeneity. This may imply either that the conditional independence assumption required for the matching estimator is violated, or that the mixed proportional hazard specification in the timing-of-events model is too restrictive. Second, the common-trend assumption in the quasi-experimental approach could be violated, leading to a (small) downward bias. Finally, the no-anticipation assumption is required to define treatment effects. Due to the unexpected nature of the policy discontinuity, this assumption is less likely to be violated in the quasi-experimental evaluation.
Our empirical analysis provides estimates for the ATTS(s, t), which is the impact on individual employment probabilities for specific values of s and t. A policy maker might be interested in the average effect for all participants. We have defined the ATET(t) in equation (2), which is essentially an average impact of program participation weighted by the inflow into the program at duration s. Since all three identification strategies suggest a negative impact of program participation for any s, the corresponding ATET(t) will also be negative for all values of t. As an illustration, we present the estimated ATET(t) for the dynamic matching estimates in Figure D18 in Online Appendix D. For any unemployment duration t, participation in the program reduces the probability of having found work.
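The aggregation from ATTS(s, t) to ATET(t) can be sketched as a discrete enrollment-weighted average. The ATTS values and enrollment shares below are hypothetical, chosen only to show that negative ATTS(s, t) for every s implies a negative ATET(t) for any weighting.

```python
def atet_t(atts_by_s, enrollment_density):
    """ATET(t) as the enrollment-weighted average of ATTS(s, t) over program
    entry durations s (a discrete version of equation (2))."""
    total = sum(enrollment_density[s] for s in atts_by_s)
    return sum(atts_by_s[s] * enrollment_density[s] for s in atts_by_s) / total

# Hypothetical ATTS(s, 12) values and enrollment shares f(s)
atts_12 = {3: -0.10, 4: -0.08, 5: -0.06, 6: -0.05}
f_s = {3: 0.4, 4: 0.3, 5: 0.2, 6: 0.1}
avg_effect = atet_t(atts_12, f_s)
```

Because the weights are non-negative and sum to one after normalization, the weighted average inherits the sign of the individual ATTS(s, t) estimates.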

Conclusion
Several methods are available for evaluating activation programs for unemployed job seekers. In this paper we compare estimates from three different methods: first, we apply a dynamic matching estimator; second, we estimate the timing-of-events model; and third, we exploit exogenous variation in program participation caused by budgetary problems of the UI administration.
All three methods suggest that program participation significantly reduces job finding shortly after enrollment, thereby prolonging unemployment durations. The quasi-experimental estimates suggest reductions in the probability of being employed of up to 50%-points shortly after inflow into the program; the matching and timing-of-events estimates are smaller in magnitude (2.5-15%-points). In the longer run, all three methods suggest an (imprecise) zero effect on being employed. The robust conclusion drawn from each approach is that the programs are not effective in increasing outflow, and even reduce employment rates in the short and medium run. We find some unobserved heterogeneity in the timing-of-events model, which might explain the difference in magnitude compared to the dynamic matching estimates. However, it does not change the pattern of treatment effects between these methods. Our results concur with the meta-analysis performed by Kluve (2010), who does not find a relation between the methodology and the likelihood of estimating positive or negative effects. Our results confirm that this also holds when evaluating the same program using the same data, rather than comparing across studies.
The policy discontinuity was caused by budgetary problems at the UI administration and the refusal of the Ministry of Social Affairs to extend the budget. This resulted in the termination of the use of privately provided activation programs, which can be considered a policy measure to deal with the consequences of the ongoing economic crisis in early 2010. Our finding that participation in privately provided activation programs does not shorten UI benefits periods suggests that this was quite a successful measure to cut government expenditures. Our results also concur with other studies showing the lack of effectiveness of activation programs offered by commercial providers (e.g. Behaghel et al. (2014), Cottier et al. (2015) and Krug and Stephan (2016)). We have focused on employment as the key outcome variable.
We cannot assess whether the program affects job quality (such as the salary or job stability), and thus we cannot exclude that it has a positive impact along these dimensions. However, using data on UI benefits recipients in the Netherlands, De Groot and Van der Klaauw (2019) and Van der Klaauw and Ziegler (2019) find that similar policy interventions have stronger effects on job finding than on wages, and that effects on wages are mostly negative. 30

30 A back-of-the-envelope calculation using our matching estimates suggests that program partic-