Whether microcredit truly improves the welfare of the poor is a fundamental question asked by both practitioners and scholars. In this section, we examine this question through a review of the existing published literature. Before discussing the empirical evidence in detail, we present key problems in impact evaluation to help the reader understand why most impact studies should be interpreted with great care.
A. Evaluating the Impact
Although anecdotal evidence of the welfare impact of microcredit is mounting in the literature, empirical evidence based on rigorous evaluation is scarce. As Karlan and Goldberg (2007, p. 1) point out, a rigorous impact evaluation has to answer “How are the lives of the participants different relative to how they would have been had the program, product, service, or policy not been implemented?” This requires comparing two potential outcomes, such as income, business profits, or physical and human capital investment, for the same individual: one with the treatment and the other without it. Because we can never observe both statuses for a particular individual simultaneously, existing reports often compare the states before and after microcredit programs to estimate their impact on individuals (before–after comparison). In most cases, however, this does not provide reliable estimates because other factors, such as macroeconomic shocks, also affect the post-treatment outcome. In other words, this approach fails to isolate the impact of microfinance from the time trend. This implies that it is almost impossible to measure the impact of a program on a given individual (Duflo, Glennerster, and Kremer 2006). However, it is possible to obtain the average impact of microfinance if the counterfactual outcome of the treatment group can be constructed from the pool of the remaining population who, in the absence of the treatment, would have a similar outcome to the treatment group; this pool can serve as a comparison group. Therefore, a major challenge for impact evaluation is to create a good counterfactual through the use of appropriate techniques under a set of acceptable assumptions.
A line of studies has used non-clients to approximate the counterfactual of clients without treatment (with–without comparison). In this case, the average difference in the outcome of interest between clients and non-clients is regarded as the impact of microcredit. This is a valid strategy as long as the outcome is independent of participation in microcredit. However, where participation is voluntary, it may well be that clients who decided to participate differ significantly from non-clients in their expected gains and in other characteristics relevant to both participation and outcomes. If there are systematic differences between them, the outcomes of non-clients cannot represent those of clients without treatment, and, as a result, the evaluated impact is highly likely to be biased. Even if such self-selection does not matter, nonrandom program placement by MFIs can yield a similar bias. For example, MFIs might implement a program in villages where impacts are expected to be high or where poverty is more pervasive. In the former case, the outcome of the selected villages might be higher than that of nonselected villages even without microcredit programs, so the impact tends to be biased upward. The opposite holds in the latter case (Morduch 1999).
To illustrate, suppose we are interested in the average treatment effect (ATE) and the average treatment effect on the treated (ATT), two of the most widely known impact measurements. ATE measures what impact program participation would have on individuals/households drawn randomly from the population, whereas ATT measures what impact program participation has on individuals/households who actually participated. Symbolically, letting y1 be the outcome with treatment and y0 be the outcome without treatment, ATE can be expressed as

ATE = E(y1 − y0),
where E(·) is an expectation operator. Similarly, letting d denote the treatment indicator, which takes the value 1 if treated and 0 otherwise, ATT can be written as

ATT = E(y1 − y0 | d = 1) = E(y1 | d = 1) − E(y0 | d = 1).
Provided that we can observe both y1 and y0 for any given individual, the average differences E(y1 − y0) and E(y1 | d = 1) − E(y0 | d = 1) are attributable to the difference in access to microcredit programs, because factors other than the treatment status are exactly the same. As such, if E(y1 − y0) and E(y1 | d = 1) − E(y0 | d = 1) are positive, it can safely be claimed that microcredit has a positive effect on the outcome of interest, and vice versa.
The common approach used by various research reports simply compares program participants with nonparticipants and treats E(y1 | d = 1) − E(y0 | d = 0) as the average impact. Yet,

E(y1 | d = 1) − E(y0 | d = 0) = ATT + [E(y0 | d = 1) − E(y0 | d = 0)] = ATT.
The last equality holds only if the treatment status is independent of the outcome without treatment; that is, d ⊥ y0 and E(y0 | d = 1) = E(y0 | d = 0). If program participants are more entrepreneurial, or if the program is placed in a more promising area, the expected gain of clients is likely to be higher than that of non-clients even without treatment, such that E(y0 | d = 1) > E(y0 | d = 0), which overestimates the true ATT. By the same token,

E(y1 | d = 1) − E(y0 | d = 0) = E(y1) − E(y0) = ATE

if d ⊥ (y0, y1). Therefore, unless the condition d ⊥ (y0, y1) is satisfied, the estimated impact will be exaggerated.
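The bias of the naive with–without comparison can be illustrated with a short simulation (all numbers are hypothetical and chosen only for illustration); because both potential outcomes are generated, ATE, ATT, and the selection bias term can be computed directly:

```python
import numpy as np

# Illustrative sketch, hypothetical numbers: with both potential outcomes
# observable, the decomposition naive = ATT + selection bias is exact.
rng = np.random.default_rng(0)
n = 100_000

y0 = rng.normal(10.0, 2.0, n)          # outcome without microcredit
y1 = y0 + 1.5                          # outcome with microcredit (constant effect 1.5)
# Voluntary participation: better-off individuals are more likely to join.
d = (y0 + rng.normal(0.0, 2.0, n) > 10.0).astype(int)

ate = np.mean(y1 - y0)                               # E(y1 - y0)
att = np.mean(y1[d == 1] - y0[d == 1])               # E(y1 - y0 | d = 1)
naive = np.mean(y1[d == 1]) - np.mean(y0[d == 0])    # E(y1|d=1) - E(y0|d=0)
selection_bias = np.mean(y0[d == 1]) - np.mean(y0[d == 0])
# naive = att + selection_bias: the naive estimate overstates the true
# effect because E(y0|d=1) > E(y0|d=0) under self-selection.
```

Here the naive comparison overstates the true effect by exactly the selection bias term E(y0 | d = 1) − E(y0 | d = 0), which is strictly positive by construction.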
The experimental approach, known as randomized controlled trials or randomized field experiments, in which program placement or eligibility to participate is assigned randomly to the population, is one of the most powerful tools for solving selection biases. As long as the sample size is large enough, the law of large numbers ensures mean independence, i.e., E(y0 | d = 1) = E(y0 | d = 0), and, therefore, E(y1 | d = 1) − E(y0 | d = 0) gives a consistent estimate of ATT under the following assumptions: (i) there are no spillover effects, i.e., the treatment effect does not spill over to the untreated group, which is known as the stable unit treatment value assumption (Wooldridge 2002); and (ii) there is no repercussion on others through market mechanisms, known as general equilibrium effects (Heckman, Lochner, and Taber 1998).1
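The logic of randomization can be sketched in a few lines (hypothetical numbers): once treatment is assigned at random, the selection bias term vanishes and the simple difference in means recovers the true effect.

```python
import numpy as np

# Sketch, hypothetical numbers: under random assignment,
# E(y0|d=1) = E(y0|d=0) holds by the law of large numbers,
# so the simple difference in means is consistent.
rng = np.random.default_rng(1)
n = 200_000

y0 = rng.normal(10.0, 2.0, n)
y1 = y0 + 1.5                            # true effect 1.5
d = rng.integers(0, 2, n)                # randomized treatment assignment
estimate = np.mean(y1[d == 1]) - np.mean(y0[d == 0])
# estimate is close to 1.5: randomization removes the selection bias term
```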
Although the experimental approach is being adopted by more and more impact evaluation studies, it is not always feasible. In particular, strictly randomizing participation in a program is quite difficult because it would require forcing people who are unwilling to participate into the treatment group. Often, policymakers can only randomize eligibility. A problem with this approach is that eligibility is not equal to treatment, so we cannot obtain ATT or ATE as long as some eligible individuals opt out. In this case, we can estimate the intention-to-treat (ITT) effect, which evaluates the average impact of the availability of the program rather than of participation in the program. To the best of our knowledge, there are two papers that use ITT based on randomized controlled trials to evaluate the impacts of microcredit: Banerjee et al. (2009) and Karlan and Zinman (2009a).
Although an ITT estimator is useful, it critically depends on the share of people who actually participate in the program, which in turn depends on the outside options in that society, and these vary across location and time. Thus, a critical shortcoming of an ITT estimator is the difficulty of generalizing the estimation results or derived implications to other societies (external validity) (Ito 2007).
Partly because of such a problem, there are still many studies that rely on a nonexperimental or quasi-experimental approach, which constructs the counterfactual from observational data. Several statistical methods based on these approaches have been developed. These include: (i) a matching method, (ii) an instrumental variables method, (iii) a regression discontinuity design, and (iv) a difference-in-difference approach. In the following subsection, we will briefly explain the basic concepts and assumptions underlying these methodologies.
B. Nonexperimental Evaluation Design
1. Matching method
The matching method attempts to find a comparison group that is identical or very similar, ideally in all aspects, to the treatment group except for the treatment status. Then, the average outcome of the selected comparison group is compared with that of the treatment group. Similarity between the two groups is generally evaluated by observable characteristics. In other words, this method relies on the assumption that, conditional on observable characteristics, participation in microcredit is independent of the outcome of interest, which can be expressed as (y0, y1) ⊥ d | x for ATE and y0 ⊥ d | x for ATT, where x is a set of observable characteristics. These assumptions imply conditional mean independence, i.e., E(yd | d = 1, x) = E(yd | d = 0, x) = E(yd | x). Therefore, for ATT,

ATT = E(y1 | d = 1) − E(y0 | d = 1) = Ex[E(y1 | d = 1, x) − E(y0 | d = 1, x) | d = 1] = Ex[E(y1 | d = 1, x) − E(y0 | d = 0, x) | d = 1],

where the outer expectation Ex is taken over the distribution of x among participants.
Because the last equality holds by the assumption E(y0 | d = 1, x) = E(y0 | d = 0, x), ATT can be estimated consistently unless unobservable characteristics are important determinants of selection into the program. The same applies to ATE.
One of the shortcomings of this method is that if x is high-dimensional, it is quite difficult to find a good comparison group similar to the treatment group in all dimensions of x. Rosenbaum and Rubin (1983) show, however, that matching on a single index that captures the propensity to participate conditional on x gives consistent estimates of the treatment effect in the same way as matching on all the elements of x. Let Pr(d = 1 | x) = Pr(x) denote the probability of participation given observable covariates x. To obtain unbiased ATT, an important assumption is that, conditional on the probability of participation, treatment is independent of the outcome in the absence of treatment, y0 ⊥ d | Pr(x). In addition, there should be substantial overlap in covariates x between the comparison and treatment groups, so that individuals with the same x have a positive probability of being both participants and nonparticipants, i.e., 0 < Pr(x) < 1. Then, if we can find matched pairs of the comparison and treatment groups over the region of common support with exactly the same propensity scores, the simple average difference between the matched comparison and treatment groups can be treated as the impact of microcredit. If propensity scores between these two groups are close, but not identical, some adjustment by weights determined by the distance between the propensity scores of the treatment and comparison groups is needed.2
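A minimal nearest-neighbour matching sketch (hypothetical numbers) illustrates the idea; with a single covariate x, matching on x is equivalent to matching on the propensity score whenever Pr(x) is monotone in x, which sidesteps estimating the score here:

```python
import numpy as np

# Sketch, hypothetical numbers: treated units have higher x on average,
# so the naive difference is biased; matching each treated unit to the
# untreated unit with the closest x removes the bias from x.
rng = np.random.default_rng(2)
n = 20_000

x = rng.uniform(0.0, 1.0, n)                      # observable characteristic
d = (rng.uniform(0.0, 1.0, n) < x).astype(int)    # Pr(d=1|x) = x, so 0 < Pr(x) < 1
y = 2.0 * x + 1.0 * d + rng.normal(0.0, 0.5, n)   # true effect = 1.0

naive = y[d == 1].mean() - y[d == 0].mean()       # biased upward by selection on x

# For each treated unit, find the untreated unit with the closest x.
xt, yt = x[d == 1], y[d == 1]
xc, yc = x[d == 0], y[d == 0]
order = np.argsort(xc)
pos = np.searchsorted(xc[order], xt)
pos = np.clip(pos, 1, len(xc) - 1)
left, right = order[pos - 1], order[pos]
nearest = np.where(np.abs(xc[left] - xt) <= np.abs(xc[right] - xt), left, right)
att_matched = np.mean(yt - yc[nearest])           # close to the true effect 1.0
```

In practice x is high-dimensional, which is precisely why the matching is done on an estimated propensity score rather than on x directly.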
A modified matching method is the so-called “pipeline comparison,” in which tenured participants are matched with incoming participants who have applied for but not yet received loans from MFIs. The incoming clients are seen as a good comparison group because they are also self-selected into the program, presumably in the same way as the tenured clients. Thus, biases caused by self-selection are potentially controlled for, and the resultant differences in the outcome of interest are attributable to microcredit. Although many studies have adopted this pipeline comparison, there are potential problems associated with this methodology. We explain those problems and discuss several studies that overcome them in subsection C below.
2. Instrumental variables method
The instrumental variables method is a statistical technique that controls for selection biases. For identification, it requires variables that affect program participation but not the outcome. In a regression framework,

yi = α + β di + ei,
where yi is an outcome of interest of household i, e is an error term, α is a constant term equal to E(y0 | d = 0), and β is a parameter equal to E(y1 | d = 1) − E(y0 | d = 0). As is well known, if d is correlated with e, the OLS estimator of β will be biased.
Now, suppose that there is an instrumental variable, z, that satisfies corr(zi, di) ≠ 0 and corr(zi, ei) = 0, and that participation takes the form of

di = β0 + β2 zi + ui,
where corr(zi, ui) = 0. Combining the two equations gives

yi = (α + ββ0) + ββ2 zi + (ei + βui).
Because corr(zi, ei) = 0 and corr(zi, ui) = 0 by assumption, the above reduced-form estimation is unbiased. Then, dividing the reduced-form coefficient ββ2 by the first-stage coefficient β2, we can estimate the main parameter β consistently. In a special case where z is binary, letting d1i and d0i denote individual i's treatment status when zi = 1 and zi = 0, respectively, and assuming d1i ≥ d0i (monotonicity), β becomes

β = [E(y | z = 1) − E(y | z = 0)] / [E(d | z = 1) − E(d | z = 0)],
which is known as the Wald estimator. Because this estimates the change in y relative to d when that change is induced only by the change in z from 0 to 1, the estimated impact does not reflect the average impact on the whole population. Rather, it identifies the impact on the subpopulation that takes the treatment only when it is offered. In this sense, the instrumental variables estimator thus calculated is often termed the local average treatment effect (LATE).
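The Wald/LATE logic can be sketched as follows (hypothetical numbers; the 60% take-up rate among those offered credit is assumed purely for illustration):

```python
import numpy as np

# Sketch, hypothetical numbers: z is a randomized offer of credit and
# only "compliers" take the loan when offered, so d1i >= d0i holds by
# construction; the Wald ratio recovers the effect on compliers.
rng = np.random.default_rng(4)
n = 200_000

z = rng.integers(0, 2, n)                    # instrument: offer made or not
complier = rng.uniform(0.0, 1.0, n) < 0.6    # would take the loan if offered
d = ((z == 1) & complier).astype(int)        # actual treatment status
y = rng.normal(10.0, 2.0, n) + 1.5 * d       # effect 1.5 on the treated

wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
# wald recovers ~1.5, the effect for compliers, not for the whole population
```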
3. Regression discontinuity design
The regression discontinuity design builds on the idea that individuals around a critical cut-off point for program eligibility are similar. For example, suppose that MFIs target individuals with less than 1 ha of land, so that those with just above 1 ha of land are ineligible to participate. Because this eligibility criterion is exogenously determined by MFIs, it is reasonable to assume that both observable and unobservable characteristics of households near the cut-off are uncorrelated with eligibility. In contrast, the probability of participation, as well as outcomes, just below and just above the cut-off point would be quite different due to eligibility. Based on these assumptions, the regression discontinuity design compares the outcomes of individuals just below the cut-off point for eligibility with those just above it.
Formally, let c denote the cut-off point on a certain variable C, which governs program eligibility, and let d = 1 if C > c and 0 otherwise. If this rule is deterministic (sharp regression discontinuity), the impact estimator can be written as E(y1 | c ≤ C < c + e) − E(y0 | c − e ≤ C < c) for small e.
In contrast, if the eligibility condition C > c is enforced with error (fuzzy regression discontinuity), we must scale up the difference in outcomes by dividing it by the difference in the probability of treatment:

[E(y | c ≤ C < c + e) − E(y | c − e ≤ C < c)] / [E(d | c ≤ C < c + e) − E(d | c − e ≤ C < c)].
This is equivalent to the Wald estimate using a dummy for C > c as an instrument for treatment status. Therefore, this also assesses the mean impact on the selected subpopulation around the cut-off point rather than the mean impact on the population as a whole (Ravallion 2008).
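A fuzzy regression discontinuity sketch (hypothetical numbers; for simplicity the simulated outcome has no trend in the running variable, whereas in practice the trend must be controlled for, e.g., by local linear regression):

```python
import numpy as np

# Sketch, hypothetical numbers: landholding below 1 ha raises the
# probability of treatment without fully determining it; the outcome
# jump at the cut-off is scaled by the jump in take-up.
rng = np.random.default_rng(5)
n = 200_000

land = rng.uniform(0.0, 2.0, n)              # hectares of land owned
p_take = np.where(land < 1.0, 0.7, 0.1)      # eligibility enforced with error
d = rng.uniform(0.0, 1.0, n) < p_take
y = 5.0 + 1.5 * d + rng.normal(0.0, 1.0, n)  # true effect 1.5, no trend in land

e = 0.1                                       # bandwidth around the cut-off
below = (land >= 1.0 - e) & (land < 1.0)      # just-eligible side
above = (land >= 1.0) & (land < 1.0 + e)      # just-ineligible side
fuzzy_rd = ((y[below].mean() - y[above].mean())
            / (d[below].mean() - d[above].mean()))
# fuzzy_rd recovers ~1.5, while the unscaled outcome jump is only ~0.9
```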
4. Difference-in-difference method
The difference-in-difference (DID) method is basically a combination of the before–after comparison and the with–without comparison.3 It compares observed changes in the outcome of participants with those of nonparticipants over time. As argued previously, although the before–after comparison alone or the with–without comparison alone generally fails to account for selection problems, the DID method can eliminate biases due to fixed unobservable characteristics. More formally, DID in regression form can be written as

yit = β0 + β1 Tt + β2 (Di × Tt) + β3 Di + eit,
where yit is an outcome of interest of household i in year t, the β's are parameters to be estimated, T is the time period, which takes the value of 1 for the post-treatment and 0 for the pre-treatment period, and e is an unobserved error term. In this specification, the parameter β2 is the DID estimator. Estimation biases emerge if the error term is correlated with the treatment status, such that corr(eit, Di) ≠ 0. Yet, suppose the error term e comprises a time-invariant component ν and a mean-zero time-varying component ε, i.e., eit = νi + εit, and suppose further that the time-varying component is independent of participation, i.e., εit ⊥ Di, for all households i, and that εit is not serially correlated, i.e., corr(εit, εit+1) = 0. Taking the first difference, the above equation can be rewritten as

Δyi = β1 + β2 Di + Δεi,
where Δ represents the change in the corresponding variable over time. This equation clearly shows that the impact of time-invariant characteristics, νi, is effectively differenced out. Because Δεi is uncorrelated with Di by assumption, the DID estimator β2 is unbiased. Then, analogously to the ATT above, it takes the form of

β2 = E(Δy | D = 1) − E(Δy | D = 0) = [E(Δy1 | D = 1) − E(Δy0 | D = 1)] + [E(Δy0 | D = 1) − E(Δy0 | D = 0)].
The last term of the right-hand side of this equation will cause potential biases if E(Δy0|D= 1) ≠E(Δy0|D= 0). Therefore, the validity of DID rests on “parallel time drift,” where the change in outcome in the comparison group between pre-treatment and post-treatment periods is identical with that in the treatment group in the absence of treatment.
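The differencing argument can be sketched as follows (hypothetical numbers): a fixed unobservable drives both selection and outcome levels, so the cross-sectional comparison is badly biased, but first-differencing removes it.

```python
import numpy as np

# Sketch, hypothetical numbers: a time-invariant household effect v
# drives selection into the program, but differencing removes it;
# identification then rests only on the parallel (common) time drift.
rng = np.random.default_rng(6)
n = 50_000

v = rng.normal(0.0, 2.0, n)                          # fixed household effect
D = (v + rng.normal(0.0, 1.0, n) > 0.0).astype(int)  # selection on v
y_pre = 10.0 + v + rng.normal(0.0, 1.0, n)
y_post = 10.0 + v + 0.5 + 1.5 * D + rng.normal(0.0, 1.0, n)  # trend 0.5, effect 1.5

naive = y_post[D == 1].mean() - y_post[D == 0].mean()  # badly biased by v
dy = y_post - y_pre
did = dy[D == 1].mean() - dy[D == 0].mean()            # close to the true effect 1.5
```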
The DID method so far assumes that the data on both pre-treatment and post-treatment statuses are available. A modification of this approach is that, instead of using differences in participants and nonparticipants between pre-treatment and post-treatment periods, a difference in eligible and non-eligible households in program villages is compared with the same difference in nonprogram villages. A modified assumption in this case is that a difference between eligible and non-eligible households in nonprogram villages is a good counterfactual of the same difference in program villages in the absence of treatment. If this assumption is satisfied, DID would give a consistent estimator.
As is shown, these methodologies estimate different impacts of microcredit, depending critically on specific assumptions. The validity of such different assumptions and resultant estimated impacts are not without controversy. In the next subsection C, we examine what the recent published empirical literature tells us about the impact of microcredit.
C. Empirical Evidence of Impact Evaluation
Of the estimation strategies explained earlier, the pipeline comparison has gained momentum among microfinance practitioners because of its simplicity and low financial burden. Indeed, this methodology requires neither panel data nor interviews with non-clients. Applications of this method include a series of Assessing the Impact of Microenterprise Services (AIMS) program publications, such as Barnes, Gaile, and Kibombo (2001) in Uganda and Dunn and Arbuckle (2001) in Peru, and others like Mosley (2001) in Bolivia, and UNCDF (2004) in Nigeria, Malawi, Haiti, and Kenya. These studies generally find positive and significant impacts of microcredit on enterprise profits and the welfare of tenured clients.
However, Karlan (2001) and Alexander-Tedeschi and Karlan (2010) argue that this approach is flawed. They identify possible biases brought about by dropout, timing of decision, and institutional dynamics. First, dropout biases emerge if the remaining tenured clients systematically differ from ex-clients who no longer receive microcredit at the time of the survey. For example, successful clients, who improve their business profits sufficiently, will accumulate their own savings, no longer need microcredit, and eventually graduate from the program. In contrast, unlucky clients might fail to invest the money well, become discouraged, and leave the program. In either case, the remaining clients may differ from ex-clients in essential respects, leading to biases in the estimation of the impact. Second, timing-of-decision problems occur if the incoming clients have good reasons for not participating in the first place. For example, they may be less entrepreneurial or more risk averse than tenured clients. It is also possible that the incoming clients take advantage of being latecomers by observing the past experience of tenured clients in managing businesses with credit. In both cases, the key assumption of pipeline comparisons becomes invalid because the incoming clients are self-selected into the program in a different manner from the tenured clients, and, therefore, the resultant difference in outcomes cannot be attributed solely to microcredit. Third, institutional-dynamics biases emerge if MFIs expand their outreach strategically. For example, MFIs might first operate in promising areas that have good clients and then, after achieving comfort with the local culture, economy, and business practices, move to poorer areas to serve the poor. If this is the case, the characteristics of tenured clients and incoming clients are highly likely to differ systematically.
Using data from Peru, Alexander-Tedeschi and Karlan (2010) show that the failure to take dropouts into account leads to a significant upward bias: annual enterprise profit is positive and higher for tenured clients by 4,083 nuevos soles with the pipeline comparison, while it is, surprisingly, negative and lower by 588 nuevos soles with the inclusion of the dropouts in the sample. As for household income from all sources, the impact of microfinance remains positive, but drops from 6,569 nuevos soles in the pipeline comparison to 2,062 nuevos soles in their preferred approach. Similarly, Alexander-Tedeschi (2008) uses two rounds of panel data in Peru to examine the potential biases caused by dropouts, timing of selection, and program placement, and finds that the pipeline comparison significantly overestimates the benefit of microfinance due mainly to dropouts and timing of selection. Although the impact of microfinance on enterprise profit is still positive after controlling for such biases, these findings suggest that the results of pipeline comparisons should be interpreted with caution.4
Coleman (1999) studies the impact of microfinance undertaken by village banks in Thailand using a similar approach to that of Alexander-Tedeschi (2008). He surveys not only tenured clients and non-clients in treated communities where banks were already in operation, but also incoming clients and non-clients in control communities where banks were not yet in operation, which allowed the author to apply the DID method (i.e., comparison of the difference between tenured clients and non-clients in treated communities against the same difference in control communities). One major advantage of Coleman's study over similar pipeline comparisons is that the order of program placement is random. Even though this method cannot perfectly control for biases caused by dropouts as pointed out by Montgomery (2005), it is considered to be more credible than purely comparing clients with non-clients or tenured clients with incoming clients (Karlan 2001; Karlan and Goldberg 2007).5 Like Alexander-Tedeschi (2008), the estimation result shows that the failure to account for the selection process significantly overestimates the impact of microfinance. Indeed, Coleman (1999) finds that the correct specification shows insignificant impacts on physical assets, production, sales, expenses, labor time, and expenditures on health care and education. Moreover, he finds that clients of microfinance are more likely to borrow from informal money lenders. He explains that this is because many clients joined the program due to social reasons, such as “being a part of group,” without having identified projects to invest in. Therefore, they tend to use loans for consumption purposes, and when they have to repay microcredit, they borrow money from informal moneylenders due to the lack of money at hand.
Coleman (2006) extends his previous analysis to include the dropouts in order to control for potential dropout biases suggested by Karlan (2001) and also to examine why microcredit has little impact. To explore the latter issue, he differentiates committee members from ordinary members of the village banks, as the committee members constitute the relatively better-off segment in the communities. Similar to his previous study, Coleman (2006) finds that the impacts of microcredit on the ordinary members are largely statistically insignificant or sometimes even negative and significant, whereas those on committee members are mostly significantly positive across various outcomes, including income, savings, productive expenses, and labor time. This implies that the negligible average impacts of microcredit found in the previous study are largely because the microcredit under study does not bring about benefits for the poor. Applying a method similar to that of Coleman (2006), Kondo et al. (2008) also show that the benefits of microfinance are disproportionately captured by wealthier households in the rural Philippines.
Using panel data collected in Indonesia in 2007 and 2008, Takahashi, Higashikata, and Tsukada (2010) also address a similar research question. Based on propensity score matching combined with the DID method, they ask to what extent microcredit improves the welfare of clients and whether it helps the poor. The advantage of their method is that estimation biases arising from differences in observable characteristics can be controlled for by propensity score matching, while those arising from time-invariant unobservable characteristics can be controlled for using the DID method. Their results show that the impacts of microcredit on household income and profits of self-employed businesses are largely insignificant, whereas those on sales (revenues) of self-employed businesses are positive and significant, implying that microcredit contributes to enlarging business size, but not profits. However, once the sample is divided into poor and nonpoor households, the effect of the increased business sales is positive and significant only for the nonpoor. The poor, however, do benefit from microcredit through increased schooling investment in their children. Based on these findings, they conclude that, although microcredit can potentially contribute to the reduction of intergenerational poverty through schooling investment, it might not have immediate impacts on poverty alleviation.
A study by Pitt and Khandker (1998) in rural Bangladesh is among the most influential and frequently cited papers in the impact evaluation literature. They collected their sample from both program and nonprogram villages and include village fixed effects in estimating the impacts in order to mitigate program placement biases. Furthermore, they use exogenous eligibility criteria to identify the impacts. The NGO microfinance programs they study target households that own less than half an acre of land. This rule is exogenously determined by the NGOs, and, therefore, participation in microcredit can be treated as exogenous to some extent, provided that households cannot frequently engage in land transactions and that this rule is strictly applied by the NGOs. Using this eligibility rule as an identification strategy, Pitt and Khandker present an application of regression discontinuity design, with an intricate econometric technique called weighted exogenous sampling maximum likelihood–limited information maximum likelihood–fixed effects (WESML–LIML–FE). Pitt and Khandker (1998) show, among other things, that every additional 100 taka lent to a woman increases household consumption by 18 taka, that the increased consumption is more apparent in food-shortage seasons, which is an indication that microcredit helps consumption smoothing, and that a 1% increase in credit from Grameen Bank to a woman increases girls' school enrollment by 1.86%. Furthermore, it is found that, when controlling for village fixed effects and other observable characteristics, microcredit in Bangladesh successfully reaches the poor and significantly contributes to poverty reduction.
Morduch (1998), however, argues that the assumptions underlying Pitt and Khandker's estimation are erroneous and that a revised estimation remarkably changes the results. According to Morduch, land markets in Bangladesh are rather active and many sample clients purchase/sell their land. In addition, the eligibility criterion of less than 0.5 acres of landholding is often violated. Then, applying the simple DID method, he finds little evidence of positive impacts on clients except for consumption smoothing.
McKernan (2002), based on an estimation method similar to that of Pitt and Khandker (1998), examines the sensitivity of estimates to eligibility criteria and finds that the results regarding the effects of participation in BRAC and BRDB are not sensitive, but those for Grameen Bank are somewhat sensitive. In addition, McKernan finds that because high-profit households are more likely to participate in Grameen Bank, the failure to control for the selection mechanism overestimates the impact on business profits by more than 200 percentage points.
Against such counterevidence, Khandker (2005) uses panel data and conducts a robustness check. The results show that positive impacts on consumption remain in the revised specification. Moreover, the results even show an increase in impact: every additional 100 taka leads to increased consumption by 20.5 taka.
The positive impact of microcredit on consumption smoothing, which is agreed upon by both Pitt and Khandker (1998) and Morduch (1998), is challenged by Menon (2006b). She draws her sample from eight Grameen thanas (Grameen is the only program that operates in these thanas) and estimates the impacts of consumption smoothing by nonlinear least squares. The results show that, although microcredit indeed helps to improve the recipients' ability to smooth seasonal shocks, its effect declines over time and it has virtually no impact after four years of participation. Extending the data to other MFIs, Menon (2006a) also reaches the same conclusion. In contrast, Chemin (2008), using the same data set as Pitt and Khandker, demonstrates that even the average impact on consumption smoothing, measured by variation of log expenditure, disappears if selection is controlled for by propensity score matching.
Roodman and Morduch (2009) revisit the evidence presented by Pitt and Khandker (1998), Morduch (1998), and Khandker (2005). As a first step, they return to the original survey to reconstruct the data for their study. Summary statistics on the mean and standard deviation show that their reconstructed data match well with Pitt and Khandker (1998) and Morduch (1998), although not with Khandker (2005). As a next step, they adopt the same estimation methodologies to replicate the results and also conduct specification tests to explore the validity of the assumptions underlying these papers. Surprisingly, their replication exercises and specification tests show that the impacts reported in all three papers are weak with the reconstructed data and that the results are not particularly reliable: the impact on consumption is even negative and significant in comparison with Pitt and Khandker (1998); the impact on consumption volatility is weaker than that which Morduch finds; and the use of panel data, as in Khandker, does not necessarily yield a more credible estimate because it cannot compensate for the lack of clearly exogenous variation in the treatment variable. Having obtained such results, they do not claim that microcredit has little impact; rather, they emphasize the essential difficulty inherent in exploring the causal relationship between the provision of microcredit and the improvement of household welfare with nonexperimental approaches.
To date, to the best of our knowledge, only two experimental studies exist in the field of microfinance evaluation. Banerjee et al. (2009) conduct a randomized controlled trial in Indian slums, while Karlan and Zinman (2009a) conduct a similar exercise in the Philippines. Broadly, neither study reveals a significant impact of microcredit on poverty reduction. Banerjee et al. (2009) show, among other things, that households with an existing business increase durable expenditures and their profits, but, importantly, the average impacts on expenditure per capita as well as expenditures on education and health are not statistically significant. Meanwhile, Karlan and Zinman (2009a) show that there is no statistical difference in household income, the probability of being under the poverty line, and the quality of food between treatment and control groups. Moreover, subjective well-being slightly decreases for the treatment households.
In sum, although microcredit has increasingly gained popularity as an effective tool for poverty reduction, there is no solid evidence supporting its positive impact. Moreover, the above-mentioned studies largely find that a naïve estimate, which fails to eliminate selection biases, tends to overestimate the impact of microcredit. This, in turn, implies that the omitted variables affecting outcomes and participation in microcredit are positively correlated and that it is the better-off who benefit more from the existence of microcredit.