Inequality of Opportunity in Brazil: A Corrigendum


Errata

This article corrects:

  1. “Inequality of Opportunity in Brazil,” Review of Income and Wealth, Volume 53, Issue 4, 585–618. Article first published online: 13 November 2007.

Correspondence to: Francisco Ferreira, Development Research Group, The World Bank, 1818 H Street, NW, Washington DC 20433, USA (fferreira@worldbank.org).

Abstract

This note acknowledges and corrects a programming error in our paper “Inequality of Opportunity in Brazil” (Review of Income and Wealth, 53(4), 585–618, 2007). Once the error is corrected, our bounds approach to the identification of individual model parameters in the presence of omitted variable biases is much less useful than indicated in the original paper. In the specific context of the measurement of inequality of opportunity, this implies that the decomposition of overall inequality of opportunity into direct and indirect effects is not reliable. However, the parametric approach introduced in our paper remains useful for obtaining a lower-bound estimate of overall ex-ante inequality of opportunity, as proposed by Ferreira and Gignoux (2011).

Our paper “Inequality of Opportunity in Brazil” (Review of Income and Wealth, 53(4), 585–618, 2007) contains a non-trivial error.¹ In that paper, we proposed a measure of inequality of opportunity as the share of earnings (w) inequality explained by predetermined, morally irrelevant circumstances (C). The main results of the paper were obtained from the OLS estimation of a reduced-form model given by:

$$\ln w_i = C_i\psi + \varepsilon_i \tag{10}$$

We denoted a counterfactual earnings distribution where all differences in circumstances were eliminated as $\tilde{\Phi}(\tilde{w})$, with $\tilde{w}_i = \exp(\bar{C}\hat{\psi} + \hat{\varepsilon}_i)$.² If the actual earnings distribution is given by $\Phi(w)$, we proposed to measure inequality of opportunity in that distribution by the ratio $[I(\Phi(w)) - I(\tilde{\Phi}(\tilde{w}))]/I(\Phi(w))$, where $I$ denotes some well-behaved inequality measure, such as the Theil index. This is an indirect approach: $I(\tilde{\Phi}(\tilde{w}))$ captures the inequality that remains when all inequality of opportunity (i.e., inequality between people with different circumstances) is eliminated. So the difference $I(\Phi(w)) - I(\tilde{\Phi}(\tilde{w}))$, and its ratio to total inequality, are both measures of inequality of opportunity.
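The indirect measure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the paper: the data are simulated, the two circumstance variables are hypothetical, the Theil-T index stands in for the generic measure I, and the helper names `theil_t` and `indirect_ioo` are introduced here for illustration only.

```python
import numpy as np

def theil_t(w):
    """Theil-T index of a strictly positive earnings vector."""
    r = np.asarray(w, dtype=float) / np.mean(w)
    return float(np.mean(r * np.log(r)))

def indirect_ioo(log_w, C):
    """Return I(actual), I(counterfactual) and the share of inequality due to C.

    Hypothetical helper: regresses ln w on C by OLS (equation (10)), then gives
    everyone the sample-mean circumstances while keeping their own residual.
    """
    X = np.column_stack([np.ones(len(log_w)), C])        # intercept + circumstances
    psi_hat, *_ = np.linalg.lstsq(X, log_w, rcond=None)  # OLS estimate of psi
    eps_hat = log_w - X @ psi_hat                        # OLS residuals
    w_tilde = np.exp(X.mean(axis=0) @ psi_hat + eps_hat) # exp(C-bar psi-hat + eps-hat_i)
    i_actual = theil_t(np.exp(log_w))
    i_cf = theil_t(w_tilde)
    return i_actual, i_cf, (i_actual - i_cf) / i_actual

# Simulated data: two hypothetical circumstances, log-normal earnings.
rng = np.random.default_rng(0)
n = 5000
C = rng.normal(size=(n, 2))
log_w = 1.0 + C @ np.array([0.3, 0.2]) + rng.normal(scale=0.5, size=n)

i_actual, i_cf, share = indirect_ioo(log_w, C)
print(f"I(w) = {i_actual:.3f}, I(w~) = {i_cf:.3f}, ratio = {share:.3f}")
```

With these simulated parameters the ratio should be close to one third: the Theil-T index of a lognormal variable is σ²/2, and equalizing circumstances removes 0.13 of the 0.38 total log-variance, so the ratio is about 0.13/0.38 ≈ 0.34.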

Because equation (10) was the reduced form of a model containing effort as well as circumstance variables, this measure of inequality of opportunity should reflect both the direct effects of circumstances on earnings, and the indirect effects operating through efforts (E). To distinguish between those two categories of effects, we also estimated:

$$\ln w_i = C_i\alpha + E_i\beta + u_i \tag{5'}$$

We recognized that the existence of omitted circumstance variables would bias the OLS estimates of ψ, and that omitted circumstance and effort variables would bias the estimates of α and β. We argued that suitable instruments were not available and proposed instead to investigate the likely magnitude of potential biases, by estimating upper and lower bounds both for the true coefficients and for the measures of inequality of opportunity, which were the main object of interest.

Focusing on equation (10) in the original paper: if the error term $\varepsilon$ is not orthogonal to $C$ (but the two are jointly normally distributed), then the estimated vector of coefficients $\hat{\psi}$ is biased, and the bias can be written as:

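$$E(\hat{\psi}) - \psi = \Sigma_C^{-1}\,\operatorname{diag}(\sigma_{C_1},\ldots,\sigma_{C_m})\,\rho_{C\varepsilon}\,\sigma_{\varepsilon}$$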

where $\Sigma_X$ denotes the theoretical variance–covariance matrix of a random vector $X$, $\sigma_x$ denotes the standard deviation of a variable $x$, and $\rho_{xy}$ denotes the theoretical correlation coefficient between two variables $x$ and $y$, or the vector of correlation coefficients between a vector $x$ and a variable $y$. Because these theoretical population parameters are unknown, our proposed solution was to evaluate the approximate size of the bias by:

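$$\widehat{\mathrm{bias}}(\hat{\psi}) = \hat{\Sigma}_C^{-1}\,\operatorname{diag}(\hat{\sigma}_{C_1},\ldots,\hat{\sigma}_{C_m})\,\rho_{C\varepsilon}\,\hat{\sigma}_{\varepsilon}$$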

To compute this sample-based approximation, we calculated $\hat{\sigma}_{\varepsilon}^2 = \hat{\sigma}_{\hat{\varepsilon}}^2/(1-K)$, where $\hat{\sigma}_{\hat{\varepsilon}}^2$ is the variance of the OLS residual of the regressions above and $K = \rho_{C\varepsilon}'\,\hat{R}_C^{-1}\,\rho_{C\varepsilon}$, with $\hat{R}_C$ the sample correlation matrix of $C$. The elements of $\rho_{C\varepsilon}$ are drawn from a uniform distribution defined on (−1, 1), and any draws for which $K \geq 1$ are rejected. Finally, we also imposed a set of additional constraints on the signs of coefficient estimates (empirically backed by the literature). Please see the original paper for a more detailed description of the method. Using this approach, we reported bounds around both the regression coefficients and the measures of inequality of opportunity which (we hoped) were sufficiently narrow as to be informative.
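The simulation step can be sketched as follows, assuming joint normality as in the text. Two facts drive it: under joint normality the OLS residual variance understates the true error variance by the factor $1-K$, with $K = \rho_{C\varepsilon}'R_C^{-1}\rho_{C\varepsilon}$ and $R_C$ the correlation matrix of C; and a draw with $K \geq 1$ admits no positive-definite joint correlation matrix for $(C, \varepsilon)$. This is an illustrative Python sketch on simulated data, not the authors' Stata code: it omits the sign constraints, reports the 5th and 95th percentiles of the bias-corrected coefficients as a 90% interval, and the function name `bias_bounds` is hypothetical.

```python
import numpy as np

def bias_bounds(log_w, C, support=1.0, n_draws=20_000, seed=0):
    """Hypothetical sketch: simulated 90% bounds on the OLS slopes of ln w on C.

    Correlations rho between each circumstance and the error are drawn from
    U(-support, support); draws with K = rho' R^-1 rho >= 1 are rejected.
    Sign constraints on the coefficients are NOT imposed here.
    """
    rng = np.random.default_rng(seed)
    n, m = C.shape
    X = np.column_stack([np.ones(n), C])
    coef, *_ = np.linalg.lstsq(X, log_w, rcond=None)
    psi_hat = coef[1:]                                    # OLS slopes on circumstances
    resid = log_w - X @ coef
    s2_resid = resid @ resid / (n - m - 1)                # variance of the OLS residual
    sd_C = C.std(axis=0, ddof=1)                          # sample s.d. of each C_k
    R_inv = np.linalg.inv(np.atleast_2d(np.corrcoef(C, rowvar=False)))

    rho = rng.uniform(-support, support, size=(n_draws, m))
    K = np.einsum("ij,jk,ik->i", rho, R_inv, rho)         # K = rho' R^-1 rho for each draw
    keep = K < 1                                          # reject draws with K >= 1
    rho, K = rho[keep], K[keep]

    sd_eps = np.sqrt(s2_resid / (1.0 - K))                # implied s.d. of the true error
    bias = (rho @ R_inv) * sd_eps[:, None] / sd_C         # equals Sigma_C^-1 diag(sd_C) rho sd_eps
    lower, upper = np.percentile(psi_hat - bias, [5, 95], axis=0)
    return psi_hat, lower, upper, int(keep.sum())

# Simulated data with two correlated, hypothetical circumstances.
rng = np.random.default_rng(42)
n = 5000
C = rng.normal(size=(n, 2)) @ np.array([[1.0, 0.3], [0.0, 1.0]])
log_w = 1.0 + C @ np.array([0.15, 0.10]) + rng.normal(scale=0.6, size=n)

results = {s: bias_bounds(log_w, C, support=s) for s in (0.05, 0.2, 1.0)}
for s, (psi_hat, lo, hi, kept) in results.items():
    print(f"support ±{s}: OLS {psi_hat.round(3)}  lower {lo.round(3)}  "
          f"upper {hi.round(3)}  draws kept {kept}")
```

Widening `support` from 0.05 to 1.0 mirrors the sequence of supports discussed in this corrigendum; the interval around each OLS coefficient widens accordingly.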

Unfortunately, our calculation of the range of possible values for the biases in both equations (10) and (5′) contained a mistake. When empirically estimating $\hat{\sigma}_{\hat{\varepsilon}}$, a typo in our Stata code led us to use the standard error of the linear prediction (command option “stdp”) instead of the standard error of the residual (command option “stdr”). This programming error led us to underestimate the value of $\hat{\sigma}_{\hat{\varepsilon}}$ by a factor ranging from 37 to 92 (depending on the cohort considered). Since the approximate bias is proportional to the estimated standard deviation of the error term, the simulated biases were understated by roughly the same factor.
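The size of the understatement follows from how the two quantities are defined. For an OLS regression with residual variance s² and leverage $h_i$ for observation i, the standard error of the linear prediction is $s\sqrt{h_i}$ and the standard error of the residual is $s\sqrt{1-h_i}$; to our understanding, these are the quantities Stata's `stdp` and `stdr` options compute after `regress`. Because the leverages sum to the number of regressors k, a typical $h_i$ is about k/N, so stdp is smaller than stdr by a factor on the order of $\sqrt{N/k}$. The Python sketch below checks this on simulated data; the sample size and number of regressors are hypothetical, not those of the Brazilian cohorts.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5000, 10                                 # hypothetical sample size and regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ rng.normal(size=k) + rng.normal(scale=0.6, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (n - k)                    # residual variance estimate

Q, _ = np.linalg.qr(X)                          # reduced QR decomposition, X = QR
h = np.sum(Q**2, axis=1)                        # leverages: diagonal of X (X'X)^-1 X'

stdp = np.sqrt(s2 * h)                          # s.e. of the linear prediction
stdr = np.sqrt(s2 * (1 - h))                    # s.e. of the residual
ratio = stdr.mean() / stdp.mean()
print(f"mean stdr / mean stdp = {ratio:.1f}  (sqrt(n/k) = {np.sqrt(n / k):.1f})")
```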

When the error is corrected and the biases are recomputed, the bounds around the OLS estimates of the regression coefficients become much wider. The small set of conditions we had previously imposed on coefficients now proves insufficient to obtain informative bounds. An alternative approach, which illustrates how the “confidence intervals” widen as we move away from OLS assumptions, is to draw the correlation coefficients $\rho_{C_k\varepsilon}$ for all circumstance variables $C_k$ from uniform distributions defined sequentially on broader supports: (−0.05, 0.05), (−0.1, 0.1), (−0.15, 0.15), and (−0.2, 0.2). Note, however, that these supports are all much narrower than the widest possible range used earlier: (−1, 1). Results from this approach are presented in Table 1 for selected regression coefficients (those on mean parental schooling), and in Table 2 for $I(\tilde{\Phi}(\tilde{w}))$, our measure of counterfactual inequality when all inequality due to circumstances is eliminated.

Table 1. Coefficients of Mean Parental Years of Schooling by Cohort, Reduced-form Model (see notes a and b)

Mean parental schooling (years)    b1936_40  b1941_45  b1946_50  b1951_55  b1956_60  b1961_65  b1966_70
Upper bound estimates
  −0.20 ≤ ρ(Xi, u) ≤ 0.20          0.265     0.195     0.198     0.185     0.163     0.149     0.136
  −0.15 ≤ ρ(Xi, u) ≤ 0.15          0.242     0.186     0.188     0.170     0.151     0.140     0.126
  −0.10 ≤ ρ(Xi, u) ≤ 0.10          0.218     0.174     0.174     0.154     0.137     0.127     0.113
  −0.05 ≤ ρ(Xi, u) ≤ 0.05          0.195     0.157     0.157     0.138     0.123     0.114     0.102
OLS estimates                      0.162***  0.137***  0.143***  0.119***  0.103***  0.103***  0.088***
Lower bound estimates
  −0.05 ≤ ρ(Xi, u) ≤ 0.05          0.135     0.109     0.108     0.093     0.082     0.077     0.067
  −0.10 ≤ ρ(Xi, u) ≤ 0.10          0.097     0.077     0.075     0.062     0.054     0.052     0.043
  −0.15 ≤ ρ(Xi, u) ≤ 0.15          0.057     0.043     0.039     0.028     0.025     0.025     0.018
  −0.20 ≤ ρ(Xi, u) ≤ 0.20          0.012     0.005     −0.001    −0.009    −0.009    −0.006    −0.011

a. Our dependent variable is the log of the hourly wage rate; explanatory variables include race, parental schooling (the mean of, and the difference between, mother's and father's years of schooling), regional dummies, and father's occupational status.
b. For our selected variable, mean parental years of schooling, we present the minimum and maximum coefficient estimates from the 90% confidence intervals of the simulations, using four possible intervals for the correlation coefficients between our X's and the residuals: (−0.05, +0.05), (−0.1, +0.1), (−0.15, +0.15), and (−0.2, +0.2). The OLS estimates and their significance levels are shown in between: *significant at 10%; **significant at 5%; ***significant at 1%.

Table 2. Earnings Inequality When Inequality of Opportunity is Eliminated, Urban Men in Brazil: Counterfactual Theil Coefficients

                                   b1936_40  b1941_45  b1946_50  b1951_55  b1956_60  b1961_65  b1966_70
Total observed inequality          0.873     0.997     0.759     0.655     0.706     0.580     0.566
Counterfactual inequality when circumstances are equalized:
Upper bound estimates
  −0.20 ≤ ρ(Xi, u) ≤ 0.20          0.754     0.778     0.710     0.602     0.658     0.592     0.592
  −0.15 ≤ ρ(Xi, u) ≤ 0.15          0.717     0.734     0.672     0.561     0.619     0.442     0.553
  −0.10 ≤ ρ(Xi, u) ≤ 0.10          0.688     0.698     0.645     0.537     0.595     0.421     0.526
  −0.05 ≤ ρ(Xi, u) ≤ 0.05          0.667     0.673     0.627     0.525     0.575     0.410     0.507
Mean estimates                     0.654     0.656     0.619     0.519     0.562     0.407     0.494
Lower bound estimates
  −0.05 ≤ ρ(Xi, u) ≤ 0.05          0.647     0.641     0.606     0.518     0.558     0.403     0.489
  −0.10 ≤ ρ(Xi, u) ≤ 0.10          0.638     0.633     0.602     0.521     0.558     0.405     0.486
  −0.15 ≤ ρ(Xi, u) ≤ 0.15          0.638     0.629     0.601     0.526     0.561     0.412     0.485
  −0.20 ≤ ρ(Xi, u) ≤ 0.20          0.645     0.629     0.620     0.533     0.567     0.428     0.487

Two implications arise from this exercise. First, once our coding error is corrected, the bounds approach employed in our original paper no longer appears useful for identifying a narrow range of possible values for the biases plaguing OLS regression coefficients. There is no rationale for restricting, ex ante, the possible correlation between explanatory variables and a regression residual to a narrow interval such as (−0.2, 0.2). The true value of ρ ∈ (−1, 1) is, of course, unknown. When we allow for the full possible range of values for that correlation coefficient, our use of sample moments to calculate approximate bounds on the bias of OLS coefficients yields intervals that are too wide to be of any practical use.

Second, the effect of correcting the error on the bounds around the estimates of counterfactual inequality, and in particular on the lower-bound estimate, is much less pronounced. In fact, as shown in Table 2, the lower bound on the Theil coefficient of inequality when differences in circumstances are eliminated is quite robust to changes in the assumed correlation coefficients between circumstance variables and the regression residual. For the 1936–40 cohort, for example, it ranges only from 0.638 to 0.647 across the four supports, while the upper bound rises from 0.667 to 0.754.

The Krishnakumar Correction

After Mr. Esteban Puentes kindly pointed out our programming error to us, but before we had finished this corrigendum, we became aware of a note proposing a “correction” of our 2007 paper (Krishnakumar, 2013). That note, which is being published alongside this corrigendum, makes a number of notational corrections, which we largely accept. We should indeed have made the assumption of joint normality of C and ε explicit (or used probability limits and referred to the asymptotic bias), and used clearer notation to distinguish between population and sample moments.

However, contrary to what the note suggests, notational imprecision was not responsible for the error in our paper. In particular, we never estimated or reported what Krishnakumar (2013) calls “the BFM bias” in her Table 1. From the outset, our estimates of the bias were what she calls “the corrected BFM bias”, for which, as she notes, “… for a known $\rho_{xu}$, the theoretical bias, the small sample bias and the corrected BFM bias (with the 1/N factor) are all of the same order of magnitude.” Nor is it the case that our bounds approach would have yielded complex bounds, as suggested in her Tables 2–4. The author ignores a crucial step in our approach, which was to discard any drawings of $\rho_{C\varepsilon}$ for which $K \geq 1$.

The error was not due to any of the points 1–4 in the note. It is due to the unfortunate Stata coding error described above. Whatever the reason, however, Krishnakumar (2013) is right in her final claim that “… the confidence intervals presented in Bourguignon, Ferreira and Menéndez (2007) is not correct, and the results do not provide the correct range of bias of their OLS estimates.”

Implications for the Measurement of Inequality of Opportunity

This unfortunate error, for which we apologize to our readers, implies that our bounds approach to the identification of individual model parameters in the presence of omitted variable biases is much less useful than indicated in the original paper. In the specific context of the measurement of inequality of opportunity, this means that the decomposition of overall inequality of opportunity into direct and indirect effects (as in Panel 2 of Table 5) is not reliable. Neither are estimates of the contribution of individual circumstance variables to earnings inequality (Table 6).

The error does not imply, however, that this parametric approach to measuring overall ex-ante inequality of opportunity is useless. In a subsequent paper, heavily inspired by our 2007 paper, Ferreira and Gignoux (2011) have proposed using inequality in the predicted incomes from equation (10) as a direct measure of inequality of opportunity: $I(\{\exp(C_i\hat{\psi})\})$. Those authors refer to this level measure of inequality of opportunity as IOL, and to its ratio to total observed inequality as IOR. Ferreira and Gignoux (2011) acknowledge that sub-decompositions of these measures into direct or indirect effects, or into the effects of individual circumstances, would require strong assumptions about the orthogonality of residuals in (10). But they also show, with a formal proof, that IOL and IOR can safely be interpreted as lower-bound estimates of overall inequality of opportunity, i.e., inequality due to all predetermined circumstances, not only to those that are observed. For a more recent attempt at disentangling direct and indirect effects of circumstances on final outcomes, subject to its own set of assumptions, see Björklund et al. (2011).
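The direct measure proposed by Ferreira and Gignoux can be sketched in the same way. This minimal Python illustration is not their code: the data are simulated, the Theil-T index is our choice of inequality measure, and `iol_ior` is a hypothetical helper. Dropping one of two independent circumstances lowers IOL in this example, which illustrates (but does not prove) the lower-bound interpretation.

```python
import numpy as np

def theil_t(y):
    """Theil-T index of a strictly positive income vector."""
    r = np.asarray(y, dtype=float) / np.mean(y)
    return float(np.mean(r * np.log(r)))

def iol_ior(log_y, C):
    """Hypothetical helper: IOL = I(exp(C_i psi-hat)) and IOR = IOL / I(y)."""
    X = np.column_stack([np.ones(len(log_y)), C])
    psi_hat, *_ = np.linalg.lstsq(X, log_y, rcond=None)   # OLS on equation (10)
    smoothed = np.exp(X @ psi_hat)                        # predicted incomes
    iol = theil_t(smoothed)
    return iol, iol / theil_t(np.exp(log_y))

# Simulated data with two independent, hypothetical circumstances.
rng = np.random.default_rng(7)
n = 5000
C = rng.normal(size=(n, 2))
log_y = 1.0 + C @ np.array([0.3, 0.2]) + rng.normal(scale=0.5, size=n)

iol_all, ior_all = iol_ior(log_y, C)          # both circumstances observed
iol_one, ior_one = iol_ior(log_y, C[:, :1])   # only the first one observed
print(f"IOL: {iol_one:.3f} (one) vs {iol_all:.3f} (both); "
      f"IOR: {ior_one:.3f} vs {ior_all:.3f}")
```

Because the Theil index is scale invariant, keeping the intercept inside the predicted incomes does not affect IOL.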

Footnotes

  1. We are very grateful to Esteban Puentes for first pointing the error out to us.

  2. A hat denotes an OLS estimate and an overbar denotes an arithmetic mean.
