Disclosure: The author certifies that there are no competing financial interests related to this work.
Using the Oaxaca–Blinder decomposition as an empirical tool to analyze racial disparities in obesity
Version of Record online: 15 APR 2014
Copyright © 2014 The Obesity Society
Volume 22, Issue 7, pages 1750–1755, July 2014
How to Cite
Sen, B. (2014), Using the Oaxaca–Blinder decomposition as an empirical tool to analyze racial disparities in obesity. Obesity, 22: 1750–1755. doi: 10.1002/oby.20755
- Issue online: 27 JUN 2014
- Manuscript Accepted: 26 MAR 2014
- Manuscript Received: 6 NOV 2013
Racial disparities in obesity in the US are often assumed to reflect racial disparities in socio-economic status, diet and physical activity. We present an econometric method that helps examine this assumption by “decomposing” the racial gap in body-mass index (BMI) into how much can be explained by racial differences in “standard” predictors of BMI, and how much remains unexplained.
The Oaxaca–Blinder decomposition is widely used in other fields, but remains under-utilized in the obesity literature. We provide algebraic and graphical illustrations of the decomposition, and further illustrate it with an example using data for white and black respondents in Mississippi and Alabama. BMI is the outcome of interest. Predictor variables include income, education, age, marital status, children, mental health indicators, diet and exercise.
The mean predicted gap in BMI between white and black men is small, statistically insignificant, and can be attributed to racial differences in the predictor variables. The mean predicted gap for women is larger, statistically significant, and <10% of it can be explained by differences in predictor variables. Implications of the findings are discussed.
Wider application of this method is advocated in the obesity literature, to better understand racial disparities in obesity.
Substantial racial disparities in obesity prevail in the US [1, 2]. Obesity prevalence is also inversely associated with socioeconomic status (SES) [3-5], sometimes leading to the assumption that racial obesity disparities reflect racial income and education disparities. For example, a Robert Wood Johnson Foundation report in 2010 explicitly states “rates of obesity are significantly higher for Blacks and Latinos, reflecting long-standing disparities in income, education and access to health care”. Racial differences in diet and physical activity (PA) are also assumed to contribute to the disparity. Therefore, there is an underlying assumption that obesity disparities are among the health disparities that will substantially diminish if society can effectively reduce disparities in income, education, and healthcare access, and/or modify the built environment in minority and disadvantaged neighborhoods to facilitate improved diet and PA behaviors [8-11]. However, there is a relative paucity of methods in the obesity literature that help analyze the extent to which racial disparities in obesity would be reduced if certain SES, diet and PA disparities could be eliminated. Here we demonstrate an econometric method called the Oaxaca–Blinder decomposition (referred to as the Peters–Belson method in some disciplines) that can be particularly useful in this context. Specifically, we demonstrate how this method can be used to analyze how much racial disparity would remain in the hypothetical situation where social and economic policies successfully improved several SES, diet and PA indicators for a disadvantaged minority race, such that the mean levels of those indicators became on par with the mean levels for non-Hispanic whites.
Oaxaca (1973) and Blinder (1973) developed a regression-based decomposition to partition the gap in an outcome of interest between two groups into an “explained” and an “unexplained” portion. The “explained” portion of the gap is the difference in the outcome attributable to group differences in levels of a set of measured predictor variables between the “advantaged” and the “disadvantaged” groups. The “unexplained” portion arises from differentials in how the predictor variables are associated with the outcomes for the two groups. This portion would persist even if the disadvantaged group were to attain the same average levels of measured predictor variables as the advantaged group. The method has frequently been applied to analyze gender differences in wages and career advancement [13, 15-22]. It is gaining popularity in health disparities research, including areas such as analyzing racial disparities in birth outcomes. In fact, the World Bank has provided a detailed overview of the method for new researchers as part of its “learning resource series”. It has been applied to study regional disparities in obesity in Canada, but to date has been applied only once, in an economics paper, to study racial disparities in obesity.
As an example to illustrate this method, we consider the disparity in body-mass-index (BMI) between non-Hispanic black (hereafter “black”) and non-Hispanic white (hereafter “white”) samples. We start with the assumption that BMI is a linear function of a set of measured characteristics, and can be estimated using multivariate linear regression models. The mean BMI for the two groups can be denoted as

$$\overline{BMI}^{W} = \beta_0^{W} + \sum_{j} \beta_j^{W} \bar{X}_j^{W} \qquad (1)$$

$$\overline{BMI}^{B} = \beta_0^{B} + \sum_{j} \beta_j^{B} \bar{X}_j^{B} \qquad (2)$$

$X$ represents a set of measured characteristics or predictors; the superscript $W$ corresponds to the white group and the superscript $B$ corresponds to the black group. $\overline{BMI}$ is the mean value of the outcome variable. $\beta$ is a column vector of coefficients representing the associations between the predictors included in $X$ and BMI, obtained from running separate regressions of the outcome for the two groups. The error term $\varepsilon$ of the underlying individual-level regressions is assumed to be normally distributed with mean value of 0 for both races. $\beta$ can take different estimated values in the regression model for whites versus blacks.
BMI for blacks is assumed to exceed the BMI of whites on average, and this difference can be expressed as follows:

$$\overline{BMI}^{B} - \overline{BMI}^{W} = \left(\beta_0^{B} + \sum_{j} \beta_j^{B} \bar{X}_j^{B}\right) - \left(\beta_0^{W} + \sum_{j} \beta_j^{W} \bar{X}_j^{W}\right) \qquad (3)$$

Based on the above equation, it can be deduced that the racial difference in BMI may arise from differences in the mean values of the $X$ variables, but also from differences in the values of $\beta$. The Oaxaca–Blinder approach decomposes the overall difference into those two components: differences in mean values of $X$ versus differences in $\beta$ values. Mathematically, this is done by creating a hypothetical term with the mean $X$ values of the whites, but the $\beta$ of the blacks, and including it in eq. (3) as follows:

$$\overline{BMI}^{B} - \overline{BMI}^{W} = \beta_0^{B} + \sum_{j} \beta_j^{B} \bar{X}_j^{B} - \sum_{j} \beta_j^{B} \bar{X}_j^{W} + \sum_{j} \beta_j^{B} \bar{X}_j^{W} - \beta_0^{W} - \sum_{j} \beta_j^{W} \bar{X}_j^{W} \qquad (4)$$

With simple algebraic manipulation, the above equation becomes the standard linear Oaxaca–Blinder decomposition, where the black–white gap in mean BMI is expressed as the following:

$$\overline{BMI}^{B} - \overline{BMI}^{W} = \sum_{j} \beta_j^{B} \left(\bar{X}_j^{B} - \bar{X}_j^{W}\right) + \left[\left(\beta_0^{B} - \beta_0^{W}\right) + \sum_{j} \left(\beta_j^{B} - \beta_j^{W}\right) \bar{X}_j^{W}\right] \qquad (5)$$

The first summation term on the right-hand side of the equation is the “explained” portion, that is, the portion of the aggregate group difference in BMI attributable to differences in the mean values of the $X$-variables. It also represents the amount by which the racial difference in BMI would shrink in the hypothetical world where, other things equal, black study subjects now had the same mean levels of measured attributes as the white study subjects.
The next term shows the “unexplained” portion of the gap. This is due to the difference in the coefficient estimates, including the intercepts, β0. Essentially, this is the racial disparity in BMI that would remain even if black subjects had the same mean levels of measured characteristics as the white subjects.
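The two-part split described above can be sketched in code. The following is a minimal illustration rather than the Stata routine used in this paper: it relies only on the Python standard library, runs on synthetic data, and all function names (`solve`, `ols`, `decompose`, `simulate`) and parameter values are hypothetical. It values differences in mean predictors at the black coefficients, matching the hypothetical term described in the text.

```python
import random


def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares fit with an intercept; returns [b0, b1, ..., bK]."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(ZtZ, Zty)


def col_means(X):
    return [sum(col) / len(X) for col in zip(*X)]


def decompose(X_w, y_w, X_b, y_b):
    """Split mean(y_b) - mean(y_w) into explained and unexplained parts."""
    beta_w, beta_b = ols(X_w, y_w), ols(X_b, y_b)
    xbar_w, xbar_b = col_means(X_w), col_means(X_b)
    gap = sum(y_b) / len(y_b) - sum(y_w) / len(y_w)
    # Explained: differences in mean predictors, valued at black coefficients.
    explained = sum(bb * (xb - xw) for bb, xb, xw in zip(beta_b[1:], xbar_b, xbar_w))
    # Unexplained: intercept difference plus coefficient differences at white means.
    unexplained = (beta_b[0] - beta_w[0]) + sum(
        (bb - bw) * xw for bb, bw, xw in zip(beta_b[1:], beta_w[1:], xbar_w)
    )
    return gap, explained, unexplained


def simulate(rng, n, x_means, beta0, betas):
    """Synthetic predictors and BMI values; every parameter is made up."""
    X = [[rng.gauss(m, 1.0) for m in x_means] for _ in range(n)]
    y = [beta0 + sum(b * x for b, x in zip(betas, row)) + rng.gauss(0.0, 1.0) for row in X]
    return X, y


rng = random.Random(0)
X_w, y_w = simulate(rng, 500, [1.0, 0.5], 27.0, [-1.0, -0.5])
X_b, y_b = simulate(rng, 400, [0.6, 0.4], 29.0, [-0.4, -0.3])
gap, explained, unexplained = decompose(X_w, y_w, X_b, y_b)
print(f"gap={gap:.3f} explained={explained:.3f} unexplained={unexplained:.3f}")
```

Because each regression includes an intercept, the fitted line passes through the group means, so the explained and unexplained parts add up to the raw gap in mean BMI.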
Why might this unexplained portion remain? In the traditional gender-gap in wages literature, where this method was first applied, this portion was often attributed to employer discrimination against female workers. However, the unexplained portion in the Oaxaca–Blinder decomposition may also exist for other reasons. The first is other factors that can affect the outcome variable, but that were omitted from the model. The second is some pattern of measurement error in variables that is systematically different for the two groups. Finally, there could potentially exist certain forms of societal discrimination such that the same attributes bring different returns in terms of BMI for white versus black study subjects. Specific examples will be discussed later in the paper.
When executing the actual decomposition with a study sample, the population β is replaced by a vector of estimated coefficients, $\hat{\beta}$, which come with sampling variance and standard errors. Hence, the decomposition based on those estimates will also have standard errors, which are then used to calculate confidence intervals for the explained and unexplained components. Readers are referred to Jann for details on the calculation of sampling variances for the Oaxaca–Blinder decomposition.
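To give a feel for why the components carry sampling uncertainty, the sketch below uses a nonparametric percentile bootstrap. This is a swapped-in technique chosen for simplicity, not the analytic variance calculation described by Jann, and it ignores survey weights and design. The data are synthetic, the model has a single predictor, and all names (`fit_line`, `decompose`, `bootstrap_ci`) are hypothetical.

```python
import random


def fit_line(x, y):
    """Simple OLS of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def decompose(white, black):
    """white/black: lists of (x, bmi) pairs. Returns (explained, unexplained)."""
    xw, yw = zip(*white)
    xb, yb = zip(*black)
    a_w, b_w = fit_line(xw, yw)
    a_b, b_b = fit_line(xb, yb)
    mx_w, mx_b = sum(xw) / len(xw), sum(xb) / len(xb)
    explained = b_b * (mx_b - mx_w)
    unexplained = (a_b - a_w) + (b_b - b_w) * mx_w
    return explained, unexplained


def bootstrap_ci(white, black, reps=500, alpha=0.05, seed=1):
    """Percentile intervals from resampling each group with replacement."""
    rng = random.Random(seed)
    draws = [
        decompose(rng.choices(white, k=len(white)), rng.choices(black, k=len(black)))
        for _ in range(reps)
    ]
    intervals = []
    for component in zip(*draws):
        ordered = sorted(component)
        lower = ordered[int(reps * alpha / 2)]
        upper = ordered[int(reps * (1 - alpha / 2)) - 1]
        intervals.append((lower, upper))
    return intervals  # [(explained lo, hi), (unexplained lo, hi)]


# Synthetic (x, bmi) pairs; every number here is made up for illustration.
rng = random.Random(0)
white = [(x, 30.0 - 1.0 * x + rng.gauss(0, 2)) for x in (rng.gauss(3, 1) for _ in range(300))]
black = [(x, 32.0 - 0.5 * x + rng.gauss(0, 2)) for x in (rng.gauss(2, 1) for _ in range(300))]
point = decompose(white, black)
intervals = bootstrap_ci(white, black)
print("point estimates:", point)
print("95% intervals:", intervals)
```

For real survey data such as BRFSS, a design-based variance estimator would be the more appropriate choice; the bootstrap here only illustrates that both components are estimates with their own intervals.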
A simple graphical representation is presented in Figure 1, with BMI as the outcome variable and a single predictor variable $X$, which is negatively associated with BMI and whose mean level is higher among whites than blacks. The intercepts and slopes are also different for blacks versus whites. $\overline{BMI}^{B*} = \beta_0^{B} + \beta^{B} \bar{X}^{W}$ is the hypothetical BMI for blacks if they were to have the same level of $X$ as the whites, but their own intercept and slope. Thus, $\overline{BMI}^{B} - \overline{BMI}^{W}$ denotes the overall racial gap in mean BMI, $\overline{BMI}^{B} - \overline{BMI}^{B*}$ denotes the “explained” portion, and $\overline{BMI}^{B*} - \overline{BMI}^{W}$ denotes the unexplained portion.
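To make the geometry concrete, consider a worked example with made-up numbers (they are not read off Figure 1 or taken from the data). Suppose the fitted lines are $BMI = 30 - 1.0X$ for whites and $BMI = 32 - 0.5X$ for blacks, with a mean $X$ of 3 for whites and 2 for blacks. Writing $\overline{BMI}^{B*}$ for the hypothetical black mean evaluated at the white mean of $X$:

```latex
\begin{align*}
\overline{BMI}^{W} &= 30 - 1.0 \times 3 = 27.0 \\
\overline{BMI}^{B} &= 32 - 0.5 \times 2 = 31.0 \\
\overline{BMI}^{B*} &= 32 - 0.5 \times 3 = 30.5 \\
\text{overall gap} &= \overline{BMI}^{B} - \overline{BMI}^{W} = 31.0 - 27.0 = 4.0 \\
\text{explained} &= \overline{BMI}^{B} - \overline{BMI}^{B*} = 31.0 - 30.5 = 0.5 \\
\text{unexplained} &= \overline{BMI}^{B*} - \overline{BMI}^{W} = 30.5 - 27.0 = 3.5
\end{align*}
```

In this invented case, equalizing $X$ would close only 0.5 of the 4.0-unit gap; the remaining 3.5 units come from the different intercepts and slopes.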
We illustrate the method using data for whites and blacks from Alabama and Mississippi, drawn from the Behavioral Risk Factor Surveillance System (BRFSS). BRFSS is a state-based system of health surveys, representing the noninstitutionalized adult population from each of the 50 states. Respondents are interviewed via telephone, using a disproportionate stratified sampling (DSS) method. The information collected includes health risk behaviors, preventive health practices, and health care access. BRFSS data is routinely used to identify health problems and to develop, implement, and evaluate public health policies and programs . Poststratification weights are provided in the data. These weights adjust for differences in probability of selection and nonresponse, as well as nontelephone coverage, and are recommended for use in statistical estimation for obtaining representative, population-based estimates of risky health behavior or outcome prevalences.
Here, Mississippi is selected because it is the most obese state in the nation. Neighboring Alabama also ranks among the top five most obese states. The two states share geographical, climate and cultural similarities. This has the advantage of reducing concerns that the “unexplained” portion of the racial BMI gap is driven by altitude or temperature differences between states with high versus low black populations .
Our unweighted sample includes 11,901 Non-Hispanic white respondents (4,206 males, 7,695 females), and 4,840 Non-Hispanic black respondents (1,452 males, 3,388 females). Hispanics are excluded because of small sample-size (202 respondents). Because it is well-established that racial gaps in obesity vary by gender [1, 21, 31], the decomposition is conducted separately for males and females.
The outcome of interest is BMI. We use a standard set of predictors identified as obesity-risk factors in the literature, including indicators of SES (education, household income, health insurance, marital status, employment status), age, presence of minor children, mental health, fruit and vegetable consumption, and whether respondents meet recommended levels of physical activity (PA). The statistical package Stata (v.12) with the “Oaxaca” add-on module is used, together with the “svy” routine to adjust for survey weighting.
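As a rough illustration of how survey weights enter the point estimates, the sketch below computes weighted means and a weighted least-squares line for a single predictor and then applies the same two-part split. It is an assumption-laden toy, not a reproduction of the Stata workflow: the records, weights, and function names (`weighted_mean`, `weighted_fit`, `weighted_decompose`) are hypothetical, and no design-based variance estimation is attempted.

```python
def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_fit(x, y, w):
    """Weighted least squares of y on x; returns (intercept, slope)."""
    mx, my = weighted_mean(x, w), weighted_mean(y, w)
    sxx = sum(wi * (xi - mx) ** 2 for xi, wi in zip(x, w))
    sxy = sum(wi * (xi - mx) * (yi - my) for xi, yi, wi in zip(x, y, w))
    slope = sxy / sxx
    return my - slope * mx, slope


def weighted_decompose(white, black):
    """Each group is a dict with lists 'x', 'bmi' and 'w' (survey weights)."""
    a_w, b_w = weighted_fit(white["x"], white["bmi"], white["w"])
    a_b, b_b = weighted_fit(black["x"], black["bmi"], black["w"])
    mx_w = weighted_mean(white["x"], white["w"])
    mx_b = weighted_mean(black["x"], black["w"])
    gap = weighted_mean(black["bmi"], black["w"]) - weighted_mean(white["bmi"], white["w"])
    explained = b_b * (mx_b - mx_w)
    unexplained = (a_b - a_w) + (b_b - b_w) * mx_w
    return gap, explained, unexplained


# Toy records with made-up values and weights.
white = {"x": [1.0, 2.0, 4.0, 5.0], "bmi": [29.0, 28.0, 26.5, 25.0], "w": [1.5, 0.8, 1.2, 1.0]}
black = {"x": [0.5, 1.5, 2.5, 3.0], "bmi": [32.0, 31.0, 31.5, 30.0], "w": [2.0, 1.0, 0.7, 1.3]}
gap, explained, unexplained = weighted_decompose(white, black)
print(f"gap={gap:.3f} explained={explained:.3f} unexplained={unexplained:.3f}")
```

A weight of 2 on a record has the same effect on these point estimates as listing that record twice, which is one way to read what the weights do.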
Tables 1 and 2 present results for females and males, respectively. The difference in mean predicted BMI is significant for women (4.07, 95% CI: 3.63-4.52), but not for men (0.35, 95% CI: −0.22-0.92). Henceforth, we primarily focus on discussing the results for females.
Table 1. Decomposition results for females
|Variable^a||Mean (std dev), White (N = 7695)||Mean (std dev), Black (N = 3388)||Coeff, White||Coeff, Black||Contribution to “explained” gap||95% CI lower||95% CI upper|
|Age||50.105 (0.346)||44.397 (0.448)||0.007||0.026*||−0.039||−0.141||0.062|
|Education: some college^b||0.289||0.300||−0.311||−0.666||−0.003||−0.015||0.008|
|Education: college graduate or higher^b||0.291||0.196||−1.188***||−0.174||0.112||0.050||0.174|
|HH Income $25-$50K per year^c||0.225||0.192||−0.275||−1.450***||0.009||−0.018||0.036|
|HH Income $50K < per year^c||0.364||0.163||−1.708***||−2.127***||0.344||0.177||0.511|
|HH Income not reported^c||0.186||0.142||−2.217***||−1.807***||0.096||0.035||0.157|
|Days of poor mental health last month||5.820 (0.267)||6.089 (0.301)||0.016*||0.012||0.004||−0.009||0.018|
|Has health insurance||0.885||0.749||−0.961**||0.067||0.131||0.015||0.246|
|Children <18 years in household||0.751 (0.022)||1.128 (0.038)||−0.205||−0.176||−0.077||−0.180||0.026|
|Eats five or more servings of fruits and vegetables||0.201||0.177||−0.015||−0.348||0.000||−0.013||0.014|
|Meets recommended moderate PA levels||0.259||0.269||−0.168||−0.696*||−0.002||−0.008||0.005|
|Meets recommended vigorous PA levels||0.138||0.132||−0.019||1.239**||0.000||−0.004||0.005|
|Job is physically demanding||0.120||0.152||−0.110||0.493||−0.004||−0.029||0.022|
|Predicted body mass index||27.028 (6.217)||31.063 (7.740)|| || || || || |
|Total explained gap|| || || || ||0.343||0.082||0.604|
|Total unexplained gap|| || || || ||3.732||3.195||4.268|
|Total predicted gap|| || || || ||4.075||3.628||4.522|
Table 2. Decomposition results for males
|Variable^a||Mean (std dev), White (N = 4206)||Mean (std dev), Black (N = 1452)||Coeff, White||Coeff, Black||Contribution to “explained” gap||95% CI lower||95% CI upper|
|Age||48.432 (0.458)||41.923 (0.797)||0.030*||−0.017||−0.112||−0.268||0.044|
|Education: some college^b||0.265||0.282||1.204*||0.738*||−0.012||−0.052||0.028|
|Education: college graduate or higher^b||0.339||0.129||1.005||−0.228||−0.048||−0.186||0.090|
|HH Income $25-$50K per year^c||0.219||0.211||0.356||1.464***||0.011||−0.046||0.067|
|HH Income $50K < per year^c||0.452||0.201||0.489||1.456***||0.365||0.117||0.614|
|HH Income not reported^c||0.155||0.163||−0.754||−0.308||0.003||0.013||0.018|
|Days of poor mental health last month||3.786 (0.252)||4.950 (0.461)||−0.012||0.026**||−0.031||−0.069||0.008|
|Has health insurance||0.865||0.697||−0.032||−0.787||−0.132||−0.330||0.065|
|Children <18 years in household||0.669 (0.032)||0.772 (0.056)||0.545**||−0.159||0.016||0.034||0.066|
|Eats 5 or more servings of fruits and vegetables daily||0.173||0.176||0.175||−0.091||0.000||−0.004||0.004|
|Meets recommended moderate PA levels||0.274||0.226||0.458||0.794**||0.038||0.012||0.089|
|Meets recommended vigorous PA levels||0.145||0.129||−0.073||−0.968**||−0.016||−0.049||0.018|
|Job is physically demanding||0.267||0.221||1.723**||−0.229||−0.011||−0.045||0.023|
|Predicted body mass index||28.278 (5.162)||28.730 (5.730)|| || || || || |
|Total explained gap|| || || || ||0.397||0.096||0.698|
|Total unexplained gap|| || || || ||−0.051||−0.731||0.628|
|Total predicted gap|| || || || ||0.346||−0.227||0.919|
The first columns present sample means by race. Noticeable racial differences exist in levels of the predictor variables. For example, black women are less likely than white women to have a college degree (19.6% vs. 29.1%), a household income at or above $50,000 (16.3% vs. 36.4%), or health insurance (74.9% vs. 88.5%), to be married (34% vs. 66.4%), or to report consuming five servings of fruits and vegetables (17.7% vs. 20.1%).
The next columns present the coefficient estimates from the individual regression models for black and white women, and the average contribution (and 95% CI) of each predictor to the explained gap.
The combined differences in predictors explain only 0.34 units (95% CI: 0.08-0.60) of the 4.07-unit gap in mean predicted BMI among women, which is <10%. The most significant contributors to the explained portion are the black-white differences in college education, higher family income, and access to health insurance. Notably, certain predictor variables like “marital status” actually detract (that is, literally subtract) from the “explained gap.” What this implies is that if black women were married in the same proportion as white women, this would actually increase the mean predicted gap in BMI between races rather than decrease it.
Of the overall BMI gap, 91.6% (3.73 units, 95% CI: 3.19-4.26) is unexplained. In contrast, for men, the overall mean gap of 0.35 units is explained by the combined differences in the predictors (0.39, 95% CI: 0.10-0.70).
There are the usual concerns about the accuracy of secondary, self-reported data. Also, the results may not generalize to the entire US. However, the purpose of the above analysis is largely to illustrate how to apply Oaxaca–Blinder decompositions to obesity disparity research.
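The explained and unexplained shares quoted above can be recomputed from the totals reported in Tables 1 and 2. The short sketch below does only that arithmetic; the dictionary layout and the `shares` helper are hypothetical conveniences.

```python
# Totals copied from the last three rows of Tables 1 and 2.
totals = {
    "women": {"explained": 0.343, "unexplained": 3.732, "gap": 4.075},
    "men": {"explained": 0.397, "unexplained": -0.051, "gap": 0.346},
}


def shares(t):
    """Explained and unexplained parts as percentages of the total gap."""
    return 100 * t["explained"] / t["gap"], 100 * t["unexplained"] / t["gap"]


for group, t in totals.items():
    explained_pct, unexplained_pct = shares(t)
    print(f"{group}: explained {explained_pct:.1f}%, unexplained {unexplained_pct:.1f}%")
# women: explained 8.4%, unexplained 91.6%
# men: explained 114.7%, unexplained -14.7%
```

For men the explained share exceeds 100% because the unexplained component is slightly negative; since the total gap for men is not statistically distinguishable from zero, these percentages should be read with caution.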
We discussed earlier the three potential reasons for the “unexplained” portion of the racial disparity in BMI. The first is omitted variables that may be important predictors of BMI and that may differ across the races. In this situation, examples of such omitted variables include other dietary behaviors such as fast food consumption (which is not included in BRFSS), time spent on sedentary activities, cultural perceptions of feminine beauty [32, 33], cultural attitudes and expectations in relation to food and physical activity, and the stigma and depreciation in quality of life associated with obesity [35-37]. The second reason is some type of systematic racial difference in what a variable actually measures, which can arguably be termed measurement error. For example, the “has insurance” variable could mask differences in the quality of insurance that the average black respondent has versus what the average white respondent has, which in turn could translate into differences in access or quality of medical advice received even with health insurance. Similarly, within a certain category of household income, white respondents may be more concentrated in the upper end of the category, whereas black respondents may be more concentrated in the lower end. Finally, there is the possibility of discrimination. In the gender wage-gap literature, this basically translated into employers' discrimination against women, such that women were given lower returns than men for the same qualifications. In this case, there may be more subtle forms of societal discrimination that contribute to the unexplained portion of the obesity disparity. For example, there is some evidence that black patients on average receive care from primary care physicians with lower qualifications and resources compared to white patients.
This could potentially contribute to black women receiving less useful advice on healthy weight management than their white counterparts from their medical caregivers. A limitation of the Oaxaca–Blinder approach is that, while we can make several conjectures about what causes the unexplained portion of the gap, it offers no further insights into which of these conjectures might be the most plausible. At the same time, one can argue that, by revealing how much of obesity disparities can be explained by the “usual suspects” predictors included in large datasets like the BRFSS, and how much still remains unexplained, this decomposition can help motivate the inclusion of additional survey instruments into such large secondary datasets that are frequently used in obesity research.
The Oaxaca–Blinder approach has some other limitations. One widely cited problem is the “index number problem” (Oaxaca, 1973), where the choice of the reference group may affect the ratio of explained to unexplained portions of the gap. In this example, since the aim of policy-makers is typically to improve conditions for disadvantaged minority groups, it made logical sense to conduct the decomposition under the scenario where blacks had attained the same mean characteristics as whites. However, when there is no clear motivation for selecting the reference group, certain weighting or pooling mechanisms have been proposed. There is also the “indicator variable” problem, where results pertaining to categorical variables in the model may be sensitive to which category is selected as the omitted or base category. This does not affect the explained part of the gap, but can affect how much of the unexplained portion is due to differences in the intercept versus differences in the coefficient estimates. In situations where analyzing these differences is the primary focus, several solutions have been proposed, which are summarized by Jann.
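The index number problem can be seen in a small numeric sketch. The intercepts, slopes, and means below are invented for a one-predictor model, and the `twofold` function and its `coef_group` argument are hypothetical names; the point is only that the same total gap splits differently depending on whose coefficients value the difference in mean predictors.

```python
def twofold(a_w, b_w, mx_w, a_b, b_b, mx_b, coef_group="black"):
    """Return (explained, unexplained) for a one-predictor model.

    coef_group="black": the difference in mean X is valued at the black slope.
    coef_group="white": the difference in mean X is valued at the white slope.
    """
    if coef_group == "black":
        explained = b_b * (mx_b - mx_w)
        unexplained = (a_b - a_w) + (b_b - b_w) * mx_w
    elif coef_group == "white":
        explained = b_w * (mx_b - mx_w)
        unexplained = (a_b - a_w) + (b_b - b_w) * mx_b
    else:
        raise ValueError("coef_group must be 'black' or 'white'")
    return explained, unexplained


# Made-up intercepts (a), slopes (b) and mean predictor levels (mx).
params = dict(a_w=30.0, b_w=-1.0, mx_w=3.0, a_b=32.0, b_b=-0.5, mx_b=2.0)
for group in ("black", "white"):
    explained, unexplained = twofold(**params, coef_group=group)
    print(group, explained, unexplained, explained + unexplained)
# black 0.5 3.5 4.0
# white 1.0 3.0 4.0
```

Both choices reproduce the same 4.0-unit gap, but the explained share doubles from 12.5% to 25% in this invented case, which is why the reference choice should be stated and motivated.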
We have provided the simplest mathematical exposition of the Oaxaca–Blinder method—which is the linear decomposition. The Oaxaca–Blinder decomposition has also been extended to binary outcome models, count data models and multilevel models. An excellent review is provided by Fortin et al. . Thus, this approach can be easily extended to models in obesity research where the outcome is binary—for example, obese versus not. The method has the potential to generate much useful information on obesity disparities which can provide useful guidance for policy-makers, hence we advocate for its greater utilization in obesity research.
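For a binary outcome such as obese versus not, one common construction replaces products of means and coefficients with averages of predicted probabilities. The sketch below illustrates that idea for a logit model under stated assumptions: it is not taken from this paper, the coefficients are hypothetical values treated as if they had already been estimated separately for each group, the predictors are synthetic, and the function names are invented.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_prob(X, beta):
    """Average predicted probability; beta = [b0, b1, ..., bK]."""
    probs = [logistic(beta[0] + sum(b * x for b, x in zip(beta[1:], row))) for row in X]
    return sum(probs) / len(probs)


def decompose_binary(X_w, beta_w, X_b, beta_b):
    """Split the gap in mean predicted probability (black minus white)."""
    p_b = mean_prob(X_b, beta_b)
    p_w = mean_prob(X_w, beta_w)
    # Counterfactual: white characteristics combined with black coefficients.
    p_cf = mean_prob(X_w, beta_b)
    explained = p_b - p_cf    # due to differences in characteristics
    unexplained = p_cf - p_w  # due to differences in coefficients
    return p_b - p_w, explained, unexplained


# Synthetic predictors and hypothetical coefficients (not estimates from any data).
rng = random.Random(0)
X_w = [[rng.gauss(1.0, 1.0), rng.gauss(0.5, 1.0)] for _ in range(500)]
X_b = [[rng.gauss(0.6, 1.0), rng.gauss(0.4, 1.0)] for _ in range(400)]
beta_w = [-0.4, -0.6, -0.3]
beta_b = [0.3, -0.2, -0.1]
gap, explained, unexplained = decompose_binary(X_w, beta_w, X_b, beta_b)
print(f"gap={gap:.3f} explained={explained:.3f} unexplained={unexplained:.3f}")
```

As in the linear case, the two parts sum to the total gap by construction; unlike the linear case, attributing the explained part to individual predictors requires additional choices that are covered in the review literature.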
- 4. Department of Health and Human Services. The 2001 Report on Overweight and Obesity. 2001.
- 6. F as in Fat: How Obesity Threatens America's Future: 2010. Issue Report. Trust for America's Health; 2010. http://healthyamericans.org/reports/obesity2010/Obesity2010Report.pdf. Accessed April 2, 2014.
- 8. Centers for Disease Control and Prevention. Higher education and income levels keys to better health, according to annual report on nation's health. 2012; http://www.cdc.gov/media/releases/2012/p0516_higher_education.html. Accessed 3/25/2014.
- 11. Trust for America's Health & Robert Wood Johnson Foundation. F as in Fat: How obesity threatens America's future. 2013. Available at: http://www.fasinfat.org/. Accessed 3/25/2014.
- 12. Comment on “using the Peters–Belson method in equal employment opportunity personnel evaluations” by Sinclair and Pan. Law Probability Risk 2009;8:119–122.
- 13. Male–female wage differentials in urban labor markets. Int Econom Rev 1973;14:693.
- 21. Racial disparities in obesity for males and females in three southern states in the US, across SES categories. Health 2012;04:1434–1441.
- 22. Analyzing the gender gap in the salary of health administration faculty. J Health Admin Educ 2012;29:303–317.
- 25. Explaining differences between groups: Oaxaca decomposition. Analysing health equity using household survey data. Inst Learn Resourc Ser 2008;147–157.
- 27. The Relationship Between Perceptions of Neighborhood Characteristics and Obesity Among Children: Economic Aspects of Obesity. University of Chicago Press; Chicago. 2011. p 145–180.
- 28. The Blinder-Oaxaca decomposition for linear regression models. Stata J 2008;8:453–479.
- 29. Centers for Disease Control and Prevention. Behavioral Risk Factor Surveillance System. 2008.
- 32. Black women heavier and happier with their bodies than white women, poll finds. The Washington Post. March 21, 2012; Lifestyle.
- 33. Why black women are fat. The New York Times. May 5, 2012; Opinion/Sunday Review.