Although BMI is the most widely used measure of obesity, debate still exists on how accurately BMI defines obesity. In this study, adiposity status defined by BMI and dual-energy X-ray absorptiometry (DXA) was compared in a large population to evaluate the accuracy of BMI. A total of 1,691 adult volunteers from Newfoundland and Labrador participated in the study. BMI and body fat percentage (%BF) were measured for all subjects following a 12-h fasting period. Subjects were categorized as underweight (UW), normal weight (NW), overweight (OW), or obese (OB) based on BMI and %BF criteria. Differences between the two methods were compared within gender and by age-groups. According to BMI criteria, 1.2% of women were classified as UW, 44.2% as NW, 34.2% as OW, and 20.3% as OB. When women were classified according to %BF criteria, 2.2% were UW, 29.6% were NW, 30.9% were OW, and 37.1% were OB. The overall discrepancy between the two methods for women was substantial at 34.7% (14.6% for NW and 16.8% for OB, P < 0.001). In men, the overall discrepancy was 35.2% between BMI and DXA (17.6% for OW and 13.5% for OB, P < 0.001). Misclassification by BMI was dependent on age, gender, and adiposity status. In conclusion, BMI misclassified adiposity status in approximately one-third of women and men compared with DXA. Caution should be taken when BMI is used in clinical and scientific research as well as clinical practice.
The prevalence of obesity has increased substantially over the past three decades, and obesity is now one of the most important public health concerns, affecting over 300 million people worldwide (1). Chronic health problems associated with obesity are numerous and include type 2 diabetes, heart disease, hypertension, and certain types of cancer (2). As the prevalence of obesity increases, accurate measurement of adiposity is becoming increasingly important to allow for appropriate diagnosis and treatment. BMI has been the dominant index used to measure obesity owing to its simplicity and low cost; however, it has recently come under criticism, as it fails to account for a number of adiposity-related factors including age, gender, and ethnicity. Reference methods such as dual-energy X-ray absorptiometry (DXA), air-displacement plethysmography, and underwater weighing provide a more accurate indication of body fat percentage (%BF; refs. 3,4,5,6), which is one of the fundamental links between obesity and its associated disease risk.
The use of BMI for the classification of adiposity status and disease risk is based on epidemiological associations of BMI with morbidity and mortality (7,8). Despite this, numerous studies have shown that BMI has limited ability to predict body composition, as evidenced by sizable differences between BMI-estimated BF and densitometrically determined BF (9,10,11,12). Furthermore, the relationship between BMI and %BF has been shown to vary with age, sex, and ethnicity (13,14,15). It is therefore essential to identify how well BMI criteria match more accurate reference methods based on %BF and to what extent major factors such as age, gender, and adiposity distort the accuracy of BMI. In the current study, we investigated differences between BMI-determined and DXA-determined adiposity status to evaluate the accuracy of BMI. At present, few systematic data are available in Canada regarding the accuracy of BMI compared to a standard reference method such as DXA at the population level. The objectives of our study were as follows: (i) to determine the accuracy of BMI classifications compared to %BF classifications measured by DXA; (ii) to determine whether discrepancies between BMI and DXA are gender and age specific; and (iii) to identify whether an individual's current adiposity status (i.e., overweight (OW) or obese (OB)) affects the size of the error in BMI-based classification.
Methods and Procedures
Subjects (n = 1,712) were recruited from an ongoing large-scale nutritional genetics study of human complex diseases called the CODING (Complex Diseases in the Newfoundland population: Environment and Genetics) study (16,17,18). As BMI and %BF criteria are specific to individuals ≥20 years old, we excluded all participants below this age limit (21 individuals) leaving us with a cohort of 1,691 subjects (1,321 females, 370 males). All volunteers were from the Canadian province of Newfoundland and Labrador. Each individual completed a screening questionnaire that included information regarding physical characteristics, dietary habits, and physical activity levels. Inclusion criteria in the present study were as follows: (i) between the ages of 20 and 79 years old; (ii) at least third-generation Newfoundlander; (iii) healthy, without any serious metabolic, cardiovascular, or endocrine disease. All subjects provided written and informed consent, and the Human Investigation Committee of the Faculty of Medicine, Memorial University of Newfoundland approved the study.
Measurements of BMI and %BF
All measurements were performed following a 12-h fasting period. Subjects were weighed to the nearest 0.1 kg in standardized light clothes and without shoes on a platform manual scale balance (Health O Meter, Bridgeview, IL) as previously described (16,17,18). Height was measured to the nearest 0.1 cm using a fixed stadiometer. BMI was calculated as weight in kilograms divided by the square of height in meters. Waist and hip circumference were measured to the nearest 0.1 cm using a flexible metric measuring tape while the participant was in a standing position. Waist circumference was measured horizontally around the abdomen at the level of the umbilicus, and hip circumference was measured as the largest circumference between the waist and thighs. Waist-to-hip ratio was calculated as waist circumference divided by hip circumference.
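The two derived indices described above reduce to simple arithmetic. The following minimal Python sketch illustrates them; the function names and the example measurements are hypothetical and are not taken from the study data.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """BMI = weight (kg) divided by height (m) squared.

    Height is taken in centimeters because that is the unit in which
    it was recorded, and is converted to meters here.
    """
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def waist_to_hip_ratio(waist_cm: float, hip_cm: float) -> float:
    """Waist circumference divided by hip circumference (same units)."""
    return waist_cm / hip_cm


# Hypothetical subject, for illustration only.
example_bmi = bmi(weight_kg=70.0, height_cm=175.0)
example_whr = waist_to_hip_ratio(waist_cm=80.0, hip_cm=100.0)
print(round(example_bmi, 1), round(example_whr, 2))
```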
Whole body composition, including fat mass, lean body mass, and bone mineral densities, was measured using DXA Lunar Prodigy (GE Medical Systems, Madison, WI). DXA is a relatively new reference method for determining body composition that measures all adipose tissue within the body with a low margin of error. For this reason, DXA is considered to be one of the most accurate measurements of adiposity and is commonly used as the standard against which less accurate field methods such as BMI are compared. Measurements were performed with subjects lying in a supine position after the removal of all metal accessories, as previously described (16,17,18). %BF was determined as the ratio of fat mass to total body mass (including bone mineral densities) using the manufacturer's software (version 4.0). Quality assurance was performed on our DXA scanner daily and the typical CV was 1.3% during the study period.
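As a rough illustration of the two quantities mentioned above, the sketch below computes %BF as fat mass over total body mass and a coefficient of variation (CV) from repeated readings. It assumes that CV means the sample standard deviation divided by the mean; the function names and all numbers are hypothetical and do not come from the scanner software or the study.

```python
from statistics import mean, stdev


def percent_body_fat(fat_mass_kg: float, total_mass_kg: float) -> float:
    """%BF as fat mass over total body mass, expressed as a percentage."""
    return 100.0 * fat_mass_kg / total_mass_kg


def coefficient_of_variation(readings) -> float:
    """CV (%) = sample s.d. / mean * 100 (assumed definition)."""
    return 100.0 * stdev(readings) / mean(readings)


# Hypothetical values, for illustration only.
pbf = percent_body_fat(fat_mass_kg=21.0, total_mass_kg=70.0)
cv = coefficient_of_variation([30.0, 30.4, 29.6])
print(round(pbf, 1), round(cv, 2))
```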
All data are reported as mean ± s.d. Prior to performing any statistical analyses, subjects were classified according to adiposity status using both BMI and %BF criteria. Subjects were classified using BMI as underweight (UW; <18.5 kg/m2), normal weight (NW; 18.5–24.9 kg/m2), OW (25.0–29.9 kg/m2), or OB (≥30.0 kg/m2) according to criteria from the World Health Organization (19). Subjects were grouped according to %BF based on criteria recommended by Bray that are both age and gender specific (Table 1; ref. 20). Differences in physical characteristics between men and women were assessed using Student's t-test. Differences in adiposity classification between BMI and DXA were analyzed on the following three levels:
Table 1. Percentage body fat (%BF) cutoff points for women and men
1. Discrepancy analyses between BMI and DXA within gender: Men and women were separated into adiposity classifications according to BMI and %BF criteria. The number of subjects grouped into each adiposity category by both methods was calculated as a percentage of the total number of participants. Differences in percentages between BMI- and %BF-defined adiposity status were analyzed within gender using χ2 analyses.
2. Discrepancy analysis by age-group: BMI-defined adiposity classifications were compared to %BF criteria among different age-groups to investigate the effect of age on BMI accuracy. Women were separated into four groups according to their age (20–29.9, 30–39.9, 40–49.9, and 50+ years) and the analysis was repeated as above. A similar analysis could not be performed in men because the number in each cell (four age-groups by four weight groups) was too small for effective comparison.
3. Ranges of %BF based on BMI cutoffs: In order to study the range of %BF found in each BMI category, subjects were grouped by BMI into adiposity groups and then %BF averages for each BMI group were calculated along with minimum and maximum values.
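The three levels of analysis listed above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' SPSS procedure. The BMI cutoffs follow the World Health Organization categories cited in the text, with a BMI of exactly 30.0 kg/m2 assigned to OB. The %BF bounds in `PLACEHOLDER_BOUNDS` are hypothetical stand-ins, because the Table 1 values are not reproduced in the text; the subject data and all function names are likewise invented for illustration.

```python
from collections import Counter
from statistics import mean

CATEGORIES = ("UW", "NW", "OW", "OB")


def bmi_category(bmi: float) -> str:
    """WHO BMI categories (kg/m2); continuous values use half-open bins."""
    if bmi < 18.5:
        return "UW"
    if bmi < 25.0:
        return "NW"
    if bmi < 30.0:
        return "OW"
    return "OB"


def bf_category(pbf: float, lower_bounds) -> str:
    """Classify %BF from (NW, OW, OB) lower bounds for one age/sex stratum.

    The real bounds belong to Table 1 and must be supplied by the caller.
    """
    nw_lo, ow_lo, ob_lo = lower_bounds
    if pbf < nw_lo:
        return "UW"
    if pbf < ow_lo:
        return "NW"
    if pbf < ob_lo:
        return "OW"
    return "OB"


def age_group(age: float) -> str:
    """Age-groups used for women; assumes age >= 20."""
    if age < 30:
        return "20-29.9"
    if age < 40:
        return "30-39.9"
    if age < 50:
        return "40-49.9"
    return "50+"


def category_percentages(labels) -> dict:
    """Share of subjects in each category, as a percentage of all subjects."""
    counts = Counter(labels)
    return {c: 100.0 * counts[c] / len(labels) for c in CATEGORIES}


def bf_range_by_bmi(subjects) -> dict:
    """Mean, minimum, and maximum %BF within each BMI category."""
    groups = {}
    for bmi, pbf in subjects:
        groups.setdefault(bmi_category(bmi), []).append(pbf)
    return {c: (mean(v), min(v), max(v)) for c, v in groups.items()}


# Hypothetical bounds and (BMI, %BF) pairs, for illustration only.
PLACEHOLDER_BOUNDS = (20.0, 33.0, 39.0)
subjects = [(22.0, 35.0), (24.0, 28.0), (27.0, 41.0), (31.0, 45.0)]

bmi_labels = [bmi_category(b) for b, _ in subjects]
bf_labels = [bf_category(p, PLACEHOLDER_BOUNDS) for _, p in subjects]
bmi_pct = category_percentages(bmi_labels)
bf_pct = category_percentages(bf_labels)
# Positive values: BMI places more subjects in the category than %BF does.
discrepancy = {c: bmi_pct[c] - bf_pct[c] for c in CATEGORIES}
print(discrepancy)
print(bf_range_by_bmi(subjects))
```

In a real analysis the bounds passed to `bf_category` would be looked up per subject from the age- and gender-specific criteria, and the percentages would be computed separately within each gender and age-group.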
SPSS version 16.0 (SPSS, Chicago, IL) was used for all analyses. Statistical analyses were two-sided and a P value <0.05 was considered to be statistically significant.
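For readers who want to see what a χ2 comparison of category counts involves, the following sketch computes a Pearson χ2 statistic in plain Python. The text does not specify the exact layout of the test run in SPSS, so the 2 × 4 contingency table (method by adiposity category) is an assumption, and the counts are toy values rather than study data. The critical value 7.815 is the standard χ2 threshold for 3 degrees of freedom at α = 0.05.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


CRITICAL_0_05_DF3 = 7.815  # chi-square critical value for df = 3, alpha = 0.05

# Toy counts (UW, NW, OW, OB) for two methods; hypothetical, not study data.
toy_table = [
    [10, 40, 30, 20],
    [10, 30, 30, 30],
]
stat, dof = chi_square_statistic(toy_table)
significant = stat > CRITICAL_0_05_DF3
print(round(stat, 3), dof, significant)
```

Because both classifications come from the same subjects, a paired method may be more appropriate in practice; the sketch only illustrates the arithmetic of the statistic.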
Physical characteristics of the subjects
Physical characteristics for female and male participants are shown in Table 2. The subjects' ages ranged from 20 to 76.8 years. Men were on average 3.0 years younger than women. Men were also 15.5 kg heavier and 13.3 cm taller than women and had BMI values 1.2 units higher, consistent with averages seen in similar studies (21). Although BMI values were higher in men, women had higher %BF and trunk fat percentage.
Table 2. Physical characteristics of female and male subjects (n = 1,691)
General discrepancy analyses by gender
Significant discrepancies between BMI and %BF criteria were identified in both women and men. Of the 1,321 women included in our study, BMI classified 44.2% as NW whereas DXA classified only 29.6% as NW (Figure 1). Among OB women, there was again a large discrepancy between the two methods. According to BMI criteria, 20.3% of women in our cohort were OB; however, according to %BF criteria, 37.1% of women were OB. As a result, BMI classified 14.6% more women as NW and 16.8% fewer women as OB compared to %BF criteria determined by DXA (P < 0.001). Classification of UW and OW women was similar between the two methods (UW: BMI 1.2%, DXA 2.2%; OW: BMI 34.2%, DXA 30.9%). A total discrepancy of 34.7% was found between the two methods in women.
Of the 370 men included in this study, BMI and %BF classifications were similar for NW individuals (BMI 28.9%, DXA 31.6%; Figure 2). Among OW and OB men, significant differences were evident in adiposity classification between the two methods. BMI categorized 45.7% of men as OW and 24.9% as OB whereas DXA classified only 28.1% of men as OW and 38.4% as OB. BMI classified 17.6% more men as OW and 13.5% fewer men as OB compared to %BF criteria based on DXA measurements (P < 0.001). A total discrepancy of 35.2% was found between the two methods in men.
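The per-category differences for men can be checked with a short calculation. The sketch below makes two assumptions that the text does not state: the UW share for each method is the remainder to 100%, and the total discrepancy is the sum of absolute per-category differences. Under these assumptions the percentages reported for men add up to 35.2.

```python
# Percentages of men per category as reported in the text.
bmi_men = {"NW": 28.9, "OW": 45.7, "OB": 24.9}
dxa_men = {"NW": 31.6, "OW": 28.1, "OB": 38.4}

# Assumption: the UW share is whatever remains to reach 100%.
bmi_men["UW"] = 100.0 - sum(bmi_men.values())
dxa_men["UW"] = 100.0 - sum(dxa_men.values())

# Signed difference per category (BMI minus DXA), in percentage points.
diff_men = {c: round(bmi_men[c] - dxa_men[c], 1) for c in bmi_men}

# Assumption: total discrepancy = sum of absolute category differences.
total_men = round(sum(abs(d) for d in diff_men.values()), 1)
print(diff_men, total_men)
```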
Discrepancy analyses by age-group
After separation of the female cohort into groups based on age, similar discrepancies were evident between BMI- and %BF-defined adiposity status across all four age-groups (Table 3). There was a significant discrepancy between the two methods for NW and OB women across all age-groups (P < 0.001). The discrepancy between BMI and DXA-determined %BF ranged from 11.5 to 18.9% for NW women and 13.3 to 22.5% for OB women. Women in their 20s demonstrated the largest discrepancy between BMI and %BF for the NW group and women in their 30s had the largest discrepancy in the OB group. Women in their 40s demonstrated the smallest discrepancy between the two methods among the four age-groups. The discrepancies found in the female cohort for UW and OW BMI classifications compared to %BF were not significant in any age-group.
Table 3. Percent discrepancies between BMI and DXA weight classifications in women according to age (n = 1,321)
Error range in classification by BMI
Figure 3 shows the variation in %BF according to BMI categories for men and women. A large range of error indexed by %BF was found in each BMI category for both genders. A total of 251 OB women (determined by DXA) were misclassified as either NW (n = 42) or OW (n = 209) by BMI criteria. There was a wide range in %BF for BMI-defined NW and OW women (4.6–51.1% and 14.8–51.8%, respectively). OW women (DXA) were also misclassified as UW and NW according to BMI criteria. NW and UW women (DXA) were misclassified as OW and NW, respectively. This suggests that BMI misclassifies female subjects across all four adiposity classifications. The data among men were similar. A total of 73 OB men (determined by DXA) were misclassified as NW (n = 7) or OW (n = 66) according to BMI criteria. The range in %BF for BMI-defined NW and OW men was 5.6–31.2% and 10.8–41.3%, respectively. Although the misclassifications were bidirectional, BMI tended to under-classify the majority of subjects.
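The misclassification counts above can be related to the number of DXA-defined OB subjects with a rough back-calculation. The sketch below estimates the number of DXA-defined OB women and men from the rounded percentages reported earlier (37.1% of 1,321 women and 38.4% of 370 men), so the resulting fractions are approximations and are not figures reported by the study. The function name is hypothetical.

```python
def missed_fraction(missed_count: int, ob_share_pct: float, n_subjects: int) -> float:
    """Approximate share of DXA-defined OB subjects not labeled OB by BMI.

    The OB denominator is estimated from a rounded percentage, so the
    result is approximate.
    """
    ob_by_dxa = ob_share_pct / 100.0 * n_subjects
    return missed_count / ob_by_dxa


# Counts from the text: OB by DXA but classified as NW or OW by BMI.
missed_women = 42 + 209  # 251
missed_men = 7 + 66      # 73

frac_women = missed_fraction(missed_women, 37.1, 1321)
frac_men = missed_fraction(missed_men, 38.4, 370)
print(round(frac_women, 2), round(frac_men, 2))
```

On these approximate figures, roughly half of the DXA-defined OB subjects in each gender were not classified as OB by BMI.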
Our study, involving a large sample from the Newfoundland population, demonstrates the limited ability of BMI to accurately estimate adiposity. One of the major findings in the present study is that there is a large discrepancy between BMI- and DXA-defined adiposity status that is both gender and age specific. Over one-third of women and men were misclassified by BMI criteria compared to %BF criteria determined by DXA. A significant proportion of OB individuals were misclassified as either NW or OW by BMI criteria. This has serious health consequences at the population level, as the opportunity to intervene and reduce health risk in these individuals is lost. Overall, BMI had the poorest ability to predict true adiposity in NW and OB women, and in OW and OB men. Furthermore, this misclassification was influenced by age, with younger women (under 40 years old) demonstrating the largest discrepancy between the two methods. We also found significant intersubject variability in %BF for any given BMI value.
The ability of BMI to define adiposity status has been repeatedly questioned. It has previously been shown that BMI is not accurate at predicting adiposity status in the normal to mildly OB range (22,23,24) as well as in severely OB individuals (25). In particular, BMI was not accurate at predicting obesity in individuals with a body mass <80 kg compared to %BF determined by DXA (23). Similarly, a significant number of people with a BMI below 30 kg/m2 were actually OB when classified by %BF determined by bioelectric impedance analysis (22). A more recent study, involving a large multiethnic sample from the US population, found BMI to have limited diagnostic performance, especially in those with a BMI <30 kg/m2 (24). Despite BMI-defined obesity having good specificity when compared to bioelectric impedance analysis–defined obesity, BMI had low sensitivity, missing nearly half of %BF-determined OB people (24). These findings suggest that BMI may not be accurate at assessing adiposity status in NW and OW individuals. Our study included all ranges of BMI and %BF (16.0–54.3 kg/m2 and 4.6–59.9%, respectively) and revealed a higher discrepancy for each of these adiposity categories.
We also observed gender differences in the discrepancy between the two methods. Although there was good agreement between BMI and DXA for OW women, BMI had limited ability to predict the correct adiposity classification for NW and OB women. In men, however, the greatest discrepancy was evident in the OW and OB groups. BMI correlates better with lean mass than with %BF in men but not in women (24), which may explain why there was a greater discrepancy between BMI- and %BF-defined adiposity status in OW men but not in OW women. Furthermore, males demonstrate a linear relationship between BMI and %BF whereas females demonstrate a curvilinear relationship (25), which may explain why we observed a high discrepancy between BMI- and DXA-defined adiposity status in NW women but not in NW men. Gender differences in body composition are a profound physiological phenomenon; however, standard World Health Organization BMI criteria do not account for this. Our results suggest that this problem needs addressing. Adjusting obesity criteria to account for gender differences would likely improve the accuracy of BMI in predicting adiposity in both males and females.
We also analyzed our data after stratifying females according to age-groups. The largest discrepancy between BMI and DXA weight classifications was evident in women under the age of 40 whereas there was moderate agreement between the two methods in older women. These results are surprising, as previous studies have found that the diagnostic performance of BMI diminishes as age increases (24), likely due to an increase in the ratio of fat mass to fat-free mass that is evident with age (26). Further studies are warranted to address the potential mechanism surrounding this phenomenon. Obesity criteria based on %BF are age specific; however, BMI criteria are identical across all age-groups. From our results, it is apparent that BMI cannot accurately reflect age-related changes in adiposity.
Our analysis was originally performed using %BF criteria from earlier publications by Dr. Bray. Bray's original obesity criteria (defined as BF >25% in men and BF >33% in women) lacked any adjustment for age or ethnicity (27). Using these criteria, we found that ∼72% of OB females and 54% of OB males were misclassified as NW or OW according to BMI criteria. Our current results indicate that the new Bray BF classifications (Table 1; ref. 20) are a better fit to BMI criteria; however, a significant margin of error still remains between the two methods. It is evident that age-, gender-, and ethnicity-specific criteria are necessary for more accurate BMI calculations that reflect %BF.
The findings from our study highlight the importance of exercising caution when defining adiposity status using BMI criteria. Although previous studies have demonstrated similar trends, most have small sample sizes (9,28,29) or have used less accurate methods to estimate %BF such as bioelectric impedance analysis or skin fold thickness (22,24,30). To the best of our knowledge, this is the first study of its kind to demonstrate a discrepancy between BMI- and DXA-defined adiposity in a large cohort containing both men and women of all different age-groups. Nevertheless, our study is not without limitations. Other methods to measure adiposity, such as bioelectric impedance analysis, are cheaper and easier to use, despite their reported limitations. Although DXA is considered to be one of the more accurate measurements for %BF, it is not without its own limitations. Lean tissue determined by DXA contains water as its dominant component; therefore, differences in hydration can affect the calculation of BF and, hence, may have also contributed to the discrepancy between BMI and DXA adiposity measurements. All subjects fasted for 12 h prior to having a DXA scan performed to control for differences in hydration; therefore, this should not have any significant effect on our results. Our study was also limited in the number of male participants and ethnic groups. Future studies investigating the discrepancy between BMI- and DXA-defined adiposity are warranted in a larger male cohort and in other populations.
In summary, we compared BMI adiposity classifications to DXA-determined adiposity classifications based on %BF in 1,691 adult Newfoundlanders. BMI misclassified 34.7% of women and 35.2% of men into an incorrect adiposity category. The misclassification was gender specific. BMI misclassifications were also influenced by age, with the largest discrepancy observed in women under 40 years old. Our findings support previous research and demonstrate the necessity to revise current BMI criteria to include such confounding factors as age, gender, and ethnicity (25,31,32). Further research is needed to help alleviate these problems so that BMI can continue to be used in everyday health appraisals. Using the current BMI criteria can be dangerous, as it may misdiagnose OB individuals as NW and result in missed opportunities to intervene and reduce disease risk. For these reasons, we recommend that caution should be taken when BMI is used in clinical and scientific research as well as in clinical practice.
We thank all volunteers who participated in the present study. We would also like to recognize the following members who contributed to data collection: Dax Rumsey, James Thorburn, Amber Snow, Aihua Ma, Sandra Cooke, Christiane Dawe, Lesley Johnson, Curtis French, Sammy Khalili, Jessica Bishop, and Hong-Wei Zhang. G.S. holds the position of chair of pediatric genetics, which is supported by Novartis Pharmaceuticals. This study is supported in part by the Canada Foundation for Innovation (CFI), the Canadian Institutes of Health Research (operating grant: OOP-77984 to G.S.), and the Newfoundland and Labrador Centre for Applied Health Research (NLCAHR).