Professor in Family Medicine, Lund University, Department of Clinical Sciences, Malmo, Community Medicine, Entrance 59, Malmö University Hospital, S-205 02 Malmo, Sweden. E-mail: email@example.com
Objective: To validate self-reported information on weight and height in an adult population and to find a useful algorithm to assess the prevalence of obesity based on self-reported information.
Research Methods and Procedures: This was a cross-sectional survey consisting of 1703 participants (860 men and 843 women, 30 to 75 years old) conducted in the community of Vara, Sweden, from 2001 to 2003. Self-reported weight, height, and corresponding BMI were compared with measured data. Obesity was defined as measured BMI ≥ 30 kg/m2. Information on education, self-rated health, smoking habits, and physical activity during leisure time was collected by a self-administered questionnaire.
Results: Mean differences between measured and self-reported weight were 1.6 kg (95% confidence interval, 1.4; 1.8) in men and 1.8 kg (1.6; 2.0) in women (measured higher), whereas corresponding differences in height were −0.3 cm (−0.5; −0.2) in men and −0.4 cm (−0.5; −0.2) in women (measured lower). Age and body size were important factors for misreporting height, weight, and BMI in both men and women. Obesity (measured) was found in 156 men (19%) and 184 women (25%) and with self-reported data in 114 men (14%) and 153 women (20%). With self-reported data, the sensitivity for identifying obesity was 70% in men and 82% in women; with an algorithm based on self-reported BMI and age, it increased to 81% and 90%, whereas the specificity decreased from 99% in both sexes to 97% in men and 98% in women.
Discussion: The prevalence of obesity based on self-reported BMI can be estimated more accurately when using an algorithm adjusted for variables that are predictive for misreporting.
Obesity is a major public health issue, and it is a risk factor for cardiovascular diseases, type 2 diabetes, and some types of cancer (1, 2, 3, 4). Therefore, innovative public health strategies for prevention and medical treatment of obesity are of great concern (5, 6). BMI is the most commonly used measure to identify and grade overweight, and a BMI ≥30 kg/m2 is considered obese (7). The accuracy of using BMI as measurement of body composition is debated as BMI is considered to predict body fat inadequately (8, 9, 10). The limitation of using BMI is that standing body height and body weight are used, but standing height is influenced by the length of legs and weight reflects both muscles and fat (9). However, BMI is still a simple and cost-effective assessment to describe obesity and risk factors in a population (11).
Both measured and self-reported weight and height are used for monitoring the prevalence and trends of obesity in populations. Bias in self-reported information may depend on demographic, cultural, social, and health characteristics of a particular population at a particular time (12, 13, 14, 15). Knowledge of what factors influence the reporting error, and their respective strengths, is important for planning and interpreting epidemiological studies based on self-reported height and weight. An underestimation of weight and an overestimation of height will underestimate BMI and, consequently, the prevalence of obesity; thus, important arguments for prevention and intervention would be missed (15, 16, 17). The aim of this paper was to validate self-reported information on body weight and height in an adult population and to find a useful simple algorithm to assess the prevalence of obesity based on self-reported information.
Research Methods and Procedures
Skaraborg County, located in the southwest of Sweden, had ∼270 000 inhabitants in 2000 and consists of several small municipalities. The vast majority of the residents in Skaraborg are white, and only a few are immigrants. The Skaraborg Project is a comprehensive study comprising cross-sectional health surveys of the population in Skaraborg conducted every 5th year since 1977 (18).
In 2001 to 2003, a survey included 1811 inhabitants (904 men and 907 women) between 30 and 75 years of age from the community of Vara in Skaraborg. Participants, who were randomly selected from the population in strata by sex and age, were invited to participate in a health survey with two visits. During the first visit, blood specimens were obtained, and participants were asked to answer a questionnaire that covered information about socioeconomic factors, smoking habits, leisure time physical activity (LTPA), self-rated health (SRH), and self-reported weight and height.
The second visit 2 weeks later included a physical examination with anthropometric measurements. Participants without information on self-reported height and weight (one or both) were excluded in this study (n = 102) as well as pregnant women (n = 6). Thus, 1703 individuals remained and are analyzed in this study, 860 men and 843 women. The regional ethical committee at the University in Göteborg approved the protocol of The Skaraborg Project, and consent was obtained from all participants.
Assessment of Variables
Age was used both as a continuous variable and as a categorical variable (30 to 39, 40 to 49, 50 to 59, 60 to 69, ≥70 years). Length of education in years was used as an indicator for socioeconomic status in this study. The length of education in years for the participants was categorized into one of three groups: short (≤9 years, primary school level), medium (10 to 13 years, completed upper secondary school), and long (≥14 years, trained at college or university). Participants who reported having short and medium length of education were categorized as having a low education, whereas those with long were considered having a high education (19). LTPA was described and categorized as follows: level 1, reading or sedentary activity; level 2, light ordinary physical activity, such as gardening, bicycling, or walking to work, during at least 4 h/wk; level 3, regular sports, such as running, swimming, or tennis, for at least 2 h/wk; and level 4, regular hard training or participation in competitive sports a couple of times per week. Subjects who participated in regular physical activities at least 2 h/wk were considered to be physically active during leisure time (levels 3 and 4), and those reporting less were considered to have a low physical activity (levels 1 and 2) (20). Participants who smoked daily were considered as smokers, whereas never smokers and previous smokers were categorized as non-smokers. The participants graded general health (SRH) by choosing one of five alternatives: excellent, good, fair, poor, or very poor (21). This variable was dichotomized: those who reported their health as excellent or good were considered to rate their health as good, and all others were considered to rate their health as poor. Self-reported weight was recorded in kilograms and self-reported height in centimeters (7).
Measured weight (to the nearest 0.1 kg) and measured standing height (to the nearest centimeter) were assessed in light indoor clothes and no shoes. All participants were seen by one of the two nurses who conducted all study visits and made all measurements. The same digital scale for weight (calibrated against 100 kg; Tanita BVB800; Tanita Corp., Arlington Heights, IL) and the same stadiometer for height (Hultafors; Hultafors, Sweden) were used at the health center. BMI was calculated by the formula: BMI = weight (kilograms)/height² (meters squared) (7). BMI was categorized into subgroups according to World Health Organization categories (underweight, BMI < 18.5 kg/m2; normal weight, 18.5 ≤ BMI < 25.0 kg/m2; overweight, 25.0 ≤ BMI < 30.0 kg/m2; and obesity, BMI ≥ 30.0 kg/m2) (7). Subjects were further dichotomized into those who were non-obese (BMI < 30.0 kg/m2) and obese (BMI ≥ 30.0 kg/m2).
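The BMI formula and the World Health Organization cut-offs described above can be sketched in a few lines of Python. This is an illustration only: the function names and the example participant are hypothetical and do not come from the study, and the lower bound of the normal-weight category is assumed to be inclusive.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Return BMI in kg/m2 from weight in kilograms and height in centimeters."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def who_category(bmi_value: float) -> str:
    """Classify a BMI value with the WHO cut-offs quoted in the text."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal weight"
    if bmi_value < 30.0:
        return "overweight"
    return "obesity"


def is_obese(bmi_value: float) -> bool:
    """Dichotomize into obese (BMI >= 30.0 kg/m2) and non-obese."""
    return bmi_value >= 30.0


# Hypothetical participant (not study data): 95.0 kg and 176 cm.
example_bmi = bmi(95.0, 176.0)
print(round(example_bmi, 1), who_category(example_bmi), is_obese(example_bmi))
```

The same functions can be applied to measured and to self-reported values, which makes the two classifications directly comparable.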
The statistical analyses were performed using SPSS/PC 12.0 (SPSS, Inc., Chicago, IL) (22). Men and women were examined separately and all analyses were adjusted for differences in age. All tests were two sided, and statistical significance was assumed at p < 0.05. Standard methods were used for descriptive statistics. For proportions direct standardization was used with the whole Vara population from 2002 as the standard population (23). Differences in means of weight, height, and BMI across categories of education, LTPA, smoking, and SRH were analyzed with linear regression with age as a covariate.
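Direct standardization, as mentioned above, weights each age-specific proportion by that age group's share of a standard population. The sketch below shows the arithmetic in Python; the age-specific prevalences and the standard population counts are invented for illustration and are not the Vara figures, and the function name is hypothetical.

```python
def directly_standardized_rate(stratum_rates, standard_population):
    """Weight each stratum-specific rate by the stratum's share of the standard population."""
    if set(stratum_rates) != set(standard_population):
        raise ValueError("rates and standard population must cover the same strata")
    total = sum(standard_population.values())
    return sum(
        stratum_rates[stratum] * standard_population[stratum] / total
        for stratum in stratum_rates
    )


# Hypothetical age-specific obesity prevalences in a study sample.
rates = {"30-39": 0.10, "40-49": 0.15, "50-59": 0.20, "60-69": 0.25, "70+": 0.20}
# Hypothetical standard population counts per age group.
standard = {"30-39": 2000, "40-49": 2000, "50-59": 2500, "60-69": 2000, "70+": 1500}

standardized = directly_standardized_rate(rates, standard)
print(f"Age-standardized prevalence: {100 * standardized:.1f}%")
```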
To find a useful algorithm for correction of self-reported BMI (BMISR), different types of regression were used to get the best fit. Both polynomial and exponential methods were tried, along with the linear regression, but they were not better in describing the curve. The linear regression was the most appropriate for this study's calibration situation, and it is simple and easy to apply in a population.
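A linear calibration of this kind can be sketched as an ordinary least squares fit with measured BMI as the dependent variable and self-reported BMI as the predictor. The example below is a minimal, crude (unadjusted) version in plain Python; the paired values are made up for illustration, so the resulting coefficients are not the study's algorithm, and the function names are hypothetical. The study's adjusted models also include age and other covariates, which this sketch omits.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def corrected_bmi(bmi_self_reported, intercept, slope):
    """Predict measured BMI from self-reported BMI with the fitted line."""
    return intercept + slope * bmi_self_reported


# Hypothetical paired observations (kg/m2), not study data.
bmi_sr = [22.0, 24.0, 26.0, 28.0, 30.0, 32.0]  # self-reported
bmi_m = [22.4, 24.5, 26.6, 28.9, 31.0, 33.2]   # measured

intercept, slope = fit_line(bmi_sr, bmi_m)
print(f"BMI_C = {intercept:.3f} + {slope:.3f} * BMI_SR")
print(round(corrected_bmi(29.0, intercept, slope), 2))
```

Because the measured value is the dependent variable, the fitted line yields a corrected BMI directly from a self-reported value, with no inversion step.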
The regression analysis with measured BMI (BMIM) as dependent variable was used in different models adjusting for smoking, education, LTPA, SRH, and age, both separately and combined (see the “direct method” (24)). To evaluate the algorithms, the prevalence of obesity was estimated from the corrected BMI (BMIC) and compared with that based on measured BMI. Additionally, sensitivity, specificity, and predictive values were used as measures of validity of BMIC. For further evaluation, a receiver operating characteristics curve was generated by plotting sensitivity against 1-specificity for BMIC and age against obesity defined from BMIM, and the area under curve was calculated with 95% confidence intervals (CIs). The proportions of variance in BMIM that were explained by the corrected BMI (R2) were 0.917 in men and 0.958 in women. Algorithms adjusted for differences in age were:
Algorithms adjusted for differences in age, education, LTPA, smoking, and SRH were:
In comparison, an algorithm suggested by Kuskowska-Wolk on subjects residing in Skaraborg in 1977 (in men, BMISR = 2.292 + 0.893 BMIM; and in women, BMISR = 1.835 + 0.893 BMIM) was used with the self-reported data from this study (see the “indirect method” (24, 25, 26)).
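Because the Kuskowska-Wolk equations have self-reported BMI on the left-hand side, applying them to self-reported data requires solving for BMIM. The sketch below assumes that this algebraic inversion is what the indirect method amounts to; the text does not spell the step out. The coefficients are those quoted above, and the function names are hypothetical.

```python
# (intercept, slope) of BMI_SR = intercept + slope * BMI_M, as quoted in the text.
KW_COEFFICIENTS = {"men": (2.292, 0.893), "women": (1.835, 0.893)}


def corrected_bmi_indirect(bmi_self_reported, sex):
    """Solve BMI_SR = intercept + slope * BMI_M for BMI_M (assumed inversion step)."""
    intercept, slope = KW_COEFFICIENTS[sex]
    return (bmi_self_reported - intercept) / slope


def self_reported_cutoff(sex, measured_cutoff=30.0):
    """Self-reported BMI that the equation maps to a given measured BMI cut-off."""
    intercept, slope = KW_COEFFICIENTS[sex]
    return intercept + slope * measured_cutoff


for sex in ("men", "women"):
    print(sex, round(self_reported_cutoff(sex), 3))
```

Under this reading, a self-reported BMI slightly above 29 kg/m2 in men, and somewhat lower in women, is already classified as obese after correction.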
In this study, 860 men with a mean (standard deviation) age of 47.0 (11.9) years and 843 women with mean age of 47.0 (11.7) years were included. The mean differences between measured and self-reported weight were 1.6 kg (CI, 1.4; 1.8) in men and 1.8 kg (1.6; 2.0) in women (measured higher), whereas corresponding differences in height were −0.3 cm (−0.5; −0.2) in men and −0.4 cm (−0.5; −0.2) in women (measured lower). BMIM was, on average, 0.6 kg/m2 (0.5; 0.7) higher in men and 0.8 kg/m2 (0.7; 0.9) higher in women, compared with BMISR (Table 1).
Table 1. Measured and self-reported weight, height, and BMI in men and women
Columns, for men (n = 860) and for women (n = 843): weight (kg), height (cm), and BMI (kg/m2), each as mean (SD), with 95% CI of mean differences.
SD, standard deviation; CI, confidence interval.
Misclassifications of Self-reported Data
Older men were more likely to overestimate body height and underestimate body weight than young men, whereas in women a corresponding finding was significant only for height (Table 2). Men underestimated their body weight more the better they rated their health (p = 0.002). Smokers in general reported their weight more correctly than non-smokers. Measures related to body size were associated with misreporting of height and weight in both men and women. Decreasing height was associated with an overestimation of height (p < 0.001, both sexes), and increasing overweight was associated with increasing underestimation of body weight (p < 0.001, both sexes). Age and measured BMI were the major causes of bias in self-reported BMI (p < 0.001, both sexes).
Table 2. Measures of body composition (means), and differences between measured and self-reported weight and height by background characteristics
Men (n = 860). Columns: study population [n (%)]; measured weight (kg); self-reported weight (kg); differences (WM − WSR); measured height (cm); self-reported height (cm); differences (HM − HSR); measured BMI (kg/m2); self-reported BMI (kg/m2); differences (BMIM − BMISR).
WM − WSR, weight measured minus weight self-reported; HM − HSR, height measured minus height self-reported; BMIM − BMISR, BMI measured minus BMI self-reported. Missing data (men/women): smoking (5/3), education (16/19), leisure time physical activity (12/9), and self-rated health (110/96).
Height category according to percentiles of the study population.
Weight category according to percentiles of the study population.
BMIM category according to World Health Organization. Proportions were age-standardized using the whole Vara population (2002) as an external standard (23).
Table 3 shows measured, self-reported, and corrected BMI in both sexes. The prevalence of obesity according to BMIM was ∼5% higher in both men and women compared with BMISR. With corrected BMISR, the prevalence of obesity was closer to that based on BMIM than with uncorrected self-reported data. With the first crude model, the estimated prevalence was 1.8% below that based on BMIM in both men and women. When age was also accounted for, this difference shrank to ∼1.0% in both men and women. In a model including age, education, LTPA, smoking, and SRH, the difference was only ∼0.7% in both men and women. In comparison, the calibrating model suggested by Kuskowska-Wolk overestimated the prevalence of obesity in men by 4.7% and underestimated the prevalence in women by 1.7% (Table 3).
Table 3. The prevalence of obesity by measured and self-reported BMI and by different algorithms to correct self-reported BMI
Columns, each as n (%): measured; self-reported; calibrated crude; calibrated adjusted for age; calibrated adjusted for age, smoking, education, LTPA, and SRH.
LTPA, leisure time physical activity; SRH, self-rated health. Proportions were age-standardized using the whole Vara population as an external standard (23). Missing data (men/women): smoking (5/3), education (16/19), LTPA (12/9), and self-rated health (110/96).
In self-reported data, the sensitivity for identifying subjects with obesity was 69.9% in men and 81.5% in women, and when using the algorithm adjusted for age, the sensitivity increased to 80.8% and 89.7%, respectively (Table 4). Specificity decreased from ∼99% in both sexes to 97% in men and to 98% in women. The positive predictive values for obesity based on self-reported data were 95.6% in men and 98.0% in women and in the algorithm adjusted for age 86.9% and 93.8%, respectively. BMI corrected for age, education, LTPA, smoking, and SRH predicted obesity with a sensitivity of 79.4% in men and 91.0% in women and with a positive predictive value of 89.3% in men and 94.7% in women. The area under the receiver operating characteristics curve in men was 0.98 (95% CI, 0.98; 0.99) and in women 0.99 (0.98; 1.00), suggesting high sensitivity and specificity of the algorithms adjusted for BMISR and age (Figure 1).
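The validity measures above follow from a two-by-two table of self-reported against measured obesity. The sketch below works this through for self-reported data in men. The cell counts are not reported in the text; they are inferred here from the reported totals (860 men, 156 obese by measurement, 114 obese by self-report) and the reported sensitivity of 69.9%, which implies 109 true positives. The function name is hypothetical.

```python
def validity_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, and predictive values from two-by-two table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # obese by measurement, found by the test
        "specificity": tn / (tn + fp),  # non-obese by measurement, cleared by the test
        "ppv": tp / (tp + fp),          # classified obese who are truly obese
        "npv": tn / (tn + fn),          # classified non-obese who are truly non-obese
    }


# Inferred counts for men, self-reported BMI against measured BMI (see lead-in):
# 109 + 47 = 156 obese by measurement; 109 + 5 = 114 obese by self-report.
men_self_reported = validity_measures(tp=109, fp=5, fn=47, tn=699)

for name, value in men_self_reported.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these inferred counts, the computed sensitivity and positive predictive value agree with the 69.9% and 95.6% reported for men.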
Table 4. Sensitivity, specificity, and predictive values (positive and negative) of obesity (measured) by different algorithms to correct BMI based on self-reported data
When applying the algorithm by Kuskowska-Wolk, the sensitivity was higher, but the specificity and positive predictive values were lower in men compared with this study's algorithm. However, there were no differences in women (Table 4).
In the present study, self-reported information on weight and height underestimated the prevalence of obesity considerably compared with measured data. Age and measured BMI were important factors for misreporting height, weight, and BMI in both men and women. Discrepancies in estimated proportions of obesity were significantly smaller when an algorithm including BMI based on self-reported information and age was used to correct BMI. Further adjustment for other variables did not improve the prediction of obesity.
The participation rate in this study was as high as 82%, but the implications of the findings should still be considered because individuals with healthy lifestyles and non-manual work are more likely to attend health surveys (27, 28). The low number of participants excluded due to missing information on height and weight or lacking information on socioeconomic characteristics and lifestyles lends further support to the high quality of these data. Two nurses using the same devices and routines conducted the objective measurements of height and weight on all participants only 2 weeks after the collection of self-reported information; therefore, the validity should be considered as high. The variation in measured data could be considered as low.
In this study, both men and women underestimated their body weight, whereas an overestimation of body height was of marginal importance. This trend is in some contrast to previous reports showing that women in general are more likely to underestimate their body weight to a larger extent compared with men, and men are more likely to exaggerate their height than women (14, 15, 16, 29, 30, 31). Overestimation of body height and underestimation of body weight tended to increase with increasing age in both sexes in this study, which is supported by other studies that have shown the same pattern (14, 15, 16, 30, 31). The bias in self-reporting weight, height, and BMI was substantially related to a higher measured BMI in both men and women, also in accordance with previous studies (14, 15, 16, 31, 32). This might be expected because the ideal in Western cultures is to be leaner, and the social pressure may lead to negative attitudes toward big body size (33, 34). In both sexes, self-reported body weight was more accurate in smokers than in non-smokers. This is consistent with smokers being more aware of their body weight than non-smokers because smoking has been described as being used to control weight, and smoking cessation is associated with weight gain (35, 36). There was a significant trend to underestimate the body weight the better men rated their health. This behavior may be explained by a higher likelihood to report advantageous personal information by men who are interested in their health (37, 38). Thus, bias in self-reporting of body height and body weight was associated with body composition, socioeconomic status, self-perceived health, and lifestyles with a somewhat different pattern between the sexes. In further studies, qualitative methods are warranted to understand these mechanisms better.
The calibration method, using a linear regression equation to predict measured BMI from BMISR, has been debated because it is difficult to use and does not correct completely for the systematic bias in self-reported weight and height in population studies (29, 39, 40). The low validity of obesity based on self-reported weight and height has implications for population studies, and it leaves uncertainty about the actual prevalence of obesity and about the upward trend of obesity in a population (39). In this study, the calibrated data were more dependable because the sensitivity was higher compared with self-reported data; the algorithm becomes an important tool for further epidemiological studies of obesity.
The method has been used differently in various studies (25, 26, 29). In this study, the BMIM was used as the dependent variable to form an equation to predict correct BMI from self-reported data. However, Kuskowska-Wolk et al. used BMISR as the dependent variable, and they evaluated their equation to predict the correct BMI from self-reported data. An advantage of our “direct method” is that it can easily be applied in studies designed for evaluation of endpoint (diseases) using Cox regression or logistic regression (41).
In this study, the correction of BMISR improved the estimated prevalence of obesity in the population because individuals were more correctly categorized as being obese or not. Using the algorithm proposed by Kuskowska-Wolk, obesity was overestimated in men and underestimated in women with self-reported data from this study. This should be taken into account when using the Kuskowska-Wolk method in trend analyses. Further comparison on subjects residing in Skaraborg in 1977 shows that the prevalence of obesity was lower compared with 2002. The high and increasing prevalence of obesity in Vara should be taken into account when comparing the algorithms on data from a different population 25 years earlier. This observation may also be consistent with a change in the pattern of misreporting over time. The sensitivity was higher in both sexes using calibrated data adjusted for differences in age compared with self-reported information.
In conclusion, the prevalence of obesity in a population based on self-reported information can be estimated more accurately when variables that are predictive for misreporting are adjusted for. The corrections may, however, become less accurate, and the determinants of misreporting may change over time. Thus, it is important to validate calibration models at different points of time because the trend of escalating obesity is expected to continue. These findings should be considered when mailed health questionnaires are used in the epidemiology of obesity.
This work was supported by grants from the Swedish Research Council and the Skaraborg Institute and by the Skaraborg Primary Care (Skövde, Sweden), The Health and Medical Care Committee of the Regional Executive Board of the region Västra Götaland, the Malmo University Hospital, region Skane, and the Faculty of Medicine, Lund University, Sweden. We thank the participants from Vara and the staff of the Skaraborgs Project, Ann-Charlotte Aghamn, Susanne Andersson, and Marianne Persson.
The costs of publication of this article were defrayed, in part, by the payment of page charges. This article must, therefore, be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.