Development and validation of body fat prediction models in American adults

Summary

Introduction: Commonly used statistical models to predict body fat percentage currently rely on skinfold measures, anthropometric measures, or some combination of the two but do not account for the wide ranges of age and body mass index (BMI) present in the American adult population. The objective of this study was to develop a statistical regression model to predict in vivo body fat percentage (dual energy X-ray) in men and women across significant age and obesity ranges.

Methods: This study included 228 adults between the ages of 21 and 70, with BMI between 18.5 and 40.0 kg m−2. The study population was split into training (n = 163) and validation (n = 65) groups, which were used to develop and validate the prediction models. The models were developed on the training group using a backwards stepwise regression analysis, with the initial predictors including age, BMI, and several anthropometric and skinfold measurements.

Results: The final statistical regression models included age, BMI, anthropometric measures, and skinfold measures with significant effects following the stepwise process. The models predicted body fat percentage in the testing group with average errors of less than 0.10% body fat in males and females, while the four previously existing methods (Durnin, Hodgdon, Jackson, and Woolcott) significantly underestimated or overestimated body fat in both genders, with errors ranging between 2% and 10%.

Conclusions: The final models included hand thickness, and the female model was dependent on waist circumference and two of the skinfold measures, while the male model used hip and thigh circumferences, along with three skinfold measures. By including the skinfold measurements separately, instead of only as sums like previous models have done, these models can account for the different relative contributions of each site to total body fat.


| INTRODUCTION
When body fat percentage (BFP) is used as a measure of obesity status, high body fat, especially when combined with low body mass index (BMI), is associated with increased all-cause mortality 1 and cardiovascular disease mortality. 2 While whole-body and abdominal obesity can be approximated with measures such as BMI and waist and hip circumferences, body composition information can provide additional insight into cardiovascular health risks. 2 Although measures such as BMI, waist circumference, and waist-to-hip ratio can approximate BFP and related health risks, imaging methods such as dual energy X-ray absorptiometry (DXA) scanning have been used as direct in vivo measures of body composition, including BFP. 3 Other methods have used skinfold measurements to model BFP in the general population 4-6 and the elderly. 3,7-9 These methods have used skinfolds either to predict BFP directly 3,7,8 or to predict body density, 4-6,9 which can then be converted to BFP using the two-compartment equation developed by Siri. 10 This two-compartment model uses assumed densities for both fat mass and fat-free mass, so that BFP can be determined from the calculated body density.
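As a minimal sketch of the density-to-BFP conversion described above, the function below solves the two-compartment relationship for the fat fraction. The default densities of 0.900 and 1.100 g/cm³ are the values conventionally associated with Siri's equation; they are an assumption here and are not stated in this text, and the function name is hypothetical.

```python
def siri_bfp(body_density, fat_density=0.900, ffm_density=1.100):
    """Convert whole-body density (g/cm^3) to body fat percentage.

    Two-compartment assumption: 1/Db = f/d_fat + (1 - f)/d_ffm,
    solved for the fat fraction f. The default densities are assumed
    (conventional Siri values), not taken from this study.
    """
    if body_density <= 0:
        raise ValueError("body density must be positive")
    scale = fat_density * ffm_density / (ffm_density - fat_density)
    offset = fat_density / (ffm_density - fat_density)
    return 100.0 * (scale / body_density - offset)
```

With the default densities this reduces to the familiar form BFP = 495 / Db − 450, so a body density equal to the assumed fat-free density gives a BFP of zero.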
Methods relying only on anthropometric measurements, including circumference measurements and height, 11-13 have also been employed to predict BFP. These methods have the advantage of accounting for individual body shape without requiring the equipment and training necessary to perform skinfold measurements. Specifically, the relative fat mass (RFM) metric 13 approximates BFP using only height and waist circumference, making it one of the simplest methods of determining obesity status in American adults. While these established methods have proven effective for military personnel 11,12 and provided improvement over BMI alone in the American adult population, 13 they do not account for the additional effect of age in predicting body composition. When looking at correlations of anthropometric measures such as waist circumference and BMI with body composition, previous research has found that both age and race also play significant roles in predicting BFP. 14-16 These findings highlight that BMI alone does not provide accurate insight into body composition. 15 Because of the wide age and obesity ranges present in the American adult population, 17 the known effects of age and BMI on predicting body composition, 16 and the significant predictive abilities of skinfolds 4-6 and anthropometric measurements 18-20 for body composition, accurate statistical models should attempt to include all of these variables. The methods previously developed by Durnin and Womersley, 4 Hodgdon, 11,12 Jackson and Pollock, 5,6 and Woolcott 13 predict BFP in Navy personnel 11,12 and the general adult population 4-6,13 with simple regression models based on easily collected measurements. Additionally, the methods utilizing skinfolds 4-6 employ sums of skinfolds instead of allowing each skinfold site to contribute separately to the BFP prediction.
By contrast, this study aims to develop the most accurate body fat prediction models in American adults, at the cost of requiring a larger number of input variables.
Of the reference methods used for comparison, the log-skinfold method, developed by Durnin and Womersley, 4 uses the base-10 logarithm of a sum of four skinfolds (triceps, biceps, subscapular, and hip) to determine body density, while the sum-skinfold method developed by Jackson and Pollock 5,6 uses a sum of seven skinfolds (chest, midaxillary, triceps, subscapular, abdominal, hip, and thigh) in its linear and quadratic terms, along with age, to determine density. The RFM method created by Woolcott 13 avoids skinfold measurements and employs only height and waist circumference. The Navy method, developed by Hodgdon, 11,12 uses logarithmic terms including the abdomen and neck circumferences for men; the waist, hip, and neck circumferences for women; and height for both genders. While the log-skinfold, sum-skinfold, and RFM methods all use the same inputs for men and women, with differing coefficients, the Navy method is the only one that requires a different set of measurements for men and women.
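To illustrate how little input the simplest of these reference methods needs, the sketch below implements the RFM calculation from height and waist circumference. The coefficients (64, 20, and 12) are quoted from the published RFM equation rather than from this text, so they should be treated as an assumption here; the function name is hypothetical.

```python
def relative_fat_mass(height_cm, waist_cm, is_female):
    """Estimate body fat percentage with the RFM equation.

    RFM = 64 - 20 * (height / waist) + 12 * sex, with sex = 1 for
    women and 0 for men. Coefficients come from the published RFM
    equation (an assumption here; they do not appear in this text).
    Height and waist must share the same unit.
    """
    if height_cm <= 0 or waist_cm <= 0:
        raise ValueError("height and waist must be positive")
    sex = 1 if is_female else 0
    return 64.0 - 20.0 * (height_cm / waist_cm) + 12.0 * sex
```

Because only the height-to-waist ratio enters the equation, the same inputs serve both sexes and the sex term simply shifts the estimate, which matches the description of shared inputs with differing coefficients.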
The objective of this study was to develop multiple regression models to predict BFP in working men and women using all of these parameters, in order to develop a clinical tool that will provide the most accurate results, and improve over the existing prediction methods without the need for expensive imaging equipment.

| MATERIALS AND METHODS
Each participant had his or her height and mass recorded in order to confirm eligibility based on BMI. Female participants of childbearing age were then required to complete a pregnancy test, with a negative result being required for eligibility. Next, nine skinfolds and thirteen anthropometric measurements were collected (Table 2). All of the arm and leg measurements were collected on the right side only.
Each measurement was collected three times, and the average of the three was used for analysis. A whole-body DXA scan (Hologic QDR 1000/W, Bedford, MA, USA) of each participant was then collected using the same methods used in prior studies, 10 with the participant lying supine. BFP was determined from the scan as total fat mass divided by total body mass.
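The two calculations in this paragraph can be sketched as follows: averaging the repeated trials of a measurement, and computing BFP as total fat mass over total body mass. The function names and units are hypothetical, chosen only for illustration.

```python
from statistics import mean


def average_trials(trials):
    """Average repeated trials of one measurement (three per site in the study)."""
    if not trials:
        raise ValueError("at least one trial is required")
    return mean(trials)


def dxa_bfp(total_fat_mass_kg, total_body_mass_kg):
    """Body fat percentage as total fat mass divided by total body mass."""
    if total_body_mass_kg <= 0:
        raise ValueError("total body mass must be positive")
    return 100.0 * total_fat_mass_kg / total_body_mass_kg
```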
Before starting the statistical analysis, the full data set of 228 participants was randomly split into two subgroups: the training set, which contained 163 participants, and the testing set, which contained the remaining 65, with each set containing similar age and BMI distributions (Table 3). Splitting the full data set in this way allows the predictive models to be developed and validated on separate data, so that the models' performance on the testing set is representative of their performance on real-world data. Developing and validating the models on separate sets also guards against overfitting and demonstrates the ability of the models to predict body fat values on data that were not used to create them.
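A minimal sketch of such a split is shown below, assuming participants are identified by integer IDs. It performs a plain seeded shuffle; the seed and function name are arbitrary, and the sketch does not enforce the similar age and BMI distributions reported for the two sets, which would need to be checked separately.

```python
import random


def split_participants(participant_ids, n_train=163, seed=0):
    """Randomly partition participants into training and testing sets.

    Plain random shuffle with an arbitrary seed. Balance of age and
    BMI between the two sets is not enforced by this sketch.
    """
    ids = list(participant_ids)
    if n_train > len(ids):
        raise ValueError("n_train exceeds the number of participants")
    rng = random.Random(seed)
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


# 228 hypothetical participant IDs, split 163 / 65 as in the study.
train_ids, test_ids = split_participants(range(228))
```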
All analyses were performed in JMP Pro 12 (SAS Institute, Cary, NC, USA). Specifically, a backwards stepwise regression analysis was performed on the whole-body DXA determined BFP in the training subset within each gender group. The initial regression model contained age, BMI, age², BMI², and all their interaction terms, all skinfold measurements, and circumferences of the neck, waist, hips, and limbs. In each step of the analysis, the predictor with the largest P value was removed, and the analysis was repeated. This process of removing the least significant predictor and repeating the analysis continued until the P values for all predictors were below 0.10.
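The elimination loop described above can be sketched in Python as follows. This is not the JMP procedure used in the study: it is an illustrative ordinary least squares implementation using NumPy and SciPy, run on synthetic data with hypothetical predictor names, and it does not apply any special handling of interaction or squared terms.

```python
import numpy as np
from scipy import stats


def ols_pvalues(design, y):
    """Fit OLS on a design matrix that already contains an intercept
    column; return coefficients and two-sided t-test p-values."""
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t_stats = beta / np.sqrt(np.diag(cov))
    pvals = 2.0 * stats.t.sf(np.abs(t_stats), df=n - k)
    return beta, pvals


def backward_eliminate(X, y, names, threshold=0.10):
    """Repeatedly drop the predictor with the largest p-value until
    every remaining predictor has p < threshold. The intercept is
    always kept and never tested for removal."""
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        _, pvals = ols_pvalues(design, y)
        predictor_p = pvals[1:]  # skip the intercept
        worst = int(np.argmax(predictor_p))
        if predictor_p[worst] < threshold:
            break
        del keep[worst]
    return [names[i] for i in keep]


# Synthetic illustration: only x0 and x2 truly affect the response.
rng = np.random.default_rng(0)
n = 163
X = rng.normal(size=(n, 5))
y = 25.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=1.0, size=n)
names = ["x0", "x1", "x2", "x3", "x4"]
selected = backward_eliminate(X, y, names)
```

With a threshold of 0.10, a pure-noise predictor can still survive by chance, so the selected set is guaranteed to contain the strong predictors but not guaranteed to exclude every irrelevant one.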
Once the training set model was finalized, it was applied to the anthropometric measures in the testing set, so that the predicted and actual BFP values could be compared in the testing set and used as a method of validating the models. For comparison purposes, several previously validated body fat estimation methods were also applied to the testing data set. These methods included those determined by Durnin and Womersley, 4 Hodgdon, 11,12 Jackson and Pollock, 5,6 and Woolcott. 13
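The comparison of predicted and DXA-measured BFP in the testing set can be summarized with a mean signed error and a root mean square error (RMSE), the two kinds of error reported in this paper. The sketch below assumes error is defined as predicted minus actual, which is a convention chosen here for illustration; the function names are hypothetical.

```python
import math


def _check(predicted, actual):
    """Validate that both sequences are non-empty and equally long."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")


def mean_error(predicted, actual):
    """Mean signed error (predicted - actual); positive means overestimation."""
    _check(predicted, actual)
    return sum(p - a for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root mean square error between predicted and actual values."""
    _check(predicted, actual)
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))
```

The two metrics answer different questions: a mean error near zero shows the absence of systematic bias, while RMSE also penalizes scatter, so a model can have a small mean error and still a large RMSE.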

FIGURE 1
Root mean square error for the testing group for the newly developed prediction model and the Navy, log-skinfold, sum-skinfold, and RFM models

[…] and Woolcott 13), or skinfolds and age (Jackson 5,6). Because the backward stepwise regression process initially included all of the predictors, this study was not restricted to specific categories of inputs.
Compared with the established methods of predicting BFP, the new model requires more measurements than the anthropometry-only methods used by the Navy 11,12 and RFM 13 equations, but with the benefit of providing significantly more accurate predictions in both men and women. While the RFM model provides significant improvement 13 over BMI alone for predicting body composition, it results in estimates that are about two percentage points higher than those of the new models (Figure 2). The RFM 13 study and this study both used DXA scans to determine the actual body composition of participants; however, the RFM study recruited a larger sample of American adults, while this study was limited to working adults with full-time jobs. Although physical activity information was not collected as part of this study, previous research has indicated that men and women with active full-time jobs and men with sedentary full-time jobs tend to be more active than unemployed adults, 18 so these potential differences in lifestyle and activity level between the two study populations may have contributed to the different modeling outcomes for predicting BFP.
A similar comparison issue arises when observing the differences between the Navy 11,12 models and the results of this study, because the Navy models were developed for active duty US Navy personnel. The Navy studies observed participants who were both younger and had less excess body weight 11,12 than the participants in this study. Interestingly, this difference in population and models leads to the Navy method underestimating body fat in males while overestimating body fat in females, indicating that with increasing age and BMI, circumference measurements may play different roles in predicting body composition, and skinfold measurements are necessary for providing further predictive ability.
When comparing the log-skinfold 4 and sum-skinfold 5 […]

The models developed in this study account for overall body shape (from the waist, hip, and thigh circumferences) and localized body fat distribution (from the subscapular, abdominal, and thigh skinfold measurements); they also account for the additional changes in body composition associated with age and BMI. The inclusion of age is especially important for accurate composition calculations because of the decrease in lean body mass and bone density that occurs with increasing age. 27-29 Because the models include this variety of inputs, a trained clinician is likely required to collect the necessary measurements, whereas simpler methods like RFM 13 can be determined by an individual without any clinical training. The extra requirements of these new models mean that they will likely be most useful in a clinical or medical setting where the ultimate goal is to determine BFP as accurately as possible.
This study had a few limitations, mostly related to the population studied. While the study sample included a wide representation of age, race, and obesity levels, factors such as physical fitness and overall activity levels were not accounted for during recruitment. Because only working adults with full-time jobs were eligible to participate in this study, the final prediction models are likely not applicable to special populations such as the elderly or athletes.
The population studied was also limited to participants with a BMI of less than 40.0 kg m−2 because of inaccuracy of abdominal and thigh skinfold measures in morbidly obese individuals, so the results may not be applicable to working adults with extreme levels of obesity.
The findings of this study are clinically significant because they provide a method of accurately predicting BFP in American adults, without the need for any imaging or specialized body densitometry equipment. The only necessary equipment is a skinfold caliper, tape measure, and anthropometer. With this equipment and proper training, the required measurements are much cheaper and easier to collect than imaging or densitometry data. Some future directions may include developing methods only using a tape measure, in combination with age and BMI, so that individuals without any clinical training or equipment can collect the data required to accurately predict body composition.