Artificial intelligence for the practical assessment of nutritional status in emergencies

This paper describes a novel method for detecting child malnutrition based on artificial intelligence and facial photography. Estimates of severe and moderate acute malnutrition in children are critical for rapid emergency responses. However, the two traditional measurement methods, mid‐upper arm circumference (MUAC) and weight‐for‐height (WFH), are impractical in conflict and catastrophic disaster situations. They require well‐trained enumerators, cumbersome equipment, and close supervision. The Method for Extremely Rapid Observation of Nutritional Status (MERON) addresses the problem, using simple facial photographs. Facial features are extracted to predict Body Mass Index (BMI) in adults and Weight for Height Z Score (WFHZ) in children under five. MERON correctly predicts adult BMI classification with 78% accuracy. A variant of the model, trained on a sample of 3167 children in Kenya, successfully classified 60% of cases. On most measures, MERON was easier and more culturally acceptable to use than the traditional measurement methods. If MERON were to be trained and validated on a larger sample, with more extreme cases, it would provide a practical solution to a recurrent humanitarian problem.

This paper describes the technology and reports on the MERON field testing and performance from the perspectives of predictive accuracy, practicality, cost, and cultural acceptability.
A UNICEF/WHO/World Bank study in 2017 found that nearly 51 million children under the age of five were wasted and 16 million were severely wasted. Children suffering from wasting have weakened immunity, are susceptible to long-term developmental delays, and face an increased risk of death, particularly when wasting is severe. They require urgent feeding, treatment, and care to survive (UNICEF/WHO/World Bank Group, 2018). Interventions to tackle malnutrition require timely and accurate information on child malnutrition status. The common anthropometric measurements used in emergencies to help determine undernutrition are Mid-Upper Arm Circumference (MUAC), Weight for Height/Length (WFH), Weight for Age (WFA), and Height/Length for Age (HFA) Z scores. However, the current measurement techniques are cumbersome and require skilled personnel.
Obtaining accurate weight and length measurements is challenging, especially when the child is wriggling and upset. Appendix A provides a summary of the current measurement protocols, based on UNICEF guidance, for measuring weight, height, length and MUAC. The logistical and practical challenges of anthropometric measurement are associated with high measurement error. According to a study on severe malnutrition in hospitalized children in rural Kenya (Berkley et al., 2005), weight for height/length measurement is often problematic and not undertaken in practice because of a shortage of properly calibrated and functioning scales. WFH measurement protocols require correct recording of two separate values and then looking up a third value on a chart, potentially introducing transcription errors. It is also difficult to measure height in children who are ill or stressed in crowded wards. MUAC measurements are also prone to multiple errors. Even the choice of tape measure influences the results, as Rana et al. (2021) show. In some contexts, climate and/or cultural preferences make it inappropriate or unacceptable to weigh children without clothing. According to UNICEF protocols, a standard maximum of clothing should be agreed and standardized across children. UNICEF advises that an average estimated weight for clothing should be subtracted from all the weights measured, while recognizing that this will introduce some degree of error into the results. If the child is wriggling and the needle on the hanging scale does not stabilize, UNICEF's protocols indicate that weight is estimated using the value situated at the midpoint of the range of oscillations (UNICEF, n.d.).
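The two estimation rules just described can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic; the function names and example values are ours, not part of UNICEF's protocol:

```python
def oscillating_weight(low_kg, high_kg):
    """Estimate weight as the midpoint of the needle's oscillation range."""
    return (low_kg + high_kg) / 2

def adjusted_weight(measured_kg, clothing_allowance_kg):
    """Subtract the agreed average clothing weight from a measured weight."""
    return measured_kg - clothing_allowance_kg

# Example: the scale needle swings between 9.3 and 9.5 kg for a child
# wearing roughly 0.2 kg of clothing.
w = adjusted_weight(oscillating_weight(9.3, 9.5), 0.2)
print(round(w, 1))  # → 9.2
```

As the protocol itself notes, both the midpoint rule and the average clothing allowance introduce some error into the recorded weight.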
Experiments using trained medical staff or students, undertaken in clinical conditions, find variation in MUAC and WFL Z score observations. Mwangome et al. (2012) assess the inter-observer variability and accuracy of MUAC and weight for length Z-score (WFLZ) measurements among infants less than 6 months old performed by community health workers (CHWs) in Kilifi District, Kenya. The study finds strong correlations between the MUAC observations of health workers and of trainers, but considerable variation in WFLZ measurements. Saeed et al. (2005) assess inter- and intra-observer variability in MUAC measurements of under-five children by community health nurses (CHN) in the Northern Region of Ghana. They show a high degree of consistency in observations when the same student records observations on the same child over a three-day period, but high variation in observations between students on the same child.
In field survey conditions, higher measurement errors can be anticipated. Perumal et al. (2020) demonstrate "substantial heterogeneity" in anthropometric data quality across a review of international Demographic and Health Surveys (DHS), using a range of quality indicators. The DHS are typically conducted with high levels of training and supervision. It is particularly challenging to recruit, train, and supervise skilled survey personnel in conflict situations. There is a clear need for a lightweight and easy-to-use alternative.
Our research assesses a two-part hypothesis: (1) that sufficient features can be detected from facial photography to allow a characterization of facial morphology; and (2) that facial morphology is influenced by underlying changes in the distribution of body fat. Both hypotheses are broadly supported by the current literature.
Major advances have been attained in the methods for detecting facial features from photos and characterizing morphology, primarily for the purpose of facial recognition. In their seminal study, Lawrence et al. (1997) propose a hybrid neural-network solution as a computational model for facial recognition. The combination of local image sampling and a convolutional neural network allowed for rapid and consistently better classification performance than other facial recognition models existing at the time. Facial recognition technology has improved drastically since then, with growing datasets and deep learning techniques greatly improving performance. Zhou et al. (2015) show how larger datasets can improve facial recognition, but they also highlight the major challenges of data bias and low false-positive criteria in real-world applications.
A second strand of literature investigates the association between certain facial characteristics and an individual's BMI score, although Windhager et al. (2013) note that the literature is not extensive. Coetzee et al. (2010) show a strong correlation between adult BMI and four facial features. Lee, Do, and Kim (2012) show that it is possible to predict BMI status using facial characteristics. Their model classifies females as normal or overweight according to BMI using facial features, and they identify more than 40 facial characteristics that significantly differ between normal and overweight individuals. In an expansion of this study, Lee, Jang, and Kim (2012) use facial characteristics extracted from subject image data to predict BMI for males and females. Wen and Guo (2013) predict BMI for subjects of different ages and ethnicities using facial features extracted from the MORPH II dataset. A study comprising 11,347 adult Korean men and women ranging from 18 to 80 years old concludes that features in the buccal area of the face were the best indicators of normality and visceral obesity in males and females (Lee & Kim, 2014). Windhager et al. (2013) assess the effect of body fat on numerous facial features in a sample of 22 adolescent girls and find that body fat explains 8% of the variation in facial characteristics (p = 0.047; 10,000 permutations). They also find, as would be expected, a strong correlation between body fat and BMI, independent of age (r = 0.864, p < 0.001). There is a strong theoretical basis for expecting a correlation between facial morphology and body fat, and hence BMI. Levine et al. (1998) suggest that an upper-body distribution of fat is more predictive of the metabolic complications associated with obesity than the actual degree of overweight. They find a significant positive correlation between the area of buccal fat, the fat stored in the cheeks, and visceral fat, which is stored in the abdomen. Facial morphology, and specifically the curvature of the cheek, is therefore a potential indicator of body fat and adult BMI.

Section 2 summarizes the MERON method and includes a description of the data capture methods and fieldwork as well as a review of the MERON application architecture. Section 3 presents the results of the first round of testing, including both statistical performance metrics and details of cultural acceptability and practicability. The paper concludes in Section 4, with recommendations for additional field research to improve reliability.

| Data
The research followed a two-stage process. First, we established whether it was possible to extract sufficient features from facial photography to classify BMI bands and estimate BMI. The second stage of the testing focused on whether the method could be applied to children under difficult field conditions. The methodology and data collection tools were designed to address the following research questions:
1. How does the accuracy of MERON compare with traditional measurements in a field setting?
2. How cost-effective is MERON compared with traditional measurements in a field setting?
3. How feasible is it to collect the data required for MERON in different field settings?
A successful proof of concept was achieved based on the analysis of thousands of facial images of adults in North America using the MORPH database. The database was developed by the University of North Carolina Wilmington for researchers studying adult age-progression. Ricanek and Tesafaye (2006) provide a comprehensive overview of MORPH. While this database was helpful in the initial testing of MERON, a different sample was required to understand whether MERON could be applicable to children under five.
The second model was trained using images and anthropometric data from under-five children in food insecure areas of Kenya. Data were collected between January and February 2018 in the diverse counties of Isiolo, Turkana, Marsabit, and Tana River, each with a high risk of under-five malnutrition. In collaboration with the Ministry of Health (MoH) and UNICEF Kenya, Kimetrica took photographs of children's faces and concurrently assessed anthropometrics as part of the Standardized Monitoring and Assessment of Relief and Transitions (SMART) surveys. SMART is an interagency initiative that was launched in 2002 by a network of humanitarian organizations aimed at improving nutrition surveys in emergencies.
Testing MERON in the context of a SMART survey allowed us to evaluate the cost, feasibility and cultural acceptability of using the method in realistic field conditions. Evidently, the data were not of the quality that could be expected from a more tightly supervised clinical setting. Data collection began once ethical clearances had been obtained and the MERON methodology had been validated by Kenya's National Nutrition Technical Working Group. Ethical clearance was obtained from AMREF's government-appointed Ethics and Scientific Review Board.
A Kimetrica team participated in each SMART survey to ensure that photographic data were captured correctly and could be associated with the anthropometric records. The team also observed the reactions of children and their parents to both methods. Ministry of Health supervisors carried out back-checks with anthropometric measurements on 10% of the total sample for quality control purposes. They also collected qualitative data on household perceptions and preferences regarding photos versus the traditional methods.
Strict measures were enforced to protect child and parent privacy and to prevent malicious use of the photographs. The system only saves a vector that corresponds to a semantic abstraction of the original image after it goes through computational operations (i.e., convolution, nonlinearity functions, etc.) and dimensionality reduction. This vector encoding cannot be used to reconstruct the pixel image of the raw photo.
Vectorisation occurred shortly after data capture, allowing the image file to be deleted from all devices and servers. Parents accompanied their children for all photo taking. The SMART survey data had to be synced daily, so the photographs were not embedded in the SMART survey form. Given the difficulty of syncing photos in low-bandwidth environments, it was agreed to de-link the photos from the SMART survey forms and to sync the photos to secure AWS servers on completion of each survey round, using encrypted file transfer. SMART survey enumerators and team leaders were trained by Kimetrica on the simultaneous capture of SMART survey data and photographs, to ensure the correct matching of the photograph and anthropometric data for each child, and on the photo capture requirements in Box 1.
Photos were captured concurrently by UNICEF/MoH enumerators through the KoBo photo capture form (https://kobo.kimetrica.com/enketo/x/#YYyu) and were then matched and linked using a Pandas script.
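The matching step can be sketched with pandas. The column names here (`child_id`, `photo_path`, `whz`) are hypothetical stand-ins for whatever identifiers the KoBo form and the SMART records actually share:

```python
import pandas as pd

# Hypothetical extracts: one row per captured photo, one per anthropometric record.
photos = pd.DataFrame({
    "child_id": ["A01", "A02", "A03"],
    "photo_path": ["a01.jpg", "a02.jpg", "a03.jpg"],
})
anthro = pd.DataFrame({
    "child_id": ["A01", "A02", "A04"],
    "whz": [-1.2, -2.5, -0.3],
})

# An inner join keeps only children with both a photo and a measurement;
# records missing from either side can then be flagged for follow-up.
matched = photos.merge(anthro, on="child_id", how="inner")
print(len(matched))  # → 2
```

An outer join with `indicator=True` would instead surface the unmatched records (here A03 and A04) for quality checks.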
In total, 4075 images were captured. While all enumerators were comfortable using tablets and had been trained on the photo capture requirements, some data quality issues arose. In 908 images, faces were not detected, for several reasons:
• the facial detection model was unable to detect a face because of low contrast;
• no face could be detected in the image;
• the face was out of focus;
• the face was not aligned, causing partial obstruction of the face.
For the cases of low contrast, we were able to recover most facial images using the method of Contrast Limited Adaptive Histogram Equalisation (CLAHE), which applies a transformation function to each pixel based on neighbouring pixels rather than the entire image. The transformation function is proportional to the cumulative distribution function of the neighbouring pixels and adjusts the pixel intensity to enhance contrast. CLAHE limits the contrast amplification to reduce the amount of noise that is introduced. We were unable to address the other issues through data processing. However, the Kimetrica team subsequently designed an app to analyse image quality at the point of data capture and upload only the images that are suitable for feature detection.

| MERON model architecture
MERON employs three distinct models to predict malnutrition. The first model uses the VGGFace base architecture, which was pre-trained on 2 million face images, to extract facial features encoded in vector form (n = 2048). The technology selection criteria included performance, computational cost, and ease/cost of deployment. The VGGFace model was selected as the best performing lightweight option available at the time of MERON development. The base architecture offers several models, and the team selected ResNet50. The output of VGGFace is a vector that corresponds to a semantic abstraction of the original image after it goes through computational operations (i.e., convolution, nonlinearity functions, etc.). This vector does not map to one specific facial feature but acts as an abstract representation of multiple features. A subsequent neural network layer is trained on the feature vector combined with metadata to optimize overall prediction accuracy. It is open source and can be found at: https://github.com/kimetrica/MERON_model/blob/master/meron/model/single_model.py.

BOX 1: Photo capture requirements
• Distance from the subject: The enumerator taking the photo must stand at a distance of at least 25 cm from the subject.
• Frame of the photo: The frame of the photo should capture the entire face, neck and shoulders.
• Position of face: The subject should be facing forward and looking straight at the camera. The eyes should be open.
• Facial expression: The subject should not smile and the mouth should be closed. The expression should be neutral, where possible.
• Remove all obstructions: The head should not be covered (no hats) unless it is for religious or medical reasons; no headgear, no jewellery, nothing covering the face. Remove any hair covering the face or the eyes. The photo must not contain any other objects or people.
• Background: Ideally, the background should be solid, preferably a wall, of any shade or colour. If in a room, the room should be well lit. Use the photo cloth provided.
• Photo colour: The photo must be in colour, not black and white.
• Photo orientation: Landscape, not portrait.
• Photo resolution: The resolution must be a minimum of 1.5 megapixels.
• Format: The photo must be in JPEG format.
• Camera settings: The camera should use the following settings: macro, no flash, auto-focus and grid lines. With the grid line setting on, the screen is divided into nine squares. Enumerators should ensure that the face is centred in the photo.
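Several of the Box 1 requirements are machine-checkable at capture time. The following is a minimal sketch of such a validator; the thresholds follow Box 1, but the metadata dictionary format is our own invention, not part of the MERON app:

```python
def check_photo_metadata(meta):
    """Flag violations of the machine-checkable Box 1 requirements:
    resolution, colour, orientation and file format."""
    problems = []
    if meta["width"] * meta["height"] < 1_500_000:   # minimum 1.5 megapixels
        problems.append("resolution below 1.5 MP")
    if meta["channels"] < 3:                         # colour, not black and white
        problems.append("not a colour photo")
    if meta["width"] <= meta["height"]:              # landscape, not portrait
        problems.append("not landscape orientation")
    if meta["format"].upper() != "JPEG":
        problems.append("not JPEG")
    return problems

good = {"width": 1600, "height": 1200, "channels": 3, "format": "JPEG"}
bad = {"width": 800, "height": 1200, "channels": 1, "format": "PNG"}
print(check_photo_metadata(good))       # → []
print(len(check_photo_metadata(bad)))   # → 4
```

Requirements such as a neutral expression or an unobstructed face cannot be checked this way and still depend on enumerator training (or on a face-detection pass like the one described above).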
A second model reduces the dimensionality of the feature vectors (to n = 512) using an autoencoder. The third model is a fully connected neural network, which applies a trainable layer to predict malnutrition status through a classification model and WHZ through a regression model. Figure 1 provides an overview of the MERON workflow.
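The dimensionality-reduction step can be illustrated with a toy linear autoencoder in NumPy (2048 → 512 → 2048). This is a minimal sketch of the idea on random data, not the MERON implementation, which uses trained nonlinear layers:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(64, 2048))   # stand-in for 64 VGGFace feature vectors

# Encoder and decoder weights (no biases, linear activations, for brevity).
W_enc = rng.normal(scale=0.01, size=(2048, 512))
W_dec = rng.normal(scale=0.01, size=(512, 2048))

def loss(X, W_enc, W_dec):
    Z = X @ W_enc            # 512-dimensional code
    X_hat = Z @ W_dec        # reconstruction of the 2048-dim input
    return ((X - X_hat) ** 2).mean()

lr = 1e-3
initial = loss(X, W_enc, W_dec)
for _ in range(50):          # a few steps of plain gradient descent
    Z = X @ W_enc
    X_hat = Z @ W_dec
    G = 2 * (X_hat - X) / X.size        # dL/dX_hat
    grad_dec = Z.T @ G                  # dL/dW_dec
    grad_enc = X.T @ (G @ W_dec.T)      # chain rule back through the decoder
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
final = loss(X, W_enc, W_dec)
print(final < initial)  # → True: reconstruction error falls as the code learns
```

The autoencoder is trained to reconstruct its input through the 512-dimensional bottleneck, so the code retains most of the information in the 2048-dimensional feature vector while being cheaper for the downstream classifier and regressor to consume.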
Figure 2 presents a detailed schematic of the VGGFace feature extraction. The input is the pixel array of the photo, which passes through several layers of convolution and max pooling operations. The output of the last pooling layer is merged with metadata and serves as the input vector for the neural network layer shown in Figure 1.
As the input image progresses through the convolution and max pooling layers, the feature maps generated become increasingly higher-level abstractions. At the last max pooling layer, the feature vector is combined with subject metadata before being processed by the neural network layer, where nonlinearity is applied to the input. The neural networks are optimized separately for the classification and the regression outputs.
An API linked to the fully connected model allows the user to post an image along with the required metadata. The API returns a WHZ score and malnutrition category, and it can be found at: https://meron.kimetrica.com/.

| Model results
We quantify the classification model performance by precision, recall and F1 scores. Estimates are based on a testing dataset of 838 observations, drawn from a total usable sample of 3167 observations (26%).
The precision is given by:

precision = tp / (tp + fp)

The recall is given by:

recall = tp / (tp + fn)

The F1 score is given by:

F1 = 2 × precision × recall / (precision + recall)

True positive (tp) indicates that the model correctly predicts that the child belongs to an underweight category from an image; true negative (tn) means that the model correctly predicts that the child is not underweight; false positives (fp) and false negatives (fn) are the incorrect model predictions. The recall score measures the proportion of true positives that is correctly predicted. The precision score captures the proportion of all predicted positives that are true positives. The F1 score is the harmonic mean of precision and recall. We take the true condition of malnutrition classification as indicated by the WFH or WFL Z-score. The precision and recall scores are broken down for each malnutrition category in Table 1.
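The three scores can be computed directly from the counts. A minimal sketch with illustrative counts (not the MERON results):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Illustrative counts only: 80 correct positives, 20 false alarms, 40 misses.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(round(p, 2), round(r, 2), round(f1, 2))  # → 0.8 0.67 0.73
```

Because F1 is a harmonic mean, it is pulled toward the weaker of the two scores, which is why it is a useful single summary when classes are imbalanced, as they are here for MAM and SAM.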
The results can also be shown through a confusion matrix (see Figure 3). Table 2 shows the precision and recall scores for malnutrition classification using MUAC.
Table 1: Precision and recall scores using MERON.
MERON saves costs primarily by reducing enumeration time. Based on Kimetrica's experience conducting anthropometric assessments, it takes 12.5 min per observation on average to set up equipment, prepare a child, and take weight, height and MUAC measurements. It took approximately 129 h to assess the 621 children. In contrast, the MERON process took approximately 5 min per child and 52 h in total, a total time saving of 78 h. For this sample size, the hourly cost is $133, so the time savings are substantial. A second major cost saving is in labour time: the traditional methods require two enumerators per observation, whereas MERON requires only one. While costs depend on the particularities of the sample area, anthropometric protocols are relatively standard and take a similar enumeration time everywhere. It is likely, therefore, that the time savings will be similar in other geographical settings.
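The time comparison works out as follows; this is simply a worked check of the figures above, using the per-observation times and the 621-child sample stated in the text:

```python
children = 621
trad_minutes, meron_minutes = 12.5, 5.0   # average time per observation

trad_hours = children * trad_minutes / 60    # ≈ 129 h for traditional methods
meron_hours = children * meron_minutes / 60  # ≈ 52 h for MERON
saved = trad_hours - meron_hours             # ≈ 78 h saved

print(round(trad_hours), round(meron_hours), round(saved))  # → 129 52 78
```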

| Cultural acceptability
Most caregivers in all four counties stated that it is culturally acceptable to take a photograph of their child to assess malnutrition. Figure 5 provides the response data.
In three out of four counties (Isiolo, Turkana and Marsabit) caregivers preferred photographs to traditional methods of assessing malnutrition.
In Tana River, only 17% of caregivers preferred photos. Data on the preferred method are summarized in Figure 6.
Caregivers stated that they preferred photography over traditional methods because it is faster, involves no physical contact with enumerators, is less stressful for children, and is less invasive and cumbersome. Respondents also mentioned that it is easier to take a photo of disabled persons, that the method is more hygienic, and that it is less likely to injure the baby than the suspended weighing scales. The main reason cited for preferring the traditional method was familiarity. It was also noted that with anthropometric methods, caregivers can immediately see the weight and length of the child, which helps them to track their child's progress in terms of height and weight gain. Some respondents, especially in Tana River, could not understand or believe how a photo could determine the malnutrition status of their child.

| CONCLUSIONS
The preliminary results presented here suggest that the toolkit for rapid assessment of nutritional epidemiology can be enhanced by including an orthogonal metric, based on facial morphology from photography. The method is inexpensive, at approximately half the cost per observation of the traditional methods.

Figure 5: Is facial photography culturally acceptable? (By county).
Figure 6: Is facial photography the preferred method? (By county).
Much of the work on facial morphology has focused on markers of attractiveness in adults, and research on child facial morphology is less common. However, Huang et al. (2015) create a predictive model of children's body weight using age and morphological facial features. The model uses a three-layer feed-forward artificial neural network with an age-based median body weight inferred from Centers for Disease Control data, along with three facial feature distances derived from facial images. The model predicts a child's body weight with a correlation coefficient of 0.94 and a mean prediction error of 0.48. MERON combines a classification model and a regression model, which individually detect a set of optimized facial features and patterns. The features are constructed from a predetermined set of facial landmarks; the patterns are self-derived by the model, based on the training data.
The vertical axis of the confusion matrix represents the actual classification of the image, while the horizontal axis represents the model's classification. If, for example, the model accurately classified every single image, all the off-diagonal terms would be zero. The confusion matrix shows the number of observations in each category, from the total testing sample of 838 observations. The results show that the model predicts the normal category with the highest accuracy, followed by MAM. SAM accuracy is low, reflecting the distribution of the data used to train the model. Image augmentation was applied to the under-represented MAM and SAM pictures to up-sample those categories for model training. Even after optimizing the model hyperparameters, the lack of sample data on MAM and SAM children constrained accuracy. During model optimization, there was a trade-off between accuracy in the normal class and accuracy in the under-represented malnutrition classes. As standard MUAC measurements were taken during the SMART surveys, we can compare our model with MUAC as a predictor of WFL/WFH classifications. A WFH or WFL Z-score greater than −2 is considered normal; a score between −3 and −2 indicates moderate acute malnutrition (MAM); a score of less than −3 indicates severe acute malnutrition (SAM). Our model results are compared against these classifications.
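The Z-score cut-offs map to categories as follows; this is a minimal sketch using exactly the thresholds stated above:

```python
def classify_whz(z):
    """Map a WFH/WFL Z-score to a malnutrition category:
    > -2 normal, -3 to -2 MAM, < -3 SAM."""
    if z > -2:
        return "normal"
    if z >= -3:
        return "MAM"
    return "SAM"

print([classify_whz(z) for z in (-0.5, -2.4, -3.6)])  # → ['normal', 'MAM', 'SAM']
```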