The reliability of weight‐for‐length/height Z scores in children

Abstract
The World Health Organisation (WHO) recommends weight-for-length/height (WFL/H), represented as a Z score, for diagnosing acute malnutrition among children aged 0 to 60 months. Under controlled conditions, weight, height and length measurements have a high degree of reliability. However, their reliability when combined into a WFL/H Z score (WFL/Hz) is unclear across settings. We conducted a systematic review of published studies assessing the reliability of WFL/Hz, searching PubMed and Google Scholar. Studies were included if they presented reliability scores for the derived index of WFL/Hz for children under 5 years. Meta-analysis was conducted to obtain a pooled estimate of reliability overall, and for children above and below 24 months old. Twenty-six studies on the reliability of anthropometry were identified, but only three, all community-based studies, reported reliability scores for WFL/Hz. The overall pooled intra-class correlation coefficient (ICC) estimate for WFL/Hz among children aged 0 to 60 months was 0.81 (95% CI 0.64 to 0.99). Among children aged less than 24 months, the pooled ICC estimate from two studies was 0.72 (95% CI 0.67 to 0.77), while the estimate reported for children above 24 months from one study was 0.97 (95% CI 0.97 to 0.99). Although WFL/Hz is recommended for the diagnosis of acute undernutrition among children below 5 years, information on its reliability across settings is sparse. In community settings, the reliability of WFL/Hz is considerably lower than that of absolute measures of weight and length/height, especially in younger children. The reliability of WFL/Hz needs further evaluation.


Introduction
Childhood undernutrition is responsible for approximately 3.1 million child deaths each year (Black et al. 2013) and between 11 and 41% of hospital admissions (Bejon et al. 2008). The global prevalence of severe acute undernutrition (SAM) has been estimated at 19 million children (Black et al. 2008; Kerac et al. 2011). For the assessment of acute undernutrition among children aged 0 to 60 months, the WHO recommends the use of weight-for-length/height (WFL/H) represented as a Z score or a percentile. In children aged 6 to 60 months, unadjusted MUAC is also recommended (WHO & UNICEF 2009). Weight and length/height are measured separately before being converted to WFL/Hz by consulting look-up tables or using computer software. It is possible that converting weight and length/height into WFL/Hz results in lower reliability for WFL/Hz than for the individual measures, because the conversion incorporates errors from both measures. In anthropometric assessment, reliability is defined as the consistency of results when repeated examinations are performed by the same (intra-observer) or different (inter-observer) observers under the same conditions (Hennekens & Buring 1987). Reliability of WFL/Hz is an important requirement, as it directly affects the admission and discharge criteria for children in nutrition interventions.
Previous studies of reliability have been conducted within carefully controlled hospital or research environments using highly qualified and carefully trained health workers, and have reported high reliability scores for absolute measures of weight, length/height and MUAC (WHO 2006; Johnson et al. 2009). However, the reliability of derived indices such as WFL/Hz and HFAz has generally not been reported. In resource-poor countries, community health workers (CHWs) are used to deliver community health services, including anthropometric assessment, basic nutrition counseling and education at household level (Ministry of Health 2006). We set out to review published literature on the reliability of WFL/H measured by community health workers, where the WFL/H Z score conversion was performed using computer software to eliminate look-up errors.

Objective
To evaluate the inter-observer variation of the WHO recommended anthropometric criteria for assessing acute undernutrition (weight-for-length/height) in children aged 0 to 60 months.

Search strategy
The search was conducted on PubMed using the following search terms: '(use OR reliability OR repeatability OR "inter-observer" OR "inter-observer" OR "inter observer") and (anthropom* OR "weight for height" OR "weight for length") and (infants OR children)'. From the search results, relevant titles and abstracts were identified and those that were obviously irrelevant were excluded. The remaining articles were reviewed to identify those reporting reliability scores for derived indicators of wasting in children. Additional studies not picked up by the search strategy were identified by reviewing the reference lists of the full texts.

Inclusion/exclusion criteria
Reliability studies were considered in this review if they reported a reliability estimate for WFL/H using the technical error of measurement (TEM¹) and/or the intra-class correlation coefficient (ICC²). The pooled ICC was calculated by meta-analysis using a random-effects model, assuming heterogeneity between sites measuring different populations (Shrout & Fleiss 1979).
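The two reliability statistics named above can be sketched for a pair of repeated measurements per child. This is a minimal illustration, not the computation used in the included studies: it assumes the standard two-measurement TEM formula and the one-way random-effects ICC(1,1) of Shrout & Fleiss (1979), and the Z scores in `observer_a` and `observer_b` are hypothetical values invented for the example.

```python
import math


def tem_two_measurements(first, second):
    """Technical error of measurement for paired repeat measurements.

    TEM = sqrt(sum(d_i^2) / (2 * n)), where d_i is the difference
    between the two measurements taken on child i.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two equally sized, non-empty lists")
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * len(first)))


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1).

    `ratings` holds one row per child; each row holds k repeated
    measurements of that child.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 children with the same k >= 2 repeats")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-child and within-child mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical WFL/H Z scores for five children, each measured twice.
observer_a = [-1.2, 0.3, -2.1, 0.8, -0.5]
observer_b = [-1.0, 0.1, -2.4, 1.0, -0.7]

tem_value = tem_two_measurements(observer_a, observer_b)
icc_value = icc_oneway([list(pair) for pair in zip(observer_a, observer_b)])
print(f"TEM = {tem_value:.3f} Z-score units, ICC = {icc_value:.3f}")
```

With two measurements per child, the within-child mean square equals the square of the TEM, so the two statistics describe the same measurement error on different scales: TEM in the units of the measure, ICC relative to the spread between children.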

Results
Of the 390 articles identified in the initial search, 23 reported the reliability of weight and/or height/length among children. Three additional studies were identified from reviewing the reference lists of the identified articles (Warner 2000; Morris & Flores 2002; Jamaiyah et al. 2010). Of these 26 publications, 21 reported reliability scores for absolute anthropometric measures such as weight, height/length, mid-upper arm circumference and head circumference, but not for composite indices such as WFA or WFL/H (Engstrom 1988; Bhushan & Paneth 1991; Voss et al. 1991; Rosenberg et al. 1992; Voss & Bailey 1994; Doull et al. 1995; Johnson et al. 1997; Johnson et al. 1998; Poustie et al. 2000; Warner 2000; Bradley et al. 2001; Morris & Flores 2002; Vegelin et al. 2003; Wang et al. 2003; WHO 2006; Frainer et al. 2007; Johnson et al. 2009; Jamaiyah et al. 2010; Ngirabega et al. 2010; Stomfai et al. 2011; West et al. 2011). One study reported reliability scores for WFA (Lima et al. 2010) and another for body-mass index (BMI) (Oza-Frank et al. 2012), but neither reported them for WFL/Hz. Three studies (Velzeboer et al. 1983; Ayele et al. 2012b; Mwangome et al. 2012) reported reliability estimates for WFL/H, thus meeting the inclusion criteria, and were included in the review (Fig. 1). In the Velzeboer study (Velzeboer et al. 1983), the confidence intervals (CI) for the ICC estimates were not provided in the published article, so they were recalculated using the formulae provided on page 158 of her thesis (Velzeboer 1979).
In all the studies, community members (farmers, community health workers or health promoters) with minimal knowledge of anthropometry were trained for not less than 2 days on how to measure weight, length/height and MUAC among children. No formal standardization tests were done; instead, a practical session was organized in which the trainer assessed measuring technique by observing and correcting skills and by comparing their own measures with the trainees' measures. Thereafter, each measurer was instructed to take repeated measures of children individually.
In all the studies, the ICC for the composite measure of WFL/Hz was considerably lower than that of the single measures of weight and length/height (Table 1). For Ayele's study (Ayele et al. 2012b), age-specific reliability scores (above and below 24 months) were made available (Ayele et al. 2012a); thus the pooled ICC estimate for WFL/Hz among children below 24 months was 0.72 (95% CI 0.67 to 0.77), while that for children below 5 years was 0.81 (95% CI 0.64 to 0.99) (Fig. 2). Only one study (Ayele et al. 2012b) presented reliability scores for WFL/Hz estimated by TEM (Table 1).

¹ The smaller the TEM, the more reliable the measure.
² ICC is a number between 0 and 1; the closer the estimate is to 1, the better the reliability.

Key messages
• For the assessment of acute undernutrition among children aged 0 to 60 months, the use of weight-for-length/height (WFL/H) represented as a Z score or a percentile is recommended.
• There are more published articles reporting inter-observer reliability estimates for absolute weight and height/length measures than reliability estimates for WFL/Hz.
• All studies reported high reliability scores for absolute measures of weight and length/height, but this did not translate to high scores for the combined index of WFL/Hz.
• Future studies on the reliability of anthropometric measures should present data on derived indicators of undernutrition to validate their usability across different settings.

Discussion
There are no published data on the reliability of WFL/Hz as evaluated under strictly controlled conditions. The studies identified in this review assessed the reliability of WFL/Hz within community settings. We found more articles reporting inter-observer reliability estimates for absolute measures than reliability estimates for indicators of undernutrition. The limited data may explain the wide confidence interval observed for the pooled ICC estimate for all the studies included in this review (Fig. 1). More data on the reliability of indicators of undernutrition among children need to be generated to improve our understanding of their usability.
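How a small number of heterogeneous studies leads to a wide pooled interval can be sketched with a random-effects calculation. The sketch below assumes the DerSimonian-Laird estimator applied to untransformed ICC values, with standard errors recovered from symmetric 95% confidence intervals; the review does not state which estimator was used, so this is an illustration rather than a reproduction. The three study estimates are hypothetical and are not the values from the included studies.

```python
import math


def pool_random_effects(estimates, ci_lowers, ci_uppers):
    """DerSimonian-Laird random-effects pooling of study estimates.

    Standard errors are derived from 95% CIs, assuming each interval is
    symmetric and based on a normal approximation (an assumption).
    Returns (pooled, ci_lower, ci_upper, tau_squared).
    """
    k = len(estimates)
    variances = [((u - l) / (2 * 1.96)) ** 2
                 for l, u in zip(ci_lowers, ci_uppers)]
    weights = [1.0 / v for v in variances]

    # Fixed-effect estimate and Cochran's Q heterogeneity statistic.
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))

    # Between-study variance, truncated at zero.
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau_squared = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to every study's variance.
    re_weights = [1.0 / (v + tau_squared) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau_squared


# Hypothetical ICC estimates and 95% CIs from three studies.
icc = [0.70, 0.75, 0.95]
lower = [0.62, 0.68, 0.93]
upper = [0.78, 0.82, 0.97]

pooled, pooled_lo, pooled_hi, tau2 = pool_random_effects(icc, lower, upper)
print(f"pooled ICC = {pooled:.2f} (95% CI {pooled_lo:.2f} to {pooled_hi:.2f})")
```

When the between-study variance is large relative to the within-study variances, the random-effects weights become nearly equal and the pooled interval is wider than that of any single study, which is the pattern expected when only a few dissimilar studies are available.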
Previously, it was assumed that acceptable intra- and inter-observer reliability estimates for weight and length/height could be translated to acceptable reliability scores for WFL/Hz. Data from this review do not support this assumption. In all three studies, high reliability scores were reported separately for absolute measures of weight and length/height but not for the combined index of WFL/Hz. It is possible that WFL/Hz is sensitive to individual variations in the component measures of weight and length (Mwangome et al. 2012) and that these individual variations are compounded when weight and length are presented as a ratio, resulting in a lower reliability score for WFL/H (Velzeboer et al. 1983). This hypothesis would need to be validated using data from more controlled conditions. The reliability scores for WFLz in younger children (0 to 24 months) appear to be lower than those for WFHz among older children (24 to 59 months); the confidence interval for the pooled ICC in younger children (<24 months) and that for the ICC in older children (>24 months) did not overlap. This observation indicates increased levels of measurement error among the younger children, likely because they cannot cooperate with measurements as well as the older children. Additionally, for infants aged below 6 months, the infantile position may hinder the reliable measuring of length. Studies evaluating the reliability of anthropometry among children within community settings have indicated a higher likelihood of variation in measuring length compared to weight (Velzeboer et al. 1983; Mwangome et al. 2012) and have pointed to the complexity of technique, equipment and the nature of the infant as possible sources of error (Velzeboer et al. 1983; Walker et al. 2013). It may be that ensuring reliable measures of length/height through more intensive training of health workers, using paired observers to compare readings (WHO 2006), using electronics for direct data entry and using simplified look-up Z score charts (Kerac et al. 2009) will increase the reliability of WFL/Hz.
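The compounding hypothesis can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of the reviewed data: a hypothetical linear weight-for-length relationship stands in for the WHO reference, the index is the standardized residual of weight given length rather than a true WFL/H Z score, and all parameter values (`intercept`, `slope`, the spreads and the measurement error sizes) are invented for the example.

```python
import random


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) for rows of k repeated measurements."""
    n, k = len(ratings), len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def simulate_reliability(n_children=500, seed=1):
    """Return ICCs for weight, length and a derived weight-for-length index."""
    rng = random.Random(seed)
    intercept, slope = -6.0, 0.2   # hypothetical expected weight (kg) by length (cm)
    sd_resid = 0.8                 # hypothetical true spread of weight at a given length (kg)
    err_weight, err_length = 0.1, 1.0  # hypothetical measurement error SDs (kg, cm)

    weights, lengths, indices = [], [], []
    for _ in range(n_children):
        true_length = rng.gauss(75.0, 8.0)
        true_weight = intercept + slope * true_length + rng.gauss(0.0, sd_resid)
        w_pair, l_pair, i_pair = [], [], []
        for _ in range(2):  # two independent measurement occasions
            w = true_weight + rng.gauss(0.0, err_weight)
            l = true_length + rng.gauss(0.0, err_length)
            w_pair.append(w)
            l_pair.append(l)
            # Derived index: measured weight relative to that expected
            # for the measured length, in SD units.
            i_pair.append((w - (intercept + slope * l)) / sd_resid)
        weights.append(w_pair)
        lengths.append(l_pair)
        indices.append(i_pair)
    return icc_oneway(weights), icc_oneway(lengths), icc_oneway(indices)


icc_weight, icc_length, icc_index = simulate_reliability()
print(f"ICC weight={icc_weight:.3f} length={icc_length:.3f} index={icc_index:.3f}")
```

In this setup the index inherits measurement error from both weight and length, while its between-child spread is smaller than that of either component because the variation explained by length has been removed. Both effects lower the ICC of the index relative to its components, which is consistent with, though not proof of, the pattern seen in the three studies.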
Notwithstanding, the WFL/H measure has additional characteristics that make it unattractive to minimally trained health workers in poorer settings: its equipment is more costly to buy, install and maintain, and it takes a lot of time to measure and interpret, as weight and length/height are measured separately and health workers are required to look up a table to interpret the result (Myatt et al. 2006). Thus, in addition to finding a reliable measure of length/height in children, research should focus on identifying a simpler and possibly more reliable assessment tool in place of WFL/Hz. More data are needed to affirm these observations.
Although high reliability of WFL/Hz does not necessarily ensure validity, low reliability means that there is an increased chance of underestimating or failing to detect undernutrition among children using this tool. This observation is an important concern for public health practitioners and policy makers.

Conclusion
Although WFL/Hz is recommended as the anthropometric criterion for the diagnosis of acute undernutrition among children below 5 years, there are hardly any data describing its reliability in either controlled or practical settings. Future studies on the reliability of anthropometric measures should present data on derived indicators of undernutrition such as WFL/Hz, WFAz and HFAz, in addition to data on absolute measures. This will inform, clarify and validate their usability across different settings. In the meantime, training in and application of more reliable anthropometric indicators of acute undernutrition in endemic settings should be encouraged, as the poor reliability of WFL/Hz among infants under field conditions may limit its interpretation in this age group.

Contributions
The study was conceived, designed and executed by MM under the supervision of JB. Both authors were involved in data acquisition, analysis, interpretation and manuscript writing. They have read and approved the final manuscript.