Validation of maternal report of nutrition‐related interventions and counselling during antenatal care in southern Nepal

Abstract The delivery of nutrition‐related interventions and counselling during antenatal care is critical for a healthy pregnancy for both mother and child. However, the accuracy of maternal reports of many of these services during household surveys has not yet been examined. Our objectives were to assess the validity of the maternal reports of 10 antenatal nutrition interventions, including counselling, and examine associations between maternal characteristics and accuracy. Maternal report of services received, collected during a post‐partum survey, was compared to the gold standard, the direct observation of all women's antenatal care visits. Individual‐level validity was assessed by calculating indicator sensitivity, specificity and area under the receiver operating characteristic curve (AUC). The inflation factor (IF) measured population‐level bias. For five indicators, the high true coverage limited our ability to assess the validity of the maternal reports. There were no indicators that had both high individual‐level validity (AUC > 0.70) and low population bias (0.75 < IF < 1.25). Indicators with greater true coverage estimates had higher sensitivity and lower specificity estimates compared to those indicators with lower true coverage. There were no maternal characteristics associated with the accuracy of the report. Maternal report of antenatal nutrition‐related interventions and counselling during household surveys was found to have variable validity across indicators. Additional research in settings with varying coverage levels should be considered to best inform antenatal care coverage measurement in household surveys.


| INTRODUCTION
Adequate nutrition during pregnancy is vital for the health of both mothers and infants. Poor quality of diet, including lack of diversity of foods consumed, and low caloric intake are two routes to malnutrition during pregnancy, although the two are not mutually exclusive. Multiple nutrient deficiencies, such as those of iron and calcium, have been associated with poor maternal and infant outcomes, including pregnancy-induced hypertension, preterm birth, low birth weight and death (Christian et al., 2008). Low maternal body mass index (BMI), defined as <18.5 kg/m², and inadequate maternal weight gain have also been linked to poor birth outcomes (Christian et al., 2008; Han, Lutsiv, et al., 2011). Furthermore, it has been estimated that maternal undernutrition contributes to 800,000 neonatal deaths each year.
Micronutrient supplementation, deworming and counselling on diet and healthy behaviours during pregnancy are commonly delivered through antenatal care (ANC) to improve health and nutrition outcomes in both mothers and infants (WHO, 2016). Calcium supplementation has been shown to reduce preterm birth in women with low calcium intake and the incidence of preeclampsia and eclampsia, which are a leading cause of maternal mortality (Hofmeyr et al., 2018; Say et al., 2014). Mass deworming during pregnancy reduces maternal anaemia by 23%, although it had no effect on birth outcomes (Salam et al., 2019). Nutrition education and counselling (NEC) commonly provide information on increasing nutrient intake during pregnancy, with an emphasis on protein and micronutrients, use of fortified foods and/or adherence to supplements (WHO, 2016; Girard & Olude, 2012). NEC can increase energy and protein intake, gestational weight gain and birth weight and decrease the risk of anaemia and preterm birth (Girard & Olude, 2012; Ota et al., 2015; Nikièma et al., 2017; Demilew et al., 2020). Gestational weight gain that does not comply with National Academy of Medicine guidelines is associated with poor birth outcomes: small for gestational age births and preterm birth for weight gain below recommended levels, and large for gestational age, macrosomia and caesarean delivery for weight gain above recommended levels (Goldstein et al., 2017). NEC can also improve adherence to supplementation; in a study in Nepal, when NEC was included with iron-folic acid supplementation, compliance and maternal haemoglobin levels improved compared to women receiving supplementation only (Adhikari et al., 2009). Separate from NEC, counselling about the risks of tobacco and alcohol use during pregnancy, including reduced birth weight and size (Abraham et al., 2017; Nykjaer et al., 2014), is also recommended (WHO, 2016).

In many low- and middle-income countries (LMIC), including Nepal, large population-based surveys are used to collect data on intervention coverage, including ANC services. The Demographic and Health Surveys (DHS) Program has conducted over 400 surveys in more than 90 countries since 1984 (The DHS Program, 2021). However, many of the indicators for nutrition-related interventions and counselling during ANC have not yet been validated to determine if maternal report produces accurate data. In the DHS, data on ANC are collected by asking women with a live birth in the last 5 years (starting with DHS8, 3 years) to recall the interventions they received during pregnancy. It is necessary that the current coverage of these interventions is accurately assessed to identify the populations in need, to track progress in improving coverage, and to plan future health programming accordingly. The DHS Program revises the questionnaires every 5 years, which offers an opportunity to add, drop or improve questions based on newly generated evidence.
Previous validation studies examining indicators of ANC, labour and delivery, post-natal care or care-seeking for child illness have demonstrated that the accuracy of answers obtained by maternal recall is varied (Carter et al., 2018; McCarthy et al., 2016, 2020). One study in China examined maternal recall of ANC compared to medical records; however, this study did not examine the validity of nutrition services, apart from weight measurement, or of NEC, only family planning advice (Liu et al., 2013). Another study examining indicators of ANC in Bangladesh, Cambodia and Kenya found higher validity for observable actions, such as weight measurement, than for counselling (McCarthy et al., 2020). That study compared direct observation to maternal report at an exit interview immediately following the observation. A study in the same study area in Sarlahi, Nepal, assessed maternal recall of birthweight to identify low birth weight infants. The authors reported low sensitivity for maternal recall of this information (Chang et al., 2018).
This study aimed to assess the validity of the maternal reports of receipt of antenatal nutrition-related interventions and NEC. A secondary aim was to examine any maternal characteristics associated with the accurate recall of the nutrition-related interventions and NEC.

| Study sites and participants
The study population included pregnant women who presented for their first ANC visit at one of five public health posts in two municipalities in Sarlahi district, Nepal, between December 2018 and November 2019. Sites were within the Nepal Nutrition Intervention Project Sarlahi (NNIPS) study area, which is located in Nepal's Province 2. The sites were chosen based on ANC client caseload and accessibility, and were limited to two municipalities for approval purposes. This province has the greatest proportion of women with a low BMI (<18.5 kg/m²), short height (<145 cm) and anaemia (<11.0 g/dl for pregnant women, <12.0 g/dl for nonpregnant women) of all seven provinces, indicating a great need for nutrition interventions (Nepal MoH, 2017). The Sarlahi district is located in the Southern Terai region, bordering the Indian state of Bihar. Women were considered eligible if they were married, 15 years old or older, lived in the study area at the time of enrolment and did not plan to leave the study area during the study period. Women who had already received ANC for this pregnancy, had received an ultrasound scan during this pregnancy, or were planning to leave the study area were deemed ineligible.

Key messages
• The coverage of nutrition-related interventions and counselling was high (>89%) for the majority of indicators.
• Maternal reports resulted in higher coverage estimates than what was observed for all but one indicator.
• The greater the indicator's true coverage estimate, the greater the sensitivity and lower the specificity values.

The target sample size of 300 women for the validation study was established using the estimated iron-folic acid coverage of 50% from the 2016 DHS, as this was one of the primary indicators of interest (Bryce et al., 2021; Nepal MoH, 2017). This would allow for a 0.13-wide 95% confidence interval around an area under the curve (AUC) estimate of 0.50. The study aimed to enrol 450 women to account for loss to follow-up, including women who sought ANC elsewhere or did not have a live birth.
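The precision target above can be checked with a short calculation. The sketch below assumes the Hanley and McNeil (1982) approximation for the standard error of an AUC; the text does not say which method the authors used, so this is only one plausible reconstruction, and the function names are illustrative.

```python
import math


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def auc_ci_width(auc: float, n_total: int, prevalence: float, z: float = 1.96) -> float:
    """Full width of the normal-approximation confidence interval around an AUC."""
    n_pos = round(n_total * prevalence)  # women who truly received the service
    n_neg = n_total - n_pos              # women who truly did not
    return 2.0 * z * hanley_mcneil_se(auc, n_pos, n_neg)


# Inputs stated in the text: 300 women, 50% coverage, AUC of 0.50.
width = auc_ci_width(auc=0.50, n_total=300, prevalence=0.50)
print(f"95% CI width: {width:.3f}")
```

Under these assumptions the width comes out at roughly 0.13, in line with the figure quoted in the text.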

| Data collection
Direct observation of ANC visits was used to establish the gold standard for the validation of maternal reports of nutrition-related interventions and counselling. Trained study staff observed all ANC visits the enrolled participants attended at one or more of the five health posts. Data collectors used a 28-item checklist to record whether each specific intervention was 'provided' or 'not provided'. Before study initiation, the data collector training included didactic instruction and practice with the checklist using prerecorded videos of mock ANC visits. Additionally, the data collectors observed real ANC visits using the checklist, and their records were compared to the trainer's record of the same visit.
Study staff also administered a short demographic questionnaire at enrolment and a brief follow-up questionnaire at each subsequent ANC visit. The follow-up questionnaire inquired about care-seeking between observed visits. The specific question asked was 'Since the last time NNIPS staff talked to you up until now, have you received advice from any health providers about nutrition for your pregnancy?'. This was done to attempt to improve the gold standard by identifying a subset of participants where all interventions provided were observed by the study team. Post-partum interviews to assess the maternal reports of interventions received were conducted on average 6 months after delivery at the woman's home or her familial home, called her maiti. During this interview, data were also collected on socioeconomic status and pregnancy outcome.
Following the interview, the study staff noted whether there was anyone else (e.g., husband, mother-in-law) present during the interview and whether the study staff felt the individual helped answer the questions 'never, a little or a lot'. Enrolment was completed by November 2019 and direct observation of subsequent ANC visits continued through mid-March 2020, when all nonemergency health services were disrupted by the COVID-19 pandemic. The COVID-19 shutdown delayed a portion of the post-partum interviews, resulting in some recall periods greater than 6 months. There were 26 women who had not yet delivered at the time of the shutdown. Given that only emergency services were still offered at the health posts, we do not believe that we missed any routine ANC. The post-partum interviews resumed on 8 May 2020, but were halted again 5 days later (18 were completed during this time). The post-partum interviews resumed again on 8 August 2020 and were completed in November 2020.

| Analysis
The sensitivity (Se) and specificity (Sp) were calculated from 2 × 2 tables comparing the gold standard direct observation during ANC to the post-partum maternal report of interventions provided during ANC with 95% confidence intervals assuming a binomial distribution.
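As an illustration of the 2 × 2 calculation, the sketch below computes sensitivity and specificity with 95% confidence intervals. The text specifies only that a binomial distribution was assumed; the Wilson score interval used here is one common binomial interval and is an assumption, as are the function names and the example counts.

```python
import math
from typing import NamedTuple


class Proportion(NamedTuple):
    estimate: float
    lower: float
    upper: float


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Proportion:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z ** 2 / total
    centre = (p + z ** 2 / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z ** 2 / (4.0 * total ** 2)) / denom
    return Proportion(p, max(0.0, centre - half), min(1.0, centre + half))


def sensitivity_specificity(tp: int, fn: int, fp: int, tn: int):
    """Rows are the observed (gold standard) status, columns the maternal report."""
    se = wilson_interval(tp, tp + fn)  # reported 'yes' among observed 'provided'
    sp = wilson_interval(tn, tn + fp)  # reported 'no' among observed 'not provided'
    return se, sp


# Hypothetical counts, not taken from the study tables.
se, sp = sensitivity_specificity(tp=180, fn=20, fp=60, tn=40)
print(f"Se = {se.estimate:.2f} ({se.lower:.2f}, {se.upper:.2f})")
print(f"Sp = {sp.estimate:.2f} ({sp.lower:.2f}, {sp.upper:.2f})")
```

In this made-up table the specificity rests on only 100 observed negatives, so its interval is noticeably wider than that of the sensitivity.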
The calculations based on a small number of true positives or true negatives are presented but flagged to be interpreted with caution. This is because these estimates have a high degree of uncertainty (95% confidence intervals greater than 15 percentage points) (McCarthy et al., 2020). The area under the receiver operating characteristic curve (AUC) is typically used to compare cut-offs for diagnostic tests by plotting the sensitivity against 1-specificity, but in this case, it represents a summary measure of individual-level validity. An AUC equal to 0.50 represents an indicator performing as well as a random guess and an AUC equal to 1 represents perfect validity. Previous validation studies have used different AUC cut-offs to represent high individual-level validity, for example, AUC ≥ 0.60, AUC ≥ 0.67 or AUC ≥ 0.70 (Chang et al., 2018; Liu et al., 2013; McCarthy et al., 2018; Stanton et al., 2013), and for this study, we determined a priori a cut-off of AUC ≥ 0.70 as high individual-level validity.
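For a single yes/no indicator the ROC curve has only one operating point, so the area under it reduces to the mean of sensitivity and specificity. This identity is standard but is not stated in the text, so treat the sketch below as an assumption about how the summary measure behaves; the helper names are illustrative. The cut-offs (AUC ≥ 0.70, intervals wider than 15 percentage points) are those given above.

```python
def binary_auc(sensitivity: float, specificity: float) -> float:
    """AUC of a one-point ROC curve: the mean of Se and Sp."""
    return (sensitivity + specificity) / 2.0


def has_high_individual_validity(auc: float, cutoff: float = 0.70) -> bool:
    """Apply the a priori cut-off described in the text."""
    return auc >= cutoff


def interpret_with_caution(lower: float, upper: float, max_width: float = 0.15) -> bool:
    """Flag estimates whose 95% CI is wider than 15 percentage points."""
    return (upper - lower) > max_width


# Values reported in this article for the study in rural China (Liu et al., 2013).
auc = binary_auc(sensitivity=0.98, specificity=0.0)
print(round(auc, 2), has_high_individual_validity(auc))
```

With a sensitivity of 0.98 and a specificity of 0.0, the identity gives 0.49, which matches the AUC the article quotes for that study.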
The population-level validity of the indicators was assessed by estimating the inflation factor (IF). The IF represents whether the indicator coverage measured by the survey would be over or underestimated in the setting, calculated by dividing the study coverage (Pr) by the true coverage (P). The true coverage is from the gold standard direct observation. The study coverage is calculated using the indicator sensitivity and specificity in the following equation (Vecchio, 1966):

$$Pr = P \times Se + (1 - P) \times (1 - Sp)$$

An IF equal to 1.00 indicates that the study coverage generated by the survey question is equal to the true coverage. An IF between 0.75 and 1.25 indicates low population bias. Additionally, for each indicator, the measured coverage values (Pr) were plotted across a range of true coverage (P) values. This is done to illustrate whether the survey measure over- or under-estimates the true coverage, given the value of the true coverage.
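A minimal sketch of the inflation factor calculation follows. It assumes the usual relation from the validation literature, Pr = P × Se + (1 − P) × (1 − Sp), with IF = Pr / P; the sensitivity and specificity values are hypothetical, and the loop stands in for the plot of Pr across a range of P.

```python
def measured_coverage(p: float, se: float, sp: float) -> float:
    """Coverage a survey would report: true positives plus false positives."""
    return p * se + (1.0 - p) * (1.0 - sp)


def inflation_factor(p: float, se: float, sp: float) -> float:
    """Ratio of survey-measured coverage (Pr) to true coverage (P)."""
    if p <= 0.0:
        raise ValueError("true coverage must be greater than zero")
    return measured_coverage(p, se, sp) / p


def has_low_population_bias(inflation: float, lower: float = 0.75, upper: float = 1.25) -> bool:
    """Apply the 0.75 < IF < 1.25 criterion described in the text."""
    return lower < inflation < upper


# Hypothetical indicator with high sensitivity and low specificity.
se, sp = 0.90, 0.40
for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    pr = measured_coverage(p, se, sp)
    ratio = inflation_factor(p, se, sp)
    print(f"P = {p:.2f}  Pr = {pr:.2f}  IF = {ratio:.2f}  low bias: {has_low_population_bias(ratio)}")
```

Running the loop shows the pattern the plots are meant to reveal: for an indicator with low specificity, over-estimation is largest when true coverage is low and shrinks as true coverage approaches 100%.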
As a sensitivity analysis, the validation analyses were rerun in the subcohort of women who did not report receiving advice or services from another health provider regarding nutrition for their pregnancy between ANC observations. The follow-up questionnaire did not ask explicitly about what type of advice or about deworming receipt or weight measurement between observations. This sensitivity analysis was run in this smaller cohort because we are more confident that we observed all care, which serves as a truer gold standard.
Bivariable and multivariable log-binomial regressions were run to assess whether there were maternal characteristics associated with accurate responses. The binary variable of 'accuracy' for each indicator was coded as 'accurate' for a maternal response in agreement with the gold standard direct observation. Ethnic group was considered, but all but one enrolled woman was Madeshi, so this variable was dropped due to lack of variation.
The overall validation study sample size was an estimated 300 pregnant women, assuming a 50% prevalence for iron-folic acid supplementation, to establish a 0.13-wide 95% confidence interval for an AUC equal to 0.50. To allow for loss to follow-up, ANC visits that were not observed, and adverse birth outcomes, the study aimed to enrol 450 women in total. All analyses were conducted using Stata Version 14.
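A log-binomial model estimates prevalence ratios. With a single binary covariate the model is saturated, so its exponentiated coefficient equals the simple ratio of the two proportions of accurate responses. The sketch below illustrates only that bivariable case, with a Wald interval on the log scale; the counts and the covariate are hypothetical, and the multivariable models described in the text would require regression software (a binomial GLM with a log link).

```python
import math


def prevalence_ratio(acc_exposed: int, n_exposed: int,
                     acc_unexposed: int, n_unexposed: int,
                     z: float = 1.96):
    """Ratio of the proportion 'accurate' in two groups, with a Wald CI."""
    if min(acc_exposed, acc_unexposed) <= 0:
        raise ValueError("each group needs at least one accurate response")
    p1 = acc_exposed / n_exposed
    p0 = acc_unexposed / n_unexposed
    ratio = p1 / p0
    # Delta-method standard error of log(ratio).
    se_log = math.sqrt(
        1.0 / acc_exposed - 1.0 / n_exposed
        + 1.0 / acc_unexposed - 1.0 / n_unexposed
    )
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical: accurate responses among women with and without formal education.
ratio, lower, upper = prevalence_ratio(120, 200, 130, 234)
print(f"PR = {ratio:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

A ratio whose interval includes 1.00 would be read as no detectable association between the characteristic and accuracy.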

| RESULTS
Of the 441 women enrolled in the study, 434 women completed the post-partum interview. Among these, 168 (38.7%) reported receiving nutrition-related counselling at some point during pregnancy outside of the five study clinics (Figure 1). Table 1 presents characteristics of enrolled women by those who did and did not report receiving care between visits. Nearly 60% of participants reported zero years of education and one-third of women reported zero prior live births. Compared to those who did not, women who received nutrition-related counselling between visits attended more observed ANC visits, enrolled earlier in pregnancy, had some formal education, and were more likely to have a prior live birth.
The observed and reported coverage of the indicators of interest is presented in Table 2. For all but the two weight-related indicators, the coverage measured by the maternal report was greater than the observed coverage.

The sensitivity analysis among pregnant women who reported never receiving nutrition advice between ANC observations is presented in Table S1. All women reported receiving information about a diverse diet, so a 2 × 2 table could not be constructed for this indicator. The restriction improved specificity in some cases; for example, specificity for the deworming indicator increased from 30.7% to 41.8%. However, for the majority of indicators, the summary-level AUC and IF values did not change by more than a percentage point.
None of the maternal characteristics was associated with the accuracy of reporting for any indicator (Table 4). Maternal age, education and having a previous live birth did not have a consistent direction of association with accuracy, nor were the magnitudes of association large or statistically significant. Compared to the lowest quartile, the higher quartiles were associated with slightly more accuracy for the 10 indicators, although this association was not statistically significant.

Within the counselling topics examined in our study, the sensitivity of the nutrition-specific counselling was higher than that of nausea management or substance use, indicating that women who were observed receiving nutrition-related counselling also recalled receiving this information, likely indicating a successful information transfer. The nutrition counselling was often communicated with a visual aid, for example, a flip chart, which may have resulted in more effective communication than nausea or substance use counselling (Odackal et al., 2020).
The indicators with higher true coverage values were found to have higher sensitivity and lower specificity in our study population. The authors of another study in the same area found a similar relationship, concluding that the high coverage of a service may result in a woman reporting receipt because she assumed she should have received it, rather than actually recalling its receipt (Carter et al., 2021).
The validity of maternal recall of weight measurement has been examined in two prior studies. The study in rural China, which compared maternal reports 2-5 years after delivery to paper and electronic-based health records, also had a very high prevalence of weight measurement during pregnancy (98%) and reported findings nearly identical to ours: a sensitivity of 0.98, a specificity of 0.0 and an AUC equal to 0.49 (Liu et al., 2013). The similarity between the two sets of findings is likely driven by the near-complete coverage in both populations. With almost complete coverage, the specificity is based on so few observations that a small number of false positives can drive it downward. The analysis of a three-country survey reported better individual-level validity for weight measurement than in our study (McCarthy et al., 2020). However, these results compared maternal reports at exit interviews to direct observation, whereas the recall period in our study was approximately 6 months post-partum and on average 10 months following a woman's last ANC visit. The three-country survey assessed recall of advice on diet and nutrition in a Bangladeshi population, where women were able to recall more accurately immediately following a visit compared to our longer post-partum recall period (McCarthy et al., 2020). However, we did not find an association between the length of the recall period and accuracy.
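The point about near-complete coverage can be made concrete with simple arithmetic. The sketch below is illustrative only: it combines the 434 completed interviews from this study with the 98% coverage reported for the study in rural China, which is not the observed coverage of any indicator here, and the function name is hypothetical.

```python
def specificity_by_false_positives(n_total: int, true_coverage: float):
    """Number of true non-recipients and the specificity for each possible
    count of false positives among them."""
    n_negative = round(n_total * (1.0 - true_coverage))
    if n_negative == 0:
        raise ValueError("no true negatives: specificity is undefined")
    steps = [(n_negative - fp) / n_negative for fp in range(n_negative + 1)]
    return n_negative, steps


n_negative, steps = specificity_by_false_positives(n_total=434, true_coverage=0.98)
print(f"Women who truly did not receive the service: {n_negative}")
for false_positives, sp in enumerate(steps):
    print(f"{false_positives} false positives -> specificity {sp:.2f}")
```

With so few true non-recipients, each additional woman who reports a service she did not receive moves the specificity by more than 10 percentage points.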
The three-country validation study that compared direct observation of antenatal and post-natal care services to exit interviews reported that women were better able to recall concrete interventions, such as blood pressure measurements, than topics discussed in counselling (McCarthy et al., 2020). For the indicators with adequate confidence intervals assessed in our study, comparing the concrete intervention, received deworming medication, to the counselling topics, management of nausea and substance use, the differences are not as apparent. The sensitivity of receipt of deworming medication is greater than the counselling topic indicators, but its lower specificity results in an AUC nearly equal to that of both substance use indicators. The higher sensitivity could be due to a number of factors.
Women may be able to recall the concrete indicator better than the counselling, as McCarthy et al. posited. The true coverage of deworming was greater than that of the counselling topics, which could lead women to more frequently report its receipt, as described earlier (Carter et al., 2021). This hypothesis would also help explain the lower specificity of the deworming medication indicator, where women report receiving the intervention because they presume to have received it instead of actually recalling its receipt.
After marriage in Nepal, women typically move in with their husbands' families, where the mothers-in-law preside over many of the household decisions. For first pregnancies, however, some women return to their familial home or maiti. Previous studies in Nepal have demonstrated that mothers-in-law have particular influence over antenatal and perinatal care decisions (Masvie, 2006; Simkhada et al., 2010). Anecdotally, many of the women in our population attended ANC with their mothers or mothers-in-law, which could have had an impact on the pregnant woman's intake of information and resulted in lower recall accuracy. It is possible that the mothers-in-law engaged with the provider during the counselling session more than the pregnant woman herself, which could reduce the pregnant woman's ability to recall receiving this information post-partum.
Unfortunately, we did not record who else was present at the ANC visits. We did measure whether the woman received help answering the questions at the post-partum interview, though when included in the model (data not shown) this factor had no association with accuracy or impact on the other included variables. Additionally, whether the mother-in-law helped answer questions during the post-partum interview does not necessarily reflect her presence or role at the ANC visit. For future research, it would be interesting to examine whether the presence of specific family members during ANC has an impact on the accuracy of maternal reports.
A sensitivity analysis that was limited to women who never reported receiving nutrition-related advice elsewhere than where the study observed them did not improve validity for the majority of indicators. However, even with this restriction, we may not have produced a true gold standard; we asked about receiving advice from a health care provider, but this does not limit the inclusion of advice from peers, mothers or mothers-in-law. Therefore, it is possible that specificity was not improved because although the women in the subset did not receive advice from health providers, they may be recalling information received from other sources that we did not capture in our data collection. Furthermore, we were unable to restrict the analysis to receipt of the other nutrition services like deworming and weight measurement between visits because these were not included in the follow-up questionnaire.
In the study population, nearly all women received counselling on nutrition (diversity of diet and increasing intake) at some point during pregnancy. However, the true coverage of counselling on nausea management and not drinking or smoking was much lower. This may be because counselling on these topics is not standard for all women, rather just women who complain of nausea or who report drinking or smoking. The low specificity for these indicators could also be explained by the fact that women could have picked up this information from peers, mothers or mothers-in-law, or it may just be considered general knowledge. The proportion of women who reported receiving these counselling messages was 20-30 absolute percentage points higher than the observed proportion. Therefore, although the question in the survey asked about receiving this information at our health posts specifically, the women may inherently know the information or have heard it elsewhere and thus reported receiving it during ANC.
There were no maternal characteristics that had a statistically significant relationship with accurate maternal reports. This finding is consistent with some studies (McCarthy et al., 2018), but another examining accuracy of recall of low birth weight at the same study site found that higher education and parity were associated with accurate maternal recall (Chang et al., 2018). The number of months that had passed since the last ANC observation also had no association with maternal accuracy, which was the same finding as in the low birth weight study in Nepal (Chang et al., 2018). Our range of recall time is much shorter than the three-year range of the DHS and, therefore, our findings may not be extrapolated to the longer recall period of the DHS. However, a 6-month follow-up period was feasible resource-wise and much more reflective of a household survey than other studies that use the exit interview to collect maternal recall.
A strength of this study is that we employed direct observation by a trained study observer as the gold standard. The study observers went through detailed training and had to achieve a standard for intra- and interobserver reliability before being approved for the work. A second strength was the length of the recall period, which was more similar to the DHS and MICS recall periods of 2-5 years than that of other studies that have compared maternal reports at an exit interview. A limitation of the study was that it only captured women attending ANC at government health posts, so the findings may not be generalisable to women who attended private health facilities only or who did not attend ANC. Although we did our best to account for care-seeking outside of the study observations, a final limitation is that we were unable to observe care at every possible source in the community over the entire pregnancy.
Large household surveys like the DHS are the main source of coverage data, but these findings suggest that the accuracy of data produced by maternal recall in these surveys may be variable. The efforts to strengthen electronic health records and information systems in these settings could offer an alternate measurement method; however, they do not currently capture counselling provision during ANC.
Updating these systems to include counselling measurement should be a component in future efforts to strengthen national health systems. Routine health data could be used in conjunction with data generated by household surveys to best inform future counselling coverage measurement.

| CONCLUSION
This study adds to the growing evidence base demonstrating that there is variability in how accurately a woman can recall services received during ANC. The measurement of the 10 indicators by the maternal report had low to moderate individual-level validity and low to high population-level bias. The high coverage of five of the 10 indicators limited our certainty surrounding these estimates and they should be examined in additional settings across a range of true coverage.