National routine data for low birthweight and preterm births: Systematic data quality assessment for United Nations member states (2000–2020)

Low birthweight (<2500 g) and preterm birth (<37 weeks) are markers of newborn vulnerability. To facilitate informed decisions about investments in prevention and care, it is imperative to enhance data quality and use. Hence, the objective of this study is to systematically assess the quality of data concerning low birthweight and preterm births within routine administrative data sources.


[…] age including with ultrasound assessment is becoming increasingly attainable. Moving toward the collection of individual level data would enable monitoring of quality of care and longer-term outcomes. This is crucial for every child and family and essential for measuring progress towards relevant sustainable development goals. The assessment will inform countries' actions for data quality improvement at national level and use of data for impact.

KEYWORDS
data quality, data use, low birthweight, newborn, preterm birth, routine data system

What was known?
Low birthweight and preterm birth are common, respectively affecting an estimated 19.8 and 13.4 million births in 2020, with increased mortality risks particularly in the neonatal period, and impacting human capital through the whole life course. Despite 80% of births worldwide now being in health facilities, there are missed opportunities in routinely collecting and reporting data for birthweight and gestational age.

What was done that is new?
National routine data on low birthweight and preterm birth for 195 United Nations member states from 2000 to 2020 were systematically collated into a combined database totalling >700 million live births. The WHO data quality framework was adapted and used to undertake standardised data quality assessments.

What was found?
Availability of national routine data for low birthweight and preterm birth increased from 2000 to 2020. Nearly two-thirds of member states (124/195) have low birthweight data available in the latest year, compared with 40% (82/195) for preterm birth. Low birthweight data reporting increased consistently over time in most regions; however, some gaps remain, especially in Southern Asia. There was less availability of preterm birth data, and reporting was more variable compared with low birthweight, especially in low- and middle-income settings. Countries with high quality data showed small differences between routine and adjusted survey low birthweight rates (external consistency). Countries in sub-Saharan Africa and South Asia have the biggest gaps for both data coverage and quality. For countries already reporting low birthweight and/or preterm rates, data quality gaps could be rapidly closed by improving key metadata information including definitions, gestational age sub-groups (e.g. <28 weeks) and availability in the public domain.

What next?
Missed opportunities for increased coverage of low birthweight data: Routine health information systems (RHIS) such as DHIS-2 are in place in over 70 countries in Southern Asia and sub-Saharan Africa, with investments in improving coverage and quality of data. Data quality assessment of routine data is crucial and the data quality criteria we adapted from the WHO data quality framework could be used for national data improvement efforts and to inform future estimates. DHIS-2 in many countries includes low birthweight, hence increases in low birthweight data coverage and quality are possible now in these two regions. Preterm data are less commonly included in DHIS-2, therefore more work is needed on gestational age assessment and incorporating these data into DHIS-2. Missed opportunities could be assessed by looking at time series of coverage gaps and quality gaps. Further studies are needed to explore drivers of change within countries that have achieved progress in their routine data systems to glean lessons for other countries.

| INTRODUCTION
The majority of the world's neonatal deaths are in newborns with low birthweight, most of whom are preterm. 1,2 In 2020, an estimated 19.8 million (95% CI: 18.3-21.6 million) newborns were born with a low birthweight, and 13.4 million (95% CI: 12.3-15.2 million) were born preterm. 3,4 With commitment to reduce newborn morbidity and mortality, as evidenced by Sustainable Development Goal (SDG) 3.2 and the Every Newborn Action Plan national target of 12 or fewer neonatal deaths per 1000 live births by 2030, countries are prioritising the quality of care for babies born with low birthweight and preterm. This focus on improving care for these vulnerable newborns is now at the forefront of national health agendas. 5,6 To achieve these targets, availability of high-quality data is essential at all levels of the health system. At the individual level, detailed information is essential for effective clinical management and for assessing which services are meeting the demands and requirements for providing life-saving care of low birthweight newborns. 7 At the district level, the presence of reliable and accurate health information empowers health planners and managers to make informed decisions regarding the efficient operation of health facilities and of the overall healthcare system. At national level, health information serves as the foundation for appropriate resource allocation, informs policy interventions, and facilitates the evaluation of healthcare initiatives and programmes. Similarly, global and regional level estimates play a crucial role in tracking and monitoring progress and making comparisons between countries and across time periods. The ability to assess key aspects of data quality (completeness, accuracy and timeliness) is vital for generating comparable estimates and continually improving the quality of data over time.
Low birthweight and preterm data in routine information systems are typically generated from records of live births in public and private health facilities including hospitals and other health facilities. Most high-income and upper middle-income countries collect data that can be analysed at an individual level. 8 However, the majority of low- and middle-income countries (LMICs) rely on data systems where individual level data are aggregated at a facility level and only the summary measures are reported up through the data system to the district, subnational and national level. These aggregate data are typically available on a monthly or quarterly basis. Civil registration and vital statistics (CRVS) are another important source of population-level information on births. Although many countries are investing in these systems, coverage of birth registration remains incomplete in many LMICs, 9,10 and only some capture information on birthweight and fewer include gestational age.
In the past, the primary source of population-level birth outcome data in LMICs was standardised periodic household surveys such as the Demographic and Health Surveys (DHS), typically conducted every 3-5 years. 11 Although most surveys collect information on birthweight, missing or inaccurate birthweight information, for example due to few home births being weighed and measurement errors including heaping or poor recall, remains a challenge, necessitating pre-modelling adjustments to survey estimates of low birthweight to create 'adjusted' survey estimates for inclusion in global low birthweight models. In addition, due to concerns about the accuracy of reporting gestational age, few surveys collect information related to preterm birth. 3,12,13 In view of these challenges, the costs associated with population-based surveys and the time-lag of data to inform action, routine health data systems are the most promising way to close data gaps on low birthweight and preterm births in all settings, including LMICs. However, data quality concerns about routine health system data remain, including omission of births around the thresholds of viability, missing or inaccurate information on birthweight or gestational age and, less commonly, failure to use standard international definitions. 14 In terms of addressing issues with data quality in routine health data, WHO has produced a Data Quality Assurance Framework which sets out a series of assessment tools and a consistent approach aimed at supporting countries in assessing and improving the quality of their routine data to improve its utility. 15,16 This study uses the most recent global database of national routine administrative data on low birthweight and preterm birth for 195 countries and areas. The data cover the period from 2000 to 2020 and were compiled as part of the process for producing updated estimates for these outcomes at national, regional and global levels. 3,4 In this paper we aim to assess availability and quality of aggregate national routine data for 195 UN member states, specifically concerning low birthweight and preterm births. This assessment serves as important input for estimating levels and trends as well as in guiding efforts to improve the quality of data.

| Data source
This study uses a dataset compiled as part of the efforts to update the low birthweight and preterm rates for 195 UN member states. The methodology used for collation of this dataset has been described in previous publications in detail. 3,4 Briefly, a systematic search was conducted, including Ministry of Health and National Statistical Office publications, as well as datasets from WHO Member States with over 80% of births occurring in health facilities. This search aimed to identify data related to low birthweight and preterm births from routine administrative data systems including CRVS, Health Management Information Systems (HMIS) and Medical Birth Registries. For countries where such data were not publicly available, national point persons were contacted by UN partners to identify additional data. This data quality analysis considers all routine administrative data collated during the creation of the UN's estimates for low birthweight and preterm births, regardless of whether the data met the inclusion criteria for the modelled estimates. Countries were categorised based on their Sustainable Development Goal regions, with a modification that combined the Europe and Northern America region with Australia and New Zealand. 17

| Dimension of data quality and analysis
The WHO data quality review was developed to support routine, annual and periodic independent desk-review assessments of facility-reported data. 17,18 It includes four dimensions of quality: completeness and timeliness, internal consistency, external consistency and external comparison of population data. While many of the data collection systems used to generate routine low birthweight and preterm births data use facility-based data, our database includes only aggregate national data, hence some of the dimensions were adapted to align with the requirements of our dataset (Table S3). In this study, we have applied these adapted data quality dimensions and conducted descriptive analyses in accordance with these four quality dimensions using R, Stata and Excel.

| Availability and coverage of reported data (Dimension 1)
We assessed and summarised availability and coverage of reported data on low birthweight and preterm birth to inform estimation of trends in these outcomes. Population coverage was estimated for each country using the United Nations World Population Prospects live birth estimates. 18 The geographical distribution of low birthweight and preterm birth data availability from 2000 to 2020 was displayed visually by creating colour-coded maps. In these maps, the lightest colour was assigned to countries with 1-5 years of data within this time period, while the darkest colour was used for countries with 16-20 years of data.
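
The coverage calculation described above can be illustrated with a minimal Python sketch. It assumes that population coverage is the number of live births with reported information divided by the UN live birth estimate, expressed as a percentage. The function names are hypothetical, and the intermediate map bands (6-10 and 11-15 years) are an assumption, since only the lightest and darkest bands are stated in the text.

```python
def population_coverage(reported_births: float, un_live_births: float) -> float:
    """Percentage of UN-estimated live births with reported information.

    Assumption: coverage = reported live births / UN estimate * 100.
    """
    if un_live_births <= 0:
        raise ValueError("UN live birth estimate must be positive")
    return 100.0 * reported_births / un_live_births


def years_of_data_band(n_years: int) -> str:
    """Assign a map colour band from the number of years with data.

    Only the 1-5 and 16-20 bands are given in the text; the two middle
    bands are assumed here for illustration. Counts above 20 also fall
    in the top band in this sketch.
    """
    if n_years <= 0:
        return "no data"
    if n_years <= 5:
        return "1-5"
    if n_years <= 10:
        return "6-10"
    if n_years <= 15:
        return "11-15"
    return "16-20"
```

Keeping the denominator external (the UN estimate rather than the reported total) means that births missing from the routine system lower the coverage figure instead of being silently ignored.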

| Quality of reporting of low birthweight and preterm data (Dimension 2)
We assessed the quality of the reporting of low birthweight and preterm data for all country-years from 2000 to 2020 according to five indicators: Low birthweight/preterm rate data: the percentage of country-years with reported low birthweight or preterm data; Sub-group (<1000 g or <28 weeks): the percentage of country-years reporting the number of live births around the threshold of viability (<1000 g or <28 weeks); Number of live births with missing birthweight or gestational age (GA): the percentage of country-years reporting information on the number of live births with missing birthweight or GA; National data source in the public domain: the percentage of country-years with national low birthweight or preterm data online in the public domain; Definitions of low birthweight or preterm: the percentage of country-years reporting the definition used for low birthweight and preterm birth. For each country, the quality indicators were categorised as ≥75%, 50% to <75%, <50% of country-years or no data available and plotted as heatmaps.
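
As an illustration of the categorisation used for the heatmaps, the following Python sketch maps the share of country-years meeting an indicator to one of the four categories named above. The function name and the choice of denominator (the number of country-years considered for that country) are assumptions and are not taken from the study's analysis code.

```python
def indicator_category(n_meeting: int, n_country_years: int) -> str:
    """Categorise one reporting indicator for one country.

    n_meeting: country-years in which the indicator was reported.
    n_country_years: country-years considered for this country
        (assumed denominator; zero is treated as 'no data available').
    """
    if n_country_years == 0:
        return "no data available"
    share = 100.0 * n_meeting / n_country_years
    if share >= 75:
        return "≥75%"
    if share >= 50:
        return "50% to <75%"
    return "<50%"
```

For example, a country reporting the low birthweight definition in 12 of 21 country-years would fall in the middle category under these assumptions.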

| Internal and external consistency of reported data (Dimensions 3 and 4)
Internal consistency was assessed by visually examining trends in low birthweight and preterm births over time in country plots. Additionally, we assessed the distribution of the low birthweight and preterm birth rates by region across four different time periods (2000-2004, 2005-2009, 2010-2014, 2015-2019). External consistency was assessed by comparing low birthweight rates obtained from national routine data with adjusted low birthweight rates derived from nationally representative household surveys for country-years where data from both sources were available. 19 For the purpose of categorising countries according to data quality for routine administrative sources, we adopted three data quality categories which were developed for the most recent low birthweight estimates (Table S2). 3 High quality (category A) included countries with civil registration and vital statistics or medical birth registry data with very high recorded birthweight coverage and facility birth (≥90%) and little evidence of omission of births around the threshold of viability. Medium quality (category B) included countries with civil registration and vital statistics or medical birth registry data not meeting high quality criteria. Low quality (category C) included countries reporting data from aggregate data systems, e.g. 'Routine health information systems (RHIS)' (e.g. DHIS2) or 'Other, hospital-based systems', or reporting low birthweight rate only with no other information to assess quality. To visualise the consistency between data from these different sources, we plotted the mean percentage difference in low birthweight rate from the two sources by year for each data quality category. As information on national preterm birth rates is not routinely collected in standard household surveys, this analysis was limited to low birthweight.
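
The external consistency comparison can be sketched as follows. The exact formula for the percentage difference is not given in the text, so this example assumes a relative difference, (routine − survey) / survey × 100, with positive values meaning the routine rate is higher. The function names and the record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def pct_difference(routine_rate: float, survey_rate: float) -> float:
    """Relative difference of the routine rate from the adjusted survey rate.

    Assumption: (routine - survey) / survey * 100; the source does not
    state whether a relative or a percentage-point difference was used.
    """
    return 100.0 * (routine_rate - survey_rate) / survey_rate


def mean_pct_difference(records):
    """Mean percentage difference per (quality category, year).

    records: iterable of (category, year, routine_rate, survey_rate)
    tuples, one per country-year with data from both sources.
    """
    grouped = defaultdict(list)
    for category, year, routine_rate, survey_rate in records:
        grouped[(category, year)].append(pct_difference(routine_rate, survey_rate))
    return {key: mean(values) for key, values in grouped.items()}
```

Under this sign convention, a negative mean for a category and year indicates that routine data report lower low birthweight rates than the adjusted surveys.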

| RESULTS
Data on low birthweight were available for 125 countries, and preterm birth data were available for two-thirds of these countries (79 countries). Over the period from 2000 to 2020, a total of 719.3 million live births were reported, 56.5 million classified as low birthweight and 33.5 million as preterm. However, the availability of information on low birthweight status for the most recent year group (2015-2020) accounted for only 37% of estimated UN live births globally and information on preterm birth was available for just 18% of live births in the same period (Table 1).
In all regions, more countries had data on low birthweight than on preterm birth. The region with the highest data capture for low birthweight and preterm birth in routine data systems was the North American, Australia, New Zealand and European region, where nearly 95% of live births in the most recent reporting year group (2015-2020) had information on low birthweight status, and three-quarters had information on preterm birth (Table 1). In contrast, in sub-Saharan Africa, information on low birthweight was reported in routine systems for only 13% of live births during 2015-2020, and information on preterm birth for only 8%. In Southern Asia, <1% of births had reported information on preterm birth. Most countries in North America, Australasia and Europe had over 10 years of data available for both low birthweight and preterm birth (Figure 1A,B). Similarly, many countries in Latin America and the Caribbean had high availability of time series data. On the other hand, availability of time series data was generally low in the other regions, where only a few countries had data spanning the entire period from 2000 to 2020, and even fewer had data available for preterm birth (Table 1, Figure S2).
Reporting across all five indicators of the quality of low birthweight and preterm data reporting was generally higher in countries in North America, Australasia and Europe than in other regions. However, large gaps remain, with information reported on births around the thresholds of viability (<1000 g and <28 weeks) in 38% and 25% of country-years for low birthweight and preterm birth, respectively (Figure 2A,B). Information on those with missing birthweight or gestational age was reported for <30% of country-years. Metadata including the definition of low birthweight or preterm were reported for <25% of country-years (Figure 2A,B). These reporting gaps were greater in all other regions, with substantial gaps especially for the sub-Saharan Africa and Southern Asia regions across all indicators.
At regional level, the most internally consistent trends in both low birthweight and preterm birth rates were observed in countries within the North American, Australia, New Zealand and European region and the Latin America and the Caribbean region (Figure 3A,B). The greatest fluctuations in reported low birthweight and preterm birth rates were found in Southern Asia followed by sub-Saharan Africa. These fluctuations may have resulted from a combination of the limited data availability from these regions, with different countries contributing data from different time periods, and the varying quality of data. Some countries, such as Chile, Colombia, Paraguay and the Republic of Korea, exhibit increasing rates of both low birthweight and preterm births over time (Table S2). Notably, Brazil experienced a sharp increase in preterm births from 2011, whereas the USA saw a substantial decline from 2007. It is not clear whether this may be attributable to a change in their data-capturing systems or a true epidemiological difference (Table S2).
For countries with both routine administrative and adjusted survey data, overall, the two low birthweight estimates were very similar for countries within the highest data quality category (Figure 4). Reported low birthweight rates in routine administrative data were overall slightly lower than the adjusted survey estimates for countries in the medium data quality category, and substantially lower for those with the lowest quality routine data.

FIGURE 4 Differences between routine administrative low birthweight and adjusted survey low birthweight rates by data quality category. The y-axis represents the mean percentage difference in low birthweight rates, which can range from −30% to +30%. The horizontal line at 0 serves as a reference line. The x-axis represents the time period between 2000 and 2020, showing trends of the mean percentage difference in low birthweight rates over time. The graph categorises countries' data into three groups: Group A (high quality routine data and survey), Group B (medium quality routine data and survey), and Group C (low quality routine data and survey). These categorisations are based on data quality criteria for UN 2020 LBW estimates. 3 For each data quality classification group (A, B and C), the graph illustrates how the low birthweight rate calculated from the survey data compares with the rate from routine data over the years from 2000 to 2020. When the bars are consistently below the reference line (0) for a specific group and time period, this suggests that survey data consistently report higher low birthweight rates than the routine data for that specific data quality group and time period. Similarly, when the bars are consistently above the reference line, this implies that the routine data consistently report higher rates than the survey data for that specific data quality group and time period.

| DISCUSSION
Low birthweight and preterm birth affect every country, but the nations facing the highest burdens often struggle with the lowest levels of data availability and quality. Our analysis has shed light on significant data gaps and measurement challenges related to both outcomes, while also revealing opportunities for improvement. To address these issues, we proposed standardised data quality dimensions adapted from the widely recognised WHO data quality framework, allowing comparison over time and between countries.
As a result of improvements in national routine health information systems worldwide, including the adoption of DHIS-2 and increasing rates of facility births, the availability of low birthweight data is improving in all regions. However, it is noteworthy that fewer than half of all countries in sub-Saharan Africa, Southern and South-Eastern Asia currently possess national level low birthweight data. As facility births continue to increase in these regions, it has become imperative to invest in ensuring that every birth is accurately weighed and recorded in the routine system, to close the remaining coverage gaps for low birthweight data.
We found substantial data availability gaps between low birthweight and preterm data, with only 78 countries with preterm data versus 118 countries with low birthweight data in the most recent year group (2015-2020). This gap was again most notable in sub-Saharan Africa and Southern Asia. Increasing preterm birth data coverage to the same level as low birthweight data will require understanding and resolving factors associated with missed preterm data opportunities. For example, it is plausible that preterm data exist in unpublished government reports and internal documents but are not published due to lack of perceived value (e.g. not having a direct national or global target) or concerns regarding data quality. Publication of these data in the public domain and in a form that is accessible to researchers, clinicians and the civilian population could enable assessment of these data and their potential inclusion in local and national policy and planning for newborn health and future global comparisons. Along with this, in settings where routine information systems remain incomplete, nationally representative surveys and Health Demographic and Surveillance Sites (HDSS) could play an interim role as a source of gestational age data, although the reliability of these data sources will be dependent on accurate gestational age assessment within healthcare settings and effective communication of these results to women. 13
Our study shows that there are several gaps in reporting metadata (dimension 2), with only a few countries providing information on low birthweight/preterm definitions or missing values, or offering URL links. This absence of comprehensive metadata poses challenges when assessing trends, especially when data originate from different sources and measurements. Such discrepancies could result in loss of comparability both between countries and over time. Improving the quality of reporting requires the application of data management standards such as the FAIR principles. The FAIR Data Principles outline essential steps to ensure that all data and metadata are Findable, Accessible, Interoperable and Reusable. 20,21 These principles help ensure that data assets include all necessary supplemental details for both humans and machines to identify, qualify and utilise the data effectively. Furthermore, the use of standardised templates and formats as a basis for data collection promotes data consistency, as well as interoperability with other perinatal data reporting systems such as those for stillbirths. This boosts the value of existing data resources and enables the comparison and potential integration of data from different sources. Ultimately, this approach enables the generation of new information, including future cycles of low birthweight/preterm birth estimates.
The WHO data quality framework recommends the evaluation of internal consistency. Given that we had access only to aggregated data, we sought to assess internal consistency by comparing reported low birthweight and preterm rates at both country and region levels, over time. The result of this analysis showed that internal consistency was observed for both low birthweight and preterm rates within high-income country regions. However, in stark contrast, the assessment indicated poor internal consistency in the case of the Southern Asia and sub-Saharan Africa regions.
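
A regional comparison of this kind could be summarised along the following lines. This Python sketch groups country-year rates into the four 5-year periods used in the analysis and computes the median and quartiles per region and period, which are the quantities shown in a box plot. The function names and record layout are hypothetical, and the quartile method (the default of `statistics.quantiles`) is an assumption.

```python
from collections import defaultdict
from statistics import median, quantiles

# The four periods named in the methods; 2020 is not part of any of them.
PERIODS = [(2000, 2004), (2005, 2009), (2010, 2014), (2015, 2019)]


def period_label(year: int):
    """Return the 5-year period label for a year, or None if outside all periods."""
    for start, end in PERIODS:
        if start <= year <= end:
            return f"{start}-{end}"
    return None


def summarise_by_region_period(records):
    """Median and quartiles of rates per (region, period).

    records: iterable of (region, year, rate) tuples.
    Groups with fewer than two values report the median only.
    """
    grouped = defaultdict(list)
    for region, year, rate in records:
        label = period_label(year)
        if label is not None:
            grouped[(region, label)].append(rate)
    summary = {}
    for key, rates in grouped.items():
        entry = {"median": median(rates), "n": len(rates)}
        if len(rates) >= 2:
            q1, _, q3 = quantiles(rates, n=4)
            entry["q1"], entry["q3"] = q1, q3
        summary[key] = entry
    return summary
```

A stable median with a narrow interquartile range across consecutive periods would be read as internal consistency; large shifts between periods would flag a region for closer review.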
Results from our external comparability analysis showed variations in the reporting of low birthweight rates when comparing routine data with survey data. We observed that lower rates of over-reporting of low birthweight were evident in routine administrative data compared with survey data, particularly in the group characterised by high-quality routine administrative data. Similarly, higher levels of over-reporting were observed in the group with the lowest quality routine administrative data. Countries classified under the high-quality category (category A) that also exhibit instances of data heaping might have low birthweight rates that are underestimated. Equally, countries falling within the medium quality category (category B) with heaping rates higher than average may still have underestimated low birthweight rates. Consequently, modelled low birthweight estimates for category B countries may substantially diverge from adjusted survey estimates.
Using this large dataset and the standard data quality framework from WHO, it was possible to assess the quality of data from national routine systems at global level. However, as this study was based on aggregate data, we were not able to assess other more specific measures of data quality such as heaping. Future research using individual-level data could provide more insights into data quality, and lead to further solutions to tackle remaining challenges and improve health data systems for all births.
The findings from this study could form the basis for development of a data quality score for each country-year for low birthweight and preterm birth data, which could be used to inform statistical analyses (e.g. as a model input), data weights or sensitivity analyses. The assessments can also be used to help interpret results, to understand and quantify biases and to discuss potential limitations of national, regional and global low birthweight and preterm estimates. 3,4 However, the benefits of improved data quality and coverage reach far beyond the realm of global comparisons. At facility level, the improvement of the quality of care during childbirth depends on improving how health providers utilise real-time data, particularly birthweight and gestational age information, to identify at-risk newborns and provide better services directly at the point of care. Insufficient access to these critical data often hampers health workers' ability to make well-informed clinical care decisions, resulting in inadequately delivered services. 22,23 Collecting data on birthweight and gestational age on every birth and generating individual level data that are interoperable with routine health information systems such as DHIS-2 could help drive more data use at the point of care and improve data quality at the ward level. Regular data collection efforts also contribute to the enhancement of low birthweight and preterm birth data, facilitating the flow of accurate information up the healthcare system to the district and national levels. 24 This, in turn, fosters action on policy, programmes and investments aimed at better serving these vulnerable newborns.

| CONCLUSION
The increasing number of births in healthcare facilities and the expansion of electronic health information systems can provide an opportunity for capturing data on low birthweight and gestational age. Nevertheless, in regions such as sub-Saharan Africa and South Asia, where the burdens of low birthweight and preterm birth are the highest, significant data gaps persist. To accelerate progress in reducing these outcomes, investments in data improvement within these high-burden regions are needed. Furthermore, conducting further research to improve data systems, particularly concerning the capture of gestational age, is essential. Ultimately, the insights gained from these assessments will inform national-level actions to enhance data quality and maximise the effective use of data for impactful outcomes.

AUTHOR CONTRIBUTIONS
All authors contributed to the design of the study protocols for collation and data quality assessment of low birthweight and preterm birth data for low birthweight and preterm birth estimates. YO, HB, EO and JEL adapted the WHO data quality framework for the purposes of this study with inputs from JY, LS and JR. The analysis was led by YO with contribution from EB and statistical and epidemiological oversight from HB, EO and JEL. The paper was drafted by YO and HB with JEL. All authors reviewed and helped to revise the paper. All authors reviewed and agreed to the final version.

ACKNOWLEDGEMENTS
Firstly, and most importantly, we thank the women and families whose data were included in the national datasets. Many thanks to Claudia DaSilva and all the relevant administrative staff for their support.

FUNDING INFORMATION
The Vulnerable Newborn Measurement Collaboration was funded by the Children's Investment Fund Foundation through grants awarded to the London School of Hygiene & Tropical Medicine (1803-02535) with sub-awards, and to the Johns Hopkins Bloomberg School of Public Health. We thank all relevant national governments and other funders for their investments enabling the input data.

CONFLICT OF INTEREST STATEMENT
None declared.

DATA AVAILABILITY STATEMENT
The data which support this study will be openly available at the time of publication at LSHTM Data Compass [https://doi.org/10.17037/DATA.00003095], reference number [reference number].

TABLE 1
Population coverage of low birthweight and preterm routine administrative data by region, 2000-2020.

FIGURE 2
Availability of five key metadata reporting indicators for (A) low birthweight and (B) preterm births by region (2000-2020). BW, birthweight; GA, gestational age.

FIGURE 3
(A) Internal consistency of low birthweight data by region over time. The x-axis (Time period) represents the time periods from 2000 to 2020, grouped into 5-year intervals. The y-axis (Preterm rate) represents the low birthweight rate. Each box plot, showing the interquartile range (IQR) and median, provides a visual representation of the low birthweight rate distribution within a specific region and time period. Consistency in the median low birthweight rate across the time periods implies stable low birthweight rates, evident in regions such as Eastern Asia, South-Eastern Asia and Oceania; North America, Australia and NZ, Central Asia and Europe; Latin America and the Caribbean; and Western Asia and Northern Africa. In contrast, low birthweight rates in the sub-Saharan Africa and Southern Asia regions show variation, suggesting inconsistency across the time period. (B) Internal consistency of preterm rate by region over time. The x-axis represents the time periods from 2000 to 2020, grouped into 5-year intervals. The y-axis represents the preterm rate. Each box plot, showing the interquartile range (IQR) and median, provides a visual representation of the preterm rate distribution within a specific region and time period. Consistency in the median preterm rate across the time periods implies stable preterm rates, evident in regions such as North America, Australia and NZ, Central Asia and Europe, and Latin America and the Caribbean. In contrast, preterm rates in the sub-Saharan Africa and Southern Asia regions show variation, suggesting inconsistency across the time period. In addition, the absence of a box plot for the 2000-2004 time period signals a lack of available preterm data during those specific years, emphasising the limitations in data collection or reporting for that period. *Excluding Australia and New Zealand.